Saturday, 4 September 2021

Crawling big-name websites. Some thoughts.

Over the last couple of weeks I've been crawling the websites of some less-popular* big names. 

I enjoy investigating websites, it gives me some interesting things to think about and comment on, and it allows me to test my software 'in the wild'.

Already I'm feeling disappointed with the general quality of these sites, and I'm noticing some common issues. 


The most common by far is the "image without alt text" warning. As someone with a history in website accessibility, this is disappointing, particularly as it's the easiest accessibility improvement and SEO opportunity. Above is a section of the warnings from the RBS site. Every page has a list of images without alt text, and I see this regularly on sites that I'm crawling.

Next are the issues which may be the result of blindly plugging plugins and modules into a CMS. Last week I saw the issue of multiple <head> tags, some of them nested in the Shell UK website. This showed up a small issue with Scrutiny (fixed in 10.4.2 and above). 

One of the sites I've crawled this week, Ryanair, showed a different problem which may also be the result of plugins that don't play nicely together. 

The content page has two meta descriptions. Only one of them is likely to be displayed on Google's search results page. Don't leave that to chance.

Before getting to that point, the first black mark to Ryanair is that the site can't be viewed without javascript rendering. It's all very well for js to make pretty effects on your page but if nothing is visible on the page without js doing its stuff in the browser, then that is bad accessibility and arguably could hinder search engines from being able to index the pages properly**

This is what the page looks like in a browser without JS enabled, or on any other user agent that doesn't do rendering. This is what Integrity and Scrutiny would see by default. To crawl this site we need to enable the 'run js' feature. 

This aspect of the site helps to mask the 'double-description' problem from a human - if you 'view source' in a browser (depending on the browser) you may not even see the second meta description because you may see the 'pre-rendered' page code.

 Scrutiny reported the problem and I had to look at the 'post-rendered' source to see the second one:

I hope you enjoy reading about this kind of thing. I enjoy doing the investigation. So far no-one from any of the companies I've mentioned on blog pages and tweets have made contact, but I'd welcome that. 




*less-popular with me.

** It used to be the case that no search engine would be able to index such a page. Now Google (but not all search engines) does render pages. To some extent. 

No comments:

Post a Comment