Saturday 28 August 2021

Problems with Shell UK website show up 'enhancement opportunity' for Scrutiny

I'm used to seeing a lot of 'image without alt text' warnings. It now seems almost normal, despite the fact that it's an HTML validation error as well as an SEO black mark. It's a quick win, so why are we so lax?

In this case there are many other HTML warnings: missing closing divs, p within h3, closing p with open span. (There are a lot of warnings; just a small section is shown in the screenshot below.)

Multi-headed monster

The SEO results showed up 'missing title' on many important pages. This was hard to believe and indeed it shouldn't be believed - the title tags are present. However (I might call this a bug in Scrutiny) some pages seem to have multiple <head> sections, some even nested! 

This unexpected issue tricks Scrutiny into 'body' mode rather than 'head' mode, so it's likely to miss other metadata and <link> tags as well. I will see that Scrutiny gets an update so that it handles this situation properly: it should correctly report the multiple <head> tags and not miss the titles.
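For anyone who wants to check their own pages for this, here is a minimal sketch in Python. It is not how Scrutiny does it; the function name inspect_heads and the returned keys are my own invention for illustration. It counts opening <head> tags, notes whether any is opened inside another, and collects every <title> it finds regardless of which 'mode' a simpler parser might think it is in.

```python
from html.parser import HTMLParser


class _HeadInspector(HTMLParser):
    """Counts <head> tags and collects <title> text wherever it appears."""

    def __init__(self):
        super().__init__()
        self.head_count = 0   # number of opening <head> tags seen
        self.depth = 0        # how many <head> elements are currently open
        self.nested = False   # True if a <head> opened inside another <head>
        self.titles = []      # text of every <title> element found
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.head_count += 1
            if self.depth > 0:
                self.nested = True
            self.depth += 1
        elif tag == "title":
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "head" and self.depth > 0:
            self.depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in more than one chunk, so append.
        if self._in_title:
            self.titles[-1] += data


def inspect_heads(html):
    """Hypothetical helper: summarise the <head> and <title> tags in a page."""
    parser = _HeadInspector()
    parser.feed(html)
    parser.close()
    return {
        "head_count": parser.head_count,
        "nested": parser.nested,
        "titles": [t.strip() for t in parser.titles],
    }


page = "<html><head><head><title>Home</title></head></head><body></body></html>"
report = inspect_heads(page)
print(report["head_count"], report["nested"], report["titles"])
```

A head_count above 1 is the thing to report. Note that this simple version would also pick up a <title> inside an inline SVG, so treat the titles list as a hint rather than a verdict.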

Another important issue is links to insecure http:// pages from secure https:// pages - including an http version of the contact form.
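The check itself is simple to describe. As a rough sketch (Python, with a made-up function name insecure_links and example.com URLs standing in for the real site; this is not Scrutiny's code), collect every href, src and form action on a page served over https, resolve each against the page URL, and flag any that end up as http:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _UrlCollector(HTMLParser):
    """Collects raw URL values from href, src and (form) action attributes."""

    URL_ATTRS = {"href", "src", "action"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.URL_ATTRS and value:
                self.urls.append(value)


def insecure_links(page_url, html):
    """Hypothetical helper: list http:// targets found on an https:// page."""
    if urlparse(page_url).scheme != "https":
        return []  # only secure pages can have this particular problem
    collector = _UrlCollector()
    collector.feed(html)
    collector.close()
    found = []
    for raw in collector.urls:
        absolute = urljoin(page_url, raw)  # resolve relative links
        if urlparse(absolute).scheme == "http":
            found.append(absolute)
    return found


sample = (
    '<a href="http://example.com/contact">Contact</a>'
    '<a href="/about">About</a>'
    '<form action="http://example.com/form"></form>'
)
print(insecure_links("https://example.com/help", sample))
```

Including the action attribute matters here, because it is what catches a form that posts to an http address.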

The bad links report is juicy. Some are deceptive; it's not unusual for a redirect to a login page to give a 4xx status, and we could find out the reason for that and adjust settings. But there are many links here that really are 404. Again, this is a quick win.

This all adds up to very poor quality. It surprises me that one of the top brands is putting so little into its website upkeep. (site: https://www.shellenergy.co.uk crawled using Scrutiny 28 Aug 2021).

Friday 27 August 2021

The future for Integrity and Scrutiny

[update Feb 22 2022]: Integrity, Integrity Plus and Pro v12 are out of beta and available here.

[post originally written August 2021, updated in the meantime.]

It feels that the various flavours of Integrity and Scrutiny have reached a plateau; they do what they do and, judging by their popularity, they're doing it well (all comments welcome).

That's not to say that they're dormant. Far from it. You can see from the release notes that they've all received frequent updates. But these now tend to be improvements and updates rather than new features.

The biggest news recently has been the HTML validation, and work on that will continue.

Work has already begun on v11 of Integrity and Scrutiny, and it'll necessarily be a deep rewrite of the engine. Which will of course be called the v12 engine, because who's heard of a v11 engine?!

Futureproofing is needed. Partly to keep up with changes in macOS, partly to revise the internal structure of the data and partly to replace some tired stuff with newer stuff, for example our current 'sitesucker-like' archiving system.

One feature of Integrity and Scrutiny that has been a bit slack is the archiving. Originally this was simply a dump of the html files during the crawl. It developed a bit, but the archiving and processing in Webarch and Website Watchman have left Integrity and Scrutiny behind, so Integrity and Scrutiny will be brought up to scratch with Webarch-style archiving.

There are long-standing issues that need deeper rewrites in order to fix properly, and parts of the interface that could do with a facelift, particularly Scrutiny's website / config management screen.

On the business front, it's more than likely that there will be a price increase, but as usual, no upgrade fee for licence holders of v7 or above. (hint: now is a very good time to buy!)


[Update 28 Nov 2021]

I've just posted a video showing the new Integrity Pro in action. If you use Integrity Pro, this won't *look* tremendously different. The changes are as outlined above; much is under-the-hood, for efficiency or just to keep up-to-date with the changing system and web standards. There are one or two important features missing from the interface in this video.


Friday 20 August 2021

Many 'soft 404s' found on the KFC website

One way to 'fix' your bad links is to make your server send a 200 code with your custom error page.


Google frowns upon it as "bad practice" and so do I. It makes bad links difficult to find using a link checker. Even if a page says "File not found", a crawling tool won't understand that; it will see the 200 and move on. Maybe this is why the KFC UK website has so many bad links.
The way that Integrity and Scrutiny handle this is to look for specified text on the page and in the title. Obviously it can't be pre-filled with all of the possible terms which might appear on anyone's custom error page, so if you know that you use soft 404s on your site, you must give Integrity / Scrutiny a term that's likely to appear on the error page and that's unlikely to appear anywhere else. Fortunately with this site, WHOOPS! fits the bill. The switch for the soft 404 search and the list of search terms is in Preferences (above).
And here we see them being reported with the status 'soft 404' in place of the actual (incorrect) 200 status returned by the server.
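The general idea can be sketched in a few lines of Python. This is only an illustration of the technique described above, not the actual Integrity / Scrutiny code; the function name classify_response, its parameters and the case-insensitive matching are all my assumptions.

```python
def classify_response(status_code, title, body_text, soft_404_terms):
    """Hypothetical helper: report 'soft 404' when a 200 page contains a
    known error-page term in its title or body; otherwise report the
    status code the server actually returned."""
    if status_code == 200:
        haystack = f"{title}\n{body_text}".lower()
        for term in soft_404_terms:
            # Skip empty terms, which would otherwise match every page.
            if term and term.lower() in haystack:
                return "soft 404"
    return str(status_code)


terms = ["WHOOPS!"]  # the term that fits the bill for this site
print(classify_response(200, "WHOOPS!", "We can't find that page.", terms))
print(classify_response(200, "Our menu", "Burgers and buckets.", terms))
```

The choice of term is what makes or breaks this. A term that also appears on ordinary pages will produce false alarms, which is why it needs to be something that's unlikely to appear anywhere else.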

[update 29 Nov 2021] To be fair to KFC, that long list of bad links is now mostly cleared up, although the soft 404 problem still exists, which isn't going to make it easy to find bad links:


If anyone from KFC reads this, we offer a subscription-based monthly website report and would be very happy to include the 'soft 404' check at no extra charge.