
Saturday, 28 August 2021

Problems with Shell UK website show up 'enhancement opportunity' for Scrutiny

I'm used to seeing a lot of 'image without alt text' warnings. It now seems almost normal, despite being an html validation error as well as an SEO black mark. It's a quick win - why are we so lax?

In this case there are many other html warnings: missing closing divs, p within h3, closing p with an open span. (There are a lot of warnings; just a small section is shown in the screenshot below.)

Multi-headed monster

The SEO results showed 'missing title' on many important pages. This was hard to believe, and indeed it shouldn't be believed - the title tags are present. However (and I might call this a bug in Scrutiny) some pages seem to have multiple <head> sections, some even nested!

This unexpected issue tricks Scrutiny into 'body' mode rather than 'head' mode, making it likely to miss other metadata and <link> tags. I'll see that Scrutiny gets an update so that it handles this situation properly: correctly reporting the multiple <head> tags and not missing the titles.
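
If you want to confirm an issue like this independently of any crawler, counting <head> openings in the raw source is enough. Here's a minimal Python sketch using only the standard library (this is not Scrutiny's parser, and the url is just the example at hand):

    from html.parser import HTMLParser
    import urllib.request

    class HeadCounter(HTMLParser):
        """Count <head> start tags; a valid document has exactly one."""
        def __init__(self):
            super().__init__()
            self.head_count = 0

        def handle_starttag(self, tag, attrs):
            if tag == "head":
                self.head_count += 1

    source = urllib.request.urlopen("https://www.shellenergy.co.uk").read()
    counter = HeadCounter()
    counter.feed(source.decode("utf-8", errors="replace"))
    if counter.head_count > 1:
        print(f"warning: {counter.head_count} <head> tags found")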

Another important issue is links to insecure http:// pages from secure https:// pages - including an http version of the contact form.
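
A check like that is easy to script for a single page. A rough Python sketch (naive regex extraction for illustration - real link-checking needs a proper parser):

    import re
    import urllib.request

    def insecure_links(page_url):
        # Return plain http:// urls linked from a (presumably https) page
        html = urllib.request.urlopen(page_url).read().decode("utf-8", errors="replace")
        return re.findall(r'href=["\'](http://[^"\']+)["\']', html)

    for url in insecure_links("https://www.shellenergy.co.uk"):
        print("insecure link:", url)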

The bad links report is juicy. Some results are deceptive; it's not unusual for a redirect to a login page to give a 4xx status, and we could find out the reason for that and adjust settings. But there are many links here that really are 404. Again, this is a quick win.

This all adds up to very poor quality. It surprises me that one of the top brands is putting so little into its website upkeep. (site: https://www.shellenergy.co.uk crawled using Scrutiny 28 Aug 2021).

Friday, 5 February 2021

HTML validation of an entire website

Version 10 of Scrutiny and Integrity Pro both contain built-in html validation. This means they can make some important checks on every page as they crawl.

It's enabled by default but can be switched off (with very large sites it can be useful to switch off features that you don't need at the time, for reasons of speed or resources).


Simply scan the site as normal. When it's finished, the task selection screen contains "Warnings: HTML validation and other warnings >".
(NB Integrity Pro differs here: it doesn't have the task selection screen, but a 'Warnings' tab in its main tabbed view.)

Warnings can be filtered, sorted and exported. If there's a type of warning that you don't need to deal with right now, you can "hide warnings like this" temporarily or until the next scan. (Right-click or ctrl-click for context menus.)


The description of each warning contains a line number and/or url where appropriate and possible.

In addition, links are coloured orange (by default) in the link-check results tables if there are warnings. Traditionally orange meant a redirection, and it still does, but other warnings now also colour a link orange. A double-click opens the link inspector, and its warnings tab shows the reason(s) for the orange colouring. Note that while the link inspector is concerned with the link url, many of these warnings will apply to the target page of that url.



The full list of potential warnings (to date) is at the end of this post. We're unsure whether this list will ever be as comprehensive as the w3c validator's, and unsure whether it should be. At present it concentrates on common and important mistakes; the ones that have consequences.

Should you wish to run a single page through the w3c validator, that option still exists in the context menu of the SEO table (the one table that lists all of your pages; the sitemap table excludes certain pages for good reasons).



Full list of possible html validation warnings (so far) - a rough sketch of one such check follows the list:

unclosed div, p, form
extra closing div, p, form
extra closing a
p within h1/h2...h6
h1/h2...h6 within p
more than one doctype / body
no doctype / html / body
no closing body / html
unterminated / nested link tag 
script tag left unclosed
comment left unclosed
end p with open span
block level element XXX cannot be within inline element XXX (currently limited to div/footer/header/nav/p within a/script/span, but will be expanded to recognise more elements)
'=' within unquoted src or href url
link url has mismatched or missing end quotes
image without alt text. (This is an accessibility, html validation and SEO issue. The full list of images without alt text can also be found in Scrutiny's SEO results.)
more than one canonical
more than one opening html tag
badly nested <form> and <div>
form elements can't be nested
hanging comma at end of srcset list (the w3 validator reports this as "empty image-candidate string")
more than one meta description is found in the head
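
For the curious, here's the flavour of one such check. A minimal Python sketch (not Scrutiny's actual code) that detects unclosed and extra-closed div/p/form tags by keeping a simple balance count:

    from html.parser import HTMLParser

    CHECKED = {"div", "p", "form"}

    class BalanceChecker(HTMLParser):
        # Track the open/close balance for a few container tags
        def __init__(self):
            super().__init__()
            self.balance = {tag: 0 for tag in CHECKED}

        def handle_starttag(self, tag, attrs):
            if tag in CHECKED:
                self.balance[tag] += 1

        def handle_endtag(self, tag):
            if tag in CHECKED:
                self.balance[tag] -= 1

    checker = BalanceChecker()
    checker.feed("<div><p>hello<p>world</p></div></div>")
    for tag, count in checker.balance.items():
        if count > 0:
            print(f"unclosed {tag} x{count}")
        elif count < 0:
            print(f"extra closing {tag} x{-count}")

Run on that snippet it reports one unclosed p and one extra closing div. (Real validation is subtler - a p is allowed to auto-close in html5, for example - which is partly why building this list is an ongoing job.)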

Warnings that are not html validation (the robots.txt check is sketched in code after this list):

Type mismatch: the type attribute in the html is xxx/yyy but the content-type given in the server response is aaa/bbb
The server has returned 429 and asked us to retry after a delay of x seconds
a link contains an anchor which hasn't been found on the target page
The page's canonical url is disallowed by robots.txt
link url is disallowed by robots.txt
The link url is a relative link with so many '../' that it technically takes the url above the root domain.
(if the 'flag blacklisted' option is switched on) The link url is blacklisted by a blacklist / whitelist rule (default is off). With this option on, the link is coloured red in the link views, even if warnings are totally disabled.
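
Some of these are easy to reproduce with standard tools - the robots.txt check, for instance. A Python sketch (the url and user agent here are illustrative):

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://www.example.com/robots.txt")
    rp.read()

    # can_fetch(useragent, url) returning False means the url is disallowed
    if not rp.can_fetch("*", "https://www.example.com/private/page.html"):
        print("link url is disallowed by robots.txt")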




Sunday, 19 July 2020

A sneaky peeky at the current development project

Years ago I wrote a simple development environment to help me hand code small websites. (Please don't judge my code.)


'Development environment' is overblowing it. But it did one very important trick.

But first, why would anyone want to hand-code a website? (Does anyone else hand-code any more?)

  • Having a server-side website CMS hacked is frustrating
  • Having plain html files on the server makes your site faster
  • If you're a control-freak like me, you want to write your own clean code, rather than allowing a system to generate unnecessary unreadable guff.

The first thing you notice when you hand-code a site, even a small one, is that if you make a change to code that appears on all pages, it's a huge PITA to go through all pages making the same change.

Hence that trick I mentioned. Those blue links in the screenshot are where an 'object' is inserted. An object is stored once, you edit it once and when you 'compile' the site, those placeholders on all pages are expanded. In the same operation, the app uploads the compiled pages to the server by ftp or sftp. Obviously, clicking that blue link in the editor loads the object for editing. The editor has forward and back navigation buttons.
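
In case you're wondering what the compile step amounts to, it's essentially template expansion before upload. A minimal Python sketch of the idea (the [[object-name]] placeholder syntax and folder names are invented for illustration - not the app's real format):

    import re
    from pathlib import Path

    OBJECTS = Path("objects")    # one file per shared object, e.g. objects/header.html
    PAGES = Path("pages")
    OUTPUT = Path("compiled")

    def compile_page(source):
        # Replace each [[name]] placeholder with the stored object's content
        expand = lambda m: (OBJECTS / (m.group(1) + ".html")).read_text()
        return re.sub(r"\[\[([\w-]+)\]\]", expand, source)

    OUTPUT.mkdir(exist_ok=True)
    for page in PAGES.glob("*.html"):
        (OUTPUT / page.name).write_text(compile_page(page.read_text()))

Edit the object once, recompile, and every page picks up the change - which is the whole point.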

That's a very brief overview. I've been using this myself for years, but as with most tools you write for yourself, it's not very refined.

I've been thinking that I may not be alone in my wish to manage small sites this way. I guess most people who don't want a server-side CMS will be using a visual website creator on their computer.

I've decided to improve this to the point where it's a credible commercial (initially free) app and see what interest there is.

It's not ready yet. The whole object / compiling / uploading trick works - I've been using that for a long time. I now have basic syntax colouring working, and before any kind of release I plan to build in these features:

  • A button to trigger 'Tidy', a unix html tidying utility which can also do some validation (see the sketch after this list)
  • Preview of selected page, updated as you type
  • Compress (remove whitespace) before uploading
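
Driving Tidy from another program is straightforward because it's a command-line tool. A rough Python sketch (assuming the tidy binary is installed and on the PATH):

    import subprocess

    def tidy_check(path):
        # -q suppresses the info banner, -e reports errors/warnings without
        # writing any output markup. Exit status: 0 clean, 1 warnings, 2 errors.
        result = subprocess.run(["tidy", "-q", "-e", path],
                                capture_output=True, text=True)
        return result.returncode, result.stderr

    status, report = tidy_check("index.html")
    if status:
        print(report)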
If you would like to read more, an early draft of the user guide is here.

[Update 8 Aug] A very early version of the application (which I'm not even calling Beta) is now available for download.

Is this of interest to you? Are there other features you'd like in such an app? Let me know.



Sunday, 29 November 2015

HTML Validation Runner

Earlier this year the w3c validator switched to a new engine, and although this returns results for html5, the service no longer returns simple statistics in the http header. In short, it broke Scrutiny's html validation feature. (Those using a local instance of the validator may have found that it kept working.)
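
For context, the old-style check was as simple as reading a couple of response headers rather than parsing the validator's report. Something along these lines (from memory - header names as the legacy service returned them, and the endpoint is illustrative):

    import urllib.parse
    import urllib.request

    page = urllib.parse.quote("https://example.com/")
    with urllib.request.urlopen("https://validator.w3.org/check?uri=" + page) as response:
        print(response.headers.get("X-W3C-Validator-Status"))   # e.g. Valid / Invalid
        print(response.headers.get("X-W3C-Validator-Errors"))   # error count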

My options were to screen-scrape the new w3c validator, find an alternative service (either a web service or something that could be included within Scrutiny), or write my own.

In the absence of a 'quick fix', I replaced the full website validation feature with 'single page validation' (a context menu item in the SEO table).

Feedback has been generally good, with some users agreeing with my own feeling that because websites tend to be based on templates, validating all pages isn't necessary. For this reason it's unlikely that the new tool mentioned below will be part of Scrutiny, but I may include it in the dmg as a separate free app.

I have now found an alternative that works well and doesn't rely on a web service or need anything extra installed. It does give slightly different results from the w3c validator, but I guess any two tools will.

My first prototype is now working. If you're interested in scanning your site and validating the html of all its pages, and would like to give this tool a whirl, please just ask.


All thoughts and comments are very welcome, email me or use the comments.