Friday, 5 February 2021

HTML validation of an entire website

Version 10 of Scrutiny and Integrity Pro contains built-in html validation. This means that they can make some important checks on every page as they crawl.

It's enabled by default but can be switched off (with very large sites it can be useful to switch off features that you don't need at the time, for reasons of speed or resources).

Simply scan the site as normal. When it's finished, the task selection screen contains "Warnings: HTML validation and other warnings >"
(NB Integrity Pro differs here: it doesn't have the task selection screen above, but has a 'Warnings' tab in its main tabbed view.)

Warnings can be filtered, sorted and exported. If there's a type of warning that you don't need to deal with right now, you can "hide warnings like this" temporarily or until the next scan. (Right-click or ctrl-click for context menus.)

The description of the warning contains a line number and/or url where appropriate / possible.

In addition, links are coloured orange (by default) in the link-check results tables if there are warnings. Traditionally, orange meant a redirection, and it still does, but other warnings now colour that link orange. A double-click opens the link inspector and the warnings tab shows any reason(s) for the orange colouring.  Note that while the link inspector is concerned with the link url, many of these warnings will apply to the target page of the link url.

The full list of potential warnings (to date) is at the end of this post. We're unsure whether this list will ever be as comprehensive as the w3c validator, and unsure whether it should be.  At present it concentrates on many common and important mistakes; the ones that have consequences.

Should you wish to run a single page through the w3c validator, that option still exists in the context menu of the SEO table (the one table that lists all of your pages; the sitemap table excludes certain pages for good reasons).

Full list of possible html validation warnings (so far):

unclosed div, p, form
extra closing div, p, form
extra closing a
p within h1/h2...h6
h1/h2...h6 within p
more than one doctype / body
no doctype / html / body /
no closing body / html
unterminated / nested link tag 
script tag left unclosed
comment left unclosed
end p with open span
block level element XXX cannot be within inline element XXX (currently limited to div/footer/header/nav/p within a/script/span but will be expanded to recognise more elements)
'=' within unquoted src or href url
link url has mismatched or missing end quotes
image without alt text. (This is an accessibility, html validation and SEO issue. The full list of images without alt text can also be found in Scrutiny's SEO results.)
more than one canonical
Badly nested <form> and <div>
Form element can't be nested
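
To give a feel for how a couple of the checks above can work (unclosed and extra closing div / p / form), here is a minimal sketch in Python. It is not how Scrutiny or Integrity Pro do it internally; the names TagBalanceChecker and check_html are made up for the example, and it treats every omitted closing p as 'unclosed', which is a simplification.

```python
from html.parser import HTMLParser

# Tags tracked in this sketch; the list of checks above covers much more.
TRACKED = {"div", "p", "form"}


class TagBalanceChecker(HTMLParser):
    """Collects 'unclosed' and 'extra closing' warnings for a few tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag, line) for tracked tags still open
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        if tag in TRACKED:
            line, _ = self.getpos()
            self.open_tags.append((tag, line))

    def handle_endtag(self, tag):
        if tag not in TRACKED:
            return
        line, _ = self.getpos()
        # Look for the most recent open tag of the same name.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                # Anything opened after it and still open was never closed.
                for unclosed, opened_at in self.open_tags[i + 1:]:
                    self.warnings.append(
                        f"unclosed {unclosed} (opened on line {opened_at})"
                    )
                del self.open_tags[i:]
                return
        self.warnings.append(f"extra closing {tag} (line {line})")


def check_html(source):
    """Return a list of warning strings for the given html source."""
    checker = TagBalanceChecker()
    checker.feed(source)
    checker.close()
    # Tags still on the stack at the end of the page were never closed.
    for tag, opened_at in checker.open_tags:
        checker.warnings.append(f"unclosed {tag} (opened on line {opened_at})")
    return checker.warnings


if __name__ == "__main__":
    for warning in check_html("<div>\n<p>hello\n</div>\n</form>"):
        print(warning)
```

Keeping the line number with each open tag is what lets the warning say where the problem started, in the same spirit as the line numbers in the warning descriptions mentioned earlier.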

Warnings that are not html validation:

The server has returned 429 and asked us to retry after a delay of x seconds
a link contains an anchor which hasn't been found on the target page
The page's canonical url is disallowed by robots.txt
link url is disallowed by robots.txt
The link url is a relative link with too many '../' which technically takes the url above the root domain.
(if 'flag blacklisted' option switched on) The link url is blacklisted by a blacklist / whitelist rule. (Default is off.) With this option on, the link is coloured red in the link views, even if warnings are totally disabled.
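
The 'too many ../' warning is easy to picture with a small example. The following Python sketch (an illustration only, with a made-up function name, not the app's own code) counts directory steps and reports when a relative link would climb above the root. Resolvers typically clamp such a url at the root, which is why the page may still appear to work.

```python
from urllib.parse import urlsplit


def climbs_above_root(base_url, link):
    """Return True if link has more '../' steps than there are directories to climb."""
    link_parts = urlsplit(link)
    if link_parts.netloc or link_parts.path.startswith("/"):
        # Absolute url or root-relative path: start counting from the root.
        depth = 0
    else:
        # Directory depth of the page holding the link,
        # e.g. /a/b/page.html has depth 2.
        base_path = urlsplit(base_url).path
        depth = len([seg for seg in base_path.split("/")[:-1] if seg])

    for segment in link_parts.path.split("/"):
        if segment == "..":
            depth -= 1
            if depth < 0:
                return True
        elif segment not in ("", "."):
            depth += 1
    return False


if __name__ == "__main__":
    page = "https://example.com/a/b/page.html"
    print(climbs_above_root(page, "../../index.html"))     # False
    print(climbs_above_root(page, "../../../index.html"))  # True
```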

Tuesday, 22 December 2020

A sneaky peek at what's coming in early 2021

Scrutiny and Integrity have received constant attention and frequent updates (often weekly). 

But they feel like mature applications. They sell well, are well-used, we use them to provide services and yet reported problems are very few.

Requested features / functionality have been few too and we haven't had anything major on the radar for these apps. 

However, if you've followed recent developments you'll know that the 'warnings' functionality has been progressing. 

Traditionally, links are coloured orange if there's a redirection. The redirection itself is certainly not an error and users may or may not be concerned about them. (So warnings about redirections can be switched off.)

For some time, other things have been noted and reported as 'warnings' (and listed separately from the redirections in the link inspector) and people asked for a way to view / export these warnings. Scrutiny has been gaining new ways to view warnings (eg columns in tables, a tab in the link inspector). More recently warnings (again, distinct from redirections) have been presented in their own table.

This in itself is major new functionality. It has happened incrementally in later minor releases of version 9.  Version 10 marks what has so far been more of an evolution.

It can go much further. Way back, Scrutiny had the ability to pass every page that it scanned to the w3c validator. Although it was a little clunky internally, it was a very popular feature, and it was therefore frustrating when the w3c released the 'nu' validator, which didn't have the API that allowed us to use it in the same way as before.

Because of the way that our crawling / parsing  engine works, it's easy to spot and report a few more html validation problems. We're working hard on this now. When the v10 beta is released after the new year, it will be able to report a number of important validation problems. It won't be a full html validator just yet, but the list of problems that it can report will grow through 2021.

The html validation functionality applies to Scrutiny and Integrity Pro only.

Scrutiny version 10 will be a free upgrade from the current version, although the word is that there may be a small price increase for 2021. (hint: if you're thinking about buying, now is a good time.) 

[update 8 Jan 2021] The first public beta of v10 is available for download. Important: It will be a free upgrade. It will ask for a key. If you already have a key it will accept your existing one. Just get in touch if you get a 'too many activations' message; the key just needs resetting.

~ Shiela

Tuesday, 27 October 2020

Sales tax change

For a very long time, the price you've seen on our products has been inclusive of sales tax. So the price advertised for Integrity Plus is $15 USD and that's all you pay. We have been paying the sales tax which is charged at your country's rate and policy. 

Sales tax varies widely. Some places have no sales tax at all or a very low percentage.  But if a customer from a country that levies a 20% rate buys Scrutiny for the advertised price of $115 USD, we get about $85 after paying the tax and some other fees. 

In some countries it's normal to mark prices exclusive of tax, and in other countries such as my own, prices in most shops are traditionally inclusive with no mention of the tax. 

But now with certain online purchases I tend to expect tax to be added at checkout. If I spend 10 of our UK pounds on digital music I know I'm going to be charged £12.

I don't think we've used the word 'inclusive' anywhere*, so the fact that we have been covering that tax may well have been a pleasant surprise to customers who are already at the checkout. If they're a business or non-profit customer, depending on factors like the size and location of their business, they may well be claiming that tax back (which we have paid) so it has represented an unexpected and significant discount.

So today we're changing our policy so that our advertised prices don't include the tax. The sales tax or vat is now added at checkout, depending on the policy and rate of your country or state. 
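
For anyone who likes to see the arithmetic, here is a small Python sketch of the two pricing models, using the 20% rate and the figures mentioned above. The function names are made up for the example, and it only models the tax itself; the 'other fees' that bring the $115 sale down to about $85 are not included.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def seller_receives_when_inclusive(advertised, rate):
    """Amount left after tax when the advertised price already includes the tax."""
    return (advertised / (1 + rate)).quantize(CENT, rounding=ROUND_HALF_UP)


def customer_pays_when_exclusive(advertised, rate):
    """Amount charged at checkout when tax is added to the advertised price."""
    return (advertised * (1 + rate)).quantize(CENT, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    rate = Decimal("0.20")
    # Old policy: $115 advertised, tax paid out of that amount.
    print(seller_receives_when_inclusive(Decimal("115"), rate))  # 95.83
    # New policy, and the digital music example: 10 becomes 12 at checkout.
    print(customer_pays_when_exclusive(Decimal("10"), rate))     # 12.00
```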

This comes at a time when we have looked at possible price increases and ruled those out. You may well have expected to be paying the tax anyway. If you weren't, then please remember that it's a tax that your government is collecting from you on purchases that you make. 

If you have been trialling any of our products and were expecting the price to be inclusive of tax, and are reading this because you're upset to discover this, then please contact us to arrange a suitable discount. 

* if you know differently, contact us for a bounty in the form of a voucher

Tuesday, 11 August 2020

Development environment for hand-coding websites - update

Moving from an app built for myself to a product that I expect other people to use has been a much longer process than I would have imagined. The very long list of small fixes and enhancements makes me realise what I'm prepared to live with and work around.

The first public release happened a little while ago and yesterday it received an update with lots of rough edges smoothed off.

If it turns out that I'm the only person who wants to hand-code html/css/js then that's fine; my own tool is much nicer to use than it has been for most of its life.

The current version is entirely free. Download is here. No card details, not even an email address. The only thing I do ask for is feedback.

Wednesday, 5 August 2020

Checking hyperlinks within a Word document (.docx)

Scrutiny has long been able to check links within pdf documents encountered during a website scan. Scrutiny is a website crawling tool; it wasn't intended that you could point it at a local pdf and ask it to check the hyperlinks within it. But with a tweak, the current version can do this.

The option to check links within a Word document isn't a frequently-requested feature, but it has arisen a couple of times, and this week I've had a task where the ability to test / examine the hyperlinks within a .docx document would be valuable.

It has been an enjoyable (if sometimes bewildering) curve to learn about the docx format. 

As with the pdf option, (with the option switched on) Scrutiny should now look in Word documents discovered during the scan and report the link url and link text, and test that link. This also works if the document is on the local drive and the hyperlink points to another local document. At present this will only work on the .docx format, not the older .doc format.
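
For the curious, a .docx file is a zip archive of xml parts. The Python sketch below is an illustration of the format rather than Scrutiny's own code: it reads the hyperlink relationships from word/_rels/document.xml.rels and pairs each one with the link text found in word/document.xml. The function name docx_hyperlinks is made up, and the sketch ignores links stored as field codes and links in headers, footers or footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the parts read below.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG = "http://schemas.openxmlformats.org/package/2006/relationships"


def docx_hyperlinks(docx_file):
    """Return (url, link text) pairs for hyperlinks in the main document part."""
    with zipfile.ZipFile(docx_file) as archive:
        rels = ET.fromstring(archive.read("word/_rels/document.xml.rels"))
        body = ET.fromstring(archive.read("word/document.xml"))

    # Map relationship ids to targets, keeping only hyperlink relationships.
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.iter(f"{{{PKG}}}Relationship")
        if rel.get("Type", "").endswith("/hyperlink")
    }

    links = []
    for link in body.iter(f"{{{W}}}hyperlink"):
        url = targets.get(link.get(f"{{{R}}}id"))
        if url is None:
            continue  # e.g. a link to a bookmark inside the same document
        # The visible text may be split over several runs; join the pieces.
        text = "".join(t.text or "" for t in link.iter(f"{{{W}}}t"))
        links.append((url, text))
    return links
```

Each url returned can then be tested like any other link. A target that is a relative or file path corresponds to the local-document case described above.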

As I write this post (5 August 2020), this feature exists within the current development version of Scrutiny and is in testing. If you would like to try it, I'd be pleased to let you have a test version. (Contact me.) It's important to try this on as many different docx files as possible before release.

(Scrutiny offers a 30-day trial, so you'll still be able to try the feature if you're not a Scrutiny licence holder.)

Sunday, 19 July 2020

A sneaky peeky at the current development project

Years ago I wrote a simple development environment to help me hand code small websites. (Please don't judge my code.)

'Development environment' is overblowing it. But it did one very important trick.

But first, why would anyone want to hand-code a website? (Does anyone else hand-code any more?)

  • Having a server-side website CMS hacked is frustrating
  • Having plain html files on the server makes your site faster
  • If you're a control-freak like me, you want to write your own clean code, rather than allowing a system to generate unnecessary unreadable guff.

The first thing you notice when you hand-code a site, even a small one, is that if you make a change to code that appears on all pages, it's a huge PITA to go through all pages making the same change.

Hence that trick I mentioned. Those blue links in the screenshot are where an 'object' is inserted. An object is stored once, you edit it once and when you 'compile' the site, those placeholders on all pages are expanded. In the same operation, the app uploads the compiled pages to the server by ftp or sftp. Obviously, clicking that blue link in the editor loads the object for editing. The editor has forward and back navigation buttons.
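
The idea of the 'compile' step can be sketched in a few lines of Python. This is only an illustration: the [[name]] placeholder syntax and the function names are invented for the example and are not the app's real format, and the upload step is left out.

```python
import re

# Hypothetical placeholder syntax for this sketch: [[object-name]]
PLACEHOLDER = re.compile(r"\[\[([\w-]+)\]\]")


def compile_page(page_source, objects):
    """Replace every placeholder in one page with the stored object text."""

    def expand(match):
        name = match.group(1)
        if name not in objects:
            raise KeyError(f"no object named '{name}'")
        return objects[name]

    return PLACEHOLDER.sub(expand, page_source)


def compile_site(pages, objects):
    """Compile all pages; pages maps a filename to its source."""
    return {
        filename: compile_page(source, objects)
        for filename, source in pages.items()
    }


if __name__ == "__main__":
    objects = {"nav": "<nav><a href='/'>Home</a></nav>"}
    pages = {
        "index.html": "<body>[[nav]]<p>Welcome</p></body>",
        "about.html": "<body>[[nav]]<p>About</p></body>",
    }
    for filename, html in compile_site(pages, objects).items():
        print(filename, html)
```

The navigation is stored once in objects, so a change to it reaches every page the next time the site is compiled.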

That's a very brief overview. I've been using this myself for years. But as with tools you write for yourself, it's not very well-refined.

I've been thinking that I may not be alone in my wish to manage small sites this way. I guess most people who don't want a server-side CMS will be using a visual website creator on their computer.

I've decided to improve this to the point where it's a credible commercial (initially free) app and see what interest there is.

It's not ready yet. The whole object / compiling / uploading trick works, I've been using that for a long time.  I now have basic syntax colouring working and before any kind of release, I plan to build in these features:

  • A button to trigger 'Tidy', a unix html tidying utility (can also do some validation)
  • Preview of selected page, updated as you type
  • Compress (remove whitespace) before uploading

If you would like to read more, an early draft of the user guide is here.

[Update 8 Aug] A very early version of the application (which I'm not even calling Beta) is now available for download.

Is this of interest to you? Are there other features you'd like in such an app? Let me know.

Wednesday, 8 July 2020

The browser padlock and why it might not appear

It's important to have an SSL certificate these days if your site is to have any credibility.

Even if you do have a valid certificate in place, you may still find that a browser refuses to display the padlock. Different browsers have their own criteria and display the information in different ways, but we've generally moved from 'a padlock when the site is secure' to a clear 'site insecure' warning.

The image above illustrates this. The site does have a valid certificate in place. My two favourite browsers both have developer tools which allow you to drill down and find the reason(s) for the warnings.

That's good for a single page that you know has a problem. But if you're a Scrutiny user, you want to be notified of any such problems on any page of your site.

Scrutiny has long had features to help you with migration to https://. It alerts you to old links to your http:// pages and to pages which have mixed content (images or linked files which are http://).

As mentioned above, browsers vary in their criteria for displaying the padlock. As of v9.8.0, Scrutiny makes additional checks / warnings:

The insecure content alert/report will now include:

  • insecure urls found in certain meta tags, such as open graph or Twitter cards.
  • insecure images, whether hosted internally or externally
  • insecure form action urls, even if the 'check form action' is switched off.
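
As a rough illustration of the three checks just listed, here is a minimal Python sketch that collects http:// urls from images, form actions and open graph / Twitter card meta tags on a single page. It is not Scrutiny's implementation, and the names InsecureContentFinder and insecure_urls are made up for the example.

```python
from html.parser import HTMLParser


class InsecureContentFinder(HTMLParser):
    """Collects http:// urls from images, form actions and social meta tags."""

    def __init__(self):
        super().__init__()
        self.found = []  # (tag, attribute, url)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "img":
            self._check(tag, "src", attr.get("src"))
        elif tag == "form":
            self._check(tag, "action", attr.get("action"))
        elif tag == "meta":
            # Open graph uses property="og:..."; Twitter cards use name="twitter:..."
            label = attr.get("property") or attr.get("name") or ""
            if label.startswith(("og:", "twitter:")):
                self._check(tag, "content", attr.get("content"))

    def _check(self, tag, attribute, value):
        if value and value.strip().lower().startswith("http://"):
            self.found.append((tag, attribute, value))


def insecure_urls(page_source):
    """Return a list of (tag, attribute, url) for insecure urls on the page."""
    finder = InsecureContentFinder()
    finder.feed(page_source)
    finder.close()
    return finder.found


if __name__ == "__main__":
    page = (
        '<meta property="og:image" content="http://example.com/a.png">'
        '<img src="https://example.com/b.png">'
        '<form action="http://example.com/send"></form>'
    )
    for tag, attribute, url in insecure_urls(page):
        print(tag, attribute, url)
```

Running something like this over every page found in a crawl, rather than one page at a time in the browser's developer tools, is the point made above.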