Friday, 26 May 2017

Alongside our recent post about supporting Internationalised Domain Names within our web crawling tools, we're very proud and excited to announce that work has started on translating some apps and some web pages into other languages. We're starting with French, and with Integrity, Integrity Plus and Scrutiny.

It's impossible to do all of the work at once, so it will take a little while, but the web pages for Integrity, Integrity Plus and Scrutiny now allow you to choose (top-right) between the English and French versions of the pages.

Localised context help in French will go into those apps shortly, followed by the rest of the text within the apps.

Thursday, 25 May 2017

When is a link internal.....

Here's an interesting situation.

The website being scanned was  As you can see, the link found during the crawl is a link to a page on . That looks like an external link, and so Scrutiny has marked it as an external link, therefore checked its status but not followed it.

However, it redirects to a url on - so the ultimate destination is an internal one.

There are a bunch of pages which will only be discovered if this link is considered internal.

(the pages are old manual pages - I want them to be orphaned, but that's beside the point...)

The question is - should Integrity and Scrutiny consider that link *internal* (and therefore follow it and discover more internal pages) or *external* (so it's checked but is then a dead end)? Remember that the original link is to an external domain.
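To make the two options concrete, here's a minimal Python sketch. It's an illustration only, not how Integrity or Scrutiny actually decide: the function names, the `follow_redirects` switch and the example.com / example.org addresses are all made up for the example. It takes the chain of urls that a link passes through (the link as written first, the final destination last) and classifies the link by comparing host names.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lower-cased host name of a url ('' if there isn't one)."""
    return (urlsplit(url).hostname or "").lower()


def classify(site_url, redirect_chain, follow_redirects=True):
    """Classify a link as 'internal' or 'external'.

    redirect_chain is a list of urls: the link as written first,
    then each redirect, ending with the final destination.
    follow_redirects is a hypothetical switch between the two policies.
    """
    site_host = host_of(site_url)
    # Judge either the ultimate destination or the link as written.
    target = redirect_chain[-1] if follow_redirects else redirect_chain[0]
    return "internal" if host_of(target) == site_host else "external"


chain = [
    "http://old.example.org/manual",             # the link found on the page
    "http://www.example.com/manual/index.html",  # where it ends up
]
print(classify("http://www.example.com/", chain))                          # internal
print(classify("http://www.example.com/", chain, follow_redirects=False))  # external
```

Judging by the final destination finds the extra pages; judging by the link as written keeps the crawl strictly to what the page's author typed.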

Wednesday, 24 May 2017

Heads-up : Internationalised Domain Names (IDNs) supported in our web crawlers

We're a UK-based concern, and our apps have almost always been available in a single language - UK English (or just English, as we call it here in England!). The vast majority of our users, as I write this, are from English-speaking countries.

Our alphabet consists entirely of characters available in ascii, and so there has been little call for Integrity, Integrity Plus and Scrutiny (and other tools based on the same engine) to support internationalised domain names - ie domain names which contain characters not found in the ascii character set.

But now we've started work on localisation of our apps and web pages, and have received the odd question concerning IDNs.

Let's not confuse this with unusual characters in the path and filename of the url. Our apps have long supported these. You may still see the non-ascii characters displayed, but behind the scenes, those characters are encoded before the http request is put together, usually using a percent-encoding system.
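For the curious, here's roughly what that percent-encoding looks like, sketched in Python with the standard library. This is an illustration, not our apps' own code; `encode_path` is a made-up name, and it assumes the common case of UTF-8 encoding.

```python
from urllib.parse import quote, unquote


def encode_path(path):
    """Percent-encode the non-ascii characters in a url path.

    Each character is converted to its UTF-8 bytes and every byte is
    written as %XX. Slashes are left alone so the path keeps its shape.
    """
    return quote(path, safe="/")


encoded = encode_path("/caf\u00e9/men\u00fc.html")  # /café/menü.html
print(encoded)           # /caf%C3%A9/men%C3%BC.html
print(unquote(encoded))  # back to the readable form for display
```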

The method is similar with the domain name, but using a more complex and clever system of character encoding. Browsers (and our web crawlers) often still display the user-friendly unicode version.

You can enter your starting url in the unicode form or the 'punycode' form and it'll be handled correctly. The same goes for unicode or punycode links found on your pages.
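Here's a small Python sketch of the two forms, using Python's built-in `idna` codec. It's only an illustration of the conversion, not the code in our apps; the function names and the bücher.example domain are made up, and note that the built-in codec follows the older IDNA 2003 rules.

```python
def to_ascii_host(host):
    """Convert a host name to its punycode (ascii) form.

    A host that is already plain ascii or punycode passes through unchanged,
    so either form can be accepted as input.
    """
    return host.encode("idna").decode("ascii")


def to_unicode_host(host):
    """Convert a host name to its user-friendly unicode form for display."""
    return to_ascii_host(host).encode("ascii").decode("idna")


print(to_ascii_host("b\u00fccher.example"))      # xn--bcher-kva.example
print(to_ascii_host("xn--bcher-kva.example"))    # xn--bcher-kva.example
print(to_unicode_host("xn--bcher-kva.example"))  # bücher.example
```

Normalising every host to the ascii form before comparing or requesting means the unicode and punycode spellings of a domain are treated as the same site.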

Personally, I'm not keen on IDNs; they do allow for spoofing of legitimate domains using similar-looking characters. There are rules excluding many characters for this reason.

After lots of extra homework for us, Scrutiny is now handling IDNs, and is in testing.

[update 26 May 2017] Integrity and Integrity Plus also have this enhancement and are also in testing.

If this is useful for you, and you'd like to try the new version (remembering that there may still be the odd bug to iron out)  then you're very welcome to download and use it (with the condition that you let us know about any issues you spot.)


Integrity Plus


Monday, 22 May 2017

List of all of a site's images, with file sizes

A recent enhancement to Scrutiny and Integrity makes it easy to see a list of all images on a site, with file size.

It was already possible to check all images: not just the ones that are linked (ie a href = "image.jpg") but the actual images on the page (img src = "image.jpg"  srcset = "image@2x.jpg 2x").
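As a rough illustration of the difference between the two kinds of reference, here's a Python sketch using the standard library's HTML parser. It isn't the engine inside our apps; `ImageCollector` is a made-up name, and the srcset handling is deliberately naive (it simply splits on commas).

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect link targets (a href) and on-page images (img src / srcset)."""

    def __init__(self):
        super().__init__()
        self.linked = []    # targets of <a href="...">
        self.embedded = []  # images actually placed on the page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.linked.append(attrs["href"])
        elif tag == "img":
            if attrs.get("src"):
                self.embedded.append(attrs["src"])
            # srcset is a comma-separated list of "url descriptor" pairs;
            # keep just the url from each candidate.
            for candidate in (attrs.get("srcset") or "").split(","):
                parts = candidate.split()
                if parts:
                    self.embedded.append(parts[0])


collector = ImageCollector()
collector.feed('<a href="image.jpg">view</a>'
               '<img src="image.jpg" srcset="image@2x.jpg 2x">')
print(collector.linked)    # ['image.jpg']
print(collector.embedded)  # ['image.jpg', 'image@2x.jpg']
```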

The file size was also held within Scrutiny and Integrity, but wasn't displayed in the links views.

Now it is. It's a sortable column and will be included in the csv or html export.

Before the crawl, make sure that you switch on checking images.


You may need to switch on that column if it's not already showing - it's called 'Target size'.

Once it is showing, you can drag and drop it into a different position and resize its width, as with the other columns in these tables.

To see just the images, choose Images from the filter button over on the right (Scrutiny and Integrity Plus).

If you're checking other linked files (js or css) then their sizes may be displayed, but will probably have a ? beside them to indicate that the file has not been downloaded and its uncompressed size verified (the size shown is the one provided in the server header fields).

This last point applies to Integrity and Integrity Plus, and will appear in Scrutiny shortly.
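To show the idea behind the ? marker, here's a tiny Python sketch. It's an assumption-laden illustration rather than the apps' real logic: `describe_size` and its parameters are made-up names, and showing a bare ? when nothing at all is known is my own choice for the example.

```python
def describe_size(header_length=None, body=None):
    """Return the text to show in a size column.

    header_length: the size reported in the server header fields, if any.
    body: the downloaded (uncompressed) bytes, if the file was fetched.
    """
    if body is not None:
        # The file was downloaded, so the size has been measured.
        return str(len(body))
    if header_length is not None:
        # Only the server's word for it: show the size, flagged as unverified.
        return f"{int(header_length)} ?"
    return "?"


print(describe_size(body=b"abcd"))          # 4
print(describe_size(header_length="1024"))  # 1024 ?
```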

Note that all of this is just a measure of the sizes of all files found during a crawl. For a comprehensive load speed test on a given page, Scrutiny has such a tool - access it with cmd-2 or Tools > Page Analysis.

Thursday, 18 May 2017

Post-apocalyptic life skills : handwriting

Self-service is great; if all goes as planned then it's efficient and easy for both customer and seller. (Though I still resist the self-service tills at my local convenience store - I think I care too much about the 4 or 5 jobs that have clearly been made redundant as a result. Plus - the store is reaping the benefit of the customer doing those ladies' jobs, without paying the customer in any way for doing that job.)

Anyway. When it doesn't go as planned is when the fun and games start.

Today I found myself needing to get back to a customer urgently, but the email address bounced as 'user not known'. Leaving me with a postal address.

The old business-letter-writing skills came back pretty easily; my address on the right, their address below and to the left, sincerely if you know their name, faithfully if you don't etc.

(I'm an old-fashioned kinda girl. Printing off the attempted email and putting that in an envelope wasn't going to happen.)

I was wrong to think that it would be quicker to write the address on the envelope than to work out how to print it on a label or put the envelope through the printer.

I draw a lot, but I don't write very much at all now. My handwriting was never great, and it seems to be appalling now. After several practice goes, I got a result that wasn't as hideous as the first ones.

.... then turned it over and found that the address was upside-down! How crap does that look?

I'm not going to be among the survivors after the aliens come and hit us with a big EMP device.

Wednesday, 17 May 2017

Don't we love rules?

I've just Googled 'html5 page structure' to check something. The post that appeared first consists of a very simple code snippet followed by around a hundred comments, some asking questions but mostly asserting an opinion. (none spammy or trolly).

Without wanting to get very deeply into the actual topic itself (and maybe risking the odd comment myself....) a large number of those comments are about whether a section tag is allowed within an article tag and vice versa. It seems that anything goes as far as these two are concerned. But my, don't we love rules?

It reminds me of those areas of cities where the pavement and road have been paved over into one homogeneous area. I hate them; surely it's better if there are rules that everyone knows?

It occurs to me that the vast number of relevant comments could be responsible for that post being ranked #1 for such a popular search term. I'm sure there's a lesson there somewhere...

Tuesday, 16 May 2017

Hidden Gems in Scrutiny 7: Locate a broken link

This tip applies to Integrity, Integrity Plus and Scrutiny.

So the app is reporting a broken link (or maybe it's a redirect you're interested in, or just a good link). You can easily see, copy or visit the target of the link, or the page it appears on. But how did the crawl find this particular page?

The Locate function will tell you. First open the link inspector by double-clicking on the link in one of the Links views. Then highlight the 'appears on' page you're interested in, and click 'Locate'.

It won't show you every possible route to that link, but it will show you the shortest.

Note that there's a context menu there too, with these options.

You may have noticed that the link inspector and the context menus have a 'Highlight' option too. If you're having trouble seeing the link on the page, the Highlight option will do its best to open the page and apply yellow highlighter.
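If you're wondering how a shortest route from the starting page to a given page can be found, a breadth-first search is the classic way to do it. The Python sketch below is a general illustration under that assumption, not a description of how Locate is implemented; `shortest_route` and the little site map are made up.

```python
from collections import deque


def shortest_route(links, start, target):
    """Return the shortest list of pages leading from start to target.

    links maps each page to the pages it links to. Returns None if the
    target can't be reached from the start page.
    """
    previous = {start: None}  # how each page was first reached
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == target:
            # Walk back through 'previous' to rebuild the route.
            route = []
            while page is not None:
                route.append(page)
                page = previous[page]
            return route[::-1]
        for next_page in links.get(page, []):
            if next_page not in previous:
                previous[next_page] = page
                queue.append(next_page)
    return None


site = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/c", "/d"],
    "/c": ["/d"],
}
print(shortest_route(site, "/", "/d"))  # ['/', '/b', '/d']
```

Because pages are visited in order of their distance from the start, the first time the target turns up it has been reached by a shortest route, even though longer ones (here via /a and /c) also exist.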