We recently looked at a support call where a large number of 500 errors were being reported - that's lots of red in Integrity or Scrutiny - for a site that appeared fine in the browser.
It caused some head-scratching here and lots of experimentation to find the difference between the request the browser was sending and the one Integrity was sending (cookies? javascript? request header fields?).
It was while investigating the request being sent by the browser that we noticed that although the page appeared as expected, the server was in fact returning a 500 code there too (this is from Safari's web console):
It's a little odd that the requested page follows hot on the heels of this 500 response code. I don't know the reason for all of this, but if my user finds out and passes it on, I'll update this post.
The moral of the story is... don't take it for granted that nothing's wrong just because a page appears as expected in the browser. (NB Google will also receive a 500 code when requesting this page.) Another good reason for using a scanning tool.
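If you want to double-check this kind of thing yourself, here's a minimal sketch in Python (using the third-party requests library; the URL is a placeholder) that reports the status code the server actually returns, independently of whatever body comes back with it:

import requests

# Placeholder URL - substitute the page that looks fine in the browser.
url = "https://www.example.com/page.html"
response = requests.get(url, timeout=30)

# The status code is reported separately from the body, so a page can render
# perfectly while the server is still returning a 500.
print(response.status_code)
print(len(response.text), "bytes of body returned")
if response.status_code >= 500:
    print("Server error reported even though a full page came back")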
Thursday, 10 March 2016
Wednesday, 2 March 2016
testing linked files - css, javascript, favicons
This feature has been a very long time coming. Website link tester Integrity reaches back to 2007; Integrity Plus and Scrutiny build on it, using the same engine.
But none of these crawling apps have ever found and checked linked external files such as style sheets, favicons and javascript. (This isn't entirely true - Scrutiny's 'page analysis' feature, which tests the responsiveness of all elements of a page, does include these linked files.)
So this is a well-overdue feature. It's now built into our v6 engine and can be rolled out to Integrity, Integrity Plus, Scrutiny and other apps which use the same engine.
As you can see in the top screenshot, the new checkbox sits nicely beside the 'broken images' switch (which has existed for a very long time). The option can be set 'per-site' (except for Integrity, which doesn't handle multiple sites / settings).
With that option checked, linked files should be listed with your link results (there's obviously no link text, so that's given as '[linked file]').
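To give a rough idea of what checking linked files involves (this isn't Scrutiny's code - just an illustrative Python sketch using the requests and beautifulsoup4 packages, with a placeholder URL): gather the href/src of the page's <link> and <script> elements, then request each one and note its status.

from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

page_url = "https://www.example.com/"  # placeholder
soup = BeautifulSoup(requests.get(page_url, timeout=30).text, "html.parser")

linked_files = set()
for link in soup.find_all("link", href=True):      # stylesheets, favicons etc.
    linked_files.add(urljoin(page_url, link["href"]))
for script in soup.find_all("script", src=True):   # external javascript
    linked_files.add(urljoin(page_url, script["src"]))

for file_url in sorted(linked_files):
    # HEAD keeps the check light; some servers prefer a GET.
    status = requests.head(file_url, timeout=30, allow_redirects=True).status_code
    print(status, "[linked file]", file_url)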
This feature is in beta.
[update: the beta version of all three apps containing this new feature is available for download on the app's home page]
Saturday, 27 February 2016
Finding http links on an https website - Part 2
Since writing this post, and given that secure (https) websites are becoming more popular, Scrutiny can now specifically look out for links to the http version of your site, alert you to the fact if there are any, and offer full details.
This new behaviour is all switched on by default, but there are two new checkboxes in Preferences:
Taken separately, that first checkbox is quite important because if you're scanning an https website, you probably don't want http versions of your pages included in your xml sitemap.
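As a toy illustration of that point (not Scrutiny's internals - the URLs are made up): when the starting URL is https, any http:// pages that turn up in the crawl can be flagged and kept out of the sitemap.

from urllib.parse import urlsplit

start_url = "https://www.example.com/"
crawled_pages = [
    "https://www.example.com/",
    "https://www.example.com/about.html",
    "http://www.example.com/contact.html",   # reached via a rogue http link
]

if urlsplit(start_url).scheme == "https":
    http_pages = [u for u in crawled_pages if urlsplit(u).scheme == "http"]
    sitemap_pages = [u for u in crawled_pages if urlsplit(u).scheme == "https"]
    if http_pages:
        print("Warning: http versions of pages were found:", http_pages)
else:
    sitemap_pages = list(crawled_pages)

print("Pages for the xml sitemap:", sitemap_pages)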
All of this is in Scrutiny v6.4 which will be released as beta soon. If you're interested in checking it out, please just ask.
Wednesday, 24 February 2016
Finding http links on an https website
A couple of Scrutiny support calls have recently been along the lines of "Why is your tool reporting a number of http links on my site? All internal links are https://. Is this a bug?"
In both cases, an internal link did exist on the site with the http scheme. Scrutiny treats this link as internal (as long as it has the same domain), follows it, and then all relative links on that page will of course have the http scheme as well.
[Update - since writing this post, new functionality has been added to Scrutiny - read about that here]
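To illustrate the mechanism: once the crawler has followed a single http:// link, every relative link on that page resolves against the http base, so the crawl carries on through the http version of the site. A quick Python demonstration using the standard library (placeholder URLs):

from urllib.parse import urljoin

# The page was reached via one absolute http:// link...
base = "http://www.example.com/about.html"

# ...so its relative links all resolve to http:// urls too.
print(urljoin(base, "/contact.html"))        # http://www.example.com/contact.html
print(urljoin(base, "products/index.html"))  # http://www.example.com/products/index.html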
I'm thinking about three things:
1. The 'Locate' function is ideal for tracing the rogue link that shunts Scrutiny (and a real user of course) over to the http site. In the shot below we can see where that happened (ringed) and so it's easy to see the offending link url, the link text and the page it appears on. Does this useful feature need to be easier to find?
2. When a user starts at an https:// url, do they expect an http:// link to be considered internal (and followed) or external (and not followed)? Should this be a preference? (Possibly not needed, as it's simple to add a rule that says 'do not check urls containing http://www.mysite.com' - see the sketch after this list.)
3. Should Scrutiny alert the user if they start at an https:// url and an http:// version is found while scanning? After all, this is at the heart of the problem described above; the users assumed that all links were https:// and it wasn't obvious why they had a number of http:// links in their results.
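Here's the kind of 'do not check' rule mentioned in point 2, as a toy Python sketch (not Scrutiny's rules engine; the url is a placeholder):

do_not_check = ["http://www.mysite.com"]   # 'do not check urls containing...'

def should_check(url):
    # Skip any url that matches one of the 'do not check' fragments.
    return not any(fragment in url for fragment in do_not_check)

print(should_check("http://www.mysite.com/page.html"))   # False - skipped
print(should_check("https://www.mysite.com/page.html"))  # True - checked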
Any thoughts welcome; email me or use the comments below.
Tuesday, 23 February 2016
Important new feature for those attempting to crawl a website with authentication
Scanning a website as an authenticated user is a common reason for people turning to Scrutiny.
The process necessarily involves some trial and error to get things set up properly, because different websites use different methods of authorisation and sometimes have unusual security features.
Scrutiny now has an important new feature: handling login forms that use a 'security token'. I'm not going to go into the details (I wouldn't want to deprive my competitors of the exasperation that I've just been through!).
There's a simple checkbox to switch this feature on (available since Scrutiny v6.4), and this may enable Scrutiny to crawl websites that have been uncooperative so far. (This may well apply to websites that have been built using Expression Engine).
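For anyone curious about what a 'security token' on a login form generally means, here's a generic Python sketch (using requests and beautifulsoup4) - not Scrutiny's implementation, and the URLs, field names and credentials are all hypothetical. The form carries a hidden token that has to be read from the login page and posted back with the credentials, on the same session so that cookies carry over:

import requests
from bs4 import BeautifulSoup

session = requests.Session()

# 1. Fetch the login page and read the hidden token field (name is hypothetical).
login_url = "https://www.example.com/login"
soup = BeautifulSoup(session.get(login_url, timeout=30).text, "html.parser")
token_field = soup.find("input", {"name": "csrf_token"})
token = token_field["value"] if token_field else ""

# 2. Post the credentials together with the token on the same session,
#    so any cookies set by the first request are sent back.
session.post(login_url, data={
    "username": "reader",      # a read-only account - see the small print below
    "password": "secret",
    "csrf_token": token,
}, timeout=30)

# 3. Subsequent requests on this session should be authenticated.
print(session.get("https://www.example.com/members/", timeout=30).status_code)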
All the information I have about setting Scrutiny up to scan a site (or member-only pages etc.) that requires authentication is here.
Version 6.4 is in beta as I write this, if you're interested in trying it, please just ask.
Small print: Note that some care and precautions (and a good backup) are required because scanning a website as an authenticated user can affect your website. Yes, really! Use the credentials of a user with read access, not author, editor or administrator.
Sunday, 14 February 2016
Scrutinize your website - Scrutiny for Mac 50% offer for the rest of Feb
As flagged up in a recent mailing to those on the email list, a 50% offer will be running on Scrutiny for Mac for the rest of the month. The app is used by larger organisations and individuals alike. It seems fair to give the smaller businesses the opportunity to buy at a more affordable price.
Recent enhancements include 'live view' (shown below) and improved 'site sucking' page archiving while scanning.
So here it is - for a 50% discount for the rest of February, please use this coupon:
693CFA08
This isn't a licence key; click 'buy' when the licensing window appears, look for the link that says 'check out with coupon' and use the code above for a 50% discount.
Tuesday, 9 February 2016
Improved archiving functionality in Scrutiny
I hadn't appreciated what a complex job sitesucker-type applications do.
In the very early days of the web (when you paid for the time connected) I'd use SiteSucker to download an entire website and then go offline to browse it.
But there are still reasons why you might want to archive a site: for backup, or for potential evidence of a site's contents at a particular time, to give two examples.
Integrity and Scrutiny have always had the option to 'archive pages while crawling'. That's very easy to do - they're pulling in the source for the page in order to scan it, so why not just save that file to a directory as it goes?
Although the file then exists as a record of that page, viewing it in a browser often isn't successful; links to stylesheets and images may be relative, and if you click a link it'll either be relative and not work at all, or absolute and whisk you off to the live site.
Processing that file and fixing all these issues, plus reproducing the site's directory structure, is no mean feat, but now Scrutiny offers it as an option. As from Scrutiny 6.3, the option to process (convert) archived pages is in the Save dialogue that appears when the scan finishes, along with another option (a requested enhancement) to just go ahead and always save in the same place without showing the dialogue each time. These options are also available via an 'options' button beside the 'Archive' checkbox.
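To give a flavour of what that conversion involves, here's a much-simplified Python sketch (using beautifulsoup4) - not Scrutiny's converter, and the site URL and filename are placeholders. Absolute links back to the site are rewritten to point at the local copies instead:

from urllib.parse import urlsplit

from bs4 import BeautifulSoup

site_root = "https://www.example.com"
archive_file = "archive/index.html"   # a page saved during the crawl

with open(archive_file, encoding="utf-8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

# Point absolute same-site links at the local copies rather than the live site.
for a in soup.find_all("a", href=True):
    if a["href"].startswith(site_root):
        a["href"] = urlsplit(a["href"]).path.lstrip("/") or "index.html"

with open(archive_file, "w", encoding="utf-8") as f:
    f.write(str(soup))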
When some further enhancements are made, it'll be available in Integrity and Integrity Plus too.