Saturday 27 February 2016

Finding http links on an https website - Part 2

Since writing the post below, and given that secure (https) websites are becoming more popular, Scrutiny can now specifically look out for links to the http version of your site, alert you if there are any, and offer full details.

This new behaviour is all switched on by default, but there are two new checkboxes in Preferences.

Taken separately, that first checkbox is quite important, because if you're scanning an https website you probably don't want http versions of your pages included in your XML sitemap.
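As a rough illustration of both points, here's a minimal sketch in Python of flagging insecure links found during a scan of an https site and keeping them out of the sitemap. The urls are made up and this isn't Scrutiny's code, just the general idea:

    from urllib.parse import urlsplit

    # Illustration only, not Scrutiny's code: flag http:// urls found while
    # scanning an https:// site, and keep them out of the XML sitemap.
    start_url = "https://www.example.com/"
    found_urls = [
        "https://www.example.com/about",
        "http://www.example.com/contact",   # insecure version of an internal page
        "https://www.example.com/products",
    ]

    start_scheme = urlsplit(start_url).scheme
    insecure = [u for u in found_urls if urlsplit(u).scheme == "http"]
    sitemap_urls = [u for u in found_urls if urlsplit(u).scheme == start_scheme]

    if start_scheme == "https" and insecure:
        print("Alert: http versions of your pages were found:")
        for u in insecure:
            print("  " + u)
    print("The sitemap will contain:", sitemap_urls)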

All of this is in Scrutiny v6.4 which will be released as beta soon. If you're interested in checking it out, please just ask.

Wednesday 24 February 2016

Finding http links on an https website

A couple of Scrutiny support calls have recently been along the lines of "Why is your tool reporting a number of http links on my site? All internal links are https://. Is this a bug?"

In both cases, an internal link did exist on the site with the http scheme. Scrutiny treats this link as internal (as long as it has the same domain), follows it, and then all relative links on the resulting pages will of course have the http scheme as well.
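That's easy to see with a couple of lines of Python; once the crawler has followed the rogue link, every relative link resolves against the http base (example urls invented):

    from urllib.parse import urljoin

    # One rogue http:// link is enough: relative links on the page it leads
    # to resolve against the http base, so the crawl stays on the http site.
    rogue_page = "http://www.example.com/about.html"
    print(urljoin(rogue_page, "contact.html"))  # http://www.example.com/contact.html
    print(urljoin(rogue_page, "/products/"))    # http://www.example.com/products/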

[Update - since writing this post, new functionality has been added to Scrutiny - read about that here]

I'm thinking about three things:

1. The 'Locate' function is ideal for tracing the rogue link that shunts Scrutiny (and a real user of course) over to the http site. In the shot below we can see where that happened (ringed) and so it's easy to see the offending link url, the link text and the page it appears on. Does this useful feature need to be easier to find?



2. Does a user expect that when they start at an https:// url, an http:// link will be considered internal (and followed) or external (and not followed)? Should this be a preference? (Possibly not needed, as it's simple to add a rule that says 'do not check urls containing http://www.mysite.com'.) There's a sketch of this internal/external question after the list below.

3. Should Scrutiny alert the user if they start at an https:// url and an http:// version is found while scanning? After all, this is at the heart of the problem described above; the users assumed that all links were https:// and it wasn't obvious why they had a number of http:// links in their results.
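For what it's worth, here's roughly how the question in point 2 looks in code. This is a hedged sketch in Python; the names and the rule handling are mine, not Scrutiny's actual logic:

    from urllib.parse import urlsplit

    # Sketch of the internal/external question, plus the rule-based
    # workaround mentioned in point 2. Not Scrutiny's actual logic.
    start = urlsplit("https://www.mysite.com/")
    ignore_patterns = ["http://www.mysite.com"]   # 'do not check urls containing...'

    def classify(url):
        if any(p in url for p in ignore_patterns):
            return "ignored by rule"
        if urlsplit(url).hostname == start.hostname:
            # Same domain, so internal and followed, even though the scheme is http.
            return "internal (followed)"
        return "external (checked, not followed)"

    for url in ["https://www.mysite.com/page",
                "http://www.mysite.com/page",
                "https://elsewhere.com/"]:
        print(url, "->", classify(url))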

Any thoughts welcome; email me or use the comments below.

Tuesday 23 February 2016

Important new feature for those attempting to crawl a website with authentication

Scanning a website as an authenticated user is a common reason for people turning to Scrutiny.

The process necessarily involves some trial and error to get things set up properly, because different websites use different methods of authorisation and sometimes have unusual security features.

Scrutiny now has an important new feature. Some login forms use a 'security token'. I'm not going to go into details (I wouldn't want to deprive my competitors of the exasperation that I've just been through!).



There's a simple checkbox to switch this feature on (available since Scrutiny v6.4), and it may enable Scrutiny to crawl websites that have been uncooperative so far (this may well apply to websites that have been built using Expression Engine).
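Without giving away any Scrutiny specifics, the general shape of a 'security token' login is well documented elsewhere, and looks something like this sketch in Python. The urls, field names and credentials below are invented for illustration:

    import re
    import requests

    # A generic 'security token' login, for illustration only: the urls,
    # field names and credentials are invented, and this is not how
    # Scrutiny does it internally.
    session = requests.Session()

    # 1. GET the login page; the server sets a session cookie and embeds a
    #    one-time token in a hidden form field.
    login_page = session.get("https://www.example.com/login").text

    # 2. Pull the hidden token out of the form markup.
    match = re.search(r'name="csrf_token"\s+value="([^"]+)"', login_page)
    token = match.group(1) if match else ""

    # 3. POST the credentials together with the token, using the same
    #    session so that the cookie and token pair up as the server expects.
    response = session.post("https://www.example.com/login", data={
        "username": "reader",     # a read-only account, as advised below
        "password": "secret",
        "csrf_token": token,
    })
    print(response.status_code)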

All the information I have about setting up Scrutiny to scan a site (or member-only pages etc.) that requires authentication is here.

Version 6.4 is in beta as I write this; if you're interested in trying it, please just ask.

Small print: note that some care and precautions (and a good backup) are required, because scanning a website as an authenticated user can affect your website. Yes, really: a crawler that's logged in can follow links that change or delete content. Use the credentials of a user with read access, not author, editor or administrator.

Sunday 14 February 2016

Scrutinize your website - Scrutiny for Mac 50% offer for the rest of Feb

As flagged up in a recent mailing to those on the email list, a 50% offer will be running on Scrutiny for Mac for the rest of the month. The app is used by larger organisations and individuals alike, and it seems fair to give smaller businesses the opportunity to buy at a more affordable price.

Recent enhancements include 'live view' (shown below) and improved 'site sucking' page archiving while scanning.

So here it is - for a 50% discount for the rest of February, please use this coupon:

693CFA08

This isn't a key; click 'buy' when the licensing window appears, look for the link that says 'check out with coupon' and use the code above for a 50% discount.

Tuesday 9 February 2016

Improved archiving functionality in Scrutiny

I hadn't appreciated what a complex job sitesucker-type applications do.

In the very early days of the web (when you paid for the time connected) I'd use SiteSucker to download an entire website and then go offline to browse it.

But there are still reasons why you might want to archive a site: for backup, or as evidence of a site's contents at a particular time, to give two examples.



Integrity and Scrutiny have always had the option to 'archive pages while crawling'. That's very easy to do: they're pulling in the source of each page in order to scan it, so why not just save that file to a directory as they go?

Although the file then exists as a record of that page, viewing it in a browser often isn't successful; links to stylesheets and images may be relative, and if you click a link it'll either be relative and not work at all, or absolute and whisk you off to the live site.

Processing that file and fixing all of these issues, plus reproducing the site's directory structure, is no mean feat, but Scrutiny now offers it as an option. As from Scrutiny 6.3, the option to process (convert) archived pages is in the Save dialogue that appears when the scan finishes, along with another option (a requested enhancement) to always save in the same place without showing the dialogue each time. These options are also available via an 'options' button beside the 'Archive' checkbox.
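To give a flavour of what 'processing' involves, here's a heavily simplified sketch in Python of rewriting a page's links to point at local copies. The site url and attribute handling are illustrative; a real converter, Scrutiny's included, has to deal with many more cases:

    import re
    from urllib.parse import urljoin, urlsplit

    SITE = "https://www.example.com/"

    def local_path(url):
        """Map a site url onto a path inside the archive directory."""
        return urlsplit(url).path.lstrip("/") or "index.html"

    def convert(html, page_url):
        # Rewrite href/src attributes: internal links point at the local
        # copy, external links are left pointing at the live web.
        def rewrite(m):
            attr, target = m.group(1), m.group(2)
            absolute = urljoin(page_url, target)
            if absolute.startswith(SITE):
                return '%s="%s"' % (attr, local_path(absolute))
            return m.group(0)
        return re.sub(r'(href|src)="([^"]+)"', rewrite, html)

    page = '<a href="/about.html">About</a> <img src="logo.png">'
    print(convert(page, SITE + "products/index.html"))
    # <a href="about.html">About</a> <img src="products/logo.png">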


When some further enhancements are made, it'll be available in Integrity and Integrity Plus too.