PeacockMedia: Improved archiving functionality in Scrutiny

Tuesday, 9 February 2016

Improved archiving functionality in Scrutiny

I hadn't appreciated what a complex job SiteSucker-type applications do.

In the very early days of the web (when you paid for the time connected) I'd use SiteSucker to download an entire website and then go offline to browse it.

But there are still reasons why you might want to archive a site: for backup, or as potential evidence of a site's contents at a particular time, to give two examples.

Integrity and Scrutiny have always had the option to 'archive pages while crawling'. That's very easy to do: they're pulling in the source for each page in order to scan it, so why not just save that file to a directory as they go?
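To illustrate how little is involved in that simple kind of archiving, here is a minimal sketch in Python. It is not Scrutiny's code; the function name archive_source and the hashed-filename scheme are assumptions made up for this example. It just writes the source the crawler already has, unmodified, into a directory.

```python
import hashlib
from pathlib import Path


def archive_source(url: str, source: str, archive_dir: Path) -> Path:
    """Save already-fetched page source, unmodified, into archive_dir.

    Hypothetical helper for illustration only (not Scrutiny's implementation).
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: a hash of the URL gives a flat, filesystem-safe, unique name.
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
    path = archive_dir / name
    path.write_text(source, encoding="utf-8")
    return path
```

A crawler would call something like this once per page, right after fetching the source and before scanning it for links.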

Although the file then exists as a record of that page, viewing it in a browser often isn't successful; links to stylesheets and images may be relative and so fail to load, and if you click a link it'll either be relative and not work at all, or absolute and whisk you off to the live site.

Processing that file to fix all these issues, and reproducing the site's directory structure, is no mean feat, but Scrutiny now offers it as an option. As of Scrutiny 6.3, the option to process (convert) archived pages is in the Save dialogue that appears when the scan finishes, along with another option (a requested enhancement) to always save in the same place without showing the dialogue each time. These options are also available via an 'options' button beside the 'Archive' checkbox.
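To give a feel for what that conversion involves, here is a rough sketch in Python. It is not how Scrutiny does it; local_path, convert_page and the regular expression are hypothetical names invented for this example, and it makes simplifying assumptions (query strings are ignored, only quoted href and src attributes are handled). The idea is to map each URL to a file path that mirrors the site's directories, then rewrite links to archived pages as relative local paths while leaving everything else alone.

```python
import posixpath
import re
from urllib.parse import urldefrag, urljoin, urlsplit

# Assumption: only quoted href="..." and src="..." attributes are rewritten.
ATTR = re.compile(r'(href|src)=(["\'])(.*?)\2', re.IGNORECASE)


def local_path(url: str) -> str:
    """Map a URL to a relative file path mirroring the site's directories."""
    path = urlsplit(url).path
    if path == "" or path.endswith("/"):
        path += "index.html"  # directory-style URLs become index.html
    return path.lstrip("/")


def convert_page(page_url: str, html: str, archived: set) -> str:
    """Rewrite links to archived URLs as paths relative to this page's file."""
    page_dir = posixpath.dirname(local_path(page_url)) or "."

    def fix(match):
        attr, quote, value = match.groups()
        # Resolve relative links against the page's own URL first.
        target, fragment = urldefrag(urljoin(page_url, value))
        if target not in archived:
            return match.group(0)  # not in the archive: leave untouched
        rel = posixpath.relpath(local_path(target), start=page_dir)
        if fragment:
            rel += "#" + fragment
        return f"{attr}={quote}{rel}{quote}"

    return ATTR.sub(fix, html)
```

For a page at http://example.com/blog/post.html, a link to /css/style.css would come out as ../css/style.css provided that stylesheet is in the archived set, while a link to a site outside the archive is left as it was. A real converter would also need to deal with URLs inside CSS, srcset attributes, query strings and so on, which is where the "no mean feat" part comes in.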