Next, a couple of new options to make the crawl more flexible.
Sites sometimes contain documents in PDF format which themselves contain hyperlinks. Scrutiny can now scan those documents and check their links. This option is turned off by default because PDFs can be huge, and loading them increases the amount of memory Scrutiny uses.
The first thing you'll notice on starting Scrutiny 5 is the new interface. As Scrutiny has become bigger, the interface has become cluttered and bewildering to some. We hope that the new interface looks more welcoming. The list of sites shows more information than before, with bigger icons.
I hope that's sparked your interest. Scrutiny 5.0 is ready for beta testing, and if you'd like to be involved in return for a free upgrade from v4, contact firstname.lastname@example.org
[update added 28 April 2014]
Scrutiny 5 is still in beta but is now at v5.0.3, and the following enhancements have been added:
- Page character encoding detection is improved, and the supported character encodings now include CP1251 (Cyrillic script, e.g. Russian, Bulgarian, Serbian Cyrillic)
- Now supports URLs which include non-ASCII characters. Some may argue that this is against web standards, but it's becoming more common and is accepted by Google and browsers
- New option to include <lastmod> in the XML sitemap. If this is checked, the last modified date for internal pages is logged (if the server gives it) and shown in the sitemap table
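
For anyone curious how the last two items might look in practice, here is a rough Python sketch. It is not Scrutiny's own code. The function names (`to_ascii_url`, `lastmod_from_header`, `sitemap_entry`) are made up for illustration, and it assumes the host name is already ASCII, that non-ASCII characters are encoded as UTF-8, and that the server's Last-Modified header is the source of the date.

```python
from email.utils import parsedate_to_datetime
from urllib.parse import quote, urlsplit, urlunsplit
from xml.sax.saxutils import escape


def to_ascii_url(url):
    """Percent-encode non-ASCII characters in the path, query and fragment.

    Hypothetical helper: assumes the host is already ASCII and that
    non-ASCII characters should be encoded as UTF-8.
    """
    parts = urlsplit(url)
    # Leave reserved characters and existing %XX escapes untouched.
    path = quote(parts.path, safe="/%:@!$&'()*+,;=")
    query = quote(parts.query, safe="/%:@!$&'()*+,;=?")
    fragment = quote(parts.fragment, safe="/%:@!$&'()*+,;=?")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


def lastmod_from_header(value):
    """Turn an HTTP Last-Modified value into a YYYY-MM-DD date, or None."""
    if not value:
        return None  # the server did not give a date
    try:
        return parsedate_to_datetime(value).date().isoformat()
    except (TypeError, ValueError):
        return None  # header present but not a parseable date


def sitemap_entry(url, last_modified=None):
    """Build one <url> element, adding <lastmod> only when a date is known."""
    lines = ["<url>", f"  <loc>{escape(to_ascii_url(url))}</loc>"]
    lastmod = lastmod_from_header(last_modified)
    if lastmod:
        lines.append(f"  <lastmod>{lastmod}</lastmod>")
    lines.append("</url>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(sitemap_entry("http://example.com/café",
                        "Mon, 28 Apr 2014 10:00:00 GMT"))
```

Pages for which the server sends no usable date simply get no <lastmod> element, which matches the "if the server gives it" caveat above.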