Friday 19 January 2018

Limiting Integrity / Scrutiny's scan to a particular section of the site or directory

This is a frequently-asked question and a subject that's been on the workbench this week.

Integrity / Integrity Plus and Scrutiny allow you to set up rules which limit the scan, or prevent checking of urls matching a pattern:


The rules dialog* will shortly look as above; the dialog will appear as a sheet attached to its main window, it'll allow 'urls which contain' / 'urls which don't contain' and the 'only follow' option is gone from the first menu, leaving 'ignore', 'do not check', and 'do not follow'.

('only follow' was confusing because if you have two such rules, then it doesn't make logical sense. 'do not follow urls that don't contain' does the same job and makes sense if you have more than one.)

It's important to say that it shouldn't be necessary to use a rule such as 'do not follow urls that don't contain /mac/scrutiny'  if you are starting at peacockmedia.software/mac/scrutiny  because Integrity and Scrutiny should only follow urls at or below the 'directory' that you start at.

It's also important to note that 'not following' isn't the same as 'ignoring'. If a link appears on a page then you will see it in the link check results regardless of whether it matches your 'follow' pattern. If you only want to see urls which follow a certain pattern, use an 'ignore' rule instead.

(That last statement applies to the link check results. The SEO table in Scrutiny and Sitemap table in Integrity Plus and Scrutiny should only show pages which meet your rules and appear at or below the directory you start in.)

Important note - if you're 'ignoring urls not containing'  then you probably won't be able to find broken images or linked files (unless their urls happen to match your rule's pattern). So if you have the 'images and linked files' option checked then you'll need to use a 'follow' rule rather than 'ignore'.

Protip - this has been possible for a while but not very well documented: you can use certain special characters. An asterisk to mean 'any number of any character' and a dollar sign at the end to mean 'appears at the end'. For example,  "don't check urls that contain .dmg$" will only match urls where .dmg appears at the end of the url. And peacockmedia.software/*/mac/scrutiny will match  peacockmedia.software/en/mac/scrutiny  and peacockmedia.software/fr/mac/scrutiny

Regex is not supported (yet) in these rules (though it is available in Scrutiny's site search). It's not necessary to type whole urls or to use the asterisk at the start or end of your term. By default it's a partial match, so "urls not containing /fr/mac/scrutiny" will work.


The new "urls that don't contain" functionality is in Scrutiny from v7.6.8  and should shortly be available in Integrity and Integrity plus from v6.11.14


*Years ago, these rules were called 'Blacklist / Whitelist Rules'. I'm mentioning that in case anyone searches for those terms.

No comments:

Post a Comment