The best way to explain this will be with three examples. The manuals for some of my software are on the peacockmedia.co.uk domain and I'll assume that I want to run Integrity or Scrutiny but check those manual pages separately or not at all.
1. Blacklisting based on url (Integrity or Scrutiny)
All of the manual pages have '/manual/' in the url, so I can type '/manual/' (without quotes). Including the slashes ensures that it'll only blacklist directories called 'manual'. If I were confident that no other urls included the word 'manual' there'd be no need for the slashes.
Integrity and Scrutiny don't use complex pattern matching (other crawlers use regex or wildcards). Simply type a keyword or part of the url.
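To illustrate the difference the slashes make, here is a minimal Python sketch of plain substring matching. It is not how Integrity or Scrutiny is implemented; the function name and the example urls are made up for illustration, and the match is assumed to be case-sensitive.

```python
def is_blacklisted(url: str, terms: list[str]) -> bool:
    """Return True if any term appears anywhere in the url (plain substring match)."""
    return any(term in url for term in terms)


# Hypothetical urls, for illustration only.
urls = [
    "https://peacockmedia.co.uk/integrity/manual/index.html",
    "https://peacockmedia.co.uk/manuals-overview.html",
    "https://peacockmedia.co.uk/scrutiny/",
]

# '/manual/' only matches a directory called 'manual'.
with_slashes = [u for u in urls if is_blacklisted(u, ["/manual/"])]

# 'manual' also matches any other url that happens to contain the word.
without_slashes = [u for u in urls if is_blacklisted(u, ["manual"])]

print(with_slashes)
print(without_slashes)
```

With the slashes only the first url is caught; without them the second one is caught too.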
I have the option of using 'Do not check..' or 'Do not follow..'. Check means getting the header information and reporting the server response code. Follow means going one step further: collecting the html of the target page and finding the links on it.
So to disregard the manuals but still check the outgoing links to that area, I'll want 'Do not follow..'. If I don't even want to check those links then it's 'Do not check..'.
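The relationship between the two options can be sketched as a simple decision in Python. This is only a model of the behaviour described above, not the applications' own code; the `Action` names, the `decide` function and the example urls are assumptions.

```python
from enum import Enum


class Action(Enum):
    SKIP = "do not check at all"
    CHECK_ONLY = "check the response code, do not follow"
    CHECK_AND_FOLLOW = "check, then fetch the page and collect its links"


def decide(url: str, do_not_check: list[str], do_not_follow: list[str]) -> Action:
    """Pick what to do with a link, given the terms typed into each box."""
    if any(term in url for term in do_not_check):
        return Action.SKIP
    if any(term in url for term in do_not_follow):
        return Action.CHECK_ONLY
    return Action.CHECK_AND_FOLLOW


# Hypothetical url: the manuals are listed under 'Do not follow..',
# so the link itself is still checked but the page is not crawled.
manual_url = "https://peacockmedia.co.uk/integrity/manual/index.html"
print(decide(manual_url, do_not_check=[], do_not_follow=["/manual/"]))
```

Moving '/manual/' from the second list to the first would make the same link be skipped entirely.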
Another use of the 'Do not check' box is to speed up the crawl by disregarding certain file types or links that you either don't need to check or can't check properly anyway (such as secure pages if you're using Integrity, which doesn't allow authentication). For example you can type .pdf, .mp4 or https:// into that box (or multiple values separated by commas).
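A comma-separated entry like this can be pictured as a list of separate terms, each matched on its own. The Python below is a small sketch of that idea under my own assumptions (whitespace around each value is ignored, empty values are dropped); `parse_terms` and the example urls are hypothetical.

```python
def parse_terms(box_text: str) -> list[str]:
    """Split the text typed into the box on commas, trimming spaces and dropping blanks."""
    return [part.strip() for part in box_text.split(",") if part.strip()]


terms = parse_terms(".pdf, .mp4, https://")

# Hypothetical links found during a crawl.
links = [
    "http://peacockmedia.co.uk/files/guide.pdf",
    "https://peacockmedia.co.uk/account/",
    "http://peacockmedia.co.uk/scrutiny/",
]

# A link is left unchecked if it contains any one of the terms.
unchecked = [link for link in links if any(term in link for term in terms)]
print(terms)
print(unchecked)
```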
2. Whitelisting based on url (Integrity or Scrutiny)
This time, links which are not whitelisted (i.e. those that don't contain 'manual') are checked and seem to be ok, but they're shown in red because they're not being followed and I'm still highlighting blacklisted links.
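Whitelisting is the mirror image of the first example: a link is followed only if it contains the term, and everything else is checked but not followed. A minimal Python sketch of that rule, with a hypothetical function name and example urls, and again not the applications' actual code:

```python
def link_status(url: str, whitelist_terms: list[str]) -> str:
    """Follow a link only if it contains a whitelisted term; otherwise just check it."""
    if any(term in url for term in whitelist_terms):
        return "check and follow"
    return "check only"


# Hypothetical urls, for illustration only.
for url in (
    "https://peacockmedia.co.uk/integrity/manual/index.html",
    "https://peacockmedia.co.uk/scrutiny/",
):
    print(url, "->", link_status(url, ["manual"]))
```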
3. Blacklisting based on content (Scrutiny)
The result is the same as the screenshot in the first example, but Scrutiny is finding my search term in the page content rather than the url.
In this example, the phrase won't be found in urls because it contains spaces, but for a single keyword, Scrutiny would look for the term in both url and content and blacklist the page if it finds it in either.
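The "url or content" rule can be sketched in a couple of lines of Python. This only models the behaviour described here; the function name, the search phrase, the url and the page text are all invented for the example.

```python
def blacklisted_by_term(url: str, page_text: str, term: str) -> bool:
    """Blacklist the page if the term appears in either its url or its content."""
    return term in url or term in page_text


# Hypothetical page, for illustration only.
url = "https://peacockmedia.co.uk/integrity/manual/index.html"
page_text = "<h1>Integrity user guide</h1><p>Getting started...</p>"

# A phrase containing spaces will not normally appear in a url,
# so in practice it can only match the content.
print(blacklisted_by_term(url, page_text, "user guide"))

# A single keyword can match the url even when the content doesn't mention it.
print(blacklisted_by_term(url, page_text, "manual"))
```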
If the manuals were all on a subdomain, such as manual.peacockmedia.co.uk, it would be possible to blacklist or whitelist using the term 'manual.' but it would also be possible to use the 'Treat subdomains as internal' checkbox in Preferences. Subdomains are a bigger topic and one for their own tutorial.
Any problems, do get in touch.