Tuesday, 3 November 2015

Which web pages does Scrutiny include in its SEO table and Sitemap table?

When asked that question recently I had to look through code to find definitive answers (not ideal!) and realised that the manual should contain that information.

It does now, and here's the list for anyone who's interested:

The SEO table will include pages that are:
  • html*
  • internal (urls with a subdomain may or may not be treated as internal depending on whether the preference is checked in Preferences > General)
  • served with a good status (i.e. urls with status 0, 4xx or 5xx will not be included)
  • not excluded by your blacklist / whitelist rules on the Settings screen
  • not disallowed by your robots.txt file (if that preference is checked on the Settings screen, below the Blacklist and Whitelist rules)
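
For anyone who finds rules easier to read as code, here is a minimal sketch of the SEO table checks listed above. It is not Scrutiny's actual code: the Page class, its field names and the observe_robots_txt parameter are all hypothetical, and treating "html" as the text/html mime type (ignoring any charset suffix) is my assumption.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical record of one crawled url (not Scrutiny's real data model)."""
    url: str
    mime_type: str                      # as served, e.g. "text/html; charset=utf-8"
    status: int                         # HTTP status; 0 means no response was received
    is_internal: bool                   # already decided using the subdomain preference
    blocked_by_rules: bool = False      # excluded by blacklist / whitelist rules
    disallowed_by_robots: bool = False  # disallowed by robots.txt


def include_in_seo_table(page: Page, observe_robots_txt: bool = False) -> bool:
    """Return True if the page passes every SEO table rule in the list above."""
    # Rule 1: html, judged by mime type rather than file extension (assumption: text/html).
    if page.mime_type.split(";")[0].strip().lower() != "text/html":
        return False
    # Rule 2: internal urls only.
    if not page.is_internal:
        return False
    # Rule 3: status 0, 4xx and 5xx are not included.
    if page.status == 0 or 400 <= page.status <= 599:
        return False
    # Rule 4: blacklist / whitelist rules.
    if page.blocked_by_rules:
        return False
    # Rule 5: robots.txt is only observed when the preference is checked.
    if observe_robots_txt and page.disallowed_by_robots:
        return False
    return True
```

A page served as a pdf, or one returning a 404, would be rejected by this function, and a page disallowed by robots.txt would only be rejected when observe_robots_txt is True.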
The sitemap table will include a subset of those pages - so in addition to the above, the following rules apply:
  • pdfs are included as well (if that preference is checked in Preferences > Sitemap)
  • not excluded by your robots.txt file (if that box is checked in Preferences > Sitemap)
  • not excluded by a 'robots noindex' meta tag (if that box is checked in Preferences > Sitemap)
  • does not have a canonical tag (rel="canonical") that points to a different url
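
Again as a rough sketch rather than Scrutiny's real code, the extra sitemap checks could be written like this. The SitemapCandidate and SitemapPrefs classes and their field names are hypothetical, the candidate is assumed to have already passed the general rules (internal, good status, not blacklisted), and the canonical check is a plain string comparison where a real crawler would probably normalise the urls first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SitemapCandidate:
    """Hypothetical record of a url that already passed the general rules."""
    url: str
    is_pdf: bool = False
    disallowed_by_robots: bool = False   # disallowed by robots.txt
    has_noindex: bool = False            # carries a 'robots noindex' meta tag
    canonical_url: Optional[str] = None  # target of the canonical tag, if any


@dataclass
class SitemapPrefs:
    """Hypothetical stand-in for the checkboxes in Preferences > Sitemap."""
    include_pdfs: bool = False
    respect_robots_txt: bool = True
    respect_noindex: bool = True


def include_in_sitemap(page: SitemapCandidate, prefs: SitemapPrefs) -> bool:
    """Return True if the candidate passes the additional sitemap rules."""
    # pdfs only appear when that preference is checked.
    if page.is_pdf and not prefs.include_pdfs:
        return False
    # robots.txt exclusions apply only when that box is checked.
    if prefs.respect_robots_txt and page.disallowed_by_robots:
        return False
    # 'robots noindex' exclusions apply only when that box is checked.
    if prefs.respect_noindex and page.has_noindex:
        return False
    # A canonical url pointing somewhere else always excludes the page.
    if page.canonical_url is not None and page.canonical_url != page.url:
        return False
    return True
```

Note that the canonical rule has no matching preference in the list above, so the sketch applies it unconditionally.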
As always, I'm very happy to look into any particular example that you can't make sense of.

[Screenshot: Scrutiny's SEO results table]

* This doesn't mean a .html file extension, but the mime type of the page as it is served. Most web pages will be html. Images are shown in the SEO results, but in a separate table which shows the url, the page it appears on, and the alt text.

** Although Integrity Plus doesn't display SEO results (at present), it does display the same sitemap table as Scrutiny, and all of the rules above apply.
