Wednesday 9 October 2013

Finding 'soft 404' internal and external links.

A 'soft 404' is a page that isn't the one the user requested, but which returns a 200 response code. The page may say 'Page not found' or it may be a default page such as the home page or a special page set up for the purpose.

If the page doesn't state that the requested page hasn't been found, it's confusing for the visitor. And unless the page returns a 404 or 410 code, it's very difficult for a web crawler to tell that the link is broken.

Google doesn't like such pages - they don't want to index a page which isn't the expected page. They and other search engines are testing sites for soft 404s. It's best if your site returns a 404 or 410 code when a page isn't found.
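As a rough illustration of returning a real 404 rather than a soft one, here is a minimal sketch using Python's standard http.server module. The PAGES dictionary, the response_for helper and the serve function are all made up for this example; your own server or CMS will have its own way of setting the status code.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content for this example: path -> HTML body.
PAGES = {"/": "<h1>Home</h1>", "/about": "<h1>About</h1>"}


def response_for(path):
    """Return (status_code, body) for a requested path."""
    if path in PAGES:
        return 200, PAGES[path]
    # A friendly message is fine, as long as the status code is 404 too.
    return 404, "<h1>Page not found</h1>"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = response_for(self.path)
        data = body.encode("utf-8")
        self.send_response(status)  # 404 for unknown paths, never 200
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Start the example server on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

The visitor still sees a helpful 'Page not found' message, but a crawler sees the 404 status and can report the broken link.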

However, if your site does return soft 404s or if you want to find your external links that link to soft 404s, then from version 4.5, Scrutiny and Integrity can try to spot and highlight them.

There's a new section in Preferences for this feature.



You can switch the feature off (in Preferences) if you have a large site, want the best performance, and this feature isn't important to you.

If your site does generate such pages and you'd like them marked as 404 rather than 200, then either find a term in the page content itself (such as 'sorry, the page you are looking for does not exist') and add this phrase to the list on a new line or, if your soft 404 page has a specific url, add all or part of this url to the list.

To find external pages which may be soft 404s, the box in Preferences contains a list of suspicious terms. The list may be extended in future versions, but you can add to it yourself too.

Pages which look like soft 404s (i.e. they return a 2xx code but contain one of the terms in the url or content) will have a status of 'soft 404' and will be marked red, as regular 404 pages are.
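The rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code Scrutiny or Integrity actually uses: the function name, the example terms and the case-insensitive matching are all assumptions.

```python
def classify_response(status_code, url, body, suspicious_terms):
    """Return 'soft 404' for a 2xx response whose url or content
    contains one of the suspicious terms; otherwise return the
    status code as a string."""
    if 200 <= status_code < 300:
        # Assumption: matching is case-insensitive.
        haystacks = (url.lower(), body.lower())
        for term in suspicious_terms:
            needle = term.strip().lower()
            # Skip blank lines in the list so they don't match everything.
            if needle and any(needle in h for h in haystacks):
                return "soft 404"
    return str(status_code)


# Hypothetical list, one term per line as in the Preferences box.
terms = ["page not found", "/error/notfound"]

print(classify_response(200, "http://example.com/old", "Sorry, page not found", terms))
print(classify_response(200, "http://example.com/", "Welcome!", terms))
```

With this sketch, the first call is reported as 'soft 404' and the second as a plain 200. A real 404 or 410 response is passed through unchanged, since the terms are only checked for 2xx responses.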
