Tuesday, 26 January 2016

Integrity and Scrutiny displaying a 200 code for a url that doesn't exist

This problem is specific to the user (i.e. someone else somewhere else may correctly get an error reported for the same url). When the url is pasted into a browser (or visited from within Integrity or Scrutiny), a page is shown, branded with the internet provider's logo, with a search box and maybe some advertising.

What's happening? 

The user's internet service provider is recognising that the server being requested doesn't exist, and is 'helpfully' displaying something it considers more useful. My own provider says (quote from their website) "this service is provided free and is designed to enhance the surfing experience by reducing the frustration caused by error pages".
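If you want to check whether your own connection is affected, one rough test is to look up a few hostnames that almost certainly don't exist and see whether they resolve anyway. The following is only a sketch under some assumptions: the function names are my own, the made-up hostnames use a random label under .com, and it assumes the provider does its rewriting at the DNS lookup stage, which may not be true of every provider.

```python
import socket
import uuid


def random_hostname(tld="com"):
    # A 32-character random label is very unlikely to be a registered domain
    # (assumption: nobody has registered it).
    return f"{uuid.uuid4().hex}.{tld}"


def resolves(hostname, resolver=socket.gethostbyname):
    """Return the address the resolver gives for hostname, or None if the lookup fails."""
    try:
        return resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def provider_rewrites_missing_hosts(resolver=socket.gethostbyname, attempts=3):
    """Return True if every made-up hostname still resolves to some address."""
    return all(
        resolves(random_hostname(), resolver) is not None
        for _ in range(attempts)
    )
```

Calling provider_rewrites_missing_hosts() with no arguments uses the system resolver. A result of True suggests that lookups for non-existent servers are being answered with a real address, which is the behaviour described above. The resolver parameter is only there so that the check can be tried out without a network connection.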

(Note the advertising - your provider is making money out of this.)

The content of the page they provide is neither helpful nor unhelpful, but the 200 code they return with the page is decidedly unhelpful when we're trying to crawl a website and find problems. A web crawler like Integrity or Scrutiny can only tell that there's a problem with a link from the server response code.

Personally I think this practice is wrong. If you request a url where the server doesn't exist, it's incorrect to be shown a page with a 200 code.

This is similar to a soft 404 because a 200 is being returned when a request is sent for a page that doesn't exist. I'm tempted to call this a 'soft 5xx' because 5xx codes are server errors, although in this case, if there is no server, then we can't have a server response code.

What can we do?

I now know of two providers that offer to switch this service off. Do some digging: your provider may have a web page that allows you to switch this preference off yourself. If not, contact them and ask them to switch it off. Integrity / Scrutiny will then behave as expected.

If that fails, you can use Integrity / Scrutiny's 'soft 404' feature (Preferences > Links). Find some unique text on the error page (maybe in the page title) and type part or all of that text into the box there.

The problem urls will then be reported with a 'soft 404' status which is better than the 200.
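The idea behind this kind of check can be sketched in a few lines. This is not Integrity's or Scrutiny's actual code, just a minimal illustration of the principle: the function name, the list of marker strings and the case-insensitive matching are all assumptions of mine.

```python
def classify(status_code, body, soft_404_markers):
    """Return a status label for one crawled link.

    status_code      -- the response code received for the url
    body             -- the text of the page that came back
    soft_404_markers -- text known to appear on the provider's substitute
                        page (hypothetical user-supplied list)
    """
    if status_code == 200:
        text = body.lower()
        for marker in soft_404_markers:
            # Ignore empty markers, which would otherwise match every page.
            if marker and marker.lower() in text:
                return "soft 404"
    # Anything else is reported with the code that was actually received.
    return str(status_code)
```

For example, if the provider's page has the made-up title "Search results from ExampleNet", then passing ["Search results from ExampleNet"] as the markers would label that page 'soft 404', while a genuine page returning 200 without that text would still be reported as 200.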
