Friday 26 October 2012

Server returning 400 for url with no referer

I've had an interesting support problem this morning and thought that it might be useful to log the answer here.

The problem was that Scrutiny could not get past the starting url, reporting '400 bad request', even though the same url returned the expected page in a browser.

It seems that this particular server rejects requests that arrive without a 'referer' header field. Scrutiny does send a referer for every other page it crawls, filled in with the url of the page that the link appears on. But by definition there is no referring page for the starting url, and at present Scrutiny doesn't send a referer with that first request.

The fix was to go to Advanced settings and enter 'referer' as the name of the first custom header field, with any valid url (including 'http://') as the value. With that in place, the crawl worked.
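For anyone who wants to check whether a server behaves this way outside of Scrutiny, here is a minimal sketch in Python using only the standard library. It is not how Scrutiny works internally; the function names build_request and fetch_status are made up for illustration, and the url you pass in is whatever starting url you are testing.

```python
import urllib.error
import urllib.request
from typing import Optional


def build_request(url: str, referer: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, adding a Referer header only when one is given."""
    request = urllib.request.Request(url)
    if referer is not None:
        # Hypothetical helper: mirrors the custom header field set in Advanced settings.
        request.add_header("Referer", referer)
    return request


def fetch_status(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code, including error codes."""
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx responses; the status code is on the exception.
        return error.code
```

Comparing fetch_status(build_request(url)) with fetch_status(build_request(url, url)) for the starting url is one way to see whether the missing referer is what triggers the 400. Bear in mind that other headers (the user-agent, for example) also differ between this script and a browser, so a difference in results is a hint rather than proof.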

Sending an empty string or a single space as the value doesn't seem to work, so I'm not sure what browsers send for a page that has no referrer. (If anyone knows the answer to this, I'd be grateful.)

