Tuesday 28 July 2015

403 'forbidden' server response when crawling website using Scrutiny

The problem: Scrutiny fails to retrieve the first page of your website and therefore gets no further, because the server answers with a 403 'forbidden' response. The result looks like this (above).

The reason: By default Scrutiny uses its own user-agent string, and so is honest with servers about its identity. This particular website (the first I've seen do this for a long time) refuses to serve pages unless the request appears to come from a recognised browser.

The solution: Scrutiny > Preferences > General

The first box on the first Preferences tab is 'User agent string'. A button beside this box lets you choose from a selection of browser strings (presenting another client's user-agent string like this is called 'spoofing'). If you'd like Scrutiny to identify itself as a browser or a version not in the list, just find the appropriate string and paste it in (if you can run your chosen browser, you can use this tool to find the UA string).

With the User agent string changed to that of a recognised browser, this problem may be solved.
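
If you'd like to confirm that the user-agent string really is the cause before changing the preference, you can request the same page twice with different strings and compare the status codes. The Python below is only a rough sketch: the two UA strings, the function names and the URL in the usage note are made up for illustration and are not Scrutiny's actual values.

```python
import urllib.error
import urllib.request

# Hypothetical user-agent strings, for illustration only.
CRAWLER_UA = "ExampleCrawler/1.0"
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/600.7.12 "
    "(KHTML, like Gecko) Version/8.0.7 Safari/600.7.12"
)


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server gives for this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is what we want here.
        return error.code


def blocked_by_user_agent(url, crawler_ua=CRAWLER_UA, browser_ua=BROWSER_UA):
    """True if the crawler UA gets a 403 while the browser UA gets a 200."""
    return (
        fetch_status(url, crawler_ua) == 403
        and fetch_status(url, browser_ua) == 200
    )
```

Calling `blocked_by_user_agent("http://www.example.com/")` with your own start URL in place of the placeholder returns True when only the non-browser string is refused, which suggests that changing the User agent string in Scrutiny is likely to help. If both requests get a 403, the server is probably refusing for some other reason.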

