Tuesday 25 November 2014

203 server response code

Here's an interesting response code that Scrutiny turned up on a website.

Scrutiny reports '203 non-authoritative information'. W3C elaborates on this a little bit:

Partial Information 203
When received in the response to a GET command, this indicates that the returned metainformation is not a definitive set of the object from a server with a copy of the object, but is from a private overlaid web. This may include annotation information about the object, for example.

So this means that a third party is providing the information you see. Presumably this is no different from something many of us do: making some space available on the page and allowing Google or another third party to populate it with advertising. (And indeed the page you get at the domain in question here is the kind of thing you'd expect from clicking on an ad.)

What's interesting here is that you can visit a domain and see a page not controlled by the owner of that domain. I guess a less responsible owner wouldn't have the server give this response code, but this seems to me like information I'd really like to know while I'm browsing. Should your browser alert you to this?
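If you'd like to spot this yourself rather than wait for a browser to tell you, here's a minimal sketch in Python using only the standard library. The function names (is_non_authoritative, describe_status, fetch_status) are my own for illustration and have nothing to do with how Scrutiny works internally.

```python
from http import HTTPStatus
import urllib.request


def is_non_authoritative(status_code: int) -> bool:
    """True when the server answered 203 (Non-Authoritative Information)."""
    return status_code == HTTPStatus.NON_AUTHORITATIVE_INFORMATION


def describe_status(status_code: int) -> str:
    """Return the code with its standard reason phrase, e.g. for a report."""
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # Not a code the standard library knows about.
        phrase = "Unknown"
    return f"{status_code} {phrase}"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Make a GET request and return the status code of the final response.

    Note that urlopen follows redirects and raises HTTPError for 4xx/5xx
    responses; a 203 is a 2xx code, so it comes back as a normal response.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status
```

You could then call is_non_authoritative(fetch_status(url)) for each url you care about and flag the ones that come back True.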

Friday 14 November 2014

Link checking Wordpress websites using Scrutiny

This article is for you if you want to scan your Wordpress site for broken links or SEO issues, generate a sitemap, or perform spelling / grammar checks. It also applies to other CMSs which generate 'SEO-friendly' urls.

If you want to start your crawl at the root, www.my-wordpress-site.com, then just go ahead; it should be fine. If you need to, you can limit your crawl by including or excluding partial urls (whitelisting or blacklisting).
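To illustrate the idea of limiting a crawl by partial urls, here's a rough sketch in Python. The matching rule (plain substring tests, with exclusions taking priority) and the name should_crawl are assumptions made for the example; Scrutiny's own rules may well differ in the details.

```python
def should_crawl(url, include=(), exclude=()):
    """Decide whether a url passes the include / exclude lists.

    Assumed rule, for illustration only: a url is rejected if it contains
    any 'exclude' fragment; otherwise, if an 'include' list was given,
    the url must contain at least one of its fragments.
    """
    if any(fragment in url for fragment in exclude):
        return False
    if include:
        return any(fragment in url for fragment in include)
    return True


# Hypothetical urls on the example site.
urls = [
    "http://www.my-wordpress-site.com/publications/all-publications/",
    "http://www.my-wordpress-site.com/tag/news/",
    "http://www.my-wordpress-site.com/about/",
]

# Blacklisting: skip anything under /tag/.
kept = [u for u in urls if should_crawl(u, exclude=["/tag/"])]
```

With the exclude list above, kept holds the first and third urls; passing include=["/publications/"] instead would keep only the first.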

But if you start deep within a site, Scrutiny will limit its crawl to pages within and below that directory, e.g. www.my-wordpress-site.com/publications/all-publications/

You may know that all-publications is a page. But to Scrutiny it looks like a directory (it's not the trailing slash, it's the lack of a file extension), so it will dutifully check all links on that page but only follow (crawl) links which are in the same 'directory' or below. Therefore your site won't be crawled as fully as you expect.

Since v5.5, Scrutiny has a new option on the Settings screen which allows you to tell Scrutiny that urls are in this form.

It means 'the last part of this url is a filename, not a directory'. So in our example, a crawl starting at www.my-wordpress-site.com/publications/all-publications/ would be limited to the /publications/ directory, which is probably what you would expect.
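Here's a small sketch in Python of the scoping rule described above. It's my own reconstruction from the description, not Scrutiny's actual code: the names crawl_scope, in_scope and last_part_is_page are made up, and I've added http:// to the example url so that the standard library can parse it.

```python
from urllib.parse import urlsplit


def crawl_scope(start_url, last_part_is_page=False):
    """Return the path prefix that a crawl would be limited to.

    Without the option, the last part of the path counts as a file only
    if it has a file extension; otherwise it is taken for a directory.
    With last_part_is_page=True it is always treated as a filename.
    """
    trimmed = urlsplit(start_url).path.rstrip("/")
    last = trimmed.rsplit("/", 1)[-1]
    if "." in last or last_part_is_page:
        # Last part is a file / page: the scope is its parent directory.
        return trimmed.rsplit("/", 1)[0] + "/"
    # Last part looks like a directory: the scope is that directory.
    return trimmed + "/"


def in_scope(url, scope):
    """True if url lies within or below the scope directory."""
    path = urlsplit(url).path.rstrip("/") + "/"
    return path.startswith(scope)


start = "http://www.my-wordpress-site.com/publications/all-publications/"
default_scope = crawl_scope(start)
option_scope = crawl_scope(start, last_part_is_page=True)
```

Here default_scope is /publications/all-publications/ and option_scope is /publications/, so a sibling page such as /publications/another-page/ (a made-up example) is only followed when the option is switched on.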