Thursday 27 September 2012

New 'Links by page view' in Integrity

I must admit that I wasn't really seeing the true value of this new idea until this afternoon when I hooked it up to the 'Bad links only' switch and -

Wow! A list of pages that need attention, each opening up to show you the bad links on that page!

I'm just tidying up some loose ends and testing. The new version, v3.9, will be available very shortly. Still free.

Any thoughts, do let me know.

Wednesday 26 September 2012

Panda and Penguin in plain English

Thank you to Tekdig for this very easy-to-understand guide to Google's Panda and Penguin updates.

Do read the article, but in short: make sure your content is high-quality. Don't fill a page with links or stuff it with keywords, and don't have lots of inbound links using the same keyword or from pages which look like link farms.

Tuesday 25 September 2012

Tutorial - how to limit the crawl of your website when using Integrity and Scrutiny

[updated 24 Jun 2019]

Although the interface looks quite simple, the rules behind these boxes may not be quite so obvious and so in this simple tutorial I'd like to help you to get the result that you want.

The best way to explain this will be with three examples. The manuals for some of my software are on the peacockmedia domain and I'll assume that I want to run Integrity or Scrutiny but check those manual pages separately or not at all.

The first thing to say is that you may not need to use these rules. Integrity and Scrutiny have a 'down but not up' policy. So if you start your scan at a url inside a particular directory, then the scan will automatically be limited to urls 'below' that directory. For the purposes of this tutorial, I'll show some examples using blacklist / whitelist rules.
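To make the 'down but not up' idea concrete, here is a minimal Python sketch. It is not Integrity's or Scrutiny's actual code. It assumes that 'below' means the same host plus a path that begins with the starting url's directory, and the example.com urls are made up for illustration.

```python
import posixpath
from urllib.parse import urlparse


def is_below(start_url: str, candidate_url: str) -> bool:
    """Return True if candidate_url sits in or below the directory of start_url.

    Assumption (not confirmed by the apps): 'below' means same host and a
    path that begins with the directory part of the starting url.
    """
    start = urlparse(start_url)
    candidate = urlparse(candidate_url)
    if start.netloc.lower() != candidate.netloc.lower():
        return False
    # Use the directory part of the starting url as the boundary.
    path = start.path or "/"
    base_dir = path if path.endswith("/") else posixpath.dirname(path)
    if not base_dir.endswith("/"):
        base_dir += "/"
    return (candidate.path or "/").startswith(base_dir)


start = "https://example.com/manual/index.html"
print(is_below(start, "https://example.com/manual/page2.html"))  # in scope
print(is_below(start, "https://example.com/index.html"))         # 'above' the start
```

Under this assumption, starting at the site root puts every page on that host in scope, which is when the blacklist / whitelist rules become useful.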

1. Blacklisting based on url (Integrity or Scrutiny)

Ignore urls that contain /manual/

All of the manual pages have '/manual/' in the url, so I can type '/manual/' (without quotes). Including the slashes ensures that it'll only blacklist directories called 'manual'. If I was confident that no other urls included the word 'manual' there'd be no need for the slashes.

Simply type a keyword or part of the url. (If you like, you can use an asterisk to mean 'any number of any character' and a dollar sign to indicate 'comes at the end')
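As a rough illustration of how such a rule could be interpreted, the Python sketch below turns a typed rule into a regular expression. The function names are hypothetical, and the sketch assumes case-sensitive matching anywhere in the url unless the rule ends with a dollar sign; the apps themselves may differ in these details.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a typed rule into a compiled regular expression.

    '*' means 'any number of any character'; a trailing '$' means
    'comes at the end'. Everything else is matched literally.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with '.*' for asterisks.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def url_matches(rule: str, url: str) -> bool:
    """True if the rule is found anywhere in the url."""
    return rule_to_regex(rule).search(url) is not None


print(url_matches("/manual/", "https://example.com/manual/index.html"))
print(url_matches(".pdf$", "https://example.com/files/guide.pdf"))
```

Escaping the literal parts matters: without it, the dot in '.pdf' would match any character rather than a real full stop.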

We have the option of using 'Ignore', 'Do not check..' or 'Do not follow..' Check means get the header information and report the server response code. Follow means go one step further and collect the html of the target page and find the links on it.

So to disregard the manuals but still check the outgoing links to that area, I'll want to 'Do not follow..' If I don't even want to check those links but see them listed then it's 'Do not check..' If I want to disregard them completely then it's 'Ignore'.
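One way to picture the three options is as a small lookup table. The Python sketch below is only a summary of the descriptions above, not the apps' internals; the Treatment class and the 'No rule' entry (an internal link that matches none of the boxes) are names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Treatment:
    listed: bool    # does the link appear in the results?
    checked: bool   # is the server response code requested?
    followed: bool  # is the target page's html collected and searched for links?


# What happens to a link whose url matches the term typed into each box.
TREATMENTS = {
    "Ignore": Treatment(listed=False, checked=False, followed=False),
    "Do not check": Treatment(listed=True, checked=False, followed=False),
    "Do not follow": Treatment(listed=True, checked=True, followed=False),
    "No rule": Treatment(listed=True, checked=True, followed=True),
}

for box, treatment in TREATMENTS.items():
    print(box, treatment)
```

Each option further down the table does a little more work for the matching links than the one above it.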

Another use of the 'Do not check' box is to speed up the crawl by disregarding certain file types or urls that you either don't need to check or can't check properly anyway (such as secure pages if you're using Integrity, which doesn't allow authentication). For example you can type .pdf, .mp4 or https:// into that box (or multiple values separated by commas).
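The comma-separated box can be thought of as a list of terms, any one of which is enough to exclude a link. A minimal Python sketch, assuming plain substring matching and using hypothetical function names:

```python
from typing import List


def parse_terms(box_text: str) -> List[str]:
    """Split the text typed into a box on commas, dropping blanks and spaces."""
    return [term.strip() for term in box_text.split(",") if term.strip()]


def matches_any(url: str, terms: List[str]) -> bool:
    """True if at least one term occurs somewhere in the url."""
    return any(term in url for term in terms)


terms = parse_terms(".pdf, .mp4, https://")
print(terms)
print(matches_any("http://example.com/video.mp4", terms))
print(matches_any("http://example.com/page.html", terms))
```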

2. Whitelisting based on url (Integrity or Scrutiny)

Do not check urls that don't contain /manual/

Let's assume that on another occasion I want to only check the manual pages. This time I type the word '/manual/' into 'Only follow..'. It's important that I start my crawl at a page which contains the whitelisted term, eg

This time, links which are not whitelisted (ie those that don't contain 'manual') are checked and seem to be ok, but are in red because they're not being followed and I'm still highlighting blacklisted links.
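The whitelist behaviour can be sketched with a toy crawl over a made-up site held in a dictionary. This is a simplified model, not the apps' crawler: it assumes every discovered link is checked, and that a page is only followed if its url contains the 'Only follow' term. All urls are hypothetical.

```python
from collections import deque

# Hypothetical site: page url -> links found on that page.
SITE = {
    "https://example.com/manual/index.html": [
        "https://example.com/manual/setup.html",
        "https://example.com/index.html",
    ],
    "https://example.com/manual/setup.html": [],
    "https://example.com/index.html": ["https://example.com/about.html"],
    "https://example.com/about.html": [],
}


def crawl(start: str, only_follow: str):
    """Return (checked, followed) url sets for a whitelisted crawl."""
    checked = {start}
    followed = set()
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if only_follow not in url:
            continue  # checked, but its links are never collected
        followed.add(url)
        for link in SITE.get(url, []):
            if link not in checked:
                checked.add(link)
                queue.append(link)
    return checked, followed


checked, followed = crawl("https://example.com/manual/index.html", "/manual/")
print(sorted(checked))
print(sorted(followed))
```

In this model, starting at a page whose url does not contain the whitelisted term means nothing is followed at all, which is one way to see why the starting url matters.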

3. Blacklisting based on content (Scrutiny)

This time we'll assume that I want to exclude the manual pages, but there isn't a keyword in the url that I can use for blacklisting.

I'm going to use the phrase "manual / help" which occurs in the footer of the manual pages but no other pages. So I type that phrase into 'Do not follow..' and tick 'Check content as well as url..'

The result is the same as the screenshot in the first example, but Scrutiny is finding my search term in the page content rather than the url.

In this example, the phrase won't be found in urls because it contains spaces, but for a single keyword, Scrutiny would look for the term in both url and content and blacklist the page if it finds it in either.
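The 'url or content' logic can be written down in a few lines. This Python sketch uses a hypothetical function name and assumes plain, case-sensitive substring matching; it is an illustration of the rule as described, not Scrutiny's own code.

```python
def is_blacklisted(term: str, url: str, html: str, check_content: bool) -> bool:
    """True if the term is in the url, or in the page content when
    'Check content as well as url..' is ticked."""
    if term in url:
        return True
    return check_content and term in html


url = "https://example.com/help/page1.html"  # hypothetical url, no keyword in it
html = "<p>Some text</p><footer>manual / help</footer>"

print(is_blacklisted("manual / help", url, html, check_content=True))
print(is_blacklisted("manual / help", url, html, check_content=False))
```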

If the manuals were all on a subdomain, it would be possible to blacklist or whitelist using the term "manual." but it would also be possible to use the 'Treat subdomains as internal' checkbox in Preferences. Subdomains are a bigger topic and one for its own tutorial.

Any problems, do get in touch.

Monday 24 September 2012

Running a website check on schedule, sorting data and mining content for SEO keywords

Scrutiny v3 is finished, tested and released.

Although the new version of the webmaster tool suite comes fairly soon after v2, I've decided to make this a major release rather than a point version. There has been some serious work 'under the hood' making the crawl slightly faster and more memory-efficient, some interface improvements such as sorting on all views, and some important new features such as the ability to include content in the keyword count (pictured) or to run on schedule.

I've made a page detailing the new features:

Once more, the web download will be two or three weeks ahead of the App Store. I'm always challenged about this but it's simply because of the long wait for Apple to check it. And if it's rejected for some reason, then there's more work to do before re-submitting.

Any thoughts or questions, do get in touch.