Tuesday 28 July 2015

403 'forbidden' server response when crawling website using Scrutiny

The problem: Scrutiny fails to retrieve the first page of your website and therefore gets no further. The result looks like this (above).

The reason: By default Scrutiny uses its own user-agent string (thus being honest with servers about its identity). This particular website (the first I've seen do this for a long time) refuses to serve its pages unless the request appears to come from a recognised browser.

The solution: Scrutiny > Preferences > General

The first box on the first Preferences tab is 'User agent string'. A button beside this box allows you to choose from a selection of browsers (this is called 'spoofing'). If you'd like Scrutiny to identify itself as a browser or a version not in the list, just find the appropriate string and paste it in (if you can run your chosen browser, you can use this tool to find the UA string).

With the User agent string changed to that of a recognised browser, this problem may be solved.
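To show what changing the user-agent string means at the HTTP level, here is a minimal Python sketch using only the standard library. It is not Scrutiny's code: the function name build_request and the example browser string are assumptions for illustration. It builds a request that carries a chosen User-Agent header, without sending anything.

```python
from urllib.request import Request

# Example browser-style string. This is an assumption for illustration,
# not a string taken from Scrutiny's built-in list.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) "
    "AppleWebKit/600.7.12 (KHTML, like Gecko) Version/8.0.7 Safari/600.7.12"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that identifies itself with the given user-agent string."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("http://example.com/")
# urllib stores header names capitalised as "User-agent".
print(req.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen would send it. A server that answers 403 to unfamiliar agents may then respond normally, although that depends entirely on how the server decides what to block.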

Tuesday 21 July 2015

Getting started with Scrutiny - first video

A milestone! I can't tell you how pleased I am with our first instructional video.

It's a quick tour of Scrutiny for Mac, performing a basic link check, reading the results, discussing a few settings and some troubleshooting. Much of this will be relevant to Integrity and Integrity Plus.

... top marks to tacomusic who has just become the voice of PeacockMedia!

Apple Music not playing nicely

[update 16/8/15] This issue now seems to be resolved in iTunes

21 Jul 2015

Screensleeves, the album art screensaver for Mac, is currently having trouble displaying the artwork and track details when the new streaming service (Apple Music) is being used.

AppleScript (the sensible way for applications to talk to each other) is often overlooked - there are ongoing problems with Spotify, although it's been possible to work around most of these. But it's particularly disappointing when Apple themselves neglect their own scripting interface.

iTunes' 'current track' seems broken when it comes to the new music service, as developers of other apps have also reported.

I hope that this is 'teething trouble' with the new service and can only suggest installing updates when they're available.

Wednesday 15 July 2015

Scrutiny included in July Mac Bundle

Scrutiny 5 - Improve your website's quality, SEO and user experience. Find out more

A brief message because PeacockMedia's flagship application, Scrutiny, has been included in this bundle at short notice and the clock is already running. 

For 14 more days, Scrutiny is one of eight quality Mac apps available at a ridiculous discount in BundleCult's July bundle.

Details of how to take advantage of this offer

Thursday 9 July 2015

Using Integrity to scan Blogger sites for broken links - some specifics

I've recently been helping someone with a few issues experienced when testing a Blogger blog with Integrity.

Some of these things are of general interest, some will be useful to anyone else who's link-checking a Blogger site. These tips apply equally to Integrity Plus and Scrutiny.

1. Share links being reported as bad

You may have these share links at the bottom of each post.
As you'd expect, they redirect to a login page, so no danger of Integrity actually sharing any of your posts. The problem comes when you're testing a larger site with more threads. These links may eventually begin to return an error code. I don't know whether this is because of the heavy bombardment on the share functionality, or whether Blogger is detecting the abnormal use. Either way, you may begin to get lots of red in your results.

One solution is to turn down the number of threads to a minimum. This isn't desirable because the crawl will then take hours. A better solution is to ask Integrity not to check those links (it's pretty certain that they'll be ok).

(Note: Even though these links use a querystring with parameters, checking 'ignore querystrings' won't work because these links have a different domain to the blog address, thus they look like external links and the 'ignore querystrings' setting only applies to internal links.)

Add a 'blacklist rule' using the little [+] button (screenshot below). Make a rule that says 'do not check urls containing share-post'.
While here, add similar rules for 'delete-comment' and 'post-edit'. It was a concern to see these urls appearing in my link-check results. They do indeed appear in the pages' html code, although they're hidden by the browser if you're browsing as a guest. But no need to worry - as you'd expect, they also redirect to a login screen and Integrity isn't capable of logging in. *
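The effect of rules like these can be sketched in a few lines of Python. This is only an illustration of 'do not check urls containing ...', not Integrity's implementation; the function name should_check and the example urls are made up.

```python
# Substrings from the three rules described above.
BLACKLIST = ["share-post", "delete-comment", "post-edit"]


def should_check(url, blacklist=BLACKLIST):
    """Return False if the url contains any blacklisted substring."""
    return not any(term in url for term in blacklist)


# Hypothetical urls of the kind a crawl of a Blogger site might find.
urls = [
    "http://example.blogspot.com/2015/07/a-post.html",
    "https://www.blogger.com/share-post.g?blogID=1&postID=2&target=email",
    "https://www.blogger.com/delete-comment.g?blogID=1&postID=3",
]

to_check = [u for u in urls if should_check(u)]
print(to_check)
```

Because the match is on the whole url rather than on the domain, the rule catches these links even though they look like external links.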

2. A large amount of yellow

Integrity highlights redirected urls in yellow. Not an error but an 'FYI'. Some webmasters like to find and deal with redirects, but the Blogger server uses redirects extensively and it's just part of the way it works. When testing a Blogger site, you will see a lot of these but it's not usually something you need to worry about.

If you like, you can change the colour that Integrity uses to highlight such links - you can change it to white, or better still, transparent. See Preferences > Views and then click the yellow colour-well to see the standard OS X colour picker with an 'opacity' slider.

3. Pageviews on your website

Given that Google Analytics relies on client-side JavaScript (meaning that crawling apps like Integrity don't trigger page views **) I was surprised to find Integrity triggering page views with a Blogger site. I guess it counts the views server-side.

It seems that changing the user-agent string to that of Googlebot stopped these hits from registering.

The user-agent string is how any browser or web crawler identifies itself. It's useful for a web server to know who's making the requests.

Posing as Googlebot by using the Googlebot user-agent string:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
... seems to work - it prevents hits from triggering page views in Blogger's dashboard

Deliberately using another string (known as 'spoofing') is technically mis-use of the user-agent string, but until Google recognises Scrutiny and Integrity as web crawlers, then I think this is forgivable. If you'd like to be a little more transparent then I've found that this alternative also works:
Integrity/5.4 (posing as: Googlebot/2.1; +http://www.google.com/bot.html)

I will shortly be building this Googlebot string into the drop-down picker in Preferences. In the meantime just go to Preferences > Global and paste one of those strings into the 'User-agent string' box.
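I don't know how Blogger actually decides which hits to count, so the following Python sketch is pure guesswork. It shows one way a server-side counter could skip known crawlers by looking for a token in the user-agent string. The token list and the function name counts_as_page_view are hypothetical.

```python
# The two strings given above.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
TRANSPARENT_UA = "Integrity/5.4 (posing as: Googlebot/2.1; +http://www.google.com/bot.html)"

# Hypothetical list of crawler tokens a server might look for.
CRAWLER_TOKENS = ["googlebot", "bingbot"]


def counts_as_page_view(user_agent):
    """Return False if the user-agent string contains a known crawler token."""
    ua = user_agent.lower()
    return not any(token in ua for token in CRAWLER_TOKENS)


for ua in (GOOGLEBOT_UA, TRANSPARENT_UA, "Mozilla/5.0 (Macintosh) Safari/600.7.12"):
    print(counts_as_page_view(ua), ua)
```

If the check is a simple substring match like this, it would explain why the more transparent string also works: both contain 'Googlebot/2.1'.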

* Neither Integrity nor Integrity Plus is capable of authenticating itself; in effect they view websites as an anonymous guest. Scrutiny is capable of authentication. It's a feature that's much in demand (if you want to test a website which requires you to log in before you see the content) but it must be used with care - it's not possible to switch it on without seeing warnings and advice.

** I guess that Scrutiny could trigger page views when its 'run js' feature is switched on, though I haven't tested that

Saturday 4 July 2015

New view in Integrity / Scrutiny groups links by status

I don't know how it's taken so long to do this (Integrity has been around since 2007).
Some people are interested in the redirect code and like to sort those out. Other people just care about the final status of the page after redirection. No problem: choose initial status, final status or the combination which you're used to seeing.
The 'bad links only' button will work on this view just the way you're used to. This view can be exported; perhaps you want to expand it to show just one particular status, or just 5xx codes for example, before exporting.

Finally, one more thing is worth mentioning here as we've touched on redirects. Some people are tasked (or task themselves) with making a list of all redirected urls (3xx). Integrity, Integrity Plus and Scrutiny will achieve the task using the techniques above. But the most effective way to achieve this is to use Scrutiny's filter button for 'redirects'. Shown below is the flat view sorted by status but the new 'by status' view will do just as well. When exported to csv, any filter or search will be respected.
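For anyone who wants to do something similar with their own data, here is a rough Python sketch of the two ideas: grouping links by initial status, final status or the combination, and filtering down to redirects (3xx). The results list is invented sample data and the function names are my own for this example; this is not how the apps are written.

```python
from collections import defaultdict

# Hypothetical crawl results: (url, initial status, final status after redirection).
results = [
    ("http://example.com/", 200, 200),
    ("http://example.com/old", 301, 200),
    ("http://example.com/moved", 302, 404),
    ("http://example.com/missing", 404, 404),
]


def group_by_status(rows, key="final"):
    """Group urls by 'initial' status, 'final' status, or the combination of both."""
    groups = defaultdict(list)
    for url, initial, final in rows:
        if key == "initial":
            label = str(initial)
        elif key == "final":
            label = str(final)
        else:
            label = f"{initial} -> {final}"
        groups[label].append(url)
    return dict(groups)


def redirects_only(rows):
    """Keep only the rows whose initial status is a redirect (3xx)."""
    return [row for row in rows if 300 <= row[1] < 400]


print(group_by_status(results, key="final"))
print(redirects_only(results))
```

Grouping by final status puts the 302 that ends in a 404 alongside the plain 404, which is what you want if you only care where the link ends up.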

Wednesday 1 July 2015

SiteViz updated with 3D theme

The video in the last post isn't terribly clear, so here's a static screenshot of the new 3D theme in SiteViz.
I've released a new version of SiteViz today - still beta - it needs work - but you're more than welcome to try it and feed back. It opens the sitemap visualisation file generated by Integrity Plus and Scrutiny and displays it in a number of ways