Friday, 23 August 2013

Using canonical href to exclude duplicates from your xml sitemap

Here's the problem. Scrutiny is finding the same page on your site twice, each with a different url, and including both in your sitemap.
Duplicates in your xml sitemap may not be such a problem according to Google.

However, the same article explains that Google likes to know which version of your url it should rank and which it shouldn't.

The canonical href is the answer. Here is the explanation from Google, but in short, you need to insert a link tag like this in the head of your page:


<link rel="canonical" href="http://peacockmedia.co.uk" />


This line means 'this is the url I'd like you to rank for this content'. (The page at the url given should obviously have the same content.)
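If you want to check by hand which canonical href a page declares, a few lines of Python will do it. This is only a minimal sketch using the standard library; the names CanonicalFinder and find_canonical are made up for this example and have nothing to do with how Scrutiny reads the tag.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, so split before testing.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical href declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = '<html><head><link rel="canonical" href="http://peacockmedia.co.uk" /></head></html>'
print(find_canonical(page))
```

A page without the tag gives None, which is the case where there is nothing to tell a search engine about the preferred url.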

From version 4.3, Scrutiny will pick up this canonical href. You'll find it listed in the SEO table, but the column may be way over to the right, or you may need to switch it on in Preferences > Views. Note that, as with any of the columns in Scrutiny, if you're interested in this column you can move it by dragging:

You can see in the screenshot above that the index page in this example now has the canonical href. After re-crawling the site, the problem at the start of this article has gone away. Scrutiny's Sitemap tab now excludes any page whose canonical href (if present) doesn't match the url of the page. When I export my XML sitemap, only the http://peacockmedia.co.uk version will be included.

Note that Scrutiny will exclude pages according to canonical href in version 4.3.1 and higher.
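The exclusion rule can be written down in a few lines. This is a rough sketch of the rule as described here, not Scrutiny's own code: the function name include_in_sitemap, the example urls and the decision to ignore a trailing slash when comparing are all assumptions made for illustration.

```python
def include_in_sitemap(page_url, canonical):
    """Keep a page unless it declares a canonical href that points elsewhere."""
    if canonical is None:
        # No canonical href on the page, so there is nothing to exclude it by.
        return True
    # Assumption: a trailing slash is not treated as a difference.
    return page_url.rstrip("/") == canonical.rstrip("/")


# (page url, canonical href found on that page); hypothetical crawl results
pages = [
    ("http://peacockmedia.co.uk", "http://peacockmedia.co.uk"),
    ("http://peacockmedia.co.uk/index.html", "http://peacockmedia.co.uk"),
    ("http://peacockmedia.co.uk/scrutiny/", None),
]

sitemap_urls = [url for url, canonical in pages if include_in_sitemap(url, canonical)]
print(sitemap_urls)
```

Here the index.html version is dropped because its canonical href names the other url, while the page with no canonical href is kept.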

Find duplicate content (same content, different url) in your website using Scrutiny

[updated 24 Jun 2019]

Duplicate content on your website is sometimes said to harm your search engine optimisation. It may not be such a serious problem, as this article explains.

Here's how to check for duplicate content on your site using Scrutiny.  Integrity Pro also has this functionality and will look much the same.

1. First scan your site. If your site isn't already in the sites list, press 'New' and type your starting url. Press 'Next' to see the default settings, press 'Next' again to accept those settings. Press 'Go' beside 'Check for broken links'.


2. When the crawl has finished, switch to the SEO tab

3. Switch the 'Filter' to 'Pages with possible duplicates'

You'll now see possible duplicates in the main view. To see a list of the pages that Scrutiny thinks are duplicates for a particular url, double-click it to open the page inspector.


In this example, the problem has arisen because Scrutiny has found links to the same page in two different forms - peacockmedia.co.uk/clipassist and peacockmedia.co.uk/clipassist/index.html
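To see how two forms of a url like these end up pointing at one page, here is a small sketch that groups urls differing only by a default index file or a trailing slash. It compares url shapes only and says nothing about how Scrutiny decides that content matches; the list of index file names and the function names are assumptions.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

# Assumed default document names; a real server may be configured differently.
INDEX_NAMES = ("index.html", "index.htm", "index.php")


def directory_form(url):
    """Strip a default index file and any trailing slash from the path."""
    parts = urlsplit(url)
    path = parts.path
    for name in INDEX_NAMES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    path = path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def group_possible_duplicates(urls):
    """Return lists of urls that reduce to the same directory form."""
    groups = defaultdict(list)
    for url in urls:
        groups[directory_form(url)].append(url)
    return [group for group in groups.values() if len(group) > 1]


crawled = [
    "http://peacockmedia.co.uk/clipassist",
    "http://peacockmedia.co.uk/clipassist/index.html",
    "http://peacockmedia.co.uk/scrutiny/",
]
print(group_possible_duplicates(crawled))
```

The two clipassist urls come out as one group, which is the pair you would then settle with a single canonical href.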

Dealing with duplicates

If you want to deal with this problem, use the canonical link tag in your pages.

Thursday, 1 August 2013

Categorising your items in Organise

Organise v7.1 adds categories to Items. Items can be in one or more categories, or none. If you don't need to use categories then just ignore the categories box and all will work as before.

1. Categorising your items

You can type a category into individual items. For more than one category, separate using a comma:  

Alternatively, you can set the category for more than one item by selecting them (hold down shift to select more than one item in the table, use the search box to find your items, ctrl-click or right-click to call this context menu): 

Note that you can remove multiple items from a category using the same method. To remove a category completely, choose it from the Filter box, select all items in the list and use the Remove menu item.

2. Using categories

The filter drop-down box will contain all the categories that you have entered. Simply select a category to see all items in that category. Alternatively you can type the category into the search box:

When adding an Item to an Order, you can type a category into the search box:

Reports that work on your inventory can now be tweaked to select items from a category:

Monday, 24 June 2013

Productivity improvements in Integrity and Scrutiny

Over the years the web crawling engine that's shared by Integrity and Scrutiny has become faster, more efficient, more accurate and free of problems. It does what it does really well.

But the interface hasn't kept up; it displays its results and then you're on your own. Support calls have been about what people want to do next:

"Where is this link that it's reporting?"

"I only want to copy that URL but there doesn't seem to be an easy way to do it"

I hope that the new version 4.2 will help with such tasks. Useful functions such as 'Copy URL', 'Visit', 'Highlight' and the exciting new 'Locate' function can be found via context menus, buttons, keyboard shortcuts and menus.

First of all, the 'by link' view is an expandable view, meaning that the list of pages that the link appears on can more intuitively be seen from that view without having to open the link inspector:

The link inspector is improved. You can still double-click in its table to either visit or highlight (according to your preferences) but you can now also pop up a context menu with a number of options, or use the new buttons to visit, highlight or locate:

The Locate function is a big help in those situations where you're not quite sure how the crawler has found a certain page - maybe it's an old page you thought you'd orphaned. Previously it was possible to trace the path but it was time-consuming. Now you can call up a list of the clicks required from your starting point to the link in question:
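The idea of listing the clicks from the starting point to a page can be sketched as a breadth-first search over the links found during a crawl. This is only an illustration under assumptions: the site dictionary is a made-up link map, locate is a name chosen for this example, and returning the shortest route is a choice made here rather than something the Locate function is documented to do.

```python
from collections import deque


def locate(links, start, target):
    """Return the shortest chain of pages from start to target, or None."""
    came_from = {start: None}  # page -> the page we reached it from
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == target:
            # Walk back through came_from to rebuild the route.
            path = []
            while page is not None:
                path.append(page)
                page = came_from[page]
            return path[::-1]
        for linked in links.get(page, []):
            if linked not in came_from:
                came_from[linked] = page
                queue.append(linked)
    return None


# Hypothetical link map: each page lists the pages it links to.
site = {
    "/": ["/scrutiny/", "/integrity/"],
    "/scrutiny/": ["/scrutiny/manual.html"],
    "/scrutiny/manual.html": ["/old-page.html"],
    "/integrity/": [],
}
print(locate(site, "/", "/old-page.html"))
```

In this made-up site the supposedly orphaned page is still three clicks from the home page, via the manual.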

These useful functions are available from context menus - right-click or command-click the link or page to pop up a menu:

The new versions of Integrity and Scrutiny containing these features are available for download:

http://peacockmedia.co.uk/scrutiny/

http://peacockmedia.co.uk/integrity/

And on the Mac App Store shortly.

Wednesday, 12 June 2013

Periodic Table Of SEO Success Factors

I've just seen this wonderful graphic showing the factors which influence your search engine ranking with an indication of how each is weighted (and which ones work against you).


See the large version, complete with lots of explanation.

Scrutiny will be able to help you with many of these things.

Thursday, 23 May 2013

Find missing meta tags

[Updated 24 Jun in line with Scrutiny and Integrity Pro v9]

Meta keywords may not be as important now as they used to be, but your meta title is one of the most important SEO factors and your meta description will appear on search result pages and net you click-throughs.

Here's how to check your site to see whether these things are in place using Scrutiny and Integrity.

1. First you need to scan your site. Scrutiny: At the Sites screen, press 'New' and type your starting url. Press 'Next' to see the default settings, press 'Next' again to accept those settings. Press 'Go' beside 'SEO'.

Integrity: Type your starting url into the address bar and press Go. Integrity Plus and Pro: first press the [+] button below the left-hand pane to add a new site.



2. When the crawl has finished, the SEO screen will open

3. Use the Filter box to select 'missing title' or 'missing description'

4. Scrutiny and Integrity Pro have a Meta Data tab under the SEO tab. There are many columns that you can switch on to show various meta data. So if you're interested in the Twitter tags, you can switch on those columns and see which pages contain those tags or are missing them.
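For a single page, the same kind of check can be sketched in Python with the standard library. This is a minimal illustration rather than what Scrutiny or Integrity do internally; MetaAudit and missing_tags are names made up here, and treating an empty title or description as missing is an assumption.

```python
from html.parser import HTMLParser


class MetaAudit(HTMLParser):
    """Records the <title> text and the meta description of a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.description = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.description = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def missing_tags(html):
    """Return which of 'title' and 'description' are absent or empty."""
    audit = MetaAudit()
    audit.feed(html)
    missing = []
    if not audit.title.strip():
        missing.append("title")
    if not (audit.description or "").strip():
        missing.append("description")
    return missing


page = "<html><head><title>ClipAssist</title></head><body></body></html>"
print(missing_tags(page))
```

The example page has a title but no description, so only 'description' is reported.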




How to test a website which requires authentication

[last updated 23 Mar 2021]
This process will involve a little experimentation because of the different ways that authentication can work. It may be as simple as checking a box.

It also involves a risk of changing or deleting your pages - yes really. With some content management systems, the buttons to perform these actions can look like links to Scrutiny and it will dutifully try to click them.  Here are some precautions:
  • If possible, use an account which has access to view the site but no higher
  • If you know the url(s) of links which could perform changes, use Scrutiny's blacklist feature to make sure that they're 'not checked'
  • Make sure that your site is backed up and that you're prepared to restore if necessary

So, we're ready to go. Work through these steps until Scrutiny successfully crawls the site as an authenticated user. 

1. Go to Advanced Settings and Check 'Attempt to authenticate'. You will see a warning - read it, heed it and OK it.


2. Step 2 used to be:
Log into your site using Safari, using an account which has read access but no higher. If your site has an option for 'keep me logged in' then check that. (Step 9 should be up here - do that now.) Then try to crawl the site.
However, that method won't work if you're on Yosemite (10.10) or newer, because tighter security in macOS makes cookies browser-specific rather than system-wide. If you are on 10.9 or earlier, use the method above.

If you're on 10.10 or higher, here's the new step 2. Check the 'attempt to authenticate' button.

It's a simple workaround - if your website tracks you using a cookie (rather than session id) then you can use a simple browser window here to log in. If you check the 'handle cookies' button, then when you start Scrutiny's crawl, it should retain and use the cookie you just collected. 

(Version 10.3.1 updated the webview used in the Log in window. This helped the functionality to work with some sites and broke the functionality for others. Version 10.3.3 offers a choice - try the legacy version first, if that doesn't work, try the other.)




3. This is worth a try; I have seen it work. It may or may not work when you try it in your browser (currently it works in some browsers but not others), but it may still work in Scrutiny.

Add the username and password to your starting url, in the form:
http://user:password@example.com
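If the username or password contains characters such as '@' or ':', the url above becomes ambiguous, and the usual fix is to percent-encode them. Here is a small sketch of building such a url; with_credentials is a name made up for this example, and whether Scrutiny decodes percent-encoded credentials is an assumption you would need to test.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(url, user, password):
    """Insert percent-encoded credentials in front of the host of url."""
    parts = urlsplit(url)
    # safe="" makes quote() encode '@', ':' and '/' as well.
    userinfo = quote(user, safe="") + ":" + quote(password, safe="")
    netloc = userinfo + "@" + parts.netloc
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(with_credentials("http://example.com", "user", "p@ss:word"))
```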
4. If that doesn't work, enter the username and password into the top two boxes in the Advanced Settings window and try again.

5. If your site uses a web form to send the authentication details (eg WordPress) then find out the names of the username and password fields. Here's a snippet of the source for the site above, and you'll see that the names of the fields in this case (a WordPress site) are 'log' and 'pwd'. Enter these in the second pair of boxes and try the crawl again.

<input type="text" name="log" id="user_login" class="input" value="" size="20" />
<input type="password" name="pwd" id="user_pass" class="input" value="" size="20" />

6. You may need to experiment with your starting url too. If you're using the web form fields described in step 5 then Scrutiny will send these by POST request, but it'll only send them to your starting url. (The site should use a cookie or some kind of session id after that.) Again, check the source code of your login form, find the form action and use that url as your starting url.
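Rather than reading the page source by eye, you can pull the form action and the field names out with a short script. This is a sketch under assumptions: the form markup, the example.com urls and the names LoginFormReader and read_login_form are invented for illustration, and it simply lists every named input, including hidden ones that the login may also expect.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormReader(HTMLParser):
    """Records the first form's action and every named <input>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = []  # (name, type, value) for each named input

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attr.get("action") or ""
        elif tag == "input" and attr.get("name"):
            field_type = (attr.get("type") or "text").lower()
            self.fields.append((attr["name"], field_type, attr.get("value") or ""))


def read_login_form(html, page_url):
    """Return (absolute form action url, list of input fields)."""
    reader = LoginFormReader()
    reader.feed(html)
    return urljoin(page_url, reader.action or ""), reader.fields


# Hypothetical login page source, modelled on the field names above.
source = """
<form action="/wp-login.php" method="post">
<input type="text" name="log" id="user_login" class="input" value="" size="20" />
<input type="password" name="pwd" id="user_pass" class="input" value="" size="20" />
<input type="hidden" name="testcookie" value="1" />
</form>
"""

action_url, fields = read_login_form(source, "http://example.com/login/")
print(action_url)
for name, field_type, value in fields:
    print(name, field_type, repr(value))
```

The printed action url is the candidate starting url, and the text and password field names are the ones to enter in the second pair of boxes.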

7. If authentication is still not working, check your html login form for hidden fields (or visible ones) which may be necessary for the login to work. [Note: this step may now be superseded by the checkbox in step 8.] Since v5.5 you can enter any field names and values to be included in the POST request:



8. If your login form uses a security token, check the box and try again. (This may take care of step 7, so it may not then be necessary to add the names / values of hidden fields.) This feature is available in Scrutiny v6.4 onwards.




9. If no joy, this may work: Using a custom header field in the Advanced settings panel, set Field to 'Authorization' and Value to 'Basic [base64 encoded credentials]'. Credentials should be in the form username:password and an encoded version can be obtained here.
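If you'd rather not paste real credentials into a web page, the encoded value can be produced locally. This is a minimal sketch using Python's standard base64 module; basic_auth_value is a name made up for this example, and 'user' and 'password' are placeholders.

```python
import base64


def basic_auth_value(username, password):
    """Build the value for an Authorization header using HTTP Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


print(basic_auth_value("user", "password"))
```

For the placeholder credentials this prints 'Basic dXNlcjpwYXNzd29yZA=='. Bear in mind that base64 is an encoding, not encryption, so treat the value as carefully as the password itself.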

10. Once logged in, there may be a 'logout' link on your pages. Obviously you don't want Scrutiny to log itself out on the first pageful of links, so you may have to blacklist such links (see screenshot above).


11. If you've tried all of this and are still unable to log in, please contact Scrutiny support. Be ready to let us have the details for a test user account with read access but no higher.