
Saturday, 4 September 2021

Crawling big-name websites. Some thoughts.

Over the last couple of weeks I've been crawling the websites of some less-popular* big names. 

I enjoy investigating websites: it gives me some interesting things to think about and comment on, and it allows me to test my software 'in the wild'.

Already I'm feeling disappointed with the general quality of these sites, and I'm noticing some common issues. 


The most common by far is the "image without alt text" warning. As someone with a history in website accessibility, I find this disappointing, particularly as alt text is the easiest accessibility improvement and SEO opportunity. Above is a section of the warnings from the RBS site. Every page has a list of images without alt text, and I see this regularly on sites that I'm crawling.
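For anyone unsure what the fix looks like, it's a single attribute on the img tag (the filenames and wording here are just illustrative):

<img src="images/visa-logo.png">                          <!-- triggers the 'image without alt text' warning -->
<img src="images/visa-logo.png" alt="Visa accepted">      <!-- describes the image for screen readers and search engines -->
<img src="images/divider.png" alt="">                     <!-- an empty alt is the right choice for purely decorative images -->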

Next are the issues which may be the result of blindly plugging plugins and modules into a CMS. Last week I saw multiple <head> tags, some of them nested, on the Shell UK website. This showed up a small issue with Scrutiny (fixed in 10.4.2 and above).

One of the sites I've crawled this week, Ryanair, showed a different problem which may also be the result of plugins that don't play nicely together. 

The page in question has two meta descriptions. Only one of them is likely to be displayed on Google's search results page. Don't leave the choice of which one to chance.
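To illustrate (this is a mocked-up fragment, not Ryanair's actual markup), a page in this state has something like the following in its head, and which description the search engine picks is out of your hands:

<head>
  <title>Example page title</title>
  <meta name="description" content="Description written into the page template.">
  <!-- other tags -->
  <meta name="description" content="A second description injected by a plugin.">
</head>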

Before getting to that point, the first black mark against Ryanair is that the site can't be viewed without javascript rendering. It's all very well for js to add pretty effects to your page, but if nothing is visible on the page without js doing its stuff in the browser, then that is bad accessibility, and it could arguably hinder search engines from indexing the pages properly.**

This is what the page looks like in a browser without JS enabled, or on any other user agent that doesn't do rendering. This is what Integrity and Scrutiny would see by default. To crawl this site we need to enable the 'run js' feature. 

This aspect of the site helps to mask the 'double description' problem from a human: if you 'view source' in a browser, you may not even see the second meta description, because (depending on the browser) you may be looking at the pre-rendered page code.

Scrutiny reported the problem, and I had to look at the 'post-rendered' source to see the second one:

I hope you enjoy reading about this kind of thing; I enjoy doing the investigation. So far no-one from any of the companies I've mentioned in blog posts and tweets has made contact, but I'd welcome that.




*less-popular with me.

** It used to be the case that no search engine would be able to index such a page. Now Google (but not all search engines) does render pages. To some extent. 

Tuesday, 10 March 2020

Changes to nofollow links : sponsored and ugc attributes : how to check your links

Google announced changes last year to the way they'd like publishers to mark nofollow links.

The rel attribute can now also contain 'sponsored' or 'ugc' to indicate paid links and user-generated content. Until now, nofollow links were not used for indexing or ranking purposes, but this is changing: Google will no longer treat nofollow as a strict instruction not to follow or index the link.
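In markup terms, the hints look like this (the urls are placeholders, and the values can be combined):

<a href="https://example.com/partner-offer" rel="sponsored">partner offer</a>     <!-- paid, sponsored or affiliate link -->
<a href="https://example.com/user-link" rel="ugc">link left in a comment</a>      <!-- user-generated content -->
<a href="https://example.com/other-site" rel="nofollow">other site</a>            <!-- general 'no endorsement' hint -->
<a href="https://example.com/advert" rel="nofollow sponsored">advert</a>          <!-- values combined -->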

This article on moz.com lists the changes and how these affect you.

As from version 9.5.6 of Integrity (including Integrity Plus and Pro) and version 9.5.7 of Scrutiny, these apps allow you to see and sort your links according to these attributes.

There was already a column in the links views for 'rel', which displayed the content of the rel attribute, and a column for 'nofollow', which displayed 'yes' or 'no' as appropriate. Now there are new columns for 'sponsored' and 'ugc' (also displaying yes/no for easy sorting). Many of the views have a column selector. Where these columns are visible, they'll be sortable and included in csv exports.


Thursday, 2 February 2017

Scrutiny 7 launched! 50% deal via MacUpdate

After many months in development and more in testing, Scrutiny v7 is now available.


Scrutiny builds on the link tester Integrity. As well as the crawling and link-checking functionality it also handles:

  • SEO - loads of data about each page
  • Sitemap - generate and ftp your XML sitemap (broken into parts with a sitemap index for larger sites)
  • Spelling and grammar check
  • Site search with many parameters including multiple search terms and 'pages that don't contain'
  • Many advanced features such as authentication, cookies, javascript.


The main new features of version 7 are:

  • Multiple windows - have as many windows open as you like to run concurrent scans, view data, configure sites, all at once
  • New UI, includes breadcrumb widget for good indication of where you are, and switching to other screens
  • Organise your sites into folders if you choose
  • Autosave saves data for every scan, giving you easy access to results for any site you've scanned
  • Better reporting - summary report looks nicer, full report consists of the summary report plus all the data as CSVs
  • Many more new features and enhancements

MacUpdate are currently running a 50% discount. [update, now finished, but look out for more]

Note that there's an upgrade path for users of v5 and v6 with a small fee ($20). You can use this form for the upgrade.

Friday, 2 September 2016

Scrutiny v7 - closer!

The new version of Scrutiny for MacOS has now made it off the scraps of paper and as far as a working prototype (at least where the UI is concerned, which is where the major changes are).

The new features are:

  • Organise your sites into folders, with drag and drop to move them around (above)
  • Next and Previous buttons are gone; navigate by simply clicking what you want 
  • A new breadcrumb widget (top-left in the screenshots) allows you to navigate as well as giving a sense of location
  • The growing list of site-specific settings is organised into tabs; these settings had become disorganised and ugly, with at least two dialogs (advanced and schedules) accessed via buttons
  • Scrutiny becomes document-based, meaning you can have as many windows open as you like showing different sites (just cmd-N or File > New to open a new window) and make multiple simultaneous scans
  • This also makes for better handling of data, with windows remembering their state and their data (if autosave is switched on)
  • Improved flow - from the summary / settings screen, choose to make a new scan, view existing data (if available) or load data. Only after the scan do you choose which results you want to view.


There will be few changes to the link checking, SEO check, sitemap and other functionality.

If you would like to have a click around this prototype and feed back, please just ask.

Tuesday, 3 November 2015

Which web pages does Scrutiny include in its SEO table and Sitemap table?

When asked that question recently I had to look through code to find definitive answers (not ideal!) and realised that the manual should contain that information.

It does now, and here's the list for anyone who's interested:

The SEO table will include pages that are:
  • html*
  • internal (urls with a subdomain may or may not be treated as internal depending on whether the preference is checked in Preferences > General)
  • status must be good (ie urls with status 0, 4xx or 5xx will not be included)
  • not excluded by your blacklist / whitelist rules on the Settings screen
  • your robots.txt file will be observed (ie a page will be excluded if it is disallowed by robots.txt) if that preference is checked on the Settings screen (below Blacklist and Whitelist rules)
The sitemap table** will include a subset of those pages - so in addition to the above, the following rules apply:
  • will include pdfs (if that preference is checked in Preferences > Sitemap)
  • not excluded by your robots.txt file (if that box is checked in Preferences > Sitemap)
  • not excluded by a 'robots noindex' meta tag (if that box is checked in Preferences > Sitemap)
  • does not have a canonical meta tag that points to a different url
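Just for reference, the tags mentioned in those last two rules look like this (the url here is only a placeholder):

<meta name="robots" content="noindex">
<link rel="canonical" href="https://example.com/preferred-version-of-page/" />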
As always, I'm very happy to look into any particular example that you can't make sense of.

Scrutiny's SEO results table

* this doesn't mean a .html file extension, but the mime type of the page as it is served up. Most web pages will be html. Images are shown in the SEO results but in a separate table which shows the url, page it appears on, and the alt text.

** Although Integrity Plus doesn't display SEO results (at present) it does display the same sitemap table as Scrutiny and all of the rules above apply.

Sunday, 21 June 2015

Surprising SEO results with a personal blog

A small but perhaps very useful enhancement to Scrutiny (in progress right now) is to remove the limited summary here on the SEO results screen (previously it just gave numbers for pages without title / meta description) in favour of a more comprehensive summary:
Screenshot showing Scrutiny's SEO results table plus the new summary text

Bit of a surprise with this one (a personal blog).

Previously with Scrutiny you had to use the filter button to visit the results for each test in turn. Now the list is just there at a glance, and I clearly haven't been very vigilant here - I wasn't aware that Blogger doesn't automatically stick in a meta description, and I guess I've always been too excited about each new blog post to worry about image alt text...

Saturday, 24 May 2014

Google able to index content relying on javascript

When you've worked as a webmaster with accessibility as an important part of the job, it's difficult to accept that javascript is now becoming a more legitimate part of the page's rendering process.

Early on, any important content needed to be visible as text with javascript disabled. If it couldn't be seen in a text-only browser without js then it wasn't going to get past me and onto a local government website.

Justifications included Googlebot's blindness to such content and the hypothetical user with assistive technology (not really hypothetical - I met some). But even then I couldn't help feeling that perhaps their software ought to be capable of a bit more, rather than the web being limited by the most basic or oldest user agents.

I guess the tipping point is the point at which Google is able to index content that relies on javascript to render it. That shoots the fox of old stalwarts like me, and that point has arrived.

This is the reason that Scrutiny is now able to execute javascript before scanning a page (it's early days, and if it doesn't work as expected for you we need to know - please let support know).

This article from Google's Webmaster Central Blog gives their view of the matter. As well as useful tips like making sure the necessary js and css files are accessible by the search engine bots, I'm happy to say that they do still recommend that your pages 'degrade gracefully', ie that users without the latest whizzy browser can still get at your important content.

Friday, 23 August 2013

Using canonical href to exclude duplicates from your xml sitemap

Here's the problem. Scrutiny is finding the same page on your site twice, each with a different url, and including both in your sitemap.
Duplicates in your xml sitemap may not be such a problem according to Google.

However, the same article explains that Google like to know which version of a url they should rank and which they shouldn't.

The canonical href is the answer. Here is the explanation from Google, but in short, you need to insert a meta tag like this:


<link rel="canonical" href="http://peacockmedia.co.uk" />


This line means 'this is the url I'd like you to rank for this content'. (The page at the url given should obviously have the same content.)

From version 4.3, Scrutiny will pick up this canonical href. You'll find it listed in the SEO table, but the column may be way over to the right, or you may need to switch it on in Preferences > Views. Note that (as with any of the columns in Scrutiny) if you're interested in this column, you can move it by dragging:

You can see in the screenshot above that the index page in this example now has the canonical href. After re-crawling the site, the problem at the start of this article has gone away. Scrutiny's Sitemap tab now excludes pages whose canonical href (if present) doesn't match the url of the page. When I export my XML sitemap, only the http://peacockmedia.co.uk version will be included.

Note that Scrutiny will exclude pages according to canonical href in version 4.3.1 and higher.

Find duplicate content (same content, different url) in your website using Scrutiny

[updated 24 Jun 2019]

Duplicate content on your website is sometimes said to harm your search engine optimisation. It may not be such a serious problem, as this article explains.

Here's how to check for duplicate content on your site using Scrutiny. Integrity Pro also has this functionality and will look much the same.

1. First scan your site. If your site isn't already in the sites list, press 'New' and type your starting url. Press 'Next' to see the default settings, press 'Next' again to accept those settings. Press 'Go' beside 'Check for broken links'.


2. When the crawl has finished, switch to the SEO tab

3. Switch the 'Filter' to 'Pages with possible duplicates'

You'll now see possible duplicates in the main view. To see a list of the pages that Scrutiny thinks are duplicates for a particular url, double-click it to open the page inspector.


In this example, the problem has arisen because Scrutiny has found links to the same page in two different forms - peacockmedia.co.uk/clipassist and peacockmedia.co.uk/clipassist/index.html

Dealing with duplicates

If you want to deal with this problem, use a canonical meta tag in your pages.
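In this example, that would mean both forms of the page carrying the same tag, pointing at whichever version you'd prefer to be indexed - here I've picked the shorter form purely for illustration:

<link rel="canonical" href="http://peacockmedia.co.uk/clipassist/" />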

Wednesday, 12 June 2013

Periodic Table Of SEO Success Factors

I've just seen this wonderful graphic showing the factors which influence your search engine ranking with an indication of how each is weighted (and which ones work against you).


The large version comes complete with lots of explanation.

Scrutiny will be able to help you with many of these things.

Saturday, 16 February 2013

robots.txt - a cautionary tale

I was surprised the other day to see this:


I have to confess that I'd been working on robots.txt functionality in Scrutiny and had accidentally left the robots.txt file in place.

That's a cautionary tale, but not the one I wanted to tell. My file was very similar to this (with a couple of urls in the Google disallow list) - the screenshot is from www.robotstxt.org
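From memory, the example in question is along these lines - the intention being 'allow Google in, keep everyone else out' (my own extra disallowed urls aren't reproduced here):

User-agent: Google
Disallow:

User-agent: *
Disallow: /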


I've ruled out a few possibilities, but I see from Google's own developer help that the name of the Google robot is actually Googlebot, not Google. I'm assuming that this is the reason for my problem. If you can confirm this or know better, please leave a message in the comments below.

Monday, 7 January 2013

Making sure your site is fast and why it matters

I've just read this post at CopyBlogger which makes a compelling case for making sure that your web pages load as quickly as possible.

Load speed is a signal used by Google. But just as importantly, very small delays in load time lose visitors.

Reactivity is free to download from the web. It tests your page, breaking it down and showing all the files it finds, with load times for each (using a single thread for a more accurate measurement of each).

You'll easily spot bottlenecks - often third-party code or plugins can cause such delays and be difficult to track down.

http://peacockmedia.co.uk/reactivity/

Reactivity is a standalone component of Scrutiny, which is a suite of website analysis tools.

http://peacockmedia.co.uk/scrutiny/

Wednesday, 26 September 2012

Panda and Penguin in plain English




Thank you to Tekdig for this very easy-to-understand guide to Google's Panda and Penguin updates.

Do read the article, but in short: make sure your content is high-quality - don't fill a page with links or stuff it with keywords, and don't have lots of inbound links using the same keyword or from pages which look like link farms.