Sunday 18 February 2018

Integrity v8 engine under development

[29 Aug 2021  This post is now old, but left here as a record of past developments. This version of the engine has been improved but is still current. The next generation is under development.]

The beta of link checker Integrity v8 is imminent, with Integrity Plus following closely behind and then website scrutineer Scrutiny.

[update 23 Apr 2018 - version 8 is now the full release in Integrity, Integrity Plus and the new Integrity Pro]

Obviously we've been looking forward to version 8 so that we can say 'V8 engine'. The biggest changes in the upcoming versions of Integrity and Scrutiny are in the engine rather than interface, so there's not a huge amount to see on the surface but there is a lot to say about it.

Changes to the way that the data is structured in memory

For 11 years, the way that the data has been structured in Integrity and its siblings has meant that for certain views, tables had to be built following the scan. And rebuilt when certain actions were taken (rechecking, marking as fixed). Not a big deal with small sites, but heavy on time and resources for very large sites. It also meant that certain information wasn't readily available (number of internal backlinks, for example) and had to be looked up or calculated either on the fly or as a background task after the scan. And I have to admit to some cases of instability, unresponsiveness or unexpected side effects caused over the years by that stuff going on in the background.

So we've invested a lot of time in something that isn't a killer new feature, but will make things run more smoothly, efficiently and reliably. Initial informal tests show improvements in speed and efficiency (memory use).

More information about your links

There have been a number of requests to add more information about the properties of a link , eg hreflang and more of the 'rel' attributes (it has been displaying the 'nofollow' status but not much else). HTML5 allows loads of allowable values in the 'rel' and if the one you want doesn't have its own yes/no column in the new apps then you can view the entire rel attribute. You can switch these columns on in any of the link tables as well as the one in the link inspector.


The information displayed in Scrutiny's SEO table will include more data about the pages, notably og: meta data which is important to a lot of users.

More information about redirects

Redirect information wasn't stored permanently, only the start and end of a redirect chain (the majority are a single step. But if there is more than one redirect, you'll want to know all the in-between details)


Better control over volume of requests

It's becoming increasingly necessary to apply some limits on the rate of the requests that Integrity / Scrutiny make.

Your server may keep responding no matter how hard you hit it. That's great; turn up those threads and watch Integrity / Scrutiny go through your site like a dose of salts.

But I'm seeing more cases of servers objecting to rapid requests from the same source. And the reason isn't always clear. They may return 429 (which means 'too many requests') or some other less obvious error response. Or they may just stop responding, perhaps a defensive measure.

If that's your site, you'll no longer have to turn down the threads and use trial and error with the delay field. You'll be able to simply enter a maximum number of requests per minute, while still using multiple threads for efficiency.

Generally a little more helpful

For a while, Integrity has automatically been re-trying certain bad links, in case the reason for the first unsuccessful attempt was temporary. v8 will built on this. For example, a 'too many http requests' is often caused when a website won't work without the user having cookies enabled (this is true, I'll be happy to discuss further with examples) and in these cases, the link will be re-tried with cookies enabled and this will usually then check out ok. In the case of 429 (too many requests) they'll be re-tried after a pause (if a 'Retry-After' header is sent by the server, the recommended delay will be made before retrying). The scan will be paused if too many are received and advice given to the user. On continue, the 429s will automatically be re-tried. Once again, enhancements that will be invisible to the user.

Why wasn't I informed about Integrity version 7?

Because it doesn't exist; Integrity will skip a version. v7 of Scrutiny was mostly about the interface. Integrity continued to use the classic interface and so remained v6. Integrity, Integrity Plus and Scrutiny (and a new app, Integrity Pro, you heard it here first) will use the v8 engine when it's ready and so for consistency all apps will be called v8.

Charge for an upgrade?  Price increase?

So in summary, as far as Scrutiny v8 and Integrity v8 are concerned, the changes won't be immediately visible but they'll do what they do more efficiently and with more information available.   They'll provide more information more efficiently in a smaller memory footprint.

There are no immediate plans for price increases, or for upgrade fees. Prices are of course always under review, but any future increases will be for new users, upgrading to 8 from Scrutiny 7 or Integrity 6 will be free.

Update

version 8 of Integrity, Integrity Plus and now also Integrity Pro are through beta and available in full.


SaveSave

2 comments:

  1. There's clearly more to server responses that you'd think. The cookie thing is just odd, but looks like you have some quite well thought out options to deal with these things.

    Congrats on V8, it looks like the product has come a long way.

    ReplyDelete
  2. It's true, there are some websites which perform a redirect to the same page in order to detect cookies being enabled, and will do that indefinitely until a cookie is detected. So turn cookies off in your browser and you'll get 'too many http redirects' reported. It's a terrible practice, accessibility guidelines say that websites should work without requiring cookies or scripts, but in the world of commerce, I guess these people aren't interested in you unless they can track you and present their site in the way they want.

    Having a tool which simply reports the status received isn't enough. "It works in my browser" is a common phrase on the support desk. And I suppose that's fair enough. But a request isn't just a request, there are all sorts of settings and header fields.

    Thanks!

    ReplyDelete