Thursday, 19 September 2019

New feature for Website Watchman and spin-off website archiving app

I wouldn't be without Website Watchman scanning the PeacockMedia website on schedule every day. As with Time Machine, I can dial up any day in the past year and browse the site as it appeared on that day. And export all files associated with any page if I need to.

It also optionally alerts you to changes to your own or anyone else's web page(s).

It took a lot of development time last year. I have (literally) a couple of reports of websites that it has trouble with, but on the whole it works well and I think the tricks it does are useful.

Since it went on sale early this year, it has sold. But not in the numbers I'd hoped. Maybe it's just one of those apps that people don't know that they need until they use it.

Here's the interesting part. Of the support requests I've had, more have been on one question than any other. Frustrated emails asking how to make it export the entire site that they've just scanned.   What those people are after is an app which does a 'one-shot' scan of a website and then saves all the files locally. It's a reasonable question because WW's web page talks about archiving a website.

My stock answer is that WW is designed to do two specific things, neither of which is running a single scan and exporting the entire thing. Like Time Machine, it does hold an archive, which it builds over time, and that's designed to be browsed within WW; any files that you need from a given date are recoverable.

There is of course quite a well-known and long-standing app which does suck and save an entire site. I've had conversations with my users about why they're not using that.

So, knowing about those requirements and owning software which already nearly does that task, it seems the right time to go ahead.

I've mused for a while about whether to make an enhancement to Website Watchman so that it can export the files for an entire site (remember that it's designed to make multiple scans of the same website and build an archive with a time dimension, so this option must involve choosing a date if more than one scan has been made), or whether what people are after is a very simple 'one-trick' app: type a starting URL, press Go, choose a save location and the job's done.

I've decided to do both. The new simple app will obviously be much cheaper and do one thing. Website Watchman will have the new full-site export added to its feature list. I'm using WebArch as a working title for the new app, which may stick.

So where are these new things?

It's been more than a week since writing the news above. Even though we already had toes in this water with the archive functionality (such as it is) in Integrity Plus, Integrity Pro and Scrutiny, it turns out that it's no easy thing to make a local copy of a website browsable. In some cases it's easy and the site just works, but each site we tested was like opening a new Pandora's box, with new things that we hadn't taken into account. It's been an intense time but I'm really happy with the stage it's at now. I'd like to hear from users in cases where the local copy isn't browsable or problems are seen.
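To give a flavour of what 'browsable' involves, here's a rough sketch in Python of just one of the jobs: rewriting absolute same-site links so that they point at the saved files. It's purely illustrative (example.com is a placeholder and this is not Website Watchman's code); the real task also has to cope with css url() references, srcset attributes, query strings used for routing, scripts that build urls, the depth of the linking page, and plenty more.

    # Illustrative sketch only -- not Website Watchman's actual code.
    # Rewrite absolute same-site links in saved HTML so the local copy is browsable.
    import re
    from urllib.parse import urlparse

    SITE = "https://example.com"          # placeholder site root

    def local_path(url):
        """Map a URL to a relative path inside the local archive folder."""
        parsed = urlparse(url)
        path = parsed.path or "/"
        if path.endswith("/"):
            path += "index.html"          # directory URLs need a concrete filename
        if parsed.query:
            # query strings aren't valid in filenames on every filesystem
            path += "_" + re.sub(r"[^A-Za-z0-9]", "_", parsed.query) + ".html"
        return path.lstrip("/")

    def rewrite_html(html):
        """Point href/src attributes at the saved files instead of the live site."""
        def repl(match):
            attr, url = match.group(1), match.group(2)
            if url.startswith(SITE):
                return '%s="%s"' % (attr, local_path(url))
            return match.group(0)         # leave external links untouched
        return re.sub(r'(href|src)="([^"]+)"', repl, html)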

The new functionality is in Website Watchman from version 2.5.0. There's a button above the archive browser, and a menu item with keyboard shortcut under the File menu.

The new simple app will follow shortly.

Friday, 19 July 2019

Migrating to a secure (https://) website using Scrutiny 9

Yesterday I moved another website to https:// and thought I'd take the opportunity to make an updated version of this article. Scrutiny 9 has just been launched.

Google have long been pushing the move to https. Browsers now display an "insecure" warning if your site isn't served over https://.

Once the certificate is installed (which I won't go into here), you must weed out links to your http:// pages, and pages that have 'mixed' or 'insecure' content, i.e. references to images, css, js and other files which are http://.

Scrutiny makes it easy to find these.

1. Find links to http pages and pages with insecure content.

First you have to make sure that you're giving your https:// address as your starting url, and make sure that the relevant boxes are ticked in your settings and in your Preferences.

After running a scan, Scrutiny will offer to show you these issues. If you started at an https:// url and had the relevant boxes checked, then you'll automatically be shown any such issues that were found.
You'll have to fix and re-scan until there's nothing reported. (When you make certain fixes, that may reveal new pages to Scrutiny for testing.)
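For the curious, here's roughly what such a check amounts to, as a minimal Python sketch using only the standard library. It isn't Scrutiny's code (and example.com is a placeholder); it just fetches a page and flags any http:// references in href and src attributes.

    # Minimal mixed-content check -- illustrative only, not Scrutiny's code.
    import re
    import urllib.request

    def insecure_references(page_url):
        """Return any http:// URLs referenced by a page (links, images, scripts, css)."""
        html = urllib.request.urlopen(page_url).read().decode("utf-8", "replace")
        # href covers links and stylesheets, src covers images and scripts
        return sorted(set(re.findall(r'(?:href|src)="(http://[^"]+)"', html)))

    for url in insecure_references("https://example.com/"):   # placeholder starting url
        print("insecure reference:", url)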

2. Fix broken links and images

Once those are fixed, there may be some broken links and broken images to fix too. (I was copying stuff onto a new server and chose to copy only what was needed; there are inevitably things that you miss.) Scrutiny will report these and make them easy to find.
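Again, purely to illustrate the kind of check involved rather than how Scrutiny does it, testing a few URLs for broken responses looks something like this (placeholder URLs):

    # Minimal broken-link check -- illustrative only.
    import urllib.error
    import urllib.request

    def check(url):
        """Return the HTTP status for a URL, or the error if the request fails."""
        try:
            # some servers dislike HEAD requests; a real checker would fall back to GET
            with urllib.request.urlopen(urllib.request.Request(url, method="HEAD")) as resp:
                return resp.status
        except urllib.error.HTTPError as e:
            return e.code          # e.g. 404 for a broken link or missing image
        except urllib.error.URLError as e:
            return str(e.reason)   # DNS failure, connection refused, etc.

    for url in ["https://example.com/", "https://example.com/missing.png"]:
        print(url, check(url))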

3. Submit to Google.

Scrutiny can also generate the xml sitemap for you, listing your new pages (and images and pdf files too if you want).
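For reference, an XML sitemap is a simple standard format (sitemaps.org); a minimal one, with placeholder URLs and an optional lastmod date, looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://example.com/</loc>
        <lastmod>2019-07-18</lastmod>
      </url>
      <url>
        <loc>https://example.com/about.html</loc>
      </url>
    </urlset>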

Apparently Google treats the https:// version of your site as a separate 'property' in its Search Console (formerly Google Webmaster Tools). So you'll have to add the https:// site as a new property and upload the new sitemap.

[update 15 Jul] I uploaded my sitemap on Jul 13; it was processed on Jul 14.

4. Redirect

As part of the migration process, Google recommends that you then "Redirect your users and search engines to the HTTPS page or resource with server-side 301 HTTP redirects"  (full article here)
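How you set that up depends on your server. As one hedged example, on Apache with mod_rewrite enabled, a rule along these lines (in the site config or .htaccess) 301-redirects every http:// request to its https:// equivalent:

    RewriteEngine On
    RewriteCond %{HTTPS} off
    RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

Once the redirect is in place, it's worth running one more scan starting at the old http:// address to confirm that everything ends up on the https:// version.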





Sunday, 7 July 2019

Press Release - Integrity Pro v9 released

Integrity Pro version 9 is now fully released. It is a free update for existing licence holders.

The major new features are as follows:
  • Improved Link and Page inspectors. New tabs on the link inspector show all of a url's redirects and any warnings that were logged during the scan.

  • Warnings. A variety of things may now be logged during the scan. For example, a redirect chain or certain problems discovered with the html. If there are any such issues, they'll be highlighted in orange in the links views, and the details will be listed on the new Warnings tab of the Link Inspector.
  • Rechecking. This is an important part of your workflow. Check, fix, re-check. You may have 'fixed' a link by removing it from a page, or by editing the target url. In these cases, simply re-checking the url that Integrity reported as bad will not help. It's necessary to re-check the page that the link appeared on. Now you can ask Integrity to recheck a url, or the page that the url appeared on. And in either case, you can select multiple items before choosing the re-check command.
  • Internal changes. There are some important changes to the internal flow which will eliminate certain false positives.


More general information about Integrity Pro is here:
https://peacockmedia.software/mac/integrity-pro/

Friday, 5 July 2019

Two Mac bundles: Web Maestro Bundle and Web Virtuoso Bundle

I recently answered a question about the overlap with some of our apps. The customer wanted to know which apps he needed in order to possess all of the functionality.

For example, Website Watchman goes much further with its archiving functionality than Integrity and Scrutiny. But Webscraper entirely contains the crawling and markdown conversion of HTML2MD.

The answer was that he'd need three apps. It was clear that there should be a bundle option. So here it is.

There are two bundles. One contains Integrity Pro, which crawls a website checking for broken links, SEO issues and spelling, and generates an XML sitemap. The alternative bundle contains Scrutiny, which has many advanced features over Integrity Pro, such as scheduling and js rendering.

These are the bundles.

Web Maestro Bundle:

Scrutiny: Link check, SEO checks, Spelling, Searching, Advanced features
Website Watchman: Monitor, Archive. Time Machine for your website
Webscraper: Extract and Convert data or entire content. Extract content as html, markdown or plain text. Extract data from spans, divs etc using classes or ids. Or apply a Regex to the pages.


Web Virtuoso Bundle:

Integrity Pro: Link check, SEO checks, Spelling
Website Watchman: Monitor, Archive. Time Machine for your website
Webscraper: Extract and Convert data or entire content. Extract content as html, markdown or plain text. Extract data from spans, divs etc using classes or ids. Or apply a Regex to the pages.


The link for accessing these bundles is:
https://peacockmedia.software/#bundles

If you're interested in an affiliate scheme which allows you to promote and earn from these bundles and the separate products, this is the sign-up form:
https://a.paddle.com/join/program/198


Wednesday, 29 May 2019

Press Release - Scrutiny 9 Launched

Scrutiny version 9 is now fully released. It is a free update for v7 or v8 licence holders. There is a small upgrade fee for holders of a v5 / v6 licence. Details here: https://peacockmedia.software/mac/scrutiny/upgrade.html

The major new features are as follows:
  • Improved Link and Page inspectors. New tabs on the link inspector show all of a url's redirects and any warnings that were logged during the scan.

  • Warnings. A variety of things may now be logged during the scan. For example, a redirect chain or certain problems discovered with the html. If there are any such issues, they'll be highlighted in orange in the links views, and the details will be listed on the new Warnings tab of the Link Inspector.
  • Rechecking. This is an important part of your workflow. Check, fix, re-check. You may have 'fixed' a link by removing it from a page, or by editing the target url. In these cases, simply re-checking the url that Scrutiny reported as bad will not help. It's necessary to re-check the page that the link appeared on. Now you can ask Scrutiny to recheck a url, or the page that the url appeared on. And in either case, you can select multiple items before choosing the re-check command.
  • Internal changes. There are some important changes to the internal flow which will eliminate certain false positives.
  • Reporting. The summary of the 'full report' is customisable (in case you're checking a customer's site and want to add your own branding to the report). You now have more choice over which tables you include in csv format with that summary.





A full and detailed run-down of version 9's new features is here:
https://blog.peacockmedia.software/2019/05/scrutiny-version-9-preview-and-run-down.html

More general information about Scrutiny is here:
https://peacockmedia.software/mac/scrutiny/

Saturday, 18 May 2019

Scrutiny version 9, a preview and run-down of new features

Like version 8, version 9 doesn't have any dramatic changes in the interface, so it'll be free for existing v7 or v8 licence holders. It'll remain a one-off purchase, but there may be a small increase in price for new customers or upgraders from v6 or earlier.

But that's not to say that there aren't some important changes going on, which I'll outline here.

All of this applies to Integrity too, although the release of Scrutiny 9 will come first.

Inspectors and warnings

The biggest UI change is in the link inspector. It puts the list of redirects (if there are any) on a tab rather than a sheet, so the information is more obvious.  There is also a new 'Warnings' tab.
Traditionally in Integrity and Scrutiny, a link coloured orange means a warning, and in the past this meant only one thing: a redirect (which some users don't want to see, which is OK; there's an option to switch redirect warnings off).

Now the orange warning could mean one or more of a number of things. While scanning, the engine may encounter things which may not be showstoppers but which the user might be grateful to know about. There hasn't been a place for such information. In version 9, these things are displayed in the Warnings tab and the link appears orange in the table if there are any warnings (including redirects, unless you have that switched off.)

Examples of things you may be warned about include more than one canonical tag on the target page, and unterminated or improperly-terminated script tags or comments (i.e. <!-- with no -->, which could be interpreted as commenting out the rest of the page, though browsers usually seem to ignore the opening tag if there's no closing one). Redirect chains also appear in the warnings, and the threshold for a chain can now be set in Preferences. Redirect chains were previously visible in the SEO results with a hard-coded threshold of 3.
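For anyone unfamiliar with redirect chains, here's a minimal sketch of the idea in Python (standard library only, placeholder URL; this is not Scrutiny's code): each hop answers with a 3xx status and a Location header, and a 'chain' is simply more consecutive hops than your chosen threshold.

    # Sketch: follow redirects one hop at a time and count them -- illustrative only.
    import http.client
    from urllib.parse import urljoin, urlparse

    def next_hop(url):
        """Return the URL this one redirects to, or None if it doesn't redirect."""
        parts = urlparse(url)
        Conn = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
        conn = Conn(parts.netloc)
        path = (parts.path or "/") + ("?" + parts.query if parts.query else "")
        conn.request("HEAD", path)            # a real crawler would fall back to GET
        resp = conn.getresponse()
        location = resp.getheader("Location")
        conn.close()
        if 300 <= resp.status < 400 and location:
            return urljoin(url, location)     # Location may be relative
        return None

    def redirect_chain(url, max_hops=10):
        chain = [url]
        while len(chain) <= max_hops:
            url = next_hop(url)
            if url is None:
                break
            chain.append(url)
        return chain

    THRESHOLD = 3                             # in Scrutiny 9 the threshold is a preference
    hops = redirect_chain("http://example.com/old-page")   # placeholder URL
    if len(hops) - 1 >= THRESHOLD:
        print("redirect chain:", " -> ".join(hops))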

The number of things Scrutiny can alert you about in the warnings tab will increase in the future.

Strictly speaking, there's an important distinction between link properties and page properties. A link has link text, a target url and other attributes such as rel-nofollow. A page is the file that your browser loads; it contains links to other pages.

A 'page inspector' has long been available in Scrutiny. It shows a large amount of information: meta data, headings, word count, the number of links on the page, the number of links *to* that page and more. A lot of this is of course visible in the SEO table.

Whether you see the link inspector or the page inspector depends on the context (for example, the SEO table is concerned with pages rather than links, so a double-click there opens the page inspector). But when viewing the properties of a link, you may want to see the page properties. (In the context of a link, that could mean the parent page or the target page, but the target is probably the one that comes to mind.) So it's now possible to see some basic information about the target on the 'target' tab of the link inspector, and you can press a button to open the full page inspector.

[update 26 May] The page inspector has always had fields for the number of inbound links and outbound links. Now it has sortable tables showing the inbound and outbound links for the page being inspected.

Rechecking

This is an important function that Integrity / Scrutiny  have never done very well.

You run a large scan, you fix some stuff, and want to re-check. But you only want to re-check the things that were wrong during the first scan, not spend hours running another full scan.

Re-checking functionality has previously been limited. The interface has been changed quite recently to make it clear that you'll "Recheck this url", i.e. the url that the app found during the first scan.

Half of the time, possibly most of the time, you'll have 'fixed' the link by editing its target, or removing it from the page entirely.

The only way to handle this is to re-check the page(s) that your fixed link appears on. This apparently simple-sounding thing is by far the most complex and difficult of the v9 changes.

Version 9 still has the simple 're-check this url' available from the link inspector and various context menus (and it can be used after a multiple selection). It also now has 'recheck the page this link appears on', which can also be used with a multiple selection.

Reporting

This is another important area that has had less than its fair share of development in the past. Starting to offer services ourselves has prompted some improvements here.

Over time, simple functionality is superseded by better options. The "On finish save bad links / SEO as csv" options are no longer needed and have now gone, because "On finish save report" does those things and more. Having them all exist together led to confusion, particularly for those users who like to switch everything on with abandon: checking all the boxes would lead to the same csvs being saved multiple times, and sometimes multiple save dialogs at the end of the scan.

If you want things saved automatically after a scheduled scan or a manual 'scan with actions', switch on 'Save report' in the 'On finish' section of the settings and then choose exactly what you want included.

Reduced false positives

The remaining v9 changes are invisible. One of those is a fundamental change to the program flow. Previously link urls were 'unencoded' for storage / display and encoded for testing. In theory this should be fine, but I've seen some examples via the support desk where it's not fine. In one case a redirect was in place which redirected a version of the url containing a percent-encoding, but not the identical url without the percent-encoding. The character wasn't one that you'd usually encode. This unusual example shows that as a matter of principle, a crawler ought to store the url exactly as found on the page and use it exactly as found when making the http request. 'Unencoding' should only be done for display purposes.
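A small illustration of the principle, using Python's urllib.parse and a hypothetical url (this isn't the example from the support case): the encoded and unencoded forms are different strings on the wire, and a server is entitled to treat them differently.

    from urllib.parse import quote, unquote

    found_on_page = "https://example.com/docs/%7Eguide/intro"   # hypothetical url, exactly as written in the HTML

    display_form = unquote(found_on_page)         # https://example.com/docs/~guide/intro  (nicer to read)
    re_encoded = quote(display_form, safe=":/")   # '~' is unreserved, so it is NOT re-encoded

    print(found_on_page == re_encoded)            # False -- the request would not match what the page linked to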

When you read that, it'll seem obvious that a crawler should work that way. But it's the kind of decision you make early on in an app's development, and tend to work with as time goes on rather than spending a lot of time making fundamental changes to your app and risking breaking other things that work perfectly well.

Anyhow, that work is done. It'll affect very few people, but for those people it'll reduce those false positives (or 'true negatives', whichever way you want to look at a link that's reported bad but is actually fine).

[update 23 May 2019] I take it back: possibly the hardest thing was adding the sftp option (v9 will allow ftp / ftp with TLS (aka ftps) / sftp) for automatically ftp'ing the sitemap or checking for orphaned pages.

update 10 Jun: A signed and notarized build of 9.0.3 is available. Please feel free to run it if you're prepared to contact us with any problems you notice. Please wait a while if you're a production user, i.e. if you already use Scrutiny and rely on it.

update 19 Jun: v9 is now the main release, with 8.4.1 still available for download.

Thursday, 25 April 2019

Apple's notarization service

A big change that's been happening quietly in MacOS is Apple's Notarization service.

Ever since the App Store opened and was the only place to obtain software for the iPhone ('jailbreaking' excepted), I've been waiting for the sun to set on being able to download and install Mac apps from the web, which is the core of my business. (Mac App Store sales amount to a small proportion of my income. That balance is fine, because Apple take a whopping 1/3 of the selling price.)

Notarization is a step in that direction, although it still leaves developers free to distribute outside the App Store. It means that Apple examine the app for malware. At this point they can't reject your app for any reason other than the malware check. They do specify the 'hardened runtime', which is a tighter security constraint, but I've not found this to restrict functionality as the Sandboxing requirement did when the App Store opened.

When the notarization service started last year, it was optional. Now Gatekeeper gives a more favourable message when an app is notarized, and it looks as if 10.15's Gatekeeper will refuse to run apps that haven't been notarized.

It's easy to feel threatened by this and imagine a future where Apple are vetting everything in the same way they do for the App Store. For users that's a great thing: it guarantees them a certain standard of quality in any app they may be interested in. As a developer it feels like a constraint on my freedom to build and publish.

It genuinely seems geared towards reducing malware on the Mac. "This is a good thing," says John Martellaro in his column:

https://www.macobserver.com/columns-opinions/editorial/notarization-apple-greatly-reduce-malware-on-macs/?utm_source=macobserver&utm_medium=rss&utm_campaign=rss_everything