Thursday 31 May 2018

Options for archiving a website

Integrity / Scrutiny

Integrity (and Integrity Plus, Pro and Scrutiny) has long had an 'archive' option. It can save the HTML as it scans, originally with no frills at all. Recently I+, Pro and Scrutiny have received enhancements which mean that they can process that information a little to create a browsable archive.

It stops short of being a full SiteSucker: it doesn't save images or download the style sheets, for example. (It does make sure that all links and references are absolute, so that the site still appears as it should.) It was always intended as a snapshot of the site, collected automatically as you link-check, for the purposes of reference or evidence.
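To picture what "making links absolute" involves, here is a minimal Python sketch. It is not Integrity's actual implementation; the function name `absolutize_links` is hypothetical, and it assumes quoted `href`/`src` attributes and uses a simple regular expression rather than a full HTML parser.

```python
import re
from urllib.parse import urljoin

# Simplified pattern: matches href="..." or src='...' (quoted values only).
ATTR_RE = re.compile(r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def absolutize_links(html, page_url):
    """Hypothetical helper: rewrite href/src values in html as absolute URLs."""
    def repl(m):
        prefix, quote, value = m.group(1), m.group(2), m.group(3)
        # urljoin resolves relative paths against page_url and leaves
        # already-absolute URLs (and schemes such as mailto:) unchanged.
        return f"{prefix}{quote}{urljoin(page_url, value)}{quote}"
    return ATTR_RE.sub(repl, html)


page = "https://example.com/blog/post.html"
html = '<a href="/about">About</a> <img src="img/logo.png">'
print(absolutize_links(html, page))
```

Because every reference now points back at the live site, the saved page can still pull in its images and style sheets without those files being downloaded. A real archiver would also need to handle things this sketch ignores, such as `srcset` and `url()` references inside CSS.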

WebScraper

WebScraper for Mac has loads of options, so it's not just a matter of entering a homepage URL and pressing Go, as with the other apps mentioned here. In return, it lets you do much more. You have far greater control over what information goes into your output file, what format it's in, and whether the content is converted to plain text, Markdown or HTML.

HTMLtoMD 

HTMLtoMD was a side project built using functionality we'd developed for other apps. It scans a whole site and archives the content as Markdown. Once it was working, we released it for free and put it on the back burner.

Recently it has received more development. It's now up to date with the Integrity v8 engine, and its Markdown conversion has been improved thanks to work done on WebScraper. It can now save images and has more options for saving the information.

Again, it's not a SiteSucker. If you need to download a website for saving or browsing offline, use SiteSucker ($4.99); there's no point in us trying to reinvent that wheel.

But Markdown has its advantages. It's a much more efficient way to store your content: it's just text with a little markup (headings, lists, etc.). That also makes it very portable.

You may also find it a very readable format. See the shots below.
