Friday 30 December 2016

Full release of Webscraper

WebScraper, our utility for crawling a site and extracting data or archiving content, is now out of beta.

There have been some serious enhancements over recent months, such as the ability to 'whitelist' (only crawl) pages containing a search term, the ability to extract multiple classes / ids (as separate fields in the output file), and a class/id helper which allows you to visually choose the divs or spans for extraction.

Now that the earlier beta is about to expire, it's time to make all of this a full release. The price is a mere 5 US dollars for a licence which doesn't expire. The trial period is 30 days, and the only limitation is that the output file has a limited number of rows, so you can still evaluate the output.

Find out more and download the app here, and if you try it and have any questions or requests, there's a support form here.

Monday 21 November 2016

Scrutiny for Mac Black Friday discount


50% off Scrutiny

For Black Friday


If you're a user of Integrity or Integrity Plus, or have trialled Scrutiny, we hope that this discount will help you to make the decision to add Scrutiny for Mac to your armoury. Expires 28-11-2016.

Simply use the code:  5E861DD0

Look for the 'add coupon' link during the purchase process, either in-app, or using this secure link:
https://pay.paddle.com/checkout/494001

The code will work for the next week or so. Please feel free to share the discount code, or use it to buy more licences if you have multiple users.

Monday 31 October 2016

Webscraper from PeacockMedia - usage

[updated 23 Apr 2018 for version 4]
[reviewed 29 Aug 2021]

I've had one or two questions about using WebScraper. There's a short demo video here, but if, like me, you prefer to cast your eye over some text and images rather than sit through a video, then here you go:

1. Type your website address (or starting url for the scan). As with Integrity / Scrutiny (WebScraper uses the same engine), the crawl will be limited to any 'directory' implied in the url.
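To illustrate the idea (a rough sketch only, with made-up urls - not the engine's actual code): a url is only crawled if it falls under the directory implied by the starting address.

    #import <Foundation/Foundation.h>

    int main(int argc, const char *argv[]) {
        @autoreleasepool {
            // hypothetical starting url and a candidate link found during the crawl
            NSString *startingURL = @"https://example.com/blog/";
            NSString *candidate   = @"https://example.com/shop/item.html";
            BOOL willCrawl = [candidate hasPrefix:startingURL];
            NSLog(@"crawl %@? %@", candidate, willCrawl ? @"yes" : @"no");   // no - outside /blog/
        }
        return 0;
    }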

2. Configure your output. If it's a single piece of information you want to extract from each page, you can use the Simple Setup. If you want to set up a number of columns, use the Complex Setup. Toggle between these two options below the address bar.



You must configure your output file before scanning; the app then crawls your site, collecting the data as it goes. This is more efficient than the way the first version of WebScraper worked, but it does mean that if you want to change the configuration of your output file, you'll need to re-scan.

If you choose 'Complex setup' you'll need to configure your output file here. When you add a column, you can choose basic metadata (title, description etc), a class or id, a regular expression (regex), or page content (as plain text, html, markdown or an outline).
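For the curious, extracting a class essentially boils down to an XPath query against the parsed page. Here's a minimal sketch (the class name and sample html are invented - this isn't WebScraper's own code):

    #import <Foundation/Foundation.h>

    int main(int argc, const char *argv[]) {
        @autoreleasepool {
            // made-up page source; real-world html usually needs tidying before parsing
            NSString *html = @"<html><body><div class=\"product-price\">$12.99</div></body></html>";
            NSError *error = nil;
            NSXMLDocument *doc = [[NSXMLDocument alloc] initWithXMLString:html
                                                                  options:0
                                                                    error:&error];
            NSArray *nodes = [doc nodesForXPath:@"//div[@class='product-price']" error:&error];
            NSLog(@"Extracted: %@", [[nodes firstObject] stringValue]);   // "$12.99"
        }
        return 0;
    }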



3. Test or run. You'll be able to either begin the scan, or run a short test. The 'Run test' button will perform a very short scan of a few pages and present your output as a quickview. If all looks well, you can press Go, or if you need to make changes, you can head back to the Output file configuration.

4. When the scan is complete, the Results tab will open. You can export this using the export button above the table. It uses the options you set in the 'Output file format' tab. 

Note that the Save Project option from the File Menu will only save your setup, not the scan data.



A common scenario is that the data you want isn't defined by a unique class or id. In these cases a regular expression can be used; there's a detailed tutorial here.
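As a quick illustration (the pattern and sample html are invented, not taken from the tutorial), a capture group does the work of pulling the value out of the surrounding text:

    #import <Foundation/Foundation.h>

    int main(int argc, const char *argv[]) {
        @autoreleasepool {
            NSString *html = @"<p>Price: $49.95 (free shipping)</p>";
            NSRegularExpression *regex =
                [NSRegularExpression regularExpressionWithPattern:@"Price:\\s*\\$([0-9.]+)"
                                                          options:0
                                                            error:NULL];
            NSTextCheckingResult *match =
                [regex firstMatchInString:html options:0 range:NSMakeRange(0, html.length)];
            if (match) {
                // the first capture group holds the value we want
                NSLog(@"Captured: %@", [html substringWithRange:[match rangeAtIndex:1]]);   // "49.95"
            }
        }
        return 0;
    }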

Sunday 30 October 2016

A sneak peek at some of the new features of Scrutiny v7

Scrutiny v7 is still very much in progress and being shaped, but here's a sneak preview, in case you'd like to feed back or make suggestions.

I demonstrate in the video:

 - document based (multiple windows open at once)
 - organise your websites into folders
 - simpler navigation
 - full autosave (view data for any site you've scanned previously)



[update] Version 7 is now well-established and has proved very popular. It's available here, and details about the very reasonable upgrade are also on the page.
http://peacockmedia.software/mac/scrutiny/


Friday 28 October 2016

Affiliate scheme :: Scrutiny, Integrity and other PeacockMedia apps

My licensing and payment partner is Paddle. We have an affiliate system, with affiliates earning from the sales they generate.
It's simple; you register as an affiliate, you'll have access to some general links and you'll be notified of any special offers. You post that link on a review / blog / article and earn money when people click through and purchase.

You get 20% commission on the sale, and a monthly payout by Paypal or bank transfer.

If you'd like to jump on board then here's the registration form.

Once you have your account set up with Paddle, this is where you'll find your links.

Friday 2 September 2016

Scrutiny v7 - closer!

The new version of Scrutiny for MacOS has now made it off the scraps of paper and into a working prototype (as far as the UI is concerned, which is where the major changes are).

The new features are:

  • Organise your sites into folders, with drag and drop to move them around (above)
  • Next and Previous buttons are gone; navigate by simply clicking what you want 
  • A new breadcrumb widget (top-left in the screenshots) allows you to navigate as well as giving a sense of location
  • The growing list of site-specific settings is organised into tabs; these had become disorganised and ugly, with at least two dialogs (advanced and schedules) accessed via buttons
  • Scrutiny becomes document-based, meaning you can have as many windows open as you like showing different sites (just cmd-N or File > New to open a new window) and make multiple simultaneous scans
  • This also means better handling of data, with windows remembering their state and their data (if autosave is switched on)
  • Improved flow - From the summary / settings screen choose to make a new scan, view existing data (if available) or load data. Only after the scan do you choose which results you want to view. 


There will be few changes to the link checking, SEO check, sitemap and other functionality.

If you would like to have a click around this prototype and feed back, please just ask.

Saturday 23 July 2016

Flic button support in LIFXstyle and Hue-topia

Although it seems slightly ironic to install smart bulbs and then buy a physical button to switch them on and off, a 'smart button' is something I'd been after for some time.

A LIFXstyle user asked me about support for Flic buttons, and they looked like just what I'd been looking for. And they've lived up to the promise. There's nothing for scale in the picture below; the buttons are pretty small, around an inch in diameter. They have a small battery within, which is said to have a life of 5 years and is then replaceable. They come with a fabulous sticky back and can be stuck onto a surface, pulled off and stuck to something else apparently ad infinitum (the sticky surface is washable). There's an optional clip which can be fitted if you want to wear the button.

(Let me say right out that I've had to buy my buttons like everyone else, and I'm not earning any kind of commission).

I should also point out that Flic buttons can work with Hue and LIFX bulbs without my software, but maybe you're like me and would rather not be connecting your buttons and your bulbs to cloud services.

They've turned out to be very easy to use and integrate into my apps. Support on a Mac involves running the Flic Service, which is currently in beta.

There's now a window within LIFXstyle and Hue-topia which allows you to add and configure your buttons.

The above window is called up using the menu item View > Flic Manager (cmd-4).
Note also the menu item LIFXstyle > Connect to Flic Service / Disconnect from Flic Service. Connection should be made automatically, but you can try toggling this if you have problems.
If your button isn't in the list, use the 'Scan' button and press your Flic button. It should appear in the Flic Manager list if it's discoverable. (Your computer may also ask for permission to pair if you've not already paired that button with the computer.) Stop scanning and check the box to tell LIFXstyle/Hue-topia you want to connect to it.
For each button in the list, choose presets (or 'All on / All off') for each action (single press, double-click, hold).
Edit your button's name. If you use the keywords white, black, green, blue or yellow in the name (eg 'My black button #1') then the 'ready' icon will appear in the right colour.

The Flic service requires MacOS 10.10 or higher. Without the Flic functionality, LIFXstyle should run on 10.9 or higher, and Hue-topia on 10.8 or above.


All of this is available in v2.0 beta of both LIFXstyle and Hue-topia. At present these aren't available for download on the site, but please contact me if you'd like to help test either.


Wednesday 8 June 2016

Retro button style in OSX cocoa


While developing a 'breadcrumb' or NSPathControl-type class, I created some buttons programmatically - initially very quick and dirty: [[NSButton alloc] initWithFrame:] and not bothering to set anything.

This may be the first time I've ever done that, because I wasn't prepared for what happened:
It takes me back to Amiga Workbench, and some interesting times when I had to use Windows at various day jobs.

It makes sense; I haven't set any button type or border style. But I would have thought I'd get the basic push button (the ones you get for OK and Cancel in alerts etc).
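For anyone wondering, setting the button type and bezel style explicitly brings back the expected push button. A minimal sketch (the view controller and frame are made up for illustration):

    #import <Cocoa/Cocoa.h>

    @interface DemoViewController : NSViewController
    @end

    @implementation DemoViewController
    - (void)viewDidLoad {
        [super viewDidLoad];
        NSButton *button = [[NSButton alloc] initWithFrame:NSMakeRect(20, 20, 96, 32)];
        [button setButtonType:NSMomentaryPushInButton];   // pre-10.12 constant names
        [button setBezelStyle:NSRoundedBezelStyle];       // the OK/Cancel style seen in alerts
        [button setTitle:@"OK"];
        [self.view addSubview:button];
    }
    @end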

This isn't a complaint at all, I'm just bemused to see OSX drawing these retro-looking buttons.

[Update]  it's not looking bad...

Tuesday 7 June 2016

Scrutiny v7

Scrutiny v7 is in the 'thinking' stage.



Current plans involve a more streamlined UI, organising sites into folders, a combined 'summary / tasks' screen following the site selection (which offers the settings as a choice rather than presenting them every time), and a breadcrumb widget rather than the next / previous system.

If you're a v5 or v6 user with any thoughts on this, I'd love your input - please get in touch.

Saturday 30 April 2016

New Project - WebScraper for OSX

[WebScraper application icon - an earth scraper]

I love starting new things. This project uses the Integrity V6 Engine for the crawling, which means that I could get right on and build the output functionality.

I've noticed that people have been trying to use Scrutiny's search functionality to achieve something similar. Scrutiny will report which pages contain (or don't contain) your term in the text or in the entire code, and you can export the results to csv and choose columns.

But Scrutiny (currently) can't extract data from particular css classes or ids.

This is where WebScraper comes in. It quickly scans a website, and can output the data (currently) as csv or json. (Anyone want xml?) The output can include various metadata (more choices to be added), the entire content of each page (as text, html or markdown), and can extract parts of the pages (currently a named class or id of divs or spans).
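To give a flavour of the json output (the field names and values here are invented, not WebScraper's fixed schema), each scraped page becomes one record:

    #import <Foundation/Foundation.h>

    int main(int argc, const char *argv[]) {
        @autoreleasepool {
            // one scraped page expressed as a record; field names are made up
            NSArray *rows = @[ @{ @"url":   @"https://example.com/page.html",
                                  @"title": @"Example page",
                                  @"price": @"12.99" } ];
            NSData *json = [NSJSONSerialization dataWithJSONObject:rows
                                                           options:NSJSONWritingPrettyPrinted
                                                             error:NULL];
            NSLog(@"%@", [[NSString alloc] initWithData:json encoding:NSUTF8StringEncoding]);
        }
        return 0;
    }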

Webscraper is new and in beta. Please use it for free and please get in touch with any requests, bug reports or observations.

http://peacockmedia.software/mac/webscraper/




There's a short demo video here

Tuesday 26 April 2016

OSX uses 1000 bytes = 1kb

I've just noticed that OSX is reporting its file sizes using 1000 bytes = 1kb (incorrectly IMHO; 1kb = 1024 bytes, which is a nice round number in binary and hex).


Apparently this has been the case since Snow Leopard; I'd never noticed.
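You can see the two conventions side by side with NSByteCountFormatter - a small sketch:

    #import <Foundation/Foundation.h>

    int main(int argc, const char *argv[]) {
        @autoreleasepool {
            long long bytes = 1048576;   // 1024 * 1024
            NSByteCountFormatter *fmt = [[NSByteCountFormatter alloc] init];

            fmt.countStyle = NSByteCountFormatterCountStyleFile;     // decimal, as the Finder reports sizes
            NSLog(@"%@", [fmt stringFromByteCount:bytes]);           // "1.05 MB"

            fmt.countStyle = NSByteCountFormatterCountStyleBinary;   // 1 KB = 1024 bytes
            NSLog(@"%@", [fmt stringFromByteCount:bytes]);           // "1 MB"
        }
        return 0;
    }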

Thursday 21 April 2016

Reselling opportunity - Scrutiny for Mac



Scrutiny is a suite of webmaster tools for Mac.

It extends the link checkers Integrity and Integrity Plus, adding SEO / scraping functionality, spelling and grammar checking (in a choice of languages), sitemap visualisation, page speed analysis, site sucking, scanning sites that require authentication, and much more.

Scrutiny retails for a one-off $95, and I believe it is a serious competitor for other tools that are more expensive or have ongoing charges.

It's a native desktop app (ie not Java, and not online). You install the app on your Mac and enter the licence key. You own it and it will always work. (There may be fees for upgrades, but I've rarely charged for one and it would only be for a major new version.) We offer good support, and that's free.

If you're interested in re-selling Scrutiny (you offer the product at a discounted price, and keep a generous share of the sale price) then please contact me.





Tuesday 12 April 2016

Easy clipboard sharing added to ClipAssist


ClipAssist has had a bit of a makeover; it's now easier to manage those standard clips of text and organise them into folders. But it has a much more exciting new feature.

If, like me, you run more than one Mac, then you very quickly feel the need to access a snippet of text on a different computer.

There are ways to achieve this, such as creating a text file and saving it to the other computer. As straightforward as that is, it's overkill if all you have is an email address, so you end up retyping it on the other computer and risking a typo.

Copy and paste (cmd-C and cmd-V) are so fundamental to computing. We do it without thinking.

So why shouldn't it be simple to just copy on one Mac and paste on another? In fact, in the past I've attempted to do exactly that without realising I'd switched computers.

I've found a way to share your 'copy' with other Macs on the local network, ready to just 'paste' at the other keyboard. Without any messy connecting / logging in.

Simply run ClipAssist v4 or higher on all Macs, set the send/receive preferences as appropriate and you're ready to go.
This applies to anything you copy to the general clipboard, not just within ClipAssist.
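For the curious, the core idea can be sketched like this (a simplified illustration, not ClipAssist's actual code; sending the text on to the other Macs over the network is left out):

    #import <Cocoa/Cocoa.h>

    // Watch the general pasteboard's changeCount to notice a new 'copy',
    // then read the copied string so it can be shared with the other Macs.
    @interface ClipWatcher : NSObject
    @property (assign) NSInteger lastChangeCount;
    @end

    @implementation ClipWatcher

    - (void)start {
        self.lastChangeCount = [NSPasteboard generalPasteboard].changeCount;
        [NSTimer scheduledTimerWithTimeInterval:0.5
                                         target:self
                                       selector:@selector(checkPasteboard:)
                                       userInfo:nil
                                        repeats:YES];
    }

    - (void)checkPasteboard:(NSTimer *)timer {
        NSPasteboard *pb = [NSPasteboard generalPasteboard];
        if (pb.changeCount != self.lastChangeCount) {
            self.lastChangeCount = pb.changeCount;
            NSString *copied = [pb stringForType:NSPasteboardTypeString];
            NSLog(@"New clipboard contents: %@", copied);   // would be broadcast to the other Macs
        }
    }

    @end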

10.6 and upwards are supported, in line with our policy of supporting other fans of the beautiful Snow Leopard.

This is all new. You're welcome to download v4 and try it (please let me know how you get on), with the caveat that it's still in beta.

http://peacockmedia.software/mac/clipassist/




Sunday 3 April 2016

New theme in Screensleeves



Welcome to the new 'Ambience' theme.

It goes against the idea of a screensaver a little to have a static coloured background. (I've been toying with the idea of adding a little 'Ken Burns' to the background, but this would have little benefit if the album cover is fairly homogeneous in colour.)

But there is a fade to black and a repositioning of the album cover and details every 30 seconds, which existing users will be familiar with. And if you're listening to individual tracks rather than complete albums or audiobooks, there's no problem, as the coloured background will change significantly every few minutes. Plus, as now, when the music stops you either get a black screen or random artwork (whichever you choose), so no problem there.

Above all, it just looks so cool. I'm so excited about it and love using it.

[Update] This theme is now available in Screensleeves Pro v5.1. You can download and use the Pro version without paying; certain Pro features are simply disabled until you obtain a key.

Sunday 27 March 2016

Very cool new feature in ScreenSleeves

I have a Mac which is pretty much dedicated to playing my music; I regularly use iTunes (actually now more often Swinsian), Spotify and occasionally Radium for a particular internet radio station I like.



However, that mac doesn't always have a screen attached (I often use screen sharing to control the music). In any case, it's not the screen I like to have running the Screensleeves screensaver.

This problem has been on my mind for a long time. What's required is an elegant way of sharing what's playing on another computer. Without any messy setting up or messing around with passwords every time a computer is restarted or just decides it wants to be awkward.

And here it is. It's called Screensleeves Broadcaster. It's included in the Screensleeves Pro dmg, and you just need to pop it in the Applications folder of the computer that plays your music, start it (you may like to add it to your login items so it's just running whenever your computer is on).

Then install version 5 of the Screensleeves screensaver (obviously on the computer that you want to be displaying the saver). Go into its options and switch on "Listen for the Screensleeves Broadcaster" (under the Pro options). And that's it.

Version 5 of Screensleeves has various fixes / improvements, support for the Broadcaster, and it also restores support for 10.6 (Snow Leopard) which has been broken in recent versions.

Update: v5.0.1 Pro is now released, and the Broadcaster is officially released too.


Thursday 10 March 2016

A 500 server error in Scrutiny / Integrity for a page which is apparently fine

We recently looked at a support call where a large number of 500 errors were being reported - that's lots of red in Integrity or Scrutiny - for a site that appeared fine in the browser.

It caused some head-scratching here, and lots of experimentation to find the difference between the request the browser was sending and the one Integrity was sending (cookies? javascript? request header fields?).

It was while investigating the request being sent by the browser that we noticed that, although the page appeared as expected, the server was in fact returning a 500 code there too (this is from Safari's web console):


It's a little odd that the requested page follows hot on the heels of this 500 response code. I don't know the reason for all of this, but if my user finds out and passes it on, I'll update this post.

The moral of the story is... don't take it for granted that nothing's wrong just because a page appears as expected in the browser. (NB Google will also receive a 500 code when requesting this page.) Another good reason for using a scanning tool.
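If you want to check the real status code for a page yourself, a few lines will do it (example.com stands in for the page in question; this is just a sketch):

    #import <Foundation/Foundation.h>

    int main(int argc, const char *argv[]) {
        @autoreleasepool {
            NSMutableURLRequest *request =
                [NSMutableURLRequest requestWithURL:[NSURL URLWithString:@"https://example.com/"]];
            request.HTTPMethod = @"HEAD";   // we only care about the status code, not the body

            dispatch_semaphore_t done = dispatch_semaphore_create(0);
            [[[NSURLSession sharedSession] dataTaskWithRequest:request
                    completionHandler:^(NSData *data, NSURLResponse *response, NSError *error) {
                // can be 500 even when the browser shows a perfectly normal page
                NSLog(@"Status: %ld", (long)[(NSHTTPURLResponse *)response statusCode]);
                dispatch_semaphore_signal(done);
            }] resume];
            dispatch_semaphore_wait(done, DISPATCH_TIME_FOREVER);
        }
        return 0;
    }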

Wednesday 2 March 2016

testing linked files - css, javascript, favicons

This feature has been a very long time coming. Website link tester Integrity dates back to 2007; Integrity Plus and Scrutiny build on it, using the same engine.

But none of these crawling apps has ever found and checked linked external files such as style sheets, favicons and javascript. (This isn't entirely true - Scrutiny's 'page analysis' feature, which tests the responsiveness of all elements of a page, does include these linked files.)


So this is a well-overdue feature and now it's built into our v6 engine and can be rolled out into Integrity, Integrity Plus, Scrutiny and other apps which use the same engine.

As you can see in the top screenshot, the new checkbox sits nicely beside the 'broken images' switch (which has existed for a very long time). The option can be set per-site (except in Integrity, which doesn't handle multiple sites / settings).

With that option checked, linked files should be listed with your link results (obviously there's no link text; that's given as '[linked file]').
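Roughly speaking, finding those linked files means collecting the href or src of link and script tags from the page source. A simplified sketch (not the v6 engine's code; the sample html is made up):

    #import <Foundation/Foundation.h>

    int main(int argc, const char *argv[]) {
        @autoreleasepool {
            NSString *html = @"<link rel=\"stylesheet\" href=\"/css/main.css\">"
                              "<script src=\"/js/app.js\"></script>"
                              "<link rel=\"icon\" href=\"/favicon.ico\">";
            NSRegularExpression *re =
                [NSRegularExpression regularExpressionWithPattern:@"<(?:link|script)[^>]+(?:href|src)=[\"']([^\"']+)[\"']"
                                                          options:NSRegularExpressionCaseInsensitive
                                                            error:NULL];
            [re enumerateMatchesInString:html
                                 options:0
                                   range:NSMakeRange(0, html.length)
                              usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
                NSLog(@"linked file: %@", [html substringWithRange:[match rangeAtIndex:1]]);
            }];
        }
        return 0;
    }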


This feature is in beta.

[update: the beta version of all three apps containing this new feature is available for download on the app's home page]

Saturday 27 February 2016

Finding http links on an https website - Part 2

Since writing this post, and given that secure (https) websites are becoming more popular, Scrutiny can now specifically look out for links to the http version of your site, alert you to the fact if there are any, and offer full details.
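The check itself is conceptually simple - a sketch with made-up links (not Scrutiny's own code):

    #import <Foundation/Foundation.h>

    int main(int argc, const char *argv[]) {
        @autoreleasepool {
            // hypothetical hrefs collected during a crawl of an https site
            NSArray<NSString *> *links = @[ @"https://example.com/about",
                                            @"http://example.com/contact" ];
            for (NSString *link in links) {
                if ([[NSURL URLWithString:link].scheme isEqualToString:@"http"]) {
                    NSLog(@"http link found on an https site: %@", link);
                }
            }
        }
        return 0;
    }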

This new behaviour is all switched on by default, but there are two new checkboxes in Preferences.

Taken separately, that first checkbox is quite important because if you're scanning an https website, you probably don't want http versions of your pages included in your xml sitemap.

All of this is in Scrutiny v6.4 which will be released as beta soon. If you're interested in checking it out, please just ask.

Wednesday 24 February 2016

Finding http links on an https website

A couple of Scrutiny support calls have recently been along the lines of: "Why is your tool reporting a number of http links on my site? All internal links are https://  Is this a bug?"

In both cases, an internal link with the http scheme did exist on the site. Scrutiny treats this link as internal (as long as it has the same domain), follows it, and then all relative links on that page will of course have the http scheme as well.

[Update - since writing this post, new functionality has been added to Scrutiny - read about that here]

I'm thinking about three things:

1. The 'Locate' function is ideal for tracing the rogue link that shunts Scrutiny (and a real user of course) over to the http site. In the shot below we can see where that happened (ringed) and so it's easy to see the offending link url, the link text and the page it appears on. Does this useful feature need to be easier to find?



2. Does a user expect that when they start at an https:// url, an http:// link would be considered internal (and followed) or external (and not followed)? Should this be a preference? (Possibly not needed, as it's simple to add a rule that says 'do not check urls containing http://www.mysite.com'.)

3. Should Scrutiny alert the user if they start at an https:// url and an http:// version is found while scanning? After all, this is at the heart of the problem described above; the users assumed that all links were https:// and it wasn't obvious why they had a number of http:// links in their results.

Any thoughts welcome; email me or use the comments below.

Tuesday 23 February 2016

Important new feature for those attempting to crawl a website with authentication

Scanning a website as an authenticated user is a common reason for people turning to Scrutiny.

The process necessarily involves some trial and error to get things set up properly, because different websites use different methods of authorisation and sometimes have unusual security features.

Scrutiny now has an important new feature. Some login forms use a 'security token'. I'm not going to go into details (I wouldn't want to deprive my competitors of the exasperations that I've just been through!)



There's a simple checkbox to switch this feature on (available since Scrutiny v6.4), and this may enable Scrutiny to crawl websites that have been uncooperative so far. (This may well apply to websites that have been built using Expression Engine).
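Without giving too much away, the general idea with a security token is to read it from the login form and send it back with the credentials. A rough sketch (the field name 'csrf_token' and the html are invented; real forms vary):

    #import <Foundation/Foundation.h>

    int main(int argc, const char *argv[]) {
        @autoreleasepool {
            NSString *loginPageHTML =
                @"<form><input type=\"hidden\" name=\"csrf_token\" value=\"a1b2c3d4\">...</form>";
            NSRegularExpression *re =
                [NSRegularExpression regularExpressionWithPattern:@"name=\"csrf_token\"[^>]*value=\"([^\"]+)\""
                                                          options:0
                                                            error:NULL];
            NSTextCheckingResult *m = [re firstMatchInString:loginPageHTML
                                                     options:0
                                                       range:NSMakeRange(0, loginPageHTML.length)];
            NSString *token = m ? [loginPageHTML substringWithRange:[m rangeAtIndex:1]] : nil;
            // the token would then be added to the POST body along with the credentials,
            // e.g. username=...&password=...&csrf_token=a1b2c3d4
            NSLog(@"token: %@", token);
        }
        return 0;
    }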

All the information I have about setting up Scrutiny to scan a site (or member-only pages etc) which requires authentication is here.

Version 6.4 is in beta as I write this; if you're interested in trying it, please just ask.

Small print: Note that some care and precautions (and a good backup) are required because scanning a website as an authenticated user can affect your website. Yes, really! Use the credentials of a user with read access, not author, editor or administrator.

Sunday 14 February 2016

Scrutinize your website - Scrutiny for Mac 50% offer for the rest of Feb

As flagged up in a recent mailing to those on the email list, a 50% offer will be running on Scrutiny for Mac for the rest of the month. The app is used by larger organisations and individuals alike, and it seems fair to give smaller businesses the opportunity to buy at a more affordable price.

Recent enhancements include 'live view' (shown below) and improved 'site sucking' page archiving while scanning.

So here it is - for a 50% discount for the rest of February, please use this coupon.

693CFA08

This isn't a key; click 'buy' when the licensing window appears, look for the link that says 'check out with coupon', and use the code above for a 50% discount.

Tuesday 9 February 2016

Improved archiving functionality in Scrutiny

I hadn't appreciated what a complex job sitesucker-type applications do.

In the very early days of the web (when you paid for the time connected) I'd use SiteSucker to download an entire website and then go offline to browse it.

But there are still reasons why you might want to archive a site: for backup, or as evidence of a site's contents at a particular time, to give two examples.



Integrity and Scrutiny have always had the option to 'archive pages while crawling'. That's very easy to do - they're pulling in the source for each page in order to scan it, so why not just save that file to a directory as they go?

Although the file then exists as a record of that page, viewing it in a browser often isn't successful; links to stylesheets and images may be relative, and if you click a link it'll either be relative and not work at all, or absolute and whisk you off to the live site.

Processing that file and fixing all these issues, plus reproducing the site's directory structure, is no mean feat, but Scrutiny now offers it as an option. As from Scrutiny 6.3, the option to process (convert) archived pages is in the Save dialogue that appears when the scan finishes, along with another option (a requested enhancement) to always save in the same place without showing the dialogue each time. These options are also available via an 'options' button beside the 'Archive' checkbox.
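Part of that processing is resolving each relative link against the page it was found on - for example (urls made up):

    #import <Foundation/Foundation.h>

    int main(int argc, const char *argv[]) {
        @autoreleasepool {
            // hypothetical page url and a relative href found within it
            NSURL *pageURL  = [NSURL URLWithString:@"https://example.com/blog/post.html"];
            NSURL *resolved = [NSURL URLWithString:@"../css/style.css" relativeToURL:pageURL];
            NSLog(@"%@", resolved.absoluteString);   // https://example.com/css/style.css
        }
        return 0;
    }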


When some further enhancements are made, it'll be available in Integrity and Integrity Plus too.

Tuesday 26 January 2016

Integrity and Scrutiny displaying a 200 code for a url that doesn't exist

This problem is specific to the user (ie someone else somewhere else may correctly get an error reported for the same url). When pasted into the browser (or visited from within Integrity or Scrutiny) a page is shown, branded with the internet provider's logo, with a search box and maybe some advertising. 

What's happening? 

The user's internet service provider is recognising that the server being requested doesn't exist, and is 'helpfully' displaying something it considers more useful. My own provider says (quote from their website) "this service is provided free and is designed to enhance the surfing experience by reducing the frustration caused by error pages".

(Note the advertising - your provider is making money out of this.)

The content of the page they provide is neither helpful nor unhelpful, but the 200 code they return with the page is decidedly unhelpful when we're trying to crawl a website and find problems. A web crawler like Integrity or Scrutiny can only know that there's a problem with a link by the server response code.

Personally I think this practice is wrong. If you request a url where the server doesn't exist, it's incorrect to be shown a page with a 200 code.

This is similar to a soft 404, because a 200 is being returned for a request for a page that doesn't exist. I'm tempted to call this a 'soft 5xx', because 5xx codes are server errors, although in this case, if there is no server, then we can't have a server response code.

What can we do?

I now know of two providers that offer to switch this service off. Do some digging; your provider may have a web page that allows you to switch this preference yourself. If not, contact them and ask them to switch it off. Integrity / Scrutiny will then behave as expected.

If that fails, then you can use Integrity / Scrutiny's 'soft 404' feature (Preferences > Links). Find some unique text on the error page (maybe in the page title) and type part or all of that text into this box:


The problem urls will then be reported with a 'soft 404' status, which is more useful than the misleading 200.
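Conceptually, the check is simple - a sketch only (not Scrutiny's code), using text from my own provider's page as the marker:

    #import <Foundation/Foundation.h>

    // Treat a 200 response as a 'soft 404' when the body contains the text
    // the user supplied from the provider's substitute page.
    static BOOL IsSoft404(NSInteger statusCode, NSString *body, NSString *marker) {
        if (statusCode != 200) return NO;
        return [body rangeOfString:marker options:NSCaseInsensitiveSearch].location != NSNotFound;
    }

    int main(int argc, const char *argv[]) {
        @autoreleasepool {
            // the marker text is whatever you typed into the 'soft 404' box
            NSString *body = @"<title>Search suggestions</title> ... enhance the surfing experience ...";
            NSLog(@"soft 404? %d", IsSoft404(200, body, @"enhance the surfing experience"));   // 1
        }
        return 0;
    }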

Saturday 23 January 2016

What happened to 'recheck bad links' in Integrity / Scrutiny?

Lots of people have missed this feature. 

Unfortunately it was always problematic. Besides bugs and problems with the actual implementation, there were some logical problems too. One example: a user 'fixes' an outdated link by removing it from the page, but Scrutiny would simply re-test the url it holds and continue to report it as a bad link. The fix for that is convoluted, and given that the user may have altered anything on the site since the scan, it's slightly flawed in principle to simply re-check the urls that Scrutiny is holding.

There's often a more specific reason for wanting the feature rather than a broad-brush 'recheck all bad links'. For example, the server may have given a few 50x's at a particular point and the user just wants to re-check those. Or the user has fixed a specific link, or links on a specific page.

After working with a user on this, we found a solution that answered the above requirements, while being more sensible in principle. 

From Scrutiny 6.2.1 (and soon in Integrity 6 too) the Links views (by link, by page and by status) allow multiple selection. Context menus then give access to 'Mark as fixed' and 'Re-check selected'.
It is possible to select all urls, or all bad ones, all urls on a particular page, or all urls with a specific status. It is still possible to re-check a link which may no longer exist on a page, but if the user selects that url and chooses the re-check option, that's a deliberate (if illogical) action on their part.

6.2.1 is currently in beta; the link below will download it and give 30 days' use.

Wednesday 20 January 2016

Live view in Scrutiny


I'm sure this will please a lot of people.

Scrutiny 5 included a new UI (in reviews, the interface of Scrutiny 4 and earlier was responsible for lost marks, and the new previous / next system was well received). But not being able to see the results as they happen has been a running theme in support calls ever since.

There are many reasons for the lonely progress bar in v5. Not least is that constantly refreshing tableviews / outlineviews eats up CPU and resources; with a very large site, the engine goes faster and further if a tableview is not visible. (Since the v6 engine, there are Scrutiny users crawling millions of links in one sitting. Efficiency is important!)
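One common way to keep a live table cheap is to refresh it on a timer rather than for every link checked. A minimal sketch (class and property names are hypothetical, not Scrutiny's own code):

    #import <Cocoa/Cocoa.h>

    @interface LiveViewController : NSObject
    @property (strong) NSTimer *refreshTimer;
    @property (weak) IBOutlet NSTableView *liveTableView;
    @end

    @implementation LiveViewController

    - (void)startLiveView {
        self.refreshTimer = [NSTimer scheduledTimerWithTimeInterval:1.0
                                                             target:self
                                                           selector:@selector(refreshLiveView:)
                                                           userInfo:nil
                                                            repeats:YES];
    }

    - (void)refreshLiveView:(NSTimer *)timer {
        [self.liveTableView reloadData];   // one redraw per second, not one per link checked
    }

    - (void)stopLiveView {
        [self.refreshTimer invalidate];
        self.refreshTimer = nil;
    }

    @end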

As practical as it is in those ways, one major downside of the bare progress bar is that if your scan goes a little pear-shaped (maybe because of timeouts, or because some settings need tweaking) you don't know until the scan finishes, or until you realise that it's been going on far longer than it should. The workaround for this was a menu option, 'View > Partial results', which you'll find in more recent versions of 5 and in 6 (up to 6.1.5). (You need to pause the scan before this option can be used.)

But there has still been demand to see what's happening. Maybe so that you can spot any problems as they're happening, maybe so that you can begin to visually scan the results while the scan is taking place, or maybe because it's just fun to watch the numbers change in front of your eyes!

So in 6.2 there's 'live view'. Alongside the progress bar is a button which unfolds a table (see screenshot above). There is a warning (once per session, hopefully not too annoying) that it's not recommended for larger sites. It's not possible to open up a link inspector, or expand one of the rows to see further details, but there's enough there to give that satisfying visual feedback and spot any problems as they happen.

[Update] Scrutiny 6.2 is now out of beta; please download the current version from Scrutiny's home page.

(New users - use it in demo mode for up to 30 days.)

Please let me know of any problems you might spot.