This is a big subject, and it gets bigger and more complicated as websites become increasingly clever at preventing non-human visitors from logging in.
My post 'How to use Scrutiny to test a website which requires authentication' has been updated a number of times in its history, and I've just updated it again to include a relatively recent Scrutiny feature. It's a simple trick involving a browser window within Scrutiny which allows you to log into your site. If the site sets a cookie to track your logged-in session, that cookie is then retained for Scrutiny's scan.
It used to be possible simply to log in using Safari, because Safari's cookies seemed to be system-wide; since Yosemite, however, a browser's cookies seem to be specific to that browser.
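Under the hood, the trick relies on ordinary cookie handling: whatever cookie the site sets when you log in is stored and sent back with every later request in the crawl. Here's a minimal Python sketch of that idea using the standard library's `http.cookiejar` and `urllib.request`. It's an illustration only, not how Scrutiny or WebScraper is implemented; the tiny local server, its `/login` and `/members` paths and the `session` cookie are all hypothetical stand-ins for a real site.

```python
import http.cookiejar
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoSite(BaseHTTPRequestHandler):
    """Hypothetical site: /login sets a session cookie, /members requires it."""

    def _reply(self, status, body):
        self.send_response(status)
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/login":
            self.send_response(200)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            self.end_headers()
            self.wfile.write(b"logged in")
        elif self.path == "/members":
            if "session=abc123" in self.headers.get("Cookie", ""):
                self._reply(200, b"members-only content")
            else:
                self._reply(403, b"please log in")
        else:
            self._reply(404, b"not found")

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(opener, url):
    """Return (status, body) for url, without raising on 4xx responses."""
    try:
        with opener.open(url) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode()


# Serve the demo site on a free local port in a background thread.
server = HTTPServer(("127.0.0.1", 0), DemoSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# One cookie jar shared by every request, like a crawler's session.
# ProxyHandler({}) stops any proxy environment variables interfering locally.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),
    urllib.request.HTTPCookieProcessor(jar),
)

before = fetch(opener, base + "/members")  # no cookie yet, so refused
fetch(opener, base + "/login")             # "log in" once; cookie goes into the jar
after = fetch(opener, base + "/members")   # the jar sends the cookie automatically

server.shutdown()
server.server_close()
print(before)  # (403, 'please log in')
print(after)   # (200, 'members-only content')
```

Real login forms often involve redirects, hidden form tokens or JavaScript as well, which a plain script like this doesn't attempt to handle.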
The reason for this all being on my mind today is that I've just worked the same technique into WebScraper. I wanted to compile a list of some website addresses from pages within a social networking site which is only visible to authenticated users.
WebScraper doesn't have the full authentication features of Scrutiny, but I think this method will work with the majority of websites that require authentication.
(This feature, and others, are in WebScraper 1.3, which will be available very shortly.)
Tuesday, 3 January 2017
Sunday, 1 January 2017
17% off Integrity Plus
We'd like to wish you a happy and prosperous New Year.
Of course, that means having the best tools, and if you're a user of website link checker Integrity, or have trialled Integrity Plus, you'll enjoy the extra features of Integrity Plus for Mac. As well as the fast and accurate link check, you can filter and search your results, manage settings for multiple sites and generate an XML sitemap.
So we're offering a 17% discount to kick off 2017 (see what we did there?). The offer expires on 14 Jan 2017.
There's no coupon; simply buy from within the app or use this secure link:
https://pay.paddle.com/checkout/496583
Friday, 30 December 2016
Full release of Webscraper
WebScraper, our utility for crawling a site and extracting data or archiving content, is now out of beta.
There have been some serious enhancements over recent months, such as the ability to 'whitelist' (only crawl) pages containing a search term, the ability to extract multiple classes / ids (as separate fields in the output file) and a class/id helper which allows you to visually choose the divs or spans for extraction.
Now that the earlier beta is about to expire, it's time to make all of this a full release. The price is a mere 5 US dollars for a licence which doesn't expire. The trial period is 30 days, and the only limitation is that the output file is limited to a set number of rows, so you can still evaluate its output.
Find out more and download the app here, and if you try it and have any questions or requests, there's a support form here.
Monday, 21 November 2016
Scrutiny for Mac Black Friday discount
50% off Scrutiny
For Black Friday
If you're a user of Integrity or Integrity Plus, or have trialled Scrutiny, we hope that this discount will help you to make the decision to add Scrutiny for Mac to your armoury. The offer expires on 28 November 2016.
Simply use the code: 5E861DD0
Look for the 'add coupon' link during the purchase process, either in-app, or using this secure link:
https://pay.paddle.com/checkout/494001
The code will work for the next week or so. Please feel free to share the discount code, or use it to buy more licences if you have multiple users.
Monday, 31 October 2016
Webscraper from PeacockMedia - usage
[updated 23 Apr 2018 for version 4]
[reviewed 29 Aug 2021]
I've had one or two questions about using WebScraper. There's a short demo video here but if, like me, you prefer to cast your eye over some text and images rather than sit through a video, then here you go:
1. Type your website address (or the starting URL for the scan). As with Integrity and Scrutiny (WebScraper uses the same engine), the crawl will be limited to any 'directory' implied in the URL.
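To make the 'directory' rule concrete, here's a small Python sketch of one way to test whether a discovered link falls inside the scope implied by the starting URL. It assumes scope means the same scheme, the same host, and a path beginning with everything up to the last '/' of the starting path. The `start_directory` and `in_scope` helpers are hypothetical, and the apps' real rules (for example around subdomains or http vs https) may differ.

```python
from urllib.parse import urlsplit


def start_directory(start_url):
    """Return (scheme, host, directory) implied by the starting URL.

    Everything up to and including the last '/' of the path counts as the
    directory, so 'https://example.com/blog/index.html' gives '/blog/'.
    """
    parts = urlsplit(start_url)
    path = parts.path or "/"
    directory = path[: path.rfind("/") + 1]
    return parts.scheme, parts.netloc.lower(), directory


def in_scope(start_url, candidate_url):
    """True if candidate_url is on the same host and under the start directory."""
    scheme, host, directory = start_directory(start_url)
    cand = urlsplit(candidate_url)
    return (
        cand.scheme == scheme
        and cand.netloc.lower() == host
        and (cand.path or "/").startswith(directory)
    )


start = "https://example.com/blog/index.html"
print(in_scope(start, "https://example.com/blog/2016/post.html"))  # True
print(in_scope(start, "https://example.com/shop/"))                # False
print(in_scope(start, "https://other.example.org/blog/"))          # False
```

So starting at a page inside /blog/ keeps the crawl within /blog/, while starting at the site root puts the whole host in scope.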
2. Configure your output. If it's a single piece of information you want to extract from each page, you can use the Simple Setup. If you want to set up a number of columns, use the Complex Setup. Toggle between these two options below the address bar.
You must configure your output file before scanning, and then the app crawls your site, collecting the data as it goes. This is more efficient than the way the first version of WebScraper worked, but it does mean that if you want to change the configuration of your output file, you'll need to re-scan.
If you choose 'Complex setup' you'll need to configure your output file here. When you add a column you can choose basic metadata (title, description, etc.), a class or id, a regular expression (regex) or content (as plain text, html, markdown or an outline).
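As a rough illustration of what a class or id column does, this Python sketch uses the standard library's `html.parser` to collect the text of every element with a given class or id, one list per column. It assumes reasonably well-formed markup (every non-void element closed). The `ClassIdExtractor` and `extract` names, the sample page and the column names are hypothetical; this isn't WebScraper's own extraction code.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassIdExtractor(HTMLParser):
    """Collect the text of every element matching a class name or an id."""

    def __init__(self, class_name=None, element_id=None):
        super().__init__()
        self.class_name = class_name
        self.element_id = element_id
        self.depth = 0     # > 0 while inside a matching element
        self.current = []  # text fragments of the element being captured
        self.matches = []  # one cleaned-up string per matching element

    def _is_match(self, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.class_name is not None and self.class_name in classes:
            return True
        return self.element_id is not None and attrs.get("id") == self.element_id

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # an element nested inside the match
        elif self._is_match(attrs):
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace so each match is a tidy field value.
            self.matches.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract(html, class_name=None, element_id=None):
    parser = ClassIdExtractor(class_name, element_id)
    parser.feed(html)
    parser.close()
    return parser.matches


page = """
<h1 id="title">Blue <em>widget</em></h1>
<div class="price sale">$5</div>
<div class="price">$7 <small>each</small></div>
"""

# One entry per output column, like a list of columns in a complex setup.
row = {
    "title": extract(page, element_id="title"),
    "price": extract(page, class_name="price"),
}
print(row)  # {'title': ['Blue widget'], 'price': ['$5', '$7 each']}
```

Matching on one entry of the class list (rather than the whole attribute) is what lets `class="price sale"` count as a `price` element.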
3. Test or run. You'll be able to either begin the scan, or run a short test. The 'Run test' button will perform a very short scan of a few pages and present your output as a quickview. If all looks well, you can press Go, or if you need to make changes, you can head back to the Output file configuration.
A common scenario is that the data you want isn't defined by a unique class or id. In these cases a regular expression can be used; there's a detailed tutorial here.
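For example, suppose the value you want always follows a label but sits in markup shared with other data. This Python sketch uses the standard `re` module, with a capture group marking the part to keep. The page and pattern are hypothetical, and whether WebScraper treats capture groups in exactly this way is an assumption; the tutorial is the authority on the app's own rules.

```python
import re

# Hypothetical page: the value we want has no unique class or id,
# but it always follows the label "SKU:" inside a generic <p>.
page = """
<p class="info">Colour: blue</p>
<p class="info">SKU: <b>WID-0042</b></p>
<p class="info">Weight: 1.2kg</p>
"""

# The capture group (...) marks the part to keep; the rest is context that
# anchors the match. [^<]+ stops at the next tag.
pattern = re.compile(r"SKU:\s*<b>([^<]+)</b>")

match = pattern.search(page)
sku = match.group(1) if match else ""
print(sku)  # WID-0042

# findall returns every captured value on the page, e.g. one per product
# on a listing page.
print(pattern.findall(page))  # ['WID-0042']
```

Because every paragraph shares `class="info"`, a class column alone couldn't single out the SKU; the label text is what makes the match unique.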
4. When the scan is complete, the Results tab will open. You can export this using the export button above the table. It uses the options you set in the 'Output file format' tab.
Note that the Save Project option from the File Menu will only save your setup, not the scan data.
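Conceptually, the export writes one row per crawled page, with one column per field you configured. Here's a minimal Python sketch of that, assuming CSV output; the URLs, column names and values are hypothetical. Python's `csv` module takes care of quoting, so a value containing a comma stays in a single column.

```python
import csv
import io

# Hypothetical scan results: one dict per crawled page, keyed by column name.
results = [
    {"url": "https://example.com/a.html", "title": "Blue widget", "price": "$5"},
    {"url": "https://example.com/b.html", "title": "Red widget, large", "price": "$7"},
]

# Write to an in-memory buffer; a real export would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["url", "title", "price"])
writer.writeheader()
writer.writerows(results)
print(buffer.getvalue())
```

The second title is written as `"Red widget, large"` in quotes, so a spreadsheet still sees three columns.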
Sunday, 30 October 2016
A sneak peek at some of the new features of Scrutiny v7
Scrutiny v7 is still very much in progress and being shaped, but here's a sneak preview, in case you'd like to feed back or make suggestions.
I demonstrate in the video:
- document based (multiple windows open at once)
- organise your websites into folders
- simpler navigation
- full autosave (view data for any site you've scanned previously)
[update] Version 7 is now well-established and has proved very popular. It's available here, and details about the very reasonable upgrade are also on the page.
http://peacockmedia.software/mac/scrutiny/
Friday, 28 October 2016
Affiliate scheme :: Scrutiny, Integrity and other PeacockMedia apps
My licensing and payment partner is Paddle. We have an affiliate system, with affiliates earning from the sales they generate.
It's simple: you register as an affiliate, and you'll then have access to some general links and be notified of any special offers. You post a link in a review, blog or article and earn money when people click through and purchase.
You get 20% commission on the sale, and a monthly payout by PayPal or bank transfer.
If you'd like to jump on board then here's the registration form.
Once you have your account set up with Paddle, this is where you'll find your links.