
Wednesday, 8 August 2018

Getting started - Hue-topia for Mac

After following these steps, which will only take a couple of minutes, you'll know how to make and use presets, set your lamps to turn on and off on a schedule, and use effects.

(This tutorial was written on 8 August 2018 and supersedes the previous version of the same tutorial.)

The latest version of Hue-topia is available here.

1.  If you’ve not already done so, make sure your Bridge and some bulbs are switched on and start Hue-topia. The first time that you start the app it will try to find your bridge and attempt to log in. Finding the bridge requires an internet connection.

The only thing that you should need to do is press the button on the bridge when instructed, to pair it with the Hue-topia app. If there are any problems at this stage, see Troubleshooting in the Hue-ser manual.

Make and try two presets

2. Turn the brightness and the whiteness of all of your lamps all the way up and make sure they're all on.


3. Click the [+] button (Save preset) and type ‘All white’ for the name of the new preset. OK that.

4. Turn the brightness and the whiteness of all of your lamps to three quarters of the way up.

5. Click the [+] button (Save preset) and type ‘All warm’ for the name of the new preset. OK that.

6. You now have two presets and can use these from the Presets button in the toolbar and also from the status bar. Try this.

Make a preset that affects only certain lamps

7. Go to 'Manage presets...' from the Presets toolbar button or the Lamps menu.

8. Choose your preset from the window that appears, and press 'Lamps affected'. You'll now see a checkbox alongside each lamp in the main control window. Uncheck some of the lamps and press 'OK'. Your preset will now only affect the lamps that remain checked.

Set your lamps to turn on and off on schedule

9. Press the Schedules button or ‘Show schedules’ from the View menu (command-2 also shows this window).

10. Press the [+] button at the bottom-left of the Schedules window.

11. Type ‘Daily’ for the name, select ‘On & Off’, select ‘group: all’, type 17:00 for on and 23:00 for off. Leave all days selected. Click somewhere outside of the small window to save and close those settings.

All lamps are now set to switch on at 5pm and off at 11pm. Note that this will work even when your computer and Hue-topia aren't running, because Hue-topia copies its schedules to the bridge.
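(For the technically curious: your schedules live on the bridge as small JSON objects, created through the bridge's local REST API. Below is a rough Python sketch of creating a similar daily schedule directly - purely illustrative, since Hue-topia does all of this for you. It assumes the 'requests' library is installed, and <bridge-ip> and <username> are placeholders for your own bridge address and API username.)

import requests

# Illustrative only - Hue-topia creates and manages these schedules for you.
schedule = {
    "name": "Daily on",
    "command": {
        "address": "/api/<username>/groups/0/action",  # group 0 = all lamps
        "method": "PUT",
        "body": {"on": True},                          # switch the lamps on
    },
    "localtime": "W127/T17:00:00",                     # W127 = every day of the week, at 17:00
}
resp = requests.post("http://<bridge-ip>/api/<username>/schedules", json=schedule)
print(resp.json())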

Make and try an effect

12. Press the Effects toolbar button, and press the [+] button below your list of effects.

13. Type the name 'Pastels', and press the [+] below the timeline strip a couple of times to add two more nodes. Space them out equally.


14. Click inside the colour swatch of the first node and choose a nice pastel colour. Do the same for the other two. Adjust the cycle time to a value that you like and make sure 'Loop' is selected. The preview swatch should show the effect animating. When that's working as you like, OK the sheet.


15. Return to the main window. Choose a light or group that you want to apply your effect to. Look for the little 'effect' icon in the control strip (ringed below). Click that and a menu of your effects will pop up. Choose your new Pastels effect and Hue-topia should start animating that effect for the chosen bulb or group. While the effect is running, the little icon will rotate.




Saturday, 14 July 2018

Migration to secure (https://) site

This is an older article. The information in it is still correct but there's a newer article here with new screenshots and revised content.

This has been a big week, but peacockmedia.software is now https://

This is a well-overdue move. Google have been offering small carrots for a long time, but at the end of this month, they'll be adding a stick as well. They're switching from informing users when a connection is secure, to warning users if a connection is insecure. Google Chrome is making this move but other browsers are expected to follow suit.

Well-informed web users will know whether they really need a connection to be secure or not, but I suspect that even for those users, when this change really takes hold, the red unlocked padlock will start to become an indicator of an amateur or untrustworthy site.

Once the certificate is installed (which I won't go into here), you must weed out links to your http:// pages, and pages that have 'mixed' or 'insecure' content, i.e. references to images, css, js and other files which are still http://.

Scrutiny makes it easy to find these.

1. Find links to http pages and pages with insecure content.

First, make sure that you're giving your https:// address as your starting url, that these two boxes are ticked in your settings,
and these boxes ticked in your Preferences,

After running a scan, Scrutiny will offer to show you these issues,

You'll have to fix and re-scan until nothing is reported. (Making certain fixes will reveal new pages for Scrutiny to test.)

2. Fix broken links and images

Once those are fixed, there may be some broken links and broken images to fix too (I was copying stuff onto a new server and chose to copy only what was needed; there are inevitably things that you miss...). Scrutiny will report these and make them easy to find.

3. Submit to Google.

Scrutiny can also generate the xml sitemap for you, listing your new pages (and images and pdf files too if you want).
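For reference, an XML sitemap is just a list of your page urls in a standard format, and Scrutiny builds it for you. Here's a minimal Python sketch of that format (the urls and date below are placeholders):

# Minimal illustration of the sitemap format; urls and date are placeholders.
urls = ["https://peacockmedia.software/", "https://peacockmedia.software/somepage.html"]

entries = "\n".join(
    f"  <url>\n    <loc>{u}</loc>\n    <lastmod>2018-07-14</lastmod>\n  </url>" for u in urls
)
sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    + entries + "\n</urlset>\n"
)

with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write(sitemap)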

Apparently Google treats the https:// version of your site as a separate 'property' in its Search Console (was Google Webmaster Tools). So you'll have to add the https:// site as a new property and upload the new sitemap.

[update 15 Jul] I uploaded my sitemap on Jul 13 and it was processed on Jul 14.

4. Redirect

As part of the migration process, Google recommends that you then "Redirect your users and search engines to the HTTPS page or resource with server-side 301 HTTP redirects" (full article here).
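If you want to spot-check that the server-side redirect is in place, here's one quick way, assuming Python and the 'requests' library (substitute your own url):

import requests

# Request the old http:// url without following redirects.
r = requests.get("http://peacockmedia.software/", allow_redirects=False)
print(r.status_code)              # expect 301 once the redirect is set up
print(r.headers.get("Location"))  # expect the https:// version of the same url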





Saturday, 28 April 2018

Case study - using WebScraper to compile a list of information from a website in a useful format

Here's a frustrating problem that lent itself really well to using WebScraper for Mac.

This is the exhibitor list for an event I was to attend. It's a very long list, and unless you recognise the name, the only way to see a little more information about each exhibitor is to click through and then back again.




I wanted to cast my eye down a list of names and brief summaries, to see who I might be interested in visiting. Obviously this information will be in the printed programme, but I don't get that until the day.

(NB: this walkthrough uses version 4.2.0, which is now available. The column setup table is more intuitive in 4.2, and the ability to extract h1s by class is new in 4.2.)

1. Setting up the scan. It's as easy as entering the starting url (the url of the exhibitor list). There's also this very useful scan setting (under Scan > Advanced) to say that I only want to travel one click away from my starting url (there's no pagination here, it's just one long list).

There's also a "new and improved" filter for the output. Think of this as 'select where' or just 'only include data in the output table if this is true'. In this case it's easy: we only want data in the output if the page is an exhibitor detail page. Helpfully, those pages all contain "/exhibitors-list/exhibitor-details/" in the url, so we can set up this rule:
2. Setting up the columns for the output table. The Helper tool shows me that the name of each business is on the information page within a heading that has a class. That's handy, because I can simply choose this using the helper tool and add a column which selects that class. 

3. The summary is a little more tricky, because there's no class or id to identify it. But helpfully it is always the first paragraph (<p>) after the first heading. So we can use the regex helper to write a regular expression to extract this.

The easy way to write the expression is simply to copy a chunk of the html source containing the info you want, plus anything that identifies the beginning and end of it, and then replace the part you want with (.+?) (which means 'collect any number of any characters'). I've also replaced the heading itself with ".+?" (the same, but don't collect) because that will obviously change on each page. That's all simpler than I've made it sound. I'm no regex expert (regexpert?) - you may well know more than me on the subject and there may be better expressions for this particular job, but this works, as we can see by hitting enter to see the result. (There's a small worked sketch of this kind of expression after these steps.)

Here's what the column setup looks like now:

(Note that I edited the column headings by double-clicking the heading itself. That heading is mirrored in the actual exported output file, and acts as a field name in both csv and json.)

 4. Press Go, watch the progress bar and then enjoy the results. Export to csv or other format if you like.
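As promised in step 3, here's a rough Python sketch of how that kind of expression behaves. The markup below is made up for illustration - the real exhibitor pages will differ - but it shows the ".+?" skip and the (.+?) capture in action:

import re

html = """
<h1 class="exhibitor-name">Acme Widgets Ltd</h1>
<p>We make widgets for the aerospace industry.</p>
"""

# ".+?" skips over the heading text; (.+?) captures the first paragraph after it.
pattern = r'<h1[^>]*>.+?</h1>\s*<p>(.+?)</p>'
match = re.search(pattern, html, re.S)
if match:
    print(match.group(1))   # -> We make widgets for the aerospace industry.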


Friday, 27 April 2018

HTMLtoMD and WebScraper - a comparison when converting & archiving a site as Markdown

I noticed a few people visiting the website as a result of searching for "webpage to markdown" and similar.

Our app HTMLtoMD was designed to do exactly that, and is free. But it was experimental and hasn't received very much development in a long time.

It still does its job, but our app WebScraper can also do that job with many more options, and is a much more active product: it's selling and is under regular development.

I thought it was worth writing a quick article to compare the output and features of the two apps.

First HTMLtoMD. It's designed for one job and so is simple to use. Just type or paste the address of your page or homepage into the address bar and press Go. If you choose the single page option, you'll immediately see the page as markdown, and can copy that markdown to paste somewhere else.
If you don't choose the single page option, the app will scan your site, and then offer to save the pages as markdown.

Here's the big difference from WebScraper: HTMLtoMD saves each page as a separate markdown file:

So here's how to do the same thing with WebScraper. It has many more options (which I won't go into here); the advanced ones tend not to be visible until you need them. Here's WebScraper when it opens.
For this task, simply leave everything at defaults, enter your website name and make sure you choose "collect content as markdown" from the 'simple setup' as shown above. Then press Go.

For this first example I left the output file format set to csv. When the scan has run, the Results table shows how the csv file will look. In this example we have three field names: url, title, content. Note that the 'complex setup' allows you to choose more fields; for example, you might like to collect the meta description too.
You may have noticed in one of the screenshots above that the options for output include csv, json and txt. Unlike HTMLtoMD, WebScraper always saves everything as a single file. The csv output is shown above. The text file is also worth looking at here: although it is a single file, it is structured. I've highlighted the field headings in the screenshot below. As I mentioned above, you can set things up so that more information (e.g. the meta description) is captured.
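If you want to work with the exported data elsewhere, the single-file output is easy to process. A small sketch, assuming you exported csv to a file called 'export.csv' and kept the url, title and content columns described above:

import csv

# Read WebScraper's csv export; each row is one page.
with open("export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        print(row["url"], "-", row["title"])
        # row["content"] holds the page content as markdown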


[Update]

It occurred to me after writing the article above that collecting images is a closely related task. WebScraper can download and save all images while performing its scan. Here are the options; you'll need to switch to the 'complex setup' to access them:


If you're unsure about any of this, or if you can't immediately see how to do what you need to do, please do get in touch. It's really useful to know what jobs people are using our software for (confidential of course). Even if it's not possible, we may have suggestions. We have other apps that can do similar but different things, and have even written custom apps for specific jobs.

Thursday, 19 April 2018

Option to download images in WebScraper

WebScraper for Mac now has the option to download and save images in a folder, as well as, or instead of, extracting data from the pages.



WebScraper crawls a whole website and it can extract information based on a class or id, or dump the content as html, plain text or markdown.

By just checking that box and choosing a download location, it can also dump all images to a local folder. That includes images found linked (the target of a regular hyperlink), those in an <img src="...">, and those found in a srcset.

If you only want particular images, you can select only those which match a partial url or regex.
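To give an idea of what gets picked up, here's a rough Python sketch of those three kinds of reference, plus a partial-url filter at the end. It's illustrative only - not WebScraper's actual code - and the html is made up:

import re

html = '''
<a href="photos/kitchen.jpg">kitchen</a>
<img src="img/hero.jpg" srcset="img/hero@2x.jpg 2x, img/hero@3x.jpg 3x">
'''

urls = set()
# the target of a regular hyperlink
urls.update(re.findall(r'href="([^"]+\.(?:jpg|jpeg|png|gif))"', html, re.I))
# <img src
urls.update(re.findall(r'<img[^>]*\ssrc="([^"]+)"', html, re.I))
# each candidate listed in a srcset
for srcset in re.findall(r'srcset="([^"]+)"', html, re.I):
    urls.update(part.strip().split()[0] for part in srcset.split(","))

# optional filter: keep only images whose url matches a partial url
wanted = [u for u in urls if "hero" in u]
print(sorted(urls))
print(wanted)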

This functionality is in version 4.1.0 which is available now.

Saturday, 14 April 2018

Scraping details from a web database using WebScraper

In the example below, you can see that the important data are not identified by a *unique* class or id.  This is a common scenario, so I thought I'd write this post to explain how to crawl such a site and collect this information. This tutorial uses WebScraper from PeacockMedia.
1. As there's no unique class or id identifying the data we want*, we must use a regular expression (regex).

Toggle to 'Complex setup' if you're not set to that already.

Add a column to the output file and choose Regular expression. Then use the helper button (unless you're a regex aficionado and have already written the expression). You can search the source for the part you want and test your expression as you write it (see below). Actually writing the expression is outside the scope of this article, but one important thing to say is that if there is no capturing in your expression (round brackets), the result of the whole expression will be included in your output. If there is capturing (as in my example below), only the captured data is output. (There's a small worked example of capturing after these steps.)

2. Paste the expression into the 'add column' dialog and OK. Here's what the setup looks like after adding that first column.
3. This is a quick test; I've limited the scan to 100 links simply to test it. You can see the first field correctly going into its column. At this point I've also added another regex column for the film title, because the page title is useless here (it's the same for every detail page).
4. Add further columns for the remaining information as per step 1.

5. If you limited the links while you were testing as I did, remember to change that back to a suitably high limit (the default is 200,000 I believe) and set your starting url to the home page of the site or something suitable for crawling all of the pages you're interested in.
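Here's the small worked example of capturing mentioned in step 1. The fragment of html is hypothetical - the real detail pages will differ - but it shows the difference between the whole match and the captured part:

import re

# Hypothetical fragment of a detail page.
html = '<span class="label">Year of release</span> <span>1994</span>'

pattern = r'Year of release</span>\s*<span>(\d{4})'
match = re.search(pattern, html)
if match:
    print(match.group(0))  # the whole match, including the surrounding markup
    print(match.group(1))  # just the captured data: 1994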


As always, please contact support if you need more help or would like to commission us to do the job.



* As I write this, WebScraper checks for class, id and itemprop inside <div>, <span>, <dd> and <p> tags; this is likely to be expanded in future.

Monday, 22 May 2017

List of all of a site's images, with file sizes

A recent enhancement to Scrutiny and Integrity makes it easy to see a list of all images on a site, with file sizes.

It was already possible to check all images: not just the ones that are linked (a href="image.jpg") but the actual images on the page (img src="image.jpg", srcset="image@2x.jpg 2x").

The file size was also held within Scrutiny and Integrity, but wasn't displayed in the links views.

Now it is. It's a sortable column and will be included in the csv or html export.

Before the crawl, make sure that you switch on checking images:

or

You may need to switch on that column if it's not already showing - it's called 'Target size'.

Once it is showing, as with other columns in these tables, you can drag and drop them into a different order, and resize their width.

To see just the images, choose Images from the filter button over on the right (Scrutiny and Integrity Plus).


If you're checking other linked files (js or css) then their sizes may be displayed, but will probably have a ? beside them to indicate that the file has not been downloaded and its uncompressed size verified (the size shown is the one provided in the server header fields).

This last point applies to Integrity and Integrity Plus, and will appear in Scrutiny shortly.
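(For the curious, the size in question comes from the Content-Length response header. A quick illustration, assuming Python and the 'requests' library; the url is a placeholder:)

import requests

# A HEAD request returns the headers without downloading the file itself.
r = requests.head("https://example.com/style.css", allow_redirects=True)
print(r.headers.get("Content-Length"))  # size in bytes as reported by the server
# If the server compresses the file, this may be the compressed size -
# which is why an unverified size is marked with a '?'.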

Note that all of this is just a measure of the sizes of all files found during a crawl. For a comprehensive load speed test on a given page, Scrutiny has such a tool: access it with cmd-2 or Tools > Page Analysis.







Tuesday, 9 May 2017

Hidden gems: Scrutiny 7's Autosave feature

Some of Scrutiny's coolest tricks may not be terribly well documented, so I thought I'd start a series of articles here.

Previous versions of Scrutiny had an 'Autosave' checkbox. With the feature turned on, any data that was in memory would still be there after quitting (deliberately or unintentionally) and re-starting the app.


(Note that this option is in Scrutiny's Preferences, not the site settings.)

In Version 7, the same option is there but the app goes much further. With the feature switched on, Scrutiny will save the data from the most recent crawl of each site. This means all data: link check results, sitemap, SEO data, spell check, everything.

If crawl data exists for a site, then you'll see the database icon with the site's configuration. This takes you straight to the Results selection screen.

Since version 7.3 this feature is on by default for new users. So if you are scanning very large sites and short of disc space, you may want to switch the feature off (note that you can still save and load data manually using File > Save data and File > Open). But you may not want to switch Autosave off, because this window (below) shows the data that Scrutiny is keeping (with dates and sizes) and you can use this 'data manager' to selectively trash the data that you don't need. The table can be sorted by size, scan date or website name.
This is found under Tools > Manage Autosaved Data (cmd-5).

Saturday, 1 April 2017

Finding redirect chains using Scrutiny for Mac

Setting up redirects is important when moving a site, but don't let it get out of hand over time!

John Mueller has said that the maximum number of hops Googlebot will follow in a chain is five.

Scrutiny keeps track of the number of times any particular request is redirected, and can report these chains to you if you have any.
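To make the idea of a chain concrete, here's a rough Python illustration of counting the hops for a single url, assuming the 'requests' library (the url is a placeholder). Scrutiny records this for every link during its crawl:

import requests

r = requests.get("http://example.com/old-page", allow_redirects=True)
print(len(r.history), "redirect(s)")   # each entry in history is one hop
for hop in r.history:
    print(hop.status_code, hop.url)    # e.g. 301 http://example.com/old-page
print(r.status_code, r.url)            # the final destination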

Here's how:

First you need to scan your site. Add a config in Scrutiny, giving it a name and your starting url (home page).




Then press 'Scan now'.

Once you've scanned your site and you're happy that you don't need to tweak your settings for any reason, go to the SEO results.

If there are any urls with a redirect chain, it will be shown in this list:


(Note that at the time of writing, Scrutiny is configured to include pages in this count if they have more than 5 redirects, but you can see all redirect counts in the Links 'by link' view as described later.)

You can see the pages in question by choosing 'Redirect chain' from the Filter button over on the right:


That will show you the urls in question. (As things stand in the current version as I write this, it'll show the *final* url. This is appropriate here because this SEO table lists pages, not links; the url shown is the actual url of the page in question.)

A powerful feature within Scrutiny is the ability to see a trace of the complete journey.

Find the url in the Links results. (You can sort by url, or paste a url into the search box.) Note that as from version 7.2, there is a 'Redirect count' column in the Links 'by link' view. You may need to switch the column on using the selector to the top-left of the table. You can sort by this column to find the worst offenders:



... and double-click to open the link inspector. The button to the right of the redirect field will show the number of redirects, and you can use this button to begin the trace:







Some of this functionality is new (or improved) in version 7.2. Users of 7.x should update.

There is a very reasonable upgrade path for users of versions earlier than 7.

Friday, 18 September 2015

Generating an XML sitemap for your website

This new video explains XML sitemaps and demonstrates how to generate one for any site using Integrity Plus. It also looks at some of the options.

Wednesday, 16 September 2015

Using Scrutiny to make important checks over your EverWeb website


In this new video we take a look at how to use Scrutiny to make some important UX and SEO checks over your EverWeb website.

The checks themselves apply to any website, but EverWeb makes it easy to correct the issues, as we demonstrate here.

This tutorial uses Scrutiny for Mac and the EverWeb 'drag and drop' content management system.

Links become broken over time (link rot), so a regular link check is important. EverWeb helps with this issue because it manages the links in your navigator, but links in your content are still vulnerable to your own or external pages naturally being moved, changed or deleted. Fortunately, finding and fixing them is easy, as demonstrated.

The title tag and meta description are very important (and a good opportunity) for SEO. Scrutiny will highlight any that are missing, too long or too short. EverWeb makes it a breeze to update these where necessary.

Alt text for your images is also important (depending on the image). Once again, Scrutiny can highlight any potential issues / keyword opportunities and the video shows you how to update your site.

On the subject of keywords, it's very important to do your keyword research and ensure that your pages contain a reasonable amount of good quality content. Once Scrutiny has scanned your site, you can see any pages with thin content, keyword stuffed pages, check occurrences of your target keywords and even see a full keyword analysis:

Tuesday, 28 July 2015

403 'forbidden' server response when crawling website using Scrutiny

The problem: Scrutiny fails to retrieve the first page of your website and therefore gets no further. The result looks like this (above).

The reason: By default, Scrutiny uses its own user-agent string (thus being honest with servers about its identity). This particular website (and the first I've seen for a long time to do this) refuses to serve pages unless the request is made from a recognised browser.

The solution: Scrutiny > Preferences > General

The first box on the first Preferences tab is 'User agent string'. A button beside this box allows you to choose from a selection of browsers (this is called 'spoofing'). If you'd like Scrutiny to identify itself as a browser or a version not in the list, just find the appropriate string and paste it in (if you can run your chosen browser, you can use this tool to find the UA string).
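(For context, the user-agent string is just a header sent with every request. A quick Python illustration, assuming the 'requests' library; the url is a placeholder and the string shown is an example Safari UA:)

import requests

headers = {
    # An example desktop Safari user-agent string.
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) "
                  "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1 Safari/605.1.15"
}
r = requests.get("https://example.com/", headers=headers)
print(r.status_code)  # a server that refuses unknown agents may now return 200 rather than 403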

With the User agent string changed to that of a recognised browser, this problem may be solved.

Tuesday, 21 July 2015

Getting started with Scrutiny - first video

A milestone! I can't tell you how pleased I am with our first instructional video.

It's a quick tour of Scrutiny for Mac, performing a basic link check, reading the results, discussing a few settings and some troubleshooting. Much of this will be relevant to Integrity and Integrity Plus.




... top marks to tacomusic who has just become the voice of PeacockMedia!

Thursday, 9 July 2015

Using Integrity to scan Blogger sites for broken links - some specifics

I've recently been helping someone with a few issues experienced when testing a Blogger blog with Integrity.

Some of these things are of general interest, and some will be useful to anyone else who's link-checking a Blogger site. These tips apply equally to Integrity Plus and Scrutiny.

1. Share links being reported as bad

You may have these share links at the bottom of each post.
As you'd expect, they redirect to a login page, so no danger of Integrity actually sharing any of your posts. The problem comes when you're testing a larger site with more threads. These links may eventually begin to return an error code. I don't know whether this is because of the heavy bombardment on the share functionality, or whether Blogger is detecting the abnormal use. Either way, you may begin to get lots of red in your results.

One solution is to turn down the number of threads to a minimum. This isn't desirable because the crawl will then take hours. A better solution is to ask Integrity not to check those links (it's pretty certain that they'll be ok).

(Note: Even though these links use a querystring with parameters, checking 'ignore querystrings' won't work, because these links have a different domain from the blog address. They therefore look like external links, and the 'ignore querystrings' setting only applies to internal links.)

Add a 'blacklist rule' using the little [+] button (screenshot below). Make a rule that says 'do not check urls containing share-post'.
While here, add similar rules for 'delete-comment' and 'post-edit'. It was a concern to see these urls appearing in my link-check results. They do indeed appear in the pages' html code, although they're hidden by the browser if you're browsing as a guest. But no need to worry - as you'd expect, they also redirect to a login screen and Integrity isn't capable of logging in. *


2. A large amount of yellow

Integrity highlights redirected urls in yellow. Not an error, but an 'FYI'. Some webmasters like to find and deal with redirects, but the Blogger server uses redirects extensively and it's just part of the way it works. When testing a Blogger site you will see a lot of these, but it's not usually something you need to worry about.

If you like, you can change the colour that Integrity uses to highlight such links - you can change it to white, or better still, transparent. See Preferences > Views and then click the yellow colour-well to see the standard OSX colour picker with an 'opacity' slider.

3. Pageviews on your website

Given that Google Analytics uses client-side javascript (meaning that crawling apps like Integrity don't trigger page views**), I was surprised to find Integrity triggering page views with a Blogger site. I guess Blogger counts the views server-side.

It seems that changing the user-agent string to that of Googlebot stopped these hits from registering.

The user-agent string is how any browser or web crawler identifies itself. It's useful for a web server to know who's hitting it.

Posing as Googlebot by using the Googlebot user-agent string:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
... seems to work - it prevents hits from triggering page views in Blogger's dashboard.

Deliberately using another string (known as 'spoofing') is technically misuse of the user-agent string, but until Google recognises Scrutiny and Integrity as web crawlers, I think this is forgivable. If you'd like to be a little more transparent, I've found that this alternative also works:
Integrity/5.4 (posing as: Googlebot/2.1; +http://www.google.com/bot.html)

I will shortly be building this Googlebot string into the drop-down picker in Preferences. In the meantime, just go to Preferences > Global and paste one of those strings into the 'User-agent string' box.




* Neither Integrity nor Integrity Plus is capable of authenticating itself; in effect they view websites as an anonymous guest. Scrutiny is capable of authentication - it's a feature that's much in demand (if you want to test a website which requires you to log in before you see the content) - but it must be used with care; it's not possible to switch it on without seeing warnings and advice.

** I guess that Scrutiny could trigger page views when its 'run js' feature is switched on, though I haven't tested that.