Wednesday 26 September 2018

http requests - they're not all the same

This is the answer to a question that I was asked yesterday. I thought that the discussion was such an interesting one that I'd post the reply publicly here.

A common perception is that a request for a web page is simply a request. Why might a server give different responses to different clients? To be specific, why might Integrity / Scrutiny receive one response when testing a url, yet a browser sees something different? What are the differences?


user-agent string

This is sent with a request to identify "who's asking". Abuses of the user-agent string by servers range from sending a legitimate-looking response to search engine bots while serving dodgy content to browsers, through to refusing to respond to requests that don't appear to come from a browser. Integrity and Scrutiny are good citizens and by default send their own standards-compliant user-agent string. If it's necessary for testing purposes, this can be changed to that of a browser or even a search engine bot.
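As an illustration, here is a minimal Python sketch (standard library only) that builds two requests for the same url, differing only in the user-agent string they announce. Both strings and the url are made-up examples, not Integrity's or Scrutiny's actual defaults. Nothing is sent over the network until a request is passed to `urllib.request.urlopen`.

```python
import urllib.request

# Hypothetical user-agent strings, for illustration only.
CRAWLER_UA = "ExampleCrawler/1.0 (+https://www.example.com/crawler-info)"
BROWSER_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) "
              "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Safari/605.1.15")


def make_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build (but don't send) a GET request that identifies itself as user_agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


as_crawler = make_request("https://www.example.com/", CRAWLER_UA)
as_browser = make_request("https://www.example.com/", BROWSER_UA)

# urllib stores header names as "User-agent" (first letter capitalised only);
# HTTP field names are case-insensitive, so the server sees the same field.
print(as_crawler.get_header("User-agent"))
print(as_browser.get_header("User-agent"))
```

If a server answers these two differently, the user-agent string is the only thing in the request it had to go on.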

header fields

A request contains a bunch of header fields. These are specifically designed to allow a server to tailor its content to the client. There are loads of possible ones, and you can invent custom ones; some are mandatory, many are optional. By default, Scrutiny includes the ones that the common browsers include, with similar settings. If your own site requires a particular unusual or custom header field / value to be present, you can add it (in Scrutiny's 'Advanced settings').
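A minimal Python sketch of the same idea: start from a set of browser-like header fields and layer any site-specific extras on top. The values are typical rather than copied from any particular browser, and `X-Site-Test` is a hypothetical custom field standing in for whatever your own site requires.

```python
import urllib.request

# Roughly the kind of fields a desktop browser sends; exact values vary by browser.
BROWSER_LIKE_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-GB,en;q=0.9",
}


def build_request(url, extra_headers=None):
    """Return a GET request with browser-like headers plus any site-specific extras.

    Extras are applied last, so they can override one of the defaults.
    """
    headers = dict(BROWSER_LIKE_HEADERS)
    headers.update(extra_headers or {})
    return urllib.request.Request(url, headers=headers)


# "X-Site-Test" is a hypothetical custom field that some site insists on.
req = build_request("https://www.example.com/", {"X-Site-Test": "yes"})
for name, value in req.header_items():
    # urllib normalises names to e.g. "Accept-language"; HTTP treats them case-insensitively
    print(f"{name}: {value}")
```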

cookies and javascript

Browsers have these things enabled by default. They're just part of our online lives now (though accessibility standards say that sites should be usable without them), but they're options in Scrutiny and both are deliberately off by default. I'm discovering more and more sites which test whether cookies are enabled in the browser (with a handshake-type exchange) and refuse to serve the page if not. A few sites refuse to work properly without javascript enabled in the browser. This is a terrible practice, but it does happen, thankfully rarely. Switch cookies on in Scrutiny if you need to, but always leave the javascript option *off* unless your site does this when you switch js off in your browser:
(Image: a blank web page showing the message "This site requires Javascript to work")
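To show why a cookie "handshake" trips up a client with cookies switched off, here's a small offline Python simulation. The server is a hypothetical function (`simulated_server`) that refuses with 403 and sets a cookie until that cookie is sent back; the client side uses the standard library's `http.cookiejar.CookieJar`, the same store that `urllib.request.HTTPCookieProcessor` uses with a real server. Real sites vary; many redirect rather than return 403.

```python
import email.message
import http.cookiejar
import urllib.request
from typing import Tuple

URL = "https://www.example.com/members"  # hypothetical page


class HeaderOnlyResponse:
    """Stand-in for an HTTP response: CookieJar.extract_cookies only calls .info()."""

    def __init__(self, headers: email.message.Message):
        self._headers = headers

    def info(self) -> email.message.Message:
        return self._headers


def simulated_server(request: urllib.request.Request) -> Tuple[int, email.message.Message]:
    """Hypothetical handshake: refuse (403) and set a cookie until it comes back."""
    headers = email.message.Message()
    if "handshake=ok" in (request.get_header("Cookie") or ""):
        return 200, headers
    headers["Set-Cookie"] = "handshake=ok; Path=/"
    return 403, headers


def visit(cookies_enabled: bool, attempts: int = 2) -> int:
    """Request URL up to `attempts` times and return the last status code."""
    jar = http.cookiejar.CookieJar() if cookies_enabled else None
    status = 0
    for _ in range(attempts):
        req = urllib.request.Request(URL)
        if jar is not None:
            jar.add_cookie_header(req)  # send back any cookies stored so far
        status, headers = simulated_server(req)
        if jar is not None:
            jar.extract_cookies(HeaderOnlyResponse(headers), req)  # store Set-Cookie
        if status == 200:
            break
    return status


print(visit(cookies_enabled=True))   # 200: the cookie goes back on the second try
print(visit(cookies_enabled=False))  # 403 every time
```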


GET and HEAD

There are a couple of other settings under Scrutiny's Preferences > Links > Advanced (and Integrity's Preferences > Advanced): 'Use GET for all connections' and 'Load data for all connections'. Both will probably be off by default.
(Screenshot: Scrutiny's 'Use GET for all connections' and 'Load data for all connections' preferences)

A browser will generally use GET when making a request (unless you're submitting a form) and it will probably load all of the data that is returned. For efficiency, a webcrawler can use the HEAD method when testing external links, because it doesn't need the actual content of the page, only the status code. If it does use GET (for internal connections, where it does want the content, or if you have 'Use GET for all connections' switched on) but doesn't need the page content, it can cancel the request after getting the status code. This very rarely causes a problem, but I have seen one or two cases where a large number of cancelled requests to the same server did cause trouble.

'Use GET for all connections' is unlikely to make any visible difference when scanning a site. The HEAD method should, by all standards, work, but it doesn't always. That's why, if a link returns any kind of error after a HEAD request, Integrity / Scrutiny tests the same url again using GET.
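Here's a minimal Python sketch of that HEAD-then-GET logic, assuming that any status of 400 or above counts as an error. It isn't Integrity's or Scrutiny's code. The `send` parameter is a hypothetical hook that lets you swap the real network call for a stub, and `always_get` plays the role of the 'Use GET for all connections' option.

```python
import urllib.error
import urllib.request
from typing import Callable, Tuple

# A "sender" takes (url, method) and returns an HTTP status code.
Sender = Callable[[str, str], int]


def urllib_sender(url: str, method: str, timeout: float = 10.0) -> int:
    """Send one request and return its status code without reading the body.

    For a GET, closing the response unread simply abandons the content:
    the 'only need the status code' shortcut. Network-level failures
    (DNS, refused connection) raise urllib.error.URLError.
    """
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx / 5xx responses still carry a code
        return err.code


def check_link(url: str, send: Sender = urllib_sender,
               always_get: bool = False) -> Tuple[int, str]:
    """Return (status, method used): try HEAD first, retry with GET on an error status."""
    if not always_get:
        status = send(url, "HEAD")
        if status < 400:
            return status, "HEAD"
        # HEAD should behave like GET minus the body, but some servers get it
        # wrong, so re-test with GET before reporting the link as bad.
    return send(url, "GET"), "GET"
```

In real use, `check_link("https://www.example.com/")` goes through `urllib_sender`. Note that `urlopen` follows redirects itself, so this sketch reports the status of the final destination rather than the 3xx.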

Other considerations

Outside of the particulars of the http request itself are a couple of things that may also cause different responses to be returned to a webcrawler and a browser. 

One is the frequency of the requests. Integrity and Scrutiny will send many more requests in a given space of time than a browser, probably many at the same time (depending on your settings). This is one of the factors involved in LinkedIn's infamous 999 response code. 
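One common way for a crawler to be gentler is to cap the number of simultaneous requests and pause between them. A minimal Python sketch using a thread pool: `fetch` is any function you supply that requests a url and returns its status code. The limit and pause values are arbitrary assumptions, not Integrity's or Scrutiny's settings, and no particular rate is guaranteed to satisfy a given server.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def polite_crawl(urls: Iterable[str], fetch: Callable[[str], int],
                 max_threads: int = 2, pause: float = 0.5) -> Dict[str, int]:
    """Fetch every url, with at most max_threads requests in flight at once.

    Each worker also sleeps for `pause` seconds after each request, so the
    overall rate can't exceed max_threads / pause requests per second.
    """
    def fetch_then_pause(url: str) -> int:
        status = fetch(url)
        time.sleep(pause)
        return status

    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() yields results in input order, so zip() pairs each url with its status
        return dict(zip(url_list, pool.map(fetch_then_pause, url_list)))
```

Lowering `max_threads` and raising `pause` trades scan speed for traffic that looks less like a flood.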

The other is authentication. A frequently-asked question is why a link to a social media site returns a response code such as 'forbidden' when the link works fine in a browser. Having cookies switched on (see above) may resolve this, but we forget that when we visit social media sites, we logged in at some point in the past and our browser remembers who we are. It may be necessary to be authenticated as a genuine user of a site to view a page that appears to be 'public'. Scrutiny and Webscraper allow authentication; the Integrity family doesn't.
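For a sense of what "authenticated" means at the request level, here's a minimal Python sketch of two ways a request can carry credentials. The first is an HTTP Basic `Authorization` header (RFC 7617), as used by sites protected at the server level. The second is a session cookie of the kind a browser holds after you've logged in through a form. The function names and the cookie value are hypothetical, and this isn't how Scrutiny or Webscraper implement authentication.

```python
import base64
import urllib.request
from typing import Optional, Tuple


def basic_auth_value(username: str, password: str) -> str:
    """Value for an HTTP Basic 'Authorization' header (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def authenticated_request(url: str,
                          basic: Optional[Tuple[str, str]] = None,
                          session_cookie: Optional[str] = None) -> urllib.request.Request:
    """Build a request that carries credentials, as a logged-in browser would."""
    req = urllib.request.Request(url)
    if basic is not None:
        req.add_header("Authorization", basic_auth_value(*basic))
    if session_cookie is not None:
        # e.g. a session cookie from a browser where you're already logged in (hypothetical)
        req.add_header("Cookie", session_cookie)
    return req
```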

I love this subject. Comments and discussion are very welcome.

Friday 21 September 2018

New free flashcard / Visualisation & Association method for MacOS

Vocabagility is more than a flashcard system, it's a method. Cards are selected and shuffled, one side is shown. Give an answer, did you get it right? Move on. As quick and easy as using a pack of real cards in your pocket.



The system also encourages you to invent an amusing mental image linking the question and answer (Visualisation and Association).

Cards that you're not certain about have a greater probability of being shown.
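The "more likely if you're not certain" idea is a weighted random choice. Here's a minimal Python sketch; the `Card` type, the `confident` flag and the 3-to-1 weighting are assumptions for illustration, not Vocabagility's actual internals.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Card:
    front: str                # the side that is shown
    back: str                 # the answer
    confident: bool = False   # hypothetical: set from your last answer


def pick_card(cards: List[Card], rng: Optional[random.Random] = None,
              unsure_weight: float = 3.0) -> Card:
    """Pick a card at random; unsure cards are unsure_weight times as likely."""
    if rng is None:
        rng = random.Random()
    weights = [1.0 if card.confident else unsure_weight for card in cards]
    return rng.choices(cards, weights=weights, k=1)[0]


def record_answer(card: Card, correct: bool) -> None:
    """'Did you get it right? Move on.' Remember the result for future picks."""
    card.confident = correct


deck = [Card("el gato", "the cat", confident=True), Card("el perro", "the dog")]
card = pick_card(deck)
print(card.front)
record_answer(card, correct=True)
```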



This is an effective system for learning vocabulary / phrases for any language but could be used for learning other things too.

Download Vocabagility for Mac for free here.

Sunday 16 September 2018

ScreenSleeves ready to go as a standalone app

In the last post I gave a preview of a new direction for ScreenSleeves and now it's ready to go.


Changes in MacOS Mojave have made it impossible to continue with ScreenSleeves as a true screensaver. Apple have not made it possible (as far as I know at the time of writing) to grant a screensaver plugin the necessary permission to communicate with or control other apps.

Making ScreenSleeves run as a full app (in its own window) has several benefits:

  • Resize the window from tiny to large, and put it into full-screen mode.
  • Choose to keep the window on top of others when it's small, or allow others to move on top of it.
  • The new version gives you the option to automate certain things, emulating a screensaver:
    • Switch to full-screen mode with a keypress (cmd-ctrl-F) or after a configurable period of inactivity
    • Switch back from full-screen to the floating window with a wiggle of the mouse or a keypress
    • Block system screensaver, screen sleep or computer sleep while in full-screen mode and as long as music is playing
As mentioned, Mojave has much tighter security. The first time you run this app, you'll be asked to allow ScreenSleeves access to several other things. It won't ask for permission for anything which isn't necessary for it to function as intended. You should only be troubled once for each thing that ScreenSleeves needs to communicate with.

The new standalone version (6.0.0) is available for download. It runs free for a trial period, after which there's a small price to continue using it. (Previously, the screensaver came in free and 'pro' versions, with extras in the paid version.)

Friday 7 September 2018

Screensleeves album art screensaver as a standalone app

Screensleeves has been a popular screensaver for a number of years, but the security changes in the new Mojave OS may make its functionality impossible.

Over the years, people have suggested that it could be a free-standing app rather than a screensaver. This comes with some advantages; for example, you can keep it minimised and floating above other windows in the corner of the screen when it's not in full-screen mode.

This may be the only way to keep the screensaver alive. I've been experimenting with the idea, ironing out some issues related to the change, and using it. I have to say that I like it very much.

Here's a very quick peek at what all this means.