Sunday, 9 July 2017
NSURLConnection won't die!
Integrity has been the most interesting project of my life.
I believe that the success of the web is down to the html standard and its flexibility *. It's human-readable, human-writable and web browsers do their best to render a page, whatever problems there might be.
I hate to think how many hundreds of hours of work that this has made for me over the last ten years.
In order to have a reliable web crawler which parses html itself, you have to investigate every problem and improve your code to handle whatever new unpredictable thing has been tripping it up.
It's taken ten years of this hard work for Integrity (and other related apps which use the same engine) to be as stable as it is. It has more users than ever and head-scratching problems are very few and far between now.
The worst times are where the problem happens at a deeper system level and you can get no debug information.
At a very high level, you can obtain data for a url in a single line. At the opposite extreme you can get involved with sockets etc. for Integrity I've taken the middle ground, creating the response and asynchronous connection and using delegate methods to monitor what's happening and be able to intervene if necessary.
But there comes a point where you say au revoir to the request / connection and wait for your various notifications. If the response / notification is unexpected then you're down to some educated guesswork and trial and error.
That's what's happened this week. At a certain point through scanning particular sites, all NSURLConnections would appear to 'lock up' and all would return timeout notifications. (And any further NSURLConnections created to any url within that app would also time out until the app was quit and re-started even though the same urls would respond in any other app.
As usual there are many questions and answers online, with many suggestions that aren't relevant or have no effect.
I eventually got somewhere with a process of elimination - stripping the relevant code down to bare essentials until the problem had gone, then adding the original code back in, chunk by chunk until it stopped working again.
It appears that the problem is related to connections staying alive. The app obviously manages the number of simultaneous connections, and either lets one connection load all its data and complete naturally, or cancel it (if that data isn't needed) before creating a new connection to replace it.
I think what's going wrong in these cases is that when you think you've let go of a connection with [connection cancel], that connection sometimes stays open, and the next one isn't replacing it but adding to the number until some limit is hit.
Removing all [connection cancel]s and allowing every connection to load and finish naturally completely solved the problem.
Making better use of the HEAD method (when you know that you only need the status but not the data) and explicitly making sure those requests have 'connection: close' in the request header should solve the problem but it doesn't entirely.
There's a lot I still don't know - why is a connection sometimes staying alive after it's been cancelled (or in the case of a HEAD request, when it has supplied the header info and says that it's done). If anyone knows, do tell!
* Despite Microsoft's and Netscape's best attempts to make it their own, it's survived as a truly universal standard - anyone can make a web page that can be read in any browser. It's an unusual thing and the IoT has a lesson to learn.
** There are some ridiculously unexpected things in the code of some websites (written by humans and written by machines)