Tuesday 14 September 2021

Frustrating loop with NSURLSession when implementing URLSession: task: willPerformHTTPRedirection: newRequest: completionHandler:

This isn't really a question, I think I've solved the problem. But I think I've seen this in the past and solved it then. So I'm posting this here in case anyone else - most likely future me - has this same problem and wants to save some time.

Context: I'm working on the next version of my crawling engine - it moves to NSURLSession rather than NSURLConnection**. Therefore (with the connection stuff at least) it's a ground-up rewrite. 

Here's the problem I hit this morning. A Youtube link is redirecting from a shortened link to a longer one. But some unexpected things are happening. 

My URLSession: task: willPerformHTTPRedirection: newRequest: completionHandler:   is being called and at first I simply put in the minimum code needed to simply continue with the redirection:

According to example code found in various places, that should work. The documentation suggests that this is fine: "completionHandler: A block that your handler should call with either the value of the request parameter, a modified URL request object, or NULL to refuse the redirect and return the body of the redirect response."

As you can see from the screenshot above, our url is being redirected over and over, each time with some garbage appended to the querystring. 

Integrity and Scrutiny handle this fine. (Though at this point they are using the old NSURLConnection.) However, they have a lot more code in the redirect delegate method than my new one. Notably, they make a mutable copy of the new request and make some changes. Why do they need to do that?

I've a funny feeling that I've seen this problem before. Indeed, adding code to make a clone of the new request and modify it is what cures this problem.

It's not enough to simply clone and return the new request.  This is the line that makes the difference:

[clonedRequest  setValue: [[request URL] host] forHTTPHeaderField: @"Host"];

(There's more to it than that, but there are reasons why my code isn't open source! Also, I use Objective C, apologies if you're a Swift person*).

In short, when we create the original request, we set the Host field. In the past I have found this to be necessary in certain cases. In fact, the documentation says that the Host field must be present with http 1.1 and a server may return a 4xx status if it's not present and correct.

If we capture the header fields of the proposed new request in our delegate method, the original Host field appears unchanged. Therefore, it no longer matches the actual host of the updated request url. Here, the url is being redirected to https://www.youtube.com/watch?v=xyz, but the Host field remains "youtu.be"

As I mentioned, I've written the fix into Integrity and Scrutiny in the dim and distant past, presumably because I have spent time on this problem before. 

I'm guessing that this isn't a problem if you don't explicitly set the Host field yourself in your original request, but if you don't then you may find other problems pop up occasionally. 

If future me is reading this after starting another ground-up project and running into the same problem: you're welcome.

** The important delegate methods of NSURLSessionTask are very similar to the old connection ones. Because I'm seeing similar behaviour with both, I do believe that beneath the skin, NSURLSession and NSURLConnection are just the same. 

* I'm aware that many folks are using Swift today, and it's getting harder to find examples and problem fixes written in Objective C. I'm expecting that (as with java-cocoa) Apple will eventually remove support. But I won't switch and I'm grateful for Objective C support as long as it lasts.

No comments:

Post a Comment