
HTTP/2, load balancers and the ambiguous ":scheme" header

The problem 

Recently, we changed the load balancer in front of an HTTP/2 web application in our testing environment, and all requests received a 502 error. This usually happens when a web application misbehaves and returns something the load balancer did not expect. But the same application worked with the previous load balancer. What happened here?

Investigation 

We double-checked the configuration, logs and metrics, but there was no apparent reason for this behavior. In our desperation we captured the network traffic between each load balancer (old and new) and the web application. First thing we found: the web application rejected every request from the new load balancer by returning an RST_STREAM frame. This is an HTTP/2-specific frame that abruptly terminates a stream, the HTTP/2 equivalent of a single request/response exchange. Luckily these frames carry an error code. In our case it was 1 (PROTOCOL_ERROR). This code is very unspecific, but at least it told us that whatever code parsed the HTTP/2 request on the server side was not happy with it.
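To make the capture easier to read, here is a minimal Python sketch that decodes a raw RST_STREAM frame as laid out in RFC 9113 (a 9-byte frame header followed by a 4-byte error code). The function name `parse_rst_stream` and the sample bytes are assumptions for illustration, not taken from our actual capture.

```python
from typing import Tuple

# HTTP/2 error codes as listed in RFC 9113, Section 7.
ERROR_CODES = {
    0x0: "NO_ERROR", 0x1: "PROTOCOL_ERROR", 0x2: "INTERNAL_ERROR",
    0x3: "FLOW_CONTROL_ERROR", 0x4: "SETTINGS_TIMEOUT", 0x5: "STREAM_CLOSED",
    0x6: "FRAME_SIZE_ERROR", 0x7: "REFUSED_STREAM", 0x8: "CANCEL",
    0x9: "COMPRESSION_ERROR", 0xA: "CONNECT_ERROR", 0xB: "ENHANCE_YOUR_CALM",
    0xC: "INADEQUATE_SECURITY", 0xD: "HTTP_1_1_REQUIRED",
}

RST_STREAM_TYPE = 0x3  # frame type identifier of RST_STREAM


def parse_rst_stream(frame: bytes) -> Tuple[int, str]:
    """Return (stream_id, error_name) for one raw RST_STREAM frame."""
    if len(frame) < 13:
        raise ValueError("an RST_STREAM frame is 13 bytes long")
    # Frame header: 24-bit length, 8-bit type, 8-bit flags,
    # 1 reserved bit followed by a 31-bit stream identifier.
    length = int.from_bytes(frame[0:3], "big")
    frame_type = frame[3]
    stream_id = int.from_bytes(frame[5:9], "big") & 0x7FFFFFFF
    if frame_type != RST_STREAM_TYPE or length != 4:
        raise ValueError("not a valid RST_STREAM frame")
    # Payload: a single 32-bit error code.
    code = int.from_bytes(frame[9:13], "big")
    return stream_id, ERROR_CODES.get(code, "UNKNOWN(0x%x)" % code)


# Hypothetical sample: stream 1 reset with error code 1.
sample = bytes.fromhex("000004 03 00 00000001 00000001")
print(parse_rst_stream(sample))
```

In practice a tool like Wireshark does this decoding for you; the sketch only shows how little information the frame carries beyond the stream ID and the error code.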

By comparing requests line by line we found something odd: The old load balancer set the header :scheme: http while the new load balancer set :scheme: https. Why was that the case?

Detour: HTTP/2 and :scheme 

The HTTP/2 specification, RFC 9113, describes :scheme as follows:

The ":scheme" pseudo-header field includes the scheme portion of the request target. The scheme is taken from the target URI (Section 3.1 of [RFC3986]) when generating a request directly, or from the scheme of a translated request (for example, see Section 3.3 of [HTTP/1.1]). Scheme is omitted for CONNECT requests (Section 8.5). ":scheme" is not restricted to "http" and "https" schemed URIs. A proxy or gateway can translate requests for non-HTTP schemes, enabling the use of HTTP to interact with non-HTTP services.

So the “scheme is taken from the target URI”. 

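Without any intermediary it is quite straightforward: a client that requests an https URL connects to the server via TLS and sends :scheme: https, while a client that requests an http URL connects without TLS and sends :scheme: http.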

But what about intermediaries like load balancers? The RFC text mentions them in the last sentence, but only in the context of HTTP to non-HTTP conversion. In our case the load balancer terminated TLS. That means it was HTTPS on the front and HTTP out the back. So which scheme should the load balancer send to the web application: https, as the client originally requested, or http, matching the unencrypted connection to the backend?

You can make an argument for both cases. From the client’s point of view, the scheme of the target URL is https. But since the load balancer intercepts the request, you could also argue that the target URL changes between it and the web application. Then the right scheme would be http.
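The two readings can be sketched in Python as two policies for building the pseudo-headers of the backend request. This is only an illustration under the assumption that the backend hop is always plain HTTP; the function `backend_pseudo_headers` and its `preserve_client_scheme` flag are hypothetical and do not correspond to any real load balancer's API.

```python
from typing import Dict
from urllib.parse import urlsplit


def backend_pseudo_headers(client_url: str, method: str = "GET", *,
                           preserve_client_scheme: bool) -> Dict[str, str]:
    """Build the HTTP/2 pseudo-headers a TLS-terminating proxy might send."""
    parts = urlsplit(client_url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Policy A: keep the scheme the client used (client's point of view).
    # Policy B: use the scheme of the backend hop, assumed to be plain HTTP.
    scheme = parts.scheme if preserve_client_scheme else "http"
    return {
        ":method": method,
        ":scheme": scheme,
        ":authority": parts.netloc,
        ":path": path,
    }


url = "https://example.com/api?x=1"  # hypothetical client request
print(backend_pseudo_headers(url, preserve_client_scheme=True)[":scheme"])
print(backend_pseudo_headers(url, preserve_client_scheme=False)[":scheme"])
```

Both results are defensible readings of the RFC text, which is exactly why two load balancers can disagree here.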

Information on why developers of load balancers chose one behavior or the other is hard to come by. At least for our new load balancer HAProxy we were able to find a rationale:

The :scheme is a design mistake of HTTP/2 inherited from SPDY. It should never had made its way into the protocol for client-to server requests since it's only useful to proxies to know how to forward the request. It is set to https in order to make sure that if a server ever wants to check what's there, at least it will look like what browsers put there, so it's a principle of least surprise.

The fix 

Back to our 502 problem: Why should the web application even care about the contents of the :scheme header? It can see for itself whether the request arrived through a TLS tunnel or not. Well, most integrated web servers actually don’t care. At this point we had already migrated multiple HTTP/2 servers to our new load balancer without encountering any issues. But this one was the first running on ASP.NET Kestrel, the web server of choice for many C# web applications. Apparently, as part of its request validation, Kestrel checks whether a request with scheme https really arrived via TLS. In our case it did not, because the load balancer terminated TLS. That caused the RST_STREAM frames with error code PROTOCOL_ERROR in our packet capture. The load balancer did not expect this response and returned 502 to the client. After some searching we found the Kestrel option AllowAlternateSchemes, which is disabled by default:

If false then the `:scheme` field for HTTP/2 and HTTP/3 requests must exactly match the transport (e.g. https for TLS connections, http for non-TLS). If true then the `:scheme` field for HTTP/2 and HTTP/3 requests can be set to alternate values and this will be reflected by `HttpRequest.Scheme`. 
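The documented behavior can be modeled in a few lines of Python. This is a sketch of the rule as the option description states it, not Kestrel's actual code; the function `check_scheme` is hypothetical, and the real implementation may perform additional checks (for example on the syntax of the scheme or on letter case).

```python
from typing import Optional


def check_scheme(scheme: str, connection_is_tls: bool,
                 allow_alternate_schemes: bool = False) -> Optional[str]:
    """Return None if the request is accepted, else an HTTP/2 error name."""
    transport_scheme = "https" if connection_is_tls else "http"
    if scheme == transport_scheme:
        return None  # :scheme matches the transport, always accepted
    if allow_alternate_schemes:
        return None  # mismatch tolerated because the option is enabled
    return "PROTOCOL_ERROR"  # mismatch with default settings: stream is reset


# Our situation: :scheme is https, but the backend connection has no TLS.
print(check_scheme("https", connection_is_tls=False))
print(check_scheme("https", connection_is_tls=False, allow_alternate_schemes=True))
```

With the default setting, the first call models the rejection we saw in the packet capture; with the option enabled, the same request passes.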

For those interested in the implementation: Here you can find the place in the Kestrel source code that denies the request. We enabled this flag at a central place in our codebase and now it will be set for all current and future C# projects. Alternatively we could have forced the load balancer to rewrite the :scheme header. As Kestrel seems to be the only web server having an issue with that header we decided against it.

Learnings 

Of course diagnosing the problem was not as straightforward as this post may make you believe. We investigated a lot of different leads until we found the root cause. The following learnings will help us (and maybe you) diagnose issues like this earlier:

Create packet captures (e.g., via tcpdump) early. When your usual observability measures like logs and metrics fail, packet captures show you the raw truth. 

Add occurrences of RST_STREAM messages to your load balancer logs. The error code can really help to narrow down why a request was abruptly denied. 

Just because two software products are RFC-compliant does not mean they behave the same in every aspect. Sooner or later, there is a case the RFC did not properly define. And then each product behaves as its developers thought would be best. We’re human, and so these judgment calls won’t always be the same.


About the Author

Philipp Hossner, Staff Engineer, works on various infrastructure projects across DeepL. His work is focused on everything necessary to transport HTTP requests from client to server. He likes to learn about new features and quirks of load balancers, caches and the HTTP protocols in general.

Connect with Philipp at https://github.com/phihos.
