The IETF HTTP working group has issued a last call for comments on the proposed HTTP/2 standard, which means that the process has entered the final stage of open community review before the current draft may become an RFC. Jetty has implemented the proposal already and this website is running it already! There is a lot of good in this proposed standard, but I have some deep reservations about some bad and ugly aspects of the protocol.
HTTP/2 is a child born of the SPDY protocol developed by Google and continues to seek the benefits that have been illuminated by that grand experiment. Specifically:
- The new protocol supports the same semantics as HTTP/1.1 which was recently clarified by RFC7230. This will allow most of the benefits of HTTP/2 to be used by applications transparently simply by upgrading client and server infrastructure, but without any application code changes.
- HTTP/2 is a multiplexed protocol that allows multiple request/response streams to share the same TCP/IP connection. It supports out of order delivery of responses so that it does not suffer from the same Head of Line Blocking issues as HTTP/1.1 pipeline did. Clients will no longer need multiple connections to the same origin server to ensure good quality of service when rendering a page made from many resources, which means a very significant savings in resources needed by the server and also reduces the sticky session problems for load balancers.
- HTTP headers are very verbose and highly redundant. HTTP/2 provides an effective compression algorithm (HPACK) that is tailored to HTTP and avoids many of the security issues with using general purpose compression algorithms over TLS connections. Reduced header size allows many requests to be sent over a newly opened TCP/IP connection without the need to wait for it’s congestion control window to grow to the capacity of the link. This significantly reduces the number of network round trips required to render a page.
- HTTP/2 supports pushed resources, so that an origin server can anticipate requests for associated resources and push them to the clients cache, again saving further network round trips.
You can see from these key features, that HTTP/2 is primarily focused on improving the speed to render a page, which is (as the book of speed points out) a good focus to have. To a lesser extent, the process has also considered through put and server resources, but these have not been key drivers and indeed data rates may even suffer under HTTP/2 and servers need to commit more resources to each connection which may consume much of the savings from fewer connections.
While the working groups was chartered to address the misuse of the underlying transport occurring in HTTP/1.1 (eg long polling), it did not make much of the suggestion to coordinate with other working groups regarding the possible future extension of HTTP/2.0 to carry WebSockets semantics. While a websocket over http2 draft has been written, some of the features that draft referenced have subsequently been removed from HTTP/2 and the protocol is primarily focused on providing HTTP semantics.
The proposed protocol does not have a clear separation between a framing layer and the HTTP semantics that can be carried by that layer. I was expecting to see a clear multiplexed, flow controlled framing layer that could be used for many different semantics including HTTP and webSocket. Instead we have a framing protocol aimed primarily at HTTP which to quote the drafts editor:
“What we’ve actually done here is conflate some of the stream control functions with the application semantics functions in the interests of efficiency” — Martin Thomson 8/May/2014
I’m dubious there are significant efficiencies from conflating layers, but even it there are, I believe that such a design will make it much harder to carry WebSocket or other new web semantics over the http2 framing layer. HTTP semantics are hard baked into the framing so intermediaries (routers, hubs, load balancers, firewalls etc.) will be deployed that will have HTTP semantic hard wired. The only way that any future web semantic will be able to be transported over future networks will be to use the trick of pretending to be HTTP, which is exactly the kind of misuse of the underlying transport, that HTTP/2 was intended to address. I know it is difficult to generalise from one example, but today we have both HTTP and WebSocket semantics widely used on the web, so it would have been sensible to consider both examples equally when designing the next generation web framing layer.
An early version of the draft had a header compression algorithm that was highly stateful which meant that a single streams headers had to be encoded/decoded in total before another streams headers could be encoded/decoded. Thus a restriction was put into the protocol to prevent headers being transmitted as multiple frames interleaved with other streams frames. Furthermore, headers are excluded from the multiplexing flow control algorithm because once encoded transmission cannot be delayed without stopping all other encoding/decoding.
The proposed standard has a less stateful compression algorithm so that it is now technically possible to interleave other frames between a fragmented header. It is still not possible to flow control headers, but there is no technical reason that a large header should prevent other streams from progressing. However a concern about denial of service was raised in the working group, and while I argued that it was no worse than without interleaving, the working group was unable to reach consensus to remove the interleaving restriction.
Thus HTTP/2 has a significant incentive for applications to move large data into headers, as this data will effectively take control of the entire multiplexed connection and will be transmitted at full network speed regardless of any http2 flow control windows or other streams that may need to progress. If applications are take up these incentives, then the quality of service offered by the multiplexed connection will suffer and the Head of Line Blocking issue that HTTP/2 was meant to address will return as large headers will hit TCP/IP flow control and stop all streams. When this does happen, clients are likely to do exactly as they did with HTTP/1.1 and ignore any specifications about connection limits and just open multiple connections, so that requests can overtake others that are using large headers to try to take an unfair proportion of a shared connection. This is a catastrophic scenario for servers as not only will we have the increased resource required by HTTP/2 connections, but we will also have the multiple connections required by HTTP/1.
I would like to think that I’m being melodramatic here and predicting a disaster that will never happen. However the history from HTTP/1.1 is that speed is king and that vendors are prepared to break the standards and stress the servers so that applications appear to run faster on their browsers, even if it is only until the other vendors adopt the same protocol abuse. I think we are needlessly setting up the possibility of such a catastrophic protocol fail to protect against a DoS attack vector that must be defended anyway.
There are many aspect of the protocol design that can’t be described as anything but ugly. But unfortunately even though many in the working group agree that they are indeed ugly, the IETF process does not consider aesthetic appeal and thus the current draft is seen to be without issue (even though many have argued that the ugliness means that there will be much misunderstanding and poor implementations of the protocol). I’ll cite one prime example:
A classic case of design ugliness is the END_STREAM flag. The multiplexed streams are comprised of a sequence of frames, some of which can carry the END_STREAM flag to indicate that the stream is ending in that direction. The draft captures the resulting state machine in this simple diagram:
+--------+ PP | | PP ,--------| idle |--------. / | | \ v +--------+ v +----------+ | +----------+ | | | H | | ,---| reserved | | | reserved |---. | | (local) | v | (remote) | | | +----------+ +--------+ +----------+ | | | ES | | ES | | | | H ,-------| open |-------. | H | | | / | | \ | | | v v +--------+ v v | | +----------+ | +----------+ | | | half | | | half | | | | closed | | R | closed | | | | (remote) | | | (local) | | | +----------+ | +----------+ | | | v | | | | ES / R +--------+ ES / R | | | `----------->| |<-----------' | | R | closed | R | `-------------------->| |<--------------------' +--------+ H: HEADERS frame (with implied CONTINUATIONs) PP: PUSH_PROMISE frame (with implied CONTINUATIONs) ES: END_STREAM flag R: RST_STREAM frame
That looks simple enough, a stream is open until an END_STREAM flag is sent/received, at which stage it is half closed, and then when another END_STREAM flag is received/sent the stream is fully closed. But wait there’s more! A stream can continue sending several frame types after a frame with the END_STREAM flag set and these frames may contain semantic data (trailers) or protocol actions that must be acted on (push promises) as well as frames that can just be ignored. This introduces so much complexity that the draft requires 7 paragraphs of dense text to specify the frame handling that must be done once your in the Closed state! It is as if TCP/IP had been specified without CLOSE_WAIT. Worse yet, it is as if you could continue to send urgent data over a socket after it has been closed!
This situation has occurred because of the conflation of HTTP semantics with the framing layer. Instead of END_STREAM being a flag interpreted by the framing layer, the flag is actually a function of frame type and the specific frame type must be understood before the framing layer can consider any flags. With HTTP semantics, it is only legal to end some streams on some particular frame types, so the END_STREAM flag has only been put onto some specific frame types in an attempt to partially enforce good HTTP frame type sequencing (in this case to stop a response stream ending with a push promise). It is a mostly pointless attempt to enforce legal type sequencing because there are an infinite number of illegal sequences that an implementation must still check for and making it impossible to send just some sequences has only complicated the state machine and will make future non-HTTP semantics more difficult. It is a real WTF moment when you realise that valid meta-data can be sent in a frame after a frame with END_STREAM and that you have to interpret the specific frame type to locate the actual end of the stream. It is impossible to write general framing code that handles streams regardless of their type.
The proposed standard allows padding to be added to some specific frame types as a “security feature“, specifically to address “attacks where compressed content includes both attacker-controlled plaintext and secret data (see for example, [BREACH])“. The idea being that padding can be used to hide the affects of compression on sensitive data. But as the draft says “padding is a security feature; as such, its use demands some care” and it turns out to be significant care that is required:
- “Redundant padding could even be counterproductive.”
- “Correct application can depend on having specific knowledge of the data that is being padded.”
- “To mitigate attacks that rely on compression, disabling or limiting compression might be preferable to padding as a countermeasure.”
- “Use of padding can result in less protection than might seem immediately obvious.”
- “At best, padding only makes it more difficult for an attacker to infer length information by increasing the number of frames an attacker has to observe.”
- “Incorrectly implemented padding schemes can be easily defeated.”
So in short, if you are a security genius with precise knowledge of the payload then you might be able to use padding, but it will only slightly mitigate an attack. If you are not a security genius, or you don’t know your what your application payload data is (which is just about everybody), then don’t even think of using padding as you’ll just make things worse. Exactly how an application is meant to tunnel information about the security nature of its data down to the frame handling code of the transport layer is not indicated by the draft and there is no guidance to say what padding to apply other than to say don’t use randomized padding.
I doubt this feature will ever be used for security, but I suspect that it will be used for smuggling illicit data through firewalls.
This blog is not a call others to voice support for these concerns in the working group. The IETF process does not work like that, there are no votes and weight of numbers does not count. But on the other hand don’t let me discourage you from participating if you feel you have something to contribute other than numbers.
There has been a big effort by many in the working group to address the concerns that I’ve described here. The process has given critics fair and ample opportunity to voice concerns and to make the case for change. But despite months of dense debate, there is no consensus in the WG that the bad/ugly concerns I have outlined here are indeed issues that need to be addressed. We are entering a phase now where only significant new information will change the destiny of http/2, and that will probably have to be in the form of working code rather than voiced concerns (an application that exploits large headers to the detriment of other tabs/users would be good, or a DoS attack using continuation trailers).
Finally, please note that my enthusiasm for the Good is not dimmed by my concerns for the Bad and Ugly. The Jetty team is well skilled to deal with the Ugly for you and we’ll do our best to hide the Bad as well, so you’ll only see the benefits of the Good. Jetty-9.3 is currently available as a development branch and currently supports draft 14 of HTTP/2 and this website is running on it!. Work is under way on the current draft 14 and that should be supported in a few days. We are reaching out to users and clients who would like to collaborate on evaluating the pros/cons of this emerging standard.