Is HTTP A Networking Protocol (Pt 2) - The Evolution of HTTP
Chidiadi Anyanwu
OCI Certified Architect Associate | AZ-900 | HCIA-Datacom | BEng Electrical Engineering, University of Port Harcourt | Polymath
In the last article, we introduced HTTP and talked about its working principle, methods and status codes. Now, we want to dig deeper into the HTTP protocol: its evolution, frame structure and semantics.
HTTP has gone through four major versions so far: 1.0, 1.1, 2.0, and now 3.0.
HTTP/1.0 - Released in 1996. In this version, every request sent to the same server required a new TCP connection.
HTTP/1.1 - Released in 1997. This version introduced the "keep-alive" mechanism, so a TCP connection could be reused for more than one request. Pipelining was also introduced here, which meant you could send another request before the response to the first one arrived.
HTTP/1.1 suffered from a problem called head-of-line blocking (HOLB). Clients and servers can only open a limited number of TCP connections at a time, and on each connection, responses have to come back in the same order the requests went out. So if you had 15 requests to send in a particular order, you would have to wait for the current connections to free up before you could establish more and send the rest.
The 13th request, for example, would be delayed while the 9th had not gone through, because they have to go in order. The requests at the head of the line block the ones behind them. Pipelining reduced the problem, but it was still there.
HTTP/2.0 - Released in 2015. It was based on Google's SPDY protocol, which had been in development for a few years before. Rather than using text like the previous versions, this version transmits frames in binary. It also introduced header compression, and a server "push" feature that allows servers to send content the client has not requested but may need.
HTTP/2.0 also tried to solve the HOLB problem by introducing HTTP streams, a form of multiplexing where multiple requests are sent over a single TCP connection and the responses can come back in any order. This solved the problem at the HTTP level, but a problem remained. HTTP is an application layer protocol in the TCP/IP stack; it sits on top of TCP as its transport layer protocol, and TCP guarantees in-order delivery. TCP does not know that the streams are independent and can be processed separately. It only sees packets with sequence numbers, and it makes sure every byte has arrived, in order, before handing anything to the application. So if packets 2 through 15 arrived safely but packet 1 was lost in transmission, all the others would wait for the first packet to be retransmitted before any processing was done. That is another form of HOLB, but at the TCP level: the transport level, not the application level.
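The TCP-level blocking described above can be sketched with a toy model. This is not real TCP code, just an illustration of the in-order delivery rule: nothing after a gap is delivered to the application until the gap is filled. The function name and segment numbering are invented for this example.

```python
# Toy model of TCP-level head-of-line blocking: TCP delivers data to the
# application strictly in sequence order, so one lost segment stalls
# everything that arrived after it.
def tcp_deliverable(received_seqs, next_expected=1):
    """Return the segment numbers TCP can hand to HTTP, in order."""
    delivered = []
    while next_expected in received_seqs:
        delivered.append(next_expected)
        next_expected += 1
    return delivered

# Segments 2-15 arrived safely, but segment 1 was lost in transit.
arrived = set(range(2, 16))
print(tcp_deliverable(arrived))  # [] -- nothing is delivered until 1 arrives

arrived.add(1)  # the retransmission of segment 1 finally arrives
print(tcp_deliverable(arrived))  # [1, 2, ..., 15] -- everything unblocks at once
```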
HTTP/3.0 - Released in 2022. This version did away with TCP and used QUIC, a new transport protocol that runs on top of UDP. Since UDP does not guarantee delivery or ordering, there is no transport layer head-of-line blocking. The QUIC protocol then handles retransmission, error control and flow control in a way that benefits HTTP, and it supports streams at the transport layer.
The QUIC protocol also eliminates the need for a separate SSL/TLS layer, as TLS 1.3 is built into QUIC itself.
What is HTTPS?
HTTP on its own is an unencrypted protocol, so sending data over plain HTTP leaves your system very vulnerable: anyone who intercepts the traffic can read the data being sent. So engineers came up with an idea. We already had SSL (Secure Sockets Layer) encryption, now superseded by TLS (Transport Layer Security). They could just wrap HTTP in it and call it HTTP Secure. So, HTTPS was born.
Hypertext Transfer Protocol Secure is basically the HTTP protocol with the added security of SSL/TLS, and that is how HTTP is usually deployed today. The URL of the website you visit tells you which protocol was used: https:// rather than http://.
Browsers also try to warn us when we visit sites that are not properly secured, usually by showing a padlock symbol for secured sites. However, a website not having a padlock symbol does not necessarily mean the connection is unencrypted. The site may be using a self-signed certificate rather than one signed by a Certificate Authority, and most browsers do not trust self-signed certificates.
Protocol Stack & Frame Structure
The HTTP protocol was built on the TCP/IP reference model. For HTTP/2, the stack is basically HTTP on top of TLS on top of TCP on top of IP; for HTTP/3, it is HTTP on top of QUIC (with TLS 1.3 inside) on top of UDP on top of IP.
The HTTP/2 frame looks like this:
HTTP/2 frame {
Length (24),
Type (8),
Flags (8),
Reserved (1),
Stream Identifier (31),
Frame Payload (..),
}
So, the Length field is 24 bits, the Type field is 8 bits, the Flags field is 8 bits, the Reserved field is 1 bit and the Stream Identifier is 31 bits. The Frame Payload (which is where the requests and responses sit) is variable in length; its length in octets is what the Length field carries.
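As a sketch, the 9-octet frame header above can be parsed with a few lines of Python. The helper name is my own; the field layout follows the diagram (24-bit length, 8-bit type, 8-bit flags, then 1 reserved bit and a 31-bit stream identifier).

```python
# Frame type values as listed below the diagram (HTTP/2).
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY"}

def parse_frame_header(header: bytes):
    assert len(header) == 9, "an HTTP/2 frame header is always 9 octets"
    length = int.from_bytes(header[0:3], "big")   # 24-bit payload length
    frame_type = header[3]                        # 8-bit type
    flags = header[4]                             # 8-bit flags
    # Mask off the top (reserved) bit to get the 31-bit stream identifier.
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
    return length, FRAME_TYPES.get(frame_type, hex(frame_type)), flags, stream_id

# A HEADERS frame: 13-octet payload, flags 0x04, on stream 1.
raw = b"\x00\x00\x0d\x01\x04\x00\x00\x00\x01"
print(parse_frame_header(raw))  # (13, 'HEADERS', 4, 1)
```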
From HTTP/2, the headers are compressed and sent in a different frame from the data or body. Here are a few HTTP frame types and their values:
DATA Frame (0x00)
HEADERS Frame (0x01)
PRIORITY Frame (0x02)
For HTTP/3, there is a RESERVED Frame instead of a PRIORITY Frame.
HTTP SEMANTICS
HTTP Semantics refer to all the communication happening around an HTTP resource. They include the intentions described in request methods and headers, the status codes received by the client, and all the control data and resource metadata exchanged.
REQUEST & RESPONSE MESSAGES
HTTP messages sit in the payload of the HTTP frame, which is encrypted and sent in a TCP packet (or a QUIC packet in the case of HTTP/3). The messages consist of a start line, the headers, an empty line, and an optional message body.
For an HTTP request, the request line consists of the request method, the target URI and the protocol version.
For an HTTP response, the status line consists of the protocol version, the status code and the status message.
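A minimal sketch of these message layouts, using a hypothetical helper (the function name, headers and URLs are made up for illustration):

```python
# Build an HTTP/1.1-style message: request line, header lines, blank line, body.
def build_request(method, target, headers, body=""):
    request_line = f"{method} {target} HTTP/1.1"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join([request_line, *header_lines, "", body])

msg = build_request("GET", "/index.html", {"Host": "example.com", "Accept": "text/html"})
print(msg.splitlines()[0])  # GET /index.html HTTP/1.1

# The status line of a response splits into the same three parts in reverse roles:
status_line = "HTTP/1.1 200 OK"
version, code, reason = status_line.split(" ", 2)
print(version, code, reason)  # HTTP/1.1 200 OK
```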
HEADERS
HTTP headers let clients and servers include additional information in request and response messages. In HTTP/1.1, the headers and data travel in the same message, but from HTTP/2.0, the headers and body get separated into HEADERS frames and DATA frames. The HEADERS frames are compressed with HPACK in HTTP/2.0 and QPACK in HTTP/3.0.
The headers can be categorized either by context or by how proxy servers handle them. Here, we'll only look at the basic classes by context, and they are:
General headers are headers that are used in any type of message, request or response. General headers do not apply to the content of the message. Examples include:
The Request Method shows the method used in the request. The Request URL specifies the URI of the target resource.
Request headers are headers used to provide context in an HTTP request. Examples include:
The Accept header is used to tell the server what content types the client can understand.
The User-Agent header identifies the application and operating system of the device sending the request.
Response headers are used to provide context in an HTTP response. Examples include:
The Date header is used to specify the date and time of the response. The Server header is used to specify the type of server that sent the response.
Entity headers are headers that are used to provide information relating to the content of the message. They can be divided into representation headers and payload headers, but I'm not going into that now. Examples include:
The Content-Type header specifies the type of content carried in the message body. It takes values like text/html, text/css, application/json, image/jpeg, image/png and others.
The Content-Encoding header tells the format in which the content of the message is encoded, such as gzip.
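To tie the classes together, here is a small illustrative sketch. The grouping mirrors the categories above (it is this article's classification, not a formal registry), and all header values are made up:

```python
# A few headers from a hypothetical response.
raw_headers = {
    "Date": "Tue, 01 Aug 2023 12:00:00 GMT",  # response header
    "Server": "nginx",                        # response header
    "Content-Type": "application/json",       # entity header
    "Content-Encoding": "gzip",               # entity header
}

# Entity headers describe the content of the message body.
ENTITY = {"Content-Type", "Content-Encoding", "Content-Length", "Content-Language"}

entity = {k: v for k, v in raw_headers.items() if k in ENTITY}
print(entity)  # {'Content-Type': 'application/json', 'Content-Encoding': 'gzip'}

# A Content-Type value may carry parameters after a ';' -- the media type comes first.
media_type = "text/html; charset=utf-8".split(";")[0].strip()
print(media_type)  # text/html
```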
QUIC
I saw someone on the internet say that QUIC meant Quick UDP Internet Connections, and I saw others give stern warnings that it is not an acronym but a name. I didn't believe either of them, so I decided to check the documentation. RFC 9000 never expands QUIC at all; the IETF uses it simply as the name of the protocol, even though Google's earlier gQUIC did stand for Quick UDP Internet Connections.
The QUIC protocol has managed to partially take the place of TCP in HTTP/3, and everyone has been talking about it.
The QUIC protocol is based on the gQUIC protocol, which Jim Roskind started at Google in 2012. In 2016, the IETF established a working group to standardize the protocol; the core transport spec was published as RFC 9000 in May 2021, and HTTP/3 itself followed in June 2022.
It aimed at solving some of the problems in HTTP/2.0 and making up for the shortcomings of TCP in relation to HTTP. The QUIC protocol offers some interesting features including in-built TLS 1.3, zero round trip time (0-RTT) and even connection migrations which we're going to explain in a bit.
The QUIC protocol took advantage of existing transport layer infrastructure by sitting on top of the UDP protocol. It also bypasses TCP head-of-line blocking by using UDP because UDP does not stop any packet or care to know what's going on. It just sends packets and forgets them.
With HTTP/3.0, the QUIC protocol handles retransmission, flow control and error control, and uses QUIC streams (carried in STREAM frames) to send the data. This solves the L4 HOLB problem because the transport layer functions are now done by QUIC, which was designed with HTTP in mind, rather than by TCP, which was not.
With QUIC, every packet has a unique packet number and a stream ID, so retransmitted packets cannot be confused with other packets. Also, the streams are independent so if there's blocking, only messages in the same stream are blocked. Other streams are unaffected.
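That per-stream independence can be sketched the same way as the earlier TCP example, assuming in-order delivery is enforced per stream rather than per connection (names and packet numbers are illustrative):

```python
# Toy model: QUIC reassembles each stream independently, so a lost packet
# only stalls the stream it belonged to.
def deliverable_per_stream(received):
    """received: {stream_id: set of packet numbers}. Delivery is in-order per stream."""
    out = {}
    for stream_id, pkts in received.items():
        delivered, nxt = [], 1
        while nxt in pkts:
            delivered.append(nxt)
            nxt += 1
        out[stream_id] = delivered
    return out

# Stream 1 lost its first packet; streams 3 and 5 arrived complete.
received = {1: {2, 3}, 3: {1, 2}, 5: {1}}
print(deliverable_per_stream(received))  # {1: [], 3: [1, 2], 5: [1]}
```

Only stream 1 is blocked waiting for a retransmission; streams 3 and 5 are delivered immediately, which is exactly what TCP could not do.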
Connection Migration
Sometimes, when you're connected to the internet through Wi-Fi and you switch to cellular, your IP address changes, and when your IP address changes, your TCP connection gets terminated. With QUIC, each connection is identified by a connection ID (a variable-length identifier chosen by the endpoints) rather than by IP addresses and ports. So in a case where your IP address changes or you move locations, your QUIC connection will not be terminated. The connection ID mechanism also helps QUIC cope better with network address translation (NAT).
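A toy contrast between the two lookup styles makes the point; all addresses, ports and IDs below are made up:

```python
# TCP identifies a connection by the (src IP, src port, dst IP, dst port) 4-tuple,
# so a new client IP means the old connection can no longer be matched.
tcp_connections = {("203.0.113.5", 40000, "198.51.100.7", 443): "session-A"}

# QUIC identifies a connection by its connection ID, regardless of addresses.
quic_connections = {"c1d2e3f4": "session-A"}  # hypothetical connection ID

def tcp_lookup(src_ip, src_port, dst_ip, dst_port):
    return tcp_connections.get((src_ip, src_port, dst_ip, dst_port))

def quic_lookup(connection_id):
    return quic_connections.get(connection_id)

# Client switches from Wi-Fi to cellular: its source IP changes.
print(tcp_lookup("198.18.0.9", 40000, "198.51.100.7", 443))  # None -- connection lost
print(quic_lookup("c1d2e3f4"))                               # session-A -- survives
```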
0-RTT
Round trip time (RTT) is the time it takes for a device to send a message and get a reply within a network. When discussing handshakes, we usually count the number of round trips needed before real data can start flowing.
With TCP, devices have to complete a 3-way handshake, set up a TLS session and only then send their messages. That's about 3 or more round trips. With QUIC, devices can remember the encryption keys from a previous session, so the client can send an encrypted request right away without setting anything up. The client needed zero round trips before it could start sending requests to the server. This is known as zero round trip time (0-RTT).
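A back-of-the-envelope count of the round trips before the first request can leave the client, under common assumptions (TLS 1.2 taking 2 round trips, TLS 1.3 taking 1, and QUIC combining its transport and TLS handshakes):

```python
# Round trips before the client can send its first encrypted request.
handshake_rtts = {
    "TCP + TLS 1.2": 1 + 2,      # TCP 3-way handshake, then a 2-RTT TLS handshake
    "TCP + TLS 1.3": 1 + 1,      # TLS 1.3 cut its handshake down to 1 RTT
    "QUIC (first visit)": 1,     # transport and TLS 1.3 handshakes are combined
    "QUIC 0-RTT (resumed)": 0,   # remembered keys let data ride the first flight
}
for setup, rtts in handshake_rtts.items():
    print(f"{setup}: {rtts} round trip(s) before the first request")
```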
All these shiny new features don't mean that QUIC is going to replace TCP, though. QUIC was built for HTTP, while TCP is a general-purpose transport layer protocol. QUIC still has a few problems, especially with "middleboxes" like proxies, load balancers and firewalls. The fact that QUIC uses UDP makes it problematic: some firewalls simply block or throttle UDP traffic, which prevents QUIC from working in those environments.
The QUIC Packets
The QUIC protocol has basically two categories of packets: long header packets and short header packets. Long header packets are used before the establishment of 1-RTT keys, and short header packets are used after the version and 1-RTT keys are negotiated. I won't go deep into that, but you can read about it in Section 17 of the documentation (RFC 9000).
If you enjoyed the article, please like and comment. And subscribe to the newsletter.
Subscribe to the Telegram channel to get past and future content.
Telegram Channel: https://t.me/SpecificKnowledge
Also follow Specific Knowledge on Twitter: https://twitter.com/specificknowhow
Thank you.