Is HTTP A Networking Protocol (Pt 2) - The Evolution of HTTP
Photo by Miguel Á. Padriñán at Pexels

In the last article, we introduced HTTP, talked about its working principle, methods and status codes. Now, we want to dig deeper into the HTTP protocol, its evolution and frame structure. We'll deal with:

  • The evolution of HTTP
  • HTTPS
  • HTTP protocol stack & frame structure
  • The QUIC protocol

HTTP has gone through four major versions since HTTP/1.0: 1.0, 1.1, 2.0 and now 3.0. (The last two are officially named HTTP/2 and HTTP/3.)

HTTP/1.0 - Released in 1996. In this version, every request sent to the same server required a new TCP connection by default.

HTTP/1.1 - Released in 1997. This version made persistent ("keep-alive") connections the default, so one TCP connection could be used for more than one request. Pipelining was also introduced here, which means that a client could send another request before the first one had been answered.

HTTP/1.1 suffered from a problem called head-of-line blocking (HOLB). On a single connection, responses have to come back in the same order as the requests, so one slow response holds up every response queued behind it. Clients work around this by opening several TCP connections to the same server, but browsers cap that number (commonly around six per host). This meant that if you had to send 15 requests to a server, most of them would have to wait for a connection to become free.

The 13th request, for example, is delayed while the 9th is still waiting for its response. The requests at the head of the line block the ones behind them. Pipelining let the client send requests back to back, but the responses still had to be returned in order, so the problem remained.
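One way to picture head-of-line blocking on a single HTTP/1.1 connection is with a toy model. The sketch below is not an HTTP client. The function name and the timing values are made up for illustration, and it only assumes that responses must be delivered in request order.

```python
def in_order_delivery_times(ready_times):
    """Return when each response can be delivered on one connection.

    ready_times[i] is the moment the server has response i ready.
    Responses must go out in request order, so response i can never
    be delivered before response i-1.
    """
    delivered = []
    previous = 0.0
    for ready in ready_times:
        previous = max(previous, ready)  # wait for everything ahead of us
        delivered.append(previous)
    return delivered


# Request 1 is slow (ready at t=5.0 s); requests 2 and 3 are ready almost at once.
print(in_order_delivery_times([5.0, 0.2, 0.3]))  # [5.0, 5.0, 5.0]
```

The two fast responses are ready almost immediately, but they are stuck behind the slow one at the head of the line.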

HTTP/2.0 - Released in 2015. It was based on Google's SPDY protocol, which had been in development for a few years before that. Rather than plain text like the previous versions, this version transmits messages in a binary framing format. It also introduced header compression, and a "push" feature that allowed servers to send content that the client had not requested yet but would probably need.

HTTP/2.0 also tried to solve the HOLB problem by introducing HTTP streams, a form of multiplexing where multiple requests are sent over a single TCP connection and the responses can come back in any order. This solved the problem at the HTTP level, but not underneath it.

HTTP is an application layer protocol (in the TCP/IP stack) that sits on top of TCP as its transport layer protocol, and TCP delivers data to the application strictly in order. TCP does not know that the requests are independent and could be processed separately. It only sees one stream of bytes with sequence numbers, and it will not hand anything over until everything before it has arrived. So, if 15 packets were sent and the first one was lost in transmission, the other 14 would sit in the receive buffer until the first packet was retransmitted, even though they arrived safely. That is another form of HOLB, but at the transport level rather than the application level.
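The in-order rule can be sketched in a few lines of Python. This is a simplified model, assuming numbered segments instead of TCP's real byte-level sequence numbers, and the function name is hypothetical.

```python
def deliverable_segments(received, next_expected=1):
    """Return the segments that can be handed to the application.

    Only a gap-free run starting at next_expected is deliverable.
    Anything after a gap has to wait in the receive buffer.
    """
    pending = set(received)
    delivered = []
    while next_expected in pending:
        delivered.append(next_expected)
        next_expected += 1
    return delivered


# Segment 1 was lost, segments 2 to 5 arrived safely: nothing can be delivered.
print(deliverable_segments([2, 3, 4, 5]))  # []

# Segment 3 was lost: only 1 and 2 get through, 4 and 5 are stuck behind the gap.
print(deliverable_segments([1, 2, 4, 5]))  # [1, 2]
```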

HTTP/3.0 - Released in 2022. This version replaced TCP with QUIC, a new transport protocol that runs on top of UDP. Since UDP does not enforce ordering or reliability, there is no transport layer head-of-line blocking between streams. The QUIC protocol then handles retransmission, error control and flow control itself, in a way that benefits HTTP and supports streams on the transport layer.

The QUIC protocol also removes the need for a separate SSL/TLS layer under HTTP, because the TLS 1.3 handshake is built into QUIC.

What is HTTPS?

HTTP on its own is an unencrypted protocol, so sending data over HTTP makes your system very vulnerable. Anyone who intercepts the traffic can see the data being sent. So, engineers came up with a new idea. We already had SSL (Secure Sockets Layer) encryption, which has since been succeeded by TLS (Transport Layer Security). They could just encrypt HTTP with it and call it HTTP Secure. So, HTTPS was born.

Hypertext Transfer Protocol Secure is basically the HTTP protocol with the added security of SSL/TLS. That is how HTTP is usually deployed today. The URL of the website you visit will tell you which protocol is in use (http:// or https://).
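As a small illustration of reading the protocol off a URL, the Python sketch below uses the standard library's urlsplit. The helper name describe_url and the example URLs are made up. The default ports 80 and 443 are the usual ones for HTTP and HTTPS.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url):
    """Return (scheme, port, is_encrypted) for an http or https URL."""
    parts = urlsplit(url)
    scheme = parts.scheme
    # Use the port written in the URL if there is one, else the scheme's default.
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, port, scheme == "https"


print(describe_url("https://example.com/login"))     # ('https', 443, True)
print(describe_url("http://example.com:8080/page"))  # ('http', 8080, False)
```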

Browsers also try to warn us when we visit sites that are not properly secured. Sites they consider secure usually get a padlock symbol in the address bar. However, a website without the padlock is not necessarily unencrypted. It may be using a self-signed certificate instead of one signed by a Certificate Authority, and most browsers do not trust self-signed certificates.
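Browsers are not the only clients that behave this way. As a hedged illustration, Python's standard ssl module builds a client context that insists on a certificate chain it can verify against trusted Certificate Authorities, so a self-signed certificate would normally be rejected during the handshake. The helper name below is made up, and the snippet only inspects the default settings without connecting anywhere.

```python
import ssl


def default_client_tls_policy():
    """Report how Python's default client-side TLS context treats certificates."""
    context = ssl.create_default_context()
    return {
        # CERT_REQUIRED: the server must present a certificate that verifies.
        "certificate_required": context.verify_mode == ssl.CERT_REQUIRED,
        # The certificate must also match the host name we asked for.
        "hostname_checked": context.check_hostname,
    }


print(default_client_tls_policy())
# {'certificate_required': True, 'hostname_checked': True}
```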

Protocol Stack & Frame Structure

The HTTP protocol was built on the TCP/IP reference model. For HTTP/2, the stack is HTTP on top of TLS on top of TCP on top of IP. For HTTP/3, it is HTTP on top of QUIC on top of UDP on top of IP.

[Image: HTTP/2 vs HTTP/3]

The HTTP/2 frame looks like this:

HTTP/2 frame {
  Length (24),
  Type (8),
  Flags (8),
  Reserved (1),
  Stream Identifier (31),
  Frame Payload (..),
}

So, the Length field is 24 bits, the Type field is 8 bits, the Flags field is 8 bits, the Reserved field is 1 bit and the Stream Identifier is 31 bits. That adds up to a fixed 9-byte header. The Frame Payload (which is where the requests and responses sit) is variable in length.
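The field widths above are enough to pack and unpack the fixed part of a frame by hand. Here is a minimal Python sketch under that assumption. The function names are my own, and the values in the example (a 16-byte payload, type 0x01, flags 0x04, stream 1) are arbitrary.

```python
def pack_frame_header(length, frame_type, flags, stream_id):
    """Build the 9-byte HTTP/2 frame header from its fields."""
    if not 0 <= length < 2**24:
        raise ValueError("Length must fit in 24 bits")
    if not 0 <= stream_id < 2**31:
        raise ValueError("Stream Identifier must fit in 31 bits")
    return (
        length.to_bytes(3, "big")        # Length (24)
        + bytes([frame_type, flags])     # Type (8), Flags (8)
        + stream_id.to_bytes(4, "big")   # Reserved (1) = 0, Stream Identifier (31)
    )


def parse_frame_header(header):
    """Split a 9-byte HTTP/2 frame header back into its fields."""
    if len(header) != 9:
        raise ValueError("An HTTP/2 frame header is exactly 9 bytes")
    length = int.from_bytes(header[0:3], "big")
    frame_type = header[3]
    flags = header[4]
    # Mask off the reserved top bit to get the 31-bit stream identifier.
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
    return length, frame_type, flags, stream_id


header = pack_frame_header(length=16, frame_type=0x01, flags=0x04, stream_id=1)
print(header.hex())                # 000010010400000001
print(parse_frame_header(header))  # (16, 1, 4, 1)
```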

From HTTP/2, the headers are compressed and sent in a different frame from the data or body. Here are a few HTTP/2 frame types and their values:

DATA Frame (0x00)

HEADERS Frame (0x01)

PRIORITY Frame (0x02)

In HTTP/3, there is no PRIORITY frame. The type value 0x02 is reserved instead.

HTTP SEMANTICS

HTTP semantics refers to everything that is communicated about an HTTP resource. It includes the intentions described in request methods and headers, the status codes received by the client, and all the control data and resource metadata exchanged.

REQUEST & RESPONSE MESSAGES

HTTP messages sit in the payload of the HTTP frame, which is encrypted and sent in a TCP segment, or in a QUIC packet in the case of HTTP/3. The messages consist of:

  • Request/Response line
  • Headers
  • Body

For an HTTP request, the request line consists of the request method, target URI and the protocol version.

[Image: HTTP request message format]

For an HTTP response, the response line (also called the status line) consists of the protocol version, status code and status message.

[Image: HTTP response message format]
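The three-part layout (start line, headers, body) is easiest to see in the text form used by HTTP/1.1. The parser below is a simplified sketch that assumes well-formed messages. The function name and the sample messages are made up.

```python
def parse_http1_message(raw):
    """Split an HTTP/1.1 message into (start_line, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")

    first, second, third = lines[0].split(" ", 2)
    if first.startswith("HTTP/"):
        # Response line: protocol version, status code, status message.
        start = {"kind": "response", "version": first,
                 "status_code": int(second), "status_message": third}
    else:
        # Request line: method, target URI, protocol version.
        start = {"kind": "request", "method": first,
                 "target": second, "version": third}

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start, headers, body


request = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\nAccept: text/html\r\n\r\n"
response = (b"HTTP/1.1 404 Not Found\r\nContent-Type: text/plain\r\n"
            b"Content-Length: 9\r\n\r\nnot found")

print(parse_http1_message(request)[0])
print(parse_http1_message(response)[0])
```

In HTTP/2 and HTTP/3 the same pieces of information are still there, but they travel as binary frames rather than as lines of text.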

HEADERS

HTTP headers let the clients and servers include additional information in request and response messages. In HTTP/1.1, the headers and the body travel together in one text message, but from HTTP/2.0, they get separated into HEADERS frames and DATA frames. The HEADERS frames are compressed with HPACK for HTTP/2.0 and QPACK for HTTP/3.0.
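To get a feel for why header compression helps, here is a deliberately tiny sketch of one HPACK idea, the indexed header field. Very common headers live in a predefined static table, and a header that matches an entry exactly is sent as a single byte (a 1 bit followed by the 7-bit table index). The table below is only the first few entries of the static table in RFC 7541, the function name is my own, and real HPACK also has a dynamic table and Huffman coding that are left out here.

```python
# A small slice of the HPACK static table (RFC 7541, Appendix A).
STATIC_TABLE = {
    (":method", "GET"): 2,
    (":method", "POST"): 3,
    (":path", "/"): 4,
    (":path", "/index.html"): 5,
    (":scheme", "http"): 6,
    (":scheme", "https"): 7,
    (":status", "200"): 8,
}


def encode_indexed(headers):
    """Encode headers that are all present in the static table, one byte each."""
    encoded = bytearray()
    for field in headers:
        index = STATIC_TABLE[field]    # KeyError if not in this tiny table
        encoded.append(0x80 | index)   # top bit set = "indexed header field"
    return bytes(encoded)


headers = [(":method", "GET"), (":scheme", "https"), (":path", "/")]
plain_size = sum(len(name) + len(value) for name, value in headers)

print(encode_indexed(headers).hex())  # 828784
print(plain_size, "characters as text vs", len(encode_indexed(headers)), "bytes")
```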

The headers can be categorized either by context or by how proxy servers handle them. Here, we'll only look at the basic classes by context, and they are:

  • General headers
  • Request headers
  • Response headers
  • Entity headers

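General headers are headers that can be used in any type of message, request or response. General headers do not apply to the content of the message. Examples include:

  • Cache-Control
  • Connection

The Cache-Control header carries caching directives, and the Connection header (in HTTP/1.1) controls whether the connection stays open after the current exchange. Note that Chrome's developer tools show the Request URL, Request Method and Status Code under a section called "General", but these are not header fields. They come from the request line and the response line.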

Request headers are headers used to provide context in an HTTP request. Examples include:

  • Accept
  • User-Agent

The Accept header is used to tell the server what content types the client can understand.

The User-Agent header identifies the application and operating system of the device sending the request.

Response headers are used to provide context in an HTTP response. Examples include:

  • Date
  • Server

The Date header is used to specify the date and time of the response. The Server header is used to specify the type of server that sent the response.

Entity headers are headers that are used to provide information relating to the content of the message. They can be divided into representation headers and payload headers, but I'm not going into that now. Examples include:

  • Content-Type
  • Content-Encoding

The Content-Type header specifies the media type of the content carried in the message body. It takes values like text/html, text/css, application/json, image/jpeg, image/png and others.

The Content-Encoding header tells the receiver which encoding (usually a compression scheme such as gzip or br) has been applied to the content of the message.
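These two headers work as a pair: one says what the content is, the other says how it was packed for the trip. Below is a minimal sketch using Python's standard gzip and json modules. The helper names and the sample payload are hypothetical, and a real server would also negotiate the encoding based on the client's request headers.

```python
import gzip
import json


def build_response_body(payload):
    """Serialize a payload as JSON, gzip it, and describe it with headers."""
    raw = json.dumps(payload).encode("utf-8")
    compressed = gzip.compress(raw)
    headers = {
        "Content-Type": "application/json",   # what the content is
        "Content-Encoding": "gzip",           # how it was encoded for transfer
        "Content-Length": str(len(compressed)),
    }
    return headers, compressed


def read_response_body(headers, body):
    """Undo the encoding first, then interpret the content by its type."""
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    if headers.get("Content-Type") == "application/json":
        return json.loads(body.decode("utf-8"))
    return body


headers, body = build_response_body({"article": "HTTP Pt 2", "likes": 12})
print(headers["Content-Type"], headers["Content-Encoding"])
print(read_response_body(headers, body))
```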

[Image: Headers as shown in Chrome when visiting Cavalry website]

QUIC

I saw someone on the internet say that QUIC meant Quick UDP Internet Connections, and I saw others give stern warnings that it is not an acronym, but a name. I didn't believe either of them, so I decided to check the documentation, and this is what it says.

[Image: excerpt from RFC 9000, which states that "QUIC is a name, not an acronym."]

The QUIC protocol takes the place of TCP under HTTP/3, and it has attracted a lot of attention.

The QUIC protocol is based on the gQUIC protocol, which was started by Jim Roskind at Google in 2012. In 2016, the IETF established a working group to standardize the protocol. QUIC itself was published as RFC 9000 in May 2021, and HTTP/3 followed as RFC 9114 in June 2022.

It aimed at solving some of the problems in HTTP/2.0 and making up for the shortcomings of TCP in relation to HTTP. The QUIC protocol offers some interesting features, including built-in TLS 1.3, zero round trip time (0-RTT) connection setup and connection migration, which we're going to explain in a bit.

The QUIC protocol took advantage of existing transport layer infrastructure by sitting on top of the UDP protocol. It also bypasses TCP head-of-line blocking by using UDP because UDP does not stop any packet or care to know what's going on. It just sends packets and forgets them.

With HTTP/3.0, the QUIC protocol handles the retransmission, flow control and error control, and makes use of QUIC streams (or STREAM frames) to send the data. This solves the L4 HOLB problem because the transport layer functions are done by the QUIC protocol which is designed for HTTP rather than TCP that was not customised for HTTP.

With QUIC, every packet has a unique packet number, and the data inside it is carried in STREAM frames tagged with a stream ID. Lost data is retransmitted in a new packet with a new packet number, so a retransmission cannot be confused with the original packet. Also, the streams are independent, so if there's blocking, only messages in the same stream are blocked. Other streams are unaffected.
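The effect of independent streams can be shown with a toy model. In the sketch below, each received piece of data is a (stream ID, chunk number) pair, and ordering is only enforced inside a stream. This is a simplification, since real QUIC tracks byte offsets within each stream, and the function name is made up.

```python
from collections import defaultdict


def deliverable_per_stream(received):
    """Count how many chunks each stream can deliver, in order, so far.

    received is a list of (stream_id, chunk_number) pairs, with chunk
    numbers starting at 0 inside every stream.
    """
    chunks_by_stream = defaultdict(set)
    for stream_id, chunk in received:
        chunks_by_stream[stream_id].add(chunk)

    deliverable = {}
    for stream_id, chunks in chunks_by_stream.items():
        count = 0
        while count in chunks:   # stop at the first gap in THIS stream only
            count += 1
        deliverable[stream_id] = count
    return deliverable


# Chunk 0 of stream 0 was lost. Streams 4 and 8 arrived intact.
received = [(0, 1), (0, 2), (4, 0), (4, 1), (8, 0)]
print(deliverable_per_stream(received))  # {0: 0, 4: 2, 8: 1}
```

Stream 0 is blocked until its missing chunk is retransmitted, but streams 4 and 8 can be processed straight away.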

Connection Migration

Sometimes, when you're connected to the internet through Wi-Fi and you switch to cellular, your IP address may change. When your IP address changes, your TCP connection gets terminated, because TCP identifies a connection by the IP addresses and ports at both ends. With QUIC, each packet carries a connection ID (64 bits in Google's original gQUIC, variable length up to 160 bits in RFC 9000) which is used to identify the connection. So, in a case where your IP address changes or you move locations, your QUIC connection will not be terminated. The connection ID mechanism also helps it better support network address translation (NAT).
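The difference comes down to what the server uses as the lookup key for a connection. The sketch below models that with two plain dictionaries. The addresses, the connection ID and the function names are all made up, and a real QUIC server also validates the new network path before it keeps using it.

```python
def lookup_by_address(sessions, client_ip, client_port):
    """TCP-style lookup: the client's address and port are part of the key."""
    return sessions.get((client_ip, client_port))


def lookup_by_connection_id(sessions, connection_id):
    """QUIC-style lookup: the key travels inside the packet itself."""
    return sessions.get(connection_id)


# Sessions created while the phone was on Wi-Fi (192.0.2.10).
tcp_sessions = {("192.0.2.10", 50000): "download in progress"}
quic_sessions = {"a1b2c3d4": "download in progress"}

# The phone switches to cellular and now sends from 198.51.100.7.
print(lookup_by_address(tcp_sessions, "198.51.100.7", 41000))  # None
print(lookup_by_connection_id(quic_sessions, "a1b2c3d4"))      # download in progress
```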

0-RTT

Round trip time (RTT) is the time it takes for a device to send a message and get a reply within a network. In "0-RTT", though, what is being counted is the number of round trips a client has to complete before it can send its first request.

With TCP, devices have to complete a 3-way handshake, then set up a TLS session, and only then send the messages. That's 2 to 3 round trips, depending on the TLS version. With QUIC, a new connection needs one round trip, because the transport and TLS 1.3 handshakes are combined. On a repeat connection, the client can remember the encryption keys from the last session and send an encrypted request right away without setting up anything. This means it took the client zero round trips before it could start sending requests to the server. This is known as zero round trip time.
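As a rough back-of-the-envelope sketch, the round trip counts can be turned into waiting time. The counts in the table below are assumptions based on the usual handshake costs: 1 round trip for the TCP handshake, 2 for a full TLS 1.2 handshake, 1 for TLS 1.3, 1 for a fresh QUIC connection and 0 for a resumed one. The 50 ms RTT is an arbitrary example value.

```python
# Round trips completed before the first HTTP request can be sent.
SETUP_ROUND_TRIPS = {
    "TCP + TLS 1.2": 1 + 2,
    "TCP + TLS 1.3": 1 + 1,
    "QUIC (first connection)": 1,
    "QUIC (resumed, 0-RTT)": 0,
}


def setup_delay_ms(rtt_ms):
    """Return the connection setup delay for each stack at a given RTT."""
    return {name: trips * rtt_ms for name, trips in SETUP_ROUND_TRIPS.items()}


for name, delay in setup_delay_ms(50).items():
    print(f"{name}: {delay} ms before the first request")
```

On a link with a 50 ms round trip time, that works out to 150 ms for TCP with TLS 1.2 and 0 ms for a resumed QUIC connection.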

All these shiny new features don't mean that QUIC is going to replace TCP though. QUIC was designed with HTTP in mind, while TCP remains the standard general-purpose transport layer protocol. QUIC still has a few problems, especially with "middleboxes" like proxies, load balancers and firewalls. The fact that QUIC uses UDP makes it problematic. Some firewalls simply block UDP traffic, and in those networks QUIC cannot work, so clients have to fall back to HTTP over TCP.

The QUIC Packets

The QUIC protocol has two basic categories of packets: long header packets and short header packets. Long header packets are used before 1-RTT keys are established, and short header packets are used after the version and 1-RTT keys have been negotiated. I won't go deep into that, but you can read about it in Section 17 of the documentation (RFC 9000).
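For the curious, the two categories can be told apart from the very first byte of a packet. In QUIC version 1, the top bit is the header form (1 for long, 0 for short), and in long headers two further bits give the packet type. The sketch below assumes that layout from RFC 9000 Section 17 and ignores special cases such as Version Negotiation packets. The function name and the sample byte values are my own.

```python
# Long header packet types in QUIC version 1 (RFC 9000, Section 17.2).
LONG_PACKET_TYPES = {0: "Initial", 1: "0-RTT", 2: "Handshake", 3: "Retry"}


def classify_first_byte(first_byte):
    """Return (header_form, packet_type) for the first byte of a QUIC v1 packet."""
    if first_byte & 0x80:                        # header form bit set: long header
        packet_type = (first_byte & 0x30) >> 4   # two type bits
        return "long", LONG_PACKET_TYPES[packet_type]
    return "short", "1-RTT"                      # short headers carry 1-RTT data


print(classify_first_byte(0xC0))  # ('long', 'Initial')
print(classify_first_byte(0xE0))  # ('long', 'Handshake')
print(classify_first_byte(0x40))  # ('short', '1-RTT')
```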

If you enjoyed the article, please like and comment. And subscribe to the newsletter.

Subscribe to the Telegram channel to get past and future content.

Telegram Channel: https://t.me/SpecificKnowledge

Also follow Specific Knowledge on Twitter: https://twitter.com/specificknowhow

Thank you.
