HTTP vs HTTP/2 Performance Showdown, Part 1: Hidden Latencies
Introduction:
In the fast-paced world of microservices, communication efficiency between services is paramount. Recently, we encountered a performance bottleneck in our production environment during peak traffic times. Despite healthy system vitals, communication latencies between services soared. This article delves into the troubleshooting process, highlighting a key takeaway: how HTTP/2 can significantly improve response times in Node.js applications.
In Part 1 I have simply detailed my observations and learnings about what was slowing down HTTP. If you are here for HTTP/2, scroll down to the bottom, where you will find my GitHub project, which has both HTTP and HTTP/2 implementations. Please star the project if you find it useful.
The Bottleneck:
Our initial investigation yielded no abnormalities in monitoring systems. Metrics reflected healthy resource utilization, yet communication speeds remained sluggish. To isolate the issue, I built a test rig aimed at replicating the production environment's behavior by simulating client-server interactions through load tests.
The Surprise:
As part of the test rig, I set up a server to respond with whatever data was sent to it, like an echo. When I ran the load test at 1000 calls per second, it yielded an average response time of 37 ms, which seemed too high to me. So I tinkered around and found out that if I made the API calls in a serial manner (calling the next API only after I received the response for the previous call), the average response time dropped to around 140 to 200 µs.
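To make that concrete, here is a minimal sketch of what such an echo server can look like in Node.js. This is illustrative, not the exact test rig from the project, and the port number is just a placeholder.
```
// Minimal echo server sketch: respond with whatever body the client sends.
const http = require('http');

const server = http.createServer((req, res) => {
  res.writeHead(200);
  // Stream the request body straight back to the caller.
  req.pipe(res);
});

server.listen(3000, () => {
  console.log('Echo server listening on port 3000');
});
```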
While I did expect the ideal response times to be in microseconds, seeing serial execution outperform parallel execution was very surprising.
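For context, the only difference between the two modes is how the client fires the calls. Below is a rough sketch of the two measurement styles; it is simplified (the real rig drives a fixed calls-per-second rate), and the host, port, and payload are placeholders.
```
const http = require('http');

// A shared keep-alive agent so connections are pooled between calls.
const agent = new http.Agent({ keepAlive: true });

// One echo call; resolves with the measured response time in ms.
function call() {
  return new Promise((resolve, reject) => {
    const start = process.hrtime.bigint();
    const req = http.request(
      { host: 'localhost', port: 3000, method: 'POST', agent },
      (res) => {
        res.resume(); // drain the echoed body
        res.on('end', () =>
          resolve(Number(process.hrtime.bigint() - start) / 1e6));
      }
    );
    req.on('error', reject);
    req.end('hello');
  });
}

// Serial: the next call starts only after the previous response arrives.
async function serial(n) {
  const times = [];
  for (let i = 0; i < n; i++) times.push(await call());
  return times;
}

// Parallel: all calls are fired at once.
function parallel(n) {
  return Promise.all(Array.from({ length: n }, () => call()));
}
```
The per-call times collected here are what feed the "quality" average discussed further down.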
The Clue:
After doing a few more experiments tweaking the test parameters, adding logs, and other things (talking to a rubber duck), I figured out what was causing the delay.
When I set up the client to not re-use the HTTP agent (the code re-used the HTTP agent by default), it would throw an "ECONNRESET" exception for parallel loads above 200 calls per second. For serial execution this was not a problem. So I suspected the HTTP agent was doing something to prevent this. Here is what my HTTP agent looked like:
```
this.agent = new http.Agent({
  keepAlive: true,       // re-use TCP connections across requests
  keepAliveMsecs: 1000,  // initial delay for TCP keep-alive probes
  maxSockets: 10,        // max concurrent sockets per host
  maxFreeSockets: 75,    // max idle sockets kept open in the pool
});
```
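For completeness, the agent only has an effect once it is passed to each request. A minimal sketch of the two modes being compared might look like the following; the host and port are placeholders, and `agent: false` is just one way of not re-using connections, which may differ from the exact "no re-use" setup in the rig.
```
const http = require('http');

// The pooled agent from above (abridged).
const agent = new http.Agent({ keepAlive: true, maxSockets: 10 });

// Re-using the agent: connections are pooled and at most maxSockets
// requests are in flight per host; the rest wait in the agent's queue.
http.request({ host: 'localhost', port: 3000, agent }, (res) => res.resume()).end();

// Opting out of pooling: each request gets its own one-off connection.
http.request({ host: 'localhost', port: 3000, agent: false }, (res) => res.resume()).end();
```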
The Deep Dive:
The key here was the maxSockets field. This field indicates how many network sockets (client connections) the agent can keep open at a time. When re-using the agent, this was limited to 10, but when not re-using the agent, this number did not have an upper limit. It turned out the operating system puts a hard limit on this number, and while it is possible to raise that limit, it may not be advisable.
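For reference, the agent itself does not cap concurrency unless you ask it to; a quick check shows the default (output assumes a reasonably recent Node version):
```
const http = require('http');

// Without an explicit maxSockets, the agent allows unlimited concurrent
// sockets per host, and OS limits (file descriptors, ephemeral ports)
// become the real ceiling.
console.log(new http.Agent().maxSockets);                   // Infinity
console.log(new http.Agent({ maxSockets: 10 }).maxSockets); // 10
```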
When I realized this, I figured out one more important detail. I was calculating the average response time from a quality point of view, as in what was being measured in production. The formula was:
```
Quality average time = (sum of time required for each call) / total number of calls
```
This explained why the system was behaving better in serial than in parallel. When doing load testing of API calls, the general rule is to use the following formula for average time:
```
Quantity average time = total test time / total number of calls
```
There is a fundamental difference between these two, which you may not notice until you switch from parallel to serial execution.
The quantity average time flatters parallel execution: when multiple calls run in parallel, their overlapping time is counted only once. The quality average time punishes parallel execution: it counts that overlapping time once per call, effectively multi-counting it.
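A tiny made-up example shows how far apart the two metrics can be: say 10 calls are fired at the same instant, they overlap perfectly, and each one completes 10 ms after the test starts.
```
// 10 perfectly overlapping calls, each measured at 10 ms.
const perCallTimes = Array(10).fill(10); // ms, as measured per call
const totalTestTime = 10;                // ms of wall-clock time

const qualityAvg = perCallTimes.reduce((a, b) => a + b, 0) / perCallTimes.length; // 10 ms
const quantityAvg = totalTestTime / perCallTimes.length;                          // 1 ms
console.log({ qualityAvg, quantityAvg });
```
The overlapping 10 ms is counted ten times in the quality average but only once in the quantity average.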
Regardless of the metric, the conclusion was the same. When running in parallel, each API call would, on average, have 37 ms added to its response time. For an echo API that is really, really bad, especially when you know you can get to an average time of 200 µs. So I concluded that while the server was capable of responding quickly, the client was doing something equivalent to twiddling its thumbs for almost 37 ms under higher traffic.
The Realisation:
When I thought of the words "twiddling its thumbs", I remembered that the code does not have thumbs (it's not dumb); it's the coder (me) who has the thumbs. Combining the HTTP agent clue with the response times I was seeing, I realized that even in parallel execution the calls were being queued, with only 10 calls actually running in parallel at a time.
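One way to watch this queueing happen (assuming the same shared agent instance as in the sketches above) is to peek at the agent's bookkeeping while the load test runs: `agent.sockets` holds the connections currently in use and `agent.requests` holds the calls still waiting for a free socket.
```
// Peek at the shared agent's pool while the load test is running.
// With maxSockets: 10, the in-flight count tops out at 10 per host
// while the queued count keeps growing under parallel load.
setInterval(() => {
  for (const key of Object.keys(agent.requests)) {
    const inFlight = (agent.sockets[key] || []).length;
    const queued = agent.requests[key].length;
    console.log(`${key} -> in flight: ${inFlight}, queued: ${queued}`);
  }
}, 1000);
```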
So I changed the maxSockets config to 40, and, drum roll, the quality response time came to 34 ms. A measly 3 ms improvement.
Just to confirm my suspicion, I changed the config to 2, and the quality response time increased to 47 ms.
This confirmed it. If it is surprising why this confirms the issue, read on (and please like the blog).
Remember, for the quality average time we summed the time required for each call and divided it by the total number of calls, where each call's time was counted from the moment the API call function was invoked to the moment the response was received. The HTTP agent was queueing the calls, so for parallel execution the queueing wait time was included in the quality average time.
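If you want to split the queue wait out of the per-call time, one approach (a sketch, not what the test rig does) is to timestamp the request's 'socket' event, which fires only once the agent has handed the request a free socket:
```
const http = require('http');

// Sketch: separating queue wait from the rest of a single call's time.
function timedCall(agent) {
  const created = process.hrtime.bigint();
  const req = http.request(
    { host: 'localhost', port: 3000, method: 'POST', agent },
    (res) => {
      res.resume();
      res.on('end', () => {
        const total = Number(process.hrtime.bigint() - created) / 1e6;
        console.log(`total (including queue wait): ${total.toFixed(2)} ms`);
      });
    }
  );
  // 'socket' fires once this request leaves the agent's queue.
  req.on('socket', () => {
    const waited = Number(process.hrtime.bigint() - created) / 1e6;
    console.log(`queue wait: ${waited.toFixed(2)} ms`);
  });
  req.end('hello');
}
```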
The Learning:
The client, not the server, was adding the hidden latency: the HTTP agent's maxSockets setting caps how many calls actually run in parallel, and the time spent waiting in the agent's queue shows up in every per-call (quality) response time, even when the server and the system vitals look perfectly healthy.
Resources:
I will add all the Stack Overflow, ChatGPT, and other Google links once I compile them.
For now, you can check out the GitHub project where I have uploaded the test rig. Make sure to read the README before trying anything funny.