HTTP Inefficiency
Overview
Local requests take unnecessarily long when served by C#'s implementation of an HTTP server. This was discovered when an application server stalled on such requests during a heavy load test. A POST with a 2KB body between local processes took 0.7 seconds on average, and sometimes as long as 1.2 seconds. Although the machine was under heavy load, CPU usage was not at 100%, and the stall came much earlier than expected. This particular server, probably because it handles both local and remote communication, originally handled all requests via a standard C# HttpListener. This prompted the following investigation into the performance of HTTP, as implemented by the HttpListener/HttpWebRequest classes in C#, for local requests on Windows machines.
Performance Testing
A simple testing scenario was constructed: asynchronously sending and receiving variously sized, randomly filled byte buffers. Server-side code starts a local HttpListener and asynchronously waits for requests; once a request is received, the contents are read asynchronously and the end time is reported back to the client. Client-side code creates a byte buffer of a specified size, fills it with random bytes (to ensure there is no optimization for special cases such as zero-filled arrays), and sends this buffer via an HttpWebRequest to the listening server. It then waits for the server to respond with the time and displays the results. The reported time difference (in ms) spans the interval between the request being initiated and the data being fully read on the other end. See code samples at the end. Similar test scenarios were constructed for C#'s TcpListener and Mono's HttpListener for comparison. The graph below shows the results for this scenario with a 100MB body. The leftmost column, the baseline, is how long it takes to simply copy a 100MB buffer from one memory location to another.
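The baseline column can be reproduced with a few lines of C#. The following is only a minimal sketch of such a measurement, not the code used for the graph; the class and method names (CopyBaseline, TimeCopy) are hypothetical, and the destination buffer is touched once before timing so that first-use page faults are not counted.

```csharp
using System;
using System.Diagnostics;

static class CopyBaseline
{
    // Copies source into destination and returns the elapsed wall-clock time in ms.
    public static double TimeCopy(byte[] source, byte[] destination)
    {
        var stopwatch = Stopwatch.StartNew();
        Buffer.BlockCopy(source, 0, destination, 0, source.Length);
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }

    static void Main()
    {
        // 100MB of random bytes, matching the payload size used in the test.
        var source = new byte[100 * 1024 * 1024];
        new Random().NextBytes(source);
        var destination = new byte[source.Length];

        // Warm-up copy: commits the destination pages before the timed run.
        TimeCopy(source, destination);

        double elapsedMs = TimeCopy(source, destination);
        Console.WriteLine("100MB copy: {0:F1} ms", elapsedMs);
    }
}
```

The absolute number depends on the machine, so it is mainly useful as a yardstick for the other columns measured on the same hardware.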
On their own, these results reveal little, as the protocols are inherently different and therefore hard to compare fairly. However, consider the graph below, which displays results for the same test conducted remotely, i.e. with the server and client on different machines. Here the performance of the different methods is nearly identical. This implies that there is a fundamental inefficiency in C#'s implementation of HTTP that causes the performance deficit in local requests.
The discrepancy between C#'s HTTP and Mono's HTTP implies that this inefficiency should be sought at the lowest levels, as a main difference between the two is the HTTP.sys kernel driver on which C#'s HttpListener implementation is built. Rerunning the above tests to measure CPU time instead of elapsed time provides some interesting insight: C#'s HTTP used roughly three times as much CPU time as either of the other two methods.
More notably, this CPU time is roughly equal to the added time HTTP requests take via this implementation: just over 100 ms. Assuming that this is active CPU time, not time the request spends waiting or otherwise queued, and measuring it against the baseline cost of copying a 100MB buffer, this could imply as many as four additional copies of the data being made during the processing of the request. In fact, it is reasonable to assume that this additional copying is exactly what is happening. Consider the graph below:
This graph shows CPU time versus payload size for the same C# HTTP request. Notice how the 200MB and 50MB payloads take roughly double and half the CPU time, respectively, of the same HTTP request with a 100MB payload. This shows that the additional CPU time is not a fixed overhead but scales directly with the data size, which strongly suggests that something in the C# HTTP implementation causes additional data copies to be made on local requests.
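The reasoning above (CPU time that scales with payload size rather than staying constant) can be checked mechanically. The sketch below is an illustration under stated assumptions, not the measurement code behind the graph: CpuScaling, MeasureCpuMs and IsProportional are hypothetical names, and a series of plain buffer copies stands in for the HTTP request. Note that Process.TotalProcessorTime only covers CPU time attributed to the measured process.

```csharp
using System;
using System.Diagnostics;

static class CpuScaling
{
    // Returns the CPU time (ms) the current process consumed while running work.
    public static double MeasureCpuMs(Action work)
    {
        var process = Process.GetCurrentProcess();
        TimeSpan before = process.TotalProcessorTime;
        work();
        process.Refresh(); // discard cached process info before re-reading
        return (process.TotalProcessorTime - before).TotalMilliseconds;
    }

    // True when CPU time per MB stays within the relative tolerance of the
    // first measurement's ratio, i.e. the cost is proportional to payload size.
    public static bool IsProportional(double[] sizesMb, double[] cpuMs, double tolerance)
    {
        if (sizesMb.Length == 0 || sizesMb.Length != cpuMs.Length)
            throw new ArgumentException("Arrays must be non-empty and of equal length.");

        double reference = cpuMs[0] / sizesMb[0];
        for (int i = 1; i < sizesMb.Length; i++)
        {
            double ratio = cpuMs[i] / sizesMb[i];
            if (Math.Abs(ratio - reference) > tolerance * reference)
                return false;
        }
        return true;
    }

    static void Main()
    {
        double[] sizesMb = { 50, 100, 200 };
        double[] cpuMs = new double[sizesMb.Length];

        for (int i = 0; i < sizesMb.Length; i++)
        {
            var source = new byte[(int)sizesMb[i] * 1024 * 1024];
            var destination = new byte[source.Length];

            // Stand-in workload (assumption): ten buffer copies instead of an HTTP request.
            cpuMs[i] = MeasureCpuMs(() =>
            {
                for (int n = 0; n < 10; n++)
                    Buffer.BlockCopy(source, 0, destination, 0, source.Length);
            });
            Console.WriteLine("{0}MB: {1:F0} ms CPU", sizesMb[i], cpuMs[i]);
        }

        Console.WriteLine("proportional: {0}", IsProportional(sizesMb, cpuMs, 0.5));
    }
}
```

A fixed per-request overhead would make the per-MB ratio fall as the payload grows, so the check would fail; a cost dominated by copying keeps the ratio roughly constant.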
Takeaway
This investigation uncovered an inefficiency in C#'s handling of local HTTP requests that causes unnecessary additional copying of data. As in the scenario that prompted this investigation, this can cause noticeable performance problems. More efficient alternatives include C#'s TcpListener, Mono's implementation of HttpListener, or even memory-mapped files where available. The problem mentioned in the introduction has been resolved using memory-mapped files, bringing that particular request from a 90th percentile time of 1.15 ms to 0.50 ms. However, rather than leave a lingering inefficiency, all other local communication on this server was switched over to the Mono HTTP implementation, and the migration took little effort.
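For readers unfamiliar with memory-mapped files, the following is a minimal sketch of passing a payload through one, assuming a simple layout of a 4-byte length prefix followed by the data. It is not the code used on the server described here: the class name SharedBuffer and the map name "LocalRequestBuffer" are hypothetical, named maps are only available on Windows, and a real exchange between two processes would also need signalling (for example a named event) so that the reader knows when the data is ready.

```csharp
using System;
using System.IO.MemoryMappedFiles;

static class SharedBuffer
{
    // Writes a 4-byte length prefix followed by the payload at the start of the view.
    public static void Write(MemoryMappedViewAccessor view, byte[] payload)
    {
        view.Write(0, payload.Length);
        view.WriteArray(sizeof(int), payload, 0, payload.Length);
    }

    // Reads the length prefix, then copies that many bytes out of the view.
    public static byte[] Read(MemoryMappedViewAccessor view)
    {
        int length = view.ReadInt32(0);
        var payload = new byte[length];
        view.ReadArray(sizeof(int), payload, 0, length);
        return payload;
    }

    static void Main()
    {
        // 2KB payload, the size of the request described in the overview.
        var payload = new byte[2 * 1024];
        new Random().NextBytes(payload);

        // "LocalRequestBuffer" is a hypothetical name both processes would agree on.
        using (var map = MemoryMappedFile.CreateNew("LocalRequestBuffer", sizeof(int) + payload.Length))
        using (var writer = map.CreateViewAccessor())
        {
            Write(writer, payload);

            // A second process would open the same map by name; done in-process here
            // only to keep the sketch self-contained.
            using (var opened = MemoryMappedFile.OpenExisting("LocalRequestBuffer"))
            using (var reader = opened.CreateViewAccessor())
            {
                byte[] received = Read(reader);
                Console.WriteLine("received {0} bytes", received.Length);
            }
        }
    }
}
```

Both sides map the same pages, so the payload is written once by the sender and read once by the receiver, without passing through a network stack.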
Code Samples
Server Side Sample Code, C# HTTP:
using System;
using System.Diagnostics;
using System.IO;
using System.Net;
using System.Threading.Tasks;

class HttpServer
{
    private readonly HttpListener _httpListener = new HttpListener();

    public HttpServer()
    {
        _httpListener.Prefixes.Add("https://localhost:9999/");
        _httpListener.Start();
        // Not awaited on purpose: the loop runs in the background
        // for as long as the listener is listening.
        RequestLoopAsync();
    }

    private async Task RequestLoopAsync()
    {
        while (_httpListener.IsListening)
        {
            var context = await _httpListener.GetContextAsync();
            await ProcessRequest(context);
        }
    }

    private async Task ProcessRequest(HttpListenerContext context)
    {
        try
        {
            int length = (int)context.Request.ContentLength64;
            byte[] requestData = new byte[length];
            int totalRead = 0;
            int bytesRead;
            // Read request data to the end, advancing the offset so that
            // earlier chunks are not overwritten by later reads.
            while (totalRead < length &&
                   (bytesRead = await context.Request.InputStream.ReadAsync(
                        requestData, totalRead, length - totalRead)) != 0)
            {
                totalRead += bytesRead;
            }
            // Record time after the request is received. A full timestamp in ms
            // is used; DateTime.Now.Millisecond only yields the 0-999 component.
            long endTime = DateTime.UtcNow.Ticks / TimeSpan.TicksPerMillisecond;
            var endCPUTime = Process.GetCurrentProcess()
                .TotalProcessorTime;
            // Report times back to the client.
            using (var strmw = new StreamWriter(context.Response
                .OutputStream))
            {
                strmw.Write(string.Format("{0},{1}", endTime, endCPUTime));
            }
            context.Response.Close();
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
            context.Response.Close();
        }
    }
}
Client Side Sample Code, C# HTTP:
// Requires: using System; using System.Diagnostics;
//           using System.IO; using System.Net;
public void makeConnection()
{
    byte[] toPost = new byte[1024 * 50 * 1024];
    Random r = new Random();
    r.NextBytes(toPost);
    try
    {
        var process = Process.GetProcessesByName("HttpServer")[0];
        var request = (HttpWebRequest)WebRequest
            .Create("https://localhost:9999/");
        request.Method = "POST";
        // Same ms timestamp format as the server, so the two can be subtracted.
        long startTime = DateTime.UtcNow.Ticks / TimeSpan.TicksPerMillisecond;
        var startCPUtime = process.TotalProcessorTime;
        // Dispose the request stream so the body is complete before
        // the response is requested.
        using (var requestStream = request.GetRequestStream())
        {
            requestStream.Write(toPost, 0, toPost.Length);
        }
        string serverTime;
        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            serverTime = reader.ReadToEnd();
        }
        // The server replies with "<end time in ms>,<server CPU time>".
        string[] parts = serverTime.Split(',');
        long endTime = Convert.ToInt64(parts[0]);
        string endCpuTime = parts[1];
        Console.WriteLine("real time difference: {0}",
            endTime - startTime);
        Console.WriteLine("CPU time difference: {0} - {1}",
            endCpuTime, startCPUtime);
    }
    catch (WebException e)
    {
        Console.WriteLine(e.Message);
    }
}