Perfect recipe to take down hosting machine by memory leak: Just log an SqlServer DbUpdate exception
Ashutosh Pareek
VP - Lead Software Engineer|.Net Enterprise Cloud Platform Architect|Innovator|C#, AWS, .Net , SQL, Web APIs, IBM MQ, Kafka, Terraform | Banking domain, ATM, EMV(Chip card tech)
Welcome back to another intriguing tech tale! I'm part of a superhero squad that folks reach out to when they've got a problem that's more perplexing than a Rubik's Cube, or when the issue is so critical that it's quicker to analyze the Bermuda Triangle than fix it.
One day, a team sent out a distress signal: the memory usage of one of their Windows services was skyrocketing to 50GB+ in just a few minutes, and it was crashing their machine faster than a toddler on a sugar high. The only code change they'd made was updating all their libraries to .NET 6, and this memory monster only seemed to rear its head in the super-secret prod environment. They'd been wrestling with this beast for days.
I told our prod support team to take a memory dump for us. Their reply? "Boss, we can't do it." I asked why, and they said, "The moment we get a memory alert, by the time we log in, it's already at 40GB, and we don't have enough disk space to save that monster." After a few days of struggle, they finally managed to capture a 32GB memory dump.
Now, the problem was, my favorite memory dump analysis tool, WinDbg, wasn't on my machine. And neither was DebugDiag. So, I reluctantly turned to VS 2022, but it threw in the towel, saying the dump file was too massive. I even gave ReSharper's dotMemory a whirl, but it gave me the same cold shoulder. Desperate times call for desperate measures, so I decided to give dotnet-dump CLI a shot, and guess what? I'm head over heels in love with it after just one day!
I loaded the dump file into dotnet-dump, and the heap stats pointed me to four types from the Serilog namespace, collectively hogging 30GB of memory on the heap. It was crystal clear – this was a logging issue. But there hadn't been any check-ins that should have caused this madness. So, I played my ace: the "gcroot" command to trace the memory reference of these types on the heap.
What I found was jaw-dropping. There were millions of log event properties. But how could that even be possible? To dig deeper, I picked one memory reference and looked at the thread and stack trace. To my astonishment, there was a DbUpdateException that had never made it to the log file. But this exception, when logged, caused millions of log events. Mind-boggling!
These folks were using a library that wrapped around the popular Serilog logging library. It had a seemingly innocent method, "LogException(ex)," which internally destructured the exception object by logging every property and their subtypes. Like so: "Logger.Error("{@Exception}", ex)." But how could that simple line unleash millions of log events?
领英推荐
So, I conducted a quick test – I tried to save authors in the database with a property set to null for a column that was non-nullable. As expected, I got a DbUpdateException. But on the console, all the DbSet in the context was logged, which meant the entire database was loading into memory before being logged.
It finally dawned on me that the DbUpdateException had a reference to the DbContext, and during the destructuring process, Serilog started querying the DbSets of the DbContext. This was the exact same catastrophe that had unfolded in production, where the entire database was loading into memory!
Log.Logger = new LoggerConfiguration().WriteTo.Console()
??? .CreateLogger();
?
using (var context = new PubContext())
{
??? context.Database.EnsureCreated();
}
AddAuthorsWithBooks();
void AddAuthorsWithBooks()
{
??? using var pubContext = new PubContext();
??? pubContext.Authors.Add(new Author()
??? {
??????? FirstName = null, //====> Assigned null to non-nullable col
??????? LastName = "Kushwaha",
??????? Books = new List<Book>()
??????? {
?????? ?????new Book(){Title="MemoryLeak", PublishDate = DateTime.Now},
??????????? new Book(){Title="Ef+Serilog", PublishDate = DateTime.Now}
??????? }
??? }) ;
??? try
??? {
??????? pubContext.SaveChanges();
??? }
??? catch (Exception ex)
??? {
??????? Log.Logger.Error("{@Exception}",ex); //====> This will log entire DB
??????? throw;
??? }
})
Now, the big question was why they'd never encountered this before. As it turned out, in earlier versions, they were using EfCore 3.1.0, in which the exception object didn't return the Entries. But in the higher versions of EfCore, those entities came along for the ride. They only ran into this problem in production because they never got the exception in lower environments.
I've always had a beef with Serilog's destructuring feature due to performance concerns. But now, I've got another bullet in my chamber to never use it again. Instead, it's just going to be a simple "Log.Logger.Error(ex, "Exception");".
Software Engineer at Total Quality Logistics
10 个月This very same problem has crippled two of our developers for a week now. We still haven't uncovered an elegant solution, and what's worse, there's something under the hood that's logging even before it hits our catch block. No global exception handler, it's either in a package or library that we're using, or logging by default to something and just popping that context in each of our "Entries". Nightmare of a bug, and all thanks to 3rd party libraries.