If It Works... Don't Touch It?
Ashutosh Pareek
VP - Lead Software Engineer | .Net Enterprise Cloud Platform Architect | Innovator | C#, AWS, .Net, SQL, Web APIs, IBM MQ, Kafka, Terraform | Banking domain, ATM, EMV (chip card tech)
A few weeks ago, I found myself in the middle of an interesting tech mystery: our UI was randomly throwing 502/504 errors. Diving into the logs, I noticed something bizarre: lightweight calls, like our liveness check, were taking up to 15 seconds. That's like asking someone for the time and them taking a coffee break before answering. I started to suspect our requests were being throttled at Kestrel, queuing up for so long that the load balancer threw a fit and returned those lovely 502/504 errors.
The team mentioned this issue had cropped up with the latest release and had never been seen before. So, I began the fun task of scrutinizing the new code changes for any blocking calls. Surprise! There were none. Instead, several changes had been made to convert blocking calls to non-blocking ones, which really threw a wrench into my initial hypothesis.
Determined to crack the case, I set up the API locally and bombarded it with production load using JMeter. I wielded dotnet-counters like Sherlock's magnifying glass to check for thread pool starvation. Sure enough, the metrics showed that after the initial load, the thread pool was as busy as a one-armed paper hanger. The code was straightforward: one API call led to a database call, followed by spawning 10 to 20 tasks with Task.Run, each making another database call and an HTTP API call. All these were non-blocking, with awaits in the right places, so I expected threads to return to the pool quickly.
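To make the shape of that code concrete, here is a minimal sketch of the pattern described above; the entity, repository, and endpoint names are hypothetical, not the actual production code:

```csharp
using System.Net.Http.Json; // for PostAsJsonAsync

// Illustrative sketch only -- names and counts are made up.
public class OrderHandler
{
    private readonly IOrderRepository _orders;
    private readonly IDetailRepository _details;
    private readonly HttpClient _http;

    public OrderHandler(IOrderRepository orders, IDetailRepository details, HttpClient http)
        => (_orders, _details, _http) = (orders, details, http);

    public async Task HandleAsync(int orderId)
    {
        // First database call on the request path.
        var order = await _orders.GetOrderAsync(orderId);

        // Fan out 10-20 units of work, each queued to the thread pool via Task.Run.
        var tasks = order.Items.Select(item => Task.Run(async () =>
        {
            // Each task makes another database call...
            var detail = await _details.GetDetailAsync(item.Id);

            // ...followed by an outbound HTTP API call.
            await _http.PostAsJsonAsync("https://downstream.example/api/items", detail);
        }));

        // With non-blocking I/O and awaits in the right places, the threads
        // should return to the pool at each await.
        await Task.WhenAll(tasks);
    }
}
```

For the thread pool diagnosis itself, dotnet-counters can be pointed at the running process with something like `dotnet-counters monitor --process-id <pid> System.Runtime`, watching the ThreadPool Thread Count and ThreadPool Queue Length counters.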
But no such luck. After running the load and taking multiple memory dumps, I discovered that all the threads were on an extended coffee break: they were stuck creating Oracle DB connections and adding them to the connection pool. That also explained why we only saw issues during the initial load, while the pool was still being filled. Oracle eventually fixed the slow connection creation, but only after a month.
Still puzzled why this issue was tied to the new code since the Oracle client library version hadn't changed, I revisited the pull request. There it was, hiding in plain sight: a PR review comment stating, "It's best practice to use Task.Run instead of Task.Factory.StartNew." Previously, the team used Task.Factory.StartNew with the long-running option, which didn't use thread pool threads. The change to Task.Run used thread pool threads, effectively putting them on babysitting duty when Kestrel needed them to serve incoming requests.
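In diff terms, the change looked roughly like this (the delegate body is a hypothetical placeholder, not the actual code):

```csharp
// Before: the LongRunning hint tells the default scheduler to give the work
// its own dedicated thread, so a slow or blocking delegate does not tie up
// a thread pool thread.
var before = Task.Factory.StartNew(
    () => DoDatabaseAndHttpWork(),   // hypothetical placeholder
    CancellationToken.None,
    TaskCreationOptions.LongRunning,
    TaskScheduler.Default);

// After: Task.Run always queues the delegate to the shared thread pool --
// the same pool Kestrel uses to dispatch incoming requests.
var after = Task.Run(() => DoDatabaseAndHttpWork());
```

When the delegate spends a long time blocked (here, opening Oracle connections), that seemingly cosmetic swap moves the blocking onto the very threads Kestrel needs.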
Key Takeaways:
Why was the change suggested from Task.Factory.StartNew to Task.Run?
Almost every article will tell you Task.Factory.StartNew is bad and should be replaced with Task.Run for I/O-bound work. Far fewer mention that Task.Run queues that work to the shared thread pool. In a web API, Kestrel and your code share that same pool, and once its threads are all blocked, the pool only injects new threads gradually, leaving Kestrel twiddling its thumbs waiting for a free thread to process incoming requests. So avoid Task.Run for anything that might block a thread for a long time; the sketch below shows how quickly this bites.
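Here is a minimal, self-contained repro sketch (hypothetical endpoints and delays, not the production service): under load, the fan-out endpoint pins thread pool threads with blocking work, and even the trivial health endpoint starts to lag because Kestrel processes requests on the same pool.

```csharp
// Minimal ASP.NET Core repro sketch -- for illustration only.
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Lightweight liveness check -- should return almost instantly.
app.MapGet("/healthz", () => Results.Ok("alive"));

// Fans out work via Task.Run. If the delegate blocks (simulated here with
// Thread.Sleep), each task pins a thread pool thread for the full duration
// instead of releasing it at an await, starving Kestrel of threads.
app.MapGet("/fanout", async () =>
{
    var tasks = Enumerable.Range(0, 20).Select(_ => Task.Run(() =>
    {
        Thread.Sleep(5_000); // stand-in for a blocking connection open
    }));

    await Task.WhenAll(tasks);
    return Results.Ok("done");
});

app.Run();
```

Hit `/fanout` with a handful of concurrent requests and the `/healthz` response time climbs from milliseconds to seconds, which is exactly the kind of symptom a load balancer turns into 502/504s.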
What was the potential benefit of changing from Task.Factory.StartNew to Task.Run?
This change would have increased the app's scalability, since all the calls were non-blocking I/O. But did we need that scalability for the business functionality? Nope. The change was made purely in the name of best practice, which brings us back to our software engineering mantra: "If it works, don't touch it." Always make sure a change is necessary and beneficial, and prove it with rigorous testing; otherwise, you might end up fixing something that wasn't broken in the first place.