"We Must Improve Our System Performance - What Should We Do?"

In this article, let's dive into the 1-2-3 of improving system performance. To start, let us name the main performance measures that a bottleneck can hurt:

  • Response time - the time between submitting a request and receiving a response; it matters most in UI/UX-driven systems.
  • Latency - the time from process start to finish; it matters, for instance, in gaming or trading systems.
  • Throughput - the number of requests or processes completed within a given time window.

Yes, this list is oversimplified and leaves out many cases. What we need, though, is a functional architectural approach, preferably one that we take from both the business and the engineering perspectives.

How to Identify Bottlenecks?

[Figure: three paths from A to B. Source: https://commons.wikimedia.org/wiki/File:Three_paths_from_A_to_B.svg]

Once we have identified the desired performance improvement, the next step is measurement: establishing the initial state that later changes will be compared against.
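
As an illustration of establishing that initial state, here is a minimal Python sketch that records per-call durations and overall throughput for a single operation. The names `measure_baseline` and `sample_operation` are assumptions made for this example, and `sample_operation` is only a stand-in for a real request handler; an APM tool would normally collect these numbers in production.

```python
import statistics
import time


def measure_baseline(operation, runs=200):
    """Call `operation` repeatedly; return latency percentiles and throughput."""
    durations = []
    started = time.perf_counter()
    for _ in range(runs):
        t0 = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started

    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(durations, n=100)
    return {
        "runs": runs,
        "p50_s": cuts[49],
        "p95_s": cuts[94],
        "max_s": max(durations),
        "throughput_per_s": runs / elapsed,
    }


def sample_operation():
    # Hypothetical stand-in for the code path being measured.
    sum(i * i for i in range(1_000))


baseline = measure_baseline(sample_operation)
for name, value in baseline.items():
    print(f"{name}: {value}")
```

Keeping the resulting dictionary next to the change log gives every later optimization a number to beat.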

We are looking for the exact deviations that take us away from the fastest path from A to B.

If we want a better response time, we need to identify the pieces of the application pathway that introduce delays. In microservices architectures, for instance, these can be unnecessary repeated service invocations, unoptimized database queries, skipped caching, poorly chosen algorithms, and so on.
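
To make the "repeated service invocations" case concrete, here is a small Python sketch in which a pricing loop asks for the same customer data once per line item, and an in-process cache collapses those lookups into a single call. The functions `fetch_customer_profile`, `cached_customer_tier`, and `price_order` are hypothetical names invented for this example; only `functools.lru_cache` is a real standard-library facility.

```python
from functools import lru_cache

call_count = 0  # counts how often the "remote" service is actually hit


def fetch_customer_profile(customer_id):
    """Hypothetical stand-in for a slow remote service call."""
    global call_count
    call_count += 1
    return {"id": customer_id, "tier": "gold"}


@lru_cache(maxsize=1024)
def cached_customer_tier(customer_id):
    # The first call per customer goes to the service; repeats hit the cache.
    return fetch_customer_profile(customer_id)["tier"]


def price_order(customer_id, item_prices):
    total = 0.0
    for price in item_prices:
        # Without the cache, this line would invoke the service once per item.
        discount = 0.1 if cached_customer_tier(customer_id) == "gold" else 0.0
        total += price * (1 - discount)
    return total


total = price_order(42, [10.0, 20.0, 30.0])
print(f"total={total:.2f}, service calls={call_count}")
```

Note that `lru_cache` never expires entries on its own, so this form only suits data that may safely be stale for the lifetime of the process.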

If we strive for lower latency, we can look for unnecessary I/O, network round trips, access locks, and context switches.

And if we wish to increase throughput, several techniques can go a long way: in-memory data and distributed caching to reduce database reads and writes, first- and second-level caches close to our clients, and per-service data stores with a correctly and wisely implemented CQRS data pattern.
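
One of these techniques, an in-memory cache placed in front of the database, can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions, not a distributed cache: `ReadThroughCache` and `load_product_from_db` are hypothetical names, and the 30-second lifetime is an arbitrary value chosen for the example.

```python
import time


class ReadThroughCache:
    """Serve reads from memory; fall back to `loader` on a miss or after expiry."""

    def __init__(self, loader, ttl_seconds=30.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # fresh hit: no database read
        value = self._loader(key)    # miss or expired: read through
        self._entries[key] = (now + self._ttl, value)
        return value


db_reads = []  # records every read that reaches the "database"


def load_product_from_db(product_id):
    # Hypothetical stand-in for a real database query.
    db_reads.append(product_id)
    return {"id": product_id, "name": f"product-{product_id}"}


products = ReadThroughCache(load_product_from_db, ttl_seconds=30.0)
for _ in range(1000):
    products.get(7)

print(f"1000 lookups, {len(db_reads)} database read(s)")
```

The time-to-live bounds how stale a value can get, which is the trade-off to agree on before moving reads away from the database.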

But how do we identify the right next step without losing ourselves in the mind-boggling complexity of modern architecture? The keywords are observability and APM (application performance monitoring).

Observability and APM - Two Pillars of Controlled Change

I want to quote this very nice and concise article from IBM:

[Image: quoted passage from the IBM article. Source: https://www.ibm.com/topics/observability]

And to add, let me introduce the APM concept:

In the fields of information technology and systems management, application performance management (APM) is the monitoring and management of the performance and availability of software applications. APM strives to detect and diagnose complex application performance problems to maintain an expected level of service.

(taken from Wikipedia)

In essence, we want to expose the exact details of every pathway our application takes when running its functions.

  • What are the services invoked along the way?
  • What are the payloads transferred?
  • What are the time frames between ingress and egress?
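
The three questions above map onto the data a trace span carries. The following Python sketch is a hand-rolled recorder, written only to show the shape of that data; the `span` helper and the `handle_checkout` flow are hypothetical, and a real system would rely on an APM or tracing tool rather than this code.

```python
import json
import time
from contextlib import contextmanager

spans = []  # finished spans, in the order they completed


@contextmanager
def span(service, payload):
    """Record which service ran, its payload size, and ingress-to-egress time."""
    record = {
        "service": service,
        "payload_bytes": len(json.dumps(payload).encode("utf-8")),
    }
    t0 = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_ms"] = (time.perf_counter() - t0) * 1000
        spans.append(record)


def handle_checkout(order):
    # Hypothetical request flow: one entry point calling two downstream services.
    with span("checkout", order):
        with span("inventory", {"sku": order["sku"]}):
            time.sleep(0.01)   # stand-in for a remote call
        with span("payment", {"amount": order["amount"]}):
            time.sleep(0.02)   # stand-in for a remote call


handle_checkout({"sku": "A-1", "amount": 19.99})
for s in sorted(spans, key=lambda item: item["duration_ms"], reverse=True):
    print(f'{s["service"]:<10} {s["payload_bytes"]:>4} bytes  {s["duration_ms"]:.1f} ms')
```

Sorting the spans by duration puts the slowest hop at the top, which is usually the first place to look.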

In short, to make educated decisions, we must know. We must see where we are and what can be improved.

For instance, an engineer analyzing the observability output can identify a multitude of service invocations that could be reduced, batched, or parallelized. They could see database queries running without the indexes they need, or relying on heavy joins. They could spot repeated data that could easily be cached in a CDN, a distributed cache, or even an in-process cache.
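
As one example of acting on such a finding, independent service invocations can be issued in parallel instead of one after another. The Python sketch below compares both approaches using the standard-library `ThreadPoolExecutor`; the `call_service` function, the service names, and the 50 ms delay are assumptions standing in for real blocking remote calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SERVICES = ["profile", "orders", "recommendations", "loyalty"]


def call_service(name):
    """Hypothetical stand-in for a blocking remote call of about 50 ms."""
    time.sleep(0.05)
    return f"{name}: ok"


def fetch_sequential():
    return [call_service(name) for name in SERVICES]


def fetch_parallel():
    # The calls do not depend on each other, so they can overlap in time.
    with ThreadPoolExecutor(max_workers=len(SERVICES)) as pool:
        return list(pool.map(call_service, SERVICES))  # map keeps input order


def timed(fn):
    t0 = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - t0


sequential_result, sequential_s = timed(fetch_sequential)
parallel_result, parallel_s = timed(fetch_parallel)
print(f"sequential: {sequential_s * 1000:.0f} ms, parallel: {parallel_s * 1000:.0f} ms")
```

This only helps when the calls are truly independent; calls that need each other's results still have to run in order.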

But only when we can list these real, exposed issues and attach guesstimates of both the value of fixing each one and the effort required can we create a viable action plan we can trust.

I recommend reviewing this article to check out some of the leading APM tools on the market. Some of them also provide observability features. For cloud-hosted systems, observability is also available from the cloud providers themselves; there is a great article on this topic on Medium here.

Plan of Actions

Let's start with the three-step plan:

1. Decide where you want to improve your system performance.

Be it by business value, technical debt, cost optimization, etc.

2. Use observability and APM tools to identify the serious bottlenecks that cause deviation from the fastest path.

Redundant round trips and I/O, bloated or unoptimized database access, high latency, excessive memory or CPU utilization, etc.

3. Choose your first targets using the simple formula max(value/effort).

Both the value and the effort estimates should be agreed upon by the key stakeholders, together with internal and external experts, so that the action plan rests on consensus and is viable.
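
The max(value/effort) rule can be sketched as a simple ranking in Python. The backlog items and their numbers below are made up for illustration, and the units (a 1-10 value scale, effort in person-days) are assumptions; the real figures are whatever the stakeholders agree on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    value: float   # agreed benefit, here on an assumed 1-10 scale
    effort: float  # agreed cost, here in assumed person-days (must be > 0)

    @property
    def score(self):
        return self.value / self.effort


def prioritize(candidates):
    """Order candidates by value/effort, best ratio first."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)


# Hypothetical backlog with illustrative estimates.
backlog = [
    Candidate("Introduce distributed cache", value=9, effort=10),
    Candidate("Add missing index on orders table", value=8, effort=2),
    Candidate("Batch profile service calls", value=6, effort=3),
]

ranked = prioritize(backlog)
for candidate in ranked:
    print(f"{candidate.score:4.1f}  {candidate.name}")
```

The highest-value item is not necessarily the first target: here the cache has the largest value but the lowest ratio, so it lands last.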

Here you go!

About the Author

Alexander Stern has combined more than 25 years of development, software, enterprise, and solution architecture experience while working closely with C-level executives to ensure software architecture and performance adherence to business needs, vision, and strategy.

He is available for short-term Enterprise-Architect-as-a-Service consultations to help businesses make their next evolutionary leap without putting the company they are entrusted with under unnecessary risk.

He can be contacted via email at alex.bfree@gmail.com or by text message (WhatsApp/Viber/Signal/Telegram) at +372 56815512.
