Reducing Cost to Serve

Improve Software Performance - Do More with Less

My name is Matthew Woodings, and after many years in the software industry, I feel it’s time to share the experiences and insights I have gained with those who could benefit from them. I have built software solutions from the ground up, overcoming the scaling and performance challenges at each pivotal moment, from the first user to over two million users per day. My journey has traversed every aspect of the software and hardware stack, from the private data center to the public cloud. I’m experienced in many programming languages and supporting technologies, including mobile devices, networking, and databases.

This article is about software performance in general, digestible terms. I could go into technical jargon, but that would come at the expense of readability and fundamentals. For the seasoned troubleshooter, this may seem a little rudimentary, but we’ve all got to start somewhere.

Defining Performance

When we talk about performance, we’re talking about how to get the most out of the resources we have available. The goal may be to support growth, improve user experience, lower cost to serve, or move other metrics that matter equally to your organization.

Resources are finite, and those who run a data center will be acutely aware of this. Even in the cloud, with its “limitless” resources, there are still boundaries that need to be respected.

You may have been told, “It’s running slow,” or conversely, “It’s fast but too expensive. Can we reduce the footprint?” This situation can be challenging, and you may have wondered where to begin.

The first step in this journey is to take anecdotal perception off the table. How many times has someone’s “feeling,” or an influencer’s experience, inadvertently become your performance metric? It’s not grounded in science or data, so any attempt to disprove it becomes an almost unreachable goal, especially when the past may be romanticized.

Data is essential in creating a baseline against which future data points can be compared. Agreeing on the baseline with stakeholders is almost as important as the data itself. By aligning yourself with your stakeholders, you lay the foundations for success.

What’s My Baseline?

So where does one find this magical data? My forte is client/server interactions, but the principle is going to be the same across most disciplines. Before you start collecting, you need to determine areas of interest. Useful areas could be:

  • Data: the information that is passed from system to system in various forms, and potentially manipulated on its journey.
  • Bandwidth: the potential amount of data that can be sent between systems within a period. It is potential, as the theoretical limit may not always be achievable.
  • Base resources: the available foundational components of a system, such as CPU, memory, disk, etc. In the world of cloud computing and the abstraction of these resources, you may be fooled into thinking you don’t need to worry about them, but that would be a mistake. The cloud will let you expand and consume, for a price, but how would you know if you’re using those resources effectively?
  • Software-specific metrics: metrics from the various software and appliances that make up your solution.

You’ve identified some areas of interest, and the list doesn’t have to be exhaustive. Remember, you’re solving a problem, and that is much like solving a puzzle. You’ll be shining a spotlight on areas and gaining valuable insights.

There are many ways to collect this data: built-in and open-source tools, exposed log files, third-party observability services, and many more. You will need to research and determine which tools and solutions best suit your requirements and budget. Once you are collecting data, you also need to analyze it to create plans for potential remediation, resource rightsizing, feature enhancements, and feature retirement.

To visualize some of the interpretations, we’re going to look at API responses for a theoretical site, and we’re going to reference standard request logs for data size, response times, and response codes.

Once you are looking at this data, you need to determine if it’s good, bad, or indifferent. For example, the response time of a single web call is purely a snapshot, so drawing conclusions from that one data point could be misleading. You need to start tracking the mean and standard deviation of those calls. If you slice those by time of day, you can see the metrics change from peak loads to quiet times. In this example, you should monitor multiple endpoints. Are some significantly different? Would you expect this? Does the data change greatly throughout the day?
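As a minimal sketch of that idea, the snippet below groups response times by endpoint and hour of day and reports the mean and standard deviation for each group. The record layout, the endpoint names, and the sample values are all hypothetical; a real request log would need to be parsed into this shape first.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical request-log records: (endpoint, hour of day, response time in ms).
SAMPLE_LOG = [
    ("/orders", 3, 90.0),
    ("/orders", 3, 110.0),
    ("/orders", 9, 120.0),
    ("/orders", 9, 180.0),
    ("/search", 9, 400.0),
    ("/search", 9, 600.0),
]


def summarize(records):
    """Group response times by (endpoint, hour) and compute simple statistics."""
    buckets = defaultdict(list)
    for endpoint, hour, response_ms in records:
        buckets[(endpoint, hour)].append(response_ms)
    return {
        key: {
            "count": len(times),
            "mean_ms": mean(times),
            # Population standard deviation; statistics.stdev gives a sample estimate.
            "stdev_ms": pstdev(times),
        }
        for key, times in buckets.items()
    }


for (endpoint, hour), stats in sorted(summarize(SAMPLE_LOG).items()):
    print(f"{endpoint} {hour:02d}:00 n={stats['count']} "
          f"mean={stats['mean_ms']:.1f}ms stdev={stats['stdev_ms']:.1f}ms")
```

Comparing the rows for the same endpoint at different hours, and for different endpoints at the same hour, gives a first view of the variability discussed here.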

As with many of these investigations, you will notice that you’re peeling back layers of the proverbial onion. What might observations such as these mean?

  • Variability in response times between peak and low times.
  • Variability of response times between endpoints.
  • Variability of usage between endpoints.

You have gathered data and performed some initial analysis, building a picture of your system. Was this expected? Are there any surprises? Have more questions arisen? Were there endpoints you thought were in use but weren’t? Were there endpoints that were used far more than anticipated?
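The last two questions can be answered with a simple count. The sketch below assumes you have a list of requested paths taken from your logs and a list of endpoints you believe exist; both lists, and the endpoint names in them, are made up for illustration.

```python
from collections import Counter


def usage_report(requested_paths, known_endpoints):
    """Return each endpoint's share of traffic and the known endpoints never called."""
    counts = Counter(requested_paths)
    total = sum(counts.values())
    # most_common() orders endpoints from busiest to quietest.
    share = {path: n / total for path, n in counts.most_common()}
    unused = sorted(set(known_endpoints) - set(counts))
    return share, unused


# Hypothetical inputs: paths seen in the logs and endpoints listed in documentation.
requests_seen = ["/orders"] * 3 + ["/search"]
documented = ["/orders", "/search", "/legacy-report"]

share, unused = usage_report(requests_seen, documented)
print("traffic share:", share)
print("never called:", unused)
```

An endpoint that never appears in the logs is a candidate for retirement, while one that dominates traffic is usually the first place to look for savings.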

Many of these questions would warrant an article themselves, and it’s these questions that drive the next steps. So, what can you do once you have this data? Well, you have the “what,” now you need the “why,” and from there you need “remediation.”

Why is this Happening?

Again, we need to anchor this to an example. For this, we’ll look at variable response times with occasional timeouts. A timeout, in this case, occurs when the API endpoint takes too long to respond. If the endpoint in question (and there could be several) experiences this behavior, the cause could be:

  • Too many users or queries hitting the API server’s concurrency limits.
  • Query parameters that trigger too much data retrieval.
  • Backend storage pressure.
  • Slow connectivity to third-party systems.

There are many other possibilities, but let’s look at backend storage pressure. Are we querying too much information? Are we querying suboptimally? Are we querying too often? Are we repeatedly querying for the same information? The data layer tends to be the most expensive as well as the most resource-constrained. Minimizing the reliance on this layer is paramount in keeping performance high. If you’re not happy with your observations and discoveries, congratulations, you’ve just been presented with an area of opportunity.
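One common way to reduce repeated queries for the same information is to cache results for a short time. The sketch below is one possible approach and is not tied to any particular database or framework; `TTLCache`, `slow_lookup`, and the 30-second lifetime are illustrative names and values chosen for this example.

```python
import time


class TTLCache:
    """Keep loaded values for a fixed time so repeat requests skip the data layer."""

    def __init__(self, loader, ttl_seconds=30.0, clock=time.monotonic):
        self._loader = loader      # function that performs the expensive query
        self._ttl = ttl_seconds    # how long a cached value stays valid
        self._clock = clock        # injectable so the cache can be tested
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: no query issued
        value = self._loader(key)  # missing or expired: query once and store
        self._entries[key] = (now + self._ttl, value)
        return value


# Hypothetical stand-in for a database query; it records how often it is called.
query_log = []


def slow_lookup(customer_id):
    query_log.append(customer_id)
    return {"id": customer_id, "name": f"customer-{customer_id}"}


cache = TTLCache(slow_lookup, ttl_seconds=30.0)
cache.get(42)
cache.get(42)
print("queries issued:", len(query_log))
```

The trade-off is staleness: the longer the lifetime, the fewer queries reach the data layer, but the longer a change takes to become visible.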

As you travel on this journey, you will gain insights you hadn’t anticipated and potentially go down some “rabbit holes.” Always remember, this is an iterative process: you need to keep measuring against your baselines and update them accordingly. Once you are more comfortable with the data, start placing alerts and triggering events when metrics start to deviate from the expected, and of course determine the “why.”
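A simple starting point for such an alert is to flag a new measurement that sits too many standard deviations away from the baseline mean. The sketch below assumes a threshold of three standard deviations and uses made-up baseline values; both should be tuned to your own data and agreed with your stakeholders.

```python
from statistics import mean, pstdev

# Hypothetical baseline: mean response times in ms from earlier measurements.
BASELINE_MS = [100.0, 110.0, 90.0, 105.0, 95.0]


def deviates(baseline, observed, threshold=3.0):
    """Return True when observed is more than `threshold` standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any difference counts as a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


for value in (104.0, 150.0):
    status = "ALERT" if deviates(BASELINE_MS, value) else "ok"
    print(f"{value:.0f}ms -> {status}")
```

When the alert fires, the next step is the same as before: find the “why,” and update the baseline if the change turns out to be the new normal.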

This is an essential process in the development/product lifecycle that aids in customer satisfaction and the financial viability of your solution. Once integrated into your environment, many stakeholders can drive their roadmaps and decision-making from a solid data-driven foundation.

Next Steps

This article is a conversation starter that can take many roads. If it resonates with you and you need help with your software and environments, let’s have a conversation.

Recommended reading: The Phoenix Project, The Goal
