Reducing Cost to Serve
Matthew Woodings
CTO/Engineering VP | Architecting & Bringing Enterprise Software/Hardware Products to Market | Information Architecture | Cloud/SaaS | Automation | Virtualization | Operations/Vendor Management | Strategic Planning
Improve Software Performance - Do More with Less
My name is Matthew Woodings, and after many years in the software industry, I feel it’s time to share the experiences and insights I have gained with those who could benefit from them. I have built software solutions from the ground up, overcoming the scaling and performance challenges at each pivotal moment, from the first user to over two million users per day. My journey has traversed every aspect of the software and hardware stack, from the private data center to the public cloud. I’m experienced in many programming languages and supporting technologies, including mobile devices, networking, and databases.
This article is about software performance in general, digestible terms. I could go into technical jargon; however, it would be at the expense of readability and fundamentals. For the seasoned troubleshooter, this may seem a little rudimentary, but we’ve all got to start somewhere.
Defining Performance
When we talk about performance, we’re talking about how to get the most from the resources we have available. This may be to aid growth, user experience, cost to serve, or other equally important metrics for your organization.
Resources are finite, and those running a data center will be acutely aware of this. Even in the cloud, with its “limitless” resources, there are still boundaries that need to be respected.
You may have been told, “It’s running slow,” or conversely, “It’s fast but too expensive. Can we reduce the footprint?” This situation can be challenging, and you may have wondered where to begin.
The first step in this journey is to take anecdotal perception off the table. How many times has someone’s “feeling,” or an influencer’s experience, inadvertently become your performance metric? It’s not grounded in science or data, so any attempt to disprove it becomes an almost unreachable goal, especially when the past may be romanticized.
Data is essential in creating a baseline against which future data points can be compared. Agreeing on the baseline with stakeholders is almost as important as the data itself. By aligning yourself with your stakeholders, you lay the foundations for success.
What’s My Baseline?
So where does one find this magical data? My forte is client/server interactions, but the principle is going to be the same across most disciplines. Before you start collecting, you need to determine areas of interest. Useful areas could be:
You’ve identified some areas of interest, and the list doesn’t have to be exhaustive. Remember, you’re solving a problem, and that is much like solving a puzzle. You’ll be shining a spotlight on areas and gaining valuable insights.
There are many ways to collect this data, from built-in and open-source tools to exposed log files, third-party observation sites, and many more. You will need to research and determine which tools and solutions best suit your requirements and budget. Once you are collecting data, you also need to analyze it to create plans for potential remediation, resource rightsizing, feature enhancements, and feature retirement.
To visualize some of the interpretations, we’re going to look at API responses for a theoretical site, and we’re going to reference standard request logs for data size, response times, and response codes.
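As a starting point, here is a minimal Python sketch of turning request log lines into structured records. The log layout is an assumption made for illustration (timestamp, method, path, response code, response size in bytes, response time in seconds, separated by spaces), and the names RequestRecord and parse_line are hypothetical. Your own logs will almost certainly look different, so treat this as a shape to adapt rather than a recipe.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RequestRecord:
    timestamp: datetime
    method: str
    path: str
    status: int        # HTTP response code
    size_bytes: int    # response body size
    duration_s: float  # time taken to respond, in seconds


def parse_line(line: str) -> Optional[RequestRecord]:
    """Parse one line in the assumed six-field format; return None if it does not fit."""
    parts = line.split()
    if len(parts) != 6:
        return None
    ts, method, path, status, size, duration = parts
    try:
        return RequestRecord(
            timestamp=datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ"),
            method=method,
            path=path,
            status=int(status),
            size_bytes=int(size),
            duration_s=float(duration),
        )
    except ValueError:
        # A field was not a valid timestamp or number; skip the line.
        return None
```

Returning None for lines that do not fit keeps one malformed entry from stopping the whole collection run, though you may prefer to count the rejects so you know how much data you are discarding.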
Once you are looking at this data, you need to determine if it’s good, bad, or indifferent. For example, a response time for a single web call is purely a snapshot, so drawing conclusions from that one data point could be misleading. You need to start tracking the mean and standard deviation of those calls. If you slice those by time of day, you can see the metrics change from peak loads to quiet times. In this example, you should monitor multiple endpoints. Are some significantly different? Would you expect this? Does the data change greatly throughout the day?
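To make the mean and standard deviation idea concrete, here is a small Python sketch that groups response times by endpoint and hour of day. The input shape (a tuple of endpoint, hour, and duration in seconds) and the function name summarize are assumptions for illustration; in practice your monitoring tool may already do this for you.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(samples):
    """Group (endpoint, hour_of_day, duration_seconds) samples and summarize each group."""
    buckets = defaultdict(list)
    for endpoint, hour, duration in samples:
        buckets[(endpoint, hour)].append(duration)

    summary = {}
    for key, durations in buckets.items():
        # The sample standard deviation needs at least two data points.
        spread = stdev(durations) if len(durations) > 1 else 0.0
        summary[key] = {
            "count": len(durations),
            "mean": mean(durations),
            "stdev": spread,
        }
    return summary
```

Comparing the same endpoint across different hours, or different endpoints within the same hour, is where the interesting questions usually start.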
As with many of these investigations, you will notice that you’re peeling back layers of the proverbial onion. What can some of these observations mean?
You have gathered data and performed some initial analysis, building a picture of your system. Was this expected? Are there any surprises? Have more questions arisen? Were there endpoints you thought were in use but weren’t? Were there endpoints that were used far more than anticipated?
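The questions about endpoint usage can be answered with a simple count. The sketch below assumes you already have the list of requested paths from your logs and a set of endpoints you believe are in use; the function name usage_report is hypothetical.

```python
from collections import Counter


def usage_report(requested_paths, expected_endpoints):
    """Return (expected endpoints never requested, request counts from busiest to quietest)."""
    counts = Counter(requested_paths)
    # A Counter returns 0 for keys it has never seen.
    unused = sorted(e for e in expected_endpoints if counts[e] == 0)
    busiest = counts.most_common()
    return unused, busiest
```

An endpoint that appears in the unused list is a candidate for retirement; one near the top of the busiest list that you did not expect is a candidate for a closer look.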
Many of these questions would warrant an article of their own, and it’s these questions that drive the next steps. So, what can you do once you have this data? Well, you have the “what”; now you need the “why,” and from there you need “remediation.”
Why is this Happening?
Again, we need to pin this to an example. For this, we’ll look at variable response times with occasional timeouts. Timeouts, in this case, occur when the API endpoint takes too long to respond. If the endpoint in question (and there could be several) experiences this behavior, you could have:
There are many other possibilities, but let’s look at data storage pressure. Are we querying too much information? Are we querying suboptimally? Are we querying too often? Are we querying for the same information repeatedly? The data layer tends to be the most expensive as well as the most resource-constrained. Minimizing the reliance on this layer is paramount in keeping performance high. If you’re not happy with your observations and discoveries, congratulations, you’ve just been presented with an area of opportunity.
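One common way to ease repeated queries for the same information is to cache the result in memory. The Python sketch below uses the standard library’s functools.lru_cache; the function fetch_from_database is a hypothetical stand-in for a real query, and the cache size of 1024 is an arbitrary assumption. It is only one option among several, and it suits data that changes rarely.

```python
from functools import lru_cache

query_count = 0  # counts how often the (pretend) data layer is actually hit


def fetch_from_database(customer_id: int) -> dict:
    """Hypothetical stand-in for an expensive query against the data layer."""
    global query_count
    query_count += 1
    return {"id": customer_id, "name": f"customer-{customer_id}"}


@lru_cache(maxsize=1024)
def get_customer(customer_id: int) -> dict:
    """Return the customer, hitting the data layer only on a cache miss."""
    return fetch_from_database(customer_id)


get_customer(1)
get_customer(1)  # served from the cache, no second query
get_customer(2)
```

After these three calls, only two queries have reached the data layer. The trade-off is staleness: a cached answer will not reflect later changes in the database, so you would need an expiry or invalidation strategy before relying on this for data that changes often.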
As you travel on this journey, you will gain insights you hadn’t expected and potentially go down some “rabbit holes.” Always remember, this is an iterative process: you need to measure continually and update your baselines accordingly. Once you are more comfortable with the data, start placing alerts and triggering events when metrics deviate from the expected, and of course determine the “why.”
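A simple way to decide when a metric has deviated from the expected is to compare it with the baseline mean and standard deviation. The Python sketch below flags a value that sits more than a chosen number of standard deviations from the baseline. The function name deviates and the default threshold of 3 are assumptions for illustration; the right threshold depends on how noisy your data is.

```python
def deviates(value: float, baseline_mean: float, baseline_stdev: float,
             threshold: float = 3.0) -> bool:
    """Return True when value is more than `threshold` standard deviations from the baseline mean."""
    if baseline_stdev == 0:
        # With no recorded spread, any difference from the mean counts as a deviation.
        return value != baseline_mean
    return abs(value - baseline_mean) / baseline_stdev > threshold
```

For example, with a baseline mean of 0.2 seconds and a standard deviation of 0.05 seconds, a 0.9-second response would be flagged and a 0.25-second response would not.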
This is an essential process in the development/product lifecycle that aids in customer satisfaction and the financial viability of your solution. Once integrated into your environment, many stakeholders can drive their roadmaps and decision-making from a solid data-driven foundation.
Next Steps
This article is a conversation starter that can take many roads. If it resonates with you and you need help with your software and environments, let’s have a conversation.
Recommended reading: The Phoenix Project, The Goal