Really need to look at the bigger picture
Craig Watts
Believes Support is more exciting, dynamic and much more interesting than Implementation. But doesn't understand why others disagree.
Welcome to a piece born out of frustration. That frustration being Subject Matter Experts continually providing advice without taking into account all the factors which could impact the decisions. I harp on about it all the time: when it comes to System Performance you need to go beyond the technical data and combine it with an understanding of how the solution is being used. Today I'll use a solitary example to prove a point and show how simply following the rules of engagement could have a detrimental impact on your solution. Today's subject matter is likely one of the most misunderstood and often misused pieces in the Dynamics AX System Performance resolution kit bag. Today I'm talking about Cache.
For the purposes of my example I think I need a customer as a reference point. After thinking long and hard I seem to have found one. Through all my reading they likely have one of the most diverse and highly skilled IT teams in the world. They are a Bricks and Mortar retailer who have made the leap into e-Commerce with great success resulting in rapid growth. Some of you may have heard of them, as the example company I'll use today is called Tailwind Traders. For those who haven't heard the name, this is the company Microsoft use in a lot of their training materials and alas I also have to partake in a bit of training every now and then. For the purpose of what I'm about to go through their structure is almost perfect.
Here's the scenario, Tailwind have a performance issue. Well actually they have a few and we'll get to that. The particular problem I want to discuss is related to INVENTTABLE as there is a query against it which is running 5 million or so times a day and it's causing a little bit of havoc in the database. The answer here is pretty obvious, looking to lift load from the database for a frequently executed query, well let's put it in Cache. I'd most likely use the CacheLookup option of Found for this one. All good now, issue sorted.
Actually no it isn't. You see, out of the box the INVENTTABLE already has the cache option set to Found. So why do we still have so many database calls? Let's dig a little deeper. Cache Misses are sitting at about 99% even though we have a large number of cache writes. In fact we are writing so much to cache that the cache file has now gone to disk. Outside of the additional strain on the AOS what is the performance impact of this? Let's assume each cache read takes 1 millisecond, remembering that the first thing it has to do, regardless of whether we are writing or reading cache, is to see if the value is already there. This time, of course, will increase if we've gone to disk but 1 millisecond is easier to work with. Let's also assume it takes the same amount of time to write. All of this for no net gain as we're going to the database regardless. So that's roughly 5 million reads plus 5 million writes, an extra 10 million milliseconds of processing time per day. Almost 3 hours of processing time we didn't need to spend. Of course the solution here is pretty simple, remove the table from cache and address the issues at the database level.
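The back-of-envelope sum above can be written out as a small Python sketch. The function name and parameters are mine and purely illustrative; the 1 millisecond read and write costs are the same simplifying assumptions used in the text, not measured AOS figures.

```python
def wasted_cache_hours(lookups_per_day, miss_rate, read_ms=1.0, write_ms=1.0):
    """Hours per day spent on cache work that still ended in a database call.

    Assumption (from the text): a miss pays for the failed cache check
    plus the write of the freshly fetched record into cache.
    """
    misses = lookups_per_day * miss_rate
    wasted_ms = misses * (read_ms + write_ms)
    return wasted_ms / 1000 / 3600  # milliseconds -> seconds -> hours


if __name__ == "__main__":
    # Rounding the ~99% miss rate up to "every lookup misses", as the text does.
    print(round(wasted_cache_hours(5_000_000, 1.0), 2))
    # The same sum with the observed 99% miss rate.
    print(round(wasted_cache_hours(5_000_000, 0.99), 2))
```

Either way the answer lands a little under 3 hours a day, and it only gets worse once the real read time reflects a cache that has spilled to disk.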
I know a number of people who would use the above as a reason not to use caching as a performance rectification step at all. Most of them are DBAs, so when cache is successfully implemented it becomes sight unseen for them, and they don't tend to like not being able to see queries executing. The thing is caching does have its place in the tool kit and can at times be incredibly effective, so why didn't it work in this scenario? This is where we need to look at system usage as opposed to technical output. The query in question is generated by the find() method on the INVENTTABLE. That method is called from multiple places, be it the generation of a Sales Order line or looking up the item details on the web page. Tailwind have operations in 10 countries and a SKU count of 150,000+. On a positive note the item numbers are common across all regions. Let's assume they're in Apparel. For cache in this scenario to be effective everyone would need to be purchasing the same item, from the same region, in the same colour, in the same size, within the same cache window and all the transactions would need to be executed on the same AOS. Well maybe not everyone, although a fair few would be quite handy given that for cache to be effective you need to have commonality in the requests being made.
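To put a rough number on the commonality point, here is a minimal Python simulation. Everything in it is an assumption for illustration: a simple least-recently-used cache stands in for the record cache, the capacity of 2,000 entries is hypothetical, each SKU and region combination is treated as its own cache entry, and requests are spread evenly across the range. It is not a model of how the AOS actually manages its cache.

```python
import random
from collections import OrderedDict


def simulate_hit_rate(distinct_keys, cache_size, requests, seed=0):
    """Fraction of lookups served from a simple LRU cache (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    cache = OrderedDict()
    hits = 0
    for _ in range(requests):
        key = rng.randrange(distinct_keys)  # uniform demand: no best sellers
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True  # the "cache write" after the database call
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / requests


if __name__ == "__main__":
    # 150,000 SKUs x 10 regions, hypothetical cache of 2,000 entries.
    print(simulate_hit_rate(1_500_000, 2_000, 200_000))
    # A small, commonly requested range that fits in cache.
    print(simulate_hit_rate(500, 2_000, 200_000))
```

With the wide range almost nothing is served from cache, while the small range is served almost entirely from it. Real demand is rarely this evenly spread, so a few best sellers would lift the first figure, but the shape of the problem is the same.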
Guess we fix this one at the database then, and we manage to achieve that to a degree. But as I said earlier this is one of a number of issues and at some stage Tailwind concede defeat and call in the Experts. The experts come armed with not only expertise (it's in the name) but also a set of rules and guidance often taken from a document titled Advanced Performance Optimization Guidelines. Before I go any further with this I'd like to quickly point out that I also utilise this particular document and most, if not all, of the recommended outputs. There are a few other things I use as well but I will save that for another day. So the experts are in and they're gathering data, analysing said data and will produce a document at the end of the process which outlines a combination of observations and recommendations. While the format of this output will differ from Vendor to Vendor the general approach is usually quite similar in that an issue will be observed and a recommendation will be given to correct it.
This is the point where my general frustration kicks back in. While not always the case there is a strong tendency for the recommendations to take a one size fits all approach. They are often purely technical in nature, which I can understand as all the information that is being gathered is also of a technical nature. But as we know, or at the very least should have picked up on by now, when it comes to System Performance it is very rarely a one size fits all resolution. Or to quote Monty Python, 'We are all individuals. We are all different.' Would the person at the back of the room right now with their hand in the air yelling, 'I'm not', please sit down again because you are. Especially when it gets to the point of having to get the experts in to review the system, as now you're at the stage where you are not sure what to do next. Wouldn't you expect the local team to be able to fix the problem if the resolution was globally common? At the very least you'd be able to find it on Google by now. Which leads nicely into another point of frustration.
This one comes from my end customer time and I'm determined not to fall into the same trap now that I'm back on what I've on occasion heard called the dark side, by the way it was me who called it that, another hangover from the end customer time. As I said earlier, Tailwind conceded defeat and called in the experts. Think that through, they must have tried something to fix their issues, likely many things. It's also worth remembering they're diverse and highly skilled. So pray tell why does nearly every version of this type of review document seem to have a strong focus on base principles? Don't get me wrong I've seen systems which don't seem to follow any principles, let alone the base ones but very rarely at this stage of the process. Put another way, teaching a DBA how to suck eggs will most likely not get you a second invite. It also has to be remembered that we just conceded defeat here. How about a bit of a chat about what we did and more importantly why we did it? It's also not a good look to make a recommendation to do something which has already been tried and failed.
That feels better, as they say a frustration shared is a frustration halved. Before I go there's one last thing I'd like to say. Any of you who may have seen my other articles have probably picked up that when I use scenarios they tend to be fairly real, and that is also the case here. Tailwind did get the experts in, they gathered all their data and generated a report full of recommendations. They also managed to pick up on the inventory query and made a strong recommendation. I don't think I need to tell you what that recommendation was, let's just say the word Found could be seen amongst the text.
Thank you for reading and good-bye until next time.
|| Business Improvement Specialist || Business Strategy || Enterprise Architecture || Business Architecture
4 years ago: Craig, I have seen nothing in the past 20 years that suggests projects do not need the 3Rs. Requirements, roles, responsibilities. SMEs are NOT experts in business in general. They are experts in THAT business and need to identify and open doors for architects, solutions designers, and analysts. Org change people are NOT experts in business in general and need to stick to their role and responsibility in changing the HUMAN structures. THE PROBLEM is and always will be the lack of understanding of what is architecture and what is engineering. And, you DO NOT have ANY architecture if you do not have business architecture. If you don't have your SWOT, strategy, business plans, processes, people, correctly identified and documented to the level where it is data driven you don't have any architecture. What you have is pretty PowerPoints and wall charts that mean nothing and don't drive anything in any usable format that can inform people who implement software.
Senior Product Manager Automation - Sonic Healthcare
4 years ago: I feel privileged to have had those car conversations with you, Craig. A great tension relief and always insightful.