Performance and Scalability Essentials -- Selecting an APM product
It is a well-known fact in industry circles today that scalability and performance are the big gorilla in the room, whether for that large 200,000-employee organization occupying prime real estate in Silicon Valley or on Bangalore's Outer Ring Road, or for that just-around-the-corner start-up whose idea has become successful overnight. Given the huge choice of employment opportunities available to engineers today, especially those who go the extra mile in making scalability and performance happen, it is only natural that they expect best-in-class APM products to be available to them at the workplace. Engineering productivity enablement has never been as big a factor in employee motivation and retention as it is today. Instantaneous provisioning, containerized development environments, automated harnesses, cloud productivity tools... you name it, and you can bet your last rupee that companies are working ever harder to bring the best of these engineering productivity tools to their employees. APM is an essential part of the mix. Having used several home-grown and off-the-shelf products over the last several years, I can tell you that it is a must-have tool that should be readily available to ALL stakeholders in the ecosystem. If you think an APM product is just another entry to tick off in some higher-up's annual goal list, you could not be further removed from the reality of high-performance engineering.
With a wide variety of established players in the market and a whole line-up of "also-ran" APM products out in the wilderness, how do you put a method to the madness of selecting a good product for your organization and its teams? Here are some key factors to keep in mind while you are at it:
1. Pedigree matters: I cannot overemphasize the importance of this - DO NOT BECOME A BETA TESTER of an APM product! There is a glut of wannabe "APM" products out there in the wilderness. Every Tom, Dick and Harry wants a slice of the APM pie. Let's face it: APM licensing is an expensive game and a high-margin one at that, so everybody wants in. But an APM product isn't something you can quickly prototype in a few months and then incrementally build on top of. That bus left a long time ago. Building an APM product needs a distinct pedigree of several years of precise instrumentation of high-performance systems and ecosystems, right from the honchos at the top down to the engineer building the product. Even the established players, with their time-tested and battle-hardened instrumentation stacks and algorithms, struggle to get the basics right in an ever-changing landscape that is best described as a moving goalpost. I know of cases where players with a near-decent product in some other area "forayed into" the APM game to provide "seamless end-to-end visibility" of the ecosystem. The argument that "ecosystems are large spaces and we help cover the entire picture" misses the core mandate of APM solutions. The fact that precise instrumentation of tech stacks is a brutal beast to master tends to get cast aside in such cases. You just cannot hire a Vice President from a CRM information systems background and a product owner from a financial systems background, equip them with an engineering team and expect them to deliver a precisely instrumenting APM product. That just does not happen on a regular day!
You do not want to be the customer explaining the basics of JVM crashes, imprecise call stack trees and pointcuts to a "Vice President", a "Chief Product Owner" or a hapless, technically challenged sales or account management chap. Your stack's performance issues are enough of a pain already! Evaluate the pedigree of the organization first and foremost. Have they been in the APM game for a decent period of time? Are they thought leaders in the APM space? Does their hierarchy include individuals who can hold a good conversation with you on the topic of performance engineering? Do they have well-established teams working on the product? Or would they rather focus on their core products and relegate their APM customers to stepmotherly treatment? Is the internet strewn with user questions about their product, answered with vague or clueless responses by individuals logging in from Elbonia? You are in the market for a product that can help you solve scalability and performance issues, so make sure you select one with a good pedigree. And no, you cannot quickly and successfully "open source" an APM solution into your organization's teams without investing significantly in it, again with high-pedigree performance engineering talent who have built a successful APM product in the past. Good luck finding that at short notice!
2. The APM essentials:
For any APM product to pass the smell test, the items below are absolute essentials, although this also depends to some extent on the tech stack at hand. The lines between core application performance and cloud/infrastructure monitoring are blurring by the day. Never for a minute underestimate the importance of the former. The "overall" picture that some vendors pitch is incomplete and superficial if you do not know precisely what is happening in the trenches of your individual applications and their associated code ecosystem.
a. Precise tech stack instrumentation: Be it Java, Node or Ruby, is the APM product able to give you accurate instrumentation in the language and associated stack you have running? Is it able to give you precise call traces, the values of the variables in question, the various threads/processes at play in that precise snapshot, and the infrastructure aspects contributing directly to the problem at hand? No, we are not talking about vague, flashy, colorful dashboards or "machine learning" alerts. We are talking blood and guts here: the down-to-the-atom detail your engineer is going to understand and reflect in the code. I know of a product whose flashy brochures promised accurate Java instrumentation; the devil in the obscure documentation was that doing so would add about 40-50% overhead to the system's performance. Several JVM crashes later, and after rounds of explaining them to a "chief product owner" who did not understand what a properly put together call stack looks like in a high-performance JVM, we began to wonder whether we were the beta testers or the customers whose problems the vendor was supposed to solve! Don't let shiny brochures, "executive"-level bar conversations and sweet-talking sales folks who wouldn't know the difference between a VM and a JVM tell you otherwise: there is no getting away from the blood and guts aspect of APM products. Take my word for it. Don't find this out later for yourself, with frustrated engineers walking out of the door because they will not put up with the superficial information made available to them.
b. Real User Monitoring: No, this is not a gimmick, and no, I am not going to "quickly" and "in brief" explain it to you. It is that critical in making sure your APM product is useful to all the stakeholders in the game! Spend some time understanding what real user monitoring capabilities are and make sure you don't overlook them in your decision. Make sure the APM product offers several views and perspectives on this crucial information. Also, check for the possibility of close integration with synthetic monitoring products if they are already available in your organization's landscape.
c. Rapid search/visualization for core details: Make sure the stakeholders in your ecosystem can quickly build dashboards for the core information that they require. For example, can a developer quickly re-use/tweak an existing JMX dashboard or can a UI designer quickly visualize details of usage patterns around page load times? If achieving any of this requires several man-hours of "release activities" or writing reams of "custom" instrumentation code or installing a million tiny agents in the ecosystem then you have better things to do with your time.
d. Real-world, on-the-ground intelligent insights into potential problem areas: Don't be "flashy brochured" or "ML/Blockchained". Instead, look closely at whether the APM product can give you real-world intelligent suggestions, trends and insights into the problem area. Capabilities such as these, woven deeply into your high-level visualizations, can do wonders in ensuring that trends are detected and systemic/architectural changes are brought in to prevent a recurrence.
e. Underlying plumbing and architecture: A system built to instrument your stack and help you solve performance issues had better be built on a rock-solid platform, following rock-solid, high-pedigree engineering principles. I once happened to dive deep into the "open-sourced" code of an APM solution after it gave us quite a time by adding enough overhead to pound our system into the ground. Upon inspection, I was appalled at the overall approach by which the code had been put together. It was clear that this was a hastily assembled pile of code with nary a consideration for the overhead it would add to the underlying system it was designed to instrument and monitor. The "chief product owner" beat around the bush when asked about concepts such as a sidecar architecture or a single-agent approach, as opposed to a million agents each doing individual things, each with its own set of associated problems. A discussion with the "lead" engineer left much to be desired on their handling of class loader specifics and the associated problems therein. Worse, their core engine had been purchased off the shelf from some third-party provider! It was as primitive as things could get in the APM universe. Do you want to find yourself in such a situation after deals have been signed and the cash has begun to flow to the vendor?
When you evaluate an APM product, make sure you put in some time evaluating what the product's architecture looks like. The established players have no issues publishing some information on the open internet. Of course, their patented algorithms are not out in the open, but no decently built APM product would have its architectural details hidden from the public. During product evaluation discussions, make sure you bring along someone very knowledgeable in the plumbing of the language/tech stack you are dealing with. Some probing questions to the product owners and solution architects should quickly give you an idea of the pedigree and thought process that went into the APM product. Do not miss out on this step. You do not want suit-toting executives taking these core technical decisions for you; some things are best left to the experts in that specific field.
f. Precise overhead numbers matter!!: A system designed to precisely instrument and monitor your systems had better come with precise overhead numbers. If you see vague numbers in brochures or in discussions with the vendor, spend the next few seconds looking for the nearest exit from the meeting room. Otherwise, you will find out the exact overhead numbers on your production systems, with the full sympathies of your customers sent your way as the APM product works its magic!
g. Homegrown capabilities matter!!: Face it, your company/organization tried building a half-decent APM capability of its own in the past, and it left much to be desired. This probably explains why you are in the market for an established off-the-shelf solution. Should you throw away your home-grown solutions once you sign the agreements on this new product? Some of the stuff in there would have been built for very specific contextual needs of your ecosystem and business realities. Make sure integration doors and gates exist in the shiny new APM product that allow you to cherry-pick from your homegrown stuff and enable synergies where required.
3. Feedback from the teams matters: One important thing to do is to release initial trial licenses of the APM product to your engineering teams and solicit feedback. No one can call the bluff on a wannabe APM product like your engineering teams. Make sure your sample includes stakeholders of every variety. Make sure you do this before inking agreements and getting locked in with a vendor.
4. Talk to other customers/organizations with a pedigree: Wait, isn't this one of those regular advice items from any plain vanilla product evaluation checklist? Wrong! You can be pretty sure that a serious APM product has a decent set of customers, and that some of them have scalability/performance challenges worth writing home about. Identify those internal/external organizations and talk to them about the real-world performance of the APM product. You will benefit immensely from such conversations, especially from the blood and guts details.
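One of the essentials above, precise overhead numbers, is easy to sanity-check yourself during a trial or PoC instead of relying on the brochure. The following is a minimal sketch, not any vendor's method: `workload` and `traced_workload` are hypothetical stand-ins for the same request handler without and with an agent attached, and the bookkeeping inside `traced_workload` only simulates instrumentation cost. In a real evaluation you would time your actual application under representative load, once with the agent and once without.

```python
import statistics
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], runs: int = 30) -> List[float]:
    """Return the wall-clock duration (in seconds) of each of `runs` calls to fn."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def overhead_percent(baseline: List[float], instrumented: List[float]) -> float:
    """Relative overhead of the instrumented runs, using medians to damp outliers."""
    base = statistics.median(baseline)
    inst = statistics.median(instrumented)
    if base <= 0:
        raise ValueError("baseline median must be positive")
    return (inst - base) / base * 100.0


def workload() -> int:
    # Hypothetical stand-in for a real request handler.
    return sum(i * i for i in range(50_000))


def traced_workload() -> int:
    # Hypothetical stand-in for the same handler with an agent attached.
    # The event bookkeeping merely simulates instrumentation cost.
    events = []
    total = 0
    for i in range(50_000):
        total += i * i
        if i % 100 == 0:
            events.append((i, time.perf_counter()))
    return total


if __name__ == "__main__":
    base = time_calls(workload)
    inst = time_calls(traced_workload)
    print(f"median overhead: {overhead_percent(base, inst):.1f}%")
```

A number produced this way is only as good as the workload behind it, but it gives you something concrete to put next to the vendor's claim, and a vendor with a good pedigree should be able to explain any gap.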
To summarize, the APM evaluation process isn't a regular plain vanilla game. Make sure you have enough technical firepower and real-world PoCs thrown into the mix before you ink that agreement. After all, becoming a beta tester for a flashy, shiny new APM product is probably last on your to-do list!