The Art and Science of AIX Performance -- Part V: Intuition and Instinct
An article series dedicated to AIX and the methodology of attacking and resolving performance problems.
This article originally appeared in "IBM Systems Magazine". Copyright MSP Communications and Mark J. Ray.
By Mark J. Ray
03/01/2018
Over the course of this article series, we’ve developed a methodology to attack and resolve performance problems. We've laid the foundation of our remediation efforts by developing a coherent and complete history-taking formula. We've learned about the importance of keeping current on system and device firmware, along with the importance of keeping careful and up-to-date system configuration files and logs of many different types. By examining the basic, high-level utilities we can use to monitor the four basic performance resource groups―CPU, memory, networking and storage―we've learned that developing a personal methodology is critical to success. Building further, we've explored some of the more advanced drill-down tools that are available in AIX to pinpoint the issues that the more basic utilities only hint at.
In short, the first four installments in this series should give you a feel for how your system behaves under various conditions and how we, as performance managers, must react to that behavior. Hopefully you're also starting to understand something I've mentioned throughout this series, but will emphasize in this, the concluding article: solving AIX performance problems requires healthy doses of intuition and instinct.
While not technical skills, intuition and instinct are nonetheless as important as any nuts and bolts ability. They are a big part of the “art” in "The Art and Science of Performance." I can't stress this enough: without intuition and instinct, you will go nowhere in your performance practice.
In any endeavor, be it scientific or mathematical, book learning only takes you so far. In the case of AIX performance, you may know the stats, monitor and trace-based diagnostic utilities inside and out. You may thoroughly know what every CPU, memory, storage and networking kernel tunable does. This is all well and good, but it's not nearly enough. You must also develop a feel for the systems you manage.
I realize this might sound strange coming from someone who's spent nearly a quarter century in this field―I managed my first AIX system (v2.0!) in 1994―but it's a lesson I've learned over and over. It doesn’t matter whether you administer the same systems every day or if you’re a consultant hopping from one job to the next. Without carefully cultivated intuition and instinct, you can’t truly understand your system’s operation.
Intuition Versus Instinct
So let’s examine each of these qualities and how they relate to computing environments. Intuition, basically, is knowing when something is right as well as when something is wrong. Intuition is that gut feeling that something is amiss even though you've checked perhaps dozens of AIX logs that indicate everything is fine. Intuition is the confidence you have to face a roomful of management types and explain why you’re right and the logs are wrong. Intuition is running every operational scenario through your head and determining which is the best for your environment.
To illustrate my point, consider this typical scenario: A problem has occurred in your environment. It’s serious enough that a meeting has been convened to discuss a plan of action for remediation. You find yourself in a room with a lot of people: database administrators, network folks, developers, storage gurus, security techs and, of course, management. In addition, your hardware and software vendors will likely join in via speakerphone. After a period of tense discussion, a general consensus is reached as to what is amiss with your AIX environment, and a hasty plan is made to address it. Recognizing the need to hear everyone's opinions, you've for the most part held your tongue. But you know something’s wrong. You understand that there's a better way.
Everyone has facts and figures to back up their arguments, but only you can put this information into proper context. Maybe the facts and figures don’t jibe with your intuitive grasp of how your systems are put together and function. Maybe performance isn't actually bad; maybe the stats are simply skewed because you installed an exotic driver on a storage or network adapter that makes those devices perform like a champ. Perhaps some time ago you made a change-control approved adjustment to a vmo, schedo, ioo or no tunable. You're the only one who would know this, but now you realize that this must be factored into the plan of attack. The smallest configuration change can potentially create cascading effects. No one else in the room could possibly know this, because no one else knows your systems better than you.
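One way to bring a half-remembered tunable adjustment into that meeting as evidence is to compare a saved snapshot of your tunables against the current settings. What follows is only a minimal sketch, not a tool from this series. It assumes the snapshots are plain text with one "name = value" pair per line, which is how I would expect saved output from the tunable commands to look; the function names parse_tunables and diff_tunables are hypothetical, and the two snapshots and their values are illustrative only.

```python
def parse_tunables(text):
    """Parse 'name = value' lines into a dict, skipping any other lines."""
    tunables = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip():
            tunables[name.strip()] = value.strip()
    return tunables


def diff_tunables(baseline, current):
    """Return {name: (baseline_value, current_value)} for each tunable that differs.

    A tunable present in only one snapshot is reported with None on the other side.
    """
    changes = {}
    for name in sorted(set(baseline) | set(current)):
        old, new = baseline.get(name), current.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


# Hypothetical snapshots: the values below are made up for illustration.
BASELINE = """
minfree = 960
maxfree = 1088
"""

CURRENT = """
minfree = 2048
maxfree = 1088
"""

if __name__ == "__main__":
    changes = diff_tunables(parse_tunables(BASELINE), parse_tunables(CURRENT))
    for name, (old, new) in changes.items():
        print(f"{name}: {old} -> {new}")
```

In practice you would read the two snapshots from files you keep alongside your other configuration logs rather than from inline strings. The comparison is deliberately a plain string comparison, so it makes no assumption about what any individual tunable means.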
Intuition is an art form. It's something that every successful systems administrator has acquired. Developing this skill requires dedication, experience and patience. So start by getting to know every last bit of hardware and software you manage. Leave nothing to chance. Understand not just how those components operate, but how they would operate given any number of adverse conditions. Finally, don’t manage in a silo: Talk to vendors about the thought that went into writing this or that module of a production application. Talk to the network, database and application admins at your site and question them about every last quirk in their respective environments. Eventually, it will all come together. You'll simply “know” when something's wrong with your systems, despite all evidence to the contrary.
Instinct, in the context of systems administration, is different. While intuition is knowing what’s right or wrong, instinct is knowing what to do once a problem manifests. As with intuition, your instinct may run counter to what all the manuals and support personnel tell you is the proper course of action. However, if you've developed your intuition, you should trust your instincts, because, again, no one knows your systems better than you. While everyone around that meeting table may think that steps A, B and C are the best ways to attack your performance problem, because you know every last quirk and characteristic of your systems, you understand that those steps won’t work. So after everyone has had their say, you propose your own remediation steps, and cite sound arguments for each. You prove why you’re right and everyone else is wrong.
On countless occasions in my own performance practice I’ve received advice that ran counter to how I felt the problem should be addressed. My instinct told me to do X when everybody else insisted I should do Y. I knew my systems, and I was right, every time. Seriously, my instincts and intuition have literally never failed me.
Go With Your Gut
Again, intuition and instinct (and confidence) come with time, and only after total immersion in your system’s operation. You can't learn them by studying manuals, or, for that matter, by reading this article. They can only be developed through long hours spent with your environment and careful attention paid to it.
Once you've acquired these skills, you'll know. Actually, you'll feel it. When a serious performance problem occurs in your environment, your intuition and instinct―your gut―will provide you with the sound, rational approach to remediation. Once you've put in the effort and gained the experience―once you know your systems better than anyone―you owe it to yourself to go with your gut. Always.