APM vs Logging

I've heard this discussed as an either/or; I really don't think that's the case. For APM, I have a solid background in AppDynamics, which, whilst expensive, is something of a Rolls-Royce among APM tools. It's great for bolting onto legacy code, and you get some cracking (and sometimes surprising) insights into what's going on under the hood.

Where I see the real value of APM is in the statistical analysis of performance. It's great to be able to dive into the stack and see what's obviously wrong, but we can also get insidious performance degradation that creeps in over time, as databases grow and so on. Perhaps you didn't set up index maintenance, or didn't clean down data too old to be useful. That kind of problem can sit there for years, long past anyone's ability to remember, and even that assumes you still have the original staff around!
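One simple way to put a number on that slow creep, assuming you can export a daily 95th-percentile response time for a transaction from your APM tool, is to compare the most recent week against a longer baseline. The function names, the 28-day and 7-day windows and the 1.2 threshold are all illustrative choices, not anything AppDynamics itself provides.

```python
from statistics import mean


def degradation_ratio(daily_p95_ms, baseline_days=28, recent_days=7):
    """Ratio of the recent average daily p95 to the preceding baseline average.

    daily_p95_ms is assumed to be ordered oldest to newest, one value per day.
    """
    if len(daily_p95_ms) < baseline_days + recent_days:
        raise ValueError("not enough history for the requested windows")
    # Baseline: the baseline_days immediately before the recent window
    baseline = daily_p95_ms[-(baseline_days + recent_days):-recent_days]
    recent = daily_p95_ms[-recent_days:]
    return mean(recent) / mean(baseline)


def is_degrading(daily_p95_ms, threshold=1.2, **windows):
    """True if recent response times are at least `threshold` times the baseline."""
    return degradation_ratio(daily_p95_ms, **windows) >= threshold
```

Run daily against each key transaction, a check like this flags the gradual slide that nobody notices from one day to the next.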

I've used AppD with great success in a number of organisations, from around 100 to 10,000+ employees, and with the right touch you can learn as much from what it doesn't tell you as from what it does. Latency can be a silent performance killer, and it's tricky to spot except by omission.
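As a rough illustration of reading what is missing: if a trace gives you a transaction's total time and the time spent in each instrumented segment, whatever is left over is time nobody has accounted for, often network hops or waits on something uninstrumented. This sketch assumes the segments run sequentially and do not overlap; the function names are hypothetical.

```python
def unaccounted_ms(total_ms, segments_ms):
    """Time in a transaction not explained by any instrumented segment.

    segments_ms maps a segment name (e.g. "db", "http") to its duration in ms.
    Assumes segments are sequential, so their durations can simply be summed.
    """
    return max(0.0, total_ms - sum(segments_ms.values()))


def unaccounted_share(total_ms, segments_ms):
    """Fraction of the total transaction time that is unaccounted for."""
    if total_ms <= 0:
        return 0.0
    return unaccounted_ms(total_ms, segments_ms) / total_ms
```

A transaction that spends most of its time outside every instrumented call is exactly the kind of omission worth chasing.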

All that being said, APM has a different focus from logging. Lack of logging is one of my pet peeves, and it's something few developers seem to get right. It always seems to be an afterthought, if it gets done at all, and all too frequently it tells me things I could easily have guessed, whilst telling me nothing I actually need to know.

When I'm designing software, the very first step is to think about how I'm going to log information. It doesn't matter whether you're using a framework like log4j or rolling your own; the core principles are the same.

Personally, I like the classic Error/Warning/Information/Verbose setup. If you start with this, I find it works almost like pseudocode: I can plan out my app's decision-making process using logging alone. It helps me think about process and data flow, and lets me set up assertions, and tests for those assertions, all before I really get into cutting code.
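A minimal sketch of that log-first approach using Python's standard `logging` module. Error, Warning and Information map onto `ERROR`, `WARNING` and `INFO`; `VERBOSE` is a custom level I've assumed, sitting below `DEBUG`. `process_order` and its fields are hypothetical; the point is that the log calls read as an outline of the decisions before the real work is filled in.

```python
import logging

# Map the Error/Warning/Information/Verbose scheme onto Python's levels.
# VERBOSE is a hypothetical custom level below DEBUG (10).
VERBOSE = 5
logging.addLevelName(VERBOSE, "VERBOSE")

log = logging.getLogger("orders")


def process_order(order):
    # Written log-first: each call marks a decision point in the flow.
    log.log(VERBOSE, "process_order: received %r", order)
    if "id" not in order:
        log.error("process_order: order has no id, rejecting")
        return False
    if not order.get("items"):
        log.warning("process_order: order %s has no items, skipping", order["id"])
        return False
    log.info("process_order: order %s accepted with %d item(s)",
             order["id"], len(order["items"]))
    # ... real processing would go here ...
    return True
```

Each branch already has an observable outcome, so tests for those assertions can be written before the processing code exists.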

I like lots of logging at the verbose level, and on occasion debug logs have made up 80% of my code. I've taken flak from other developers for this style, but whoever is throwing it has never worked in Operations. That's where you really learn the value of logs. I cannot overstate that, and it's a common failing of DevOps teams to focus purely on CI/CD to the detriment of problem analysis.

Now, obviously you don't want verbose logging switched on all the time, as it does have a performance impact. Yet in today's always-on world, it's often not that easy to just flick a switch and get more info. No, the devs didn't consider that, and the log level is only read at startup. You have to restart the process. Oops, not allowed: change control board needed. In some organisations that can take weeks to get through, and whilst there's usually a shortcut process, do you really want to be hitting the emergency button that often?
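Part of keeping that overhead down is not paying for verbose messages you are not going to emit. A small Python sketch: `%`-style arguments already defer string formatting, but an expensive call made to build an argument still runs, so guard it with `isEnabledFor`. `expensive_dump` and the call counter are hypothetical stand-ins for something costly, such as serialising a large object.

```python
import logging

log = logging.getLogger("checkout")
calls = {"dump": 0}  # counts how often the expensive work actually runs


def expensive_dump(obj):
    """Hypothetical stand-in for something costly, e.g. serialising a big object."""
    calls["dump"] += 1
    return repr(obj)


def handle(request):
    # Only build the costly message when DEBUG output is actually enabled.
    if log.isEnabledFor(logging.DEBUG):
        log.debug("handle: full request %s", expensive_dump(request))
    return "ok"
```

With the logger at `INFO`, `expensive_dump` never runs; raise the level to `DEBUG` and it does, with no code change.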

So make sure you have a mechanism to change the log level on the fly and, crucially, have the logging clean up after itself. At the very least, have a janitor service.
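A sketch of all three ideas with Python's standard library, under some assumptions of my own: the desired level lives in a plain-text control file that the app re-reads periodically (from a timer or its main loop), `TimedRotatingFileHandler` with `backupCount` lets logging prune its own old files, and `purge_old_logs` is a hypothetical janitor for anything else that piles up. The file names, glob pattern and retention periods are illustrative.

```python
import logging
import logging.handlers
import time
from pathlib import Path


def build_logger(name, log_path, keep_days=7):
    """Rotate at midnight and keep only `keep_days` old files.

    Call once per logger; calling again would attach a duplicate handler.
    """
    logger = logging.getLogger(name)
    handler = logging.handlers.TimedRotatingFileHandler(
        log_path, when="midnight", backupCount=keep_days, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s|%(levelname)s|%(message)s"))
    logger.addHandler(handler)
    return logger


def apply_level_from_file(logger, control_file):
    """Set the logger's level from a control file containing e.g. 'DEBUG'.

    Missing files and unknown level names leave the current level unchanged.
    """
    try:
        wanted = Path(control_file).read_text(encoding="utf-8").strip().upper()
    except FileNotFoundError:
        return logger.level
    level = logging.getLevelName(wanted)  # int for known names, a string otherwise
    if isinstance(level, int) and level != logger.level:
        logger.setLevel(level)
    return logger.level


def purge_old_logs(directory, max_age_days, pattern="*.log*"):
    """Janitor: delete matching files last modified more than max_age_days ago."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in Path(directory).glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Dropping `DEBUG` into the control file turns verbose logging on without a restart; writing `INFO` back turns it off again.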

Yet all the logging in the world is only going to help you after it's gone sideways, which brings us back to APM as an early warning system. If you consider events inside your code worthy of logging, perhaps it's worth firing them at your APM solution as well.
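One way to do that, sketched with Python's `logging` module, is a handler that hands WARNING-and-above records to your APM tool. `send_event` is a hypothetical callable: how you actually submit a custom event depends entirely on your vendor's API, so treat that wiring as an assumption. The optional `code` attribute is assumed to be passed via `extra`.

```python
import logging


class ApmForwardingHandler(logging.Handler):
    """Forward log records at or above `level` to an APM tool as custom events.

    send_event is a hypothetical callable you would wire to your APM vendor's
    custom-event API; it receives a plain dict per record.
    """

    def __init__(self, send_event, level=logging.WARNING):
        super().__init__(level)
        self.send_event = send_event

    def emit(self, record):
        try:
            self.send_event({
                "timestamp": record.created,
                "level": record.levelname,
                "logger": record.name,
                "code": getattr(record, "code", None),  # set via extra={"code": ...}
                "message": record.getMessage(),
            })
        except Exception:
            # Never let a monitoring failure take the application down.
            self.handleError(record)
```

Because the handler only depends on a callable, it can be exercised against a plain list before it is pointed at anything real.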

Consider your log format too. To me, a datetime, the error level, an error code and a description are critical. Write your log output on the assumption that something automated is going to be reading it. It doesn't matter whether you're feeding it into Splunk or A.N. Other; give that app a fighting chance to interpret your logs. Consider stripping out all the CR/LFs to keep each entry on a single line.
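A minimal Python sketch of such a format: UTC timestamp, level, error code and description, pipe-delimited, with every line break (including any traceback text) collapsed so each record is exactly one line. The pipe delimiter and the `-` placeholder for records without a code are my own assumptions; if your messages can contain pipes, pick a delimiter your parser handles or escape them.

```python
import logging
import time


class SingleLineFormatter(logging.Formatter):
    """Format records as 'timestamp|LEVEL|code|description' on a single line."""

    converter = time.gmtime  # UTC timestamps, hence the literal 'Z' below

    def __init__(self):
        super().__init__(
            fmt="%(asctime)s|%(levelname)s|%(code)s|%(message)s",
            datefmt="%Y-%m-%dT%H:%M:%SZ",
        )

    def format(self, record):
        if not hasattr(record, "code"):
            record.code = "-"  # assumed placeholder when no error code is supplied
        text = super().format(record)
        # Collapse CR, LF and CRLF so one record is always one line.
        return " ".join(text.splitlines())
```

Callers supply the code with `extra`, for example `log.error("payment failed", extra={"code": "PAY042"})`, where `PAY042` is an illustrative code.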

Statistical analysis may well warn you of a developing situation, whether it's a misbehaving server or a bad release. Has the warning count started jumping? Can you compare last week with this week to see where to start pointing the finger? Shortening the MTTD (mean time to detect) is vital and well worth the investment.
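A sketch of that week-on-week comparison in Python, assuming single-line logs shaped like `2024-01-09T08:00:00Z|WARNING|DB017|slow query` (UTC timestamp, level, code, description). The function names are hypothetical, and lines that do not parse are simply skipped.

```python
from collections import Counter
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def weekly_counts(lines, level="WARNING"):
    """Count records at `level` per ISO (year, week)."""
    counts = Counter()
    for line in lines:
        parts = line.split("|", 3)
        if len(parts) < 4 or parts[1] != level:
            continue
        try:
            ts = datetime.strptime(parts[0], TS_FORMAT)
        except ValueError:
            continue  # skip lines with a malformed timestamp
        year, week, _ = ts.isocalendar()
        counts[(year, week)] += 1
    return counts


def week_over_week_change(counts, this_week, last_week):
    """Relative change between two weeks, or None if last week had no records."""
    prev = counts.get(last_week, 0)
    if prev == 0:
        return None
    return (counts.get(this_week, 0) - prev) / prev
```

A change of `1.0` means the warning count has doubled; splitting the same count by server or by release is the natural next step for deciding where to point the finger.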

The next step, often mistakenly thought of as the be-all and end-all, is shortening MTTR. It's time to get dirty and get back to those logs.
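When you do go back to the logs, a consistent single-line format makes it trivial to carve out just the incident window. A Python sketch, again assuming lines shaped like `2024-03-01T10:06:00Z|ERROR|DB017|connection pool exhausted`; `incident_slice` is a hypothetical helper and the level ordering (with `VERBOSE` below `DEBUG`) is an assumption.

```python
from datetime import datetime

# Assumed numeric ordering of level names; VERBOSE sits below DEBUG.
LEVELS = {"VERBOSE": 5, "DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def incident_slice(lines, start, end, min_level="WARNING", code=None):
    """Return lines timestamped within [start, end] at or above min_level,
    optionally restricted to a single error code."""
    threshold = LEVELS[min_level]
    hits = []
    for raw in lines:
        line = raw.rstrip("\n")
        parts = line.split("|", 3)
        if len(parts) < 4:
            continue
        try:
            ts = datetime.strptime(parts[0], TS_FORMAT)
        except ValueError:
            continue
        if not (start <= ts <= end):
            continue
        if LEVELS.get(parts[1], 0) < threshold:
            continue
        if code is not None and parts[2] != code:
            continue
        hits.append(line)
    return hits
```

Narrowing first by time window, then by level, then by error code is usually the quickest route from "something is wrong" to "this is what broke".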

So, to wrap up: for me at least, it's not APM vs logging, it's both. If you don't have both in place, you'll be flying blind at the worst possible time: when customers are being impacted.

I'd love to hear others' thoughts on this!

By Walter M.

  • Thoughts on customer service

    Thoughts on customer service

    I was thinking about customer service on the flight over to Stockholm, having had yet another ridiculous experience at…

  • Latency Multiplies Like Tribbles - Then You've Got Trouble

    Latency Multiplies Like Tribbles - Then You've Got Trouble

    One of the more insidious things I've seen involved low levels of latency multiplied by a high number of calls. At…

    1 条评论
  • RPO, RTO and DR.

    RPO, RTO and DR.

    Having a conversation the other day, and the topic turned to RPO and RTO. Now, considering the guy I was talking to is…

    2 条评论
  • Google Keep is great, but it could be greater

    Google Keep is great, but it could be greater

    I recently discovered Google Keep reading an article on the new Gmail interface ( which I am also loving by the way…

  • The biology of success

    The biology of success

    As I have asserted before, in order to succeed, first we must fail. If you look at the airline industry, their…

  • WHY you should question everything

    WHY you should question everything

    In 2005 an Australian Doctor named Barry Marshall received a Nobel prize for medicine. More than two decades ago he was…

  • To succeed, we must fail.

    To succeed, we must fail.

    I know what you’re thinking, but bear with me. I shall explain.

    1 条评论
  • Stop writing dynamic SQL!

    Stop writing dynamic SQL!

    One of my many hats is database administrator, and in my time I have designed and maintained a fair sized whack of…

  • The trouble with Wall Street

    The trouble with Wall Street

    I think we can all agree that Costco is a successful brand. Deutsche Bank told FORTUNE magazine that Costco continues…

  • The importance of coaching

    The importance of coaching

    I have been a fast driver ever since I got my license. I have been accused of recklessness by more than a few people…

社区洞察

其他会员也浏览了