2017 was web operations

One of the main themes of 2017 was Web Operations.

I don't mean this was the year I started with Web Operations; it was the year the investment paid off.

The knowledge gained through a sustained investment in monitoring and alerting can be distilled into a few simple principles and practices.

  • The Three Fs of Event Log Monitoring
  • Incident Causation Principles
  • Alerting Principles
  • Monitor Selection Principles

I wrote about these in Web Operations Dashboards, Monitoring, and Alerting, and I have also written about these techniques in my blog posts on monitoring, but I wanted to highlight the benefits that we got by actually applying them. To give some context, these have all been used in anger on a SaaS product used in 20 countries and 15 languages, and hosted out of multiple data centres.

  1. Fixing event logs (making it so you can see real exceptions, by reducing noise) means you can solve real customer problems. Your live environment has great insights for you if you work at it. We used the Three Fs to reduce error-level logs until we were able to investigate each class of error.
  2. Understanding how to work back to a root cause means you can fix the problem. It is tempting to restart a machine to fix a problem, and then allow the investigation into the incident to fall aside because it isn't urgent any more (as we know, the only way to stop everything being urgent is to understand what is important).
  3. When something is wrong, an alarm must sound, but ideally it shouldn't sound when there isn't a problem. This is a fundamental tension when it comes to Web Operations.
  4. You have to fine-tune your monitoring strategy, so that you know you have to react when an alarm sounds. The local shopping mall tests fire alarms every morning, and now nobody pays any attention to them. When the siren goes off, it should get your attention, and that means making it go off at the right times (as defined by the alerting principles).
  5. You need to be selective with what you put on your dashboard, usually by picking leading indicators of problems. Although it is tempting to have lots of dashboards shown in rotation, a better idea is to have a fixed dashboard. Anomalies really stand out if you leave the same dashboard in the same place.
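
The tension in points 3 and 4 can be illustrated with a small sketch. It is not taken from the book; it assumes one common way to cut false alarms, which is to sound the alarm only when a metric stays above a threshold for several consecutive samples. The class name, the function name, and the numbers are all hypothetical.

```python
from collections import deque


class SustainedThresholdAlert:
    """Sound the alarm only after several consecutive samples breach a threshold.

    Hypothetical illustration: the threshold and the number of required
    breaches are assumptions you would tune for your own system.
    """

    def __init__(self, threshold: float, required_breaches: int) -> None:
        if required_breaches < 1:
            raise ValueError("required_breaches must be at least 1")
        self.threshold = threshold
        # Keep only the most recent breach/no-breach results.
        self.recent = deque(maxlen=required_breaches)

    def observe(self, value: float) -> bool:
        """Record one sample and return True if the alarm should sound."""
        self.recent.append(value > self.threshold)
        # The window must be full, and every sample in it must be a breach.
        return len(self.recent) == self.recent.maxlen and all(self.recent)


def alarm_states(samples, threshold, required_breaches):
    """Return the alarm state after each sample, in order."""
    alert = SustainedThresholdAlert(threshold, required_breaches)
    return [alert.observe(sample) for sample in samples]


# Made-up samples: two isolated spikes, then a sustained problem.
print(alarm_states([1, 9, 1, 9, 9, 9, 1], threshold=5, required_breaches=3))
```

With these made-up samples, the isolated spikes are ignored and the alarm sounds only on the third consecutive breach. The trade-off is a slower reaction to a genuine problem, which is the same tension described above.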

And finally, while Web Operations can be a technical exercise, an incredibly powerful technique is to add meters for business metrics into the same tool you use for the technical stuff. If you can see sales, leads, and other business goals alongside the technical information, you have a chance to correlate them.
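
As a rough sketch of what correlating the two could look like, the snippet below computes a Pearson correlation coefficient between a technical series and a business series sampled at the same times. The metric names and figures are invented for illustration, and your monitoring tool may well offer this out of the box.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(variance_x * variance_y)


# Invented hourly samples: page response time (ms) and completed sales.
response_ms = [120, 130, 125, 400, 420, 128]
sales = [50, 48, 51, 20, 18, 49]

print(f"correlation: {pearson(response_ms, sales):.2f}")
```

A value close to -1 here would suggest that sales drop when response times rise. A correlation like this is a prompt to investigate, not proof of the cause.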

Have a great 2018, and if you'd like to read more you can grab Web Operations Dashboards, Monitoring, and Alerting on Amazon.

Chris Brobin

Senior Project Manager at NATS

7 years ago

Robert Borland - something we were discussing shortly before Christmas was how to do something useful with event logs. Steve's article is worth a read.
