Operational feedback: Incident response and retrospectives
- Remember how we said that all of our systems are sociotechnical systems, and that humans are a part of their resilient operation?
- Yeah. You can do all the other stuff right: you can have great design, development, testing, and monitoring, but things are still going to break.
- Since this is absolutely no surprise, part of the job is to get really good at responding to and remediating problems in your production system, which we affectionately refer to as incidents.
- Incident response is an activity that needs to be practiced. It's the place where in-depth system knowledge and a cool head make all the difference.
- There are three general activities you want to be good at for incident response. Troubleshooting: understanding the system well enough to diagnose and remediate the problem. Automation: having tooling already created to speed up, and make safe, information gathering and remediation activities. And communication: incident response often requires a team of…
Contents
- What is site reliability engineering? (3 min)
- Building for reliability: Theory (3 min 45 s)
- Building for reliability: Practice (5 min 57 s)
- Operational feedback: Observability (4 min 42 s)
- Operational feedback: Incident response and retrospectives (4 min 42 s)
- Your DevOps SRE toolchain (6 min 22 s)