The Simian Army, Principles of Chaos Engineering & building resilient construction projects

Deepak Mistry.

Risk Director at HKA | Infrastructure & Capital Projects Advisory | International

Published: 30 May 2024

Chaos Engineering Part 3

This is Part 3 of a series of articles where I continue to explore what the discipline of risk management can learn from other industries to help us better manage risk, deal with blind spots and build resilience on construction projects.

The term Chaos Engineering may conjure up a sense of randomness and disorder, but the discipline is far from this. Through planned and controlled experimentation we can observe and learn about the behaviour of a system (insert “project”), not only to improve performance and risk mitigation efforts but also to help design our project plans with resilience in mind.

As a recap, in Part 1 I introduced Chaos Engineering which is “the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions”. I described some similarities to risk management.

In Part 2 I described the nature of “distributed systems” and related this to construction projects, listed several benefits of Chaos Engineering and explored the idea of “injecting failure” into a system by way of experiments to observe the impacts. My aim was to establish some common ground between both disciplines.

In this more in-depth article I delve into the advanced principles of Chaos Engineering, focusing on what IBM considers to be best practice. Before I do, I’d like to pay homage to Netflix, which has a long history with Chaos Engineering.

The Simian Army

My entire article series was inspired by Netflix, one of the pioneers of Chaos Engineering. The company’s journey towards Chaos Engineering began around 2008, when it started moving to the cloud, and led to a tool called “Chaos Monkey” that randomly disabled instances in the live production environment in a controlled manner to help identify weak points to fix and make the system more resilient. Inspired by this, they went on to develop a "Simian Army" (with novel names!) of tools that induced various kinds of failure, or detected abnormal conditions, to test their ability to survive them.

“A virtual Simian Army to keep our cloud safe, secure, and highly available” Netflix TechBlog

Here’s a selection of some of the tools deployed:

Chaos Monkey – a tool which terminated virtual machine instances running in production (helped Netflix identify and fix issues with its auto-scaling, redundancy, and monitoring systems)

Latency Monkey – a tool which simulated network latency in order to test the resilience of the system to slow network connections (helped Netflix identify and fix issues with its timeouts and retry logic)

Chaos Kong – a tool which tested Netflix’s resilience by simulating the failure of an entire cloud region (helped Netflix identify and fix issues with its data replication and recovery processes)
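To make the idea concrete, here is a minimal sketch of what a Chaos Monkey style experiment does. It is not Netflix's code: the instance names, the `min_healthy` threshold and the `run_chaos_round` function are all hypothetical and simply stand in for a real production fleet.

```python
import random


def run_chaos_round(instances, min_healthy, rng):
    """Terminate one randomly chosen instance and report whether the
    service still has enough healthy instances to keep running."""
    victim = rng.choice(sorted(instances))
    survivors = instances - {victim}
    return victim, survivors, len(survivors) >= min_healthy


# Hypothetical fleet of four instances; the service needs at least three.
fleet = {"web-1", "web-2", "web-3", "web-4"}
rng = random.Random(42)  # fixed seed so the experiment is repeatable

victim, survivors, still_up = run_chaos_round(fleet, min_healthy=3, rng=rng)
print(f"terminated {victim}; service healthy: {still_up}")
```

The point of the experiment is the final check: if losing any single instance pushes the service below its healthy threshold, a weak point has been found before a real outage finds it.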

This caught my imagination as I could see similarities with some of the principles of risk management. It got me thinking about what, if anything, we could learn from Chaos Engineering and apply to how we manage risk on construction projects to build more resilience, especially for those risks which remain hidden and create blind spots. I liked the concept of experimentation, injecting failure and some of the random aspects of this, and wondered if it could help.

It also got me thinking about the equivalent of a Simian Army deployed in the risk management space. Interestingly, AI assistants have hit our working worlds so there's huge potential.

Advanced Principles of Chaos Engineering

The following list represents the advanced principles of Chaos Engineering and the best practice approach set out by IBM. I intend to dive into each one in turn and apply it to risk management on construction projects:

Understand the system and establish the “steady-state” behaviour
Embrace failure
Identify real-world incidents
Create a game day
Use automation
Be mindful of the blast radius

Feel free to swap out the term “injecting failure” to “injecting risk” and consider this as running “risk experiments” instead of “chaos experiments”. I’ll attempt to relate each heading to risk management and explore concepts, leave questions for you to ponder over (I don’t have all the answers!) and see if it offers any inspiration for improvement or innovation.

Warning, it’s longer than previous articles so you might want to grab a brew first, skip to parts which interest you or exit now if you feel so inclined (no offence taken-ish!). Personally, it’s been a really useful exercise for me as it’s got out of my head the things I wanted to explore and has left me with several ideas to experiment with and explore.

Understand the System and establish the steady-state behaviour

“Defining a steady state hypothesis is a crucial step in the chaos engineering process, as it sets the foundation for all subsequent experiments.” How Netflix embraced Chaos. As distributed systems have become more… | by Haasita Pinnepu | Medium

This should describe what the system should be doing under “normal conditions”, and for construction projects a proxy for this could be what has been planned or is expected. The construction schedule is one representation of this as it provides a project plan for all of the key activities and tasks needed to deliver the project’s outputs and outcomes. The plan represents a “deterministic” virtual model of how the project is expected to be delivered and how the system should behave, all other things being constant and given the assumptions made.

Understanding the system also means having an appreciation of the entire project, not just one element of it. The project plan helps here because of how it’s organised (Work Breakdown Structure), its characteristics (estimated activity durations/resources), its properties (constraints/dependencies) and the logical rules it follows. This project plan enables us to see the bigger picture, the more granular detail and the impact of any changes on the rest of the project. Plans also offer up quantitative metrics which can be observed when change occurs.

I think the project plan seems to fit well with providing a comparable “system” example for the application of Chaos Engineering principles in a practical way. It can provide a measurable baseline against which to observe changes in behaviour during chaos or risk experiments. The good news is that we’re already adopting this approach on projects through Project Controls reporting and through Planning teams, who understand the finer details, capturing and tracking the progress of activities. However, it often seems retrospective and reactive.
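As a rough illustration of the plan acting as a measurable baseline, the sketch below runs a simple forward pass (the core of the critical path method) over a toy schedule. The five activities, their durations and the `finish_times` function are invented for illustration and assume simple finish-to-start logic with no calendars or lags.

```python
def finish_times(durations, predecessors):
    """Forward pass: earliest finish (in days) of every activity."""
    finish = {}

    def earliest_finish(activity):
        if activity not in finish:
            start = max(
                (earliest_finish(p) for p in predecessors.get(activity, [])),
                default=0,
            )
            finish[activity] = start + durations[activity]
        return finish[activity]

    for activity in durations:
        earliest_finish(activity)
    return finish


# Hypothetical five-activity plan (durations in working days).
durations = {"design": 20, "piling": 15, "procure_steel": 30,
             "frame": 25, "fit_out": 40}
predecessors = {"piling": ["design"], "procure_steel": ["design"],
                "frame": ["piling", "procure_steel"], "fit_out": ["frame"]}

baseline = finish_times(durations, predecessors)
print("Baseline completion:", max(baseline.values()), "days")  # 115 days
```

The completion date and the per-activity finish dates are the “steady state” here: any risk experiment can be judged by how far it moves these numbers.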

The risk function tackles the forward-looking aspect by injecting uncertainty and risk against the deterministic plan. The question I have is whether we’ve really challenged the status quo, and whether we could squeeze more insight and value out of what we’re measuring and observing when we perform risk experiments in the form of risk analysis and scenarios.

The following additional questions are floating about in my head:

Have we established all the right quantitative metrics to flag our attention to what really matters?
Are we content that observing and measuring these metrics when injecting failure (risk events) into the plan is offering us sufficient decision making value?
Have we properly identified and applied any thresholds or tolerances to measure when the system begins to falter?
Have we considered evaluating any qualitative metrics and what would these look like?

It’s important to understand the nature of the project system but also to decide what we want to measure when observing change and performing risk analysis. Establishing these metrics upfront, before projects begin, is vital because they guide and inform decision making from day one.
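One simple way to express thresholds or tolerances is as percentage bands around the baseline. The sketch below is only an illustration: the 5% and 10% bands, the red/amber/green labels and the `classify_slip` function are assumptions, not an industry standard.

```python
def classify_slip(baseline_days, observed_days, amber_pct=5.0, red_pct=10.0):
    """Label an observed completion date against tolerance bands
    expressed as a percentage slip from the baseline."""
    slip_pct = 100.0 * (observed_days - baseline_days) / baseline_days
    if slip_pct >= red_pct:
        return "red"
    if slip_pct >= amber_pct:
        return "amber"
    return "green"


# Hypothetical 115-day baseline tested against three experiment outcomes.
for observed in (118, 122, 130):
    print(observed, "days ->", classify_slip(115, observed))
```

Agreeing bands like these before the project starts means every later experiment is read against the same yardstick.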

Embrace failure

“Disruptions will always occur in IT services and it's better to experience them in a controlled environment to identify the solution pre-emptively”. What is Chaos Engineering? | IBM

As the saying goes “change is the one constant in life” and life on construction projects is a brilliant example of this. Whilst projects are the vehicle of change in our society, people working on them experience it too and plenty of it! Some of this is planned and controlled but much is driven by uncertainty and risk leading to poor performance and being caught on the back foot.

What’s not brilliant is the well publicised fact that the construction industry is rubbish at learning lessons from the past. It’s disappointing to continually observe that even with the wealth of information and experience we have at our disposal we’re failing to address known repeat offending risks or causes of poor project performance. We’re not embracing failure, we’re continuing to invite it in! The situation is compounded further because we’re also not great at imagining risks we’ve not previously encountered before either.

“We’re not embracing failure, we’re continuing to invite it in...we’re also not great at imagining risks we’ve not previously encountered before”

What I’m hopeful of is that Chaos Engineering will help inspire us to think of new ways to address some of these blind spots or at least minimise the impacts when they cause us pain. I’m not calling for a revolutionary new approach to risk management (there are plenty of people doing that already!) but more of an exploration of incremental improvements or opportunities to innovate, building on what we currently know or have at our disposal.

On a positive note, we’re actually embracing the principle of failure already when we run quantitative schedule risk analysis (QSRA), which is a Monte Carlo simulation modelling technique. This is where we take the construction schedule, apply uncertainty ranges to activity durations and pin discrete risks to activities, to observe how they impact performance, the delivery of key milestones and the project overall.
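For readers who have not seen one, a heavily simplified QSRA can be sketched in a few lines. Everything in it is an assumption made for illustration: the toy schedule, the three-point estimates, the triangular distributions, the two discrete risks and the `simulate_completion` function. Real QSRA tools also handle calendars, correlation and far richer logic.

```python
import random


def simulate_completion(activities, predecessors, risks, rng):
    """One Monte Carlo iteration: sample durations, fire discrete risks,
    then run a forward pass to get the project completion in days."""
    sampled = {}
    for name, (low, likely, high) in activities.items():
        sampled[name] = rng.triangular(low, high, likely)
    for name, (probability, delay_days) in risks.items():
        if rng.random() < probability:
            sampled[name] += delay_days

    finish = {}
    for name in activities:  # activities are listed in dependency order
        start = max((finish[p] for p in predecessors.get(name, [])), default=0.0)
        finish[name] = start + sampled[name]
    return max(finish.values())


# Hypothetical three-point estimates: (optimistic, most likely, pessimistic).
activities = {
    "design": (18, 20, 30),
    "piling": (12, 15, 25),
    "procure_steel": (25, 30, 50),
    "frame": (22, 25, 35),
    "fit_out": (35, 40, 60),
}
predecessors = {"piling": ["design"], "procure_steel": ["design"],
                "frame": ["piling", "procure_steel"], "fit_out": ["frame"]}
# Hypothetical discrete risks: (probability of occurring, delay in days).
risks = {"piling": (0.3, 10), "procure_steel": (0.2, 15)}

rng = random.Random(2024)  # fixed seed so the run is repeatable
results = sorted(
    simulate_completion(activities, predecessors, risks, rng)
    for _ in range(5000)
)
p50 = results[len(results) // 2]
p80 = results[int(len(results) * 0.8)]
print(f"P50 completion: {p50:.0f} days, P80 completion: {p80:.0f} days")
```

Each iteration is one “failure experiment” on the plan; the spread between P50 and P80 is what the project team is asked to react to.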

The effectiveness of QSRA has been well debated over the years and it’s common knowledge that it can suffer from human biases and a whole host of other factors but I think it nevertheless does offer value by going through the process itself. It’s a good learning opportunity as you engage subject matter experts to test the credibility of identified risks and socialise appreciation of the potential impacts.

“At times using QSRA and embracing this type of failure feels a bit predictable, linear and prescriptive”

This may sound strange given that the methodology is underpinned by probability theory and is all to do with uncertainty, but at times using QSRA and embracing this type of failure feels a bit predictable, linear and prescriptive. Seasoned risk professionals can almost anticipate what the outcome of a QSRA will be before they’ve even run the model. They’re not like Neo from the Matrix but most of it is common sense. Add an identified risk or scenario to the model and it will probably extend activities to the right of the schedule, potentially leading to delay. What about those risks or scenarios which remain hidden or have not been imagined?

There is certainly scope to improve how we could perform risk analysis more effectively than how we’re doing it today and it’s not all about technology. The main point is that it should afford the project sufficient time to make key decisions and take action to mitigate risk and build resilience to deal with events we may not have considered.

A few questions to consider:

What’s constraining our ability to learn lessons from the past and apply them forwards and how do we overcome them? (dealing with “known knowns” and “known unknowns”)
Given that we cannot identify all possible risks why are we obsessed with injecting discrete risks into the project system and what are the alternatives? (e.g. using characteristics and/or profiles of risks)
What’s impeding our ability to better imagine risks and test our blind spots and how do we overcome this?
Are there more deep rooted issues to tackle which relate to what we’re able to test for failure? (i.e. scenario “x” is not a palatable one?)
Are our current methodologies, techniques and tools fit-for-purpose to run failure experiments?

Identify real world incidents

“Chaos engineering experiments should hew as closely as possible to what might happen on a normal day instead of creating unlikely situations.” What is Chaos Engineering? | IBM

This is all about developing hypotheses about potential deviations from the “steady state” mentioned above and introducing realistic failure into the system, however, I’m in two minds about this statement.

Chaos Engineering focuses on exploring events on a live system such as network and infrastructure failures, bad code, power issues and traffic overload. Identifying and simulating these “real world” incidents enables you to test the system’s resilience and identify potential technical weaknesses.

The issue I have is with two parts of the above statement, the first being “normal day”. The focus of risk management is anything but a normal day. In fact, to my mind, a normal day is akin to what the baseline construction schedule represents – the plan and expected performance. Some assumptions about risk and uncertainty may be built in to what we consider to be expected and even costed into the budget (known knowns or known unknowns).

As mentioned above, risk quantification is also prone to, and influenced by, factors such as biases, politics, different agendas and more, which may dictate what ends up in a risk register. However, perhaps there’s also an issue with the way in which we interpret and apply the definition of what a credible risk or scenario is.

The second part of the statement which is quite interesting to me is “instead of creating unlikely situations”. The start of the process on construction projects is the development of a risk register capturing all of the credible risks and scenarios we may foresee occurring on a project. Sure, some of these may be unlikely with a very low probability but we also have a tendency either intentionally or not to shy away from scenarios which might be considered unrealistic or perhaps unpalatable. There may also be genuine blind spots.

Projects also operate in an environment where information can be imperfect especially at the start yet many push forward regardless given the urgency of delivering the outcomes desired. In order to proceed assumptions are made but these assumptions can represent a significant source of uncertainty/risk yet don’t appear in the risk register. I would argue we’re also not great at monitoring these assumptions during the life of a project or recognising their potential impacts if they don't hold true.

To put it another way, just because we haven’t identified any credible risks or scenarios there’s no guarantee that certain activities or scopes of work on our project won’t ever be impacted by some kind of event or by assumptions not holding true. Maybe if we explore and adopt the random element of Chaos Engineering we might overcome some of this? (i.e. the application of injecting failure is partly randomised, which removes an element of selective application).

Questions that come to mind are:

What if we develop a methodology to randomly (or intentionally) target “sensitive” areas of the plan where we have not identified any risks to observe what the impacts could be?
What if part of our risk experiments focused on testing the confidence and impact of our assumptions not holding true?
How about “stressing” known repeat offenders such as design or scope change/scope creep even though confidence may be high?
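The first of these questions can be sketched as a small experiment: pick an activity at random from those with no identified risk, delay it, and see what happens to completion. The toy schedule, the set of already risked activities, the 10-day shock and the `random_injection` function are all hypothetical.

```python
import random


def completion(durations, predecessors):
    """Project completion in days from a simple forward pass."""
    finish = {}
    for name in durations:  # activities are listed in dependency order
        start = max((finish[p] for p in predecessors.get(name, [])), default=0)
        finish[name] = start + durations[name]
    return max(finish.values())


def random_injection(durations, predecessors, already_risked, shock_days, rng):
    """Delay one randomly chosen activity that has no identified risk
    and return the activity hit and the resulting slip in completion."""
    candidates = [a for a in durations if a not in already_risked]
    target = rng.choice(candidates)
    shocked = {**durations, target: durations[target] + shock_days}
    impact = completion(shocked, predecessors) - completion(durations, predecessors)
    return target, impact


# Hypothetical plan; only piling and steel procurement are in the risk register.
durations = {"design": 20, "piling": 15, "procure_steel": 30,
             "utilities": 10, "frame": 25, "fit_out": 40}
predecessors = {"piling": ["design"], "procure_steel": ["design"],
                "utilities": ["design"],
                "frame": ["piling", "procure_steel"],
                "fit_out": ["frame", "utilities"]}
already_risked = {"piling", "procure_steel"}

rng = random.Random(7)
observations = [
    random_injection(durations, predecessors, already_risked, 10, rng)
    for _ in range(5)
]
for target, impact in observations:
    print(f"10-day shock on {target}: completion slips {impact} days")
```

In this toy plan a shock to `utilities` is absorbed by float, while the same shock to any other unrisked activity passes straight through to completion, which is exactly the kind of sensitivity a risk register alone would not reveal.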

Create a game day

"Expect to be surprised, and not just the first time….Systems change over time, and chaos [engineering] game days keep your knowledge of your systems fresh." TechTarget

In the software industry a “game day” is one where a series of failure experiments are pre-planned and applied to the live operational system. A team of people who would normally operate the system are assembled and this could range from operational staff through to business or client facing roles.

They develop the failure scenarios and then “inject the failure” into the system, observe and react to remedy any issues. The game day aims to test systems, processes and responses. The team should perform their roles as if the unexpected event occurred for real. After the event a review is undertaken, lessons learnt and recommendations made. Interventions are applied to the live system to avoid a “real life” repeat.

We clearly don’t do this on live construction projects. Beyond the safety considerations mentioned previously, risk professionals and project teams simply don’t have the bandwidth to entertain this. Perhaps before a project kicks off there’s some kind of scenario analysis and stress testing undertaken of forecasts but this is limited and basic. Then during the life of a project, if you’re lucky, a QSRA might be run at periodic intervals or on an ad hoc basis, but given these are few and far between they come too late to have the timely impact on decision making that they need to.

It feels, to me at least, like we’re missing an opportunity to regularly test our project system’s resilience in a more meaningful and useful way. Methods and approaches already exist which we could easily adopt and adapt. Some of these can actually leverage expertise to offer both an inside and outside view to offer friendly or constructive challenge and review.

Questions I have are:

How could we develop relevant scenarios to consider for our risk experiments? (not always the obvious ones)
Why aren’t we regularly performing activities like “Red Teaming” which enlists outside help (constructive challenge) to get a fresh perspective on the plans to explore what could go wrong?
What about performing “Pre-mortems” where activities are tested before they are undertaken to identify potential failure points?
Who would need to be part of this “game day” team and what expertise is relevant?
Which parts of the project or timelines could we target this at and why?
What are we doing to keep our understanding about the project system fresh given regular changes and challenges we face?
How could we socialise this learning or the insights gained to enable other teams to benefit from this and potentially help strengthen weak links elsewhere?
Whilst this requires investment of people and time, is it not worth it given the potential consequences?

Use automation

“Organisations of all sizes can use chaos engineering by automating experiments, which would be too labour intensive if companies manually conducted them…Experiment design, failure injection and infrastructure provisioning are all aspects of experimentation that organisations can automate.” What is Chaos Engineering? | IBM

In the software industry chaos experiments are actually undertaken in a live operational environment, which demands the use of automation as the timescales involved are minuscule, but the concept can be applied at the granularity of timescales relevant to projects. What I’m proposing is that we leverage existing risk modelling software and/or newer AI technology to run these risk experiments automatically and virtually ahead of time but also during the life of a project to gain useful insights. Currently it’s a very manual process.

The idea here is to use automation to better facilitate running risk experiments continuously as opposed to running them at the start of a project, across lengthy intervals or in an ad hoc manner. This breaks the burden of the energy intensive project controls reporting cycle, which has always felt retrospective in nature and transactional. I’ve also never quite understood why the monthly reporting cycle exists! Perhaps it’s more to do with the time it takes to pull information together than anything more meaningful?

Automation can help here because it could lead to more timely insights to support decision making early enough (proactively) to make an impact. For clarification, I believe human intervention is still required to make sense of the insights and help translate this for stakeholders but the labour intensive activities can be minimised to free up more time to do the value adding work.

This continuous approach also enables you to capture data and feedback on the project system’s behaviour over time which can be used to refine and improve those risk experiments.
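A minimal sketch of what “continuous” could look like: the same experiment is re-run automatically against every schedule update and the results are kept so that changes in behaviour stand out. The two monthly updates, the 5-day shock and the `sensitive_activities` function are invented for illustration; in practice the updates would come from the planning system rather than a hard-coded dictionary.

```python
def completion(durations, predecessors):
    """Project completion in days from a simple forward pass."""
    finish = {}
    for name in durations:  # activities are listed in dependency order
        start = max((finish[p] for p in predecessors.get(name, [])), default=0)
        finish[name] = start + durations[name]
    return max(finish.values())


def sensitive_activities(durations, predecessors, shock_days=5):
    """Activities where a delay of `shock_days` slips overall completion."""
    base = completion(durations, predecessors)
    return [
        name for name in durations
        if completion({**durations, name: durations[name] + shock_days},
                      predecessors) > base
    ]


predecessors = {"piling": ["design"], "procure_steel": ["design"],
                "frame": ["piling", "procure_steel"]}
# Hypothetical schedule updates: piling is re-forecast from 15 to 28 days.
updates = {
    "2024-03": {"design": 20, "piling": 15, "procure_steel": 30, "frame": 25},
    "2024-04": {"design": 20, "piling": 28, "procure_steel": 30, "frame": 25},
}

history = {label: sensitive_activities(plan, predecessors)
           for label, plan in updates.items()}
for label, names in history.items():
    print(label, "sensitive to a 5-day shock:", names)
```

Here `piling` is harmless in the March update but becomes sensitive in April, even though completion itself has not moved. A one-off analysis at the start of the project would have missed that.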

With recent developments in AI risk software, and one software vendor in particular (nPlan) having access to data from hundreds of thousands of past construction schedules, this may help offer insights from the experiences of other projects that we may not have personally observed (in our working lives). The ChatGPT element to this also offers access to quicker insights or recommendations which of course need to be scrutinised carefully but give us a head start.

However, tools like this are primarily driven by the data they’re trained on so there’s potentially more blind spots yet to uncover and I’m uncertain whether they push the extremes out beyond this. Also, not everyone is fortunate to have access to these tools so in the meantime could we do more with what we already have?

Questions which come to mind are:

Which parts of the existing risk analysis process could we target for automation and why?
What technology at our disposal today do we need to leverage to support this kind of automation? (e.g. leveraging Power Apps to automate workflows, machine learning etc...)
What kinds of data sets would be useful to offer more context to analysis produced? (not just risk register data or P6 plans)
How could we develop and test simulated fall back plans in response to failure scenarios?
How do we knit data sets together to facilitate a continuous flow of data from core systems to feed the software or tools? (e.g. linking to cloud stored data – data lakes)
What kinds of outputs and insights would actually be useful to aid decision making and action taking?

Risk Army

In more recent times we've seen the emergence of AI Assistants and their potential future use cases. What if we could develop a "Risk Army" equivalent to the Simian Army but to help manage risk and build more resilient projects? I envisage this as automation overseen by humans, who are freed up to offer more value adding contributions.

I think it's only a matter of time because traditional software developers no longer have a stranglehold on developing tools. It's now happening en masse, not only among industry professionals but among the public around the world. I feel the risk software landscape is being disrupted so existing vendors will need to up their game if they're going to offer value.

I also think use cases should focus less on prediction and more on producing insights to help build more resilient projects and facilitate better use of scenarios and stress testing. We can't identify all risks but we can think more deeply about understanding the nature of systemic risks, their characteristics and apply these principles across a project to test resilience early and continue to iterate and feedback.

AI offers an opportunity to automate much and getting the Risk Army to continually do this and test potential blind spots feels like a worthwhile endeavour.

Be mindful of the blast radius

“This principle emphasizes the importance of minimizing the impact of chaos experiments on the production environment and end-users. In other words, you should ensure that the experiments are isolated and do not impact any critical systems or services.” How Netflix embraced Chaos. As distributed systems have become more… | by Haasita Pinnepu | Medium

Okay, this quote again demonstrates running chaos experiments in a live operational environment which I’m not advocating. The “blast radius” refers to the consequential impacts of the failure being injected into the system (i.e. what goes wrong, how does it manifest, what other activities does it impact?).

Chaos Engineering seeks to minimise the “blast radius” by:

Targeting a subset of services
Running the experiment for a finite time
Running the experiment away from peak traffic
Running the experiment in the development environment
Experimenting on every component

The one thing in risk experiments we don’t want to do is to limit the blast radius as that defeats the object of what we’re trying to understand, assess and measure. There is a point after our risk experiments have been executed where we do in fact want to limit the blast radius but that’s when we’ve revealed the potential weaknesses in the project we’ll seek to eliminate or strengthen.

It’s not that we’re trying to maximise the blast radius but when we inject failure or risk into the project system we want to understand all of the possible consequential impacts. On that basis I’ll try to flip the meaning about a bit to generate some relevant application to construction projects. I’ll cover off each of the above points in turn.

On construction projects we could consider targeting a subset of activities or tasks to test by applying risks to them. We already do this in QSRAs but we could go further such as simulating the failure of controls or parts of the plan which are based on a set of assumptions holding true. Often these are excluded from risk analyses but their impacts could be significant. Some of these are difficult or uncomfortable for project teams to think about so we need to overcome this. Perhaps illustrating the potential impacts might be sufficient to convince others to take action to protect against adverse impacts.
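In schedule terms, one way to picture the blast radius of a single injected failure is the set of activities downstream of it in the dependency logic. The sketch below walks a toy network to list them; the activity names and the `blast_radius` function are hypothetical, and it shows logical reach only, not the size of any delay.

```python
from collections import deque


def blast_radius(activity, successors):
    """All activities downstream of `activity`, found breadth-first."""
    seen = set()
    queue = deque(successors.get(activity, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(successors.get(current, []))
    return seen


# Hypothetical dependency logic: activity -> activities that follow it.
successors = {
    "design": ["piling", "procure_steel", "utilities"],
    "piling": ["frame"],
    "procure_steel": ["frame"],
    "utilities": ["fit_out"],
    "frame": ["fit_out"],
}

print(sorted(blast_radius("procure_steel", successors)))  # ['fit_out', 'frame']
print(sorted(blast_radius("design", successors)))
```

Mapping this reach for a failed control or a broken assumption is a cheap first step before running a full quantitative analysis on it.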

It makes sense that specific experiments are time or event bounded. What I mean by this is that if the risk experiment fails to induce any kind of impact it makes sense to terminate it and move on to the next one. However, one thing to note is that just because the experiment failed to produce an impact on this occasion doesn’t mean it won’t the next time the project schedule has been updated! Worth keeping in mind.

Sometimes risks are like buses. You don’t see any for ages but then they all arrive together at the same time. So, injecting multiple risks at “peak traffic” project times might be worth considering too. How you define peak traffic on a project is up for grabs but there should be some rationale to it. Perhaps picking points when many activities are predecessors and converge? Or perhaps even shifting this around the schedule to see how sensitive the project is to them?
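If “peak traffic” is taken to mean points where many activities converge, candidate targets can be found by counting direct predecessors. This is only one possible definition, and the activity names and the `merge_points` function below are assumptions for illustration.

```python
def merge_points(predecessors, minimum=2):
    """Activities with at least `minimum` direct predecessors,
    listed with the busiest convergence points first."""
    counts = {
        name: len(preds)
        for name, preds in predecessors.items()
        if len(preds) >= minimum
    }
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical dependency logic: activity -> its direct predecessors.
predecessors = {
    "piling": ["design"],
    "frame": ["piling", "procure_steel"],
    "fit_out": ["frame", "utilities", "cladding"],
}

print(merge_points(predecessors))  # [('fit_out', 3), ('frame', 2)]
```

Injecting several risks into the paths that feed these points at once is one way to test the “buses all arriving together” scenario.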

On software projects, the argument against running chaos experiments in a development environment (like we do when we perform Monte Carlo simulation) is that the conditions will differ to the live operational environment, potentially leading to a false picture of what might happen in reality. On construction projects we are reliant on using models of reality (plans/schedules) so all we can do is ensure that the inputs and logic are as accurate or realistic as possible.

Given resource constraints, it's not feasible to experiment on every component of the project system so this comes back to my earlier point about defining and agreeing upfront what is important to measure when we observe changes caused by our risk experiments (i.e. whatever metrics are appropriate).

Closing Remarks

If you’ve made it this far and followed this article series hopefully you’ll have seen there are many similarities or parallels between Chaos Engineering and Risk Management. There are opportunities to learn and apply some of the key principles which to a certain degree are universal.

Embracing failure in this way has benefits which include being able to identify hidden dependencies, developing more of a scenario based approach to testing for resilience, helping projects to prepare better to deal with unplanned change and challenges and leveraging automation to free up more of our time so we can perform more value adding work.

I’ve personally been inspired to adopt more of a continuous and experimental approach to risk management and the writing process has helped me to capture more questions which triggered off many innovative and creative ideas for me to explore too which I’m really excited about!

There's plenty of research out there on how we might overcome some of our blind spots but I think there's some potential with the random injection of failure approach Chaos Engineering adopts. This is something I'm keen to explore further too. Automation via the "Risk Army" of AI Assistants is something that really excites me too.

I was initially keen to learn more technical skills such as Python as I felt I was behind the tech curve, but I've come to realise it's actually more about our ability to develop and interact with these AI Assistants which will be key, so I'm doubling down on playing with tools like ChatGPT and Copilot.

Hope you found it interesting, inspiring or useful in some way. As always, I welcome and invite any comments. I've already got a more "entertaining" idea for my next article so watch this space...

Syed Asad Ali

Software Engineer @ Rightcharge | Python, JavaScript, Node.js, React.js, AWS | Innovating EV Charging

9 months ago

Deepak, your post got me thinking about how chaos engineering could apply beyond just tech. Imagine applying these principles to other areas like business processes or even personal development strategies! It's all about embracing uncertainty and building resilience. Thanks for sparking these thoughts!

Deepak Mistry.

Risk Director at HKA | Infrastructure & Capital Projects Advisory | International

9 months ago

As a follow up, how resilient are we today and might be tomorrow? With increasing reliance on AI/cloud based technology I wonder if we've really given this careful consideration in the construction industry. Outages appear to be the norm. What if we experience 1 hour, a few hours, a day, a few days? Credible? Realistic? Plan B? Food for thought, check out some headlines below:

Microsoft Copilot fixed worldwide after 24 hour outage
Microsoft Teams hit by second outage in three days
AT&T network outage draws government discussions
Barclays bank payments restored after app went down in outage
Sainsbury’s and Tesco resolve technical issues that disrupted deliveries
McDonald's blames global outage on third party
Facebook outage: what went wrong and why did it take so long to fix after social platform went down?
NatWest banking app is back online following a three-hour outage that left thousands of frustrated customers across the UK without access to funds
Cloud providers suffered nearly 500 critical outages in 2022
TSB banking app suffered major outage as customers couldn't log into accounts
Three apologises after network outages affect 10,000 customers across UK

Stephen R.

Head of Commercial at Nodes & Links | Driving revenue growth

9 months ago

Deepak Mistry. This series has been insightful. Thank you for sharing
