
Basics of chaos engineering

Ganesh Ghag

Digital Transformation - API Integration | Microservices | DevSecOps | AI/ML/DL | Mobility | Cloud | Security-Crypto | Blockchain

Published: December 21, 2020

“Chaos engineering is the discipline of experimenting on a software system in production in order to build confidence in the system's capability to withstand turbulent and unexpected conditions” –Wikipedia

Chaos engineering tests an application’s resilience. Resilience is the ability of a system to provide and maintain an acceptable level of service in the face of faults and challenges.

Given a steady-state load of end users on the application, chaos engineering randomly injects faults into the application’s deployment environment and then measures the extent of the failures and latencies that end users experience while the chaos lasts. The application can be monitored to identify its critical services and the root causes of failures. Fixes can then be applied as deployment parameters, config changes or even code changes, so that the application’s resilience increases.
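The measure-under-steady-load idea can be sketched as a small simulation. This is only an illustration under stated assumptions: `call_service` is a hypothetical stand-in for a real user request, and the failure rates and latency ranges are invented numbers, not measurements from any real system.

```python
import random
from typing import Dict, Tuple


def call_service(fault_active: bool, rng: random.Random) -> Tuple[bool, float]:
    """Simulate one end-user request; returns (succeeded, latency_ms)."""
    if fault_active:
        # Assumed fault behaviour: about 30% of requests fail, all are slower.
        return rng.random() >= 0.30, rng.uniform(200.0, 400.0)
    # Assumed steady-state behaviour: about 1% of requests fail.
    return rng.random() >= 0.01, rng.uniform(20.0, 40.0)


def run_phase(requests: int, fault_active: bool, rng: random.Random) -> Dict[str, float]:
    """Send a fixed load and summarise what the end users experienced."""
    results = [call_service(fault_active, rng) for _ in range(requests)]
    failures = sum(1 for ok, _ in results if not ok)
    latencies = sorted(latency for _, latency in results)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    return {"error_rate": failures / requests, "p95_ms": p95}


def run_experiment(requests_per_phase: int = 1000, seed: int = 42) -> Dict[str, Dict[str, float]]:
    """Compare the same load with and without the injected fault."""
    rng = random.Random(seed)
    steady = run_phase(requests_per_phase, fault_active=False, rng=rng)
    chaos = run_phase(requests_per_phase, fault_active=True, rng=rng)
    return {"steady": steady, "chaos": chaos}


if __name__ == "__main__":
    for phase, metrics in run_experiment().items():
        print(phase, metrics)
```

In a real experiment the two phases would be driven by a load generator against the deployed application, and the gap between the steady and chaos metrics is what the subsequent fixes try to shrink.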

Chaos engineering is not concerned with an application’s functional defects, nor with negative testing or load testing. The main environment variables it tests are resources such as CPU, memory, network and IO. Primary use cases include inducing CPU and memory starvation, network outages and storage failures.
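As a rough illustration of what CPU and memory starvation faults look like at the process level, the sketch below uses only the Python standard library. The helper names `burn_cpu` and `hold_memory` are hypothetical; real chaos tools inject such faults at the container or node level rather than from inside the application.

```python
import time


def burn_cpu(seconds: float) -> int:
    """Busy-loop on one core for roughly `seconds`; returns iterations done."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def hold_memory(megabytes: int, seconds: float) -> int:
    """Allocate `megabytes` MiB, hold it for `seconds`, return bytes held."""
    block = bytearray(megabytes * 1024 * 1024)
    # Write to one byte per 4 KiB so the pages are actually touched,
    # not just reserved by the allocator.
    for offset in range(0, len(block), 4096):
        block[offset] = 1
    time.sleep(seconds)
    return len(block)


if __name__ == "__main__":
    print("iterations:", burn_cpu(0.5))
    print("bytes held:", hold_memory(64, 0.5))
```

Running several `burn_cpu` calls in parallel processes next to a service under steady load is one simple way to see how its latency reacts to CPU pressure.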

Resource faults can occur at numerous levels due to the following complexities of cloud native deployments:

Microservices and DevOps based architectures: tens of unique microservices, hundreds of instances (pods) of services
Highly virtualized infrastructure stacks: k8s worker nodes host pods, which host containers, which run processes
Resources like CPU, memory and storage are pooled, virtualized and dynamically allocated and deallocated within a k8s cluster
A few instances of a service can overdraw CPU, memory and storage, thereby cannibalizing these resources from other pods
Network usage is now shared between applications and stack components such as the Istio service mesh, sidecar proxies and k8s networking
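The point about a few instances cannibalizing pooled resources can be illustrated with a toy model. This is a deliberately simplified sketch: it assumes an oversubscribed node splits CPU in proportion to demand, which is not how the Linux scheduler or the kubelet actually arbitrates CPU. The pod names and millicore figures are hypothetical.

```python
from typing import Dict, Optional


def share_cpu(
    capacity_m: int,
    demands_m: Dict[str, int],
    limits_m: Optional[Dict[str, int]] = None,
) -> Dict[str, float]:
    """Split node CPU (millicores) across pods, proportionally when oversubscribed."""
    limits_m = limits_m or {}
    # A pod can never use more than its limit, if it has one.
    capped = {pod: min(demand, limits_m.get(pod, demand)) for pod, demand in demands_m.items()}
    total = sum(capped.values())
    if total <= capacity_m:
        return {pod: float(demand) for pod, demand in capped.items()}
    # Toy assumption: everyone is scaled down by the same factor.
    scale = capacity_m / total
    return {pod: demand * scale for pod, demand in capped.items()}


if __name__ == "__main__":
    demands = {"checkout": 500, "cart": 500, "batch": 3000}
    print(share_cpu(2000, demands))                   # noisy neighbour, no limits
    print(share_cpu(2000, demands, {"batch": 1000}))  # same demand, batch capped
```

Without a limit, the hypothetical `batch` pod halves the CPU available to the other two pods. With a limit on `batch`, the others get their full demand, which is the kind of deployment-parameter fix a chaos experiment can surface.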

Litmus: a good cloud-native chaos platform, with reusable experiments available in ChaosHub
