My road to Gremlin Chaos Engineering Practitioner Certificate

Chaos Engineering is a field that has always drawn my attention. I first came across it when I heard about the Netflix Simian Army toolkit (https://github.com/Netflix/SimianArmy). At first glance, it is hard to believe that anyone would run a tool in production that randomly shuts down production servers (Chaos Monkey). Later, I watched a Tammy Bryant Butow video on YouTube and learned about Gremlin, which provides a hosted service for running chaos experiments. Finally, after one week of study, I am now a certified Gremlin Chaos Engineering Practitioner.

Exam Resources

I followed only the two resources below to prepare for the exam.

Exam Format

NOTE: The exam is free of cost; you can register via this link: https://gremlin.coassemble.com/unlock/7Jan8Su

Exam Preparation

  1. Get familiar with how to install the Gremlin agent

  • To attack a host, the Gremlin agent must be installed on that host. Gremlin supports various operating systems (Ubuntu, CentOS, RHEL, Windows); you can also download the Docker image (https://hub.docker.com/r/gremlin/gremlin) or use the Helm repo.

helm repo add gremlin https://helm.gremlin.com        
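
Adding the repo is only the first step on Kubernetes; the chart still has to be installed with your team credentials. The sketch below only prints a candidate install command so you can review it first. The release name, the `gremlin` namespace, the `gremlin.secret.*` chart values, and the `GREMLIN_CLUSTER_ID` variable are assumptions that are not taken from this article, so check them against the chart's own documentation before running anything.

```shell
# Sketch: build (but do not run) a Helm install command for the Gremlin chart.
# ASSUMPTIONS: release name "gremlin", namespace "gremlin", and the
# gremlin.secret.* value names. Verify them with `helm show values gremlin/gremlin`.
helm_install_cmd() {
  # $1: Team ID, $2: Team Secret, $3: a name you choose for this cluster
  printf 'helm install gremlin gremlin/gremlin'
  printf ' --namespace gremlin --create-namespace'
  printf ' --set gremlin.secret.managed=true'
  printf ' --set gremlin.secret.type=secret'
  printf ' --set gremlin.secret.teamID=%s' "$1"
  printf ' --set gremlin.secret.clusterID=%s' "$3"
  printf ' --set gremlin.secret.teamSecret=%s\n' "$2"
}

# Placeholders are used when the environment variables are not set.
helm_install_cmd "${GREMLIN_TEAM_ID:-YOUR_TEAM_ID}" \
                 "${GREMLIN_TEAM_SECRET:-YOUR_TEAM_SECRET}" \
                 "${GREMLIN_CLUSTER_ID:-my-cluster}"
```

Printing the command instead of executing it keeps the example safe to try on a machine without a cluster.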

  • This is what the architecture looks like:

[Image: Gremlin architecture diagram]

  • On Ubuntu, these are the steps you need to follow, as shown in the diagram above.

echo "deb https://deb.gremlin.com/ release non-free" | sudo tee /etc/apt/sources.list.d/gremlin.list
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys XXXX
sudo apt-get update && sudo apt-get install -y gremlin gremlind
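
After the packages are installed, a quick sanity check is to confirm that both binaries can be found. This is a minimal sketch; the helper name `check_binary` is made up for illustration. Note that `gremlind` lives in /usr/sbin, which may not be on a non-root user's PATH on every system.

```shell
# Sketch: report whether a binary is reachable on PATH.
check_binary() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: not found"
    return 1
  fi
}

# Check the client and the daemon installed by the apt packages.
for bin in gremlin gremlind; do
  check_binary "$bin" || true
done
```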

  • Once these steps are done, you need to register the installed Gremlin agent with the Gremlin Control Plane using your Team ID and Secret Key. To do that, go to the Team Settings page and make a note of the Team ID and Secret Key (if you don't know the Secret Key, click the Reset button).

  • Run the gremlin init command and enter the Team ID and Secret Key you copied in the previous step.

$ gremlin init
Metadata set for [ gremlin-client-version: 2.20.0 ]
Metadata set for [ os-type: Linux ]
Metadata set for [ os-name: Ubuntu ]
AWS metadata may be present
Metadata set for [ instance-id: i-0550fdb260931639b ]
Metadata set for [ local-hostname: ip-172-31-28-103.ec2.internal ]
Metadata set for [ local-ip: 172.31.28.103 ]
Metadata set for [ public-hostname: ec2-184-73-139-79.compute-1.amazonaws.com ]
Metadata set for [ public-ip: 184.73.139.79 ]
Metadata set for [ azid: use1-az4 ]
Metadata set for [ cloud: AWS ]
Metadata set for [ image-id: ami-09e67e426f25ce0d7 ]
Metadata set for [ instance-type: t2.micro ]
Metadata set for [ region: us-east-1 ]
Metadata set for [ zone: us-east-1c ]
Unable to describe AWS tags.  The error message is: No such file or directory (os error 2)
Azure metadata may be present
Please input your Team ID: <--------
XXXXXXXX
Please input your Team Secret: <--------
Using XXXXXX for Team Id
Using 172.31.28.103 for Gremlin identifier        
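
If you would rather not type the credentials at the prompts (for example when provisioning several hosts), one alternative is to supply them through the agent's configuration file. The sketch below only renders such a file to standard output. The key names `team_id` and `team_secret`, the path /etc/gremlin/config.yaml, and the `GREMLIN_TEAM_ID` / `GREMLIN_TEAM_SECRET` variables are assumptions based on my reading of Gremlin's documentation, not something shown in the session above, so verify them before relying on this.

```shell
# Sketch: render a Gremlin agent config instead of answering the prompts.
# ASSUMPTIONS: key names team_id / team_secret and the target path
# /etc/gremlin/config.yaml. Check Gremlin's documentation for your version.
render_gremlin_config() {
  # $1: Team ID, $2: Team Secret (both from the Team Settings page)
  printf 'team_id: %s\n' "$1"
  printf 'team_secret: %s\n' "$2"
}

# Preview the file; placeholders are used when the variables are unset.
render_gremlin_config "${GREMLIN_TEAM_ID:-XXXXXXXX}" "${GREMLIN_TEAM_SECRET:-XXXXXXXX}"

# To write it for real (left commented out on purpose):
# render_gremlin_config "$GREMLIN_TEAM_ID" "$GREMLIN_TEAM_SECRET" \
#   | sudo tee /etc/gremlin/config.yaml >/dev/null
```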

  • Go to the gremlin dashboard, and you will see your newly added host.

  • You are now all set to run various attacks by clicking the Attack button.

2. Get familiar with the various types of attacks you can perform via Gremlin

Using Gremlin, you can trigger various attacks depending on the infrastructure you target (Hosts, Containers, or Kubernetes).

For Hosts

Resource: Test against sudden changes in consumption of computing resources.

  • CPU: Test that your application behaves as expected even when CPU capacity is limited or exhausted
  • Disk: Test system and application behavior when storage space is limited or unavailable, and validate dynamic storage provisioning systems
  • IO: Test against heavy IO operations to understand their effect on your applications
  • Memory: Test your systems against memory consumption to ensure they can tolerate and perform given a sudden increase in usage

State: Test against unexpected changes in your environment, such as power outages, node failures, clock drift, or application crashes.

  • Process Killer: Test against application crashes and similar events by terminating specific sets of processes
  • Shutdown: Test resilience to host failures by rebooting or shutting down targeted host operating systems
  • Time Travel: Test for scenarios such as Daylight Saving Time (DST), clock drift between hosts, and expiring SSL/TLS certificates

Network: Test against unreliable network conditions.

  • Blackhole: Test against unreachable dependencies by dropping network traffic between services
  • DNS: Test against DNS outages, and validate both fallback DNS servers and DNS resolver configurations
  • Latency: Test your system’s responsiveness under varying network conditions by injecting a controlled delay into outbound network traffic
  • Packet Loss: Test your system’s end user experience when a percentage of outbound network packets are dropped or corrupted
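
Each of the attack types above also has options of its own, and the CLI help is a convenient way to revise them. The sketch below loops over the host attack types and either shows the real help (when the agent is installed) or prints the command it would run. The CLI type names in `ATTACK_TYPES` are assumptions inferred from the attack names in the lists above; `gremlin help attack` on your own host is the authoritative list.

```shell
# Sketch: show the help text for every host attack type.
# ASSUMPTION: these CLI type names mirror the attack names listed above.
ATTACK_TYPES="cpu disk io memory process_killer shutdown time_travel blackhole dns latency packet_loss"

help_command() {
  echo "gremlin attack $1 --help"
}

for attack in $ATTACK_TYPES; do
  if command -v gremlin >/dev/null 2>&1; then
    # Agent installed: --help only prints usage, it does not start an attack.
    gremlin attack "$attack" --help || true
  else
    # No agent on this machine: print what would be run.
    help_command "$attack"
  fi
done
```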

Try some of these attacks before the exam. For example, to test Shutdown, go to State and click Shutdown; you can set a delay before the shutdown and choose to reboot the host afterwards.

  • You can log in to the host and see which command it is executing.

$ ps aux|grep -i gremlin
gremlin     2142  0.0  0.9  23420  9328 ?        Ssl  04:42   0:00 /usr/sbin/gremlind
gremlin     2362  0.0  0.8  23612  8516 ?        Sl   05:07   0:00 gremlin attack shutdown -d 1 -r        
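
When several processes are running, it can help to list only the active attacks rather than everything that mentions gremlin. This is a small sketch; the helper name `list_attacks` is hypothetical, and it assumes the attack shows up in the process table as `gremlin attack ...`, as in the output above.

```shell
# Sketch: list only running "gremlin attack ..." processes.
# Input lines are expected as "PID COMMAND ARGS...", e.g. from `ps -eo pid,args`.
list_attacks() {
  awk '$2 == "gremlin" && $3 == "attack" {
    pid = $1
    $1 = $2 = $3 = ""      # drop PID, "gremlin" and "attack"
    sub(/^ +/, "")         # trim the gap left by the removed fields
    print pid ": " $0
  }'
}

# The daemon (/usr/sbin/gremlind) is skipped because it is not an attack.
ps -eo pid,args | list_attacks
```

For the shutdown attack shown above, this would print `2362: shutdown -d 1 -r`.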

  • Gremlin also provides a friendly UI, where you can view this.

  • Similarly, you can run other kinds of attacks, such as a CPU attack. In this scenario, we run the attack for 60 seconds at 50% CPU utilization on all cores.

  • You can go back to the host and check the CPU utilization using the top command.
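
If you want a number you can log instead of watching top, overall CPU utilization can also be derived from two samples of /proc/stat. This is a rough sketch for Linux hosts only; the function name `cpu_busy_percent` is made up for illustration, and it treats idle plus iowait as "not busy".

```shell
# Sketch: busy CPU percentage between two aggregate "cpu" lines of /proc/stat.
# Fields after "cpu": user nice system idle iowait irq softirq steal ...
cpu_busy_percent() {
  # $1 and $2: two "cpu ..." lines sampled some time apart
  printf '%s\n%s\n' "$1" "$2" | awk '
    { total = 0
      for (i = 2; i <= 9 && i <= NF; i++) total += $i   # user..steal
      idle = $5 + $6 }                                  # idle + iowait
    NR == 1 { t0 = total; i0 = idle }
    NR == 2 { dt = total - t0
              busy = dt - (idle - i0)
              if (dt > 0) printf "%.0f\n", 100 * busy / dt; else print 0 }'
}

# Sample one second apart while the attack is running (Linux only).
if [ -r /proc/stat ]; then
  before="$(grep '^cpu ' /proc/stat)"
  sleep 1
  after="$(grep '^cpu ' /proc/stat)"
  echo "CPU busy: $(cpu_busy_percent "$before" "$after")%"
fi
```

During a 50% attack on all cores, the reported value should sit at roughly 50 plus whatever the host was already using.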

3. Get familiar with the Gremlin command line.

$ gremlin -h
gremlin
USAGE:
gremlin <SUBCOMMAND>
FLAGS:
-h, --help    Prints help information
SUBCOMMANDS:
attack                Run a new gremlin attack against this host
attack-container      Run a new gremlin attack against the specified container
check                 Show runtime troubleshooting data
help                  Prints this message or the help of the given subcommand(s)
init                  Initialize a new client session with the Gremlin service
logout                Remove this client from the Gremlin service
measure               Measure then report dynamic system data
rollback              Interrupt an active attack, or revert the last impact
rollback-container    Interrupt an active attack against a Docker container
status                Show the status of all gremlins or a specific attack
syscheck              System check was a feature in Gremlin 2.8.x and is no longer supported
validate              Validate a gremlin
version               Show version information for the gremlin binary        
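
To see how a few of these subcommands fit together, here is a minimal dry-run sketch of a check, attack, status, rollback sequence. The `run` helper and the `DRY_RUN` variable are made up for this example, and the CPU attack flags (`-l` for length in seconds, `-c` for number of cores) are assumptions; confirm them with the CLI help before running with `DRY_RUN=0`.

```shell
# Sketch: walk through a typical CLI session without touching the host.
# run: print the command in dry-run mode (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run gremlin check                  # show runtime troubleshooting data
run gremlin attack cpu -l 60 -c 1  # ASSUMED flags: 60 seconds, one core
run gremlin status                 # show the status of all gremlins
run gremlin rollback               # interrupt an active attack
```

Defaulting to dry-run means nothing is executed until you opt in explicitly.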

In the end, I would say this exam is straightforward: go through the Gremlin docs and YouTube videos (bonus: attend their Bootcamp if you can), and you should be good to go.
