New testing how-tos, CI/CD office hours, and how to deal with layoffs
How-tos and best practices
Find out how to use Gremlin to proactively test a service with multiple dependencies. You’ll learn how to prepare your services for dependency failures and how to verify that they can withstand losing dependencies.
Learn how to prepare for availability zone outages by proactively detecting services operating in a single zone. You’ll see how Gremlin detects this reliability risk for you, how you can mitigate it using commonly available cloud computing tools, and how you can simulate zone and region outages to prove your resilience.
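To make the idea of detecting single-zone services concrete, here is a minimal sketch in Python. It is not how Gremlin performs this detection; the inventory format (pairs of service name and zone) and the function name `single_zone_services` are assumptions made purely for illustration.

```python
from collections import defaultdict


def single_zone_services(replicas):
    """Return the names of services whose replicas all run in one zone.

    `replicas` is an iterable of (service_name, zone) pairs. This shape is a
    hypothetical inventory format, not something exported by Gremlin.
    """
    zones_by_service = defaultdict(set)
    for service, zone in replicas:
        zones_by_service[service].add(zone)
    # A service with fewer than two distinct zones has no zone redundancy.
    return sorted(
        name for name, zones in zones_by_service.items() if len(zones) < 2
    )


# Hypothetical inventory: "search" runs two replicas, but both in one zone.
inventory = [
    ("checkout", "us-east-1a"),
    ("checkout", "us-east-1b"),
    ("search", "us-east-1a"),
    ("search", "us-east-1a"),
]
print(single_zone_services(inventory))  # ['search']
```

Note that the replica count alone is not enough: a service can have several replicas and still be exposed to a single zone failure.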
Find out how to configure Kubernetes to automatically detect and restart failed containers. You’ll learn how to set a container restart policy, how to create liveness probes, and how to test that these mechanisms work as expected when you need them.
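As a rough illustration of the two settings mentioned above, the following Python sketch builds a Pod manifest with a restart policy and an HTTP liveness probe, then prints it as JSON (which `kubectl apply -f` accepts alongside YAML). The pod name, image, health path `/healthz`, port, and probe timings are all placeholder assumptions, not values from the linked how-to.

```python
import json


def pod_manifest(name, image, health_path="/healthz", port=8080):
    """Build a Pod manifest with a restart policy and a liveness probe.

    The name, image, path, port, and timings are illustrative placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Restart the container whenever it exits, whatever the exit code.
            "restartPolicy": "Always",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # The kubelet restarts the container after
                    # `failureThreshold` consecutive failed probes.
                    "livenessProbe": {
                        "httpGet": {"path": health_path, "port": port},
                        "initialDelaySeconds": 5,
                        "periodSeconds": 10,
                        "failureThreshold": 3,
                    },
                }
            ],
        },
    }


print(json.dumps(pod_manifest("web", "example.com/web:1.0"), indent=2))
```

With these example timings, a container that stops answering its health endpoint would be restarted after roughly three failed checks spaced ten seconds apart.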
The reliability discussion often ignores a significant and ever-growing part of nearly all modern software: dependencies. This blog goes over the role dependencies play in reliability, how they can fail, and how you can build resilience against unstable and unreliable dependencies.
What happens when one of your nodes fails? This blog post covers node redundancy in Kubernetes, then goes into how one of Gremlin’s built-in Recommended Scenarios can help you verify your resilience.
After analyzing successful reliability programs, we found that every one of them was supported by three pillar roles. Find out more about the three roles and their responsibilities for improving reliability.
——
Featured Article
It’s never easy when layoffs hit your organization. In addition to the personal impact of losing friends and coworkers from your team, those who remain are left trying to achieve the same business goals with fewer people and fewer resources.
Unfortunately, layoffs and restructuring have become a common part of business. But you’re not alone. Your partners (including Gremlin) are here to help you navigate your new reality.
Check out this article from Principal Engineer Jeff Nickoloff for three ways to do more with less.
——
Office Hours
DATE: June 13th TIME: 11am PT/2pm ET
Zone failures are rare, but they still happen often enough that your systems must account for them. When an entire zone fails, many of the most common redundancy techniques fail. How do you avoid outages like these, especially if they affect an entire data center?
In this webinar, we’ll show you how to prepare for zone outages using Gremlin. In a live demo, you’ll learn how Gremlin’s built-in reliability tests and Scenarios test your services against zone failures. You’ll also learn how to customize these tests to target different zones, how to recreate an outage in a different zone from the ones your systems are running in, and how to monitor your services throughout using Health Checks.
Ad-hoc Chaos Engineering experiments are great for learning more about how your systems work, but they don’t tell you how your systems behave over time. As new features get deployed, environments change, and regressions get introduced, even the most resilient systems can gain reliability risks. QA and performance testing are already built into CI/CD, so why not reliability?
In this on-demand webinar, we’ll show you how to run Chaos Engineering experiments as part of your CI/CD process. We’ll show how to use Gremlin’s REST API to trigger experiments from Jenkins, how to monitor active experiments, and how to check whether a test completed successfully or failed.
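The general shape of such a CI gate can be sketched in Python as a polling loop that turns an experiment's final state into a pass or fail result. This sketch deliberately does not call Gremlin's REST API. The `fetch_status` callable stands in for whatever request your pipeline makes, and the status names (`Running`, `Successful`, `Failed`, `Halted`) are assumptions for illustration, not confirmed API values.

```python
import time

# Hypothetical status names; check the API you use for the real values.
TERMINAL_OK = {"Successful"}
TERMINAL_FAILED = {"Failed", "Halted"}


def wait_for_experiment(fetch_status, timeout_s=600, poll_s=5,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until the experiment reaches a terminal state or times out.

    `fetch_status` is any zero-argument callable returning the current
    status string. Returns True only if the experiment succeeded.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        status = fetch_status()
        if status in TERMINAL_OK:
            return True
        if status in TERMINAL_FAILED:
            return False
        sleep(poll_s)  # Still in progress: wait before polling again.
    return False  # Treat a timeout as a failed gate.


# Simulated run: two in-progress polls, then success.
statuses = iter(["Running", "Running", "Successful"])
passed = wait_for_experiment(lambda: next(statuses), sleep=lambda _: None)
print("pass" if passed else "fail")  # pass
```

In a Jenkins job, the script could finish with `raise SystemExit(0 if passed else 1)` so that a failed or timed-out experiment fails the build step.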
——