Tips to help you avoid your worst reliability nightmares
How-tos and best practices
Zone redundancy is an essential part of designing resilient, reliable architectures, but how do you go from believing that you’re resilient to knowing that your systems can handle a zone going down?
In this blog from Gremlin Principal Engineer Sam Rossoff, you’ll learn best practices for setting up zone redundancy and how you can verify that redundancy using Fault Injection testing.
Fault Injection is a method of testing a system’s resilience by creating controlled failures. Most Fault Injection tools require an agent running on a host, but the design of serverless platforms makes this approach impossible.
Gremlin Failure Flags is a code-level Fault Injection solution that injects faults directly into your applications. In this blog, you’ll learn how it can help you uncover and address three common reliability risks in serverless applications.
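To picture what code-level fault injection can look like, here is a minimal Python sketch. It is not the Failure Flags SDK: `fault_point`, `ACTIVE_EXPERIMENTS`, `InjectedFault`, and the `fetch-orders` name are all hypothetical, and a real tool would fetch experiment settings from a control plane rather than an in-memory dictionary.

```python
import time


class InjectedFault(Exception):
    """Raised when an experiment asks this code path to fail."""


# Hypothetical in-memory experiment table: fault-point name -> fault settings.
# This stands in for whatever a real tool uses to deliver experiments.
ACTIVE_EXPERIMENTS = {}


def fault_point(name, sleep=time.sleep):
    """Apply the fault that is active for `name`; do nothing otherwise."""
    fault = ACTIVE_EXPERIMENTS.get(name)
    if fault is None:
        return  # no experiment running, so behavior is unchanged
    if "latency_seconds" in fault:
        sleep(fault["latency_seconds"])  # simulate a slow dependency
    if "error" in fault:
        raise InjectedFault(fault["error"])  # simulate a failing dependency


def handler(event):
    """Stand-in for a serverless function handler."""
    fault_point("fetch-orders")  # the only line added to application code
    return {"status": 200, "orders": []}
```

With no experiment active, `handler` behaves normally. Setting `ACTIVE_EXPERIMENTS["fetch-orders"] = {"error": "dependency unavailable"}` makes the same call raise `InjectedFault`, which is the kind of controlled failure you would then watch your service handle.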
Gremlin’s default suite of reliability tests analyzes critical functions of modern services: scalability, redundancy, and resilience to dependency failures. Services that pass this suite of tests can be trusted to remain available during unexpected incidents. But what happens when a service fails a test? How do you take failed test results and turn them into actionable insights?
This blog aims to answer that question. We’ll walk through all seven tests in the Gremlin Recommended Test Suite and explain what they test, what happens if your service fails, and what actions you can take to turn that failure into success.
——
Customer Webinar On-Demand
ON-DEMAND
In this Gremlin-hosted webinar, Chris Kempster, a Sr. NFT Engineer at Visa Cross-Border Solutions, shares their journey from early Chaos Engineering experiments to integrating reliability test suites into their staging environments and build pipelines.
Chris will share the lessons he’s learned and best practices for building an effective testing process—and rolling it out across the organization.
——
Office Hours
DATE: November 14th TIME: 11am PT/2pm ET
To get the most value out of Chaos Engineering and reliability testing, you need a way to observe your service’s behavior. Observability tools offer insight into how your systems are performing, but observability on its own isn’t enough. You need a way to monitor your systems while testing their reliability so you can determine whether your service passed or failed a test.
In this Office Hours session, we’ll show you how to connect Gremlin to your observability tool via Health Checks. We’ll also discuss which metrics you should choose when creating Health Checks, and why they’re important for reliability.
Have questions about observability and Gremlin? Just reply to this email and we’ll make sure to cover them in the live Q&A portion of the webinar.
ON-DEMAND
Serverless applications are ideal for deploying scalable applications without having to manage infrastructure, but this also makes it difficult to test their reliability.
Failure Flags is Gremlin’s answer to serverless reliability. In this Office Hours session, we’ll show you how to run Chaos Engineering experiments on AWS Lambda functions. You’ll see how you can safely inject faults directly into your applications, how to scope experiments from individual functions to entire availability zones, and even how to create your own custom faults.
Have questions about Failure Flags? Just reply to this email and we’ll make sure to cover them in the live Q&A portion of the webinar.
——