Check out these new releases! Plus: why observability and testing go together
Latest Releases
With Intelligent Health Checks, simply click a checkbox, and Gremlin creates a full set of Health Checks that can be used to determine service health during reliability tests—no third-party observability tools required.
In this blog post, we’ll explain how Intelligent Health Checks work, how they automate reliability testing, and how you can get them up and running in just a few minutes.
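The minimal Python sketch below illustrates the general idea of a health-check gate. It is not Gremlin's implementation or API: the `HealthCheck` class, metric names, and thresholds are all hypothetical. Each check compares one metric against a ceiling, and the test halts at the first metrics snapshot in which any check fails.

```python
from dataclasses import dataclass

@dataclass
class HealthCheck:
    """Hypothetical health check: one metric compared against a ceiling."""
    name: str
    metric: str       # key into a metrics snapshot, e.g. "error_rate"
    max_value: float  # the check fails if the observed value exceeds this

def evaluate(checks, metrics):
    """Return the names of failing checks for one metrics snapshot.

    A metric missing from the snapshot counts as a failure, because a
    service you cannot observe cannot be shown to be healthy.
    """
    failing = []
    for check in checks:
        value = metrics.get(check.metric)
        if value is None or value > check.max_value:
            failing.append(check.name)
    return failing

def run_with_health_checks(checks, snapshots):
    """Walk snapshots taken during a test and halt at the first unhealthy one.

    Returns (completed, failing_check_names).
    """
    for snapshot in snapshots:
        failing = evaluate(checks, snapshot)
        if failing:
            return False, failing  # halt and roll back the test here
    return True, []

checks = [
    HealthCheck("error rate under 1%", "error_rate", 0.01),
    HealthCheck("p99 latency under 500 ms", "p99_latency_ms", 500.0),
]
snapshots = [
    {"error_rate": 0.002, "p99_latency_ms": 180.0},
    {"error_rate": 0.004, "p99_latency_ms": 650.0},  # latency breach
]
print(run_with_health_checks(checks, snapshots))
# → (False, ['p99 latency under 500 ms'])
```

Treating a missing metric as a failure is a deliberately conservative choice. If you cannot observe a service, it is safer to stop the test than to assume the service is fine.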
Built on cloud reliability best practices such as the Well-Architected Framework, the Well-Architected Cloud Test Suite gives you a testing foundation that covers the most common reliability failures out of the box. It helps you automate and standardize resilience testing so you can make your systems more reliable.
Read the blog post to find out more about test suites and which tests are included in the Well-Architected Cloud Test Suite.
——
How-tos and best practices
Load balancers are some of the most important load-bearing (pun intended) components in cloud environments. They perform multiple critical tasks: network switching, packet inspection, and, of course, routing. Most cloud-based load balancers focus on distributing traffic within a single zone, but what if you have resources spread across multiple zones?
In this blog post, we’ll explain how cross-zone load balancing works, why it’s important for reliability, and how you can enable it in your own cloud deployments.
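To see why this matters, consider a hypothetical deployment with two targets in one zone and eight in another, where client traffic is split evenly between the zones. The Python sketch below is a simplified model that ignores target health, weighting, and connection reuse. It is not any specific provider's routing algorithm. It assumes each zone has at least one target.

```python
def per_target_share(targets_per_zone, cross_zone):
    """Fraction of total traffic each target in each zone receives.

    Simplified model: clients split traffic evenly across zones, and each
    zone's load balancer node splits its share evenly across eligible targets.
    Assumes every zone has at least one registered target.
    """
    zones = len(targets_per_zone)
    total_targets = sum(targets_per_zone.values())
    shares = {}
    for zone, count in targets_per_zone.items():
        if cross_zone:
            # Every node may route to every registered target in any zone.
            shares[zone] = 1 / total_targets
        else:
            # Each node routes only to targets in its own zone.
            shares[zone] = (1 / zones) / count
    return shares

targets = {"zone-a": 2, "zone-b": 8}
print(per_target_share(targets, cross_zone=False))
# → {'zone-a': 0.25, 'zone-b': 0.0625}
print(per_target_share(targets, cross_zone=True))
# → {'zone-a': 0.1, 'zone-b': 0.1}
```

Without cross-zone load balancing, each of the two zone-a targets carries a quarter of all traffic, four times the load of each zone-b target. Those two targets are the most likely to saturate first. With cross-zone load balancing enabled, the load evens out at 10% per target.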
Accidental deletions, misconfigurations, and “fat-fingering” are unfortunate truths in the software industry, but there are ways to prevent them.
In this blog post, we’ll show you how to find critical resources that are at risk of accidental deletion and how to mitigate that risk. Specifically, we’ll focus on the primary way customer traffic reaches your services: load balancers.
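As one concrete case, AWS Elastic Load Balancing (v2) exposes a `deletion_protection.enabled` load balancer attribute. The Python sketch below assumes AWS. It works on data shaped like the `Attributes` list returned by the DescribeLoadBalancerAttributes API. The load balancer names are hypothetical, and the blog post itself may take a different approach or cover other providers.

```python
def lacks_deletion_protection(attributes):
    """True if an ELBv2 attribute list does not have deletion protection on.

    `attributes` has the shape [{"Key": ..., "Value": ...}, ...]; attribute
    values are strings, so "true" is compared as text.
    """
    for attr in attributes:
        if attr["Key"] == "deletion_protection.enabled":
            return attr["Value"] != "true"
    return True  # attribute absent: treat the load balancer as unprotected

def find_at_risk(load_balancers):
    """Return the sorted names of load balancers without deletion protection.

    `load_balancers` maps a load balancer name to its attribute list.
    """
    return sorted(
        name for name, attrs in load_balancers.items()
        if lacks_deletion_protection(attrs)
    )

# Hypothetical inventory of three load balancers.
inventory = {
    "checkout-alb": [{"Key": "deletion_protection.enabled", "Value": "true"}],
    "search-alb": [{"Key": "deletion_protection.enabled", "Value": "false"}],
    "legacy-nlb": [],
}
print(find_at_risk(inventory))
# → ['legacy-nlb', 'search-alb']
```

In a real AWS account, you might build the inventory with boto3's `elbv2` client by calling `describe_load_balancers`, then `describe_load_balancer_attributes` for each ARN. You could then enable protection on anything flagged with `modify_load_balancer_attributes`.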
Anyone wanting to minimize downtime and deliver reliable, available applications needs to have fully instrumented systems and playbooks so they can respond quickly and effectively to outages or incidents. But there’s another piece to the reliability puzzle: resilience testing.
Read this blog post to find out how resilience testing works alongside your observability and incident response practices to reduce the number and severity of incidents, lower your MTTR, and make your system more reliable.
——
Office Hours
August 8th, 11am PT/2pm ET
Migrating to the cloud usually means faster deployments and easier scalability, but it also introduces network latency. Cloud applications communicate over distributed networks, and while these networks are fast, small amounts of latency can quickly add up.
In this Office Hours session, we’ll talk about the latency problem inherent to cloud computing and how it can impact your applications. We’ll discuss the network-centric design of cloud platforms, how to build applications that make the best use of this design, and how to ensure your services are resilient and fault-tolerant.
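The figures below are made up, but they show how the effect compounds. A request that calls five downstream services one after another pays every hop's network round trip. A couple of extra milliseconds per hop therefore becomes a noticeably slower response. Issuing independent calls concurrently is one common mitigation and is shown here for comparison. It is a general technique, not necessarily what the session covers.

```python
def sequential_latency_ms(hops):
    """Latency of a request whose downstream calls run one after another.

    Each hop is (service_time_ms, network_rtt_ms); sequential hops add up.
    """
    return sum(service + rtt for service, rtt in hops)

def concurrent_latency_ms(hops):
    """Latency if independent calls are issued concurrently: the slowest hop dominates."""
    return max(service + rtt for service, rtt in hops)

# Hypothetical figures: five calls, 10 ms of work each, with a 1 ms or 3 ms round trip.
nearby = [(10, 1)] * 5
farther = [(10, 3)] * 5

print(sequential_latency_ms(nearby))   # → 55
print(sequential_latency_ms(farther))  # → 65
print(concurrent_latency_ms(farther))  # → 13
```

The per-hop difference is only 2 ms, but across five sequential calls it adds 10 ms to every request. Concurrency only helps when the calls do not depend on each other's results.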
Fully managed SaaS services offer incredible scalability and accessibility, but at a cost: they’re also single points of failure. If your application depends on a SaaS service and the service fails, guess who your customers will blame?
In this Office Hours session, we’ll show you how to recreate a failure in a managed service provider using Gremlin’s fault injection tools. You’ll learn how to run experiments that replicate SaaS outages in a safe, controlled, and reversible way, while impacting only the services you want to test. We’ll also show you how to choose from a pre-populated list of managed services directly in the Gremlin web app.
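This is not how Gremlin injects faults. It is a loose, in-process analogue, and a different technique. Still, the Python sketch below has the same properties described for these experiments: the failure is scoped to one named dependency, it is reversible, and it lets you check whether your fallback actually works. The `FaultInjector` class, the `recs-saas` dependency, and the fallback list are hypothetical.

```python
from contextlib import contextmanager

class DependencyUnavailable(Exception):
    """Raised when a call to a (simulated) failing dependency is attempted."""

class FaultInjector:
    """Hypothetical in-process switch that fails calls to named dependencies."""

    def __init__(self):
        self._failing = set()

    def check(self, dependency):
        """Raise if the named dependency is currently marked as failing."""
        if dependency in self._failing:
            raise DependencyUnavailable(dependency)

    @contextmanager
    def outage(self, dependency):
        """Fail only this dependency for the duration of the block, then restore it."""
        self._failing.add(dependency)
        try:
            yield
        finally:
            self._failing.discard(dependency)  # always roll back, even on error

faults = FaultInjector()

def get_recommendations(user_id):
    """Call a hypothetical SaaS recommendations API, falling back to a static list."""
    try:
        faults.check("recs-saas")
        return [f"personalized-{user_id}"]  # stand-in for the real API call
    except DependencyUnavailable:
        return ["bestsellers"]  # degraded, but still serving customers

print(get_recommendations(42))      # → ['personalized-42']
with faults.outage("recs-saas"):
    print(get_recommendations(42))  # → ['bestsellers']
print(get_recommendations(42))      # → ['personalized-42']
```

Wrapping the outage in a context manager guarantees that the rollback runs even if the code under test raises an exception, which mirrors the reversibility these experiments aim for.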
——