A new report found that teams that have successfully implemented chaos engineering have 99.9% or higher availability and have greatly improved their mean time to resolution (MTTR).
Gremlin’s inaugural 2021 State of Chaos Engineering report found that 23% of teams that frequently run chaos engineering projects had an MTTR of under 1 hour, and 60% had an MTTR of under 12 hours.
Gartner has made a similar prediction: by 2023, 80% of organizations that use chaos engineering practices as part of SRE initiatives will reduce their MTTR by 90%.
According to Gremlin’s report, the highest availability groups commonly utilized autoscaling, load balancers, backups, select rollouts of deployments, and monitoring with health checks.
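To put the 99.9% availability figure in concrete terms, the short sketch below converts an availability percentage into the downtime it allows over a period. The function name `downtime_budget_minutes` and the 30-day and 365-day periods are illustrative assumptions, not something taken from the report.

```python
def downtime_budget_minutes(availability_pct, period_days):
    """Minutes of downtime allowed in a period at a given availability percentage."""
    total_minutes = period_days * 24 * 60
    # The unavailable fraction of the period is the downtime budget.
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # 99.9% availability over an assumed 30-day month and a 365-day year.
    print(f"{downtime_budget_minutes(99.9, 30):.1f} minutes per 30 days")    # 43.2
    print(f"{downtime_budget_minutes(99.9, 365):.1f} minutes per 365 days")  # 525.6
```

In other words, a team at 99.9% availability can afford roughly 43 minutes of downtime in a 30-day month, so a single incident that takes hours to resolve can use up that budget on its own.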
The most common way to monitor standard uptime was synthetic monitoring, although many organizations reported that they use multiple methods and metrics.
In the report, Gremlin also found that chaos engineering has seen much greater adoption recently, and that the practice has matured tremendously since its inception 12 years ago.
“The diversity of teams using Chaos Engineering is also growing. What began as an engineering practice was quickly adopted by SRE teams, and now many platform, infrastructure, operations, and application development teams are adopting the practice to improve the reliability of their applications,” the report stated.
While it’s still an emerging practice, the majority of respondents (60%) said that they had run at least one chaos engineering attack, and more than 60% of respondents have run chaos experiments against Kubernetes.
The most commonly run experiments reflected the top failures that companies experience, with network attacks such as latency injection at the top.
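As a rough illustration of what a latency injection experiment does, the sketch below wraps a function so that a fraction of calls are delayed before they run. It is a minimal, in-process approximation written for this article, not Gremlin's tooling; `inject_latency`, `fetch_profile`, and the delay and probability values are all hypothetical.

```python
import random
import time
from functools import wraps


def inject_latency(delay_s, probability=1.0, sleep=time.sleep, rng=random.random):
    """Return a decorator that delays a call by delay_s seconds with the given probability."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Delay only a fraction of calls to keep the experiment's impact limited.
            if rng() < probability:
                sleep(delay_s)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@inject_latency(delay_s=0.05, probability=0.5)
def fetch_profile(user_id):
    # Stand-in for a call to a downstream service (hypothetical).
    return {"id": user_id, "name": "example"}


if __name__ == "__main__":
    start = time.perf_counter()
    fetch_profile(42)
    print(f"call took {time.perf_counter() - start:.3f}s")
```

The `sleep` and `rng` parameters are injectable so the behavior can be checked without real delays. Production chaos tools typically inject latency at the network level instead, which lets a single service be targeted without changing application code.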
However, among companies that have not adopted chaos engineering, the main reasons were lack of awareness, experience, and time, which together accounted for 80% of responses. Fewer than 10% of respondents said it was because of fear of something going wrong.
“It’s true that in practicing Chaos Engineering we are injecting failure into systems, but using modern methods that follow scientific principles, and methodically isolating experiments to a single service, we can be intentional about the practice and not disrupt customer experiences,” the report stated. “We believe the next stage of Chaos Engineering involves opening up this important testing process to a broader audience and to making it easier to safely experiment in more environments.”