Initiatives like DevOps and Site Reliability Engineering (SRE) were designed to bring development and operations teams together to deliver better software. However, a recent report has found the shared accountability these initiatives promote is actually causing problems.
OverOps’ Dev vs. Ops: The State of Accountability report found DevOps is creating chaos and confusion when it comes to application reliability and downtime. The report is based on more than 2,400 responses from development and IT professionals.
“Successful DevOps isn’t just about moving fast and eliminating barriers between teams. It’s about unifying the right people, processes and tools to gain a complete understanding of your system and ensure the delivery of reliable software. Without clearly defined workflows and insight into what’s happening at the deepest level of your environment, more accountability ultimately means more problems,” said Tal Weiss, CTO and co-founder of OverOps.
According to the report, 67 percent of respondents say their entire team is to blame when an application breaks or experiences an error, and 73 percent believe Dev and Ops are equally accountable for the overall quality of the application. Weiss explained that this inability to identify an “owner” is one of the biggest contributors to the chaos. About a quarter of the respondents said there is a lack of clarity around who is responsible for code quality.
Weiss explained that it is especially hard for the operations team to identify who is accountable for code quality because of the nature of its core responsibilities. “In a given day they may receive a handful of automated alerts related to a service disruption or slowdown, and it’s up to them to use dashboards, monitoring tools and logs to quickly determine where the problem is and who is responsible for fixing it. When ownership for a given release, application or piece of code is ambiguous, it’s difficult for ops to pull in the right people and move fast to get an issue resolved. Meanwhile they’re primarily being measured by uptime and how long it takes for them to resolve incidents. This means every moment wasted just trying to locate the problem and determine who is accountable is a strike against them,” he said.
According to Weiss, to address these problems and maintain their DevOps processes, teams need greater visibility into their environments. “If you don’t have insight into what’s happening at every level of your environment, it’s impossible to effectively identify the root cause of an issue and who is responsible for fixing it,” he said.
Other findings included that too many users are wasting time troubleshooting errors; that a majority of respondents cited the lack of a formal process for gaining visibility, data and metrics as their number one obstacle; and that about 40 percent of respondents believe they are moving too quickly, which is causing errors in production.
“As the lines between these two teams continue to blur, organizations will need to focus on adopting tools that deepen visibility into their applications. Clarifying ownership of applications and services, and avoiding the ‘multiple owners = no owner’ syndrome, is crucial for even the most bleeding-edge organizations,” the report stated.