Almost 10 years ago, Amazon CTO Werner Vogels famously revealed how the company had moved away from traditional department-led Web operations to developers deploying their own code.
“The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it,” he said.
His words now ring true for Web development teams everywhere. A workflow that was unthinkable 10 years ago is becoming the standard for modern teams thanks to today’s powerful, reliable and easy-to-use cloud infrastructure services. With tools and services such as Amazon Web Services, Heroku, DigitalOcean, GitHub and hosted CI, developers can now write, test, deploy and operate large-scale Web applications without involving traditional Ops teams or system admins.
It’s a new landscape where entire roles are being phased out and developers need to rapidly acquire new skillsets to adapt. Like any drastic change in an industry, those who don’t adapt will get left behind.
Great CTOs need to successfully transition their developers to this new reality. When introducing a workflow of “You write it, you ship it. You ship it, you fix it,” every developer will be held accountable for the code he or she ships. It can be stressful to know that when shipping code, you’re adding to the bucket of code that you’re already responsible for. An unintended effect of this approach is that it could contribute to greater anxiety around shipping code, thereby decreasing output.
Software output is already a major issue for CTOs. A Cambridge University study from 2012 concluded that developers spend nearly half of their time (49.9%) debugging. As a developer myself, I’d say that number is pretty accurate—and probably conservative for large organizations.
If the right culture and tools are in place, there is a strong business case for having developers operate and fix their own code. Moving to this model makes a lot of sense for today’s organizations that need to keep pace with how quickly businesses move. Developers can fix code faster than any non-developer or traditional Ops person because they wrote it. Being held accountable for their own code can increase overall code quality and lower overall debugging costs.
Companies that successfully adapt their development teams to do Ops will move extremely fast, but the transition has to start with a cultural change and the right set of tools. Outsourcing infrastructure changes the culture of how development teams work.
Potentially, the shift is seismic from a business perspective, but it doesn’t just work out of the box. Outsourcing infrastructure to the cloud not only requires a reallocation of responsibilities, but it also presents a massive cultural challenge for development teams, as developers now have to work two jobs: writing code and making sure it gets fixed when something breaks.
Transparency and accountability are key to successfully managing this cultural change. This change in workflow means that developers are not only cranking out code from 9 to 5 but are also on duty from 5 to 9. Those 2 a.m. bug notifications are an unwelcome alarm clock for developers, especially if they originate from a coworker’s mistake, but they need to be addressed immediately.
As developers write and deploy code multiple times a day, better tools and knowledge sharing about what gets shipped, who shipped it and how it’s performing will be key for a development team to thrive in this evolving environment. To enable developers to juggle these new responsibilities, teams need to invest in solid debugging tools and coordination workflows.
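As a rough illustration of tracking what gets shipped and who shipped it, the Python sketch below keeps a small deploy log and looks up the most recent deploy before an error occurred. Everything in it is hypothetical: the `Deploy` record, the `last_deploy_before` helper, the commit IDs, the names and the timestamps are invented for the example and do not come from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Deploy:
    """One shipped change: what went out, who shipped it, and when."""
    commit: str
    author: str
    shipped_at: datetime


def last_deploy_before(deploys: List[Deploy], error_at: datetime) -> Optional[Deploy]:
    """Return the most recent deploy at or before the error time, or None."""
    earlier = [d for d in deploys if d.shipped_at <= error_at]
    return max(earlier, key=lambda d: d.shipped_at, default=None)


# Hypothetical deploy log for a single day (all values are made up).
deploys = [
    Deploy("a1b2c3", "dana", datetime(2015, 3, 2, 10, 15)),
    Deploy("d4e5f6", "lee", datetime(2015, 3, 2, 16, 40)),
]

# A 2 a.m. alert arrives; find the deploy that most recently preceded it.
suspect = last_deploy_before(deploys, datetime(2015, 3, 3, 2, 0))
if suspect is not None:
    print(f"Start with commit {suspect.commit}, shipped by {suspect.author}")
```

The most recent deploy is only a starting point for debugging, not proof of cause, which is why a lookup like this works best alongside blameless review.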
Furthermore, when downtime occurs, it’s just as important to embrace blameless post-mortems and encourage transparency, so everyone on the team can learn from the mistakes that were made, no matter who caused them.
The benefits of the shift: cheaper infrastructure, less maintenance, faster development cycles, and multiple deployments per day. Cloud infrastructure has played an instrumental role in this sea change by providing a reliable foundation that eliminates much of the Ops work of the past. With stability in the cloud largely a non-issue and owned by the infrastructure provider, the majority of downtime is now caused by the code changes developers ship.
Instagram built its social photo-sharing service entirely atop the Amazon Web Services cloud. Prior to its acquisition by Facebook, Instagram was able to develop a reputation for maximum uptime even as it scaled to handle tens of millions of users. Instagram cofounder Mike Krieger recently said that the company’s main Ops issues when building its service weren’t caused by infrastructure. “Our No. 1 cause of issues is ourselves,” he said.
In other words, the old ways are gone and, whether we like it or not, the Ops burden is now on the developers. Today’s Ops is about making sure the codebase is performing; that’s something every development team can relate to.