As an industry, software development teams continue to embrace cloud-based toolchains. This trend makes a ton of sense for companies trying to drive development productivity, efficiency, and velocity in the era of hybrid and asynchronous work. But as we’ve seen with Jira’s recent outage, relying on a cloud-based tech stack creates risk. I’m not pointing fingers here. My own company offers a cloud-based productivity platform, and we, like every other cloud provider, have experienced outages. These events are inevitable, so as we become more reliant on the cloud-based software model to run our businesses, it’s essential for teams to understand what steps they need to take to cope with outages when they happen.
Not all outages are created equal. Jira’s was high in severity but low in the number of customers impacted. The reverse could be true of the next one you experience. This is why it’s essential to consider the possibility of outages when selecting your software providers. We’ve boiled the most important points down to the primary considerations below.
Prepare for the inevitable
If you use a cloud-based solution, you know an outage is coming, but it’s impossible to know when, so build a plan. Internally, that means establishing a single point person, an incident manager, who coordinates activity during the event, documents important information, and more. Get buy-in from stakeholders across your organization before an outage hits, so that when it does, everyone already agrees on the next steps and the issue can be resolved as fast as possible.
Have a workaround (to the extent possible)
Having a viable alternative available when an outage hits is nice, but obviously not always possible. Even so, striving to preserve some level of productivity will, at the very least, help to mitigate some of the lost progress when an outage occurs. Speaking from personal experience, my team has dealt with outages from GitHub multiple times. Knowing these will happen, we work to provide a workaround that lets our team get something done in the interim. Before an outage happens, you should also ask whether there is a self-hosted option that gives you the benefits of the cloud without being dependent on the provider’s infrastructure.
Choose a cloud-based provider that communicates status updates clearly and regularly
Due to the nature of cloud-based software, it would likely be impossible to choose a company that’ll never experience an outage. However, you can look into how companies have handled outages in the past, how reliable their software is, and what their usual response time is. The SaaS industry is small, so don’t hesitate to ask around your network about their experience with different companies and how they handle outages. Opt for organizations that are quick to document an outage, provide regular and transparent updates, and take these service interruptions seriously.
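One way to put a number on "how reliable their software is" is to turn a provider’s published incident history into an availability percentage. The snippet below is only a minimal sketch: the incident durations and the 90-day window are made-up figures for illustration, and the function name `availability_percent` is my own, not part of any provider’s API.

```python
from datetime import timedelta


def availability_percent(outage_durations, window):
    """Return the percentage of `window` during which the service was up."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    # Add up all downtime; cap it at the window in case incidents overlap it.
    downtime = min(sum(outage_durations, timedelta()), window)
    return 100.0 * (1 - downtime / window)


# Hypothetical incident history, as you might copy it from a status page.
incidents = [timedelta(minutes=45), timedelta(hours=3), timedelta(minutes=15)]
quarter = timedelta(days=90)

print(f"Availability over the quarter: {availability_percent(incidents, quarter):.3f}%")
```

A single percentage hides a lot, of course. As noted above, one long outage and many short ones can produce the same figure, so treat it as one input alongside how the provider communicated during each incident.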
Communicate status updates to internal stakeholders clearly and regularly
In addition to your own team, internal stakeholders and upstream managers need to understand what’s happening with the outage. They should not have to ask your team whether there is a problem when something’s not working as it should. They may occasionally be the first to notice, but more often than not, the team experiencing the outage should be the first to communicate what’s happening. There should be a single source of truth that delivers all your official communications on the event. It’s fine if those communications go out over multiple channels, but they should come from one source to ensure the information is consistent and accurate.
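To make the "one source, many channels" idea concrete, here is a rough sketch of how a team might structure it. Everything in it is hypothetical: the `StatusLog` class is invented for illustration, and the two "channels" are plain Python lists standing in for whatever chat or email tools you actually use.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class StatusLog:
    """Single source of truth: every update is recorded once, then fanned out."""
    channels: List[Callable[[str], None]] = field(default_factory=list)
    updates: List[str] = field(default_factory=list)

    def publish(self, message: str) -> str:
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        line = f"[{stamp}] {message}"
        self.updates.append(line)   # the official record
        for send in self.channels:  # identical text goes to every channel
            send(line)
        return line


# Stand-in channels; in practice these would call your chat or email tooling.
chat_messages: List[str] = []
email_messages: List[str] = []

log = StatusLog(channels=[chat_messages.append, email_messages.append])
log.publish("Issue tracker is unavailable; the vendor is investigating.")
```

Because every channel receives the same timestamped line from the same log, nobody ends up comparing two slightly different versions of events.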
Take note of what you’d do differently
Dealing with an outage that negatively impacts your team’s productivity can be frustrating, especially if all you can do is wait until it’s fixed. However, these outages present a great opportunity to reflect on what your company would do in the event of your own outage. As we mentioned before, outages are a hazard of doing business in the SaaS industry, and we can learn a lot from how our peers handle these situations. Whether the experience was good or bad, take notes on how you felt as a customer navigating the situation, and apply those lessons when your own product experiences an outage.
Good luck!
Hopefully, these points will enable you and your team to weather the coming outage better. While some of these may seem self-evident, I’ve always found value in making implicit advice explicit, particularly since it helps to have specific steps to follow when confronted with chaos. It reduces confusion, settles nerves, and provides a pathway to productivity.