The current global pandemic has impacted every aspect of business. For some companies, that means shutting down entirely; for others, it means shifting everything online and processing even more customer requests than before. Companies that have moved online are simultaneously adjusting to the added pressure of a remote workforce. With very little warning, IT teams (who are also now working from home) have had to act fast to enable this switch for their organizations so that employees can continue to work efficiently and effectively, and customers continue to receive uninterrupted service and support.
On-call and incident management teams are under constant pressure, making it challenging to keep up with demands and prioritize mission-critical work. This is a huge undertaking, but according to IT management company PagerDuty, IT teams are now responding to incidents up to 63% faster than they had been before this crisis. How is this possible?
Rachel Obstler, VP of product at PagerDuty, says that many IT companies are currently operating in “hypercare” mode, allowing them to resolve incidents faster despite the increased demand. Hypercare mode is a period of increased focus on stability and performance, and it can also include IT teams slowing down work on new features and shifting their focus to scaling, resiliency, and redundancy.
To not just survive this crisis but thrive and adapt to the increased demands of digital services moving forward, IT teams will have to increase their focus on automation and collaboration. Teams will be stretched, and the extent to which they can reduce manual toil and operate virtually will dictate whether they can come out of this pandemic stronger than before, with more resilient and performant systems. One way of reducing toil, automating tasks, and promoting better collaboration is to use a tool like PagerDuty. The need is particularly apparent in Network Operations Centers (NOCs), which are often limited by being located on-premises.
“Network Operations Centers, where teams traditionally would all be in the same room, now need to be just as effective while virtual,” Obstler explained. NOC operators will likely not have the same tools available to them at home that they had in the office, where they could look at dozens of screens or metrics, so they’ll have to work to virtualize their NOC. PagerDuty can help by proactively notifying users of any issues, providing important context like how many services are impacted, and automating the “call-list” to quickly mobilize teams to fix problems, regardless of where they are working.
It’s likely that companies have already introduced some level of automation into their NOC to compensate for an increase in the number of distributed locations and the scale of managed systems. Those companies will have to continue introducing even more process automation to support a 100% virtual workforce.
“Companies that already had processes in place for remote work will have an easier time transitioning to this new normal,” said Obstler. In addition, companies that were predominantly transacting with their customers digitally will have an edge over those that weren’t.
Obstler believes that a company’s success in responding to these changes will depend largely on those factors and on the age and maturity of the organization, more so than on the industry it is in. For example, a brick-and-mortar shop will find this a challenging time if it hasn’t been doing anything online up to this point, regardless of what goods and services it offers.
Once companies start stabilizing, they’ll have to begin shifting out of hypercare mode. Assigning more people to be on-call and available for incident response should ideally be only a short-term measure, used until the organization can comfortably meet the increased demand.
“The whole idea of DevOps and that movement is that you have the same team that is both developing new features and is also responding to issues when they happen on the same service … When you combine those efforts you allow the team to really balance between those two investments, and what typically happens is they do operational work until it’s stable enough. Then, you can do more feature work and at some point, maybe it starts to get unstable and you invest in operations. They’re able to make those tradeoffs in a very fluid way,” said Obstler.
It will be difficult for IT operations teams to take on these challenges without the proper tooling in place. A solution like PagerDuty can help with a number of the changes that organizations will need to implement. For example, it can help companies virtualize their NOC, automatically activate critical response teams, and automate and scale incident response to deal with increased IT demand.
The company is currently offering free starter licenses that include unlimited alerting and on-call management for the first six months of use. More information is available here.
Content provided by SD Times and PagerDuty