With Black Friday, Cyber Monday, Chanukah, and Christmas right around the corner, online shopping is top of mind. And for retailers, these sale days can make or break their entire year.
So, when it comes to making changes to your website at such a critical time, do you keep maximizing your release cadence, or do you give up speed for stability and robustness? Do you freeze code changes and lock down the production environment, or do you risk a change that could cause downtime and cost you in a big way? And if a change has already introduced an error or a failure, should you lift the freeze?
Or Weis, co-founder and CEO of Rookout, an Israeli rapid production debugging startup, said that with so much at stake, treading carefully is the best course of action.
“I think it’s not only a question of how risky it is to leave the change in place, it’s also a question of alternative costs,” Weis said. “When you’re doing code freezes, it not only affects your ability to make changes, it affects your ability to do work. While you are freezing changes in production, development work still needs to go on. You still need to work on the features you’ll release a week afterward, or a month afterward. You still need to work on maintaining the stability of your solution, so even if you’re not taking it completely out of play, if it’s malfunctioning and you’re losing a portion of your traffic, that can affect your bottom line.”
This time of year, businesses look for any edge they can get on their competitors, and part of that is adopting Agile and DevOps practices. In CI/CD, for example, the culture encourages moving fast, breaking things, and correcting mistakes as you go. But in a pressurized sales period, there is no time to correct mistakes, and so no margin for error.
The key, Weis said, is striking the right balance, and taking the right approach to assessing and handling defects and errors during the freeze. “There’s not a lot of wiggle room. You keep getting locked up in a binary option – either you do something or you don’t,” he explained. “So you end up not doing most of the stuff. You end up hurting your development. You end up hurting your ability to respond to errors and incidents. You end up not being able to maximize the performance of your application.”
Weis laid out a few approaches to production debugging. One is setting up monitoring to capture as much information as you can about performance while you're under a code freeze, so that when incidents arise, you have the data to deal with them. The same is true for logging: log as much as you can to collect all the data you think you'll need. The downside, though, is cost: storage for the data, plus memory and CPU overhead.
Rookout's approach, Weis explained, has been to create a production-grade debugging solution that is frictionless and risk-free, so users can be confident that Rookout won't change the way an application runs, degrade its performance, or introduce issues that harm the application.
It places non-breaking breakpoints in production code and collects the data a user needs to understand how the application is performing, from individual variables to the entire stack. It then sends alerts when it detects anomalies, enabling quick remediation.
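Rookout's own implementation isn't described here. The Python sketch below only illustrates the general idea of a non-breaking breakpoint: when execution reaches a chosen line, snapshot the local variables and call stack, then let the program continue without pausing. It relies on the standard library's `sys.settrace` hook; the names `make_snapshot_tracer` and `apply_discount` are hypothetical. Note that `sys.settrace` adds significant overhead and affects only the current thread, so this is a conceptual sketch, not a low-overhead production technique.

```python
import sys
import traceback
from types import FrameType
from typing import Any, Callable, Optional


def make_snapshot_tracer(func_name: str, line_offset: int,
                         sink: list) -> Callable:
    """Hypothetical helper: record locals and stack when `func_name` reaches
    the line `line_offset` lines below its `def`, without stopping execution."""

    def local_tracer(frame: FrameType, event: str, arg: Any) -> Callable:
        # "line" fires just before the line runs, so earlier assignments are visible.
        if event == "line" and frame.f_lineno - frame.f_code.co_firstlineno == line_offset:
            sink.append({
                "function": frame.f_code.co_name,
                "line": frame.f_lineno,
                "locals": dict(frame.f_locals),  # copy, so later changes don't leak in
                "stack": [f.name for f in traceback.extract_stack(frame)],
            })
        return local_tracer  # keep tracing lines in this frame; never pause

    def global_tracer(frame: FrameType, event: str, arg: Any) -> Optional[Callable]:
        # Only attach the line tracer to calls of the target function.
        if event == "call" and frame.f_code.co_name == func_name:
            return local_tracer
        return None

    return global_tracer


def apply_discount(price: float, pct: float) -> float:
    discount = price * pct / 100       # offset 1 from the def line
    return round(price - discount, 2)  # offset 2: snapshot taken here


snapshots: list = []
sys.settrace(make_snapshot_tracer("apply_discount", 2, snapshots))
try:
    result = apply_discount(100.0, 15.0)
finally:
    sys.settrace(None)  # always remove the hook

print(result)                   # 85.0
print(snapshots[0]["locals"])   # {'price': 100.0, 'pct': 15.0, 'discount': 15.0}
```

The function returns its normal result while the snapshot is collected on the side. That is the property that separates a non-breaking breakpoint from a conventional debugger breakpoint, which would halt the process.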
His advice to retailers going into peak season and looking to freeze their code? “What’s key is practice. You need to identify the areas that you expect you’ll have the most incidents with, the areas that you’ll require interaction with to continue your work in development, and the areas that you think will require additional maintenance work. Practice collecting the data you want, practice channeling it to the different systems that you want to use for data aggregation and log aggregation, their APM solutions, their databases. Once they’re accustomed to knowing where to search for the answers they want, getting the answers they want and having those in front of them becomes very natural. That’s the approach you want to have for the user experience.”