We’ve written a lot lately about progressive delivery and how it can help organizations deploy changes more quickly while getting feedback on them before releasing them widely.
Progressive delivery uses experimentation techniques such as feature flags, blue-green deployments and canary releases to expose new features or bug fixes to a small cohort of users, then uses the feedback from those experiments to decide whether to roll a change out widely or roll it back for more work. These experiments enable organizations to decouple deployment from release.
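To make that decoupling concrete, here is a minimal sketch of the mechanism behind a percentage-based feature flag. The helper names and hashing scheme are illustrative, not any particular vendor’s implementation; the point is that the new code path ships dark, and releasing becomes a config change rather than a redeployment.

```typescript
// Minimal sketch of a percentage-based feature flag. The flag names
// and rollout math here are hypothetical, not a specific platform.
import { createHash } from "crypto";

// Deterministically bucket a user into [0, 100) so the same user
// always lands in the same cohort across requests.
function bucket(userId: string, flagName: string): number {
  const hash = createHash("sha256").update(`${flagName}:${userId}`).digest();
  return hash.readUInt32BE(0) % 100;
}

// The new code path ships dark; "release" is just raising
// rolloutPercent in config, decoupled from deployment.
function isEnabled(userId: string, flagName: string, rolloutPercent: number): boolean {
  return bucket(userId, flagName) < rolloutPercent;
}

if (isEnabled("user-123", "new-search-ranking", 5)) {
  // Canary cohort: 5% of users see the new feature.
} else {
  // Everyone else stays on the existing behavior.
}
```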
In a recent conversation I had with Dave Karow, evangelist at feature flag platform provider Split Software, he discussed something he called layered progressive delivery.
This approach, he explained, begins with building consensus among developers and SREs. “There’s nobody that’s not going to want better cycle time, shorter cycles. There’s nobody that’s not going to want automating the ability to detect when things go awry that you didn’t expect,” he said. “There’s probably — hopefully — not too many people that aren’t going to want to know whether the thing they just did had an effect.”
He went on to say that this new approach to progressive delivery builds layer upon layer of richness to get more out of the experiments, and he strongly debunked the notions that experimentation must be hardcore rigorous and that it requires building two versions of the code.
Savvy experimenters, Karow said, use dynamic config, which allows development teams to send data along with a flag to set different parameters for different users. He said the parameters of a recommendation engine, for example, “could dictate, do I want to give David a lot of answers, or just a handful of answers? And if you’re deciding whether you’re going to expose people to this new thing, you could also create two or three cohorts that each have different parameters. Now you’ve got people on your legacy engine, and you’ve got two or three cohorts in the new one, and you’re trying different things — like lots of answers, not very many answers, ranked by popularity versus ranked by relevance.” The key point he made is that you can change the values in the flags, and what those parameters are, without having to create new versions of the code.
“So now David is in cohort three that gets this, but we’ve just changed that he’s going to see results ranked by popularity instead of ranked by relevance in the engine. And we’re going to run that for a week and see what happens. That’s not three copies of code.”
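Karow didn’t share code, but a rough TypeScript sketch of that idea might look like this, with illustrative cohort names and parameter values standing in for what a flag platform would serve alongside each treatment:

```typescript
// Dynamic config sketch: one code path, with per-cohort parameters
// delivered by the flag. Cohort names and values are illustrative.
type RecConfig = { maxResults: number; rankBy: "popularity" | "relevance" };

// In practice these parameters live in the flag platform and can be
// changed without shipping new code; they're inlined here for clarity.
const cohortConfigs: Record<string, RecConfig | null> = {
  legacy:  null,                                    // old engine, no params
  cohort1: { maxResults: 25, rankBy: "relevance" }, // lots of answers
  cohort2: { maxResults: 5,  rankBy: "relevance" }, // just a handful
  cohort3: { maxResults: 25, rankBy: "popularity" },
};

// Stand-in engines so the sketch is self-contained.
const legacyEngine = (userId: string): string[] =>
  [`legacy results for ${userId}`];
const newEngine = (userId: string, cfg: RecConfig): string[] =>
  [`${cfg.maxResults} results for ${userId}, ranked by ${cfg.rankBy}`];

function recommendationsFor(userId: string, cohort: string): string[] {
  const cfg = cohortConfigs[cohort];
  // Moving cohort3 from relevance to popularity ranking is a flag
  // edit, not three copies of the code.
  return cfg ? newEngine(userId, cfg) : legacyEngine(userId);
}
```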
When Karow talks about a layered approach, he is describing a way to implement progressive delivery in progressively more value-rich ways, starting with the layer that is least threatening and least likely to be a point of debate with developers.
A hidden benefit of using a feature flag platform to deliver the variations is that the platform also captures telemetry from each of those cohorts separately and processes the data separately, making it quick to compare how each cohort behaves.
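Under the hood, that per-cohort comparison can be as simple as tagging every event with the user’s cohort and grouping on it. The event names and shapes below are assumptions for illustration, not a specific platform’s API:

```typescript
// Sketch: every event carries the user's cohort, so comparing cohorts
// is a group-by over a single telemetry stream.
type CohortEvent = { cohort: string; name: string; value: number };

const events: CohortEvent[] = [];

function track(cohort: string, name: string, value: number): void {
  events.push({ cohort, name, value });
}

// Mean of a metric (e.g. "recommendation.clicks") per cohort.
function meanByCohort(name: string): Map<string, number> {
  const agg = new Map<string, { total: number; n: number }>();
  for (const e of events) {
    if (e.name !== name) continue;
    const a = agg.get(e.cohort) ?? { total: 0, n: 0 };
    a.total += e.value;
    a.n += 1;
    agg.set(e.cohort, a);
  }
  return new Map([...agg].map(([cohort, a]) => [cohort, a.total / a.n]));
}
```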
Karow gave an example from LinkedIn, which he said has been doing experimentation for a long time. A team there ran an experiment to see which version of an application would lead people to post more job listings. The developers weren’t monitoring the application for speed, but the platform alerted them that their changes had made it slower. Automated guardrails, such as always monitoring for speed, can surface problems you weren’t looking for. “When the thing that’s rolling it out also is the thing that’s keeping track of how it’s going, it becomes really easy to know what’s happening,” he said.
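An always-on guardrail in that spirit might look something like the sketch below. The p95 statistic and the 10% regression threshold are illustrative choices on my part, not anything Karow prescribed:

```typescript
// Guardrail sketch: regardless of the experiment's primary metric,
// latency is always compared between control and treatment cohorts.
type LatencySample = { cohort: "control" | "treatment"; ms: number };

// p95 of a non-empty list of latency values.
function p95(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95))];
}

// Halt the rollout if treatment p95 latency regresses more than 10%
// against control, even if the team never thought to watch speed.
function latencyGuardrail(samples: LatencySample[]): "continue" | "halt" {
  const ms = (cohort: string) =>
    samples.filter(s => s.cohort === cohort).map(s => s.ms);
  return p95(ms("treatment")) > 1.1 * p95(ms("control")) ? "halt" : "continue";
}
```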
The next layer is measuring release impact. “If you achieve shorter lead times, and you’re shipping a lot, you might be like a hamster on a wheel, like you’re in a feature factory, and it sucks,” Karow said. “It’s demotivating. But if you have direct evidence of your efforts, it leads to pride of ownership.”
The top layer is test to learn, an area Karow said can help organizations take bigger risks in a safe way. He gave the example of a food delivery service that wanted to ask customers questions about their eating and shopping habits to fine-tune its service, but didn’t want to ask so many questions that it turned users off. So, he said, the company tried a status-quo release, a modest release, and a “go for it” release, which added two or three minutes to onboarding. And right away, he said, they saw more money from every customer.
So instead of the usual pre-release hand-wringing (“Do it.” “Don’t do it.” “We’ll lose everything.” “We’ll miss our quarter.”), they tried the changes in a safe way that gave them hard data from real customers.
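As a rough sketch of that test-to-learn setup (the variant names and question counts are my assumptions; Karow didn’t share specifics), the analysis boils down to revenue per customer, grouped by variant:

```typescript
// "Test to learn" sketch: three onboarding variants, one outcome metric.
// The variant definitions below are illustrative, not the real service's.
const variants = {
  "status-quo": { extraQuestions: 0 },
  "modest":     { extraQuestions: 4 },
  "go-for-it":  { extraQuestions: 12 }, // adds minutes to onboarding
} as const;

type Variant = keyof typeof variants;
type Customer = { variant: Variant; spend: number };

// Revenue per customer by variant answers the pre-release debate
// with data from real customers instead of speculation.
function revenuePerCustomer(customers: Customer[]): Map<Variant, number> {
  const agg = new Map<Variant, { total: number; n: number }>();
  for (const c of customers) {
    const a = agg.get(c.variant) ?? { total: 0, n: 0 };
    a.total += c.spend;
    a.n += 1;
    agg.set(c.variant, a);
  }
  return new Map([...agg].map(([v, a]) => [v, a.total / a.n]));
}
```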