Continuous integration isn’t only pushing developers to release more often; it’s also enabling them to use A/B testing more effectively. Companies like Omniture, SiteSpect and Webtrends have all released A/B testing software to help developers make better choices. And that’s the promise of A/B testing: no more guesswork.

What exactly is A/B testing, and how can it be used in software? Simply put, it’s the practice of serving different versions of a digital asset to set percentages of users and measuring which version performs better. In FarmVille, for example, an A/B test could be set up to see how many users purchase an in-game pig when it is offered to them, versus how many purchase an in-game cow. A/B testing software would push out either a pig or a cow to FarmVille players based on their profiles, or simply on a 50/50 chance, as determined by the developers.
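To make the 50/50 case concrete, here is a minimal sketch of how a testing tool might split players, assuming assignment by hashing each user ID (one common approach; the article doesn’t say how any particular vendor does it). The function name `assign_variant`, the experiment name and the pig/cow offer table are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, percent_a: int = 50) -> str:
    """Return "A" or "B" for this user, with roughly percent_a% of users in A."""
    # Hash the experiment name together with the user ID so the same player
    # always lands in the same group, while separate experiments split
    # players independently of one another.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # a stable number from 0 to 99
    return "A" if bucket < percent_a else "B"


# Hypothetical offer table for the pig-versus-cow test.
OFFERS = {"A": "in-game pig", "B": "in-game cow"}

for player in ["player-1", "player-2", "player-3"]:
    print(player, "is offered an", OFFERS[assign_variant(player, "animal-offer")])
```

Hashing, rather than drawing a fresh random number on every visit, keeps a returning player in the same group, which matters when the two groups’ purchases are compared later.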

Bob Garcia, optimization solution sales director for North America at Webtrends, has been working in the A/B and multivariate testing market for many years, and he said that proper use of such tests can have a dramatic impact on the bottom line.

He told the story of a recent test consulting engagement Webtrends had with a major motorcycle retailer. “We worked with them on tests for their checkout process. They started with little tests in their existing site, like changing button location, the size of the text, and adding content. They did this incrementally. Some of the tests didn’t derive statistical differences, but then they went and built an entirely separate, parallel checkout process. When they did that, they saw an improvement, which has translated into US$2.5 million in extra revenue by reducing abandonment rates. That’s a great use case where A/B testing is really most useful, in testing really big concepts.”
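Whether a test “derives statistical differences” is usually decided with a significance test. One textbook option for comparing two checkout completion (or abandonment) rates is the two-proportion z-test, sketched below. This is a generic illustration, not Webtrends’ methodology, and the counts in the usage line are hypothetical.

```python
import math


def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> tuple[float, float]:
    """Compare two conversion rates; return (z statistic, two-sided p-value)."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    # Under the null hypothesis both variants share one rate, so pool the data.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("all outcomes identical; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: completed checkouts out of sessions for each flow.
z, p = two_proportion_z_test(600, 10_000, 690, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (conventionally below 0.05) suggests the difference between the flows is unlikely to be chance; the small tweaks that showed no statistical difference would be ones whose p-value stayed above the threshold the team chose.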

That means most of the work in A/B testing lies in designing the alternative layouts and deciding what to test. Garcia said A/B testing is much more effective for large changes than for small ones like button relocation and text adjustments.

Testing a new market
Hugh Reynolds, CEO and founder of Swrve New Media, expects A/B testing to become the next market for middleware. He was a cofounder of the gaming physics middleware company Havok, and with his new company he’s offering SaaS middleware to the Facebook game crowd.

For Swrve New Media, that aforementioned pig can be just about anything. Reynolds said his team has been focusing on expanding the capabilities of Swrve New Media so that it can now handle A/B testing duties within subsets of users, rather than simply spraying tests across an entire user base.

Reynolds said Swrve New Media is the perfect tool for product managers because they can tweak application performance without introducing bugs, requiring redeploys, or even requiring help from a developer or IT. “We’re great for product managers who want to adjust things,” he said. “If the product guy comes in and says, ‘We want to run a test and see if we should give people 1,000 starting coins or 500 starting coins,’ the dev team is going to say ‘No way!’ ”

But Swrve New Media allows such substitutions to be performed quickly from a Web interface. Developers add tags into their code where A/B tests can be performed, and the Swrve New Media interface allows for a simple point-and-click substitution on the basis of user identity, percentages or other factors.
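A minimal sketch of the tag-and-rule pattern described above, assuming the rules are plain data that a web interface could edit without a redeploy. Everything here (`tagged_value`, `RULES`, the rule format, the `country` attribute) is hypothetical and does not represent Swrve’s actual SDK or API. It shows both ideas from the article: targeting a subset of users, then splitting that subset by percentage.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class User:
    user_id: str
    attributes: dict = field(default_factory=dict)


# Hypothetical rules, as a product manager might configure them in a web UI.
# Each tag maps to a list of rules checked in order; the first match wins.
RULES = {
    "starting_coins": [
        # Only users whose country is "IE" enter this test, split 50/50.
        {"segment": {"country": "IE"}, "split": {500: 50, 1000: 50}},
    ],
}


def _bucket(tag: str, user_id: str) -> int:
    """Stable bucket from 0 to 99 for this user and tag."""
    digest = hashlib.sha256(f"{tag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def tagged_value(tag: str, user: User, default):
    """Resolve a tagged value; users outside every rule get the default."""
    for rule in RULES.get(tag, []):
        segment = rule.get("segment", {})
        if all(user.attributes.get(k) == v for k, v in segment.items()):
            bucket = _bucket(tag, user.user_id)
            cumulative = 0
            for value, percent in rule["split"].items():
                cumulative += percent
                if bucket < cumulative:
                    return value
    return default


# In game code, the developer only writes the tag and a default:
player = User("player-43", {"country": "US"})
coins = tagged_value("starting_coins", player, default=500)  # 500: not in the segment
```

Because the rules are data rather than code, changing the split or the targeted segment would not touch the tagged line, which is the property Reynolds describes.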

“It’s the product managers who are our champions,” said Reynolds. “It really hangs on how a whole organization behaves. The old model was that product management was out on a peninsula, and they’d walk the drawbridge back to development and maybe be refused admission. Every few weeks, the high priest of analytics pulls the numbers out of a hat and says, ‘This proves everything!’ ”

That’s all changed thanks to continuous deployment, said Reynolds. “The business part of the organization is no longer out on that peninsula, it actually ends up right in the center. It can be in the center of creating that application. For a typical application, it’s build, deploy, and sometimes people just give up then. At that stage, it’s made no money. It’s the product management team and development team and monetization managers that do that hard bit.

“I think using A/B testing gives [product managers] a way to have their own cycle of development. Because there’s continuous deployment, product management will initially pick low-hanging fruit. This is what we pick this week, come together with development team and say, ‘What do we want to test?’ Then that gets instrumented.”

A/B’s detractors
Of course, SaaS A/B testing platforms aren’t the only way to run such tests. A small kerfuffle erupted in the blogosphere in May when a handful of developers compared the effectiveness of A/B testing with that of a multi-armed bandit algorithm.

Blogger Chris Stucchio wrote that some multi-armed bandit algorithms are less effective than straight-up A/B testing, but he argued that those algorithms are epsilon-greedy, which is not the most effective way to implement a bandit. Instead, he advocated an algorithm from the paper “Finite-time Analysis of the Multiarmed Bandit Problem.”
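Epsilon-greedy is the simplest bandit strategy: most of the time it serves the variant with the best average reward observed so far, and with a small probability epsilon it serves a random variant instead. A minimal sketch follows; the class name and the default epsilon of 0.1 are illustrative choices, not taken from Stucchio’s post or any library.

```python
import random


class EpsilonGreedy:
    def __init__(self, n_arms: int, epsilon: float = 0.1, rng=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms     # times each variant was served
        self.values = [0.0] * n_arms   # running mean reward per variant
        self.rng = rng or random.Random()

    def select_arm(self) -> int:
        if self.rng.random() < self.epsilon:
            # Explore: serve a variant chosen uniformly at random.
            return self.rng.randrange(len(self.counts))
        # Exploit: serve the variant with the best observed mean (ties go to the lowest index).
        return max(range(len(self.counts)), key=lambda i: self.values[i])

    def update(self, arm: int, reward: float) -> None:
        # Incrementally update the running mean for the served variant.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

Because exploration never stops, epsilon-greedy keeps sending a fixed share of traffic to variants it already knows are worse, which is one reason it can compare poorly with better-designed bandits.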

Stucchio showed that the Finite-time paper’s algorithm is only marginally better than a straight 50/50 A/B test. The real key to multi-armed bandit algorithms is that they take into account the results observed so far for each variant being tested, rather than simply applying a fixed percentage split to all user requests.
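The article doesn’t name the algorithm, but the best-known one in that paper (Auer, Cesa-Bianchi and Fischer, 2002) is UCB1, and this sketch assumes that is the one meant. UCB1 shows how a bandit uses observed results: it serves the variant with the highest average reward plus a confidence bonus that shrinks as that variant collects more data. The sketch assumes rewards between 0 and 1 (for example, 1 for a purchase and 0 otherwise); the class and method names are illustrative.

```python
import math


class UCB1:
    def __init__(self, n_arms: int):
        self.counts = [0] * n_arms     # times each variant was served
        self.values = [0.0] * n_arms   # running mean reward per variant

    def select_arm(self) -> int:
        # Serve every variant once before the confidence bound is defined.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        # Mean reward plus exploration bonus sqrt(2 ln n / n_j).
        scores = [v + math.sqrt(2 * math.log(total) / c)
                  for v, c in zip(self.values, self.counts)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

Unlike a fixed 50/50 split, traffic drifts toward the stronger variant as evidence accumulates, while the bonus term ensures weaker variants are still sampled occasionally.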

Still, even Stucchio advocated some form of A/B testing over none at all. Reynolds echoed that view. On the Web, said Reynolds, the tradition has been to write software, deploy it, and then guess what’s not working. With A/B testing, he said, the guesswork is removed from application design decisions. “It changes the dynamic where the product and business team end up with a concrete way to affect the bottom line that they previously didn’t have,” he said.

“It used to be that you commissioned a change, and then crossed your fingers and hoped you weren’t heading in the wrong direction. It was like a cannon: You shoot, bring it back, load again, fire it, and hope it gets there. A/B testing is more like the guided missile: You fire it in the general direction, then steer it wherever you want to go.”