For enterprise testing organizations, the world of software development is somewhat similar to the world of legalized marijuana. There’s a saying in the legal pot world: “Out of the shadows and into the light.” This is used to describe the way users of illegal weed are now able to come out of hiding and talk about their usage in public without fear of reprisal.

So, too, is it with testing in production. It’s been happening for years, decades even. And yet the idea of talking about testing in production in front of the businesspeople, the operations team, and the CIO seemed like madness.

Testing in production, the traditional wisdom went, meant that the QA team had failed in its goals before shipping. If the tests were run in production, why weren’t they run before production? Why didn’t this get done during development?

In truth, however, it’s extremely difficult to test every scenario, every idea, every possible area of code brittleness when you’re under a deadline. Even large enterprises often struggle to get all their tests done before shipping.

And yet, this is not the real thrust of the testing in production movement. Frankly, it’s more about extending the testing beyond development, rather than about grabbing more time for tests that weren’t already run.

Testing in production is more about the current landscape of software development than it is about covering one’s rear, or about hiding things from those higher up. It is a natural consequence of the rise of services, APIs and never-ending development efforts.

In a world of microservices, cloud-based applications and the Internet of Things, testing in production isn’t just a good idea—it’s the only way you can get things done. Testing environments will drift from the production environment, users will encounter unforeseen behaviors, and clouds that are not under your direct control will experience outages and service degradation.

Without tests running in production, these types of problems will go undetected. Even worse, no critical information will flow back to the development team, which breaks the all-important feedback loop between QA and development.

John Jeremiah, technology evangelist and lead of the HPE Digital Research Team, said that he’s relieved testing in production is finally being talked about in the open. “I don’t think it’s a new concept by any stretch of the imagination. Calling it ‘testing in production’ is just being more honest about what happens in reality. My background is as an application developer and project leader. Almost every time I’ve gone live with a system, I’ve never been able to complete all the testing we wanted to do. We reached a milestone where we had to ship, and we were at a level of acceptable risk, so we went live,” he said.

“We may not have said we were testing in production because that was unacceptable, but the reality is we were paying very close attention to the system where we thought we had risk,” said Jeremiah.

He added that testing in production often brings performance and load testing into the forefront. “Our load and performance testing are the foundations of our approach to how we help people with DevOps. What we refer to as continuous assessment is about understanding how an application is delivering business value to users,” he said.

“It’s about having insight into what’s happening to users, and giving that info back to the product team so they can respond. This is the fundamental principle in DevOps. It’s about fast, high-fidelity feedback to the development team so they can iterate and react.

“This is not new to what we do,” said Jeremiah. “The challenge people struggle with is the idea that you’re going to allow something that is other than perfect to reach the end users. But it’s about being honest. The other thing that’s happened is we started to realize there’s an amazing amount of things we can learn by listening to users. Web teams have done this for years with A/B testing. We are all unwitting participants in an experiment every time we log into Facebook.”

What is ‘testing in production’?
What, then, does testing in production really mean? Does it entail any significant divergence from traditional testing of the functional, acceptance, smoke, load and performance requirements? Or does this new world of microservices and APIs require that everything is tested in both development and production? Does testing ever stop? It’s a confusing world out here in the sunlight.

Tom Lounibos, CEO of SOASTA, discovered testing in production years ago, almost by accident.

“We were running a test for TurboTax eight or nine years ago. When we run these tests, it doesn’t matter where the test is running. They basically gave us the target to test and we were testing it, ramping up to 200,000 virtual concurrent users on this application. Then we hear the voice of God over the phone in this conference room say, ‘Are you running a performance test on the production site?’ We had five guys around the table and we all backed away from the table and said, ‘Oh my God! We’re testing a production server,’” he recalled.

“We realized that testing in a lab behind a firewall is kind of ridiculous in the web era. You can’t replicate the Internet behind the firewall. Add in dependencies on third-party services and it’s impossible. The only real true way of testing today is in production. Seventy percent of our load testing is done in production right now.”

Testing. Testing everywhere
Things have definitely changed in the world of testing. Antony Edwards, CTO of TestPlant, said, “Three years ago no one ran test scripts against production systems, but now we have many customers ‘testing in production.’ I remember the first time someone came to us with a ‘testing in production’ requirement (though they didn’t call it that). It started as a really awkward call with us not understanding each other because operations guys and test guys use different terminology. Finally I asked them to draw a diagram for us on the whiteboard and suddenly it all became clear. We’re much better talking to operations teams now.”

Those initial communications difficulties are not quite gone, either. “But ‘testing in production’ means different things to different people,” Edwards said. “For some people it means running your test scripts continually against the live system as a more sophisticated form of monitoring; but these people are still testing pre-production as well. For other people it means only testing in production, i.e. not really testing, just deploying changes straight after coding (maybe to only a small percentage of users) and seeing if they complain. It’s interesting that the same term is being used to describe what I’d consider to be very mature, and very immature, testing.”

Tim Pettersen, senior developer at Atlassian, said that communication issues can be alleviated through the use of proper life-cycle tools. “Testing and version control are tightly intertwined. The best practice is to build and test your code as soon as it’s pushed by a developer,” he said.

Ian Buchanan, developer advocate at Atlassian, said, “This is a place where ecosystem stuff helps. Directly, neither Bamboo nor Bitbucket Pipelines are themselves test tools: You run tests from them. You can kick off the builds in a test grid with something like Sauce OnDemand for Bamboo, then outsource to a test service that can test across browsers or across multiple mobile devices.”

Adrian Cockcroft, technology fellow at Battery Ventures and former cloud architect at Netflix, is partly responsible for popularizing the modern approach to application design. Under his watch at Netflix, the company deployed Chaos Monkey, a tool that randomly destroys online servers, ensuring systems are resilient enough to deal with such a scenario.

“I think what’s really happening most recently is the growth of microservices as an architectural pattern. All the monitoring vendors are now booth display-compliant: They all have microservices support on their client. Microservices break the app into small pieces. You need to have a map of those pieces. You need to do end-to-end tracing, like the OpenTracing Project,” said Cockcroft.

He added, “On top of that, the pieces are changing continuously. You don’t have one version of the system; it’s changing daily, or many times a day in the extreme cases. When you’re trying to figure out what broke in the system, the important part is to have a fine line to the code to figure out what broke.”

Risky business
Bryce Day, CEO of Catch Software, said that getting to testing in production requires an approach based on risk assessment. “Testing always gets squeezed. It’s always that ambulance at the bottom of the pile. You may start with a month to do testing, and it always gets squeezed down to two weeks. It’s hard to go to management to say ‘We’ve got 1,000 tests, we need time to do them all,’” he said.

Day said that management doesn’t understand some arbitrary number of tests. Rather, it understands risk. “To change the conversation, say ‘What’s the risk profile you’re happy to have?’ If it’s a prototype, maybe I’m happy with 50% risk. Then you can put quality assurance guidelines across that project. For a hugely risky project, you might have only a 5% tolerance. In agile, knowing where to allocate resources early is a key component,” he said.

Risk can be effectively taken into account if it is considered early. “The other side is people are assigning risk to the requirements and stories, and they’re putting little to no emphasis on the frequency of the processes going through,” said Day. “So risk is the impact times the probability of occurrence. If you look at HP, you have risk assigned to the impact side. If you’ve done all your test cases, your risk is mitigated. We take a more holistic view. For a rare occurrence test case, why would we test that earlier than a frequent test case for a medium risk?”
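
To make Day’s arithmetic concrete, here is a minimal Python sketch that scores each test scenario as impact multiplied by probability of occurrence and runs the riskiest first. The `Scenario` class, the 0-to-1 scales and the sample scenarios are illustrative assumptions, not anything Catch Software described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """A hypothetical test scenario with hand-assigned risk inputs."""
    name: str
    impact: float       # assumed scale: 0.0 (harmless) to 1.0 (severe)
    probability: float  # assumed scale: 0.0 (never occurs) to 1.0 (always occurs)

    @property
    def risk(self) -> float:
        # Risk as stated in the article: impact times probability of occurrence.
        return self.impact * self.probability


def prioritize(scenarios: list[Scenario]) -> list[Scenario]:
    """Return scenarios ordered from highest to lowest risk."""
    return sorted(scenarios, key=lambda s: s.risk, reverse=True)


# Made-up example data: a severe but rare case, a moderate but frequent case,
# and a minor case.
scenarios = [
    Scenario("year-end archive export", impact=0.9, probability=0.05),
    Scenario("checkout with saved card", impact=0.5, probability=0.8),
    Scenario("change avatar", impact=0.1, probability=0.3),
]

ordered = prioritize(scenarios)
for scenario in ordered:
    print(f"{scenario.name}: risk={scenario.risk:.3f}")
```

With these invented numbers, the frequent medium-impact checkout flow outranks the rare high-impact export, which is the ordering Day argues for.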

Wayne Ariola, chief strategy officer of Parasoft, also feels that a risk-based approach to testing can help with testing in production. “Due to agile and more iterative development styles, people have focused from a bottom-up perspective: ‘Are we closing out our user stories associated with testing?’ What they’re asking is, ‘Are you done with a task?’ We have to ask a different question: ‘Is the risk associated with the release candidate acceptable?’” he said.

“It’s a massive transformation. They have to understand what this application impacts. If it’s down, if it’s breached, what is the true impact to the business? Most organizations moving toward Continuous Delivery are beginning to realize this is quite interesting. Across the board, we’re seeing the biggest complaint is that the monolithic infrastructure and overhead associated with managing IBM or HP test suites are not allowing them to achieve their agile or Continuous Delivery objectives.”

But HPE’s Jeremiah doesn’t think that adding numeric risk tracking to a testing platform is terribly helpful. He said that it’s difficult to get users to input the required information. Rather, he said, building in predictive analytics is the secret to becoming more agile.

“People have been trying to do risk-based testing for 20 years,” said Jeremiah. He’s long heard of “the idea of prioritizing requirements with the business, and the impact if it doesn’t work, and [this can be used to figure out risk].

“In my experience, it’s incredibly hard to get people to go through the process of assessing the risk and giving really good input into it. Often, you end up without a good spread. It ends up being very labor-intensive, and a lot of times it never is successful, and they struggle with doing it.”

Jeremiah advocated for an automated approach using machine learning to understand risks and issues before they happen. He said HPE is “providing algorithm insight to know what we can know about the data in the system. It’s going to help teams make better decisions. It’s trying to take the unreliable risk assessment scenario, take that out of the equation, and give people the insight they need so they can make better decisions.”

At the end of the day, bringing testing in production out of the shadows and into the light is all about establishing methods and a formal approach, instead of just better ways to hide those tests running during open hours. This is becoming par for the course as more and more applications rely on third-party APIs for important functionality. This problem becomes even more difficult when your users are adding their own content to your site at a steady pace.

Lubos Parobek, vice president of products at Sauce Labs, said that dynamic websites make for a more complicated testing sequence. “Another thing we’ve seen is some customers have very dynamic websites where users themselves might be adding content or making modifications,” he said.

“In some ways all the changes aren’t in the control of whoever is coding the website. In those instances we want to make it so no one can break it, but we want to continually run tests so when there are changes,” problems are caught early, said Parobek.

Just how does that get done? He said developers shouldn’t duplicate tests for production use. He said that he “would definitely look through your set of functional test cases and determine which of those are the critical ones: The ones you want to take time to check regularly. Next is how often do you want to check that? It could be you don’t push changes often, so you just check when you deploy. Or it could be lots of changes happen all the time. Maybe it makes sense to run those every five minutes.”
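
One way to picture that advice is a small scheduler that reruns only the critical checks, each on its own interval. The Python sketch below is a rough illustration under stated assumptions: the `ProductionCheck` class, the check names and the always-passing lambdas are hypothetical placeholders for real functional tests, and the timestamps are passed in by hand so the logic is easy to follow.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProductionCheck:
    """A hypothetical critical test picked from the functional suite."""
    name: str
    run: Callable[[], bool]           # returns True when the check passes
    interval_seconds: float           # how often the check should run
    last_run: Optional[float] = None  # timestamp of the previous run, if any


def due_checks(checks: list[ProductionCheck], now: float) -> list[ProductionCheck]:
    """Return the checks that have never run or whose interval has elapsed."""
    return [
        c for c in checks
        if c.last_run is None or now - c.last_run >= c.interval_seconds
    ]


def run_due(checks: list[ProductionCheck], now: float) -> dict[str, bool]:
    """Run every due check, record when it ran, and return pass/fail by name."""
    results = {}
    for check in due_checks(checks, now):
        results[check.name] = check.run()
        check.last_run = now
    return results


# Placeholder checks: the lambdas stand in for real test functions.
checks = [
    ProductionCheck("login flow", run=lambda: True, interval_seconds=300),     # every five minutes
    ProductionCheck("checkout flow", run=lambda: True, interval_seconds=3600), # hourly
]

first = run_due(checks, now=0.0)     # nothing has run yet, so both checks run
second = run_due(checks, now=300.0)  # only the five-minute check is due
third = run_due(checks, now=400.0)   # nothing is due
```

In practice a deployment hook or a job scheduler would call `run_due` with the current time; a team that only checks on deploy would simply trigger all critical checks from the deployment pipeline.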

IBM’s best practices
Glyn Rhodes, product manager of IBM Cloud, said that best practices for testing in production are a new way to ensure stability, but he also stated that not every company is ready for it.

“It’s important to recognize that only certain testing disciplines are generally suitable for execution in production,” he said in an e-mail to SD Times. “One would seldom execute integration testing in production, for example, as a stable foundation that faithfully represents the technology stack is a prerequisite for production testing.

“As ever, it all comes down to risk. Risk in this case can mean a few different things: An IBM customer that creates safety-critical software is very unlikely to risk it; there are laws and regulations to be mindful of. The approach is also best served by a production environment that can be controlled, segregated and manipulated at the touch of a button. Cloud and automated deployment play a key part in helping testing to ‘shift right.’ Traditional production environments significantly increase the risk associated with testing in production.

“For example, performance testing is a suitable discipline—assuming the right risk profile. In the recent past, this meant baselining, then testing outside of business hours and subsequently manually resetting your environments after every run. Painful stuff. Cloud management and automated deployment tools will help speed up this process, but they don’t fundamentally change the risk profile.”

Rhodes also pointed to another way of testing in production. “A more creative approach comes in the form of the Dark Launch. Using this method, an IBM customer launches new functionality across a subset of customers/infrastructure,” he said. “This enables them to not only control the risk profile of testing in production, but it extends the types of testing that may be executed. A/B testing provides feedback on preferred functionality, and one can even expect a degree of acceptance testing to be executed in these circumstances. Once confidence has been established, the changes are rolled out further.”
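
Rhodes did not say how the subset of customers is chosen. A common technique, shown here as a minimal Python sketch and not as IBM’s method, is to hash each user ID into a stable bucket and expose the feature to the buckets below a rollout percentage. The function name, the feature name and the user IDs are all hypothetical.

```python
import hashlib


def in_dark_launch(user_id: str, feature: str, rollout_percent: float) -> bool:
    """Decide whether a user sees a dark-launched feature.

    The decision is deterministic: the same user and feature always land in
    the same bucket, so a user who is included at 5% stays included when the
    rollout widens to 20%.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return bucket < rollout_percent * 100                # 5% covers buckets 0..499


# Hypothetical user population and feature name.
users = [f"user-{i}" for i in range(10_000)]
exposed_5 = {u for u in users if in_dark_launch(u, "new-checkout", 5)}
exposed_20 = {u for u in users if in_dark_launch(u, "new-checkout", 20)}

print(f"{len(exposed_5)} users at 5%, {len(exposed_20)} users at 20%")
```

Including the feature name in the hash keeps different features from always landing on the same group of users. Raising `rollout_percent` step by step corresponds to the wider rollout Rhodes describes once confidence has been established.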

So while many organizations may already be testing in production, the movement currently gaining ground in the market isn’t about just blindly doing so. Rather, it’s about adding testing in production to the standard toolbox and treating it like a professional job, instead of a sneaky thing you’re only able to do when the office is empty.

Rules of thumb for testing in production
According to TestPlant’s Edwards, “The great thing about testing in production is that it’s easy to add and it’s hard to go too wrong. Key things to keep in mind:

• Consider the load you are putting on your servers. If your servers can handle 1,000 concurrent users and run close to capacity at peak times, don’t run 200 tests at the same time. It may sound unlikely that you’d run 200 tests at the same time, but testing in production is often worried about compatibility (e.g. making sure your website is working on all versions of Firefox, Chrome, etc.), and so it’s actually quite easy to start hitting this number of concurrent tests.

• Consider the variants you need to test. Once people are able to easily run tests in production against different client variants (e.g. Windows 8.1 with Chrome 51.0), they can go a bit overboard. You probably don’t need to test 10 different versions of Chrome. I’d typically recommend that people look at their audience and test the environments that cover 95% of their users.

• Similarly, consider the tests you need. It would seem intuitive that more is better, but I’ve seen companies drown themselves in test results so that when a key user flow isn’t working, it gets lost in the noise.

In a microservices/cloud architecture, you need to:

• Have well-defined APIs on your services. Not just the syntax but the semantics.

• Test these APIs. A lot. Happy path, sad path, and not-sure path.

• Load test your services. Know when they’ll break, because every service will eventually break.

• Test end to end, i.e. from the client application.

I think too many people are skipping the last step these days. There seems to be an attitude that by testing each individual component you no longer need to test the system. I sometimes use the analogy of a car: It’s like testing the engine, the brakes, the wheels, and so on separately, but never actually testing if the car can drive.”
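
Edwards’ 95% rule of thumb for client variants lends itself to a short calculation. The Python sketch below picks the most-used environments until the target share of the audience is covered. It is an illustration only: the function name is hypothetical, and the environment labels and session counts are invented, standing in for whatever a team’s own analytics report.

```python
def environments_to_cover(session_counts: dict[str, int], target: float = 0.95) -> list[str]:
    """Pick the most-used environments until `target` share of sessions is covered."""
    total = sum(session_counts.values())
    if total <= 0:
        return []
    selected = []
    covered = 0.0
    # Walk environments from most to least used, stopping once the target is met.
    for env, count in sorted(session_counts.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= target:
            break
        selected.append(env)
        covered += count / total
    return selected


# Invented analytics data: sessions per client environment.
session_counts = {
    "Chrome 51 / Windows 10": 4600,
    "Safari 9 / iOS 9": 2200,
    "Firefox 47 / Windows 7": 1500,
    "Chrome 50 / Android 6": 900,
    "Chrome 51 / Windows 8.1": 400,
    "Chrome 49 / Windows 7": 250,
    "Opera 38 / Windows 10": 150,
}

selected = environments_to_cover(session_counts)
for env in selected:
    print(env)
```

With these made-up numbers, five of the seven environments are enough to reach the target, so the two least-used variants can be left out of the production test matrix.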