
For enterprise testing organizations, the world of software development is somewhat similar to the world of legalized marijuana. There’s a saying in the legal pot world: “Out of the shadows and into the light.” This is used to describe the way users of illegal weed are now able to come out of hiding and talk about their usage in public without fear of reprisal.

So, too, is it with testing in production. It’s been happening for years, decades even. And yet the idea of talking about testing in production in front of the businesspeople, the operations team, and the CIO seemed like madness.


Testing in production, the traditional wisdom went, meant that the QA team had failed in its goals before shipping. If the tests were run in production, why weren’t they run before production? Why didn’t this get done during development?

In truth, however, it’s extremely difficult to test every scenario, every idea, every possible area of code brittleness when you’re under a deadline. Even large enterprises often struggle to get all their tests done before shipping.

And yet, this is not the real thrust of the testing in production movement. Frankly, it’s more about extending the testing beyond development, rather than about grabbing more time for tests that weren’t already run.

Testing in production is more about the current landscape of software development than it is about covering one’s rear, or about hiding things from those higher up. Instead, it’s endemic to the rise of services, APIs and never-ending development efforts.

In a world of microservices, cloud-based applications and the Internet of Things, testing in production isn’t just a good idea—it’s the only way you can get things done. Testing environments will drift from the production environment, users will encounter unforeseen behaviors, and clouds that are not under your direct control will experience outages and service degradation.

Without testing in place, these types of problems will run rampant. Even worse, they won’t pass critical information back to the development team, removing the all-important feedback loop between QA and development.

John Jeremiah, technology evangelist and lead of the HPE Digital Research Team, said that he’s relieved testing in production is finally being talked about in the open. “I don’t think it’s a new concept by any stretch of the imagination. Calling it ‘testing in production’ is just being more honest about what happens in reality. My background is as an application developer and project leader. Almost every time I’ve gone live with a system, I’ve never been able to complete all the testing we wanted to do. We reached a milestone where we had to ship, and we were at a level of acceptable risk, so we went live,” he said.

“We may not have said we were testing in production because that was unacceptable, but the reality is we were paying very close attention to the system where we thought we had risk,” said Jeremiah.

He added that testing in production often brings performance and load testing into the forefront. “Our load and performance testing are the foundations of our approach to how we help people with DevOps. What we refer to as continuous assessment is about understanding how an application is delivering business value to users,” he said.

“It’s about having insight into what’s happening to users, and giving that info back to the product team so they can respond. This is the fundamental principle in DevOps. It’s about fast, high-fidelity feedback to the development team so they can iterate and react.

“This is not new to what we do,” said Jeremiah. “The challenge people struggle with is the idea that you’re going to allow something that is other than perfect to reach the end users. But it’s about being honest. The other thing that’s happened is we started to realize there’s an amazing amount of things we can learn by listening to users. Web teams have done this for years with A/B testing. We are all unwitting participants in an experiment every time we log into Facebook.”

What is ‘testing in production’?
What, then, does testing in production really mean? Does it entail any significant divergence from traditional testing of the functional, acceptance, smoke, load and performance requirements? Or does this new world of microservices and APIs require that everything is tested in both development and production? Does testing ever stop? It’s a confusing world out here in the sunlight.

Tom Lounibos, CEO of SOASTA, discovered testing in production years ago, almost by accident.

“We were running a test for TurboTax eight or nine years ago. When we run these tests, it doesn’t matter where the test is running. They basically gave us the target to test and we were testing it, ramping up to 200,000 virtual concurrent users on this application. Then we hear the voice of God over the phone in this conference room say, ‘Are you running a performance test on the production site?’ We had five guys around the table and we all backed away from the table and said, ‘Oh my God! We’re testing a production server,’” he recalled.

“We realized that testing in a lab behind a firewall is kind of ridiculous in the web era. You can’t replicate the Internet behind the firewall. Add in dependencies on third-party services and it’s impossible. The only real true way of testing today is in production. Seventy percent of our load testing is done in production right now.”

Testing. Testing everywhere
Things have definitely changed in the world of testing. Antony Edwards, CTO of TestPlant, said, “Three years ago no one ran test scripts against production systems, but now we have many customers ‘testing in production.’ I remember the first time someone came to us with a ‘testing in production’ requirement (though they didn’t call it that). It started as a really awkward call with us not understanding each other because operations guys and test guys use different terminology. Finally I asked them to draw a diagram for us on the whiteboard and suddenly it all became clear. We’re much better talking to operations teams now.”

Those initial communications difficulties are not quite gone, either. “But ‘testing in production’ means different things to different people,” Edwards said. “For some people it means running your test scripts continually against the live system as a more sophisticated form of monitoring; but these people are still testing pre-production as well. For other people it means only testing in production, i.e. not really testing, just deploying changes straight after coding (maybe to only a small percentage of users) and seeing if they complain. It’s interesting that the same term is being used to describe what I’d consider to be very mature, and very immature, testing.”
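The first flavor Edwards describes, rerunning functional tests continually against the live system as a richer form of monitoring, can be sketched in a few lines. This is an illustrative example, not any vendor’s implementation; the SLA value and verdict names are invented for the sketch.

```python
# Minimal sketch of reusing a functional test as a production monitor.
# The latency budget and verdict labels are illustrative assumptions.

SLA_MS = 800  # assumed latency budget for this user flow

def classify_run(status_code, elapsed_ms, sla_ms=SLA_MS):
    """Turn one synthetic-test run against production into a monitoring verdict."""
    if status_code != 200:
        return "fail"       # functional breakage: alert immediately
    if elapsed_ms > sla_ms:
        return "degraded"   # correct but slow: track as a performance signal
    return "pass"

# A scheduler (cron, a CI job, etc.) would drive the real user flow and
# feed the results in; here we simulate three runs.
runs = [(200, 120), (200, 1500), (503, 90)]
verdicts = [classify_run(status, ms) for status, ms in runs]
print(verdicts)  # ['pass', 'degraded', 'fail']
```

The point of the pattern is that the same script already used pre-production becomes a live health signal, rather than a separate monitoring artifact.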

Tim Pettersen, senior developer at Atlassian, said that communication issues can be alleviated through the use of proper life-cycle tools. “Testing and version control are tightly intertwined. The best practice is to build and test your code as soon as it’s pushed by a developer,” he said.

Ian Buchanan, developer advocate at Atlassian, said, “This is a place where ecosystem stuff helps. Directly, neither Bamboo nor Bitbucket Pipelines is itself a test tool: You run tests from them. You can kick off the builds in a test grid with something like Sauce OnDemand for Bamboo, then outsource to a test service that can test across browsers or across multiple mobile devices.”

Adrian Cockcroft, technology fellow at Battery Ventures and former cloud architect at Netflix, is partly responsible for popularizing the modern approach to application design. Under his watch at Netflix, the company deployed Chaos Monkey, a tool that randomly destroys online servers, ensuring systems are resilient enough to deal with such a scenario.

“I think what’s really happening most recently is the growth of microservices as an architectural pattern. All the monitoring vendors are now booth display-compliant: They all have microservices support on their client. Microservices break the app into small pieces. You need to have a map of those pieces. You need to do end-to-end tracing, like the OpenTracing Project,” said Cockcroft.

He went on to state that “On top of that, the pieces are changing continuously. You don’t have one version of the system; it’s changing daily, or many times a day in the extreme cases. When you’re trying to figure out what broke in the system, the important part is to have a fine line to the code to figure out what broke.”

Risky business
Bryce Day, CEO of Catch Software, said that getting to testing in production requires an approach based on risk assessment. “Testing always gets squeezed. It’s always that ambulance at the bottom of the cliff. You may start with a month to do testing, and it always gets squeezed down to two weeks. It’s hard to go to management to say ‘We’ve got 1,000 tests, we need time to do them all,’” he said.

Day said that management doesn’t understand some arbitrary number of tests. Rather, it understands risk. “To change the conversation, say ‘What’s the risk profile you’re happy to have?’ If it’s a prototype, maybe I’m happy with 50% risk. Then you can put quality assurance guidelines across that project. For a hugely risky project, you might have only a 5% tolerance. In agile, knowing where to allocate resources early is a key component,” he said.

Risk can be effectively taken into account if it is considered early. “The other side is people are assigning risk to the requirements and stories, and they’re putting little to no emphasis on the frequency of the processes going through,” said Day. “So risk is the impact times the probability of occurrence. If you look at HP, you have risk assigned to the impact side. If you’ve done all your test cases, your risk is mitigated. We take a more holistic view. For a rare occurrence test case, why would we test that earlier than a frequent test case for a medium risk?”
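Day’s argument can be made concrete with a toy calculation: classic risk scoring is impact times probability, but weighting by how often a flow is actually exercised in production reorders the test backlog. The test cases and numbers below are invented purely for illustration.

```python
# Toy illustration: frequency-weighted risk reorders test priorities.
# Case data (names, scores, rates) is invented for the example.

cases = [
    # (name, impact 1-5, probability of failure 0-1, executions per day)
    ("year-end tax rollover", 5, 0.2, 0.01),
    ("checkout payment",      3, 0.2, 5000),
    ("profile photo upload",  2, 0.4, 300),
]

def classic_risk(impact, prob, _freq):
    """Traditional score: impact times probability of occurrence."""
    return impact * prob

def frequency_weighted_risk(impact, prob, freq):
    """Day's holistic view: also weight by how often the flow runs."""
    return impact * prob * freq

by_classic = sorted(cases, key=lambda c: classic_risk(*c[1:]), reverse=True)
by_freq = sorted(cases, key=lambda c: frequency_weighted_risk(*c[1:]), reverse=True)

print([c[0] for c in by_classic][0])  # 'year-end tax rollover'
print([c[0] for c in by_freq][0])     # 'checkout payment'
```

Under the classic score the rare but severe tax rollover tops the list; weighting by frequency moves the constantly exercised checkout flow to the front, which is exactly the reordering Day describes.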

Wayne Ariola, chief strategy officer of Parasoft, also feels that a risk-based approach to testing can help with testing in production. “Due to agile and more iterative development styles, people have focused from a bottom-up perspective: ‘Are we closing out our user stories associated with testing?’ What they’re asking is, ‘Are you done with a task?’ We have to ask a different question: ‘Is the risk associated with the release candidate acceptable?’” he said.

“It’s a massive transformation. They have to understand what this application impacts. If it’s down, if it’s breached, what is the true impact to the business? Most organizations moving toward Continuous Delivery are beginning to realize this is quite interesting. Across the board, we’re seeing the biggest complaint is that the monolithic infrastructure and overhead associated with managing IBM or HP test suites are not allowing them to achieve their agile or Continuous Delivery objectives.”

But HPE’s Jeremiah doesn’t think that adding numeric risk tracking to a testing platform is terribly helpful. He said that it’s difficult to get users to input the required information. Rather, he said, building in predictive analytics is the secret to becoming more agile.

“People have been trying to do risk-based testing for 20 years,” said Jeremiah. He’s long heard of “the idea of prioritizing requirements with the business, and the impact if it doesn’t work, and [this can be used to figure out risk].

“In my experience, it’s incredibly hard to get people to go through the process of assessing the risk and giving really good input into it. Often, you end up without a good spread. It ends up being very labor-intensive, and a lot of times it never is successful, and they struggle with doing it.”

Jeremiah advocated for an automated approach using machine learning to understand risks and issues before they happen. He said HPE is “providing algorithm insight to know what we can know about the data in the system. It’s going to help teams make better decisions. It’s trying to take the unreliable risk assessment scenario, take that out of the equation, and give people the insight they need so they can make better decisions.”

At the end of the day, bringing testing in production out of the shadows and into the light is all about establishing methods and a formal approach, rather than just finding better ways to hide the tests running during business hours. This is becoming par for the course as more and more applications rely on third-party APIs for important functionality. The problem becomes even more difficult when your users are adding their own content to your site at a steady pace.

Lubos Parobek, vice president of products at Sauce Labs, said that dynamic websites make for a more complicated testing sequence. “Another thing we’ve seen is some customers have very dynamic websites where users themselves might be adding content or making modifications,” he said.

“In some ways all the changes aren’t in the control of whoever is coding the website. In those instances we want to make it so no one can break it, but we want to continually run tests so when there are changes,” problems are caught early, said Parobek.

Just how does that get done? He said developers shouldn’t duplicate tests for production use. He said that he “would definitely look through your set of functional test cases and determine which of those are the critical ones: The ones you want to take time to check regularly. Next is how often do you want to check that? It could be you don’t push changes often, so you just check when you deploy. Or it could be lots of changes happen all the time. Maybe it makes sense to run those every five minutes.”
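Parobek’s scheduling advice reduces to a simple decision rule: how often to re-run the critical checks depends on how often the site actually changes. The thresholds below are invented defaults for illustration, not Sauce Labs guidance.

```python
# Hedged sketch of interval selection for critical production checks.
# Thresholds are illustrative assumptions, not from any product.

def check_interval_minutes(deploys_per_day, user_content_changes=False):
    """Return how often (in minutes) to re-run critical production tests.

    None means 'only run the checks when a deploy happens'.
    """
    if user_content_changes or deploys_per_day >= 10:
        return 5            # content shifts constantly: near-continuous checks
    if deploys_per_day >= 1:
        return 60           # hourly sweep between daily-ish deploys
    return None             # quiet site: test only on deploy

print(check_interval_minutes(0))        # None -> run on deploy only
print(check_interval_minutes(2))        # 60
print(check_interval_minutes(0, True))  # 5
```

The `user_content_changes` flag captures Parobek’s dynamic-website case: even with no deploys, the site is changing under you, so the checks need to run on a clock.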

IBM’s best practices
Glyn Rhodes, product manager of IBM Cloud, said that best practices for testing in production are a new way to ensure stability, but he also stated that not every company is ready for it.

“It’s important to recognize that only certain testing disciplines are generally suitable for execution in production,” he said in an e-mail to SD Times. “One would seldom execute integration testing in production, for example, as a stable foundation that faithfully represents the technology stack is a prerequisite for production testing.

“As ever, it all comes down to risk. Risk in this case can mean a few different things: An IBM customer that creates safety-critical software is very unlikely to risk it; there are laws and regulations to be mindful of. The approach is also best served by a production environment that can be controlled, segregated and manipulated at the touch of a button. Cloud and automated deployment play a key part in helping testing to ‘shift right.’ Traditional production environments significantly increase the risk associated with testing in production.

“For example, performance testing is a suitable discipline—assuming the right risk profile. In the recent past, this meant baselining, then testing outside of business hours and subsequently manually resetting your environments after every run. Painful stuff. Cloud management and automated deployment tools will help speed up this process, but they don’t fundamentally change the risk profile.”

Rhodes has a more creative approach to testing in production, however. “A more creative approach comes in the form of the Dark Launch. Using this method, an IBM customer launches new functionality across a subset of customers/infrastructure,” he said. “This enables them to not only control the risk profile of testing in production, but it extends the types of testing that may be executed. A/B testing provides feedback on preferred functionality, and one can even expect a degree of acceptance testing to be executed in these circumstances. Once confidence has been established, the changes are rolled out further.”
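The dark-launch gate Rhodes describes is commonly built on a stable hash of the user ID, so a fixed percentage of users deterministically sees the new code path and the cohort stays constant as confidence grows. The sketch below is one common way to implement such a gate; the flag name and percentages are made-up examples.

```python
import hashlib

# Illustrative dark-launch gate: expose new functionality to a fixed
# percentage of users, deterministically. Flag name is a made-up example.

def in_rollout(user_id: str, flag: str, percent: int) -> bool:
    """Hash user+flag into a stable bucket in [0, 100); enable if below percent."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent

# Start with 5% of users, watch monitoring, then raise the percentage.
enabled = [u for u in (f"user{i}" for i in range(1000))
           if in_rollout(u, "new-checkout", 5)]
print(len(enabled))  # roughly 50 of 1000, identical across runs
```

Because the bucket depends only on the user and the flag, raising `percent` from 5 to 20 keeps the original 5% enabled and adds new users on top, which is what makes gradual rollout and rollback safe.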

So while many organizations may already be testing in production, the movement currently gaining ground in the market isn’t about just blindly doing so. Rather, it’s about adding testing in production to the standard toolbox and treating it like a professional job, instead of a sneaky thing you’re only able to do when the office is empty.

Rules of thumb for testing in production
According to TestPlant’s Edwards, “The great thing about testing in production is that it’s easy to add and it’s hard to go too wrong. Key things to keep in mind:

• Consider the load you are putting on your servers. If your servers can handle 1,000 concurrent users and run close to capacity at peak times, don’t run 200 tests at the same time. It may sound unlikely that you’d run 200 tests at the same time, but testing in production is often worried about compatibility (e.g. making sure your website is working on all versions of Firefox, Chrome, etc.), and so it’s actually quite easy to start hitting this number of concurrent tests.

• Consider the variants you need to test. Once people are able to easily run tests in production against different client variants (e.g. Windows 8.1 with Chrome 51.0), they can go a bit overboard. You probably don’t need to test 10 different versions of Chrome. I’d typically recommend that people look at their audience and test the environments that cover 95% of their users.

• Similarly, consider the tests you need. It would seem intuitive that more is better, but I’ve seen companies drown themselves in test results so that when a key user flow isn’t working, it gets lost in the noise.

In a microservices/cloud architecture, you need to:

• Have well-defined APIs on your services. Not just the syntax but the semantics.

• Test these APIs. A lot. Happy path, sad path, and not-sure path.

• Load test your services. Know when they’ll break, because every service will eventually break.

• Test end to end, i.e. from the client application.

I think too many people are skipping the last step these days. There seems to be an attitude that by testing each individual component you no longer need to test the system. I sometimes use the analogy of a car: It’s like testing the engine, the brakes, the wheels, and so on separately, but never actually testing if the car can drive.”

About Alex Handy

Alex Handy is the Senior Editor of Software Development Times.