Signs of a “Testpocalypse” loom on the horizon, Alberto Savoia (costumed as the Grim Reaper) told the audience. It was the keynote at the Google Test Automation Conference last October, where the theme was “Cloudy with a Chance of Tests.”
“Hiring and recruiting for testers is waaaay down. Testers are being commoditized, and there’s an exodus of test leadership,” he said. Finally, “More and more companies are shifting to ‘FrAgile’ post-agile ‘testing.’ ”
Indeed, outsourcing quality assurance is now the norm for 70% of respondents to the June 2011 World Quality Report conducted by Capgemini, HP and Sogeti. But cloud adoption is causing new demand for testing and QA, according to the survey.
“The day of the performance tester is going to come back,” said Kelly Emo, director of application product marketing for HP Software. “When you think about what cloud brings to the equation of testing, there are a couple of levels. For teams developing apps where part or all of it will be running in the cloud, that’s a whole new level of concerns across all three aspects of testing: functional, performance and security. The other level is, how can you harness the cloud to make you more effective as a tester?”
Savoia isn’t really a pessimist about testing, but he does see it as a rapidly changing field. Even the name “tester” is suspect, he said, and should be replaced by something new to reinvigorate awareness of the value of people who break software for a living.
According to Emo, “I think that what’s happening is that people are categorizing too much. More testers are becoming developers, and developers are becoming testers. Testers today have to know the fundamental architecture, or whether there are pieces running on iOS or Android hitting their app in the cloud. And to be more responsive to business, developers need to focus on regression testing, unit testing and basic functional testing so that testers can focus on high-value fringe cases, integration or end-to-end, and exploratory testing.”
The cloud over testing
In keeping with its stepchild relationship to application development, testing was not on thought-leader lips when cloud hype thundered in a few years ago. Yet it’s now clear that testing may in fact be one of the best cases for cloud adoption, exceeding that for deployment.
“This is an interesting time to be in cloud computing,” said Brian White, VP of products for Skytap, a cloud automation startup founded more than four years ago in Seattle.
“Because of the concern around security of data, enterprises are looking at cloud in the dev and test area, not for touching customer-sensitive data that’s in their production apps.”
According to Brett Goodwin, Skytap’s VP of marketing and business development, the No. 1 use case for Skytap’s 150 customers is development and testing. “After that, the customer can deploy their tested apps in their own data center or on to some other cloud. But the more common scenario is they move those final apps into their own environments.”
Vendors such as Keynote Systems, LoadStorm, Parasoft, Skytap, SOASTA and more are the first to see where companies are placing their cloud bets. Private clouds are definitely popular.
“I’d say 15% of our customers are using private or hybrid cloud for load generation, but to deploy to private cloud, it’s way less,” said Dan Bartow, vice president and chief evangelist of SOASTA.
“But this year, for the first time, SOASTA has two or three paying customers that have private clouds that they use for deployment. It’s on our road map to integrate with a private cloud API, such as Nimbula’s.”
It’s all part of the evolution of virtualization, according to Theresa Lanowitz, founder of the San Francisco consultancy Voke. She characterizes three stages of virtualization, starting with the now-mature server market, which targeted data centers and was dominated by VMware. Then there’s application and desktop virtualization, which Citrix dominates, whereby a private cloud service delivers desktops and apps to any user on any device. Here, the primary users are service centers. And at the earliest stage of adoption is life-cycle virtualization, where vendors are competing for primacy. The user base? Development, operations and testing.
Virtual lab management is the Holy Grail for application life-cycle management, Lanowitz posits. That ties in with the Skytap story. “If you’re doing a hard press on testing, it’s not feasible to give the test team multiple labs and environments. But with Skytap it’s possible to create these complex environments with multiple machines, load balancing, firewalls, etc., and simulate the production environment earlier in the development cycle,” said White.
Another challenge in the heterogeneous landscape of cloud computing is when the developer can’t reproduce the multi-tier test environment, he explained. “With Skytap, that entire environment can be quickly saved as a template,” he said.
“We found our customers like the ability to quickly create a snapshot of an entire virtual data center, including all of its machines, data, memory state, network settings and application state. You can save that off, then a developer can start up their own copy. It’s a really efficient way—if the QA team finds a bug—for the dev team to reproduce the bug in its own environment.” One such feature is the “publish” button, which saves the configuration as a URL.
With Skytap, a development manager can create a template for multiple standardized development and test environments in the Skytap Cloud Library, from which developers and testers can quickly serve themselves new environments. Destructive or security tests are no problem: The configuration is network fenced and can be redeployed from the source template to return to a clean slate. Further, the management console offers a host of administrative features, such as views of each team’s compute and storage consumption over time.
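The template-and-clone workflow White describes can be modeled in a few lines. This is an illustrative sketch, not Skytap’s actual API: the class names and fields below are invented to show the pattern of publishing a golden configuration and letting each tester deploy an isolated copy that can always be reset to a clean slate.

```python
from copy import deepcopy
from dataclasses import dataclass, field

@dataclass
class Configuration:
    """A multi-machine test environment (machines, data, network settings)."""
    name: str
    machines: dict = field(default_factory=dict)  # machine name -> state

class CloudLibrary:
    """Holds read-only templates; testers deploy their own copies."""
    def __init__(self):
        self.templates = {}

    def publish(self, config: Configuration) -> str:
        # Save the full environment state as an immutable template.
        self.templates[config.name] = deepcopy(config)
        return config.name  # stands in for the shareable URL

    def deploy(self, template_name: str) -> Configuration:
        # Each deployment is an independent, network-fenced copy,
        # so destructive tests never touch the template itself.
        return deepcopy(self.templates[template_name])

library = CloudLibrary()
golden = Configuration("3-tier-app", {"web": "clean", "db": "seeded"})
library.publish(golden)

copy1 = library.deploy("3-tier-app")
copy1.machines["db"] = "corrupted by destructive test"

# Redeploying from the source template returns a clean slate.
copy2 = library.deploy("3-tier-app")
```

The deep copies are what make the destructive test safe: corrupting one deployment leaves both the template and every other copy untouched.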
“A lot of companies worry that when they go to cloud, users will be using resources willy-nilly,” said White.
The power of parallelization
While virtual test lab management saves on procurement and provisioning time and dollars, it offers a less-noted power: parallelization. Software developers and testers are “still remarkably single-threaded in the way we operate,” wrote guest author Patrick Lightbody in “The Cloud at Your Service.”
“Whether it’s automated unit tests, which almost always run in sequence, or the agile-less inefficiencies that organizations unknowingly employ, such as forcing their test team to wait idle for the next release candidate to be pushed to the testing environment, the software industry has been surprisingly slow to use parallelization to improve its own internal processes.”
Beyond parallelizing unit tests, continuous integration is another opportunity for test improvement: Skytap’s API can operate test configurations programmatically, for example. “When developers check in code, the software is built and a set of smoke tests are run to pick up problems immediately, without human intervention,” said White.
The API lets you “create new configurations from templates, publish configurations to remote team members, add new machines to existing configurations, and kick off new builds and test scenarios based on applications running in Skytap Cloud,” according to documentation.
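A check-in hook built on such an API might look like the following sketch. The client and its action names are hypothetical, invented here only to illustrate the flow White describes, with the transport stubbed out so the sequence of calls is visible:

```python
def run_smoke_tests_on_checkin(api, template: str, build_id: str) -> bool:
    """Illustrative CI hook: provision an environment from a template,
    deploy the fresh build, run smoke tests, then tear everything down."""
    env = api("create_configuration", template=template)
    api("deploy_build", env=env, build=build_id)
    passed = api("run_smoke_tests", env=env)
    api("delete_configuration", env=env)  # leave no idle machines running
    return passed

# A stub transport records the calls a real REST client would make.
calls = []
def fake_api(action, **kwargs):
    calls.append(action)
    return "env-1" if action == "create_configuration" else True

ok = run_smoke_tests_on_checkin(fake_api, "3-tier-app", "build-42")
```

The key property is that no human intervenes between check-in and results, and the environment exists only for the duration of the test run.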
IBM’s recent acquisition of Green Hat fills a similar gap in its dev and test cloud offering; Green Hat’s virtual environments can comprise multiple infrastructure tiers and components, and serve as a continuous test environment for early or parallel testing during the development cycle.
Multiple opportunities loom for test center virtualization, but the most obvious use case for cloud testing is still load and performance, where terms such as capacity, stress, spike and failure come into play. SOASTA, a Mountain View, Calif.-based startup launched in 2007 for performance testing in the cloud, is the leader here.
“The classic problem with testing is performance, and there have been performance testing products—HP (formerly Mercury) LoadRunner, and Micro Focus (formerly Borland) SilkPerformer, to name but two—around since the late 1980s. But there has not been a modern performance testing tool since then,” said Bartow.
SOASTA’s CloudTest platform is the answer to the question that founder Tom Lounibos was asked in 2002: “How do we simulate 100,000 users hitting our application at once?” At the time, a mere 4,000 users could cost US$1 million and require 800 servers, not to mention a powerful and as-yet-nonexistent analytical engine. Today, the company has 3,000 customers, among them e-commerce giants such as Intuit and Target, and performance tests can scale to many millions of concurrent users.
“We don’t own data centers at SOASTA,” said Bartow. “When I have a customer who comes along and wants to test for two hours, the cost for us is 34 cents an hour—the cost of a cloud server from Amazon. I can get 1,000 servers from EC2 cloud in under 10 minutes.”
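Bartow’s numbers pencil out quickly. Assuming his 34 cents is per server per hour, a two-hour test on 1,000 EC2 machines costs on the order of a few hundred dollars, versus the $1 million for 800 physical servers that Lounibos was quoted in 2002:

```python
servers = 1_000
hourly_rate = 0.34   # Bartow's per-server EC2 figure
test_hours = 2

infra_cost = servers * hourly_rate * test_hours
print(f"${infra_cost:,.2f} for the whole load test")
```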
Though the company aggressively competes on price and speed of provisioning against other offerings such as Keynote, that’s not its only selling point. “It’s not just about scale testing, it’s about performance testing. What does performance mean? It’s ambiguous, but it doesn’t only mean scale or load,” Bartow said.
SOASTA’s analytics can help companies retain or accommodate more customers by reducing bandwidth-hogging features of their websites, or by properly configuring load balancers. “We also do functional test automation. We bridge that gap between lab testing and external testing as a full complement to the software development life cycle,” said Bartow.
There is also the need to emulate the components in the service-oriented architecture of an application. Parasoft Virtualize is a “unique solution for the marketplace,” said Wayne Ariola, VP of strategy for Parasoft, a 25-year-old ISV. “We look at the traffic going from a cloud-based service to its dependent app. Parasoft Virtualize captures that traffic—the necessary interactions—and then allows the tester to replay those interactions whenever they need.”
Similarly, Parasoft SOATest (packaged with Parasoft Load Test) automates end-to-end testing, validating complex transactions beyond the message layer through a Web interface, ESBs, databases, and everything in between. “This allows you to do massive composite integration testing. And let’s say we also have a dependency on a mainframe. Biggest problem is that teams need to schedule time within mainframe downtime to execute tests. We can do the same thing with the mainframe. Now testers are able to cut the connection to the mainframe and test at their leisure,” said Ariola.
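The capture-and-replay idea behind tools like Parasoft Virtualize can be shown with a toy proxy. Nothing below is Parasoft’s code; it is a minimal sketch of recording live request/response traffic and then serving the recorded answers after the real dependency (a mainframe, say) has been disconnected:

```python
class RecordingProxy:
    """Sits between the app and a live dependency, capturing traffic."""
    def __init__(self, live_service):
        self.live_service = live_service
        self.recorded = {}

    def call(self, request: str) -> str:
        response = self.live_service(request)
        self.recorded[request] = response  # capture the interaction
        return response

class VirtualService:
    """Replays recorded interactions with no live backend attached."""
    def __init__(self, recorded: dict):
        self.recorded = dict(recorded)

    def call(self, request: str) -> str:
        return self.recorded[request]  # KeyError = unrecorded interaction

# Record against the real (here, simulated) mainframe once...
def mainframe(request):
    return f"mainframe-reply-to:{request}"

proxy = RecordingProxy(mainframe)
proxy.call("GET /account/42")

# ...then testers "cut the connection" and test at their leisure.
virtual = VirtualService(proxy.recorded)
reply = virtual.call("GET /account/42")
```

Once the interactions are captured, the virtual service answers without the mainframe, so tests no longer need to be scheduled around mainframe downtime.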
HP Service Virtualization software does the same thing, allowing project teams to access simulated resources, reducing delays and managing costs. Further, services from external vendors may have pay-per-transaction fees, be part of a production system that can’t be tested on, or have security limitations that make them hard to use.
“Your e-commerce app might be doing a credit check with Experian or shipping through FedEx,” said Emo. “HP Service Virtualization enables testing teams to make forward progress when those cloud services are not available. You can record and replay the access to that service or create a virtual version. That virtual service is available in your test lab for functional tests, or whatever you want. You use the interface definition language and put that in your test lab.
“More advanced customers will put it in their ALM repository and script for functional or load testing against it.”
HP resold a solution from iTKO for a while, according to Emo, but “We wanted to create our own organic version that was part and parcel of our ALM platform.” That integration of HP ALM with HP Service Virtualization means users can set up and control virtual environments from automation tools, as well as dump the results into test reports.
But with its longstanding history in testing, HP faces a problem of its own with virtualized testing labs: how to compete in a marketplace that eschews software licensing in favor of metered, cloud-based usage. LoadStorm, for example, puts its per-hour load test pricing on page one of its website: $39.90 for up to 1,000 maximum concurrent users, and $3,990 for 100,000. In contrast, HP LoadRunner licensing is notoriously complex, but Emo touted Virtual User Days as a low-cost way to access HP testing expertise as needed for 24 hours.
The mobile experience
But there’s another sea change driving performance concerns: mobile devices. “If the cloud was the hot thing from 2008 to 2010, everyone’s agenda went to mobile initiatives in late 2011,” said Bartow. “It’s been ‘mobile, mobile, mobile’ since then. That focus on mobile is doing good things for performance and performance testing.”
There’s also a big concern around the disparity between carrier networks. “Users are more forgiving, but they also are using their mobile device out in the field at a moment when they need something, so there’s an increase in expectation that it will be fast and reliable,” said Bartow.
SOASTA, like many others, only recently waded into the mobile shallows with an offering in the last quarter of 2011, with another announcement coming in February.
Compared to a browser on a wired Internet connection, a page on a phone or tablet can take 10 seconds to load. “Folks don’t realize what we’re used to on an Internet connection. On the phone, performance is going to be slower. When designing apps and sites, you need to make sure that even at small scale, the customer experience is good enough,” he said.
Another issue is one of architecture, he said. “How is your mobile site going to perform when your main site is at peak performance?” SOASTA customers Sears and Target are rolling out large mobile e-commerce offerings, and they must make IT and developers aware that these entry points are all part of the same app stack, sharing the same database as the main e-commerce site.
“It’s not just a workload on www.target.com, it’s a shared workload,” said Bartow. “There may be a disjoint between application development and IT and operations teams. They’re not always asking questions like, ‘Hey, this mobile app is going to use up this much bandwidth.’ ”
Keynote, a SOASTA competitor, acquired a cloud-based platform for testing and monitoring mobile websites and apps late last year. Its DeviceAnywhere is a service for performance-testing apps on 2,000 devices across global networks.
Could mobile change the way testers themselves access their virtual labs? Not likely, said Parasoft’s Ariola. “It would be difficult to construct complex test suites over a mobile device. What we’re doing a lot is connecting to a developer or QA person’s life a lot faster, giving reports out or allowing execution commands via mobile devices—for instance restarts, or when a test run has stopped. Getting notifications over mobile devices is great,” he said.
If the question of whether “cloud testing” means tools inside or outside the cloud testing applications in the cloud is already confusing, the notion of mobile apps testing cloud apps seems even more infinitely looped.
Getting there from here
“Testing apps in the cloud, from the cloud, is an interesting challenge,” said Bartow. “We publicly announced in 2008 or 2009 our first global test cloud. What we meant was we use multiple public and private clouds from different cloud providers, such as Amazon EC2, Rackspace and Microsoft Azure.”
But the company found an interesting quirk early on: “Testing apps from Amazon cloud on Amazon cloud gave very unrealistic results,” he said. “The Amazon data centers are sophisticated enough that they optimize the workload, and the results are too fast. It’s not really external testing, it’s all back-channel.” In order to give customers a true picture of their application’s load-handling capability, testing must be cross-cloud.
There’s also a bit of strategy involved in choosing regional availability zones from a cloud vendor for testing. For example, if time is of the essence and cost is key, working with instances in a geographically nearby region provides fast data transfer. In Amazon EC2, such data transfers, using internal IPs and private DNS names, cost nothing.
But there are circumstances where costs around a tenth of a cent per GB may kick in, such as transferring between different availability zones in one region, or when using external IPs, public DNS names, or elastic IPs in a single availability zone. Data transfer across different regions is billed as external Internet traffic. The minor additional cost may be worth the peace of mind of knowing load testing has mimicked real conditions.
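As a back-of-the-envelope check, the rates described above (zero for intra-zone traffic over internal IPs, roughly a tenth of a cent per GB across zones; both figures from the article, and subject to change with Amazon’s price list) make the cross-zone premium easy to estimate:

```python
def transfer_cost(gb: float, cross_zone: bool, rate_per_gb: float = 0.001) -> float:
    """Estimate EC2 data-transfer cost for a load test.

    Intra-zone traffic over internal IPs/private DNS is free; cross-zone
    traffic is billed per GB (about a tenth of a cent at the time of
    writing -- check current pricing before relying on it).
    """
    return gb * rate_per_gb if cross_zone else 0.0

same_zone = transfer_cost(500, cross_zone=False)  # free
cross_zone = transfer_cost(500, cross_zone=True)  # fifty cents for 500 GB
```

At these rates, even hundreds of gigabytes of cross-zone test traffic add only cents, which is why the article calls the premium worth the peace of mind.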
Ultimately, cloud may make the art of testing more apparent. In a presentation at Google’s Test Automation Conference, Vishal Chowdhary described how the Microsoft Translator team attempted to “hit the moving target” of cloud testing. Testing a large-scale Web statistical machine translation system that produces output in 37 languages, is not strictly defined, and continuously improves poses numerous challenges. The developer must take care around personally identifiable user information and find strategic ways to deal with mushrooming data volumes, while rigorously monitoring everything.
The load-testing challenges are varied, too; load characteristics depend on the percentage of traffic for different language pairs, what’s translated, infrastructure changes, and API calls by third parties. The goals for load testing, according to Chowdhary, are to “sustain traffic, get better and degrade gracefully.” His team looks at throughput, latency and how accurate the translations are, as well as the CPU and memory demands of the application and whether the machine is “healthy” overall.
While the team uses some crowdsourcing for this purpose, testing applications such as Microsoft Translator doesn’t sound like a task that will be shrinking or becoming any less complex in the near future. And that’s just the vision testing evangelists have been spreading: In the cloud, testing will only become more critical.
The million-dollar mousetrap mistake
As the comic books say, with great power comes great responsibility—or at least the temptation to automate too much. In “Test Automation Snake Oil,” a 1999 article, James Bach recounts a blackout that occurred during “the unattended execution of the wonderful test suite that my team had created. When we arrived at work the next morning, we found that our suite had automatically rebooted itself, reset the network, picked up where it left off, and finished the testing. It took a lot of work to make our suite that bulletproof, and we were delighted. The thing is, we later found, during a review of test scripts in the suite, that out of about 450 tests, only about 18 of them were truly useful.”
The moral? “We had a test suite that could, with high reliability, discover nothing important about the software we were testing. I’ve told this story to other test managers who shrug it off. They don’t think this could happen to them. Well, it will happen if the machinery of testing distracts you from the craft of testing,” he said.
“We try to help customers build inclusive test plans,” said SOASTA’s Bartow. “Our struggle is getting people to do the right amount of testing. This holiday season, there was a record level of uptime for retailers. But only 10% of retailers are doing comprehensive testing for reliability and fail-over. When I was at Intuit, we’d do it very regularly. We’d yank the cords out of servers.”
And while change is inevitable, much of it will be in the form of catching up to Google’s advanced automation scenarios, predicted HP’s Emo. “I went to Software Test Professionals Conference last year. The hot topics were… mobile was No. 1, then using bots, citizen developers, and crowdsourcing.”
Indeed, Google’s open-source test tools such as BITE, Quality Bots, Test Analytics and ScriptCover are taking novel approaches to crawling or crowdsourcing common test scenarios. But it takes more than just enough eyeballs to detect some bugs: A system for finding and blocking adversarial advertisements requires human discrimination.
According to “Detecting Adversarial Advertisements in the Wild,” a paper published in 2011 by ACM, “Acquiring hand-labeled data represents a significant cost, requiring expert judgment to navigate intricate policies and to recognize a wide variety of clever adversarial attacks. Our pilot experiments testing low-cost, crowdsourced rater pools showed that crowdsourcing was not a viable option to achieve labels of the needed quality for production use. Thus, we rely on more expensive, specially trained expert raters. It is therefore critical that we make the most efficient use of this limited resource.”
But there are early innovators, and then there’s everyone else. “A lot of enterprises are going, ‘That’s great, but I’m still at quality maturity level one.’ They’re still looking at how to manage all their test cases, and get more automated test scripts so they can free up time to do more exploratory testing,” noted Emo.
Perhaps the “Testpocalypse” will be postponed.
What to do about data?
Cloud applications can be trickier to manage when it comes to data-intensive operations. Testing for millions of concurrent users will itself generate terabytes of data. And data transactions against storage usually cost around a tenth of a cent apiece, on top of bandwidth charges. Highly readable, verbose data formats can also bloat files and add cost.
“We have customers with test environments with 3TB of data on Skytap,” said Skytap’s White. “They often have a production database or replica that’s just massive. Using the VPN is a way to connect the app under test to the database.”
But the latency between servers and databases is a challenge. The reliability of the connection is key as well. “We have with our VPN a self-healing technology that will detect if it’s down,” said White.