Performance is all around us. It’s something we experience every moment of our waking lives, whether we’re counting on a response from a device, a network, a vehicle, or whatever system we’re in the habit of using. “Things” either perform as we expect them to, or they don’t.
When systems perform as expected, all is well. You barely even notice. We make our requests and receive the responses or the results as usual; and naturally we pay zero attention to the system’s performance aspects, because that’s what it means to expect something. And when systems occasionally exceed our expectations, we’re impressed for, like, a second… Wow!
But what a difference when performance fails you. You pay close attention, because something has gone wrong. A new mobile website your friend has recommended is taking forever to load. A banking application you’ve used at least a hundred times is failing to respond. It can be something as trivial as your trusty morning toaster, which has suddenly stopped working. Maybe that’s not so trivial.
Of course, when a system fails mid-flight or mid-transaction, the results can be catastrophic in many ways. Whatever the case, when performance fails, you suddenly have to change your plans, find some alternative system, or—worst case for most of us—call the help desk, because with certain complex or critical systems or important transactions, there is simply nothing else you can do. Ugh. Someone say a prayer!
In business, expectations are very high these days, and the stakes are even higher. How can professionals deliver against these high expectations for performance? I strongly believe that doing performance testing is no longer enough. Organizations are finding themselves needing to transform to the more comprehensive discipline of performance engineering.
The new culture of performance engineering
The practices around performance engineering represent a cultural shift in how organizations understand that fuzzy, hard-to-pin-down aspect of end-user requirements known as “performance.” Performance engineering is a hardware thing and a software thing, and because good performance is about people satisfying customers, it’s a business thing too.
It’s most important to start with culture. Regardless of your role (or the business unit you work for, or your title), the idea is to build performance into everything you do. If you understand this, then you have the fundamental principle of performance engineering under your belt, and you can get started by applying this cultural awareness to real results.
Imagine that you want to build a website for your business, and you want it to be really fast and responsive. Fine. Who can argue with that? But the subjective quality “fast and responsive” totally depends on many variables, including how many concurrent users you have, and the type of content you want to serve up. Will you have text only? Streaming video? What types of devices will people be using to access the site? What types of connections will you support? In what geography will your users be accessing your products or services? The answers to these questions will determine what it takes to make your website’s performance truly “great” for your users.
This simply means that the quality of being fast and responsive is context-dependent. A few years back, I was part of a team that built PlanetOrange.com, which was intended to teach the rudiments of financial literacy to children in an interactive, game-like experience. For those users, the site was fast and responsive, because nearly all players were on high-speed LAN connections using good desktop computers. We were not coding a real-time trading app for Wall Street traders to be used on the floor of the NYSE over a poor 2.5G network connection on an iPhone 5. That’s what I mean by “context-dependent.”
Let’s start with agile
Performance engineering plays an important role in agile and DevOps. A good starting point for understanding both the value of DevOps and its transformational character is agile development. Agile methods have been around for a decade and a half, but most mid-sized and large organizations are only now going through that adoption process. There are many reasons why they’re pursuing this, and there are many places where the agile transformation starts, including different interests among the different stakeholders.
Often, what happens is that the delivery process ends up moving many features and functions more quickly from development to production, to the point where the production (or Ops) team says “Hold on, we can’t take this.” The Ops team hesitates, simply because they don’t want to impact the stability of the production environment. Of course, this prevents that new high-value feature or function from getting to the end users and blocks the projected value for the business.
I’ve seen organizations that say they’re doing performance engineering and agile development, along with DevOps, but who only view performance as an afterthought. Performance testing in many of these organizations is the job of a separate team, typically at the end of the life cycle. And those teams are given, say, five days or less to run their performance tests before the product is released. In cases like that, no matter what flag the team is carrying—whether that’s waterfall, agile, quality assurance, performance testing, performance engineering, Ops, or DevOps—the work that’s needed on the actual performance capability is coming too late in the game.
My question to these teams is simple: What if performance were built into the culture of the organization and considered a “way of life” for all individuals, all collaboratively working to deliver the highest-value capabilities at the highest quality as fast as possible to your end users? For developers and testers, this would enable results at a build level, while your unit tests are running. You’d start to get real performance results almost immediately. At an architecture level, build-vs.-buy choices would be easier, because commercial off-the-shelf products, acquisitions, and integrations into your environment could be verified in terms of their performance value. The next thing you know, your extended team members and stakeholders will start thinking like performance engineers, as people who want to establish effective performance engineering on their team and across their entire business.
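To make "performance results at a build level" concrete, here is a minimal sketch of a latency assertion that runs alongside ordinary unit tests. The function under test, its latency budget, and the run count are all illustrative assumptions, not anything prescribed in this article:

```python
import time

def render_homepage():
    # Hypothetical function under test; in a real build this would be
    # your own code (template rendering, a service call, etc.).
    return "<html>...</html>"

def assert_latency(fn, budget_ms, runs=50):
    """Fail the build if fn's median latency exceeds its budget."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    median = sorted(samples)[len(samples) // 2]
    assert median <= budget_ms, (
        f"median latency {median:.2f} ms exceeds {budget_ms} ms budget"
    )
    return median

# Runs with the unit test suite, so performance feedback arrives at build
# time instead of days before release.
assert_latency(render_homepage, budget_ms=50)
```

A check like this is deliberately coarse; its value is that a regression fails the build the moment it is introduced, rather than surfacing in a late-cycle load test.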
Performance at the user story level
How soon should performance issues be addressed? From my experience, I argue that it should start in the earliest phase, when granular project deliverables, like user stories, are being planned and described. Consider what happens when a user story comes to light. Mike Cohn, and others who write about agile development, have long advocated use of an index card, or maybe a sticky note, for capturing user stories. The idea is that if you can’t fit your user story in the space of a 3×5 index card, you haven’t honed your story down to its essentials yet.
Writing a good story in agile development involves things as simple as defining who the actor is and how they will interact with the system to get expected results. But for performance engineering, there are additional considerations. When teams start breaking down epics into stories, they also need to think about acceptance test criteria and specific tasks. I suggest they use the back of the card to capture these things as part of the user story.
That means you can start to build in performance at a very early stage. As you’re working with a businessperson, an analyst, or the end user, the acceptance test criteria you’ve captured show that you’re implementing this story correctly, including the things it must do. This is what’s missing today in the way requirements are understood and managed, and it represents a huge gap in effective performance engineering practice.
Mike Cohn’s user story template helps you to think about who, in whatever role or capacity, needs to perform what activities in order to get a certain result. From a functional perspective, this user story template gives you a rudimentary framework for building in performance consideration. For example, imagine you’re playing the role of a moderator who wants to create a new game by entering a name and an optional description, so that you can start inviting estimators. What’s missing from the basic user story are things like, “How many people concurrently will enter a new name and optional description?” “How many people will invite other estimators?” and “What types of devices will they be using? What are the network conditions, and in what distributed world geographies?”
This is why I suggest that all of these performance-related acceptance test criteria be captured on the back of the user story card, in addition to the essential elements of the story on the front. This should be adopted across teams as part of their “doneness criteria” for each story and release. That’s how to start building performance into your stories from the beginning, with your stakeholders. Not only does this help with the subjective “fast and responsive” quality mentioned above, it also strengthens the relationship with, and the results delivered to, Ops, stakeholders, and your end users.
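One way to keep those back-of-the-card criteria honest is to record them as data that a doneness check can evaluate. The sketch below uses the moderator/estimator story from above; the field names, thresholds, and the 640 ms measurement are all illustrative assumptions:

```python
# A user story with its performance acceptance criteria captured as data
# (the "back of the card"), rather than as prose that never gets checked.
story = {
    "title": "Moderator creates a new game",
    "front": "As a moderator, I want to create a game by entering a name "
             "and an optional description, so I can start inviting estimators.",
    "performance_criteria": {
        "concurrent_users": 200,          # assumed peak concurrency
        "p95_response_ms": 800,           # assumed 95th-percentile budget
        "devices": ["desktop", "mobile"],
        "network_profiles": ["wifi", "3g"],
    },
}

def meets_doneness(measured_p95_ms, story):
    """Part of the story's doneness criteria: the measured 95th-percentile
    response time must fit the budget captured with the story."""
    return measured_p95_ms <= story["performance_criteria"]["p95_response_ms"]

print(meets_doneness(640, story))  # True: 640 ms is within the 800 ms budget
```

Once the criteria live in a structure like this, the same numbers can drive load-test configuration and release gates, so the story’s definition of “fast and responsive” never drifts from what actually gets tested.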
Predictive analytics have a role in this as well. Leveraging the results delivered into your production environment today, and learning what is causing issues in production, enables you to highlight those issues and learn from them from the start. As an example, I have a customer who is able to predict (within about a 40% range of accuracy) whether or not a particular user story is going to create a production incident. At first blush you may say, “Only 40%?” But consider that this enabled them to move from 0% to 40%. What would it mean for you if you could reduce the number of performance-related production incidents by 40%?
I firmly believe that capturing performance requirements, based on acceptance test criteria at the level of each user story, along with continuous assessment of production results, enables teams to get much more accurate predictions and improved production stability with speed, quality, and responsiveness.
Why old methods lead to poor performance results
At a certain level, effective performance engineering doesn’t require a specific methodology. However, waterfall methods traditionally leave performance until the end of the cycle, and that’s not good for anyone. A traditional waterfall cycle has requirements defined up front. Then you go through estimates and approvals. Next comes development, functional tests, some security scanning, and eventually, at the very end of the cycle, you get around to performance testing, often a few days or weeks before the scheduled release date. And of course this is compressed, since everything else ran over schedule. And then there are all the last-minute code fixes, so you get one last “Go/No Go” run the day or night before the release to your end users.
This is a terrible situation for the performance tester—often this role is left to one person—who has to make the go/no go recommendation based on these last-minute results. I’ve been there too many times. For example: You have a system that is going to impact 9 million users, with 180,000 people per hour logging in. If you find an issue, you have to be damn sure about it, because this massive system, the one an entire team has worked on for months, may actually have to be delayed because of what you just found.
Of course, if you don’t make the no-go decision, it will soon be your team who’s called in for remediation, urgently required to find, fix and retest a production incident, which has already impacted your end users and business. Needless to say, when performance testing is left to the end like this, quality suffers and the whole team suffers—including your end users.
Correlation between performance and user satisfaction
But last-minute performance testing isn’t the only problem here. Another underlying problem is the lack of focus on the end user and what the experience of using the product or service will mean to that user. While many things can cause performance issues in production, the perception of the end user is something we need to consider in this highly competitive, highly volatile marketplace. Let’s take usability, for example.
How is usability related to performance? The correlation depends on to whom you’re speaking. When it comes to the end user, regardless of what’s happening behind the scenes, if their experience in using your product or service does not meet their expectations, they will say they’ve “had a poor experience.” How many times can you disappoint that particular user? How long will they be willing to come back and try your capability again?
Most likely, they will simply hop over to one of your competitors and try to make a purchase or acquire some alternative product or service, because they’d rather explore something new than deal with what they perceive as poor performance from you.
And here’s an important note: Most users do not care why an experience is poor. Maybe your web server happened to be busy loading heavy images at the moment that particular user hit your site; maybe it was the first time that user was engaging with the app, which required a (previously undisclosed) download of many other files. Whatever the reason for a bad experience, the user is most likely to chalk it up to poor performance. It doesn’t matter if it’s actually a security issue, a functional issue, a UI issue—it doesn’t matter! The performance of your app is to blame. The same can be said if an online retailer fails in the last mile of delivery: getting the package you ordered to your door. That was your simple expectation, and now they’ve failed you.
But what about complex systems? True, not all systems can be judged according to certain gold standards of first-rate user experience, as with, say, the Apple iPhone. Many systems are necessarily complex, and the user experience has to be understood by both the user and the provider accordingly.
Take some of the larger systems out there: ERP (enterprise resource planning) systems, or core banking systems, or real-time stock trading systems, or travel booking and scheduling systems, especially when these systems become distributed over data centers and geographies in order to support a large user base. These are large enterprise systems, which are frequently growing to accommodate changing reporting requirements and regulations, which in turn require continuous education of the user base. How a user experiences performance with these systems is a tricky thing to measure. But stakes are high, and measurements must be made.
Not long ago, a large manufacturing company located in the central United States bought a company in Japan. When the Japanese team first attempted to log into the new parent company’s ERP system, it took many minutes to move from screen to screen.
Put yourself in the newly acquired Japanese company’s shoes. I’d call that kind of delay a performance issue that unduly affected the efficiency of employees. Not only is the new system they have to use extremely large and complex, it’s also painfully slow. The already big problem of complexity has been magnified by excessive delays, which makes an inherently poor user experience even worse.
In the previous example, could the user interface have been designed more efficiently? Could the transition team have engineered that system in a way such that it was more efficient and more responsive? Possibly. Anticipating the impact of poor internal system performance on your employee base can be critical, especially when those systems are connected directly—even in some tangential way—to your customers.
Remember my comment earlier about having to call a help desk? No one likes having to call a customer service person, or as it’s generically known, the help desk. You know you’re going to be on the phone for much longer than you want to be, and furthermore you’re not getting anything done on your actual job. I especially hate it when I’m on that support call, and the person on the other end says, “Oh, please bear with me; my system is really slow right now.” That frustrates me to no end. As the customer, I’m not only experiencing whatever issue I called about in the first place, I’m now experiencing the poor performance of their help system as well.
Horrors for the seasonal laborer, and you
Consider the large retail fulfillment houses that, as we near the holiday shopping season, look to hire seasonal employees. These companies need to think about the usability of these systems that temporary help will have to use. Is it easy for them to become efficient quickly? Maybe their job is to pick and pack items on a warehouse floor. If it takes them an extra two minutes per person per order, how does that inefficiency over thousands of orders add to your labor costs?
How does that impact your business in online reviews and media coverage, all during the holiday shopping season? This is why usability and performance are directly related. System elements include a complex back-end system; a (relatively simple) handheld device employees use to fulfill orders; networks; CPUs; memory; batteries; operating systems… everything is in play here. And you’re relying on a seasonal resource to get a critical job done during your peak business months? Come on! You need to make the system as well as the factory floor operations as easy as possible.
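The cost of that two-minute inefficiency is easy to estimate. The order volume and wage below are illustrative assumptions, not figures from any real retailer:

```python
# Back-of-the-envelope cost of a 2-minute-per-order inefficiency.
extra_minutes_per_order = 2
orders_per_season = 50_000      # assumed seasonal order volume
hourly_wage = 15.00             # assumed loaded labor rate, in dollars

extra_hours = extra_minutes_per_order * orders_per_season / 60
extra_labor_cost = extra_hours * hourly_wage

print(f"Extra labor hours: {extra_hours:,.0f}")       # 1,667
print(f"Extra labor cost: ${extra_labor_cost:,.0f}")  # $25,000
```

Even under these modest assumptions, a seemingly small per-order delay compounds into tens of thousands of dollars in seasonal labor, before counting the reputational costs.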
Any inefficiency is ultimately about your own business’s inefficiency, and that is almost always a result of poor performance.
How performance engineering became the new QA
Over the years, with the growing use of performance engineering techniques, the role of the traditional QA professional has been diminished through increased automation and the shift of certain testing aspects to the development team.
As a result, QA professionals have been looking for ways to use their skills in the context of performance engineering and DevOps throughout the life cycle. For a few years now, we’ve been seeing a trend among organizations that are excelling at DevOps: they are enabling traditional QA professionals and testers, as well as people who’ve worked in the unique niche of performance testing, to retool their skills and inject performance engineering practices earlier and throughout the delivery pipeline.
For QA folks who are nowadays thinking about their careers and picking up additional skills, including how to code, performance engineering really brings two worlds together for them, and they are frequently positioned to excel. If quality is already in your wheelhouse, then the next step might be learning the language of your stakeholders, helping them see what your team is doing to infuse performance into the products and services they are responsible for.
This requires thinking about the culture of your organization, thinking about the business owners and the IT leaders, about developers and operations teams. By being more mindful of all these different roles, you can begin to help all these individuals build performance into everything they’re doing. Ultimately, all of that effort translates to a focus on the end user.
This is one way to understand how the responsibility of performance can be shifted to an integrated team, in many cases with QA professionals transforming their more narrowly defined roles as testers into performance engineering and end-user advocates. Embracing that transformation offers a huge value to your team, your business, and to your user base.
Doing so will enable you to win, based on new performance engineering capabilities your organization is embracing.
To learn more about how to do this, check out my new book, co-authored with Shane Evans, “Effective Performance Engineering.”