I’m often asked if and when fully autonomous testing could become a reality. That’s a topic I love to discuss. But, before delving into that, let’s take a closer look at the two words that make up that term.
Autonomous, meaning “without human intervention,” is pretty simple. Testing is more difficult because the investigative, inquisitive nature of testing does not lend itself to automation. What I am about to describe is best categorized as “autonomous checking.” With that in mind, let’s continue.
With advanced tooling like vision-based test automation and other intelligent automation engines, the problems of automated checking have shifted from “How do I reliably automate this interface” to higher-level problems. Humans are still overwhelmingly responsible for creating the automated checks: describing what inputs to fill in, what buttons to click, etc. This is the first horizon.
The shift to autonomy is best defined as “Describing becomes Deciding.” With approaches such as smart impact analysis, this is already the case. You don’t need to describe which tests to run; you just need to decide if the tool’s recommendations suit your needs. This is great in closed systems such as SAP, Salesforce, and ServiceNow (where these offerings shine). With the help of AI, this trend will expand well beyond this—into the realm of bespoke/custom applications.
So great! The future will just be getting a printout of possible activities from the machine and giving it the green light! Well… not so fast. You see, these closed systems not only have defined processes; they also have defined outcomes (the oracle). Not so with bespoke applications. While determining the actions to take is possible generically (by examining people taking these actions), it’s not always possible to extract the “Why” component. When a user executes a transaction, their eyes flick to the top of the screen to double check that the “Amount” value is correct. This validation is not captured, and so the automated process misses the point of the check (which was to determine not only that the transaction was processed, but that it was processed correctly).
This is not a bleak outlook, however. While “Fully Autonomous” checking may still be quite a way off, the trend of “Describing becomes Deciding” will remove a ton of busywork that bogs down quality engineers today. Parsing through the outputted scenarios, injecting validations, and deciding which to run is a much more pleasant job than worrying about why the Login button doesn’t have a stable ID field.
With that said, there are a few things to watch out for:
- Beware of test case spam
If you embark on an autonomous testing endeavor and your team comes back with a tool or process that “generates thousands of tests” beware. You still need to parse through these tests, inject validations, and debug them if they “fail.” The motto of “fewer, targeted tests” has been a good guide for the past 20 years, and it remains so now.
- Investigate the how
When you are told that your tests can be automatically generated, dig a bit into how this happens. AI is not magic. If something appears to be magical, it is most likely a fabrication. Your team should be able to tell you that the process examines usage patterns, parses existing (accurate) definitions, or has some other source of how it defines the test. “Shaking up the app and generating tests from it” is still firmly in the world of magical thinking.
- Ask about maintenance
Having a thousand tests is like having a thousand smoke detectors. If you own an entire high-rise apartment building, that’s probably justified. If you own a house, then you will spend two hours switching them all off when you burn the toast. Tests that fail must be investigated, updated or discarded. Inquire about the nature of this method to ascertain whether autonomy will actually save you time in the long run.
Despite this, the future of autonomous checking appears to be very bright. The goal as an industry should be to devise a method for generating the best—and fewest—tests necessary to achieve the desired level of assurance.