Q - Can we use Automated Execution to find defects? A - Yes

TLDR; Can Test Automation find defects? Yes, both regressions and new defects by adding more variation into the paths or data used.

Statements like “Test automation should-not/can-not/will-not find defects” do not completely match my experience. In my experience, Automated Execution can be used to help find defects and frequently does. Not just ‘regressions’ but ‘new defects’ as well.

“Should not”, which I’m also going to consider as covering “Must not”, I view as a value statement or belief about the world.

Automated execution “should not” find defects because we “should” have found them during unit testing or exploratory testing, and they “should not” slip through.

In reality, sometimes they do, so in this post, I’m considering the capability and possibility statements “can not” and “will not”.

Last week, I automated a system, and during the automated execution process, I noticed a defect. A race condition defect was triggered when the Automated Execution pushed the system faster than a user would and created a situation that the GUI was not designed to prevent.

I wasn’t automating the system to support my testing. I automated it to help me with a business process.

I had to use the skills I have learned over my years in testing to put guards, synchronisation, and checks in my automated execution code, working around the defect so that I could automate the system without triggering the issue.

We can use Automated Execution to help find new defects, and we can write our automated execution code in ways that reduce or increase the likelihood of finding defects.

Does Automated Execution Actually Find Defects?

I have found defects using Automated Execution.

Exceptions, logging, observation, and assertion failures during automated execution runs have alerted me to defects.

The Automated Execution has not ‘found’ the defect.

At no point has the Automated Execution turned to me and said, “Hey Alan, look at this, a defect. And here is an explanation of what I’ve found.”

This means that statements of the form “Test automation should-not/can-not/will-not find defects” are semantically valid.

But the underlying impression they give is that we can’t use automated execution processes to find defects, which does not match my experience.

I find the defect.
Automated Execution helps me find the defect.
Depending on how I write the automated execution code.

Reducing Variety of Execution Output

I use variety as one of the critical levers for finding or avoiding unexpected defects during Automated Execution.

I can code to minimise the opportunity to do things differently or encounter issues in the system. We often do this when we are working with ‘flaky’ execution. It is a valid approach, but we should recognise that it can reduce the likelihood of triggering certain error classes.

I code in such a way that misbehavior of the system does not impact the execution. I automate so that it uses as ‘happy’ and ‘perfect’ a path as possible. The Automated Execution becomes the nicest user of the system possible, patiently waiting for everything to render and become available before interacting.

This way, the only outputs from the execution are:

I did it.
I could not complete my task because the path terminated early.

Minimising the variety in outputs reported by the execution process has the side-effect of reducing the possibility of finding defects.

I can reduce the possibility even further by making the execution paths shorter:

put the application into a specific state
deep-link into the application
exercise a very short path
synchronise effectively to control state
use the same data on each execution run
only use the data for this specific path instance
assert on a small set of results

Then I am reducing the opportunity to find defects outside of this controlled environment.

If execution fails, then it most likely relates to the highly constrained scenario, particularly when only one execution path fails during the @Test execution.

We do this to make our Automated Execution highly reliable and help us identify specific acceptance criteria violations represented by the assertions.

This approach to automating is ideal for adding into a continuous integration process.

It is not ideal for finding new defects.

Obscuring Error Results

An automated retry strategy can reduce the opportunity of Automated Execution finding defects.

when an @Test method fails, put it in a queue and retry it
when an action fails to put the system in the state you want, repeat it and try again

Both of these are great ways to work around intermittent execution issues, some of which may well be defects.

The retry mechanism at an @Test level can obscure race condition issues where the application is handling parallel activities. This might be a perfectly valid approach in the context of a strategy where a set of @Test methods have been created to hunt down race conditions, we rely on the race condition execution to alert us to those errors, and we want the individual path execution to assert on conditions regardless of race conditions or interference from parallel execution paths.

We choose to deliberately ignore error classes in some automated execution because we have other @Test code covering it.

The retry mechanism at an action level might obscure usability problems and prevent us from identifying interaction defects that would impact a user. These are the kind of problems that a user might have to work around in the real world with: “just wait a little before pressing the button”, “just try clicking it again”, “just refresh the browser and try again”, “just leave the site and buy from a different store, this one is rubbish”.

If we don’t have additional automated execution to target these errors, then we might be unknowingly obscuring defects and might believe that the Automated Execution can-not/will-not find these types of problems.

Retries are a valid strategy to use if we are aware of the risk that they cover up some defect classes, and we either accept or mitigate that risk.
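One possible way to mitigate that risk is to keep a record of every failed attempt instead of discarding it, so the intermittent failures can still be reviewed. The following is only a minimal sketch in Java: `RetryWithEvidence` and its methods are hypothetical names invented for illustration, not part of any test library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper: retries an action but keeps evidence of every failed
// attempt, so intermittent failures can still be reviewed as possible defects.
class RetryWithEvidence {

    // Failures seen across all retried actions, in the order they happened.
    private final List<String> failures = new ArrayList<>();

    <T> T attempt(String actionName, int maxAttempts, Supplier<T> action) {
        RuntimeException last = null;
        for (int attemptNumber = 1; attemptNumber <= maxAttempts; attemptNumber++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                failures.add(actionName + " attempt " + attemptNumber + ": " + e.getMessage());
            }
        }
        throw new IllegalStateException(
                actionName + " failed after " + maxAttempts + " attempts", last);
    }

    // A copy of the recorded failures, for reporting at the end of the run.
    List<String> failures() {
        return new ArrayList<>(failures);
    }
}
```

A run that passes only after retries then still produces a list of failures to look at, which keeps the decision to ignore them a deliberate one.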

Some Ways of Increasing the Opportunity for Automated Execution to Identify Defects

Automated Execution identifies defects when, during the execution of a path, it fails and reports an error.

Most Automated Execution is path based i.e.

‘@Test’ based
where each ‘@Test’ has a name and a specific purpose.
each ‘@Test’ has assertions to check the expected conditions have been met
a path is a set of actions to complete

Path based Automated Execution will identify issues in:

path traversal, i.e., moving to the following action in the path. If we can not complete an action or start an action, an exception will be thrown, and the path ends.
assertions not being met

One way to increase the chance of finding an error is to examine the points of variability.

What is fixed?

the path itself will not vary
the data required to allow the path to complete will not vary e.g. user.accountStatus == InCredit, we would not vary the accountStatus otherwise the path could not function

What could we vary?

Vary how the action is performed.

If it is a button.click(), then perhaps we could also set focus and press enter; perhaps we could find the form and submit the form.
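As a rough sketch of varying the action, assuming a hypothetical `SubmittableForm` abstraction over the GUI (the interface, enum, and class names here are invented for illustration and are not a real driver API), the choice of how to submit could be made by a seeded random generator:

```java
import java.util.Random;

// Hypothetical sketch: none of these names come from a real GUI automation library.
class VariedSubmit {

    // Assumed abstraction over the page: three ways to achieve the same outcome.
    interface SubmittableForm {
        void clickSubmitButton();
        void focusSubmitButtonAndPressEnter();
        void submitFormDirectly();
    }

    enum Strategy { CLICK_BUTTON, FOCUS_AND_PRESS_ENTER, SUBMIT_FORM }

    private final Random random;

    VariedSubmit(long seed) {
        // Seeded so that a failing run can be repeated with the same choices.
        this.random = new Random(seed);
    }

    // Performs the submit in a randomly chosen way and returns the choice for logging.
    Strategy submit(SubmittableForm form) {
        Strategy[] all = Strategy.values();
        Strategy chosen = all[random.nextInt(all.length)];
        switch (chosen) {
            case CLICK_BUTTON:
                form.clickSubmitButton();
                break;
            case FOCUS_AND_PRESS_ENTER:
                form.focusSubmitButtonAndPressEnter();
                break;
            default:
                form.submitFormDirectly();
                break;
        }
        return chosen;
    }
}
```

Returning the chosen strategy means it can be written to the log, so a failure can be tied to the specific way the action was performed.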

Vary the other data.

Any data that is not key to the path could be treated as being part of an equivalence class and varied. e.g. the amount that the user has in credit, the user’s name, anything.

When varying data we will identify additional conditions that have to be met, e.g. if we are adding two numbers, we need to be careful to avoid creating a final number that overflows the size of an Integer or triggers an onscreen error message.

“Overflow” is an error condition. Error messages seen would follow from having invalidated an invariant i.e. a number is too big. Triggering an error is not part of the path, so we accidentally added variety in the wrong place.

And… if the overflow is a risk, or the error message needs to be checked, and we create a workaround in our path… we should also create an individual @Test to ensure that this condition is tested individually.

We don’t want to code around a condition so that it never happens without also having an @Test for it. If it can happen, we want to know about it. If it should happen (in the case of an error message), we want to trigger it deliberately and check that it does.
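To make the overflow example concrete, here is a small sketch under stated assumptions: `CreditAmounts` is a hypothetical helper, and `Math.addExact` stands in for whatever addition the real system performs. One method generates varied data that stays inside the invariant, and a companion check triggers the overflow on purpose.

```java
import java.util.Random;

// Hypothetical helper for illustration only.
class CreditAmounts {

    // Picks two non-negative amounts whose sum always fits in an int,
    // so the variety stays inside the path's invariant.
    static int[] pairThatCannotOverflow(Random random) {
        int first = random.nextInt(Integer.MAX_VALUE);           // 0 .. MAX_VALUE - 1
        int second = random.nextInt(Integer.MAX_VALUE - first);  // 0 .. MAX_VALUE - first - 1
        return new int[] { first, second };
    }

    // The companion check: deliberately trigger the overflow so we know it is detected.
    // Math.addExact is a stand-in for the system under test.
    static boolean overflowIsDetected() {
        try {
            Math.addExact(Integer.MAX_VALUE, 1);
            return false;
        } catch (ArithmeticException expected) {
            return true;
        }
    }
}
```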

Adding more variety means we may also have to amend the assertions to take account of the variety. e.g. asserting that the random amount we generated for the in credit amount is displayed onscreen when the account is viewed, or is returned in the API call for the account details.

When varying data, ensure the data is logged to allow the @Test to be repeated in the event of an error.
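One simple way to make varied data repeatable is to log the seed rather than every generated value. This is a sketch only: `LoggedRandomData`, its field ranges, and the log format are assumptions made for illustration.

```java
import java.util.Random;

// Hypothetical data generator: logs its seed so a failing run can be repeated.
class LoggedRandomData {

    final long seed;
    private final Random random;

    LoggedRandomData(long seed) {
        this.seed = seed;
        this.random = new Random(seed);
        // Logging the seed is enough to regenerate every value from this run.
        System.out.println("Random data seed: " + seed);
    }

    // Convenience for normal runs: a new seed each time.
    static LoggedRandomData fresh() {
        return new LoggedRandomData(System.nanoTime());
    }

    // Data that is not key to the path, drawn from an assumed equivalence class.
    String userName() {
        return "user" + random.nextInt(1_000_000);
    }

    // An in-credit amount between 1 and 100000 pence (an assumed valid range).
    int creditInPence() {
        return 1 + random.nextInt(100_000);
    }
}
```

To repeat a failed run, construct the generator with the seed from the log, e.g. `new LoggedRandomData(123L)`, and request the values in the same order.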

What could we ease?

One way I make my execution robust is by synchronising e.g.

wait for page to load
wait for rendering to complete
wait for component I want to interact with to exist
wait for component to be in the state I need to interact
interact
wait for interaction to complete

Above is an extreme example.

I ‘probably’ don’t need all of those.

Some systems will have a Shared State Readiness. In that case I probably only need to wait for rendering to be complete, and then all the other interactions should be possible.

I can increase the opportunity for defects by removing the unnecessary Action State Readiness synchronisation.

If my execution starts throwing false positives because I misidentified the state e.g. one of the buttons I want to click is not part of the Shared State and is enabled only when an asynchronous call completes, then I add the Action State Readiness synchronisation back into the path.
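The idea of easing synchronisation can be sketched as a switch that turns the action-level waits on or off. This is a minimal illustration, assuming a hypothetical `Synchronisation` class with a simple polling wait; the names and the 10 ms polling interval are invented for the example.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: Synchronisation and its method names are illustrative only.
class Synchronisation {

    private final boolean actionStateWaitsEnabled;
    private final long timeoutMillis;

    Synchronisation(boolean actionStateWaitsEnabled, long timeoutMillis) {
        this.actionStateWaitsEnabled = actionStateWaitsEnabled;
        this.timeoutMillis = timeoutMillis;
    }

    // Always wait for the shared state, e.g. "rendering is complete".
    void waitForSharedState(String description, BooleanSupplier ready) {
        waitUntil(description, ready);
    }

    // Only wait per action when the extra synchronisation has been switched on.
    void waitForActionState(String description, BooleanSupplier ready) {
        if (actionStateWaitsEnabled) {
            waitUntil(description, ready);
        }
    }

    // Polls the condition until it is true or the timeout expires.
    private void waitUntil(String description, BooleanSupplier ready) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!ready.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("Timed out waiting for: " + description);
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted waiting for: " + description, e);
            }
        }
    }
}
```

With the action-level waits disabled, the execution behaves more like an impatient user. If that produces false positives, the flag can be switched back on for the paths that need it.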

What can we add?

Assertions for ‘@Test’ methods tend to be very specific to the path.

We could add additional Assertions related to the ‘things we can observe’ during the path.

e.g. a specific path might be crediting money into a user’s account. The assertions would be: check total after crediting. We are unlikely to check that the user’s address and name have not changed.

We could.

By using Domain Objects throughout our tests e.g. a ‘user’, we know what all the attributes for that user are and could check them on each screen they are presented.

This is extreme. But it is a way of ‘finding’ things that we did not expect by continually checking the consistency of the Domain Object during the path against the system output.

These assertions would be coded into an abstraction that ‘knows’ for each page or component, which fields are available to be checked and can compare them against our domain object. This would alert us to values that are different and values that are missing.
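A minimal sketch of such an abstraction, assuming the domain object and the values displayed on a page can both be represented as maps of field name to text. `DomainConsistencyChecker` is a hypothetical name, and a real implementation would read the displayed values from page or component objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical checker: compares a domain object against what a page displays.
class DomainConsistencyChecker {

    // Returns one message per field that is missing from the page
    // or displayed with a value different from the domain object.
    static List<String> check(Map<String, String> domainObject,
                              List<String> fieldsOnThisPage,
                              Map<String, String> displayed) {
        List<String> problems = new ArrayList<>();
        for (String field : fieldsOnThisPage) {
            if (!displayed.containsKey(field)) {
                problems.add("missing: " + field);
            } else if (!Objects.equals(domainObject.get(field), displayed.get(field))) {
                problems.add("different: " + field
                        + " expected=" + domainObject.get(field)
                        + " actual=" + displayed.get(field));
            }
        }
        return problems;
    }
}
```

An empty list means every field the page is expected to show agrees with the domain object. Anything else is worth investigating, even when the path's own assertions passed.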

Compare with baseline

We might decide that randomising data is too hard, and creating domain objects to track state is too much work.

I have not found it difficult in practice, but different applications require different levels of data, and you might be working with a data-intensive application that makes this harder.

Another approach is to keep the data fixed, assert known conditions related to the path, and then use a ‘compare to baseline’ approach during the path execution.

For example, create snapshots of data, or the parts of the screen that render data, or the JSON objects in responses. Save those out. If the @Test does not reveal an exception and the snapshots are checked that they accurately reflect the expected results, then mark those as a baseline, and next time the @Test is run, compare your results to the saved baseline.
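The baseline flow above could be sketched with plain files as follows. This is an assumption-laden illustration rather than how any of the comparison tools work: `BaselineComparison` is a hypothetical class, the snapshot is treated as a single string, and the first saved snapshot still needs a manual review before it is trusted as the baseline.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a 'compare to baseline' check using plain files.
class BaselineComparison {

    enum Outcome { BASELINE_CREATED, MATCHES_BASELINE, DIFFERS_FROM_BASELINE }

    // Saves the snapshot when no baseline exists yet, otherwise compares against it.
    // A newly created baseline should be reviewed by hand before it is relied on.
    static Outcome compare(Path baselineFile, String snapshot) {
        try {
            if (!Files.exists(baselineFile)) {
                Files.write(baselineFile, snapshot.getBytes(StandardCharsets.UTF_8));
                return Outcome.BASELINE_CREATED;
            }
            String baseline =
                    new String(Files.readAllBytes(baselineFile), StandardCharsets.UTF_8);
            return baseline.equals(snapshot)
                    ? Outcome.MATCHES_BASELINE
                    : Outcome.DIFFERS_FROM_BASELINE;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```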

This comparison approach is used in tools such as PACT, AppliTools, Approval Tests. Comparisons are often built into libraries, e.g. a GSON ‘equals’ on a JsonObject does not care about field order, something that was always a pain when testing messages in the past.

I don’t tend to use a comparison approach since I take more comfort from a random data approach, but I can see value in it extending the scope of your assertions to find unexpected changes with minimal upfront coding effort.

Final Points

Automated Execution does not ‘find’ defects. It reports on issues it encountered when trying to do something.

We can use it to help us find more defects by introducing more variation and extending the scope of its observation.

If we add more variation and introduce false positives, we have not identified the variation points in our system or have not added enough control in our variety generation. We remove that variation to gain more trust that issues are worth investigating.

We can obscure Automated Execution’s ability to help us find more defects by treating some of its reports as ignorable and retrying the execution until it runs without error.

We can also make the execution run robust in the face of variability to work around system inconsistencies. This can also prevent us noticing issues during automated execution.

We decide if we want to widen or tighten our error detection focus during automated execution runs.

Automated Execution can be used to help find ‘new’ defects.