What does it take to find bugs?

TL;DR: Testing uses models to target the system, and our information is constrained by the models we use and build. We can introduce variation to increase the possibility of finding information related to bugs. We have to take care not to develop false confidence.

Some notes on testing inspired by quotes from Ackoff, Ashby, Dijkstra and Weinberg.

Bugs

Dijkstra: “Testing shows the presence, not the absence of bugs”

NATO Software Engineering Conference 1969

How can statements like the above help us?

Well, we might think “What can we do to show the presence of bugs?”

And… What does “bugs” mean?

Bugs might mean ‘this doesn’t always work the way I want it to’.

Problem: Does this work at all?

What might ‘work’ mean?

We have to ‘define’ what “work” means; we could do that through a model of interaction.

We can model the software’s interaction with an external integration component, e.g. a user or another system.

I could model or ‘define’ “work” to mean “given an objective, if I can use the software to achieve the objective then the software works”.

I can identify an objective and ‘use the software’ in the way that I think it will be used to achieve that objective. If I observe that the objective has been achieved, then it ‘worked’. But… I can’t say it always ‘works’, because I only observed it with a:

specific version of the software
specific user interaction
specific environment configuration
specific set of data

If any of those variables change, can I be sure that it continues to work?

What if my observation was flawed?
What if I didn’t observe deeply enough into the system to spot a problem?

I observed a specific instance of ‘worked’ but I can’t take that to mean that it always works.

“…it was right from the start quite obvious that program testing was quite ineffective as a means for raising the confidence level.”

“A Discipline of Programming” by Edsger W. Dijkstra

Models

When we test we compare a model of the system with the system.

When I say ‘with the system’, I mean “with our observation of our perception of a reality of the system”.

We can use a focus on ‘Observation’ as a key lever for improving our Testing process:

Do we understand the technology well enough to observe at all of its levels?
Do we understand how the application uses the technology well enough to spot an issue if it manifests?
Do we have tooling that supports us in observing at a deeper technology level? e.g. for Web Testing: XHR requests, HTTP traffic, cookie usage, local storage usage, memory usage, etc.

Testing is a modelling process, not a ‘solving’ or ‘proving’ process.

I modelled the interaction at a certain point in time.

My testing resulted in an amended model of the system.

For the paths where I didn’t observe any issues, I may have reinforced in my model a reduced perceived probability of failure in that part of the application, i.e. I didn’t see any problem, so I now believe more strongly that it ‘works’.

I might equate that to an increase in confidence.

If I do equate that to confidence, then I may start to think of my model of the system as the reality of the system and start to think of the software as ‘fault free’.

As a tester I have to maintain skepticism towards the system.

The easiest way for me to do that is to recognise that I am building a model of the system and interacting with a model of the system, rather than working with “The Reality of The System”.

“extreme skepticism is the only proper attitude”

Computer Programming Fundamentals by Herbert Leeds and Gerald Weinberg, Chapter 11: Program Testing

What does it take to show the presence of bugs?

“…program testing can be quite effective for showing the presence of bugs, but is hopelessly inadequate for showing their absence.”

pg 20, “A Discipline of Programming” by Edsger W. Dijkstra

Find an instance of data, an interaction approach, or a path through the system, and vary it.

If we keep using the same data, interaction approach and path through the system, then we aren’t maximising our opportunity to “show the presence of bugs”. Instead, we run the risk of falsely increasing ‘confidence’.

“Theories can be tested by using data from the past, present and future - and progressively improved as a result. However, the data used to test a theory should not be the same as those that suggested it.”

pg 73, A Concept of Corporate Planning, Russell L. Ackoff, 1969

With variation, we might hit a combination of variables that shows the presence of bugs.

Why?

“only variety can destroy variety”

An Introduction to Cybernetics, W. Ross Ashby (11/7), 1964

We use variation to find a combination of inputs such that the system does not exhibit enough variety in its processing to provide an output that falls within the tolerance parameters of our oracle.
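A minimal sketch of that idea in Python, assuming a hypothetical system under test `sut_sqrt` that uses Newton's method with a fixed, too-small number of iterations, and an oracle that accepts any output within a relative tolerance of `math.sqrt`. The names, the iteration count and the tolerance are illustrative assumptions. Re-using the single example `4.0` shows no problem; varying the input across a wider range finds inputs where the output falls outside the oracle's tolerance.

```python
import math
import random

# Hypothetical: the SUT always performs this many refinement steps.
ITERATIONS = 5


def sut_sqrt(x: float) -> float:
    """Hypothetical system under test: Newton's method for sqrt with a fixed,
    too-small iteration count, so its processing lacks the variety needed
    for inputs far from 1."""
    guess = 1.0
    for _ in range(ITERATIONS):
        guess = (guess + x / guess) / 2
    return guess


def oracle_accepts(x: float, actual: float, rel_tol: float = 1e-6) -> bool:
    """Oracle: accept any output within a relative tolerance of math.sqrt."""
    return math.isclose(actual, math.sqrt(x), rel_tol=rel_tol)


def find_failures(inputs: list) -> list:
    """Return the inputs whose output falls outside the oracle's tolerance."""
    return [x for x in inputs if not oracle_accepts(x, sut_sqrt(x))]


# The same example on every run: no bug shown.
fixed_inputs = [4.0]

# Varied inputs across a wider range; seeded so any failure can be replayed.
rng = random.Random(1)
varied_inputs = [rng.uniform(0.0, 1e6) for _ in range(100)]

print(find_failures(fixed_inputs))             # → []
print(len(find_failures(varied_inputs)) > 0)   # → True
```

The fixed example only tells us the system worked for `4.0`; it is the variation that exposes the lack of variety in the system's processing.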

But we have issues:

With more variation, and no observation of issues, we might increase ‘confidence’.
We might introduce so much variation that the oracle we compare our observations to does not cover the scope of the variation, i.e. we might not know what is supposed to happen.
We might constrain our variation to the limits of the oracle, and therefore artificially reduce our probability of finding an issue, yet at the same time increase our ‘confidence’.

Enough, Too much

One of the key issues for Testing is: how much variety is enough?

This is one of the hardest parts of testing. How do we know when to stop?

Some heuristics around this are:

When we feel we can’t add any more value.
When we run out of time.
When we stop learning things i.e. all our observations reinforce our model rather than expand it or cause us to rethink it.
When something else has been assigned a higher priority.
When we have covered our agreed scope.

It is easier to identify when we should not stop:

When all we used were the data in the examples.
If all we did was demonstrate an oracle match once.
If our observation was limited to a surface observation rather than a deep technology observation, i.e. “what if it only looks like it worked?”
When we haven’t even covered all the agreed acceptance criteria and examples.
When we didn’t tackle any of the ambiguity in the agreed acceptance criteria.

Automating Confidence

Often we use Automation as a confidence tool.

“If we run the automated execution and it still works then at least we haven’t broken anything.”

If we understand that this only holds within the limits of our:

input data
interaction methods
observation
comparison to the oracle

then we could add more variation into the Automation.

‘Input data’ seems the simplest point at which to inject variation: keep the path and interaction method constant, and vary the data. We often have to constrain the variation in input data to what the oracle covers, which generally means randomly pulling values from a set of equivalence classes, or exhaustively using an input set.
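As an illustration, here is a Python sketch that keeps the path constant and varies only the data, first by randomly pulling values from equivalence classes and then by exhaustively using the modelled input set. `sut_ticket_type`, the age boundaries and the class table are hypothetical, and the boundary bug at 65 is planted deliberately.

```python
import random


def sut_ticket_type(age: int) -> str:
    """Hypothetical system under test. Planted bug: seniors should start
    at 65, but this code starts them at 66."""
    if age < 0:
        return "invalid"
    if age < 18:
        return "child"
    if age <= 65:
        return "adult"
    return "senior"


# Our model of the requirement: equivalence classes and the expected output.
# The oracle only covers the ages we chose to model (0 to 120).
EQUIVALENCE_CLASSES = {
    "child": range(0, 18),
    "adult": range(18, 65),
    "senior": range(65, 121),
}


def check(age: int, expected: str) -> bool:
    """Same path and interaction every time; only the data changes."""
    return sut_ticket_type(age) == expected


def random_from_classes(rng: random.Random, samples_per_class: int) -> list:
    """Randomly pull values from each equivalence class; return failing ages."""
    failures = []
    for expected, ages in EQUIVALENCE_CLASSES.items():
        for _ in range(samples_per_class):
            age = rng.choice(ages)
            if not check(age, expected):
                failures.append(age)
    return failures


def exhaustive() -> list:
    """Use every value in the modelled input set; return failing ages."""
    return [age for expected, ages in EQUIVALENCE_CLASSES.items()
            for age in ages if not check(age, expected)]


print(random_from_classes(random.Random(0), 3))
print(exhaustive())  # → [65]
```

Random sampling may or may not pick 65 on a given run, whereas the exhaustive pass reports it every time. Neither approach says anything about ages outside 0 to 120, because the oracle (the class table) does not cover them.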

Identifying where we constrain the input data might lead to identifying new paths or gaps in our oracle coverage.

We run the risk that we missed things out of the sets, or that we defined our equivalence classes incorrectly.

This risk exists because we are modelling rather than solving.

Our testing is limited to our model, and our evaluation is limited to the model.

There is also a risk that this takes too long, so we could run the variation execution in parallel with our other automated execution.
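One way to sketch this in Python, assuming a hypothetical insertion sort `sut_sort` checked against the built-in `sorted` as the oracle: the fixed regression checks and a seeded variation run are submitted side by side to a `concurrent.futures.ThreadPoolExecutor`. Threads keep the example short; for CPU-bound checks in CPython, a `ProcessPoolExecutor` or a separate CI job would be the more realistic way to get real parallelism.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sut_sort(items: list) -> list:
    """Hypothetical system under test: a hand-written insertion sort."""
    result = []
    for item in items:
        i = len(result)
        # Walk left past every element greater than the new item.
        while i > 0 and result[i - 1] > item:
            i -= 1
        result.insert(i, item)
    return result


def regression_suite() -> list:
    """Existing automated checks: the same data on every run."""
    cases = [([3, 1, 2], [1, 2, 3]), ([], []), ([5, 5, 1], [1, 5, 5])]
    return [data for data, expected in cases if sut_sort(data) != expected]


def variation_run(seed: int, count: int) -> list:
    """Slower run that varies the data, using sorted() as the oracle.
    The seed makes any reported failure reproducible."""
    rng = random.Random(seed)
    failures = []
    for _ in range(count):
        data = [rng.randint(-100, 100) for _ in range(rng.randint(0, 50))]
        if sut_sort(data) != sorted(data):
            failures.append(data)
    return failures


def run_in_parallel(seed: int = 42, count: int = 2000) -> dict:
    """Submit both runs side by side and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fixed = pool.submit(regression_suite)
        varied = pool.submit(variation_run, seed, count)
        return {"regression": fixed.result(), "variation": varied.result()}


print(run_in_parallel())  # → {'regression': [], 'variation': []}
```

Here the insertion sort happens to be correct, so both lists come back empty. That is the ‘no issues observed’ outcome, and it says nothing about data the variation run did not generate.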

We also face the risk of identifying a massive scope of variation. We then need to convince ourselves either that we do not need the variation, or that we can identify where it will benefit us, by reviewing our existing automated execution coverage against the variation we are going to introduce. But… we have to take care that future changes do not impact or invalidate our review.

When we have no variation in our automated execution, we have created a fixed model of the application, and should remain vigilant and skeptical of its coverage. This skepticism can lead us to review the model (the automation) more frequently and check that it covers enough (of our model of the system). Treat it as a ‘safety net with holes’ rather than a ‘confidence booster’.