6 minute read - Test Automation

Automation Coverage

Apr 9, 2020

TLDR; Coverage requires some sort of model. We can organise code to support review against a mental model, and some models are executable. Other models we compare against the output of execution.

I was asked a series of questions: How can we document what an automated test does and covers without adding a lot of overhead? How do we know what is not covered by automation?


Coverage

All coverage assessments involve comparing some sort of model to some sort of implementation.

When we perform code reviews to figure out what is covered, we are usually working from some sort of mental model, i.e. what we expect to see covered. Then, as we review the code, we map it on to that model.

We often have to work in small chunks when doing this because it is hard to hold an entire model of the system in our heads.

Therefore we work at multiple levels.

We review the detail of an @Test method against a model to assess coverage of the implemented flow and the assertions used.

We review the output in terms of package structure and naming to assess coverage against a system-wide model.

When working on Agile projects that split their delivery into Stories we often chunk our reviews into ‘story’ sized bites, and use the version control commits to limit the scope of our review against stories.

How do you document what an automated test does?

  • the name of the test can help
    • write a sentence in camelCase to describe the intent of the test
    • if you can’t capture the intent of the test in the name then you may be trying to test too much
  • create abstraction layers to make the test code readable quickly during a detailed review
  • have all assertions in the test code and not hidden in abstraction layers so that condition coverage is obvious
  • make important data visible, hide unimportant data in abstraction layers e.g. test data generation classes
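
As a sketch of these guidelines in one place (the class, method, and domain here are hypothetical, invented for illustration):

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

public class DiscountCalculationTest {

    // hypothetical production code, inlined to keep the sketch self-contained
    static int discountedPriceInPence(int priceInPence, int percentOff) {
        return priceInPence - (priceInPence * percentOff / 100);
    }

    // the name is a camelCase sentence describing the intent of the test
    @Test
    public void aTenPercentDiscountReducesThePriceByATenth() {
        // the important data (price, discount) is visible in the test;
        // incidental data would be hidden in test data generation classes
        int discounted = discountedPriceInPence(1000, 10);

        // the assertion is in the test body, not hidden in an abstraction layer
        Assertions.assertEquals(900, discounted);
    }
}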

How do you organise code to help review coverage?

  • the structure of the tests is important
    • the name of the test class
    • the package structure provides a hierarchy for organisation
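
For example, a hypothetical package structure where the hierarchy maps to functional areas of the system, so gaps show up as missing packages or classes:

- com.example.app.navigation
    - NavigationViaMenuTest
    - NavigationViaSearchTest
- com.example.app.admin
    - AdminUserManagementTest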

How do we know what is not covered?

This is a ‘mapping’ process.

We have to have some model of ‘cover this’ and then a way of mapping each test on to the ‘covered’ items.

Custom

This can be a more custom process.

To avoid too much maintenance people usually pick high level models to map to e.g. stories, rather than acceptance conditions.

It can be possible to create mappings as a result of executing the test code.

e.g.

  • if you are testing an API then you could capture all the traffic and compare it to a model of the API - have all end points been called? have all params been used? have all verbs been used? etc.
    • I have used code based proxies and processed HAR files to achieve this in the past with custom code (see the sketch after this list).
  • if you are testing a GUI then you could capture all the traffic and compare it to a site map of the application - have you visited all pages? etc.
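
As a minimal sketch of the mapping step, assuming the called verb + endpoint pairs have already been extracted from the proxy or HAR output (the endpoints here are invented):

import java.util.HashSet;
import java.util.Set;

public class ApiCoverageGaps {
    public static void main(String[] args) {
        // the model: verb + endpoint pairs the API is documented to support
        Set<String> model = Set.of(
                "GET /apps", "POST /apps", "GET /apps/{id}", "DELETE /apps/{id}");

        // observed: verb + endpoint pairs extracted from captured traffic
        Set<String> observed = Set.of("GET /apps", "POST /apps");

        // anything in the model but not observed is a coverage gap
        Set<String> uncovered = new HashSet<>(model);
        uncovered.removeAll(observed);
        uncovered.forEach(endpoint -> System.out.println("not covered: " + endpoint));
    }
}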

Tooling

Some teams use Cucumber and similar DSL based tools to do this.

Uncovered areas are those that have not been implemented yet, so Cucumber highlights them as undefined, but the tests still ‘run’.

This is using the Gherkin as a model, which becomes executable when interpreted by Cucumber, with the mapping achieved when the code is written and successfully executed.

NOTE: people may point out that this is a misuse of the tool if you call this BDD, or use BDD anywhere in the description of it. But if you want a simple to use, DSL based modelling tool which can highlight coverage at the DSL level, then this could be useful.

There is a risk that you try to model everything in Cucumber, and this will likely cause your project to slow down.

Cucumber is very useful for modelling high level processes as a DSL which can be data driven with a table of important data, e.g. a Scenario Outline with an Examples table. If you try to put all your data in a Cucumber table it becomes unmaintainable. If you only expose the important data then it is easy to see what data you have covered.

Cucumber can be useful if the people reviewing coverage need a high level DSL to understand it, i.e. if they won’t read code.

If they “can’t” read code then creating abstractions that make the code easy to understand, and then training them, can help avoid the need for a DSL based tool.

JUnit 5

I’ve just started experimenting with JUnit 5 Data Driven tests and found those to be very visible when executed.

e.g.

import java.util.stream.IntStream;

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.MethodSource;
import org.openqa.selenium.By;

// driver and url are fields on the test class, configured during setup

static IntStream allPulperVersions() {
    return IntStream.rangeClosed(1, ThePulperApp.MAXVERSION);
}

@DisplayName("Check model for nav matches number of items on nav")
@ParameterizedTest(name = "using version {0}")
@MethodSource("allPulperVersions")
public void checkMenuItemsMatchModel(int version) {

    driver.get(url + "?v=" + version);

    PulperNavMenu menu = new PulperNavMenu().getForVersion(version);

    // internal consistency check on the model itself
    Assertions.assertEquals(menu.countMenuItems(),
                menu.countAdminMenuItems() + menu.configuredNonAdminVersionMenuItems());

    // compare the model's expected count with the menu items rendered in the GUI
    Assertions.assertEquals(menu.configuredNonAdminVersionMenuItems()+
            menu.countAdminMenuItems(),
            driver.findElements(
                    By.cssSelector("#primary_nav_wrap ul li")).size(),
            "Unexpected number of menu items in version " + version
    );
}

This could be made more readable through abstractions to support a code review of assertions, i.e. hiding the driver.findElements call and the locators.
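
For example, a hypothetical helper wrapping driver.findElements and the CSS locator might reduce the second assertion to something like:

    // sketch only - displayedMenuItemCount is an invented abstraction
    Assertions.assertEquals(menu.countMenuItems(),
            displayedMenuItemCount(version),
            "Unexpected number of menu items in version " + version);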

But the output to support a ‘system review’ is quite readable.

I use the @DisplayName to make the test more readable than the method name.

I use @ParameterizedTest to run the test with different parameters, and its name attribute to make each instantiation of the test execution more visible:

- NavigationViaMenuTest
    - Check model for nav matches number of items on nav
        - using version 1
        - using version 2
        - using version 3
        - using version 4
        - using version 5
        - using version 6
        - using version 7
        - using version 8
        - using version 9
        - using version 10
        - using version 11

What to do?

I recommend:

  • keep the mapping (which is not executable) to a minimum, because mapping needs to be maintained or it becomes misleading, e.g. annotations recording which stories are covered by an @Test (see the sketch after this list)
  • make the coverage self-documenting through effective package structure and naming
  • use parameterised tests to make ‘data’ visible and support review through the execution report
  • use abstraction layers to keep code readable, and keep the coverage review at the code level as much as possible
  • use model based comparison on output generated organically by the test execution for custom ‘gap’ spotting (e.g. analysing proxy output)
  • use a DSL based tool when the review process requires a DSL, and document the coverage with data as much as possible to harness the Data Driven capabilities of the DSL tooling
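
As an illustration of the first recommendation, a minimal story-mapping annotation might look like this (CoversStory is hypothetical, not a standard JUnit feature):

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// maps a test to a story id; because this mapping is not executable it has
// to be maintained by hand, which is why it should be kept to a minimum
@Retention(RetentionPolicy.RUNTIME)
public @interface CoversStory {
    String value();
}

An @Test method could then be tagged with @CoversStory("STORY-123") and the mapping reported on via reflection, but every such annotation is a maintenance cost when stories change.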

The actual solutions you adopt will depend on the language you are coding in, and the technology that the application is implemented using.

Many of my early blog posts were related to modelling. It is a key part of how we test.