A webinar about resolving the root causes of intermittent automated execution.

Replay

You can watch the live webinar replay on the Eurostar Huddle site:

The official recording does not appear to have worked properly - no slides, robot voice, etc.

Fortunately I recorded it.

I have uploaded the recording to my Evil Tester Talks Archive, which also includes the full 1.5 hour version with more detail and (at the time of writing) 7 other talks.

And I have released it on YouTube.

Watch the Full 1.5 Hour Version in Evil Tester Talks

Slides

Your Automated Execution Does Not Have to be Flaky

Eurostar Webinars Feb 2018

Alan Richardson


Have you experienced flaky test automation? In this webinar, Alan Richardson plans to convince you that you haven’t. Instead you have experienced the result of not resolving the causes of intermittent execution. Alan will explore some common causes of, and solutions to, intermittent behaviour. Why? So you never say the phrase, “flaky tests”, ever again.


Flaky Test Automation is Normal

Have you experienced flaky test automation?

Particularly GUI Automation

Yeah. Of course. It’s normal.

“Our Tests are Flaky”, “Some tests fail Randomly”


Flaky Test Automation is Normal

Because we have normalized Flaky Test Automation

  • “Our Tests are Flaky”
  • “Some tests fail Randomly”

“There is nothing so absurd that it has not been said by some philosopher.”

Cicero, On Divination, Book II chapter LVIII, section 119 (44 BC)

“Truth happens to an idea. It becomes true, is made true by events.”

William James, Lecture VI, Pragmatism’s Conception of Truth, Pragmatism: A New Name for Some Old Ways of Thinking (1907)

It’s a nonsense idea, but people say it. Constantly. Experts. People who know what they are doing. It’s normal.

Flaky tests ‘become’ true because we write them. We don’t learn the strategies to fix them, so ‘flaky’ is how we describe them.


How to Normalize Flaky Test Automation

We all know these Test Automation Truths

  • “GUI Automation is Flaky”
  • “We have to live with ‘flakiness’”
  • “We shouldn’t automate at the GUI”
  • “We can only remove flakiness under the GUI”

“Flaky Tests” blames the tests. Not good enough.

Have you accepted flakiness as the normal state of affairs?

Not flaky execution, not flaky environment, not flaky data. Flaky Tests.

‘Flaky tests’ is too high-level a description. Too high an abstraction to deal with the problem.


It isn’t even the “Tests”

  • We don’t automate Tests

  • We automate the execution of steps in a workflow or process

  • We automate the execution of a System

  • We add condition assertions during that execution

  • We call that a “Test”

We don’t have flaky Tests - we have automated execution that fails. Sometimes on the steps, sometimes on the assertions.


‘Flakiness’ does not reside at a ’level’

I have seen ‘flakiness’

  • in Unit Tests
  • in API Tests
  • in Integration Tests
  • in GUI Tests

It is too easy to say ‘flaky’

  • and then blame ‘GUI execution’
  • and then blame ’the tool’

Living with flakiness is a choice.

Choose a different approach.

It is too easy to say ‘flaky’ and then blame ‘GUI execution’. This is too easy an excuse.

It is too easy to say we have to live with ‘flakiness’ because we don’t have an API.


I am not the only person saying this.

  • see the references at the end and the name drops throughout
  • I’ll try to cover something different in this talk

“We designed that flakiness. We are allowing that to happen. We engineered it to be that way. And its our fault that that exists.”

Richard Bradshaw, “Your Tests aren’t Flaky, You Are!” Selenium Conference 2017

https://www.youtube.com/watch?v=XnkWkrbzMh0


Take it more seriously. Describe it differently.

Intermittent

  • Occurring at irregular intervals; not continuous or steady.

https://en.oxforddictionaries.com/definition/intermittent

something is happening, but not all the time

that makes me think there is a cause that we haven’t identified

there is some factor in the system that I don’t know about

“Flaky” doesn’t do that


Take it more seriously. Describe it differently.

Nondeterministic Algorithm

“a nondeterministic algorithm is an algorithm that, even for the same input, can exhibit different behaviors on different runs”

That doesn’t sound like how we describe ‘automation’.

That’s not what we want.


Flaky is not serious enough.

We do not want to use nondeterministic algorithms for continuous assertions that we are relying on

We are trying to rely on this. Continuous Integration. Continuous Deployment. We need this to run reliably. We need to trust that a pass means everything ran as expected.

remove the word ‘flakiness’ from your vocabulary


Your Test Automation is not Flaky

Your automated execution fails intermittently

You have experienced intermittent failures. You have chosen to live with them, you haven’t fixed the causes of intermittent failure.


Don’t Blame Tests. Look For Root Causes.

watch Alister Scott’s GTAC 2015 talk


I have removed ‘flakiness’

  • from Unit Tests
  • from API Tests
  • from Integration Tests
  • from GUI Tests

Automated execution does not have to fail intermittently.


How to remove Intermittent Failure from your Automated Execution

  1. Care
  2. Investigate
  3. Do something about it

How to remove Intermittent Failure from your Automated Execution

  1. Decide Intermittent Failure is unacceptable
  2. Investigate the cause of Intermittent Failure
  3. Mitigate
    • Remove the cause
      • actually fix it
    • Implement a retry strategy
      • might obscure bugs
    • Accept Intermittent Results
      • might provide hints at solutions

The obvious thing to do is remove the cause. But this can be hard.

You might try a retry strategy. This can hide problems. But if your system exhibits intermittent failure to users in live then you might have to do this. Because that is what users do in live, i.e. the problem might not be your execution, it might be your system.

Alister Scott talks about retry strategies obscuring bugs.

Accept intermittent results. You might move everything that is failing intermittently into a test run of its own. This creates a ‘good’ execution pack, and a ‘bad’ intermittent pack. You isolate the failures until you can work on them. If you find that you keep moving things into the intermittent pack, then you need to rethink the mitigation strategy because eventually the pack will get larger and you haven’t done anything about the root causes.


Take it seriously

We write automated assertion checking because we care that those assertions are true for each build of the system.

Determinism is important.


High Level Grouping of Common Causes of Intermittency

  • Synchronisation - lack of or poor
  • Parallel Execution - interference
  • Long Running Tests - too long, too risky
  • Automatability - hard to automate system
  • Tools - inappropriate or out of date
  • State Preconditions - not controlled
  • Assertions - wrong or incorrect assumptions
  • Data - not controlled

See also Richard Bradshaw and Mark Winteringham’s “SACRED” mnemonic.

SPLAT SAD: a mnemonic for the groupings above (Synchronisation, Parallel, Long Running, Automatability, Tools, State Preconditions, Assertions, Data).


Will cover Top 3 for each Grouping


Top 3 Common Causes - Synchronisation

  • None

  • Time Based Synchronisation

  • Incorrect App State Synchronisation

  • None

Like a Rhino in a Chocolate Shop. Charging through everything.

Remote Cloud Execution might improve things because of latency.

  • Time Based Synchronisation

implicit waits - everything waits, which can slow down error reporting

worse, Thread.sleep()

Wait for 10 seconds.

But ultimately all sync is time based: timeouts. These can be domain specific and can assert SLAs, adjusted for remote latency or environment conditions.

Ideally we want state and domain synchronisation.

  • Incorrect App State Synchronisation

e.g. waiting for the ‘ajax’ image to go away, rather than for the Ajax call to complete processing

Naive app state identification e.g. “Page footer present means page is ready to work with”
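
As a hedged illustration of the difference (Java with Selenium WebDriver): the first method below is purely time based, the second waits on application state. The ".ajax-spinner" selector and the jQuery check are assumptions about the app under test, not anything from the talk.

    import org.openqa.selenium.By;
    import org.openqa.selenium.JavascriptExecutor;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;
    import java.time.Duration;

    public class SynchronisationSketch {

        // time based: always waits the full 10 seconds, even when the app was ready after 1
        void timeBasedWait() throws InterruptedException {
            Thread.sleep(10_000);
        }

        // state based: returns as soon as the condition is true, fails fast on timeout
        void stateBasedWait(WebDriver driver) {
            // Selenium 4 style constructor
            WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));

            // wait for the 'ajax' spinner image to go away...
            wait.until(ExpectedConditions.invisibilityOfElementLocated(
                    By.cssSelector(".ajax-spinner")));

            // ...or better, wait for the Ajax processing itself to finish
            // (assumes the app uses jQuery and exposes jQuery.active)
            wait.until(d -> (Boolean) ((JavascriptExecutor) d)
                    .executeScript("return jQuery.active === 0;"));
        }
    }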


Common Solutions - Synchronisation

  • Synchronise on States
    • do not rely on framework Synchronisation
  • Multiple Intermediate States
    • Consider Latency
  • Synchronise in Abstractions not the @Test methods
    • unless @Test specific
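
For the last solution above (synchronising in abstractions, not in the @Test methods), a minimal sketch; "SearchPage", its locators and the "results" id are hypothetical examples, not from the talk.

    import org.openqa.selenium.By;
    import org.openqa.selenium.Keys;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;
    import java.time.Duration;

    public class SearchPage {

        private final WebDriver driver;
        private final WebDriverWait wait;

        public SearchPage(WebDriver driver) {
            this.driver = driver;
            this.wait = new WebDriverWait(driver, Duration.ofSeconds(10));
        }

        public SearchPage searchFor(String term) {
            driver.findElement(By.name("q")).sendKeys(term, Keys.ENTER);
            // synchronisation lives here, so every caller gets it for free
            wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("results")));
            return this;
        }
    }

    // the @Test method then contains no waits at all:
    //   new SearchPage(driver).searchFor("intermittent failures");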

Top 3 Common Causes - Parallel Execution

  • Framework not thread safe
  • Tests Interfere
  • Shared Test Environment

People often jump into Parallel Execution too soon.

  • Framework not thread safe

People build Frameworks, which control how they work. Often heavy on inheritance, singly instantiated control objects, and shared global state and variables. All stuff that makes parallelism hard.

Test Frameworks might be thread safe, but our abstractions often are not.

You may need to fork rather than thread, e.g. run different suites as separate processes.

  • Tests Interfere

System state preconditions changed by other tests running in parallel

Shared data can cause interference. Browsers. System processes. System responding to one request.

Might even be bugs in the system.

Shared browser usage.

  • Shared Test Environment

Not just shared for automated execution - data controls can help with this.

But also environments shared across different teams: exploratory testing going on in the same environment as automated execution, CI, performance testing.


Common Solutions - Parallel Execution

  • Independent environments
  • Independent Data
  • Separate Suites rather than threaded execution
  • Create Threadsafe, reusable code
    • Create reusable library abstractions rather than Frameworks
    • Avoid ‘static’ singleton objects
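
One common shape of "avoid static singletons" in a Selenium Java stack is a per-thread driver. A minimal sketch, where ChromeDriver is only an example browser:

    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;

    public class Driver {

        // one WebDriver per thread, lazily created on first use
        private static final ThreadLocal<WebDriver> DRIVER =
                ThreadLocal.withInitial(ChromeDriver::new);

        public static WebDriver get() {
            return DRIVER.get();
        }

        public static void quit() {
            DRIVER.get().quit();
            DRIVER.remove();
        }
    }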

Top 3 Common Causes - Long Running Tests

  • Sequential rather than Model Based

  • not delineating between: preconditions, process, assertion

  • component tests in flow rather than isolation

  • Sequential rather than Model Based

Model based execution has a lot of synchronisation built in, particularly if it is state transition based, because state transitions have guard conditions or entry conditions. Sync should be baked into model based execution pretty well.

Model based usually has some variety handling.

Sequential doesn’t have a lot of variety. It can’t handle pop-ups, A/B tests.

Short flows are usually easier to make reliable.

  • preconditions, process, assertion

We want to:

  • setup the preconditions we need
  • carry out a process
  • assert on the conditions we want to check

Make ‘precondition’ setup part of the test.

Too long a ‘process’, i.e. the stuff you do in the execution, might leave you at risk of overlap, or take too long to set up.

Too many assertions can slow things down, overlap with other processes, and the timing of assertions might be hard. Synchronise on assertion checking.
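
A hedged sketch of the shape this gives a test (JUnit 5); "AdminApi", "TodosPage" and "Driver" are hypothetical helpers standing in for whatever abstractions your project already has.

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    public class CompleteTodoTest {

        @Test
        void aTodoCanBeMarkedAsDone() {
            // preconditions: set up the data we need through an API, not the GUI
            AdminApi api = new AdminApi();
            String todoId = api.createTodo("buy milk");

            // process: the shortest GUI flow that exercises the behaviour
            TodosPage todos = new TodosPage(Driver.get());
            todos.open().markAsDone(todoId);

            // assertions: minimal, focused on the condition we care about
            assertEquals("done", api.statusOf(todoId));
        }
    }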

  • component tests in flow rather than isolation

Very often we think we need to assert on all conditions in a full executable flow: log in through the GUI, navigate here, set up data in the GUI, do something, check in the GUI.

It is possible to assert on GUI rendering and some functionality in isolated playgrounds. Particularly if we design the system to operate as components, e.g. good for React and other JavaScript MVC frameworks.


Common Solutions - Long Running Tests

  • Understand that more actions == more risk
  • Synchronise prior to each step
  • Consider Model Based Testing
  • Create component test and automated execution playgrounds
  • Minimum assertions

Top 3 Common Causes - Automatability, Automatizability

Not Testability:

  • Application has non-deterministic behaviour

  • Hard to Synchronise

  • Application fails non-deterministically in live

  • not Testability

Some people refer to this as ’testability’, but it isn’t. I can often easily test an app that I find hard to automate.

  • Application has nondeterministic behaviour

JavaScript Callbacks trigger ‘random’ popups

Async and out of order processing

A/B Testing in app

External ‘stuff’ not related to core functionality e.g. ads, libraries, rendering

Alister Scott covers this in his GTAC 2015 talk.

  • Hard to Synchronise

JavaScript frameworks. Many DOM updates

e.g. Queued messages (guaranteed), buffered processes, promises.

  • Application fails non-deterministically in live

I saw this at a client recently: every team was reporting flaky tests. GUI, API, components, backend, low level stuff. Every team.

Intermittent Application Architecture - the application is intermittent in live, by design.

Not even ‘bugs’, it is just known to be a ‘flaky’ app at times, and the user just presses the button again.

You can’t expect to have deterministic automated execution if your underlying application is non-deterministic.


Common Solutions - Automatability, Automatizability

  • Build apps that can be automated
  • Non-Deterministic apps need step retry strategies rather than test retry strategies
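
A step retry, as opposed to a test retry, might look like this minimal sketch; the attempt count, and what counts as a retryable failure, are assumptions to tune per application.

    import java.util.function.Supplier;

    public class Retry {

        // retry a single known-intermittent step, not the whole test,
        // so genuine failures elsewhere still surface
        public static <T> T step(int attempts, Supplier<T> action) {
            RuntimeException lastFailure =
                    new IllegalArgumentException("attempts must be >= 1");
            for (int attempt = 1; attempt <= attempts; attempt++) {
                try {
                    return action.get();
                } catch (RuntimeException e) {
                    lastFailure = e;
                    System.out.println("step failed on attempt " + attempt
                            + ": " + e.getMessage());
                }
            }
            throw lastFailure;
        }
    }

    // usage, e.g. for a button the app itself sometimes ignores:
    //   Retry.step(3, () -> { submitButton.click(); return true; });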

Top 3 Common Causes - Tools

  • Out of Date

  • Inappropriate

  • Local Tool Infrastructure

  • Out of Date

out of date language bindings

out of date drivers and APIs

  • Inappropriate

using the wrong tool for the job, e.g. using WebDriver to automate a GUI that makes HTTP REST API calls, instead of automating the HTTP REST API directly

  • Local Tool Infrastructure

environment performing automated updates

e.g. browsers update automatically rather than being controlled to match the driver versions

e.g. maintaining a local grid or tool environment

environment not maintained

environment not up all the time

environment out of date


Common Solutions - Tools

  • Use the right tool for the job
  • Keep your tooling environment controlled and up to date
  • Change your approach to take latency into account
    • process on the server, return results
    • return the source, process on the execution client
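
One hedged example of "process on the server, return results" with WebDriver: gather everything in a single executeScript call rather than one findElement round trip per element over a high latency connection. The "#results li" selector is an assumption.

    import org.openqa.selenium.JavascriptExecutor;
    import org.openqa.selenium.WebDriver;
    import java.util.List;

    public class LatencyAwareReads {

        // one round trip to the browser, however many rows there are
        @SuppressWarnings("unchecked")
        public List<String> resultTexts(WebDriver driver) {
            return (List<String>) ((JavascriptExecutor) driver).executeScript(
                    "return Array.from(document.querySelectorAll('#results li'))" +
                    ".map(e => e.textContent);");
        }
    }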

Top 3 Common Causes - State Preconditions

  • Not Checking State Preconditions at start of test

  • Not controlling state preconditions prior to test

  • Precondition setup using same tool

  • Not Checking State Preconditions at start of test

We want to fail fast and at the appropriate point.

We don’t want to ‘do something’ that may or may not pass. We want to make sure it can pass. And report if it can’t.

  • Not controlling state preconditions prior to test

To the best of your ability make it so that your execution can pass. Set up what you need. Lock what you need. Book what you need.

  • Precondition setup using same tool

e.g. a GUI automated execution that has its state set up using GUI automated execution

See Mark Winteringham’s Selenium Conf talk.


Common Solutions - State Preconditions

  • control data
  • precondition state setup - whatever works
    • http, db, api - ‘hack it in’
  • avoid dependencies between executions unless it is a long running test

Use abstraction layers to create the dependencies rather than rely on other tests.
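
A minimal sketch of "hack it in" precondition setup over HTTP, using Java's built-in HttpClient (Java 11+); the URL, payload and expected status code are hypothetical.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class Preconditions {

        // create the user the execution needs, directly over HTTP rather than through the GUI
        public void createUser(String username) throws Exception {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8080/api/users"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(
                            "{\"username\": \"" + username + "\"}"))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            // fail fast, and at the appropriate point, if the precondition could not be created
            if (response.statusCode() != 201) {
                throw new IllegalStateException(
                        "could not create precondition user: " + response.statusCode());
            }
        }
    }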


Top 3 Common Causes - Assumptions Encoded in Assertions

  • Assert on an Ordered Set

  • Assert on Uncontrolled Data

  • Assertion Tolerances

  • Assert on an Ordered Set

multiple data items, asserted in non-guaranteed order e.g. {1,2,3}

sometimes the results come back as {3,2,1}, sometimes {2,1,3}, but we always assert on the ordered sequence {1,2,3}

instead assert on ‘includes’ and ’length’

S ∩ {1} == {1}, S ∩ {2} == {2}, S ∩ {3} == {3}, and |S| == 3

Need to make sure assertions don’t allow duplicates or extras to slip through.
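
A hedged sketch of asserting on ‘includes’ and ‘length’ instead of order (JUnit 5); the fetch method is a stand-in for the real call to the system under test.

    import org.junit.jupiter.api.Test;
    import java.util.List;
    import java.util.Set;
    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertTrue;

    public class OrderIndependentAssertions {

        @Test
        void resultsContainExpectedIdsInAnyOrder() {
            List<Integer> actual = fetchIdsFromSystem();

            // membership plus length: order independent, and duplicates or extras still fail
            assertTrue(actual.containsAll(Set.of(1, 2, 3)));
            assertEquals(3, actual.size());
        }

        // stand-in for the real call; a real run might return {3,1,2} or {2,1,3}
        private List<Integer> fetchIdsFromSystem() {
            return List.of(3, 1, 2);
        }
    }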

  • Assert on Uncontrolled Data

assert on ‘stuff’ not important to test

This might also arise from duplicated assertions.

  • Assertion Tolerances

assertion tolerances not expansive enough, e.g. amended time not within 3 seconds of created time (the operation took longer than expected), or amended time == created time (because the scale used is milliseconds instead of nanoseconds, i.e. not enough granularity to make the distinction)
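
A minimal sketch of a deliberately chosen tolerance (java.time, JUnit 5); the 10 second window is an assumption to be set from what the operation actually takes.

    import java.time.Duration;
    import java.time.Instant;
    import static org.junit.jupiter.api.Assertions.assertTrue;

    public class ToleranceAssertions {

        void assertAmendedWithinTolerance(Instant created, Instant amended) {
            Duration gap = Duration.between(created, amended);

            // amended should never be before created
            assertTrue(!gap.isNegative(), "amended time is before created time");

            // ...and should be within a tolerance we have chosen deliberately
            assertTrue(gap.compareTo(Duration.ofSeconds(10)) <= 0,
                    "amended time was more than 10 seconds after created time: " + gap);
        }
    }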


Common Solutions - Assumptions Encoded in Assertions

  • Logging so you can interrogate failure afterwards
  • Ability to re-run tests with same data and setup
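
One way to get "same data and setup" when data is randomly generated is to log the seed for every run, so a failing run can be repeated exactly; a hedged sketch, where the "test.seed" property name is an assumption.

    import java.util.Random;

    public class ReproducibleData {

        // log the seed on every run; repeat a failing run by passing -Dtest.seed=<loggedValue>
        public static Random seededRandom() {
            long seed = Long.getLong("test.seed", System.currentTimeMillis());
            System.out.println("test data seed = " + seed);
            return new Random(seed);
        }
    }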

Top 3 Common Causes - Data

  • Missing Data
  • Externally controlled data
  • Uncontrolled Data

Data issues are pretty easy to spot, and pretty easy to avoid. But they are all too common, particularly when we view the data as too hard to set up because of complicated conditions, or dates, or the amount of transaction data.

  • Missing Data

  • Externally controlled data
    • Static data
    • Hard-coded data

  • Uncontrolled Data
    • Live Data
    • Randomly Generated Data

A lot of this comes down to the system design and programming notion of “Avoid Global State”, “Avoid Shared State”, or “Avoid Global Variables”.


Common Solutions - Data

  • Create data for each test
  • Avoid test dependencies
  • Avoid re-using data between tests
  • Check data as a precondition
  • Data synchronisation on all precondition data
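
A minimal sketch of "create data for each test": generate uniquely named records per execution so parallel runs and re-runs never share or re-use data; the naming scheme is an assumption.

    import java.util.UUID;

    public class TestData {

        // unique per call, so two tests (or two parallel runs) never share a user
        public static String uniqueUsername() {
            return "user_" + UUID.randomUUID().toString().substring(0, 8);
        }
    }

    // usage in a precondition:
    //   String username = TestData.uniqueUsername();
    //   new Preconditions().createUser(username);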

Summary

  • Your Test Execution is not ‘flaky’, it is failing intermittently
  • It is possible to remove intermittent failures, even when automating through a GUI
  • Common solutions: synchronisation, data control, environmental isolation

Other talks to watch

Search also for: Flaky Selenium, Flaky Automation, Flaky Test Automation


End

Alan Richardson www.compendiumdev.co.uk


BIO

Alan is a Software Development and Testing Coach/Consultant who enjoys testing at a technical level using techniques from psychotherapy and computer science. In his spare time Alan is currently programming a Twitter client called ChatterScan, and a multi-user text adventure game. Alan is the author of the books “Dear Evil Tester”, “Java For Testers” and “Automating and Testing a REST API”. Alan’s main website is compendiumdev.co.uk and he blogs at eviltester.com.

Watch the Full 1.5 Hour Version in Evil Tester Talks