Lessons learned from a cloud grid bug

Some lessons learned from working with a cloud grid vendor on a live production bug in their grid system.

Because my Selenium WebDriver with Java course covers as much of the Selenium WebDriver API as I can, it often contains code usage that doesn’t see the light of day on many live projects. It can therefore act as a source of ‘edge cases’ for drivers and grid installations.

A few days ago the course code identified a problem on TestingBot, which they have now fixed.

All my @Test code now runs clean on the TestingBot environment.

This post describes the issue that TestingBot fixed, and some generic lessons learned that I draw from this situation.

Finding The Bug

I ran my @Test code from a Jenkins CI job, configuring the grid setup with properties passed in as -D values:

    browser: firefox
    version: 37
    platform: WIN7
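A minimal sketch of how -D values like these might be turned into grid settings, in plain Java. The property names browser, version and platform are assumptions (the post does not show the course code's actual names). The map keys browserName, version and platform are the legacy desired-capability names that Selenium 2/3-era grids accepted; building a real DesiredCapabilities object would need the Selenium client library, so this sketch stops at a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class GridSettingsFromProperties {

    // Reads grid settings from JVM system properties, e.g. a JVM started with
    //   -Dbrowser=firefox -Dversion=37 -Dplatform=WIN7
    // The property names are hypothetical; defaults apply when a property is absent.
    static Map<String, String> capabilities() {
        Map<String, String> caps = new LinkedHashMap<>();
        caps.put("browserName", System.getProperty("browser", "firefox"));
        caps.put("version", System.getProperty("version", ""));     // "" means any version
        caps.put("platform", System.getProperty("platform", "ANY"));
        return caps;
    }

    public static void main(String[] args) {
        System.out.println(capabilities());
    }
}
```

Keeping the values outside the code means the same tests can be pointed at a different browser, version or platform by changing only the Jenkins job parameters.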

I isolated the problem to a specific action in my code by re-running the first failing JUnit tests listed in the Jenkins build.

I did this by running the specific test methods from IntelliJ, configuring the execution through Environment Variables in the “Run \ Edit Configurations” dialog.

I found the test that triggered the hang:

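    @Test
    public void submitFormWithDropDownFiveUsingKeyboardSpecialKeys(){

        // Driver.get(url) is a helper from the course code (not part of the
        // WebDriver API); it returns a driver with the given page open
        driver = Driver.get("https://testpages.herokuapp.com/styled/" +
                "basic-html-form-test.html");

        WebElement dropDownSelect;
        dropDownSelect = driver.findElement(By.cssSelector("select[name='dropdown']"));

        // Keys.chord joins the special keys into one string for a single
        // sendKeys call: HOME moves to the first option, then four
        // ARROW_DOWN presses move the selection to option 5
        dropDownSelect.sendKeys(
                Keys.chord(
                    Keys.HOME,
                    Keys.ARROW_DOWN,
                    Keys.ARROW_DOWN,
                    Keys.ARROW_DOWN,
                    Keys.ARROW_DOWN));

        // the wait, click and assert helpers are defined elsewhere in the test class
        waitForOption5DropDownSelected();

        clickSubmitButton();

        new WebDriverWait(driver,10).until(
            ExpectedConditions.titleIs("Processed Form Details"));

        assertDropdownValueIsCorrect();
    }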

The TestingBot “Test Steps” output showed that the findElement command had executed, but the sendKeys that followed never completed; the session seemed to freeze on this statement:

        dropDownSelect.sendKeys(
                Keys.chord(
                    Keys.HOME,
                    Keys.ARROW_DOWN,
                    Keys.ARROW_DOWN,
                    Keys.ARROW_DOWN,
                    Keys.ARROW_DOWN));
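A command that hangs like this can hold up a CI job until some outer timeout fires. One generic way to make such a hang show up as a test failure sooner is to run the step on a separate thread and wait for it with a deadline. This is a minimal sketch using only java.util.concurrent; runWithTimeout is a hypothetical helper, not part of Selenium, and the Runnable stands in for a call such as the sendKeys above.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class StepTimeout {

    // Runs one test step on a daemon thread and waits at most timeoutMillis for it.
    // Returns true if the step finished in time, false if the wait timed out.
    static boolean runWithTimeout(Runnable step, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(task -> {
            Thread thread = new Thread(task, "step-with-timeout");
            thread.setDaemon(true); // a hung step must not keep the JVM alive
            return thread;
        });
        try {
            Future<?> result = executor.submit(step);
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (ExecutionException e) {
            // the step itself threw: surface its cause so the test fails with it
            throw new IllegalStateException("step failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for step", e);
        } finally {
            // interrupts the step if it is still running; a thread blocked on
            // socket I/O may ignore the interrupt and stay blocked
            executor.shutdownNow();
        }
    }
}
```

The timed-out call keeps running in its background thread, so this does not fix a hang; it only stops the test from waiting on it, which makes the failing step easier to spot in a CI report.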

I reported the issue to TestingBot through their support contact form, and two days later received an email letting me know it had been fixed. Their explanation:

“… turns out it was a character encoding issue we had in our grid. The Keys.chord sends out Unicode characters, which somewhere in our grid was wrongly converted, causing the selenium node to hang. We use nginx in front of our grid, which in turn was stalling on the request, only timing out after 900 seconds.”
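To see why encoding matters here: Selenium’s Keys values are characters from Unicode’s Private Use Area (the W3C WebDriver key table maps HOME to U+E011, ARROW_DOWN to U+E015 and NULL to U+E000), and Keys.chord appends a trailing Keys.NULL. The plain-Java sketch below rebuilds the chord from those code points, rather than depending on Selenium, and shows how its size changes with encoding. The post does not say exactly where TestingBot’s grid mis-converted the text; reading UTF-8 bytes as ISO-8859-1 is just one illustrative mistake.

```java
import java.nio.charset.StandardCharsets;

final class ChordEncoding {

    // Code points used by Selenium's Keys enum (and the W3C WebDriver key table)
    static final char NULL = '\uE000';
    static final char HOME = '\uE011';
    static final char ARROW_DOWN = '\uE015';

    // Mirrors what Keys.chord(...) produces: the keys concatenated, then NULL
    static String chord(char... keys) {
        StringBuilder builder = new StringBuilder();
        for (char key : keys) {
            builder.append(key);
        }
        builder.append(NULL);
        return builder.toString();
    }

    public static void main(String[] args) {
        String chord = chord(HOME, ARROW_DOWN, ARROW_DOWN, ARROW_DOWN, ARROW_DOWN);
        byte[] utf8 = chord.getBytes(StandardCharsets.UTF_8);

        // An illustrative wrong conversion: read the UTF-8 bytes as ISO-8859-1,
        // then encode the resulting text as UTF-8 again
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        byte[] reencoded = misread.getBytes(StandardCharsets.UTF_8);

        System.out.println("characters:        " + chord.length());   // 6
        System.out.println("UTF-8 bytes:       " + utf8.length);      // 18
        System.out.println("after bad convert: " + reencoded.length); // 36
    }
}
```

A length mismatch like this is one way a request can stall: if a Content-Length header is computed from one representation and the body is sent in another, the receiver may wait for bytes that never arrive.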

So, all good.

Lessons Learned

And here are some generic lessons I draw from this:

Our automated code often has workarounds in it due to the environment
Maintaining a grid can be hard
Cloud Grid systems, by default, have lots of logging support
Your code config should support CI and local debugging
Investigate and raise bugs

Our automated code often has workarounds in it due to the environment

If I were working on a site where this code failed on my grid, I’d probably just change the code and find another way of selecting the item in the drop-down. To be fair to TestingBot, this isn’t the way most people would select an item in a list, but my Selenium WebDriver with Java course is designed to teach the normal ways and to demonstrate alternative approaches because, when I automate, I often have to find workarounds for:

browser issues
WebDriver versioning issues
environment issues
system bugs
etc.

Maintaining a grid can be hard

To be honest, if this had happened on a local grid because of network configuration issues in our environment, I don’t think I’d have been able to track it down and fix it.

I’d have been able to create a workaround in my code, but what would happen next time there is an issue?

Maintaining a local grid can be hard: it takes time to keep it up to date, and up and running. That’s why I try to keep my grid configurations pretty simple, and use cloud-based grid solutions when I need a grid with lots of versions and combinations.

Cloud Grid systems, by default, have lots of logging support

One thing I encounter on live sites is that people often write logging code very early in their testing ‘frameworks’ because they don’t have the kind of default logging support that the TestingBot GUI provides in its “Test Steps” output.

The cloud grid systems (SauceLabs, BrowserStack, TestingBot) all show the WebDriver protocol messages they receive, and capture videos or screenshots of the code executing against the browser.

Consider in your own work environment whether you should write the logging code yourself, or take advantage of the logging already built into the cloud grid systems.
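For comparison, a hedged sketch of what writing step logging yourself can involve: a JDK dynamic proxy (java.lang.reflect.Proxy) that records every call made through an interface before delegating it. The BrowserSteps interface is hypothetical, standing in for whatever driver abstraction a framework wraps; this is not how any cloud grid implements its logging.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;

final class LoggingProxy {

    // Hypothetical abstraction over the browser actions a test performs
    public interface BrowserSteps {
        void open(String url);
        String title();
    }

    // Wraps target so that each call through `type` is appended to log, then delegated
    @SuppressWarnings("unchecked")
    static <T> T logged(Class<T> type, T target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add(method.getName() + (args == null ? "[]" : Arrays.toString(args)));
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the real exception, not the reflection wrapper
            }
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }
}
```

Even this small version needs to be maintained, extended to every interface you want covered, and wired into reporting, which is the effort to weigh against logging the grid already records.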

Your code config should support CI and local debugging

In an earlier blog post I described some ways of configuring your driver abstractions.

This situation shows why that flexibility is needed: I was able to run the @Test code via CI and then, very quickly, run the individual @Test method in the IDE in debug mode. If you have to fiddle with config files to do this, or find it hard, then I recommend you revisit your code so that you can run your test code as and when you need to.
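One way to get that kind of flexibility is to resolve each setting from several sources in a fixed order, so that CI can pass -D values while an IDE run configuration sets environment variables. A minimal sketch, assuming a hypothetical naming convention where a property such as grid.platform maps to the environment variable GRID_PLATFORM; the post does not say how the course code itself does this.

```java
import java.util.Locale;
import java.util.Map;

final class ConfigLookup {

    // Resolves a setting: JVM system property first (e.g. -Dbrowser=firefox from CI),
    // then an environment variable (e.g. BROWSER set in an IDE run configuration),
    // then the supplied default.
    static String resolve(String name, String defaultValue) {
        return resolve(name, defaultValue, System.getenv());
    }

    // Overload that takes the environment as a map, so the precedence can be tested
    static String resolve(String name, String defaultValue, Map<String, String> env) {
        String fromProperty = System.getProperty(name);
        if (fromProperty != null && !fromProperty.isEmpty()) {
            return fromProperty;
        }
        // hypothetical convention: "grid.platform" -> "GRID_PLATFORM"
        String envName = name.replace('.', '_').toUpperCase(Locale.ROOT);
        String fromEnv = env.get(envName);
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return fromEnv;
        }
        return defaultValue;
    }
}
```

With this order, a Jenkins job and an IntelliJ run configuration can each supply settings in their natural way, and neither needs a config file edited before a debug run.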

Investigate and Raise Bugs

Do raise bugs when you find them. You won’t always get as good a response as I did from TestingBot: they thanked me, and actually fixed the problem.

But with a single fix, TestingBot became a viable platform for me to consider as competition to SauceLabs and BrowserStack. That’s a good result.