
7 minute read - tools

Generate Random Tabular Test Data

Oct 30, 2021

What happens if you combine a table editor with test data generation? You get a random tabular test data generation tool.

Among the many tools, games and apps that I’ve written sits a “Markdown and CSV Table Editor”.

I created this because I wanted:

You can use the tabular data generator and editor online, with the source on GitHub.

Why AG Grid?

I wanted a flexible data grid, which ‘just worked’ without my having to do much coding.

I didn’t really want to do much more than:

  • create the grid
  • load it with data

Then I wanted the grid to handle:

  • moving rows about
  • moving columns about
  • exporting CSV

I had to do some coding to:

  • add new rows
  • add new columns
  • rename columns
  • delete rows
  • delete columns

But that was simple enough and I added a custom header in the grid columns to have links to add that functionality.

Fairly quickly I had a CSV editor that could create and export CSV files.

I then wrote a simple routine to parse the grid data and create Markdown tables as well.
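
As a rough idea of what such a routine involves, here is a minimal sketch. It is not the tool's actual code: the function name `toMarkdownTable` and the "headers plus array of rows" input shape are assumptions made for the example.

```javascript
// Minimal sketch: convert column headers and row arrays into a Markdown table.
// `toMarkdownTable` is a hypothetical name, not taken from the tool's source.
function toMarkdownTable(headers, rows) {
  // Escape pipes so cell content cannot break the table layout;
  // null and undefined become empty cells.
  const cell = (value) => String(value ?? "").replace(/\|/g, "\\|");
  const line = (cells) => "| " + cells.map(cell).join(" | ") + " |";

  // One "---" separator cell per column.
  const separator = "| " + headers.map(() => "---").join(" | ") + " |";
  return [line(headers), separator, ...rows.map(line)].join("\n");
}

console.log(toMarkdownTable(["name", "age"], [["Ann", 34], ["Bob", 27]]));
```

The only real work is escaping the pipe character and emitting the separator row; everything else is string joining.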

And I thought I’d probably just stop there.

Learning JavaScript

The way I learn most languages is by creating applications.

I’ve started creating ‘faq’ style repos with small examples to make sure I learn the Unit Test Frameworks for the language at the same time:

e.g.

There are more repos like the above in my GitHub profile.

But primarily, I try to create applications and tools as it helps put concepts into context.

Dangers With Tool Creation

One of the dangers with tool creation is that I end up adding features ‘because I can’. This is fine when I’m using the tool to learn a language, but as soon as I create something that is useful I have to be careful about what I add.

For this tool, I tried to keep the functionality set to the core of data editing.

Some of the features of the tool were driven by a desire to experiment:

  • I wanted to learn how to load files into a browser
  • I wanted to learn how to ‘export’ from a page to file
  • I wanted to learn how to implement drag and drop file uploading in JavaScript

But these were core to the feature set in the application and easy to justify.

Some features I added ‘because I could’:

  • generating JSON
  • generating Gherkin
  • generating HTML Tables

Adding extra imports and exports was useful because it forced me to abstract the data structures into a single internal representation of a data set, which makes them easier to work with. So I only have to ‘export to’ and ‘import from’ a single data set for each format, and I get ‘conversion’ for free, e.g. conversion becomes “import from CSV to internal data set” and “export from internal data set to Markdown”.
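
The "single internal data set" idea can be sketched as below. This is an illustration only: the `{ headers, rows }` shape and the `importers`, `exporters` and `convert` names are assumptions, and the CSV handling is deliberately naive (no quoted fields), which is the kind of detail a library such as PapaParse takes care of.

```javascript
// Sketch of one internal data set with per-format importers and exporters.
// The shape { headers, rows } and all names here are assumptions.
const importers = {
  // Naive CSV: no quoted fields or embedded commas.
  csv: (text) => {
    const [headerLine, ...lines] = text.trim().split(/\r?\n/);
    return {
      headers: headerLine.split(","),
      rows: lines.map((l) => l.split(",")),
    };
  },
  // JSON: an array of flat objects, columns taken from the first object.
  json: (text) => {
    const objects = JSON.parse(text);
    const headers = Object.keys(objects[0] ?? {});
    return { headers, rows: objects.map((o) => headers.map((h) => o[h])) };
  },
};

const exporters = {
  csv: ({ headers, rows }) => [headers, ...rows].map((r) => r.join(",")).join("\n"),
  json: ({ headers, rows }) =>
    JSON.stringify(rows.map((r) => Object.fromEntries(headers.map((h, i) => [h, r[i]])))),
  markdown: ({ headers, rows }) =>
    [headers, headers.map(() => "---"), ...rows]
      .map((r) => "| " + r.join(" | ") + " |")
      .join("\n"),
};

// Conversion is just "import to the internal data set, then export from it".
function convert(text, from, to) {
  return exporters[to](importers[from](text));
}

console.log(convert("name,age\nAnn,34", "csv", "markdown"));
```

With this layout, adding a new format means writing one importer and one exporter, rather than one converter per pair of formats.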

Test Data Generation

I always had ‘test data generation’ in mind when I created the table editor.

I last wrote Test Data Generation tools back in 2005:

This used XML as the data format and was coded in VB, so is completely non-runnable.

I’ve been tempted to revisit it but never found the time.

I created a prototype of this tool in Excel, and it probably no longer works either.

But I released them in case the code is useful for someone.

During the writing of this post I realised that I appear to have experimented with Faker in Java about 3 years ago. I had forgotten completely, but that is why I release as much of my code publicly as possible, otherwise it just gets lost on my hard drive:

Test Data Generation Libraries

Since I wrote my test data generation tools, other libraries have been created in multiple languages to make Test Data Creation easier.

The most popular is Faker.

I think the Perl version may be the ‘original’ but I’m not sure:

Faker is pretty easy to use: it has multiple dictionaries and randomly pulls data in from those dictionaries. It also has helper methods to generate random dates, integers, etc.
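
To make the "dictionaries plus helper methods" idea concrete, here is a toy version of the approach. It is not Faker's code or API: the dictionaries and the `pick`, `randomInt` and `randomDate` helpers are invented for this example.

```javascript
// Toy illustration of dictionary-based generation; this is not Faker's code.
// The dictionaries and function names below are invented for the example.
const dictionaries = {
  firstName: ["Ada", "Grace", "Alan", "Linus"],
  lastName: ["Lovelace", "Hopper", "Turing", "Torvalds"],
};

// Pick a random element from an array.
const pick = (items) => items[Math.floor(Math.random() * items.length)];

// Random integer between min and max, both inclusive.
const randomInt = (min, max) => Math.floor(Math.random() * (max - min + 1)) + min;

// Random date between two dates, both inclusive.
const randomDate = (from, to) => new Date(randomInt(from.getTime(), to.getTime()));

const person = {
  name: `${pick(dictionaries.firstName)} ${pick(dictionaries.lastName)}`,
  age: randomInt(18, 65),
  joined: randomDate(new Date("2020-01-01"), new Date("2021-10-30")),
};
console.log(person);
```

A real library adds far larger dictionaries, locales and many more helpers, which is where the size concern comes from.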

The only reason I can see for not using Faker is if you were concerned that the library was ‘too big’ because of all the text in the dictionaries.

Faker is a really good basis for writing Random Test Data for feeding in to Automated Execution.

Using Random Data When Automating

When using Random data I tend to recommend randomising ‘equivalence classes’:
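
As an illustration of what randomising an equivalence class can look like: instead of hard-coding one value per class, each run picks a random member of the class. The "age must be 18 to 65" rule, the class names and `isValidAge` are all hypothetical, chosen only to keep the sketch small.

```javascript
// Sketch: pick a random representative from each equivalence class.
// The "age 18 to 65" rule and the class names are hypothetical.
const randomInt = (min, max) => Math.floor(Math.random() * (max - min + 1)) + min;

const ageClasses = {
  tooYoung: () => randomInt(0, 17),
  valid: () => randomInt(18, 65),
  tooOld: () => randomInt(66, 120),
};

// Hypothetical system under test.
const isValidAge = (age) => age >= 18 && age <= 65;

// Every value from a class should produce the same outcome,
// so the expected result is known even though the input is random.
for (const [name, generate] of Object.entries(ageClasses)) {
  const age = generate();
  console.log(name, age, isValidAge(age));
}
```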

I do use ad hoc random generation to fuzz and test things, e.g.:

Regex

I didn’t start with Faker when I wanted to add test data generation into my table editor though.

Why?

Faker is a library. It has many API Calls e.g.

I would have to find a way to ‘expose the library’ to a user of the table editor, and I didn’t want to write a parser as the first step in adding random test data generation to the table editor.

Instead I went on the hunt for a Regex library which can generate random data.

Faker can actually do this. There is a helper method in both the Java and JavaScript libraries for generating from a Regex, and there may well be a similar method in all the Faker libraries.

It isn’t a widely used method in those libraries, so it often isn’t backed by a full Regex parser.

I found RandExp.js and it had a single API call, so I thought that would be easy to incorporate.

My first random data generator spec for the tool was a text area where each line was either a ‘name’ or a ‘regex’. For example, this spec uses the example regex from the RandExp site:

sha1 Hash
[a-f0-9]{40}
Password
\w{6,15}
Time
(1[0-2]|0[1-9])(:[0-5]\d){2} (A|P)M

This allowed me to have a very flexible test data generation system with minimal parsing code required.
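
A spec in this format needs very little parsing, as the sketch below suggests. It is an illustration under assumptions, not the tool's code: `parseSpec` and `generateRows` are hypothetical names, lines are assumed to strictly alternate name then rule, and the generator is passed in as a callback so the example runs without any library. With RandExp.js the callback could be something like `(rule) => new RandExp(rule).gen()`.

```javascript
// Sketch of parsing a "name line, then rule line" spec into column definitions.
// `parseSpec` and `generateRows` are hypothetical names.
function parseSpec(specText) {
  // Trim each line and drop blank ones (so a rule cannot end in a space here).
  const lines = specText.split(/\r?\n/).map((l) => l.trim()).filter((l) => l !== "");
  const columns = [];
  // Pair lines up: even index is the column name, odd index is its rule.
  for (let i = 0; i + 1 < lines.length; i += 2) {
    columns.push({ name: lines[i], rule: lines[i + 1] });
  }
  return columns;
}

// Build `count` rows, calling the supplied generator once per cell.
function generateRows(columns, count, generate) {
  return Array.from({ length: count }, () => columns.map((c) => generate(c.rule)));
}

const spec = String.raw`sha1 Hash
[a-f0-9]{40}
Password
\w{6,15}`;

const columns = parseSpec(spec);
// A stand-in generator that just echoes the rule, so the sketch runs without a library.
console.log(generateRows(columns, 2, (rule) => `<${rule}>`));
```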

I don’t know what the Regex limitations in RandExp are, but it seems pretty good with the experiments I’ve run through it.

Issues with Regex for Random Data Generation

Regex is incredibly flexible but doesn’t really understand the semantics of the data it is processing.

It is a syntax based technology.

As an example, to generate a random number between 50 and 999 I’d have to create two conditions in a regex:

  • one which simulates generating a two digit number between 50 and 99
    • [5-9][0-9]
  • one which simulates generating a three digit number between 100 and 999
    • [1-9]([0-9]{2})
  • then combine them
    • [5-9][0-9]|[1-9]([0-9]{2})

Why write ‘simulate’? Because Regex is really just working with Strings and Characters, not primitives like numbers.

We can simulate a lot of ‘stuff’ but it is not the same as generating a random number between 50 and 999.
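
The difference can be seen by putting the two side by side. In this sketch the regex from above is anchored with `^` and `$` and used only to check which strings it accepts, while `randomInt` (a hypothetical helper, not part of any library mentioned here) generates the number directly.

```javascript
// The pattern from above, anchored so it must match the whole string.
const pattern = /^([5-9][0-9]|[1-9][0-9]{2})$/;

// Generating the number directly: the range is stated as numbers, not as character classes.
const randomInt = (min, max) => Math.floor(Math.random() * (max - min + 1)) + min;

// The regex only ever sees text, so check it against the string form of each number.
const matched = [];
for (let n = 0; n <= 1500; n++) {
  if (pattern.test(String(n))) matched.push(n);
}

console.log(matched[0], matched[matched.length - 1], matched.length); // 50 999 950
console.log(randomInt(50, 999));
```

The regex accepts exactly the strings "50" to "999", but the numeric range is only implied by its structure. A generator that chooses between the two alternatives may also not weight them by how many numbers each one covers, so the distribution can differ from a direct numeric pick.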

Faker for More Domain Control

Having created a simple set of functionality for the Regex, I expanded my text area parser to support simple faker calls, e.g.:

faker.name.findName
faker.internet.email
faker.helpers.createCard

The parser is very simple: it splits the line by “.”, checks whether the names match a faker API chain and, if so, calls it.

This allows me to intersperse the faker calls with Regex:

sha1 Hash
faker.git.commitSha
Password
faker.internet.password
Time
(1[0-2]|0[1-9])(:[0-5]\d){2} (A|P)M
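
The "split by dot and look up the call" step described above might look something like the sketch below. It is not the tool's code: `resolveCall` is a hypothetical name and `fakeApi` is a tiny stand-in for the real faker object so that the example runs on its own. What to do with a line that does not resolve (for instance, treating it as a regex) is left to the caller here.

```javascript
// Sketch of resolving "faker.x.y" lines against an API object.
// `fakeApi` is a tiny stand-in for the real faker object, and `resolveCall`
// is a hypothetical name; neither is taken from the tool's source.
const fakeApi = {
  name: { findName: () => "Ada Lovelace" },
  internet: { email: () => "ada@example.com" },
};

// Returns a zero-argument function for a line like "faker.name.findName",
// or null when the line does not name a function on the API.
function resolveCall(line, api) {
  const parts = line.trim().split(".");
  if (parts.shift() !== "faker") return null;

  let owner = null;
  let target = api;
  for (const part of parts) {
    // Own properties only, so inherited names like toString are not treated as API calls.
    const isObject = target !== null && typeof target === "object";
    if (!isObject || !Object.prototype.hasOwnProperty.call(target, part)) return null;
    owner = target;
    target = target[part];
  }
  // Keep the owning object as `this` in case the method relies on it.
  return typeof target === "function" ? () => target.call(owner) : null;
}

const lines = ["faker.name.findName", "faker.internet.email", "[a-f0-9]{40}", "faker.no.such"];
for (const line of lines) {
  const call = resolveCall(line, fakeApi);
  console.log(line, "->", call ? call() : "(no matching call)");
}
```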

There will be some faker API calls that don’t work because they require parameters. The only special case I’ve supported is the faker.fake API, which takes a single argument as a mustache template, e.g.:

faker.fake {{name.lastName}}, {{name.firstName}}
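
To show roughly what happens with such a line, the sketch below separates the template from the call name and expands the placeholders against a stand-in object. In the real tool the expansion is done by faker.fake itself; `fakeApi`, `parseFakeLine` and `expandTemplate` are hypothetical names invented for this illustration.

```javascript
// Sketch of the special case: "faker.fake <template>" lines.
// `fakeApi`, `parseFakeLine` and `expandTemplate` are stand-ins invented for
// this example; the real faker.fake does the expansion itself.
const fakeApi = {
  name: { firstName: () => "Ada", lastName: () => "Lovelace" },
};

// Replace each {{a.b}} placeholder with the result of calling api.a.b().
// Placeholders that do not resolve to a function are left untouched.
function expandTemplate(template, api) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (placeholder, path) => {
    const fn = path.split(".").reduce((obj, key) => (obj == null ? undefined : obj[key]), api);
    return typeof fn === "function" ? fn() : placeholder;
  });
}

// Return the template part of a "faker.fake ..." line, or null for other lines.
function parseFakeLine(line) {
  const prefix = "faker.fake ";
  return line.startsWith(prefix) ? line.slice(prefix.length) : null;
}

const template = parseFakeLine("faker.fake {{name.lastName}}, {{name.firstName}}");
console.log(expandTemplate(template, fakeApi)); // Lovelace, Ada
```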

Limits of Tabular Data Generation

Each of the columns is defined separately in the data generation spec, so there is no way to create dependencies between them, e.g.:

  • derived data like calculating an age from a date
  • conditional data like one column value depending on another

I might add that in the future, but it would require a more complex parser or some sort of ‘exec’ mechanism in the parser to allow JavaScript to be used.

Summary

This is a work-in-progress project that I’m using to learn more about JavaScript development.

I try to make it useful as I proceed, and most features are at an MVP stage, i.e. they do the minimum necessary to meet the user need.

The tool is functional and can be used online for free: Test Data Generator and Table Editor.

Source is available, and since it is a ‘learning’ project the code is pretty simple. It also doesn’t use any build system at the moment; it is pure browser-based JavaScript.

It would not have been possible to create this without adding in additional libraries like AG Grid, RandExp, Faker.js and PapaParse for CSV parsing.