In most of your software testing activities, you need data. Sometimes a small sample is enough, but if you want to perform load testing, or to test a feature that produces a multipage invoice, you need more than just two or three records. Test data generators are tools that help with this task by automatically generating hundreds or thousands of customer, product or account items with varied attributes such as id, email or name.
Test data generators can work in different modes, ranging from a purely random approach to more focused or intelligent generation. Their goal is to use a predefined data structure to produce the data needed for testing in a specific format, which can range from a spreadsheet file to SQL INSERT statements. For more information about test data generation, you can visit https://en.wikipedia.org/wiki/Test_data_generation
This article presents some open source test data generators. Do not hesitate to contact us to include any tool that is not yet listed in this article. The products currently included in this article are: Databene Benerator, DataFactory, Data Factory, DataGenerator, generatedata, Spawner, SQLfuzz, Synth, test-data-generator
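To make the idea concrete, here is a minimal sketch in Python of the random approach: it builds customer records from small, hypothetical value pools and exports them either as CSV or as SQL INSERT statements. The names used here (`generate_customers`, `to_csv`, `to_sql_inserts`, the `customer` table) are illustrative assumptions and do not come from any of the tools listed below.

```python
import csv
import io
import random

# Hypothetical value pools; real generators ship much larger data sets.
FIRST_NAMES = ["Alice", "Bob", "Chloe", "David", "Emma"]
LAST_NAMES = ["Martin", "Smith", "Garcia", "Miller", "Rossi"]
DOMAINS = ["example.com", "example.org"]


def generate_customers(count, seed=42):
    """Return `count` random customer records with id, name and email."""
    rng = random.Random(seed)  # a fixed seed makes the test data reproducible
    customers = []
    for customer_id in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        # Appending the id keeps every email address unique.
        email = f"{first}.{last}{customer_id}@{rng.choice(DOMAINS)}".lower()
        customers.append({"id": customer_id, "name": f"{first} {last}", "email": email})
    return customers


def to_csv(rows):
    """Export the records as CSV text: a header line plus one line per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "name", "email"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_sql_inserts(rows, table="customer"):
    """Export the records as SQL INSERT statements, escaping single quotes."""
    statements = []
    for row in rows:
        name = row["name"].replace("'", "''")
        email = row["email"].replace("'", "''")
        statements.append(
            f"INSERT INTO {table} (id, name, email) "
            f"VALUES ({row['id']}, '{name}', '{email}');"
        )
    return statements


if __name__ == "__main__":
    customers = generate_customers(1000)
    print(to_csv(customers[:3]))
    print("\n".join(to_sql_inserts(customers[:3])))
```

Seeding the random generator keeps the data reproducible, which helps when a failing test has to be rerun with exactly the same input.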
October 19 2021: added generatedata, SQLfuzz, Synth
Databene Benerator is a framework, released under both open source and commercial licenses, that can be used to generate high-volume test data. This test data generation tool works on Windows and Unix systems. It supports many database systems (Oracle, IBM DB2, MS SQL Server, MySQL, PostgreSQL, …), XML, XML Schema, CSV, flat files and Excel. Databene Benerator also has a plugin system that allows it, for instance, to be used with Eclipse or Maven.
Figure: How Databene Benerator works. Source http://databene.org/databene-benerator
DataFactory is an open source test data generator that allows you to easily generate test data. It was primarily written for populating databases in development or test environments, providing values for names, addresses, email addresses, phone numbers, text and dates. DataFactory can be integrated with Maven.
Data Factory is an open source Java API that can be used to generate random data. It is useful when developing applications that require a lot of sample data.
DataGenerator is an open source Java library that can produce large volumes of data to meet the challenges of the Big Data domain. DataGenerator frames data production as a modeling problem: the user provides a model of dependencies among variables, and the library traverses the model to produce relevant data sets. DataGenerator can be used with IDEs like Eclipse, IntelliJ IDEA or NetBeans.
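The modeling idea can be illustrated with a toy sketch: each variable declares its possible values, possibly as a function of variables defined before it, and a depth-first traversal enumerates every consistent combination. This is a hypothetical illustration of the concept only; it does not use or reproduce DataGenerator's actual API, and the `MODEL` contents and `traverse` function are invented for the example.

```python
# Hypothetical model: an ordered list of (variable, domain) pairs. Each domain
# is a function of the partial record, so a variable can depend on the
# variables that come before it.
MODEL = [
    ("country", lambda rec: ["US", "FR"]),
    ("currency", lambda rec: ["USD"] if rec["country"] == "US" else ["EUR"]),
    ("account_type", lambda rec: ["checking", "savings"]),
]


def traverse(model, index=0, record=None):
    """Depth-first traversal of the model, yielding every consistent record."""
    if record is None:
        record = {}
    if index == len(model):
        yield dict(record)  # yield a copy, since `record` is reused
        return
    name, domain = model[index]
    for value in domain(record):
        record[name] = value
        yield from traverse(model, index + 1, record)
    record.pop(name, None)  # undo this level before backtracking


if __name__ == "__main__":
    for record in traverse(MODEL):
        print(record)
```

Because the currency depends on the country, the traversal never produces an inconsistent pair such as a US account in EUR: only 4 of the 8 unconstrained combinations are generated.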
generatedata is an open source script that is essentially an engine to generate any sort of random data in any format. It currently comes with 30 or so Data Types (the kinds of data it generates), 8 Export Types (the formats for the data), plus around 30 data sets for specific countries (city names, regions, etc.). More importantly, it can be extended in any way you want. If you need to generate random data programmatically rather than manually through the UI, you can use its REST API.
Spawner is a generator of sample and test data for databases. It can be configured to output delimited text or SQL insert statements, and it can also insert directly into a MySQL database. It includes many field types, most of which are configurable. Spawner works on Linux and Windows systems.
Figure: Spawner generation screen. Source: http://spawner.sourceforge.net/
SQLfuzz is an open source software testing tool that loads random data into SQL tables. It reads the layout of an SQL table and fills the table with random data.
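The approach described for SQLfuzz (read the table layout, then fill it with random values) can be sketched as follows. This is not SQLfuzz code: it is a hypothetical Python example that uses SQLite's `PRAGMA table_info` to discover column names and types, and it assumes that primary key columns are assigned by the database. The `random_value` and `fuzz_table` helpers and the `product` table are invented for illustration.

```python
import random
import sqlite3
import string


def random_value(sql_type, rng):
    """Return a random Python value matching a (simplified) column type."""
    sql_type = sql_type.upper()
    if "INT" in sql_type:
        return rng.randint(0, 1_000_000)
    if any(t in sql_type for t in ("REAL", "FLOA", "DOUB")):
        return rng.uniform(0, 1000)
    # Fall back to a random lowercase string for TEXT and anything else.
    return "".join(rng.choices(string.ascii_lowercase, k=10))


def fuzz_table(conn, table, rows, seed=0):
    """Read the table layout, then insert `rows` rows of random data."""
    rng = random.Random(seed)
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    # Primary key columns are skipped, assuming the database assigns them
    # (as SQLite does for INTEGER PRIMARY KEY). `table` must be trusted input.
    columns = [
        (name, col_type)
        for _cid, name, col_type, _notnull, _default, pk
        in conn.execute(f"PRAGMA table_info({table})")
        if not pk
    ]
    names = ", ".join(name for name, _ in columns)
    placeholders = ", ".join("?" for _ in columns)
    data = [
        tuple(random_value(col_type, rng) for _, col_type in columns)
        for _ in range(rows)
    ]
    conn.executemany(f"INSERT INTO {table} ({names}) VALUES ({placeholders})", data)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL, stock INTEGER)"
    )
    fuzz_table(conn, "product", 500)
    print(conn.execute("SELECT COUNT(*) FROM product").fetchone()[0])
```

Values are bound through `?` placeholders rather than pasted into the SQL string, so random text can never break the statement.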
Synth is an open source tool for generating realistic data from a declarative data model. Synth is database agnostic and can scale to millions of rows of data. It provides a robust declarative framework for constraint-based data generation, and because the data model is declarative, you can version control it in git, peer review it and automate it.
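A declarative data model means that the shape of the data is described as data rather than code, and a generic engine interprets it. The sketch below shows the principle with a hypothetical schema format (`count`, `fields`, and the `sequence`, `int` and `choice` field types are invented for this example); it is not Synth's schema syntax and does not call Synth.

```python
import json
import random

# Hypothetical declarative model: plain data that could live in a JSON file.
SCHEMA = {
    "count": 5,
    "fields": {
        "id": {"type": "sequence", "start": 1},
        "age": {"type": "int", "min": 18, "max": 90},
        "plan": {"type": "choice", "values": ["free", "pro", "enterprise"]},
    },
}


def generate(schema, seed=0):
    """Interpret the declarative schema and return a list of records."""
    rng = random.Random(seed)
    records = []
    for i in range(schema["count"]):
        record = {}
        for name, spec in schema["fields"].items():
            kind = spec["type"]
            if kind == "sequence":
                record[name] = spec["start"] + i
            elif kind == "int":
                # Constraint: value stays within [min, max], both inclusive.
                record[name] = rng.randint(spec["min"], spec["max"])
            elif kind == "choice":
                record[name] = rng.choice(spec["values"])
            else:
                raise ValueError(f"unknown field type: {kind}")
        records.append(record)
    return records


if __name__ == "__main__":
    print(json.dumps(generate(SCHEMA), indent=2))
```

Because `SCHEMA` is plain data, it could be saved as a JSON file, committed to git and reviewed like any other file, while the engine code stays unchanged.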
test-data-generator is a simple open source Java tool for generating test data that can be used with Maven. It supports many kinds of data values, such as emails, countries or names. You can produce output in different formats such as CSV, TSV or SQL, and you can also inject the generated test data directly into a database using a JDBC connection.