Most software testing activities need data. Sometimes a small sample is enough, but load testing, or testing a feature that must produce a multipage invoice, requires far more than two or three records. Test data generators help with this task by automatically generating hundreds or thousands of customers, products or account items with different attributes for their id, email, name, etc.
Test data generators can work in different modes, from a purely random approach to a more focused or intelligent one. Their goal is to use a predefined data structure to produce the data needed for tests in a specific format that can range from a spreadsheet file to SQL insert statements. This article presents some open source test data generators. Do not hesitate to contact us to suggest any tool that is not yet listed in this article. The products currently included in this article are: Benerator, DataFactory, Data Factory, DataGenerator, generatedata, MockNeat, MySQL Random Data Generator, pydbgen, Spawner, SQLfuzz, Synth, test-data-generator
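To make the idea concrete, here is a minimal Python sketch of the random approach that uses only the standard library. It is not taken from any of the tools listed in this article: the function names, the name lists and the `customers` table are assumptions chosen for illustration. It builds customer records with an id, a name and an email, then serializes them either as CSV or as SQL insert statements.

```python
import csv
import io
import random

# Illustrative value pools; a real generator would ship much larger data sets.
FIRST_NAMES = ["Alice", "Bob", "Carla", "Deepak", "Elena"]
LAST_NAMES = ["Martin", "Nguyen", "Okafor", "Rossi", "Schmidt"]


def generate_customers(count, seed=None):
    """Return `count` customer dicts with id, name and email attributes."""
    rng = random.Random(seed)  # a fixed seed makes the data set reproducible
    customers = []
    for customer_id in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        customers.append({
            "id": customer_id,
            "name": f"{first} {last}",
            # The id keeps each email unique even when names repeat.
            "email": f"{first}.{last}{customer_id}@example.com".lower(),
        })
    return customers


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "name", "email"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_sql_inserts(rows, table="customers"):
    """Serialize the rows as one SQL INSERT statement per row."""
    statements = []
    for row in rows:
        name = row["name"].replace("'", "''")  # escape single quotes
        email = row["email"].replace("'", "''")
        statements.append(
            f"INSERT INTO {table} (id, name, email) "
            f"VALUES ({row['id']}, '{name}', '{email}');"
        )
    return "\n".join(statements)


if __name__ == "__main__":
    sample = generate_customers(3, seed=42)
    print(to_csv(sample))
    print(to_sql_inserts(sample))
```

Passing a seed is a deliberate choice in this sketch: a failing test can then be replayed against exactly the same generated data.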
Updates
July 21 2022:
renamed Databene Benerator to Benerator
added MockNeat, MySQL Random Data Generator, pydbgen
October 19 2021:
added generatedata, SQLfuzz, Synth
Benerator
Benerator is a framework released under both open source and commercial licenses that can be used to generate high-volume test data. This test data generation tool works on Windows and Unix systems. It supports many database systems (Oracle, IBM DB2, MS SQL Server, MySQL, PostgreSQL, …), XML, XML Schema, CSV, flat files and Excel. Benerator also has a plugin system that allows it to be used, for instance, with Eclipse or Maven.
Figure: How Databene Benerator works.
Websites: https://github.com/rapiddweller/rapiddweller-benerator-ce and https://sourceforge.net/projects/benerator/
DataFactory
DataFactory is an open source test data generator that was primarily written for populating databases in development or test environments. It provides values for names, addresses, email addresses, phone numbers, text, and dates. DataFactory can be integrated with Maven.
Website: https://github.com/andygibson/datafactory
Data Factory
Data Factory is an open source Java API that can be used to generate random data. It is useful when developing applications that require a lot of sample data.
Website: https://sourceforge.net/projects/data-factory/
DataGenerator
DataGenerator is an open source Java library that can produce large volumes of data to meet the challenges of the Big Data domain. DataGenerator frames data production as a modeling problem: the user provides a model of dependencies among variables, and the library traverses the model to produce relevant data sets. DataGenerator can be used with IDEs such as Eclipse, IntelliJ IDEA or NetBeans.
Website: http://finraos.github.io/DataGenerator/
generatedata
generatedata is an open source script that is essentially an engine to generate any sort of random data in any format. It currently comes with 30 or so Data Types (types of data it generates), 8 Export Types (formats for the data), plus around 30 data sets for specific countries (city names, regions, etc.). More importantly, it can be extended in any way you want. If you need to generate random data programmatically rather than manually via the UI, you can use the REST API.
Website: https://github.com/benkeen/generatedata
MockNeat
MockNeat is an open source library written in Java that generates arbitrary data. It provides a simple but powerful fluent API that enables developers to create JSON, XML, CSV and SQL data programmatically. It can also act as a powerful substitute for Random or as a mocking library.
Mockneat random data generation example
Websites: https://github.com/nomemory/mockneat, https://www.mockneat.com/
MySQL Random Data Generator
MySQL Random Data Generator presents itself as the easiest random test data generator for MySQL. Load the procedure and execute it to auto-detect column types and load data.
Website: https://github.com/kedarvj/mysql-random-data-generator
pydbgen
pydbgen is an open source Python package that allows random dataframe and database table generation. It generates a random database TABLE (or a Pandas dataframe, or an Excel file) based on the user’s choice of data types (database fields). The user can specify the number of samples needed and can also designate a “PRIMARY KEY” for the database table. Finally, the TABLE is inserted into a new or existing database file of the user’s choice.
Website: https://github.com/tirthajyoti/pydbgen
Spawner
Spawner is a generator of sample and test data for databases. It can be configured to output delimited text or SQL insert statements. It can also insert directly into a MySQL database. It includes many field types, most of which are configurable. Spawner works on Linux and Windows systems.
Figure. Spawner generation screen. Source: http://spawner.sourceforge.net/
Website: http://spawner.sourceforge.net/
SQLfuzz
SQLfuzz is an open source tool for software testing that loads random data into SQL tables for testing purposes. The tool can get the layout of the SQL table and fill it up with random data.
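The general idea of reading a table layout and filling it with random data can be sketched in a few lines of Python. This is not SQLfuzz's implementation or command line: it is a simplified illustration against an SQLite database using the standard `sqlite3` module, and the `fill_table` and `random_value` helpers are hypothetical names. The type matching loosely follows SQLite's type affinity rules.

```python
import random
import sqlite3
import string


def random_value(declared_type, rng):
    """Pick a random value that suits the declared column type."""
    declared = declared_type.upper()
    if "INT" in declared:
        return rng.randint(0, 1_000_000)
    if any(token in declared for token in ("REAL", "FLOA", "DOUB")):
        return rng.uniform(0, 1000)
    # Fall back to a short random string for TEXT and unknown types.
    return "".join(rng.choices(string.ascii_lowercase, k=12))


def fill_table(conn, table, row_count, seed=None):
    """Read the table layout and insert `row_count` random rows."""
    rng = random.Random(seed)
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk).
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # Skip integer primary keys so SQLite assigns the ids itself.
    targets = [
        (col[1], col[2])
        for col in columns
        if not (col[5] and "INT" in col[2].upper())
    ]
    if not targets:
        raise ValueError(f"no fillable columns found for table: {table}")
    names = ", ".join(f'"{name}"' for name, _ in targets)
    marks = ", ".join("?" for _ in targets)
    rows = [
        tuple(random_value(col_type, rng) for _, col_type in targets)
        for _ in range(row_count)
    ]
    conn.executemany(f'INSERT INTO "{table}" ({names}) VALUES ({marks})', rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, score REAL)"
    )
    fill_table(conn, "users", 5, seed=1)
    for row in conn.execute("SELECT * FROM users"):
        print(row)
```

The sketch ignores constraints such as UNIQUE or foreign keys, which a real tool has to take into account.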
Website: https://github.com/PumpkinSeed/sqlfuzz
Synth
Synth is an open source tool for generating realistic data using a declarative data model. Synth is database agnostic and can scale to millions of rows of data. Synth provides a robust, declarative framework for specifying constraint based data generation. Synth provides a flexible declarative data model which you can version control in git, peer review, and automate.
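To illustrate what a declarative data model means in general, the following Python sketch describes each field by its type and constraints and lets a small interpreter produce the rows. The model format, the field types and the `generate_rows` function are hypothetical and are not Synth's actual schema syntax; refer to the Synth documentation for the real format.

```python
import random

# Hypothetical declarative model: each field states its type and constraints.
USER_MODEL = {
    "id": {"type": "sequence", "start": 1},
    "age": {"type": "int", "min": 18, "max": 90},
    "plan": {"type": "choice", "values": ["free", "pro", "team"]},
    "active": {"type": "bool", "probability": 0.8},
}


def generate_rows(model, count, seed=None):
    """Interpret the model and return `count` rows that satisfy it."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    rows = []
    for index in range(count):
        row = {}
        for field, spec in model.items():
            kind = spec["type"]
            if kind == "sequence":
                row[field] = spec.get("start", 1) + index
            elif kind == "int":
                row[field] = rng.randint(spec["min"], spec["max"])
            elif kind == "choice":
                row[field] = rng.choice(spec["values"])
            elif kind == "bool":
                row[field] = rng.random() < spec.get("probability", 0.5)
            else:
                raise ValueError(f"unsupported field type: {kind}")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in generate_rows(USER_MODEL, 3, seed=7):
        print(row)
```

Because the model is plain data, it can be stored in a file, kept under version control and reviewed like any other change.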
Website: https://github.com/getsynth/synth
test-data-generator
test-data-generator is a simple open source Java tool to generate data that can be used with Maven. It supports many kinds of values, such as emails, countries or names. You can produce output in different formats like CSV, TSV or SQL. You can also inject the generated test data directly into a database using a JDBC connection.
Nice list of tools. There are also some free online test data generation services that offer similar features without having to install any software.
Spawner no longer maintained
Thanks for the update. Sourceforge says that the last update was 2020-01-14, so the tool might still have some value I think.