The Hidden Costs of Poor Test Data Management

Your team has some test data. Let’s say you loaded in a backup from production a few months ago. Your QA specialists are familiar with the data, but your software developers hate it. They keep telling you, “It is not the right data,” or, “It is not very useful for testing.” You let them write and automate some tests, but it is not enough, so they keep complaining. They say development is slow, and it is hard to get a feature out the door because of the test data. This doesn’t make a lot of sense to you: it is just data. How bad could it possibly be?

Author: Mark Henke, Enov8, https://www.enov8.com/

Oh, it can be bad. When test data is poorly managed, there can be many hidden costs. Working with badly managed test data is like entering a vast wasteland. The sun scorches the landscape, sand gets in your shoes, and there is no sustenance as far as you can see. Every step through it costs you. When discussing cost, we could talk in terms of time. We could talk in terms of money. But neither is sufficient for something with hidden costs, like unhealthy test data management. Bad test data can have many subtle, morale-dampening effects beyond time and money alone.

So I will categorize all costs as waste. When we manage our test data badly, we waste many things. We waste time. We waste money. But we also waste things like employee happiness, innovation, and mental capacity. In this article, I will cover some of the high-level waste produced by bad test data management. I will also share which unhealthy practices cause these costs. For each cause, I will give guidance on how to measure the associated cost. (By measure, I don’t mean arrive at an exact quantity, but rather get a sense of how much the cost is hurting the team.)

Let’s get started.

Types of Waste

Most waste produced by badly managed test data can be dumped into four categories:

1. Development Waste

Development waste is what we commonly focus on when we think about “wasting time.” Your developers cannot get things done because the test data actively gets in the way of writing code.

2. Testing/Verification Waste

After development waste comes testing and verification waste. This could mean wasted manual testing effort, or the team doing things that prevent you from automating tests. Perhaps they are spending too much energy running around, chatting up business analysts to try to figure out what the data means.

3. Design Waste

Design waste is harder to grasp. In an unhealthy team, very few people are deeply thinking about the problems they are solving with their software. In these cases, the waste may be small or isolated to a few individuals. But a healthy, collaborative team spends a significant amount of time thinking about the problems in front of them before they dive into code. This is where design waste kills. Bad test data can lead to a lack of understanding about the problem the team is solving and cause them to stumble around. The software developers can also lose confidence or even get annoyed at the test data and how useless it is for them. An annoyed developer is not a productive one.

4. Defect Waste

This one is brutal because it will exist no matter how much you spend on the above. You can pour expensive resources into testing, upfront design, and a large development team. But if you’re managing your test data poorly, you will always have defect waste. None of your other expenditures are guaranteed to prevent the bugs. To make it worse, when bugs hit production, poor test data can make them even harder to reproduce and fix.

Sources of Waste

As much fun as it is to learn about different types of waste, let’s get to the meat of the issue. How do we cause waste by poorly managing test data? I will cover some key ways we botch our data, point out what waste each one causes, and explain how we can measure it. I have also included a cheat sheet for quick reference on what the practices are and what waste they cause.

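Poor Practice / Waste            | Development | Design | Testing/Verification | Defect
---------------------------------+-------------+--------+----------------------+-------
Unrealistic Data                 |      X      |   X    |          X           |   X
Data Controlled Outside the Team |      X      |        |          X           |
Opaque State Changes             |             |        |          X           |   X
Heavily Coupled Data             |      X      |   X    |          X           |   X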

1. Unrealistic Data

Testing is about building confidence. When we see data similar to what we expect, we can trust that we’ve done our work right. Unrealistic data hampers this confidence. It confuses your team. Sorting through “mystery data” causes your team to question its logic and double- or triple-check results. Defects easily seep in at the corners of unknown data, in little edge cases that emerge from doubt. Unrealistic data is fine for small unit and component tests, but when you test and verify the system end to end or across services, it can hit your team hard.

Waste It Produces

Unrealistic data produces testing waste, which is pretty straightforward. It also produces development waste as your team tries to figure out how to verify their results. It can even produce some design waste. Unrealistic data is one of the biggest producers of defect waste.

How to Measure Its Cost

You can see signs of unrealistic data when your defects spike but you cannot easily pinpoint a faulty process. These defects are often “one-offs,” yet you consistently get new, different ones. Your developers and testers may say things like, “Our test data makes no sense.”
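If your defect tracker can export each defect with a time period and a root-cause or component tag, you can turn this signal into a rough number. The sketch below assumes exactly that (the `(period, tag)` input shape and all tag values are hypothetical) and reports, for each period, the share of defects whose tag shows up only once. A share that stays high period after period matches the “different one-off every time” pattern; treat it as a hint, not a precise measure.

```python
from collections import Counter, defaultdict


def one_off_ratio_by_period(defects):
    """Return {period: share of that period's defects whose tag appears only once}.

    `defects` is an iterable of (period, tag) pairs, e.g. ("2024-W05", "date-parsing").
    Both the period labels and the tags are hypothetical; use whatever your tracker exports.
    """
    # Group defect tags by the period they were reported in.
    by_period = defaultdict(list)
    for period, tag in defects:
        by_period[period].append(tag)

    ratios = {}
    for period, tags in by_period.items():
        counts = Counter(tags)
        # A "one-off" is a defect whose tag occurs exactly once in its period.
        one_offs = sum(1 for tag in tags if counts[tag] == 1)
        ratios[period] = one_offs / len(tags)
    return ratios


if __name__ == "__main__":
    sample = [
        ("W1", "login"), ("W1", "login"), ("W1", "tax-rounding"),
        ("W2", "null-address"), ("W2", "currency"), ("W2", "timezone"),
    ]
    print(one_off_ratio_by_period(sample))
```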

2. Data Controlled Outside the Team

This one can frustrate a whole host of developers. I have seen data controlled outside of a software team cost the team weeks of development and testing waste. Someone at some point thought it was a good idea to have one group control all the test data for multiple teams. Perhaps this emerged as a result of splitting a large team up and trying to decouple features. This happens a lot to teams that are middlemen, usually orchestrating or composing data from multiple systems.

Waste It Produces

Working with data controlled outside the team emits large swaths of development and testing waste. The data could reset at some interval, ruining any customization or setup your team has made to verify success. Developers and testers often have to spend time working around the test bed.

How to Measure Its Cost

The best way to measure the cost of this waste is to look at blocked cards on your card wall or at work items with long cycle times. The team will often talk about “having to mash something together.” The verification process will seem very ad hoc. Your software developers may also complain that they need another environment, because they want something they can control.
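Cycle times and blocked cards are easy to pull out of most card wall tools. This minimal sketch assumes an export where each card is a dict with hypothetical `id`, `started`, `finished`, and optional `blocked` keys. It reports the median cycle time in days, the cards that took more than `factor` times the median, and the cards marked blocked. The 2x threshold is an arbitrary starting point, not a standard.

```python
from datetime import date
from statistics import median


def summarize_flow(cards, factor=2.0):
    """Return (median cycle days, ids of unusually slow cards, ids of blocked cards).

    Each card is a dict with hypothetical keys "id", "started", "finished"
    (datetime.date values) and an optional boolean "blocked".
    Raises statistics.StatisticsError if `cards` is empty.
    """
    # Cycle time in whole days from start to finish for each card.
    cycle_days = {card["id"]: (card["finished"] - card["started"]).days for card in cards}
    typical = median(cycle_days.values())
    # Flag cards that took much longer than the typical card.
    slow = sorted(cid for cid, days in cycle_days.items() if days > factor * typical)
    blocked = sorted(card["id"] for card in cards if card.get("blocked", False))
    return typical, slow, blocked


if __name__ == "__main__":
    sample = [
        {"id": "A", "started": date(2024, 3, 1), "finished": date(2024, 3, 4)},
        {"id": "B", "started": date(2024, 3, 1), "finished": date(2024, 3, 20), "blocked": True},
    ]
    print(summarize_flow(sample))
```

Cards that land in both the slow and blocked lists are good candidates for a conversation about whether outside-controlled data was what held them up.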

3. Opaque State Changes

Opaque state changes, or state changes that aren’t easily visible, are sneaky buggers. Your system may be so autonomous or so monolithic that you cannot actually peer inside and see what’s going on. Often, when thinking about getting work done, the team focuses only on what the end product needs, not on how to observe it and troubleshoot problems. Your test data will be hard to interpret because it covers only the inputs and outputs of long-running or complex processes.

Waste It Produces

When developing, your developers are (hopefully) writing many small tests so there is little development waste. There will be some testing waste since it’s hard to see inside the system to verify behavior. It is also hard to trace outputs back to their inputs. The big pain here is defect waste. Your team will spend an enormous amount of time debugging and troubleshooting if your data is opaque.

How to Measure Its Cost

The big signal here is that developers will have trouble demoing things. User stories will seem very large, and it will be hard for the team to slice them down and still show progress.

4. Heavily Coupled Data

Heavily coupled data is the curse of the monolith. It’s a very common problem. When you have heavily coupled data, it’s hard to isolate problems and solve them. It’s also hard to debug defects.
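One way to make “hard to isolate” concrete is to count how many other tables must be populated before you can create a single row in a given table. This sketch assumes you can list which tables each table references (for example, from foreign keys). It treats every reference as required, which overstates coupling when some references are nullable, and the table names in the example are hypothetical.

```python
def required_tables(references, table):
    """Return every table that must be populated (transitively) to insert a row into `table`.

    `references` maps a table name to the tables it references, e.g. via foreign keys.
    A visited set keeps reference cycles from looping forever.
    """
    needed = set()
    stack = list(references.get(table, ()))
    while stack:
        current = stack.pop()
        # Skip tables already counted, and the starting table if a cycle leads back to it.
        if current in needed or current == table:
            continue
        needed.add(current)
        stack.extend(references.get(current, ()))
    return needed


def coupling_report(references):
    """Map each table to the number of other tables it transitively depends on."""
    return {t: len(required_tables(references, t)) for t in references}


if __name__ == "__main__":
    schema = {
        "invoice": ["order", "customer"],
        "order": ["customer", "product"],
        "customer": ["region"],
        "product": ["supplier"],
        "supplier": ["region"],
        "region": [],
    }
    print(coupling_report(schema))
```

In this small example schema, creating one invoice drags in five other tables. When most tables in a system depend on most of the others, isolating a single scenario in test data becomes very hard.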

Waste It Produces

Heavily coupled data can spit up every kind of waste. It makes features harder to develop and can send testers chasing all sorts of weird effects. It also ripples defects into odd corners of your system, which makes it really hard to figure out where in your software to put the fix.

How to Measure Its Cost

The biggest measure of how much heavily coupled data is costing you is how often your developers say, “I don’t understand this system.” This lack of understanding will seep into every step of their workflow. You may feel that your team is incompetent, but they are just overwhelmed.

Conclusion: Waste Not, Want Not

If you’ve been feeling the pain of some of these sources of waste but previously couldn’t quite define why, I hope this article brings you some clarity. And if none of these issues seem to be ringing true for you, I am glad that your team has avoided these poor practices. Whatever your case, I hope it’s clear to you that how you manage your test data can create significant costs to your team in the form of all sorts of waste, from user story cycle time to developer burnout.

You can avoid these problems. There are healthier alternatives. We didn’t have time to talk about solutions today; that’s a whole new blog post. There are, however, solutions out there that help you better understand and measure your test data. For now, knowing where the costs come from should help you avoid some of these problems.

About the Author

This article was written by Mark Henke. Mark has spent over 10 years architecting systems that talk to other systems, doing DevOps before it was cool, and matching software to its business function. Every developer is a leader of something on their team, and he wants to help them see that.