Test Data Management¶

Status: 🟢 Active | Owner: Engineering Enablement | Last Reviewed: 2025-Q4

Overview¶

Test data management is one of the most underinvested areas of testing practice. Poor test data strategies are a common source of flaky tests, slow test suites, and tests that cannot run in isolation.

Good test data management has three properties: isolation (tests don't interfere with each other), determinism (tests produce consistent results), and compliance (tests don't use real customer data).

Core Principles¶

Tests Own Their Data¶

Every test is responsible for creating the data it needs and cleaning up after itself. Tests must never depend on data left behind by another test, data pre-loaded by a setup script, or the order in which tests run. Shared mutable test state is the most common cause of ordering-dependent, intermittently failing tests.
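A minimal sketch of this principle, using the standard-library `sqlite3` module. The `customers` table, the `owned_customer` helper, and the ID format are illustrative assumptions, not part of any existing codebase. The helper creates one uniquely identified row for the test and removes it afterwards, even if the test body raises:

```python
import sqlite3
import uuid
from contextlib import contextmanager


def connect() -> sqlite3.Connection:
    """Open an in-memory database with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, tier TEXT NOT NULL)")
    return conn


@contextmanager
def owned_customer(conn: sqlite3.Connection, tier: str = "standard"):
    """Create a customer for one test and delete it when the test finishes."""
    customer_id = f"cust-{uuid.uuid4()}"  # unique per test, so runs cannot collide
    conn.execute(
        "INSERT INTO customers (id, tier) VALUES (?, ?)", (customer_id, tier)
    )
    try:
        yield customer_id
    finally:
        # Clean-up runs whether the test body passed or raised
        conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))


def count_customers(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]


def test_premium_customer_exists():
    conn = connect()
    with owned_customer(conn, tier="premium") as customer_id:
        row = conn.execute(
            "SELECT tier FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        assert row == ("premium",)
    # Nothing is left behind for the next test
    assert count_customers(conn) == 0
```

In a pytest suite the same shape would typically be written as a fixture that yields the ID and cleans up after the `yield`.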

Never Use Production Data in Tests¶

Using production data in tests — even anonymised — creates regulatory risk, security risk, and data quality problems. All test data must be synthetically generated.

Test Data Should Be Minimal¶

Create only the data required for the specific test. Minimal test data makes tests more readable and reduces setup complexity. A test that requires 50 fields to be populated to test a 2-field business rule is testing more than it needs to.

Test Data Should Be Expressive¶

The data used in a test should make the test's intent clear. Use values that communicate meaning: `customerId: "premium-customer-001"` communicates more than `customerId: "abc123"`. Use named constants or factory methods with descriptive names.
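As a rough illustration of minimal and expressive data together, the sketch below uses a hypothetical `Order` dataclass, an `an_order` factory, and an invented free-shipping rule. All names and values are assumptions made for the example. The test names only the field the rule depends on and leaves everything else to defaults:

```python
from dataclasses import dataclass, replace
from decimal import Decimal


@dataclass(frozen=True)
class Order:
    customer_id: str
    total: Decimal
    status: str = "PENDING"


# Named constants state why a value was chosen
PREMIUM_CUSTOMER = "premium-customer-001"
FREE_SHIPPING_THRESHOLD = Decimal("50.00")


def an_order(**overrides) -> Order:
    """Return a valid order; callers override only the fields they care about."""
    base = Order(customer_id="any-customer-001", total=Decimal("10.00"))
    return replace(base, **overrides)


def qualifies_for_free_shipping(order: Order) -> bool:
    """Hypothetical business rule under test."""
    return order.total >= FREE_SHIPPING_THRESHOLD


def test_order_at_threshold_ships_free():
    # Only the total matters for this rule, so only the total is specified
    order = an_order(total=FREE_SHIPPING_THRESHOLD)
    assert qualifies_for_free_shipping(order)
```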

Test Data Patterns¶

Object Mother Pattern¶

The Object Mother provides a set of pre-built, named test objects that represent common domain scenarios. Each method returns a fully-configured object for a specific scenario:

// Java — Object Mother
import java.time.Instant;
import java.util.List;

public class OrderMother {

    public static Order standardOrder() {
        return Order.builder()
            .id(OrderId.of("ord-standard-001"))
            .customerId(CustomerId.of("cust-001"))
            .status(OrderStatus.PENDING)
            .items(List.of(OrderItemMother.standardItem()))
            .total(Money.of(99.99, GBP))
            .createdAt(Instant.parse("2025-01-15T10:00:00Z"))
            .build();
    }

    public static Order highValueOrder() {
        return standardOrder().toBuilder()
            .total(Money.of(999.99, GBP))
            .build();
    }

    public static Order orderForCustomer(String customerId) {
        return standardOrder().toBuilder()
            .customerId(CustomerId.of(customerId))
            .build();
    }
}

# Python — Object Mother
import dataclasses
from decimal import Decimal

class OrderMother:
    @staticmethod
    def standard_order() -> Order:
        return Order(
            id=OrderId("ord-standard-001"),
            customer_id=CustomerId("cust-001"),
            status=OrderStatus.PENDING,
            items=[OrderItemMother.standard_item()],
            total=Money(Decimal("99.99"), Currency.GBP),
        )

    @staticmethod
    def for_customer(customer_id: str) -> Order:
        order = OrderMother.standard_order()
        return dataclasses.replace(order, customer_id=CustomerId(customer_id))

Test Builder Pattern¶

For tests that need fine-grained control over specific fields, use the Builder pattern with sensible defaults:

// TypeScript — Test Builder
class OrderBuilder {
  private data: Partial<Order> = {
    id: 'ord-test-001',
    customerId: 'cust-test-001',
    status: 'PENDING',
    items: [buildStandardItem()],
    total: 99.99,
    currency: 'GBP',
    createdAt: new Date('2025-01-15T10:00:00Z'),
  };

  withCustomerId(customerId: string): this {
    this.data.customerId = customerId;
    return this;
  }

  withStatus(status: OrderStatus): this {
    this.data.status = status;
    return this;
  }

  withTotal(total: number): this {
    this.data.total = total;
    return this;
  }

  build(): Order {
    // Return a copy so later builder calls cannot mutate an already-built order
    return { ...this.data } as Order;
  }
}

// Usage in tests
const vipOrder = new OrderBuilder()
  .withCustomerId('vip-customer-001')
  .withTotal(1500)
  .build();

Database Test Data¶

Transaction Rollback (preferred for unit/integration tests)¶

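Wrap each test in a transaction and roll it back after the test completes. Nothing is committed, so no separate clean-up step is needed, which usually makes this the fastest and cleanest approach. Work that commits in its own transaction is not covered by the rollback. In Spring, annotating the test class with `@Transactional` does this: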

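@SpringBootTest
@Transactional  // Spring rolls back after each test
class OrderRepositoryTest {
    @Autowired
    private OrderRepository orderRepository;  // assumed Spring Data repository

    @Test
    void should_find_order_by_id() {
        Order saved = orderRepository.save(OrderMother.standardOrder());

        // getId() and findById() are assumed accessors; assertThat is AssertJ's
        assertThat(orderRepository.findById(saved.getId())).isPresent();
        // Transaction is automatically rolled back after this test
    }
}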
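The same rollback idea can be sketched outside Spring. This is a minimal Python example using the standard-library `sqlite3` module; the `rolled_back` helper and the `orders` table are hypothetical names chosen for illustration:

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def rolled_back(conn: sqlite3.Connection):
    """Run a block of test code, then discard every change it made."""
    try:
        yield conn
    finally:
        # Undo all uncommitted inserts, updates, and deletes from the block
        conn.rollback()


def count_orders(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


def test_order_is_visible_inside_the_transaction():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY)")
    conn.commit()  # the schema is committed; only test data is rolled back

    with rolled_back(conn):
        conn.execute("INSERT INTO orders (id) VALUES (?)", ("ord-standard-001",))
        assert count_orders(conn) == 1

    # The row never reached the database permanently
    assert count_orders(conn) == 0
```

The test must not commit inside the block. Once code under test commits, the rollback has nothing left to undo, and a reset-based approach is needed instead.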

Testcontainers with Schema Reset¶

For tests that explicitly test transaction behaviour (and therefore cannot use rollback), run the database in Testcontainers and reset its contents between tests, for example by deleting the rows of every table:

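import pytest

@pytest.fixture(autouse=True)
def reset_database(db_engine):
    """Delete all rows from every table before each test."""
    # Reverse dependency order so child rows go before the parents they reference
    with db_engine.begin() as conn:  # commits when the block exits without error
        for table in reversed(Base.metadata.sorted_tables):
            conn.execute(table.delete())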
Database Seeding for Integration Environments¶

For shared integration or staging environments, use a deterministic seed script that creates a baseline set of known test data. This data:

- Is identified with a prefix (TEST_ or similar) to distinguish it from real data.
- Is created idempotently: running the script multiple times produces the same result.
- Is documented with the scenarios it enables.
- Is never used in automated test assertions (automated tests create their own data).
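One way to get an idempotent seed script is to upsert rows keyed by fixed, prefixed IDs. The sketch below uses the standard-library `sqlite3` module and SQLite's `INSERT OR REPLACE`. The `customers` table and the seed rows are assumptions made for illustration, and other databases use different upsert syntax:

```python
import sqlite3

# Baseline rows: fixed IDs with a TEST_ prefix, one per documented scenario
SEED_CUSTOMERS = [
    ("TEST_cust-premium", "premium"),    # scenario: premium pricing
    ("TEST_cust-standard", "standard"),  # scenario: default checkout
]


def seed(conn: sqlite3.Connection) -> None:
    """Create or restore the baseline rows; safe to run any number of times."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customers "
            "(id TEXT PRIMARY KEY, tier TEXT NOT NULL)"
        )
        # Existing rows with the same ID are replaced rather than duplicated
        conn.executemany(
            "INSERT OR REPLACE INTO customers (id, tier) VALUES (?, ?)",
            SEED_CUSTOMERS,
        )


def seeded_rows(conn: sqlite3.Connection) -> list:
    return conn.execute(
        "SELECT id, tier FROM customers WHERE id LIKE 'TEST_%' ORDER BY id"
    ).fetchall()
```

Running `seed` twice leaves exactly the same rows as running it once, and it also restores a baseline row that someone has edited by hand.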

Test Data and Privacy Compliance¶

Synthetic Data Generation¶

For tests that require realistic-looking data (e.g., name, address, email), use a synthetic data generation library rather than real customer data:

Language	Library
Java	DataFaker
Python	Faker
TypeScript	@faker-js/faker

from faker import Faker

Faker.seed(4321)  # fixed seed so generated values are reproducible across runs
fake = Faker("en_GB")

def build_customer() -> Customer:
    return Customer(
        name=fake.name(),
        email=fake.email(),
        phone=fake.phone_number(),
        address=Address(
            line1=fake.street_address(),
            city=fake.city(),
            postcode=fake.postcode(),
        )
    )
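Randomly generated values can undermine determinism unless the generator is seeded. The stdlib-only sketch below shows the idea with a local `random.Random` instance. The `SyntheticCustomer` type, the name lists, and the `build_customers` helper are hypothetical, and emails use the reserved `example.com` domain so they can never reach a real mailbox:

```python
import random
from dataclasses import dataclass

FIRST_NAMES = ["Alex", "Sam", "Priya", "Jordan"]
LAST_NAMES = ["Taylor", "Okafor", "Nguyen", "Smith"]


@dataclass(frozen=True)
class SyntheticCustomer:
    name: str
    email: str


def build_customers(count: int, seed: int) -> list[SyntheticCustomer]:
    """Generate the same synthetic customers every time for a given seed."""
    rng = random.Random(seed)  # local generator: no global state shared between tests
    customers = []
    for index in range(count):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        # The index keeps emails unique even when a name repeats
        email = f"{first}.{last}.{index}@example.com".lower()
        customers.append(SyntheticCustomer(name=f"{first} {last}", email=email))
    return customers
```

A failing test can then be reproduced by re-running it with the same seed.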

Personal Data in Tests¶

- Never copy production data to development or test environments without a documented, approved anonymisation pipeline.
- If test data must be seeded in a shared environment, ensure all PII fields are replaced with synthetic values.
- All engineers must complete data privacy training before working with any customer data, including anonymised subsets.

Test Data Anti-Patterns¶

Anti-Pattern	Problem	Solution
Shared mutable state	Tests affect each other; ordering-dependent failures	Transaction rollback or per-test reset
Hard-coded IDs for persisted data	Tests fail when IDs collide with existing rows or with IDs used by other tests	Generate unique IDs per test (`UUID.randomUUID()`), optionally with a descriptive prefix
Production data in tests	Privacy risk, unreliable, changes over time	Synthetic data generation
Massive test fixtures	Hard to understand what the test actually needs	Minimal, targeted test data per test
Database state from previous runs	Non-deterministic test results	Rollback or clean-up in `teardown`
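For Python test suites, the closest counterpart to `UUID.randomUUID()` is `uuid.uuid4()`. A small sketch, with `unique_id` as a hypothetical helper name, that keeps persisted IDs unique per test while still carrying a readable prefix:

```python
import uuid


def unique_id(prefix: str) -> str:
    """Return an ID that is unique per call but still says what it represents."""
    return f"{prefix}-{uuid.uuid4()}"


# Two tests asking for a "premium customer" never share a row
first_customer = unique_id("premium-customer")
second_customer = unique_id("premium-customer")
assert first_customer != second_customer
```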
