Data Migration & Schema Evolution¶
Status: 🟢 Active | Owner: Data Engineering
Production databases evolve continuously. Doing this safely — without downtime, without breaking running application versions, and without risking data integrity — requires a disciplined process. These standards apply to all schema changes across operational databases.
Core Principles¶
Schema changes are code. They are version-controlled, reviewed, and tested with the same rigour as application code.
Every migration must be backward compatible during the deployment window. In a rolling deployment, the old version of the service runs against the new schema while the new version is starting up, and vice versa during rollback. The schema must work with both.
Destructive operations are always deferred. Dropping a column, renaming a column, or removing a constraint happens in a separate release, after the application code that references that column has been removed and deployed.
Migrations are never edited after they are merged to main. If a migration has a bug, a new migration fixes it. Editing a committed migration causes checksums to fail on other environments.
Approved Migration Tools¶
| Language Stack | Tool | Standard |
|---|---|---|
| Java / Spring Boot | Flyway | Required |
| Python | Alembic | Required |
| TypeScript / Node.js | Prisma Migrate (Prisma projects) or db-migrate | Required |
| Any | Raw SQL migrations via Flyway | Acceptable for multi-language services |
Flyway is the default. If your service uses a different language but needs migration management, Flyway's command-line tool can be used independently of the application language.
Flyway Standards (Java)¶
File Naming¶
V<version>__<description>.sql
V20240315001__create_orders_table.sql
V20240315002__add_customer_email_index.sql
V20240322001__add_order_status_column.sql
Undo migrations (optional, for rollback support):
U20240315001__drop_orders_table.sql
Use the date + sequence format (YYYYMMDDNNN) for version numbers — not sequential integers. This prevents merge conflicts when multiple teams develop migrations simultaneously.
Repeatable Migrations¶
Use repeatable migrations (prefix R__) for objects that should be recreated on every change: views, stored functions, permissions grants.
Required Configuration¶
# application.yml
spring:
flyway:
enabled: true
locations: classpath:db/migration
baseline-on-migrate: false # Never auto-baseline in production
out-of-order: false # Migrations must run in order
validate-on-migrate: true # Checksum validation — never bypass
clean-disabled: true # REQUIRED — prevent accidental data destruction
clean-disabled: true prevents flyway clean from wiping the database. This must never be disabled in any environment that contains real data, including staging.
The Backward-Compatible Migration Pattern¶
The key constraint in zero-downtime deployments is that new code and old code must both work against the new schema for the duration of the deployment window.
The Expand-Contract Pattern¶
Schema evolution follows a three-phase lifecycle:
Phase 1: EXPAND Phase 2: MIGRATE Phase 3: CONTRACT
───────────────── ────────────────── ─────────────────
Add new column Deploy new app code Remove old column
(nullable, no that writes to new (separate release,
default required) column AND reads after old code is
Old code ignores it. from new column. fully retired)
Step-by-Step: Renaming a Column¶
Never rename a column in a single migration. Follow these phases:
Release 1 — Add new column:
-- V20240315001__add_customer_ref_column.sql
ALTER TABLE orders ADD COLUMN customer_reference TEXT;
-- Backfill existing rows (run as separate migration if table is large)
-- V20240315002__backfill_customer_reference.sql
UPDATE orders SET customer_reference = customer_id::TEXT
WHERE customer_reference IS NULL;
ALTER TABLE orders ALTER COLUMN customer_reference SET NOT NULL;
Release 2 — Migrate application code: - Deploy application code that reads from customer_reference and writes to both customer_id and customer_reference. - Validate in production.
Release 3 — Remove old column (separate release, minimum one sprint later):
Safe vs. Unsafe Operations¶
| Operation | Safe in Zero-Downtime? | Notes |
|---|---|---|
ADD COLUMN (nullable) | ✅ Safe | Immediately safe |
ADD COLUMN with DEFAULT (PostgreSQL 11+) | ✅ Safe | Metadata-only in PG 11+ |
ADD COLUMN NOT NULL without default | ❌ Unsafe | Rewrites table; blocks reads/writes |
DROP COLUMN | ⚠️ Safe with guard | Only after all code referencing it is removed |
ADD INDEX CONCURRENTLY | ✅ Safe | Does not lock table |
ADD INDEX (without CONCURRENTLY) | ❌ Unsafe | Locks table for the duration |
ADD CONSTRAINT (CHECK, NOT NULL) | ❌ Unsafe | Full table scan with lock |
ADD CONSTRAINT NOT VALID + VALIDATE CONSTRAINT | ✅ Safe | Two-phase: add without validation, then validate with ShareLock |
RENAME COLUMN | ❌ Unsafe | Breaks code referencing old name until fully deployed |
RENAME TABLE | ❌ Unsafe | As above |
ALTER COLUMN TYPE | ❌ Unsafe (usually) | Rewrites table unless cast is implicit |
Adding a Non-Nullable Column¶
-- ❌ Locks table for full rewrite
ALTER TABLE orders ADD COLUMN priority INTEGER NOT NULL DEFAULT 0;
-- ✅ Safe three-step approach
-- Step 1: Add nullable
ALTER TABLE orders ADD COLUMN priority INTEGER;
-- Step 2: Set default for future rows
ALTER TABLE orders ALTER COLUMN priority SET DEFAULT 0;
-- Step 3: Backfill existing rows in batches (see Large Table Migrations below)
UPDATE orders SET priority = 0 WHERE priority IS NULL;
-- Step 4: Add NOT NULL constraint using NOT VALID first
ALTER TABLE orders ADD CONSTRAINT chk_orders_priority_not_null
CHECK (priority IS NOT NULL) NOT VALID;
-- Step 5: Validate (ShareLock — does not block reads/writes on PG 14+)
ALTER TABLE orders VALIDATE CONSTRAINT chk_orders_priority_not_null;
Large Table Migrations¶
For tables with millions of rows, batch operations are required. A single UPDATE orders SET ... on a large table creates a long-running transaction that holds locks and generates excessive WAL.
Batch Update Pattern¶
-- V20240315003__backfill_priority_batched.sql
-- This script is designed to be run as a Flyway migration or a one-time script.
-- It updates rows in batches of 10,000 to avoid long-running transactions.
DO $$
DECLARE
batch_size INT := 10000;
rows_updated INT;
BEGIN
LOOP
UPDATE orders
SET priority = 0
WHERE id IN (
SELECT id FROM orders
WHERE priority IS NULL
LIMIT batch_size
FOR UPDATE SKIP LOCKED
);
GET DIAGNOSTICS rows_updated = ROW_COUNT;
EXIT WHEN rows_updated = 0;
PERFORM pg_sleep(0.1); -- Brief pause between batches
END LOOP;
END $$;
Index Creation on Large Tables¶
Always use CONCURRENTLY:
-- ❌ Locks the table for minutes on large tables
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);
-- ✅ Runs without locking — takes longer but safe
CREATE INDEX CONCURRENTLY idx_orders_customer_status
ON orders (customer_id, status);
Note: Flyway runs migrations inside a transaction by default. CREATE INDEX CONCURRENTLY cannot run inside a transaction. Use Flyway's @NonTransactional annotation:
// Java annotation to disable transaction wrapping for this migration
@NonTransactional
public class V20240315004__AddConcurrentIndex implements JdbcMigration {
public void migrate(Connection conn) throws Exception {
conn.createStatement().execute(
"CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_customer_status " +
"ON orders (customer_id, status)"
);
}
}
Or use a raw SQL file with -- flyway:nonTransactional comment.
Alembic Standards (Python)¶
# alembic/env.py — required configuration
from logging.config import fileConfig
from sqlalchemy import engine_from_config, pool
from alembic import context
# Use offline mode for generating SQL scripts (useful for review)
# Use online mode for applying to a real database
def run_migrations_online():
connectable = engine_from_config(
config.get_section(config.config_ini_section),
prefix="sqlalchemy.",
poolclass=pool.NullPool, # No pooling in migration context
)
with connectable.connect() as connection:
context.configure(connection=connection, target_metadata=target_metadata)
with context.begin_transaction():
context.run_migrations()
Migration files use op.execute() with raw SQL for complex operations:
# migrations/versions/2024_03_15_001_add_priority.py
def upgrade():
# Safe: ADD COLUMN with nullable first
op.add_column('orders', sa.Column('priority', sa.Integer(), nullable=True))
op.execute("UPDATE orders SET priority = 0 WHERE priority IS NULL")
op.alter_column('orders', 'priority', nullable=False)
def downgrade():
op.drop_column('orders', 'priority')
Data Migration (Operational to Analytical)¶
Beyond schema migrations, data must periodically be migrated between the operational and analytical planes. These have different requirements.
Operational → Analytical (ELT Pipeline)¶
Data flows from PostgreSQL to Snowflake via:
- Change Data Capture (CDC) using Debezium (reads PostgreSQL WAL). Recommended for low-latency replication.
- Scheduled extract via Fivetran or custom Airflow DAG. Acceptable for daily-batch analytical datasets.
All analytical ingestion pipelines are owned and maintained by Data Engineering — not by the product squad.
Product squads are responsible for:
- Notifying Data Engineering when schemas change (via the schema change review process).
- Ensuring new tables/columns that should be replicated are documented and have appropriate data classification labels.
- Not dropping columns without a coordination window with Data Engineering (the pipeline will break if a source column disappears).
Schema Change Coordination¶
Whenever a migration adds, removes, or renames a column in a table that is replicated to the analytical plane, the squad must:
- Open a coordination ticket with Data Engineering at least 5 business days before the migration is deployed to production.
- Data Engineering updates the ELT pipeline to handle the change.
- Both the schema migration and the pipeline update deploy in a coordinated release window.
References¶
Last reviewed: 2025-Q4 | Owner: Data Engineering