Database Selection Framework¶

Status: 🟢 Active | Owner: Data Engineering

Choosing a database is a long-term architectural commitment. Migrating from one database technology to another is expensive, risky, and time-consuming. This framework exists to ensure that database technology choices are made deliberately, with clear understanding of the workload requirements and the total cost of ownership — not based on novelty or familiarity.

All new database technology selections require Data Engineering review before adoption. Any technology not listed as Approved below requires an Architecture Review Board proposal.

The Operational vs. Analytical Decision¶

Before selecting a specific database product, the primary question is whether the workload is operational or analytical. These are fundamentally different workloads with different database families — mixing them causes performance, cost, and governance problems.

Characteristic	Operational (OLTP)	Analytical (OLAP)
Primary access pattern	Row-level reads and writes	Columnar scans across many rows
Query complexity	Simple, indexed lookups; short transactions	Complex aggregations; multi-table scans
Consistency requirement	Strong (ACID)	Eventual acceptable
Latency requirement	Milliseconds	Seconds to minutes
Concurrency	Thousands of short transactions/second	Tens of concurrent analytical queries
Data volume per query	Small (1–1,000 rows)	Large (millions–billions of rows)
Approved database families	Relational, document, cache, time-series	Data warehouse, columnar store
Who queries	Application code	BI tools, data scientists, analysts
Who owns	Product squad	Data Engineering

See Data Modeling Standards for the architectural separation requirements.

Approved Database Technologies¶

Tier 1 — Default (No additional review required)¶

PostgreSQL 16 — Operational Relational¶

Use for: All transactional workloads with structured data. The default choice for any new operational database unless a specific capability gap justifies an alternative.

PostgreSQL covers the vast majority of production workloads. Before selecting any other database, teams must articulate a concrete, measurable capability gap that PostgreSQL cannot address.

Attribute	Standard
Version	PostgreSQL 16 (minimum 15)
Deployment	AWS RDS or Aurora PostgreSQL (Serverless v2 for variable workloads)
Backup	Automated daily snapshots + continuous WAL archiving; 30-day retention
Encryption	At-rest encryption enabled (AES-256); TLS in transit
Connection management	PgBouncer in transaction mode for all services with >10 connections
Extensions approved	`pg_uuid_ossp`, `pg_trgm`, `pgcrypto`, `btree_gin`, `timescaledb` (on approval)

Strengths: ACID compliance, rich indexing, JSONB for semi-structured data, mature tooling, wide engineering familiarity, row-level security, extensive extension ecosystem.

Not suitable for: Columnar analytics at scale (use Snowflake), graph traversal (consider Neo4j — Tier 2), time-series at very high ingest rates (consider TimescaleDB or InfluxDB — Tier 2).

Redis 7 — Cache and Ephemeral State¶

Use for: Session storage, rate limiting, distributed locks, pub/sub messaging, ephemeral feature flags, hot-path caching.

Attribute	Standard
Version	Redis 7.x
Deployment	AWS ElastiCache (Redis) — cluster mode for HA
Persistence	Disabled for pure caches. RDB snapshots enabled for session stores. AOF not approved (write amplification).
Eviction policy	`allkeys-lru` for caches; `noeviction` for session stores
Max memory	Always set `maxmemory` — never allow unbounded growth

Do not use Redis as a primary database. Redis is ephemeral by design. Any data that must survive a Redis restart must have an authoritative source in PostgreSQL.

Snowflake — Analytical / Data Warehouse¶

Use for: Business intelligence, reporting, data science workloads, cross-service analytical queries, historical trend analysis.

Attribute	Standard
Tier	Snowflake Enterprise
Warehouse sizing	XS for development; S for standard BI; M/L for data science on request
Auto-suspend	5 minutes (development), 10 minutes (production)
Data ingestion	Fivetran (SaaS connectors) or custom ELT via dbt
Transformation	dbt (data build tool) — all transformations as SQL models in version control
Access	BI tool service accounts; data scientist user accounts; no direct service code access

Snowflake must not be queried by application services — it is an analytical platform for humans and BI tools, not an operational database for transactional APIs.

Tier 2 — Approved with Data Engineering Review¶

These technologies are approved but require a review with Data Engineering before adoption to confirm fit, operational readiness, and team capability.

Amazon DynamoDB — Key-Value / Document at Scale¶

Use for: Very high write throughput at predictable access patterns; global distribution requirements; session-like data at very large scale.

Not suitable for: Complex queries, ad-hoc access patterns, or any use case where the access patterns are not fully known at design time. DynamoDB's single-table design requires access pattern analysis before the schema can be designed.

Requires: Documented access pattern analysis, single-table design review by Data Engineering, capacity estimation.

Elasticsearch / OpenSearch — Full-Text Search and Log Analytics¶

Use for: Full-text product search, document search, log aggregation and analysis.

Not suitable for: Primary data storage (no ACID, no foreign keys). Elasticsearch is a search index, not a database of record. The source of truth must remain in PostgreSQL.

Requires: Data Engineering review to confirm the indexing strategy and sync mechanism from the authoritative store.

TimescaleDB / InfluxDB — Time-Series¶

Use for: High-ingest metric data, IoT sensor streams, financial tick data, custom observability data at very high volume.

Not suitable for: General-purpose operational data that happens to have timestamps.

Standard observability metrics go to Prometheus/Grafana — time-series databases are for application-level time-series data with specific high-ingest requirements.

Neo4j — Graph Database¶

Use for: Recommendation engines, fraud detection graphs, knowledge graphs, complex network relationship traversal where SQL JOIN depth would be prohibitive.

Requires: Architecture Review Board approval in addition to Data Engineering review. High operational complexity and limited internal expertise.

Tier 3 — Not Approved¶

These technologies are explicitly not approved for new projects. Legacy services using them must have a migration plan.

Technology	Reason Not Approved
MySQL / MariaDB	PostgreSQL is preferred; no capability gap that justifies two relational stacks
MongoDB	PostgreSQL's JSONB covers most document use cases; MongoDB's lack of ACID transactions at join boundaries creates consistency risks
Cassandra	Extreme operational complexity; DynamoDB covers the high-scale key-value use case with lower ops burden
SQLite	Not appropriate for networked services; permitted for local tooling and development only
Oracle	Commercial licence; PostgreSQL covers all capabilities needed
Microsoft SQL Server	Commercial licence; PostgreSQL covers all capabilities needed
CouchDB	No current use case that requires it

Decision Process¶

When selecting a database for a new service or feature, work through this checklist:

Step 1: Is the workload operational or analytical? - Operational → go to Step 2. - Analytical → use Snowflake. Do not pass Go. If you believe you need a different analytical store, engage Data Engineering.

Step 2: Can PostgreSQL handle this workload? - Document the access patterns and estimated data volume. - If PostgreSQL can handle it: use PostgreSQL. - If there is a genuine capability gap (not a performance concern that can be solved with indexing), proceed to Step 3.

Step 3: Which Tier 2 technology addresses the gap? - Review the approved Tier 2 list above. - Engage Data Engineering for a review session. - Document the decision in an ADR.

Step 4: Is it a Tier 3 technology or something not on the list? - Submit an Architecture Review Board proposal. - Proposal must include: capability gap, alternatives considered, operational plan, migration plan if rejected.

Total Cost of Ownership Considerations¶

When evaluating database technologies, teams frequently underweight operational costs. Ensure the following are considered before committing:

Cost Factor	Questions to Ask
Operational expertise	Does the team have engineers who can diagnose and tune this database under production load?
Backup and recovery	How are backups taken? What is the tested RTO and RPO?
Schema migration	How are schema changes applied without downtime?
Monitoring	Do our observability tools have native integration for this database?
Scaling	What does horizontal scaling look like? What is the failure mode at 10× current load?
Vendor lock-in	Can this workload be migrated to an alternative if needed in 2–3 years?
Licence cost	What is the per-GB or per-request cost at projected scale?
Security	Does this technology support the encryption, audit logging, and access control requirements?

References¶

Last reviewed: 2025-Q4 | Owner: Data Engineering