Skip to content

Data Access Patterns & ORM

Status: 🟢 Active  |  Owner: Data Engineering

How application code accesses data determines both the performance characteristics and the security posture of a service. These standards cover the repository pattern, ORM configuration, connection management, and the critical separation between operational and analytical data access.


The Operational / Analytical Access Boundary

The most important access pattern rule has nothing to do with ORMs or connection pools. It is about where different classes of query are allowed to run.

Application services may only access their own operational database. They may not:

  • Query another service's database directly (this is a service coupling antipattern).
  • Run analytical or aggregation queries against their own production database that could degrade operational performance.
  • Connect to the Snowflake analytical store (Snowflake is for BI tools and data science tooling, not service code).

When an analytical result is needed in a service's API response (e.g. a dashboard count, a trend), the correct pattern is:

  1. Compute the aggregate in the analytical pipeline (dbt model in Snowflake).
  2. Materialise the result into a read-optimised store (Redis cache or a PostgreSQL materialised view refreshed on a schedule).
  3. The service API reads from the materialised result — not from an on-demand aggregation.

Repository Pattern

All data access must go through a Repository abstraction. Direct database calls from application services, use-case handlers, or HTTP controllers are not permitted.

HTTP Controller
Application Service / Use Case Handler
Repository Interface (defined in domain layer)
Repository Implementation (defined in infrastructure layer)
Database

The domain and application layers define interfaces; the infrastructure layer provides implementations. This allows the data access layer to be replaced or tested in isolation.

Java Example

// Domain layer — interface only, no database imports
package com.acme.orders.domain;

public interface OrderRepository {
    Optional<Order> findById(OrderId id);
    List<Order> findByCustomerId(CustomerId customerId, Pageable pageable);
    Order save(Order order);
    void delete(OrderId id);
}

// Infrastructure layer — JPA implementation
package com.acme.orders.infrastructure.persistence;

@Repository
public class JpaOrderRepository implements OrderRepository {
    private final SpringDataOrderRepository delegate;

    @Override
    public Optional<Order> findById(OrderId id) {
        return delegate.findById(id.value())
            .map(OrderMapper::toDomain);
    }
}

Python Example

# Domain layer — Protocol (structural interface)
from typing import Protocol

class OrderRepository(Protocol):
    async def find_by_id(self, order_id: str) -> Order | None: ...
    async def find_by_customer(self, customer_id: str, limit: int, offset: int) -> list[Order]: ...
    async def save(self, order: Order) -> Order: ...

# Infrastructure layer — SQLAlchemy implementation
class SqlAlchemyOrderRepository:
    def __init__(self, session: AsyncSession) -> None:
        self._session = session

    async def find_by_id(self, order_id: str) -> Order | None:
        result = await self._session.execute(
            select(OrderRecord).where(OrderRecord.id == order_id)
        )
        record = result.scalar_one_or_none()
        return OrderMapper.to_domain(record) if record else None

ORM Configuration Standards

Java / Spring Data JPA

# application.yml — required JPA configuration
spring:
  jpa:
    open-in-view: false          # REQUIRED — prevents session kept open in web layer
    show-sql: false              # Never true in production (credential leakage risk)
    hibernate:
      ddl-auto: validate         # Never 'create' or 'update' in production
    properties:
      hibernate:
        default_schema: public
        jdbc:
          batch_size: 50         # Enable batch inserts
          order_inserts: true
          order_updates: true
        format_sql: false

open-in-view: false is mandatory. The Open Session in View anti-pattern holds a database connection open for the entire HTTP request duration, exhausting connection pools under load and creating lazy-loading surprises in serialisers.

ddl-auto: validate ensures the schema matches the entity definitions at startup without modifying it. Schema changes are managed exclusively via Flyway migrations.

Python / SQLAlchemy 2

# Async engine configuration
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.orm import sessionmaker

engine = create_async_engine(
    settings.database_url,
    pool_size=10,           # Base pool size
    max_overflow=20,        # Additional connections under load
    pool_pre_ping=True,     # Verify connections before use
    pool_recycle=3600,      # Recycle connections after 1 hour
    echo=False,             # Never True in production
)

async_session = sessionmaker(
    engine,
    class_=AsyncSession,
    expire_on_commit=False,  # Prevent lazy loads after commit
)

TypeScript / Prisma

// prisma/schema.prisma
generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

// prisma client configuration — singleton pattern
import { PrismaClient } from '@prisma/client';

declare global {
  var __prisma: PrismaClient | undefined;
}

const prisma =
  global.__prisma ??
  new PrismaClient({
    log: process.env.NODE_ENV === 'development' ? ['query', 'error'] : ['error'],
  });

if (process.env.NODE_ENV !== 'production') global.__prisma = prisma;

export default prisma;

Connection Pool Management

Connection pools are shared resources. Misconfigured pools are a leading cause of production database incidents.

Sizing Formula

pool_size = (number_of_pods × connections_per_pod) ≤ database_max_connections × 0.8

Example:
  - 10 pods, each with pool_size=10 → 100 total connections
  - RDS db.t3.large max_connections ≈ 170
  - 100 < 170 × 0.8 = 136 ✅

  - 50 pods, each with pool_size=20 → 1,000 total connections — EXCEEDS LIMIT ❌
  - Solution: Deploy PgBouncer in transaction pooling mode

PgBouncer is required for any service where pods × pool_size > 100. PgBouncer multiplexes many application connections onto a smaller number of real PostgreSQL connections, enabling high connection counts without overwhelming the database.

Connection Pool Anti-Patterns

# ❌ Creating a new engine per request — destroys pool effectiveness
@app.get("/orders")
async def list_orders():
    engine = create_async_engine(DATABASE_URL)  # new engine every request!
    async with AsyncSession(engine) as session:
        ...

# ✅ Singleton engine, dependency-injected session
@app.get("/orders")
async def list_orders(session: AsyncSession = Depends(get_session)):
    ...
// ❌ Holding a connection open across a long-running operation
@Transactional
public void processAndWait(String orderId) {
    Order order = repo.findById(orderId); // connection acquired here
    Thread.sleep(30_000);                 // connection held for 30s 🔴
    repo.save(order);
}

// ✅ Keep transactions short — acquire → read → mutate → commit → release
@Transactional
public void process(String orderId) {
    Order order = repo.findById(orderId);
    order.confirm();
    repo.save(order);
}   // connection released here, immediately after commit

Required Connection Lifecycle

Stage Standard
Idle timeout 10 minutes — return idle connections to pool
Connection max lifetime 60 minutes — prevent stale connections
Acquisition timeout 3 seconds — fail fast if pool is exhausted
pool_pre_ping Enabled — test connection before use

Read Replicas

For read-heavy services, route read queries to a read replica to reduce load on the primary.

# Spring DataSource routing
spring:
  datasource:
    primary:
      url: jdbc:postgresql://primary.db.acme.internal:5432/orders
    replica:
      url: jdbc:postgresql://replica.db.acme.internal:5432/orders

Rules for read replica usage:

  • Use replicas only for reads that can tolerate replication lag (typically 100ms–2s on AWS RDS).
  • Never route writes to a replica — they are read-only.
  • Never route reads that immediately follow a write to a replica — the write may not have replicated yet. Use the primary for read-your-own-writes scenarios.
  • Never route reads that are part of a transaction to a replica — keep the full transaction on the primary.

Data Access Security

Principle of Least Privilege — Database Credentials

Each service must use a dedicated database user with only the permissions it requires. No service may share credentials with another service.

-- Create a dedicated service user
CREATE USER order_service WITH PASSWORD 'managed-by-vault';

-- Grant only what is needed — no superuser, no DDL
GRANT CONNECT ON DATABASE orders TO order_service;
GRANT USAGE ON SCHEMA public TO order_service;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO order_service;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO order_service;

-- No DROP, CREATE, ALTER — DDL is for migrations only

Database credentials are managed in Vault and injected at runtime. See Secrets Management.

Production Database Access for Engineers

Direct engineer access to production databases is restricted. The break-glass procedure requires:

  1. Raise an access request in the ITSM tool with business justification.
  2. Approval from the Engineering Manager and SRE Lead.
  3. Access is granted for a maximum of 4 hours via a time-limited Vault credential.
  4. All queries executed are logged to the audit trail.
  5. No data modifications without a separate change request.

Read-only production access for diagnosis during an active P1/P2 incident can be self-approved by the on-call engineer, but must be documented in the incident timeline.


References


Last reviewed: 2025-Q4  |  Owner: Data Engineering