Performance & Load Testing¶

Status: 🟢 Active | Owner: Engineering Enablement | Last Reviewed: 2025-Q4

Overview¶

Performance testing verifies that a system meets its latency, throughput, and reliability requirements under expected and peak load conditions. It must be an explicit part of the development and release process for any service that handles user-facing traffic or participates in a critical business flow.

Performance testing is not a one-time pre-launch activity. Establish baselines early, run tests on every significant change, and treat performance regressions with the same urgency as functional bugs.

Performance Test Types¶

Type	Purpose	When to Run
Load test	Verify behaviour under expected normal load	Before every release
Stress test	Find the breaking point; verify graceful degradation	Quarterly or before peak traffic periods
Spike test	Verify behaviour under sudden traffic bursts	For services exposed to viral/marketing events
Soak test	Identify memory leaks and degradation over time	Monthly or after major dependency changes
Smoke test	Quick sanity check — minimal load, verify baseline	On every deployment
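
The test types above differ mainly in the shape of the load they apply. As a rough sketch, each type can be expressed as a k6 `stages` profile. Every duration and VU target below is a placeholder assumption, not a recommendation, and `PROFILES` and `stagesFor` are hypothetical names introduced for illustration.

```javascript
// Illustrative load profiles, one per test type from the table above.
// All numbers are placeholder assumptions; derive real values from traffic data.
const PROFILES = {
  smoke: [{ duration: '2m', target: 5 }],       // minimal load, quick sanity check
  load: [
    { duration: '2m', target: 100 },            // ramp to expected normal load
    { duration: '10m', target: 100 },           // hold
    { duration: '2m', target: 0 },              // ramp down
  ],
  stress: [
    { duration: '5m', target: 100 },
    { duration: '5m', target: 200 },
    { duration: '5m', target: 400 },            // keep stepping up until the service degrades
    { duration: '5m', target: 0 },              // observe recovery
  ],
  spike: [
    { duration: '1m', target: 20 },
    { duration: '10s', target: 500 },           // sudden burst
    { duration: '3m', target: 500 },
    { duration: '10s', target: 20 },            // burst ends
    { duration: '2m', target: 20 },
  ],
  soak: [
    { duration: '5m', target: 100 },
    { duration: '4h', target: 100 },            // long hold to surface leaks and slow degradation
    { duration: '5m', target: 0 },
  ],
};

// Return the stages for a test type, failing loudly on a typo.
function stagesFor(testType) {
  const stages = PROFILES[testType];
  if (!stages) {
    throw new Error(`Unknown test type: ${testType}`);
  }
  return stages;
}

// In a k6 script (TEST_TYPE is a hypothetical environment variable):
// export const options = { stages: stagesFor(__ENV.TEST_TYPE || 'smoke') };
```

Keeping the profiles in one place lets a single script serve several test types, at the cost of hiding the load shape from anyone reading only the script.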

Approved Tool: k6¶

k6 is the enterprise standard for performance and load testing. It uses JavaScript for test scripting, has excellent CI/CD integration, and produces structured output that integrates with Grafana for visualisation.

// k6 — basic load test
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

const errorRate = new Rate('errors');
const orderCreationDuration = new Trend('order_creation_duration', true);  // true = values are durations in ms

export const options = {
  stages: [
    { duration: '2m', target: 50 },   // Ramp up to 50 users over 2 minutes
    { duration: '5m', target: 50 },   // Hold at 50 users for 5 minutes
    { duration: '2m', target: 100 },  // Ramp up to 100 users
    { duration: '5m', target: 100 },  // Hold at 100 users
    { duration: '2m', target: 0 },    // Ramp down
  ],
  thresholds: {
    'http_req_duration': ['p(95)<500', 'p(99)<1000'],  // SLO: p95 < 500ms, p99 < 1s
    'http_req_failed': ['rate<0.01'],               // < 1% error rate
    'errors': ['rate<0.01'],
  },
};

export default function () {
  const payload = JSON.stringify({
    customerId: 'perf-test-customer',
    items: [{ sku: 'PERF-SKU-001', quantity: 1 }],
  });

  const response = http.post(
    `${__ENV.BASE_URL}/api/v1/orders`,
    payload,
    { headers: { 'Content-Type': 'application/json', 'Authorization': `Bearer ${__ENV.API_TOKEN}` } }
  );

  // Record k6's own measurement of the request (sending + waiting + receiving, in ms)
  // rather than wall-clock timing taken in script code.
  orderCreationDuration.add(response.timings.duration);
  errorRate.add(response.status !== 201);

  check(response, {
    'status is 201': (r) => r.status === 201,
    // Only parse the body on success; error responses may not be JSON.
    'response has order ID': (r) => r.status === 201 && JSON.parse(r.body).orderId !== undefined,
  });

  sleep(1);  // Think time between requests
}

Defining Performance SLOs¶

Every service handling user-facing traffic must have documented performance SLOs. Define these before the first load test so you have a target to measure against.

Recommended starting point for REST APIs:

Metric	Target	Maximum
p50 latency	< 100ms	—
p95 latency	< 500ms	—
p99 latency	< 1000ms	—
Error rate	< 0.1%	1%
Throughput	Define per-service based on usage projections	—

These are starting points. Adjust based on business requirements and user expectations for your specific service.
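
To keep the documented SLOs and the test thresholds from drifting apart, the table can be turned into k6 threshold expressions mechanically. The sketch below assumes a hypothetical `slo` record whose field names are invented here; the only k6-specific part is the `p(N)` and `rate` expression syntax.

```javascript
// Hypothetical SLO record mirroring the table above
// (latencies in milliseconds, error rate as a fraction: 0.01 = 1%).
const slo = { p50Ms: 100, p95Ms: 500, p99Ms: 1000, maxErrorRate: 0.01 };

// Build a k6 `thresholds` object from the SLO record.
// k6 percentile expressions take the form p(N), e.g. 'p(95)<500'.
function toThresholds(s) {
  return {
    http_req_duration: [
      `p(50)<${s.p50Ms}`,
      `p(95)<${s.p95Ms}`,
      `p(99)<${s.p99Ms}`,
    ],
    http_req_failed: [`rate<${s.maxErrorRate}`],
  };
}

// In a k6 script:
// export const options = { thresholds: toThresholds(slo) };
```

This uses the 1% maximum from the table as the failing threshold; the stricter 0.1% target would be tracked as a goal rather than a gate.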

Baseline Establishment¶

Load test results only mean something relative to a baseline: a documented record of the system's performance under known conditions. Establish one before relying on any test result.

Run a load test against the current system at expected production load.
Record: p50, p95, p99 latencies; error rate; throughput; resource utilisation (CPU, memory, database connections).
Commit the baseline results to the repository.
All future load tests compare against the baseline. A >20% regression in any key metric is treated as a performance bug.
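
The comparison step above can be sketched as a small function. This is one possible reading of the 20% rule, not an established implementation: the metric names, the record shape, and the handling of a zero baseline are all assumptions made for the example.

```javascript
// For latency and error rate a higher value is worse; for throughput a lower value is worse.
const HIGHER_IS_WORSE = { p50: true, p95: true, p99: true, errorRate: true, throughput: false };

// Return the metrics that regressed by more than `tolerance` (0.2 = 20%) against the baseline.
function findRegressions(baseline, current, tolerance = 0.2) {
  const regressions = [];
  for (const metric of Object.keys(HIGHER_IS_WORSE)) {
    const base = baseline[metric];
    const now = current[metric];
    if (base === undefined || now === undefined) continue;

    if (base === 0) {
      // Relative change is undefined for a zero baseline. Assumption: any increase
      // in a higher-is-worse metric (e.g. errors appearing) counts as a regression.
      if (HIGHER_IS_WORSE[metric] && now > 0) {
        regressions.push({ metric, base, now, change: Infinity });
      }
      continue;
    }

    const change = (now - base) / base;
    const worsening = HIGHER_IS_WORSE[metric] ? change : -change;
    if (worsening > tolerance) {
      regressions.push({ metric, base, now, change });
    }
  }
  return regressions;
}

// Hypothetical numbers, for illustration only.
const exampleBaseline = { p50: 100, p95: 400, p99: 800, errorRate: 0.001, throughput: 100 };
const exampleCurrent = { p50: 105, p95: 498, p99: 900, errorRate: 0.001, throughput: 95 };
console.log(findRegressions(exampleBaseline, exampleCurrent).map((r) => r.metric)); // [ 'p95' ]
```

Here p95 rose by 24.5% and is flagged, while p99 (12.5% slower) and throughput (5% lower) stay within tolerance.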

Performance Testing in CI/CD¶

Performance tests run in two contexts:

Continuous Performance Smoke Tests (every deployment)¶

Run a lightweight smoke test (2–3 minutes, minimal load) on every deployment to catch catastrophic regressions:

# .gitlab-ci.yml (the job syntax below is GitLab CI, not GitHub Actions)
performance-smoke:
  stage: post-deploy
  needs: [deploy-staging]
  script:
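    - mkdir -p k6-results
    # The summary file name is an assumption; without an export, k6-results/ stays empty.
    - k6 run --env BASE_URL=$STAGING_URL --env API_TOKEN=$STAGING_TOKEN
        --vus 10 --duration 2m
        --summary-export k6-results/smoke-summary.json
        ./k6/smoke-test.js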
  artifacts:
    paths: [k6-results/]

Full Performance Test Suite (pre-release)¶

Run the full load and stress test suite before every significant release:

performance-full:
  stage: pre-release
  when: manual  # Triggered manually before release
  script:
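    - k6 run --env BASE_URL=$STAGING_URL --env API_TOKEN=$STAGING_TOKEN ./k6/load-test.js
    - k6 run --env BASE_URL=$STAGING_URL --env API_TOKEN=$STAGING_TOKEN ./k6/stress-test.js
  # k6 does not write JUnit XML by default; the test scripts must produce the file below.
  artifacts: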
    reports:
      junit: k6-results/junit.xml
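
The `junit` report referenced above has to be written by the test script itself, typically from k6's `handleSummary` hook. The following is a minimal sketch that turns threshold results into JUnit XML. It assumes the summary data exposes `metrics[name].thresholds[expression].ok`, as described in the k6 `handleSummary` documentation, and that the `k6-results/` directory already exists; `thresholdsToJUnit` and `escapeXml` are hypothetical helper names.

```javascript
// Escape characters that are not allowed inside XML attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Emit one JUnit test case per threshold; a breached threshold becomes a failure.
function thresholdsToJUnit(data) {
  const cases = [];
  let failures = 0;
  for (const [name, metric] of Object.entries(data.metrics || {})) {
    for (const [expression, result] of Object.entries(metric.thresholds || {})) {
      const label = escapeXml(`${name} ${expression}`);
      if (result.ok) {
        cases.push(`  <testcase name="${label}"/>`);
      } else {
        failures += 1;
        cases.push(`  <testcase name="${label}"><failure message="threshold breached"/></testcase>`);
      }
    }
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<testsuite name="k6 thresholds" tests="${cases.length}" failures="${failures}">`,
    ...cases,
    '</testsuite>',
  ].join('\n');
}

// In a k6 script:
// export function handleSummary(data) {
//   return { 'k6-results/junit.xml': thresholdsToJUnit(data) };
// }
```

Defining `handleSummary` replaces the default end-of-test summary, so add a `stdout` entry to the returned object if the console report is still wanted. Because the load and stress runs would write to the same path, each script may need its own file name.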

Interpreting Results¶

Key Metrics¶

Scenarios:   (100.00%) 1 scenario, 100 max VUs
default:     100 looping VUs for 5m0s (gracefulStop: 30s)

✓ status is 201
✓ response has order ID

checks.........................: 99.87%  ✓ 29961  ✗ 39
data_received..................: 24 MB   80 kB/s
data_sent......................: 18 MB   60 kB/s
http_req_blocked...............: avg=1.21ms  min=1µs   med=3µs   max=1.02s  p(90)=6µs   p(95)=11µs
http_req_duration..............: avg=287ms   min=12ms  med=243ms max=3.45s  p(90)=451ms p(95)=498ms ✓
  { expected_response:true }...: avg=287ms   min=12ms  med=243ms max=3.45s
http_req_failed................: 0.13%   ✓ 39     ✗ 29961

What to look for:

- p(95) and p(99): these are your SLO metrics. Are they within your thresholds?
- http_req_failed: the error rate. Even a small error rate under normal load is a red flag.
- max latency: very high max values indicate occasional severe slowdowns (GC pauses, lock contention, cold starts).
- Distribution shape: a bimodal distribution (many fast requests, some very slow) suggests resource contention.
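
The sample output above lists p(90) and p(95) but no p(99), which matches k6's default set of trend columns. A minimal sketch of adding p(99) to the end-of-test summary through the `summaryTrendStats` option:

```javascript
// k6 options fragment: choose which statistics are printed for trend metrics.
// Adding 'p(99)' makes the p99 SLO visible in the summary alongside p(95).
const options = {
  summaryTrendStats: ['avg', 'min', 'med', 'max', 'p(90)', 'p(95)', 'p(99)'],
};

// In a k6 script this object is exported, merged with stages and thresholds:
// export const options = { summaryTrendStats: [...], stages: [...], thresholds: {...} };
```

This only affects what is displayed; thresholds on `p(99)` are evaluated whether or not the column is shown.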

Infrastructure Considerations¶

Performance tests run against the staging environment — never production.
The staging environment should be representative of production in terms of instance size, database size, and concurrency configuration. A staging environment that is 1/10th the size of production will produce misleading results.
Monitor server-side metrics during load tests — CPU, memory, database connection pool utilisation, GC pause frequency. The bottleneck is often on the server side, not visible in the k6 output.
Exclude test traffic from production alerting thresholds — use a dedicated load-test user or IP range.
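
Filtering test traffic is easier when every request identifies itself. Alongside the dedicated user or IP range mentioned above, one option is to mark requests with a header and a k6 tag. This is a sketch under assumptions: the `X-Load-Test` header name, the `load_test_run` tag, the `RUN_ID` variable, and the `loadTestParams` helper are all hypothetical and would need agreeing with whoever owns monitoring.

```javascript
// Hypothetical marker header; the real name is a team convention, not a k6 feature.
const LOAD_TEST_HEADER = 'X-Load-Test';

// Build the `params` argument for k6 http calls so each request is identifiable.
function loadTestParams(token, runId) {
  return {
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${token}`,   // token of the dedicated load-test user
      [LOAD_TEST_HEADER]: runId,            // lets server-side logs and alerts filter test traffic
    },
    // k6 attaches these tags to every metric sample recorded for the request.
    tags: { load_test_run: runId },
  };
}

// In a k6 script:
// http.post(url, payload, loadTestParams(__ENV.API_TOKEN, __ENV.RUN_ID));
```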
