Performance & Load Testing
Status: 🟢 Active | Owner: Engineering Enablement | Last Reviewed: 2025-Q4
Overview
Performance testing verifies that a system meets its latency, throughput, and reliability requirements under expected and peak load conditions. It must be an explicit part of the development and release process for any service that handles user-facing traffic or participates in a critical business flow.
Performance testing is not a one-time pre-launch activity. Establish baselines early, run tests on every significant change, and treat performance regressions with the same urgency as functional bugs.
Performance Test Types
| Type | Purpose | When to Run |
|---|---|---|
| Load test | Verify behaviour under expected normal load | Before every release |
| Stress test | Find the breaking point; verify graceful degradation | Quarterly or before peak traffic periods |
| Spike test | Verify behaviour under sudden traffic bursts | For services exposed to viral/marketing events |
| Soak test | Identify memory leaks and degradation over time | Monthly or after major dependency changes |
| Smoke test | Quick sanity check: minimal load, verify baseline | On every deployment |
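The test types above differ mainly in their load profile. The sketch below shows one way each could be expressed as a k6 `stages` array. All VU targets and durations are illustrative assumptions rather than values prescribed by this standard, and `loadProfiles` and `peakTarget` are hypothetical names introduced for the example.

```javascript
// Illustrative load profiles per test type. All targets and durations are
// assumptions for this sketch; derive real values from traffic projections.
const loadProfiles = {
  // Smoke: minimal load for a couple of minutes
  smoke: [{ duration: '2m', target: 5 }],
  // Load: ramp to expected normal load, hold, ramp down
  load: [
    { duration: '2m', target: 50 },
    { duration: '5m', target: 50 },
    { duration: '2m', target: 0 },
  ],
  // Stress: keep stepping up until the system degrades
  stress: [
    { duration: '5m', target: 100 },
    { duration: '5m', target: 200 },
    { duration: '5m', target: 400 },
    { duration: '5m', target: 0 },
  ],
  // Spike: sudden burst from a low background level
  spike: [
    { duration: '1m', target: 20 },
    { duration: '10s', target: 500 },
    { duration: '3m', target: 500 },
    { duration: '10s', target: 20 },
    { duration: '1m', target: 0 },
  ],
  // Soak: moderate load held for hours to expose leaks
  soak: [
    { duration: '5m', target: 50 },
    { duration: '4h', target: 50 },
    { duration: '5m', target: 0 },
  ],
};

// Highest VU target a profile reaches; useful for sizing the load generator.
function peakTarget(stages) {
  return Math.max(...stages.map((s) => s.target));
}

// In a k6 script a profile could be selected with an environment variable:
//   export const options = { stages: loadProfiles[__ENV.PROFILE] };
```

Keeping the profiles in one place makes it easier to see at a glance how the test types differ, and lets a single script serve several of them.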
Approved Tool: k6
k6 is the enterprise standard for performance and load testing. It uses JavaScript for test scripting, has excellent CI/CD integration, and produces structured output that integrates with Grafana for visualisation.
// k6: basic load test
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

// Custom metrics: share of failed order creations and order-creation latency (ms)
const errorRate = new Rate('errors');
const orderCreationDuration = new Trend('order_creation_duration');

export const options = {
  stages: [
    { duration: '2m', target: 50 },  // Ramp up to 50 users over 2 minutes
    { duration: '5m', target: 50 },  // Hold at 50 users for 5 minutes
    { duration: '2m', target: 100 }, // Ramp up to 100 users
    { duration: '5m', target: 100 }, // Hold at 100 users
    { duration: '2m', target: 0 },   // Ramp down
  ],
  thresholds: {
    // Percentiles must be written as p(N) in threshold expressions
    'http_req_duration': ['p(95)<500', 'p(99)<1000'], // SLO: p95 < 500ms, p99 < 1s
    'http_req_failed': ['rate<0.01'],                 // < 1% error rate
    'errors': ['rate<0.01'],
  },
};

export default function () {
  const payload = JSON.stringify({
    customerId: 'perf-test-customer',
    items: [{ sku: 'PERF-SKU-001', quantity: 1 }],
  });

  const start = Date.now();
  const response = http.post(
    `${__ENV.BASE_URL}/api/v1/orders`,
    payload,
    {
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${__ENV.API_TOKEN}`,
      },
    }
  );
  orderCreationDuration.add(Date.now() - start);
  errorRate.add(response.status !== 201);

  check(response, {
    'status is 201': (r) => r.status === 201,
    // Guard the parse: a non-JSON error body would otherwise throw
    'response has order ID': (r) => {
      try {
        return JSON.parse(r.body).orderId !== undefined;
      } catch (e) {
        return false;
      }
    },
  });

  sleep(1); // Think time between requests
}
Defining Performance SLOs
Every service handling user-facing traffic must have documented performance SLOs. Define these before the first load test so you have a target to measure against.
Recommended starting point for REST APIs:
| Metric | Target | Maximum |
|---|---|---|
| p50 latency | < 100ms | n/a |
| p95 latency | < 500ms | n/a |
| p99 latency | < 1000ms | n/a |
| Error rate | < 0.1% | 1% |
| Throughput | Define per-service based on usage projections | n/a |
These are starting points. Adjust based on business requirements and user expectations for your specific service.
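One way to keep the SLO table and the test scripts in step is to generate the k6 `thresholds` object from a single SLO definition. The following is a minimal sketch under that assumption; `restApiSlo` and `buildThresholds` are hypothetical names, and gating the error rate on the maximum rather than the target is a design choice made for the example.

```javascript
// SLO values from the table above (latencies in ms, error rates as fractions).
const restApiSlo = {
  p50Ms: 100,
  p95Ms: 500,
  p99Ms: 1000,
  errorRateTarget: 0.001, // 0.1%
  errorRateMax: 0.01,     // 1%
};

// Builds a k6 `thresholds` object from an SLO definition.
// The error-rate threshold uses the maximum, so a run fails only when the
// hard limit is breached; the target is left for reporting.
function buildThresholds(slo) {
  return {
    http_req_duration: [
      `p(50)<${slo.p50Ms}`,
      `p(95)<${slo.p95Ms}`,
      `p(99)<${slo.p99Ms}`,
    ],
    http_req_failed: [`rate<${slo.errorRateMax}`],
  };
}

// In a k6 script:
//   export const options = { thresholds: buildThresholds(restApiSlo) };
```

A service with stricter requirements would pass its own SLO object rather than edit threshold strings by hand.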
Baseline Establishment
Load testing only provides value once you have a performance baseline: a documented record of the system's performance under known conditions. To establish one:
- Run a load test against the current system at expected production load.
- Record: p50, p95, p99 latencies; error rate; throughput; resource utilisation (CPU, memory, database connections).
- Commit the baseline results to the repository.
- All future load tests compare against the baseline. A >20% regression in any key metric is treated as a performance bug.
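The 20% regression rule above can be sketched as a small comparison function. This is an illustration, not an existing tool: the metric names (`p95Ms`, `throughputRps`, and so on) and the shape of the baseline record are assumptions, and for throughput a regression is taken to mean a drop rather than an increase.

```javascript
// Metrics where a larger value is worse (latency, error rate) versus
// metrics where a larger value is better (throughput).
const HIGHER_IS_WORSE = ['p50Ms', 'p95Ms', 'p99Ms', 'errorRate'];
const HIGHER_IS_BETTER = ['throughputRps'];

// Returns the metrics that regressed by more than `tolerance` (0.2 = 20%)
// relative to the baseline. Metrics that are missing, or whose baseline is 0
// (relative change undefined), are skipped.
function findRegressions(baseline, current, tolerance = 0.2) {
  const regressions = [];
  for (const name of [...HIGHER_IS_WORSE, ...HIGHER_IS_BETTER]) {
    const base = baseline[name];
    const now = current[name];
    if (typeof base !== 'number' || typeof now !== 'number' || base === 0) continue;
    const change = (now - base) / base; // relative change, e.g. 0.25 = +25%
    const worsening = HIGHER_IS_WORSE.includes(name) ? change : -change;
    if (worsening > tolerance) regressions.push({ name, base, now, change });
  }
  return regressions;
}

// Hypothetical baseline and current results
const baseline = { p50Ms: 90, p95Ms: 400, p99Ms: 800, errorRate: 0.001, throughputRps: 100 };
const current = { p50Ms: 95, p95Ms: 500, p99Ms: 820, errorRate: 0.001, throughputRps: 75 };

// p95 rose by 25% and throughput fell by 25%, so both are reported
console.log(findRegressions(baseline, current).map((r) => r.name));
// logs [ 'p95Ms', 'throughputRps' ]
```

A baseline of exactly 0 (for example, zero errors) needs an absolute limit instead of a relative one, which this sketch leaves out.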
Performance Testing in CI/CD
Performance tests run in two contexts:
Continuous Performance Smoke Tests (every deployment)
Run a lightweight smoke test (2–3 minutes, minimal load) on every deployment to catch catastrophic regressions:
# .gitlab-ci.yml (the keywords below are GitLab CI syntax)
performance-smoke:
  stage: post-deploy
  needs: [deploy-staging]
  script:
    # The continuation lines fold into a single command
    - k6 run --env BASE_URL=$STAGING_URL --env API_TOKEN=$STAGING_TOKEN
      --vus 10 --duration 2m
      ./k6/smoke-test.js
  artifacts:
    paths: [k6-results/]
Full Performance Test Suite (pre-release)
Run the full load and stress test suite before every significant release:
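performance-full:
  stage: pre-release
  when: manual # Triggered manually before release
  script:
    # API_TOKEN is passed because the test scripts send it as a Bearer token
    - k6 run --env BASE_URL=$STAGING_URL --env API_TOKEN=$STAGING_TOKEN ./k6/load-test.js
    - k6 run --env BASE_URL=$STAGING_URL --env API_TOKEN=$STAGING_TOKEN ./k6/stress-test.js
  artifacts:
    reports:
      # k6 does not write JUnit XML by default; the test scripts must produce this file
      junit: k6-results/junit.xml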
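The `junit` report in the job above has to be written by the test scripts themselves. A minimal sketch of one way to do that with k6's `handleSummary` hook follows. It assumes the end-of-test summary exposes threshold results as `data.metrics[name].thresholds[expression].ok`; `escapeXml` and `thresholdsToJUnit` are hypothetical helpers written for this example, and the output path simply mirrors the one in the CI job.

```javascript
// Escapes characters that are not allowed in XML attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Converts threshold results from a k6 end-of-test summary into JUnit XML:
// one <testcase> per threshold, with a <failure> child when it was breached.
function thresholdsToJUnit(data, suiteName = 'k6 thresholds') {
  const cases = [];
  let failures = 0;
  for (const [metric, info] of Object.entries(data.metrics || {})) {
    for (const [expr, result] of Object.entries(info.thresholds || {})) {
      const name = escapeXml(`${metric} - ${expr}`);
      if (result.ok) {
        cases.push(`  <testcase name="${name}" />`);
      } else {
        failures += 1;
        cases.push(
          `  <testcase name="${name}"><failure message="threshold breached" /></testcase>`
        );
      }
    }
  }
  return [
    '<?xml version="1.0"?>',
    `<testsuite name="${escapeXml(suiteName)}" tests="${cases.length}" failures="${failures}">`,
    ...cases,
    '</testsuite>',
  ].join('\n');
}

// k6 calls an exported handleSummary(data) at the end of a run and writes each
// returned key as a file. In a test script, prefix this function with `export`.
function handleSummary(data) {
  return { 'k6-results/junit.xml': thresholdsToJUnit(data) };
}
```

Defining `handleSummary` replaces k6's default end-of-test console summary, so a real script would usually return a `stdout` entry as well. The sketch also assumes the `k6-results/` directory already exists on the runner.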
Interpreting Results
Key Metrics
Scenarios: (100.00%) 1 scenario, 100 max VUs
  default: 100 looping VUs for 5m0s (gracefulStop: 30s)
✓ status is 201
✓ response has order ID
checks.........................: 99.87% ✓ 29961 ✗ 39
data_received..................: 24 MB 80 kB/s
data_sent......................: 18 MB 60 kB/s
http_req_blocked...............: avg=1.21ms min=1µs med=3µs max=1.02s p(90)=6µs p(95)=11µs
http_req_duration..............: avg=287ms min=12ms med=243ms max=3.45s p(90)=451ms p(95)=498ms ✓
  { expected_response:true }...: avg=287ms min=12ms med=243ms max=3.45s
http_req_failed................: 0.13% ✓ 39 ✗ 29961
What to look for:
- p(95) and p(99): these are your SLO metrics. Are they within your thresholds? The default summary prints only p(90) and p(95); add p(99) through the `summaryTrendStats` option.
- http_req_failed: the error rate. Even a small error rate under normal load is a red flag.
- max latency: very high max values indicate occasional severe slowdowns (GC pauses, lock contention, cold starts).
- Distribution shape: a bimodal distribution (many fast requests, some very slow) suggests resource contention.
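As a worked example, the error rate in the sample output can be checked against the recommended SLOs (target below 0.1%, maximum 1%). The helper below is a hypothetical sketch; only the request counts come from the output above.

```javascript
// Classifies an observed error rate against a target and a hard maximum.
// Rates are fractions, not percentages (0.001 = 0.1%).
function classifyErrorRate(failed, total, target = 0.001, max = 0.01) {
  const rate = failed / total;
  if (rate < target) return { rate, verdict: 'within target' };
  if (rate < max) return { rate, verdict: 'above target, within maximum' };
  return { rate, verdict: 'SLO breached' };
}

// Figures from the sample output: 39 failed requests out of 29961 + 39
const result = classifyErrorRate(39, 29961 + 39);
console.log((result.rate * 100).toFixed(2) + '%', result.verdict);
// logs: 0.13% above target, within maximum
```

The run passes the `rate<0.01` threshold, yet it still misses the 0.1% target, so a green threshold on its own does not mean the service is meeting its SLO.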
Infrastructure Considerations
- Performance tests run against the staging environment, never production.
- The staging environment should be representative of production in terms of instance size, database size, and concurrency configuration. A staging environment that is 1/10th the size of production will produce misleading results.
- Monitor server-side metrics during load tests: CPU, memory, database connection pool utilisation, GC pause frequency. The bottleneck is often on the server side and is not visible in the k6 output.
- Exclude test traffic from production alerting thresholds by using a dedicated load-test user or IP range.