Cloud & Infrastructure

Status: 🟢 Active  |  Owner: Platform Engineering  |  Last Reviewed: 2025-Q4


Introduction

Cloud infrastructure is the foundation on which all our services run. The decisions made about cloud providers, infrastructure automation, container orchestration, and networking directly affect the availability, security, performance, and cost of every system in the organisation. Poor infrastructure practices are often invisible until they become crises — a misconfigured network that enables lateral movement during a security incident, an untagged resource that makes cost attribution impossible, an infrastructure change applied manually that cannot be reproduced or rolled back.

This section documents the standards that govern how infrastructure is provisioned, configured, secured, and managed across the organisation. These standards apply equally to platform engineers building shared infrastructure and to application engineers provisioning infrastructure for their own services.


Intent

Infrastructure as Code as a Non-Negotiable

Every piece of infrastructure must be defined as code, stored in version control, reviewed via pull request, and applied via CI/CD. Manual changes to production infrastructure are prohibited. This is more than a best-practice recommendation: it is an organisational policy, and it is enforced. Infrastructure as code is the prerequisite for auditability, reproducibility, disaster recovery, and effective change management.

The approved IaC toolchain is Terraform (primary) and Pulumi (approved for teams with strong programming language preferences). Module structures, state management patterns, and variable conventions are standardised to ensure that infrastructure code written by one team can be read, reviewed, and operated by another.

Cloud Provider Strategy

We operate in a multi-cloud environment with a defined primary cloud provider for most workloads. The multi-cloud strategy is driven by specific technical requirements (edge compute, ML training hardware, regulated data residency) and commercial risk management — not by the desire to use every cloud feature available. This section documents which workloads belong on which cloud, and how cross-cloud connectivity and governance are managed.

Container Standards as a Common Abstraction

Kubernetes is the standard deployment target for all new services. Container and orchestration standards ensure that every team produces container images that are secure (no root, minimal base images, signed), consistent (standard labelling, health checks, resource limits), and operable (standard ingress patterns, config injection, graceful shutdown).
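
As an illustration of how some of these container expectations could be checked automatically, the following is a minimal sketch in Python and not the organisation's actual policy tooling. It inspects a pod spec that has already been parsed into a dictionary and reports containers that lack a non-root setting, resource limits, or health checks. The function names are hypothetical. The field names (securityContext.runAsNonRoot, resources.limits, livenessProbe, readinessProbe) are standard Kubernetes pod spec fields. As a simplification, the sketch only looks at container-level settings and ignores a pod-level securityContext.

```python
from typing import Any


def container_violations(container: dict[str, Any]) -> list[str]:
    """Return human-readable policy violations for one container spec.

    Hypothetical helper: the rules below are an assumed reading of
    "no root", "resource limits", and "health checks".
    """
    violations: list[str] = []
    name = container.get("name", "<unnamed>")

    # "No root": require runAsNonRoot to be explicitly true.
    # Simplification: a pod-level securityContext is not considered here.
    security = container.get("securityContext") or {}
    if security.get("runAsNonRoot") is not True:
        violations.append(f"{name}: securityContext.runAsNonRoot must be true")

    # "Resource limits": require both a CPU and a memory limit.
    limits = (container.get("resources") or {}).get("limits") or {}
    for resource in ("cpu", "memory"):
        if resource not in limits:
            violations.append(f"{name}: missing resources.limits.{resource}")

    # "Health checks": require liveness and readiness probes.
    for probe in ("livenessProbe", "readinessProbe"):
        if probe not in container:
            violations.append(f"{name}: missing {probe}")

    return violations


def pod_violations(pod_spec: dict[str, Any]) -> list[str]:
    """Collect violations across every container in a pod spec."""
    found: list[str] = []
    for container in pod_spec.get("containers", []):
        found.extend(container_violations(container))
    return found
```

A check of this kind would typically run in CI against rendered manifests, so that violations surface at review time and not at deployment time. Image signing and base image choice cannot be verified from the pod spec alone and are out of scope for this sketch.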

Cost as an Engineering Responsibility

Cloud cost is an engineering responsibility, not just a finance concern. FinOps practices are embedded into engineering workflows through mandatory resource tagging (required for cost attribution), rightsizing expectations (enforced through policy), and a regular cost review cadence. Engineers are expected to understand the cost implications of their architectural decisions.
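
To make the mandatory tagging requirement concrete, here is a minimal sketch in Python of a check that flags resources missing required tags. The tag keys in REQUIRED_TAGS and the function names are hypothetical placeholders. The authoritative tag set is the one defined on the Cost Optimization & FinOps page.

```python
# Hypothetical tag keys, used for illustration only.
REQUIRED_TAGS = ("owner", "cost-centre", "environment")


def missing_tags(tags: dict[str, str],
                 required: tuple[str, ...] = REQUIRED_TAGS) -> list[str]:
    """Return the required tag keys that are absent or blank."""
    return [key for key in required if not tags.get(key, "").strip()]


def untagged_resources(
    resources: dict[str, dict[str, str]],
) -> dict[str, list[str]]:
    """Map each non-compliant resource name to its missing tag keys.

    `resources` maps a resource name to its tag dictionary, for example
    as extracted from an infrastructure plan or a cloud inventory export.
    """
    report: dict[str, list[str]] = {}
    for name, tags in resources.items():
        missing = missing_tags(tags)
        if missing:
            report[name] = missing
    return report
```

For example, untagged_resources({"bucket-a": {"owner": "team-x"}}) would report that bucket-a lacks the cost-centre and environment tags, which is the information needed to restore cost attribution for that resource.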


What You Will Find Here

Page  |  Intent
Cloud Provider Standards  |  Approved providers, primary/secondary designation, multi-cloud governance
Infrastructure as Code  |  Terraform/Pulumi standards, module structure, state management
Container & Orchestration Standards  |  Docker image standards, Kubernetes configuration, Helm guidelines
Networking & Connectivity  |  Network segmentation, DNS, service mesh, ingress standards
Cost Optimization & FinOps  |  Tagging standards, rightsizing, cost review cadence
