Glossary

Infrastructure terms, in plain English

The acronyms and jargon you will meet when running infrastructure — explained without the gatekeeping.

SLA: Service Level Agreement — a written commitment to response and uptime targets, often backed by service credits if they are missed.
Uptime: The percentage of time a service is available. "99.9%" allows roughly 8.8 hours of downtime a year; "99.99%" allows under an hour.
MTTR: Mean Time To Resolve — the average time taken to fix an incident. Lower is better.
RTO: Recovery Time Objective — the maximum acceptable time to restore a service after a failure.
RPO: Recovery Point Objective — the maximum acceptable amount of data loss, measured in time (e.g. "up to 15 minutes").
SPF: Sender Policy Framework — a DNS record that lists which servers are allowed to send email for your domain.
DKIM: DomainKeys Identified Mail — a cryptographic signature that proves an email genuinely came from your domain and was not altered.
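DMARC: Domain-based Message Authentication, Reporting and Conformance. A DNS-published policy that tells receiving mail servers what to do with email that fails both SPF and DKIM checks (monitor only, quarantine, or reject it), and asks them to send reports showing who is sending email as you.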
Hardening: Configuring a system to reduce its attack surface — closing unused services, enforcing good defaults, and applying security baselines.
Patching: Applying updates that fix security vulnerabilities and bugs. Unpatched systems are one of the most common ways attackers get in.
IaC: Infrastructure as Code — defining your infrastructure in version-controlled files so it is repeatable, reviewable and rebuildable.
FinOps: The practice of managing and optimising cloud spend continuously, so cost is owned rather than left to grow.
Observability: The ability to understand what a system is doing from its outputs — logs, metrics and traces — especially when something goes wrong.
Disaster recovery: The plan and capability to restore your whole service after a major failure, with defined recovery targets and tested failover.
P1–P4: Priority levels for incidents, from P1 (critical, service down) to P4 (low, a request or question). They determine response times.
On-call: An engineer available outside normal hours to respond to urgent incidents — so problems are handled when they happen, not the next morning.
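The downtime figures in the Uptime entry come from simple arithmetic: multiply the hours in a year by the fraction of time the service is allowed to be down. Here is a minimal Python sketch. It assumes a 365-day year (8,760 hours) and ignores leap years and any planned-maintenance exclusions an SLA might carve out. The helper name `downtime_budget_hours` is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored for simplicity


def downtime_budget_hours(uptime_percent: float) -> float:
    """Maximum downtime per year, in hours, allowed by an uptime target."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


for target in (99.9, 99.99):
    hours = downtime_budget_hours(target)
    print(f"{target}% uptime allows {hours:.2f} h ({hours * 60:.0f} min) of downtime a year")
```

For 99.9% this gives 8.76 hours a year. For 99.99% it gives about 53 minutes, which matches the "under an hour" figure in the glossary.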
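MTTR is an ordinary average: add up how long each incident took to fix, then divide by the number of incidents. The sketch below assumes each incident is recorded as a pair of hypothetical "opened" and "resolved" timestamps. The incident data is invented for illustration. Some teams measure from when a problem was detected rather than when a ticket was opened, so the starting point is something to agree on up front.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average resolution time across (opened, resolved) timestamp pairs."""
    if not incidents:
        raise ValueError("MTTR is undefined with no incidents")
    # Sum the individual durations, starting from a zero timedelta.
    total = sum((resolved - opened for opened, resolved in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incidents: one fixed in 30 minutes, one in 90 minutes.
incidents = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 9, 30)),
    (datetime(2024, 3, 5, 14, 0), datetime(2024, 3, 5, 15, 30)),
]
print(mean_time_to_resolve(incidents))  # 1:00:00
```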
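To make the SPF entry concrete: an SPF record is a DNS TXT record such as "v=spf1 ip4:192.0.2.0/24 -all". A receiving server checks whether the sending server's IP address is covered by that record. The sketch below is deliberately simplified. It only understands literal `ip4:` and `ip6:` ranges. It ignores `include:`, `a`, `mx`, qualifiers, and the DNS lookups that a real SPF evaluator (RFC 7208) performs. The record and addresses use reserved documentation ranges, and `spf_allows` is a hypothetical name.

```python
import ipaddress


def spf_allows(record: str, sender_ip: str) -> bool:
    """Return True if sender_ip falls inside a literal ip4:/ip6: range in the record.

    Simplified: include:, a, mx, redirect= and qualifiers (+ - ~ ?) are ignored,
    so False means "not listed literally", not a full SPF verdict.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        if term.lower().startswith(("ip4:", "ip6:")):
            # A bare address becomes a /32 (or /128) network;
            # strict=False tolerates host bits set in a CIDR range.
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip in network:  # always False when IPv4/IPv6 versions differ
                return True
    return False


record = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.7 -all"
print(spf_allows(record, "192.0.2.10"))   # True
print(spf_allows(record, "203.0.113.5"))  # False
```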
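The rule in the DMARC entry fits in a few lines of code. Under DMARC (RFC 7489), a message passes if either SPF or DKIM passes and also lines up with the domain shown in the From address. Only when both fail does the domain's published policy apply (p=none, quarantine or reject). This minimal sketch takes the two alignment results as inputs rather than computing them. It also ignores tags such as pct and sp. The function names are hypothetical.

```python
def parse_dmarc(record: str) -> dict[str, str]:
    """Split a DMARC TXT record such as 'v=DMARC1; p=reject' into tag/value pairs."""
    tags: dict[str, str] = {}
    for part in record.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:  # skip empty segments, e.g. after a trailing ';'
            tags[key.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record")
    return tags


def dmarc_outcome(record: str, spf_aligned_pass: bool, dkim_aligned_pass: bool) -> str:
    """Return 'pass', or the policy the receiver is asked to apply on failure."""
    # Simplification: a missing p= tag is treated as 'none' (monitor only).
    policy = parse_dmarc(record).get("p", "none").lower()
    if spf_aligned_pass or dkim_aligned_pass:
        return "pass"
    return policy


record = "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com"
print(dmarc_outcome(record, spf_aligned_pass=False, dkim_aligned_pass=True))   # pass
print(dmarc_outcome(record, spf_aligned_pass=False, dkim_aligned_pass=False))  # quarantine
```

A policy of p=none still lets failing mail through. Its value lies in the aggregate reports sent to the rua address, which is how DMARC shows you who is sending as you.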

Rather not have to learn all this?

That is the point of us. We handle the infrastructure so you can focus on your business.

Get a free audit