
How to Set Performance SLOs Your Team Will Actually Use


What Is a Performance SLO?

A Service Level Objective (SLO) is an internal target for the performance of your system. It is a specific, measurable goal that your team commits to meeting — not a contractual promise to customers, but an engineering standard that defines what "good enough" means.

A performance SLO looks like this: "P95 API response time under 500ms." That single statement encodes everything your team needs to know: which metric to track (P95 response time), what threshold to maintain (500ms), and the implicit commitment that the team will prioritize work to stay within that threshold.
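The check that statement encodes fits in a few lines. This is a minimal sketch: the latency samples are invented, and the percentile uses the nearest-rank method (monitoring systems often interpolate instead):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value >= pct% of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered)) - 1
    return ordered[rank]

# Hypothetical latencies from one measurement window, in milliseconds.
latencies_ms = [120, 180, 210, 250, 300, 340, 390, 430, 480, 620]

p95 = percentile(latencies_ms, 95)
print(f"P95 = {p95}ms, SLO met: {p95 < 500}")  # P95 = 620ms, SLO met: False
```

One slow outlier is enough to put the P95 over the threshold here, which is exactly the tail behavior the metric is designed to expose.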

SLOs exist because without them, performance is nobody's responsibility. Every team agrees that performance matters, but without a concrete number attached to a concrete metric, "performance matters" is an aspiration, not an objective. SLOs turn aspiration into accountability.

The key distinction is that SLOs are internal. They are goals your team sets for itself. They are stricter than what you promise customers (because you need a safety margin), and they are specific enough to be measured and acted upon. A team with well-defined SLOs can answer the question "is our system performing well right now?" with a definitive yes or no, not a shrug.

SLI vs SLO vs SLA

Three terms that are frequently confused deserve clear definitions before we go further.

Term | What It Is | Example
SLI (Service Level Indicator) | The metric you measure | P95 response time
SLO (Service Level Objective) | The target you set for that metric | P95 < 500ms, 99.9% of the time
SLA (Service Level Agreement) | The contractual promise to customers | 99.9% uptime or service credits issued

The SLI is the raw measurement — the number that comes out of your monitoring system. The SLO is the line you draw on that measurement — the threshold that separates "acceptable" from "not acceptable." The SLA is the legal contract built on top of SLOs, with financial penalties for violations.

SLAs must always be less strict than SLOs. If your SLA promises 99.9% availability, your internal SLO should target 99.95% or higher — giving you a buffer before the contractual penalty kicks in. Teams that set their SLOs equal to their SLAs live in constant firefighting mode, because every minor incident threatens a contractual breach.

Why Most Performance SLOs Fail

Most teams that attempt to define SLOs end up with documents that are written once, reviewed never, and violated constantly. The SLOs become background noise — present in a wiki page somewhere, absent from daily engineering decisions. Here is why.

Too many SLOs. A team that defines 30 SLOs across every endpoint, every metric, and every service has effectively defined zero. Nobody can track 30 objectives simultaneously. The important ones drown in the noise. Start with three to five SLOs that cover the most critical user-facing operations.

Too abstract. An SLO like "the system should be fast" is not an SLO. Neither is "response times should be acceptable." If the SLO does not contain a specific metric, a specific threshold, and a specific measurement window, it is a wish, not an objective.

No measurement system. An SLO without automated measurement is a decoration. If checking whether you are meeting your SLO requires someone to manually pull data, run queries, and do math, it will not get checked. SLOs need real-time dashboards and automated alerting.

No consequences for missing them. If your team consistently violates its SLOs and nothing changes — no reprioritization, no dedicated performance sprint, no freeze on feature work — the SLOs are performative. They exist on paper but have no influence on decisions. The error budget model (discussed below) provides the mechanism for consequences.

Set once and forgotten. SLOs must evolve. Your system changes, your traffic patterns change, your user expectations change. An SLO set two years ago for a system that has since tripled in traffic is not meaningful. Review and adjust SLOs quarterly.

Choosing the Right SLIs

The foundation of a good SLO is choosing the right Service Level Indicator — the metric that actually reflects user experience. Not all metrics are equally useful.

Response time (P95, P99). This is the most directly user-impactful metric. Users feel latency immediately. A P95 response time SLI tells you what experience 95% of your users are having. For detailed guidance on why percentiles matter more than averages, see our guide on response time percentiles explained.

Availability. The percentage of requests that return a successful response (non-5xx). This measures whether your system is functional at all. An availability SLI of 99.95% means that out of every 10,000 requests, no more than 5 may fail.

Throughput. Requests per second that your system can handle. This is most relevant as an SLI for systems with known capacity requirements — APIs with rate-limited consumers, data pipelines with ingestion targets.

Error rate. The percentage of requests that return 5xx errors. Distinct from availability (which also counts timeouts and connection failures), error rate focuses specifically on server-side failures.
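The availability/error-rate distinction above can be made concrete with a short sketch. The request records here are invented; `completed=False` stands for a timeout or connection failure that never produced a status code:

```python
# Each record is (status_code, completed); hypothetical sample data.
requests = [
    (200, True), (200, True), (503, True), (200, True), (None, False),
]

total = len(requests)
server_errors = sum(1 for status, done in requests if done and status >= 500)
not_completed = sum(1 for _, done in requests if not done)

error_rate = server_errors / total                          # 5xx only
availability = 1 - (server_errors + not_completed) / total  # 5xx + failures

print(f"error rate: {error_rate:.1%}, availability: {availability:.1%}")
# error rate: 20.0%, availability: 60.0%
```

On the same traffic, availability is always the stricter number, because every server error also counts against it.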

Not every SLI is appropriate for every type of service. The following table provides guidance:

Service Type | Primary SLIs | Secondary SLIs
Public API | Response time (P95, P99), availability | Error rate, throughput
Web application | Page load time (LCP), availability | Error rate, INP
Background jobs | Processing time (P95), failure rate | Queue depth, throughput
Real-time features | Message delivery latency (P95), connection success rate | Message loss rate

Choose two to three SLIs per service. More than that diffuses focus.

Setting Realistic Thresholds

Having the right SLIs is necessary but not sufficient. The threshold — the number you attach to the SLI — determines whether the SLO is useful or ignored.

Step 1: Baseline Current Performance

You cannot set meaningful targets without knowing where you stand today. Run load tests with LoadForge to measure your current P50, P95, and P99 response times under realistic traffic volumes. If your current P95 is 800ms, setting an SLO of P95 < 200ms is aspirational but not immediately achievable. Setting it at P95 < 1000ms gives you headroom while you work toward improvement.

Baselines must be measured under load, not in isolation. A single-user response time is irrelevant for setting SLOs. Your SLO needs to reflect performance at the concurrency levels you actually experience. For guidance on load testing methodology, see our guide on what load testing is.

Step 2: Understand User Expectations

Research on user perception of latency provides concrete guidance for threshold selection:

Response Time | User Perception
< 200ms | Feels instantaneous
200-500ms | Noticeable but comfortable
500ms-1s | Perceptible delay, still acceptable for most actions
1-3s | User is consciously waiting
> 3s | Frustrating; significant abandonment risk

Your SLO threshold should aim for the experience you want to deliver. For an API that powers real-time interactions, P95 < 200ms is appropriate. For a reporting dashboard that users expect to take a moment, P95 < 2s might be perfectly reasonable.

Step 3: Set Achievable Targets

Set your initial SLO threshold slightly above your current measured performance — close enough to be meaningful, loose enough to not trigger constant violations. If your current P95 is 400ms under peak load, start with P95 < 600ms. This gives you a 50% buffer while establishing the measurement and alerting infrastructure.

Too-ambitious SLOs are worse than no SLOs. When alerts fire constantly, teams develop alert fatigue and start ignoring them. A consistently green SLO dashboard with occasional meaningful violations is far more actionable than a permanently red one.

Step 4: Define the Window

An SLO without a measurement window is incomplete. "P95 < 500ms" measured over what period? The last minute? The last hour? The last 30 days?

The standard approach is a rolling window — typically 30 days. "P95 < 500ms, measured over a 30-day rolling window" means that across all requests in the past 30 days, the 95th percentile response time must be below 500ms. This smooths out brief spikes (a 2-minute blip during a deployment) while catching sustained degradation.

Shorter windows (1 hour, 1 day) are appropriate for alerting. Longer windows (30 days) are appropriate for error budget calculations and team-level accountability.
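The rolling-window evaluation can be sketched as a filter plus a percentile. The timestamps and latencies below are hypothetical, and the percentile again uses the nearest-rank method:

```python
import math
from datetime import datetime, timedelta

def rolling_p95(samples, now, window=timedelta(days=30)):
    """samples: (timestamp, latency_ms) pairs; P95 of those inside the window."""
    recent = sorted(lat for ts, lat in samples if now - ts <= window)
    return recent[math.ceil(0.95 * len(recent)) - 1] if recent else None

now = datetime(2024, 6, 30)
samples = [
    (datetime(2024, 5, 1), 900),   # older than 30 days: excluded
    (datetime(2024, 6, 10), 300),
    (datetime(2024, 6, 20), 450),
    (datetime(2024, 6, 29), 380),
]
print(rolling_p95(samples, now))  # 450
```

The 900ms spike from two months ago has aged out of the window, so it no longer counts against the objective.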

Step 5: Set Error Budgets

The error budget is the mechanism that gives SLOs teeth. If your SLO is "99.9% of requests complete under 500ms in a 30-day window," then you have a 0.1% error budget. Out of roughly 1 million requests in a month, 1,000 can exceed 500ms before you are in violation.

Convert that to time: 0.1% of 30 days is approximately 43 minutes of total violation time per month. That is your budget. You can spend it on deployments that briefly degrade performance, on maintenance windows, or on unexpected incidents. But when the budget is consumed, the team must respond.

The standard response to a consumed error budget is a deployment freeze — no new features are shipped until the team has restored performance and recovered budget. This creates a natural feedback loop: every deployment that degrades performance spends error budget, which eventually blocks future deployments, which forces the team to invest in reliability. The error budget turns the SLO from a passive metric into an active prioritization mechanism.
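The budget arithmetic in the paragraphs above fits in a few lines:

```python
def error_budget(slo_target, window_days, total_requests):
    """Allowed bad requests and violation minutes for one window."""
    budget = 1 - slo_target                        # e.g. 0.001 for 99.9%
    allowed_bad = total_requests * budget
    budget_minutes = window_days * 24 * 60 * budget
    return allowed_bad, budget_minutes

bad_requests, minutes = error_budget(0.999, 30, 1_000_000)
print(f"{bad_requests:.0f} slow requests, {minutes:.1f} minutes of budget")
# 1000 slow requests, 43.2 minutes of budget
```

Tracking how much of those 43 minutes has been consumed so far in the window is what drives the freeze decision.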

SLO Examples by Service Type

Concrete examples help teams calibrate their own SLOs against industry norms.

Public API:

  • P95 response time < 300ms
  • P99 response time < 1s
  • Error rate < 0.1%
  • Availability > 99.95%

Web application:

  • Largest Contentful Paint (LCP) < 2.5s
  • P95 full page load < 3s
  • Error rate < 0.5%
  • Availability > 99.9%

Background jobs:

  • P95 processing time < 30s
  • Failure rate < 1%
  • Queue depth < 1,000 items
  • Time from enqueue to completion (P95) < 60s

Real-time features (chat, notifications, live updates):

  • P95 message delivery latency < 100ms
  • Connection success rate > 99.9%
  • Message loss rate < 0.01%
  • Reconnection time (P95) < 2s

These are starting points, not universal standards. Your appropriate thresholds depend on your specific user expectations, traffic patterns, and technical constraints. The important thing is that they are concrete, measurable, and tied to real user experience.

Validating SLOs with Load Testing

SLOs defined in a document are hypothetical. SLOs validated under load are real. Use LoadForge to regularly verify that your system can meet its SLOs under expected — and beyond expected — traffic volumes.

Schedule a weekly load test that simulates your peak traffic pattern. After each test, compare the results against your SLO thresholds. Did P95 stay under 500ms? Did error rate stay under 0.1%? If so, you have confidence that your SLO is achievable under real conditions. If not, you have caught a regression before your users did.

The load test should simulate peak traffic, not average traffic. Your SLO must hold under the hardest conditions your system regularly faces. If it only holds at 2 AM on a Tuesday, it is not a meaningful SLO.

Go further: run stress tests that exceed your expected peak. If your SLO holds at peak traffic but fails at 1.5x peak, you know your safety margin. If a viral event or marketing campaign drives traffic to 1.5x, you know exactly when you will violate your SLO and can prepare accordingly.
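The peak-versus-stress comparison can be sketched as checking the same thresholds against two runs. The metric names and result numbers below are invented, not real LoadForge output:

```python
SLO = {"p95_ms": 500, "error_rate": 0.001}

def meets_slo(results):
    """True if every metric in the run is within its SLO threshold."""
    return all(results[metric] < limit for metric, limit in SLO.items())

peak_run   = {"p95_ms": 420, "error_rate": 0.0004}  # 1.0x expected peak
stress_run = {"p95_ms": 780, "error_rate": 0.0021}  # 1.5x expected peak

print(meets_slo(peak_run), meets_slo(stress_run))  # True False
```

In this sketch the SLO holds at peak but fails at 1.5x, which tells you the safety margin lies somewhere between the two.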

Building SLOs into Your Workflow

SLOs that live in a wiki page and are reviewed quarterly are SLOs that will be ignored. To be effective, SLOs must be integrated into the daily engineering workflow.

Add SLO checks to CI/CD. Run a load test as part of your deployment pipeline and gate deployments on SLO compliance. If the new code causes P95 to exceed the SLO threshold in a staging load test, the deployment is blocked automatically. This catches performance regressions before they reach production. For implementation details, see our guide on load testing in CI/CD.
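A gate step of this kind can be sketched as a script that exits nonzero on violation, which is what most CI systems interpret as "block the deployment." The threshold names and result shape here are assumptions, not a real load-test export format:

```python
import sys

THRESHOLDS = {"p95_ms": 500, "error_rate": 0.001}

def violations(results):
    """Metrics in the staging load-test summary that breach their threshold."""
    return [m for m, limit in THRESHOLDS.items()
            if results.get(m, float("inf")) >= limit]

def gate(results):
    """Exit nonzero so the pipeline blocks the deployment on any violation."""
    failed = violations(results)
    if failed:
        print(f"SLO gate failed: {failed}")
        sys.exit(1)
    print("SLO gate passed")
```

In a real pipeline, `results` would be parsed from the load-test tool's exported summary; here it is assumed to be a plain dict such as `{"p95_ms": 430, "error_rate": 0.0003}`.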

Create real-time dashboards. Build a dashboard that shows current SLO status for each objective — green when compliant, yellow when approaching the budget limit, red when in violation. Make this dashboard visible: on a team monitor, in the Slack channel, on the engineering home page. Visibility creates awareness, and awareness drives action.

Run error budget reviews in sprint retrospectives. At the end of each sprint, review how much error budget was consumed, what consumed it (deployments, incidents, infrastructure issues), and whether corrective action is needed. This makes SLO compliance a regular topic of conversation, not an afterthought.

Assign SLO ownership to specific teams. Each SLO should have a named team responsible for meeting it. When a cross-cutting SLO is violated, the owning team is responsible for coordinating the response. Without ownership, violations become everybody's problem and therefore nobody's problem.

When to Tighten or Loosen SLOs

SLOs are not permanent. They should evolve as your system, your traffic, and your users evolve. The question is when and in which direction.

Tighten your SLOs when:

  • You are consistently meeting your current SLO with significant margin. If your SLO is P95 < 500ms and your actual P95 has been under 200ms for six months, the SLO is too loose to be meaningful. Tighten it to P95 < 300ms to keep it relevant.
  • Competitors are faster. If your market has shifted and users now expect sub-200ms responses because your competitor delivers them, your SLO needs to reflect the new reality.
  • User expectations have risen. Mobile-first users, real-time collaboration features, and the general acceleration of web performance norms all push expectations tighter over time.

Loosen your SLOs when:

  • Your team is in constant budget depletion. If every sprint involves a firefight to stay within SLO, and the team is spending all its time on performance instead of building features, the SLO may be unrealistically tight. A good SLO is achievable with disciplined engineering, not heroic effort.
  • The SLO does not reflect user reality. If user satisfaction surveys and retention metrics are healthy despite occasional SLO violations, the threshold may be stricter than necessary.
  • Architecture constraints make the current SLO impractical. A migration to a new database, a shift to a different cloud region, or a fundamental architecture change may temporarily require looser SLOs while the new system is optimized.

When adjusting SLOs, change the threshold incrementally. A jump from P95 < 500ms to P95 < 200ms is a complete overhaul, not an adjustment. Move to P95 < 400ms, stabilize, then to P95 < 300ms. Each step should be achievable within a reasonable timeframe — one to two quarters.

Conclusion

Performance SLOs work when they are specific, measured, integrated into workflows, and backed by error budgets that create real consequences for violations. They fail when they are abstract, unmonitored, or too numerous to focus on.

Start with three to five SLOs covering your most critical user-facing operations. Baseline your current performance with load testing. Set achievable thresholds with room to tighten over time. Build automated measurement, dashboards, and CI/CD gates. Review error budgets regularly and use them to prioritize reliability work.

The result is a team that can answer "is our system performing well?" with data, not opinions — and that has the mechanisms in place to keep it that way.

For comprehensive performance testing methodology, see our performance testing guide. For understanding the percentile metrics that underpin most SLOs, see response time percentiles explained.
