
Introduction
Modern teams rely on CI/CD pipelines to catch problems before they reach production. Unit tests, integration tests, linting, and security scans are now standard—but performance regressions often still slip through because load testing happens manually or too late in the release process.
That’s where automated load testing in CI/CD becomes powerful. If a pull request makes an API 40% slower, increases error rates, or causes timeouts under concurrency, your pipeline should be able to detect it and fail the build automatically. This guide shows you how to use LoadForge to set pass/fail thresholds and stop deployments when performance degrades.
In this guide, you’ll learn how to build realistic Locust-based load tests for CI/CD workflows, define performance expectations, and use LoadForge as a gate in your delivery pipeline. We’ll cover practical examples including authenticated API testing, regression checks for critical endpoints, and environment-specific performance validation.
If you want to implement performance testing, stress testing, and load testing as part of your DevOps workflow, this is the pattern to follow.
Prerequisites
Before you start, make sure you have:
- A LoadForge account
- A web application or API deployed in a test, staging, or ephemeral CI environment
- Access to the application’s authentication method, such as:
  - Bearer tokens
  - OAuth client credentials
  - Session login endpoints
- A CI/CD platform such as:
  - GitHub Actions
  - GitLab CI
  - Jenkins
  - CircleCI
- A set of performance expectations for your application, such as:
  - 95th percentile response time under 500ms
  - Error rate below 1%
  - Specific endpoint latency thresholds
- Basic familiarity with Python and Locust
You’ll also want to identify your most important user journeys. For CI/CD load testing, focus on a small number of critical flows that are likely to regress:
- User login
- Product search
- Checkout or order creation
- Dashboard or reporting APIs
- File upload or export endpoints
- Internal service-to-service APIs
LoadForge is especially useful here because it gives you cloud-based infrastructure, distributed testing, global test locations, real-time reporting, and CI/CD integration, making it practical to run automated performance checks on every release candidate.
Understanding CI/CD & DevOps Under Load
When teams talk about load testing in DevOps, they usually mean one of two things:
- Running performance tests as part of the pipeline
- Using performance results to make deployment decisions
The second part is what turns load testing into a real quality gate.
Why performance regressions happen in CI/CD
Even small changes can introduce measurable slowdowns:
- A new database join adds 100ms to every request
- An external API call is now made synchronously
- Caching headers are removed accidentally
- A search query becomes unindexed
- Authentication middleware performs extra lookups
- Serialization logic increases CPU usage
These changes may not break functionality, so traditional tests pass. But under load, they can create queueing, timeouts, and cascading failures.
Common bottlenecks you’ll catch with automated load testing
In CI/CD and DevOps workflows, performance testing often reveals:
- Slow application startup after deployment
- Database contention in newly added endpoints
- CPU spikes from expensive business logic
- Memory pressure causing increased response times
- Authentication bottlenecks under concurrent login bursts
- Rate limiting or upstream dependency saturation
- Misconfigured autoscaling thresholds
What makes CI-based load testing different?
A pipeline load test is usually:
- Shorter than a full-scale performance test
- Focused on critical endpoints
- Threshold-driven
- Repeatable across commits
- Designed to detect regressions rather than find absolute max capacity
For example, your nightly stress testing may run for 30 minutes across multiple regions, while your CI load testing job may run for 3–5 minutes with a smaller user count and strict pass/fail criteria.
Writing Your First Load Test
Let’s start with a realistic regression test for a common SaaS API. Imagine your CI pipeline deploys a staging build of an application with these endpoints:
- POST /api/v1/auth/login
- GET /api/v1/projects
- GET /api/v1/projects/{id}/builds
- POST /api/v1/builds/{id}/retry
This first Locust script logs in, fetches projects, inspects recent builds, and retries a build occasionally. These are realistic actions for a DevOps platform or internal CI dashboard.
```python
from locust import HttpUser, task, between
import random


class DevOpsApiUser(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        response = self.client.post(
            "/api/v1/auth/login",
            json={
                "email": "ci-bot@example.com",
                "password": "SuperSecurePassword123!"
            },
            headers={"Content-Type": "application/json"},
            name="POST /api/v1/auth/login"
        )
        if response.status_code == 200:
            data = response.json()
            self.token = data["access_token"]
            self.headers = {
                "Authorization": f"Bearer {self.token}",
                "Accept": "application/json"
            }
        else:
            self.token = None
            self.headers = {}

    @task(3)
    def list_projects(self):
        self.client.get(
            "/api/v1/projects?limit=20&sort=updated_at:desc",
            headers=self.headers,
            name="GET /api/v1/projects"
        )

    @task(2)
    def list_project_builds(self):
        project_id = random.choice([101, 102, 103, 104])
        self.client.get(
            f"/api/v1/projects/{project_id}/builds?status=failed&limit=10",
            headers=self.headers,
            name="GET /api/v1/projects/:id/builds"
        )

    @task(1)
    def retry_build(self):
        build_id = random.choice([9001, 9002, 9003])
        self.client.post(
            f"/api/v1/builds/{build_id}/retry",
            json={
                "reason": "Regression validation from CI pipeline",
                "triggered_by": "loadforge-ci-check"
            },
            headers={**self.headers, "Content-Type": "application/json"},
            name="POST /api/v1/builds/:id/retry"
        )
```

Why this script works well in CI/CD
This test is a good first step because it:
- Authenticates the same way a real client would
- Exercises business-critical API paths
- Uses realistic payloads
- Mixes read-heavy and write-heavy operations
- Produces endpoint-level metrics you can threshold in LoadForge
How to use this in a CI pipeline
In LoadForge, you would configure this test to run against your staging or preview environment after deployment. Then set pass/fail conditions such as:
- Overall error rate less than 1%
- POST /api/v1/auth/login p95 less than 800ms
- GET /api/v1/projects p95 less than 500ms
- GET /api/v1/projects/:id/builds p95 less than 700ms
If any threshold is exceeded, the test fails and your CI build can be marked as failed.
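To make the pass/fail logic concrete, here is a minimal plain-Python sketch of the kind of evaluation LoadForge performs for you. The result structure and endpoint names are illustrative; you could run something similar against an exported test summary.

```python
# Hypothetical per-endpoint results, e.g. parsed from an exported test summary.
RESULTS = {
    "error_rate": 0.004,  # 0.4% overall
    "endpoints": {
        "POST /api/v1/auth/login": {"p95_ms": 640},
        "GET /api/v1/projects": {"p95_ms": 510},
        "GET /api/v1/projects/:id/builds": {"p95_ms": 450},
    },
}

# p95 limits in milliseconds, matching the thresholds above.
THRESHOLDS = {
    "POST /api/v1/auth/login": 800,
    "GET /api/v1/projects": 500,
    "GET /api/v1/projects/:id/builds": 700,
}


def evaluate(results, thresholds, max_error_rate=0.01):
    """Return a list of human-readable threshold violations (empty means pass)."""
    failures = []
    if results["error_rate"] > max_error_rate:
        failures.append(
            f"error rate {results['error_rate']:.2%} exceeds {max_error_rate:.0%}"
        )
    for endpoint, limit_ms in thresholds.items():
        p95 = results["endpoints"][endpoint]["p95_ms"]
        if p95 > limit_ms:
            failures.append(f"{endpoint} p95 {p95}ms exceeds {limit_ms}ms")
    return failures


# GET /api/v1/projects is 510ms against a 500ms limit, so this run fails.
failures = evaluate(RESULTS, THRESHOLDS)
```

In this sample run, one endpoint breaches its limit, so the build would be marked as failed even though the overall error rate is healthy.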
Advanced Load Testing Scenarios
Once you have a basic smoke-level performance gate, you can add more realistic regression scenarios.
Scenario 1: Authenticated pipeline health checks with token refresh
Many CI/CD systems use short-lived access tokens. If your application refreshes tokens during active sessions, you should test that flow too.
This example simulates a user session that logs in, refreshes the token, and accesses deployment endpoints.
```python
from locust import HttpUser, task, between


class DeploymentApiUser(HttpUser):
    wait_time = between(1, 2)

    def on_start(self):
        self.login()

    def login(self):
        response = self.client.post(
            "/api/v1/auth/login",
            json={
                "email": "release-manager@example.com",
                "password": "ReleasePassword456!"
            },
            headers={"Content-Type": "application/json"},
            name="POST /api/v1/auth/login"
        )
        response.raise_for_status()
        data = response.json()
        self.access_token = data["access_token"]
        self.refresh_token = data["refresh_token"]
        self.headers = {
            "Authorization": f"Bearer {self.access_token}",
            "Accept": "application/json"
        }

    @task(4)
    def list_deployments(self):
        self.client.get(
            "/api/v1/deployments?environment=staging&status=in_progress",
            headers=self.headers,
            name="GET /api/v1/deployments"
        )

    @task(2)
    def get_deployment_details(self):
        deployment_id = 55021
        self.client.get(
            f"/api/v1/deployments/{deployment_id}",
            headers=self.headers,
            name="GET /api/v1/deployments/:id"
        )

    @task(1)
    def refresh_session(self):
        response = self.client.post(
            "/api/v1/auth/refresh",
            json={"refresh_token": self.refresh_token},
            headers={"Content-Type": "application/json"},
            name="POST /api/v1/auth/refresh"
        )
        if response.status_code == 200:
            data = response.json()
            self.access_token = data["access_token"]
            self.headers["Authorization"] = f"Bearer {self.access_token}"
```

This is useful when performance regressions affect session handling, auth middleware, or token storage systems like Redis or database-backed session tables.
Scenario 2: Regression testing a build trigger and artifact workflow
A common DevOps use case is validating build and artifact endpoints. These APIs often become slower over time because they touch databases, queues, object storage, and audit logging systems.
```python
from locust import HttpUser, task, between
import random
import uuid


class BuildPipelineUser(HttpUser):
    wait_time = between(2, 4)

    def on_start(self):
        response = self.client.post(
            "/api/v1/auth/token",
            json={
                "client_id": "ci-runner",
                "client_secret": "ci-runner-secret",
                "grant_type": "client_credentials",
                "scope": "builds:read builds:write artifacts:read"
            },
            headers={"Content-Type": "application/json"},
            name="POST /api/v1/auth/token"
        )
        response.raise_for_status()
        token = response.json()["access_token"]
        self.headers = {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json"
        }

    @task(3)
    def trigger_build(self):
        branch = random.choice(["main", "develop", "release/2026.04"])
        payload = {
            "project_id": 2001,
            "branch": branch,
            "commit_sha": str(uuid.uuid4()).replace("-", "")[:12],
            "pipeline_source": "ci_regression_test",
            "variables": {
                "RUN_E2E": "false",
                "RUN_PERF_SMOKE": "true"
            }
        }
        self.client.post(
            "/api/v1/builds",
            json=payload,
            headers=self.headers,
            name="POST /api/v1/builds"
        )

    @task(2)
    def get_build_status(self):
        build_id = random.choice([78110, 78111, 78112, 78113])
        self.client.get(
            f"/api/v1/builds/{build_id}/status",
            headers=self.headers,
            name="GET /api/v1/builds/:id/status"
        )

    @task(1)
    def list_artifacts(self):
        build_id = random.choice([78110, 78111, 78112])
        self.client.get(
            f"/api/v1/builds/{build_id}/artifacts?type=report",
            headers=self.headers,
            name="GET /api/v1/builds/:id/artifacts"
        )
```

This script is ideal for catching regressions in:
- Build creation latency
- Queue-backed status lookups
- Artifact metadata retrieval
- Auth and permission checks
- Audit/event publishing overhead
Scenario 3: Testing a database-heavy reporting endpoint in CI
Reporting endpoints are frequent regression hotspots because they aggregate data across builds, deployments, and environments. These APIs may still return 200 responses while becoming unacceptably slow.
```python
from locust import HttpUser, task, between


class ReportingUser(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        response = self.client.post(
            "/api/v1/session/login",
            json={
                "username": "analytics-bot",
                "password": "AnalyticsPassword789!"
            },
            headers={"Content-Type": "application/json"},
            name="POST /api/v1/session/login"
        )
        response.raise_for_status()
        session_cookie = response.cookies.get("session_id")
        self.headers = {
            "Accept": "application/json",
            "X-Requested-With": "XMLHttpRequest"
        }
        self.cookies = {"session_id": session_cookie}

    @task(3)
    def deployment_frequency_report(self):
        self.client.get(
            "/api/v1/reports/deployment-frequency?team=platform&window=30d",
            headers=self.headers,
            cookies=self.cookies,
            name="GET /api/v1/reports/deployment-frequency"
        )

    @task(2)
    def change_failure_rate_report(self):
        self.client.get(
            "/api/v1/reports/change-failure-rate?service=payments&window=90d",
            headers=self.headers,
            cookies=self.cookies,
            name="GET /api/v1/reports/change-failure-rate"
        )

    @task(1)
    def lead_time_report(self):
        self.client.get(
            "/api/v1/reports/lead-time?repository=checkout-service&branch=main",
            headers=self.headers,
            cookies=self.cookies,
            name="GET /api/v1/reports/lead-time"
        )
```

This scenario helps identify:
- Slow SQL queries
- Missing indexes
- Expensive report generation logic
- Cache misses
- N+1 query issues
- Data warehouse or analytics backend latency
These database-heavy endpoints are perfect candidates for threshold-based build failure because they often degrade gradually over time.
Analyzing Your Results
After running your load test in LoadForge, the next step is deciding whether the build should pass or fail.
Key metrics to watch
For CI/CD performance testing, focus on these metrics:
- Response time percentiles:
  - Median can hide problems
  - p95 and p99 are better indicators of user experience under load
- Error rate:
  - Watch both HTTP failures and application-level failures
- Requests per second:
  - Useful for checking throughput consistency
- Endpoint-specific latency:
  - Critical for identifying which route regressed
- Response distribution over time:
  - Reveals warm-up issues, memory leaks, or resource exhaustion
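To see why averages and medians mislead, here is a quick standard-library sketch with made-up latency samples:

```python
import statistics

# 98 fast requests plus two slow outliers (values in ms, purely illustrative)
samples = [100] * 98 + [1200, 4800]

mean = statistics.fmean(samples)
p50 = statistics.median(samples)
cuts = statistics.quantiles(samples, n=100)  # 99 cut points
p95, p99 = cuts[94], cuts[98]

print(f"mean={mean:.0f}ms p50={p50:.0f}ms p95={p95:.0f}ms p99={p99:.0f}ms")
# The median sits at 100ms and looks healthy, while p99 exposes the
# multi-second tail that real users on those requests actually experienced.
```

This is why per-endpoint p95/p99 thresholds catch regressions that an average-based check would wave through.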
Good threshold examples
A practical set of pass/fail conditions might look like:
- Total error rate < 1%
- POST /api/v1/auth/login p95 < 800ms
- GET /api/v1/projects p95 < 500ms
- POST /api/v1/builds p95 < 1200ms
- GET /api/v1/reports/deployment-frequency p95 < 1500ms
You can also set stricter rules for core business flows and looser ones for secondary endpoints.
Comparing results against previous runs
The real value of CI/CD load testing is regression detection. A single test result matters less than the trend.
For example:
- Last week: GET /api/v1/builds/:id/status p95 = 220ms
- Current branch: p95 = 680ms
Even if 680ms is technically “acceptable,” that jump may indicate a serious regression. LoadForge’s real-time reporting and historical test visibility make these changes easier to spot before they hit production.
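A simple way to encode that trend check is to compare the current run against a stored baseline, flagging a regression only when the change is meaningful on both a relative and an absolute scale. The function name and margins below are illustrative defaults, not a LoadForge API:

```python
def regressed(baseline_p95_ms, current_p95_ms, max_ratio=1.25, min_delta_ms=50):
    """Flag a regression only when p95 grew by more than 25% AND by more than 50ms.

    The absolute floor keeps very fast endpoints (e.g. 8ms -> 12ms)
    from failing builds on measurement noise.
    """
    grew_relatively = current_p95_ms > baseline_p95_ms * max_ratio
    grew_absolutely = (current_p95_ms - baseline_p95_ms) > min_delta_ms
    return grew_relatively and grew_absolutely


# The 220ms -> 680ms jump trips the check even though 680ms is "acceptable":
assert regressed(220, 680)
# A small wobble does not:
assert not regressed(220, 250)
```

Combining a ratio with an absolute delta like this is a common way to keep regression gates sensitive without making them flaky.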
Using LoadForge in CI/CD gates
A typical workflow looks like this:
- Deploy application to staging or preview environment
- Trigger LoadForge test via CI job
- Wait for test completion
- Check pass/fail status from LoadForge thresholds
- Fail the pipeline if thresholds are breached
Here is an example GitHub Actions step pattern for a CI gate:
```yaml
name: performance-regression-check

on:
  pull_request:
    branches: [main]

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy preview environment
        run: ./scripts/deploy-preview.sh

      - name: Trigger LoadForge test
        run: |
          curl -X POST "https://api.loadforge.com/v1/tests/123456/run" \
            -H "Authorization: Bearer ${{ secrets.LOADFORGE_API_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{
              "environment_url": "https://pr-482.staging.example.com",
              "notes": "PR #482 performance regression check"
            }'

      - name: Poll for result
        run: ./scripts/check-loadforge-result.sh
```

And an example shell script that exits non-zero if the test failed:
```bash
#!/usr/bin/env bash
set -euo pipefail

TEST_RUN_ID="$1"
API_TOKEN="$LOADFORGE_API_TOKEN"

while true; do
  RESPONSE=$(curl -s "https://api.loadforge.com/v1/test-runs/${TEST_RUN_ID}" \
    -H "Authorization: Bearer ${API_TOKEN}")

  STATUS=$(echo "$RESPONSE" | jq -r '.status')
  RESULT=$(echo "$RESPONSE" | jq -r '.result')

  if [[ "$STATUS" == "completed" ]]; then
    if [[ "$RESULT" == "passed" ]]; then
      echo "Load test passed"
      exit 0
    else
      echo "Load test failed"
      exit 1
    fi
  fi

  echo "Waiting for test completion..."
  sleep 15
done
```

The exact API details may vary based on your LoadForge setup, but the pattern remains the same: trigger, poll, evaluate, fail the build if performance regressed.
Performance Optimization Tips
When your CI build fails due to load test thresholds, use that as a signal to investigate systematically.
Optimize the slowest endpoints first
Look at the endpoint-level breakdown in LoadForge and prioritize:
- Highest p95 latency
- Largest regression from baseline
- Highest error-producing routes
Review database access patterns
For database-heavy APIs:
- Add indexes for new filters or joins
- Reduce query count per request
- Avoid N+1 ORM behavior
- Cache frequently requested aggregates
- Paginate large result sets
Improve authentication performance
If login or token refresh is slow:
- Cache user/session lookups
- Reduce repeated permission checks
- Optimize JWT validation or key fetching
- Move expensive auth hooks off the request path where possible
Reduce payload and serialization overhead
Large JSON responses can become a hidden bottleneck:
- Return only fields needed by the client
- Compress responses
- Avoid deeply nested objects
- Stream large exports instead of generating them synchronously
Use environment-appropriate test sizes
Your CI performance test should be small enough to run quickly, but large enough to expose regressions. Then use broader LoadForge distributed testing in nightly or pre-release stages for deeper stress testing across cloud-based infrastructure and global test locations.
Common Pitfalls to Avoid
Testing too many endpoints in CI
Your pipeline should not run a full-scale performance testing suite on every commit. Keep CI tests focused on critical paths and known regression hotspots.
Using unrealistic user behavior
If your load test only hammers one endpoint with no auth, no session handling, and no realistic pacing, the results may not reflect real application behavior. Use actual login flows, realistic headers, and production-like payloads.
Ignoring percentiles
Average response time is not enough. A build can have a good average and still produce terrible tail latency. Always track p95 and p99.
Running against unstable environments
If your staging environment is noisy, underpowered, or shared with unrelated work, your CI load testing results may be inconsistent. Try to test against predictable environments or ephemeral deployments where possible.
Failing builds on overly aggressive thresholds
If your thresholds are too strict, teams will start ignoring failures. Start with realistic baselines and tighten over time.
Not separating smoke, load, and stress testing
These are related but different:
- Smoke performance test: quick regression check in CI
- Load testing: validate expected concurrency
- Stress testing: push beyond expected capacity
Use CI for regression-oriented checks and reserve larger tests for scheduled or pre-release pipelines.
Forgetting warm-up effects
Some applications perform poorly just after deployment due to cold caches, JIT compilation, or lazy initialization. Decide whether your threshold should include or exclude warm-up behavior based on your production expectations.
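If you decide to exclude warm-up, one lightweight approach is to discard samples from the first N seconds before computing percentiles. This is a post-processing sketch; the `(elapsed_seconds, latency_ms)` tuple format is an assumption about how you export your raw samples.

```python
def exclude_warmup(samples, warmup_seconds=30):
    """samples: list of (elapsed_seconds, latency_ms) tuples from a test run.

    Drops everything recorded during the warm-up window so cold caches
    and lazy initialization don't skew the thresholded percentiles.
    """
    return [latency for elapsed, latency in samples if elapsed >= warmup_seconds]


# The first two samples land inside the 30-second warm-up window and are dropped:
steady = exclude_warmup([(5, 950), (20, 400), (45, 120), (90, 130)])
# steady == [120, 130]
```

Locust also offers a `--reset-stats` command-line flag that discards statistics gathered during ramp-up, which achieves a similar effect at collection time instead of in post-processing.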
Conclusion
Automating load testing in CI/CD is one of the most effective ways to prevent performance regressions from reaching production. By defining pass/fail thresholds in LoadForge, you can turn performance testing into a real deployment gate instead of a manual afterthought.
Start with a focused Locust script that covers your most important endpoints, add realistic authentication and payloads, and define thresholds around p95 latency and error rate. As your process matures, expand into more advanced scenarios like token refresh, build orchestration, and reporting APIs. With LoadForge, you can run these tests on cloud-based infrastructure, scale them with distributed testing, view real-time reporting, and integrate them directly into your CI/CD pipeline.
If you’re ready to make load testing a standard part of your DevOps workflow, try LoadForge and start failing builds before performance regressions reach your users.
LoadForge Team
LoadForge is a load and performance testing platform built on Locust. Our team has been shipping load tests against production systems since 2018, and we write these guides from real customer engagements.