
Introduction
Continuous load testing in CI/CD is one of the most effective ways to catch performance regressions before they reach production. Teams often invest heavily in unit tests, integration tests, and security scans, but performance testing is still treated as a last-minute exercise. That approach is risky. A small code change, a new database query, or a third-party integration can quietly introduce latency, reduce throughput, or increase error rates.
By integrating load testing into your CI/CD pipeline with LoadForge, you can make performance testing a routine part of delivery. Instead of running occasional manual tests, you can automatically validate application behavior on every pull request, nightly build, or pre-release deployment. This helps engineering teams detect bottlenecks early, enforce service-level objectives, and prevent unpleasant surprises in production.
Because LoadForge is cloud-based and built on Locust, it gives you a practical way to run distributed load testing at scale while keeping test logic in Python. You can run lightweight smoke performance checks on every commit and more aggressive stress testing on a schedule, all with real-time reporting, global test locations, and CI/CD integration.
In this guide, you’ll learn how to design realistic continuous load testing workflows in CI/CD, write Locust-based scripts for common scenarios, and analyze the results to improve delivery confidence.
Prerequisites
Before you build a continuous load testing workflow with LoadForge, make sure you have the following:
- A LoadForge account
- A web application or API deployed in an environment your CI/CD system can reach
- A CI/CD platform such as GitHub Actions, GitLab CI, Jenkins, Azure DevOps, or CircleCI
- API authentication details for your application, such as:
  - OAuth2 bearer tokens
  - API keys
  - Session-based login credentials
- A clear set of performance goals, such as:
  - 95th percentile response time under 500ms
  - Error rate below 1%
  - Support for 200 concurrent users
- Test data or seeded environments for repeatable runs
It also helps to define different performance testing stages:
- Commit or pull request checks: short, low-impact load tests
- Pre-merge or staging validation: moderate load with realistic workflows
- Nightly or scheduled jobs: broader load testing and stress testing
- Pre-release gates: full validation before production deployment
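These stages can be encoded as a small profile table that your pipeline scripts share, so each stage runs with consistent parameters. Every number below is an illustrative assumption to tune against your own SLOs, not a LoadForge default:

```python
# Illustrative stage profiles -- tune the numbers to your own SLOs and
# environments; nothing here is a LoadForge default.
STAGE_PROFILES = {
    "pr_check":     {"users": 10,  "duration_s": 180,  "spawn_rate": 2},
    "staging":      {"users": 50,  "duration_s": 600,  "spawn_rate": 5},
    "nightly":      {"users": 200, "duration_s": 1800, "spawn_rate": 10},
    "release_gate": {"users": 500, "duration_s": 900,  "spawn_rate": 25},
}


def profile_for(stage: str) -> dict:
    """Look up a stage profile, falling back to the lightest one."""
    return STAGE_PROFILES.get(stage, STAGE_PROFILES["pr_check"])
```

Defaulting unknown stages to the lightest profile is a deliberate safety choice: a misconfigured pipeline should under-load a shared environment, never over-load it.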
Understanding CI/CD & DevOps Under Load
When you introduce load testing into CI/CD, you are not just testing the application. You are testing the entire delivery process’s ability to surface performance issues quickly and consistently.
Why continuous load testing matters
Traditional performance testing often happens too late. By the time a team runs a manual test, the code has already been merged, released, or tightly coupled with other changes. Continuous load testing shifts performance testing left by embedding it in automated workflows.
This helps teams answer questions like:
- Did this pull request slow down the checkout API?
- Did a new ORM query increase response times on product search?
- Does the latest build still meet latency targets under moderate traffic?
- Has a dependency update increased error rates under concurrency?
Common bottlenecks caught in CI/CD
Continuous load testing is especially effective at exposing:
- Slow database queries introduced by new code
- N+1 query issues in API endpoints
- Inefficient authentication or session handling
- Cache misses or broken cache invalidation
- Memory or CPU regressions in containerized services
- Rate limiting or misconfigured upstream services
- Resource starvation in staging or ephemeral environments
What makes CI/CD load testing different
Load testing in CI/CD should be:
- Fast enough to fit into delivery workflows
- Repeatable across environments
- Realistic enough to catch regressions
- Safe enough not to overwhelm shared systems
That means your scripts should focus on business-critical paths and meaningful assertions rather than brute-force traffic generation. LoadForge’s distributed testing infrastructure helps you scale when needed, while real-time reporting makes it easy to compare runs and identify regressions across builds.
Writing Your First Load Test
A good first step is to create a lightweight smoke performance test for a staging API. This test should run quickly in CI and validate that core endpoints still perform acceptably after each deployment.
Basic CI smoke test for health, login, and dashboard
```python
from locust import HttpUser, task, between
import os


class ContinuousCISmokeUser(HttpUser):
    wait_time = between(1, 2)

    def on_start(self):
        self.email = os.getenv("LOADFORGE_TEST_EMAIL", "perf-test@example.com")
        self.password = os.getenv("LOADFORGE_TEST_PASSWORD", "StagingPassword123!")
        self.access_token = None
        self.login()

    def login(self):
        payload = {
            "email": self.email,
            "password": self.password,
            "rememberMe": True
        }
        with self.client.post(
            "/api/v1/auth/login",
            json=payload,
            name="POST /api/v1/auth/login",
            catch_response=True
        ) as response:
            if response.status_code == 200:
                data = response.json()
                self.access_token = data.get("accessToken")
                if self.access_token:
                    self.client.headers.update({
                        "Authorization": f"Bearer {self.access_token}"
                    })
                    response.success()
                else:
                    response.failure("Login succeeded but no access token returned")
            else:
                response.failure(f"Login failed with status {response.status_code}")

    @task(2)
    def health_check(self):
        with self.client.get(
            "/api/v1/health",
            name="GET /api/v1/health",
            catch_response=True
        ) as response:
            if response.status_code == 200:
                response.success()
            else:
                response.failure(f"Health check failed: {response.status_code}")

    @task(3)
    def dashboard_summary(self):
        with self.client.get(
            "/api/v1/dashboard/summary",
            name="GET /api/v1/dashboard/summary",
            catch_response=True
        ) as response:
            if response.status_code == 200 and "activeUsers" in response.text:
                response.success()
            else:
                response.failure("Dashboard summary response invalid")
```
Why this script works well in CI/CD
This is a practical starting point for continuous load testing because it:
- Authenticates like a real user
- Hits a health endpoint and an authenticated business endpoint
- Uses assertions to fail on invalid responses
- Keeps runtime short enough for pipeline execution
You might configure this test in your CI/CD pipeline to run for 3 to 5 minutes with 5 to 20 concurrent users. The goal is not full-scale stress testing. The goal is to catch obvious regressions quickly.
Example GitHub Actions workflow
Here is a simple example of how a CI pipeline might trigger a LoadForge test using an API call or CLI wrapper your team provides.
```yaml
name: Continuous Load Test

on:
  pull_request:
    branches: [main]
  workflow_dispatch:

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger LoadForge test
        run: |
          curl -X POST "https://api.loadforge.com/v1/tests/trigger" \
            -H "Authorization: Bearer $LOADFORGE_API_TOKEN" \
            -H "Content-Type: application/json" \
            -d '{
              "test_id": "staging-ci-smoke-test",
              "environment": "staging",
              "variables": {
                "LOADFORGE_TEST_EMAIL": "perf-test@example.com",
                "LOADFORGE_TEST_PASSWORD": "'"$LOADFORGE_TEST_PASSWORD"'"
              }
            }'
        env:
          LOADFORGE_API_TOKEN: ${{ secrets.LOADFORGE_API_TOKEN }}
          LOADFORGE_TEST_PASSWORD: ${{ secrets.LOADFORGE_TEST_PASSWORD }}
```
This kind of integration makes performance testing part of your normal delivery workflow instead of a separate manual task.
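A triggered run is asynchronous, so a follow-up CI step usually has to poll for completion before deciding pass or fail. Below is a minimal, dependency-free polling sketch; the status names and the way you fetch them are assumptions for illustration, not an official LoadForge API contract:

```python
import time


def wait_for_run(get_status, timeout_s=900, poll_s=15):
    """Poll a status callable until the run finishes or times out.

    `get_status` is any zero-argument callable returning one of
    "queued", "running", "passed", or "failed" -- these status values
    are illustrative placeholders, not an official API contract.
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = get_status()
        if status in ("passed", "failed"):
            return status
        time.sleep(poll_s)
    raise TimeoutError("load test run did not finish in time")
```

In a pipeline step, you would wire `get_status` to whatever results endpoint or CLI your team uses, then exit non-zero when the returned status is "failed" so the build is marked red.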
Advanced Load Testing Scenarios
Once you have a basic smoke test in place, the next step is to cover realistic user journeys and deployment risks. Below are several advanced continuous load testing scenarios that work well in CI/CD and DevOps environments.
Scenario 1: Validate a critical e-commerce flow after every staging deployment
For many teams, the most important CI/CD performance test is not a generic API check. It is a business-critical user journey such as browse, search, add to cart, and checkout.
```python
from locust import HttpUser, task, between
import os
import random


class EcommerceDeploymentValidationUser(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        self.email = os.getenv("LOADFORGE_TEST_EMAIL", "buyer@example.com")
        self.password = os.getenv("LOADFORGE_TEST_PASSWORD", "BuyerPassword123!")
        self.access_token = None
        self.cart_id = None
        self.product_ids = []
        self.login()
        self.load_catalog()

    def login(self):
        response = self.client.post(
            "/api/v1/auth/login",
            json={"email": self.email, "password": self.password},
            name="POST /api/v1/auth/login"
        )
        if response.status_code == 200:
            self.access_token = response.json().get("accessToken")
            self.client.headers.update({
                "Authorization": f"Bearer {self.access_token}"
            })

    def load_catalog(self):
        response = self.client.get(
            "/api/v1/products?category=electronics&limit=20",
            name="GET /api/v1/products"
        )
        if response.status_code == 200:
            products = response.json().get("items", [])
            self.product_ids = [p["id"] for p in products if "id" in p]

    @task(3)
    def search_products(self):
        query = random.choice(["laptop", "headphones", "monitor", "keyboard"])
        with self.client.get(
            f"/api/v1/search?q={query}&sort=relevance",
            name="GET /api/v1/search",
            catch_response=True
        ) as response:
            if response.status_code == 200 and "results" in response.text:
                response.success()
            else:
                response.failure("Search failed or returned invalid response")

    @task(2)
    def view_product(self):
        if not self.product_ids:
            return
        product_id = random.choice(self.product_ids)
        with self.client.get(
            f"/api/v1/products/{product_id}",
            name="GET /api/v1/products/:id",
            catch_response=True
        ) as response:
            if response.status_code == 200 and "price" in response.text:
                response.success()
            else:
                response.failure("Product detail invalid")

    @task(1)
    def add_to_cart(self):
        if not self.product_ids:
            return
        product_id = random.choice(self.product_ids)
        payload = {
            "productId": product_id,
            "quantity": random.randint(1, 2)
        }
        with self.client.post(
            "/api/v1/cart/items",
            json=payload,
            name="POST /api/v1/cart/items",
            catch_response=True
        ) as response:
            if response.status_code in [200, 201]:
                response.success()
            else:
                response.failure(f"Add to cart failed: {response.status_code}")
```
This type of test is ideal for post-deployment validation in staging. It verifies that key user flows still perform well after each release candidate build.
Scenario 2: Token refresh and role-based API testing for internal platforms
Many modern applications use short-lived access tokens and multiple user roles. Continuous load testing should reflect those patterns, especially for internal admin portals, SaaS dashboards, and B2B APIs.
```python
from locust import HttpUser, task, between
import os
import time


class RoleBasedAPITestUser(HttpUser):
    wait_time = between(1, 2)

    def on_start(self):
        self.client_id = os.getenv("OAUTH_CLIENT_ID", "ci-perf-client")
        self.client_secret = os.getenv("OAUTH_CLIENT_SECRET", "super-secret")
        self.refresh_token = os.getenv("OAUTH_REFRESH_TOKEN", "")
        self.access_token = None
        self.token_expires_at = 0
        self.authenticate()

    def authenticate(self):
        payload = {
            "grant_type": "refresh_token",
            "client_id": self.client_id,
            "client_secret": self.client_secret,
            "refresh_token": self.refresh_token
        }
        response = self.client.post(
            "/oauth/token",
            data=payload,
            name="POST /oauth/token"
        )
        if response.status_code == 200:
            data = response.json()
            self.access_token = data["access_token"]
            expires_in = data.get("expires_in", 3600)
            self.token_expires_at = time.time() + expires_in - 30
            self.client.headers.update({
                "Authorization": f"Bearer {self.access_token}"
            })

    def ensure_token(self):
        if time.time() >= self.token_expires_at:
            self.authenticate()

    @task(3)
    def list_accounts(self):
        self.ensure_token()
        with self.client.get(
            "/api/v2/admin/accounts?page=1&pageSize=25",
            name="GET /api/v2/admin/accounts",
            catch_response=True
        ) as response:
            if response.status_code == 200 and "accounts" in response.text:
                response.success()
            else:
                response.failure("Account listing failed")

    @task(2)
    def account_usage_report(self):
        self.ensure_token()
        with self.client.get(
            "/api/v2/admin/reports/usage?range=last_30_days",
            name="GET /api/v2/admin/reports/usage",
            catch_response=True
        ) as response:
            if response.status_code == 200 and "totals" in response.text:
                response.success()
            else:
                response.failure("Usage report failed")

    @task(1)
    def update_account_flag(self):
        self.ensure_token()
        payload = {
            "featureFlags": {
                "betaDashboard": True,
                "advancedExports": False
            }
        }
        with self.client.patch(
            "/api/v2/admin/accounts/acct_10294/settings",
            json=payload,
            name="PATCH /api/v2/admin/accounts/:id/settings",
            catch_response=True
        ) as response:
            if response.status_code in [200, 204]:
                response.success()
            else:
                response.failure(f"Account update failed: {response.status_code}")
```
This script is useful when your CI/CD pipeline deploys changes to admin APIs, identity services, or authorization middleware. It helps catch token handling regressions and role-specific latency issues.
Scenario 3: Database-heavy reporting workflows for nightly performance testing
Some operations are too expensive for every pull request but are perfect for scheduled CI/CD jobs. Reporting endpoints, exports, analytics dashboards, and search indexing often fall into this category.
```python
from locust import HttpUser, task, between
import os
import random


class NightlyReportingUser(HttpUser):
    wait_time = between(2, 5)

    def on_start(self):
        api_key = os.getenv("REPORTING_API_KEY", "nightly-report-key")
        self.client.headers.update({
            "X-API-Key": api_key,
            "Content-Type": "application/json"
        })

    @task(2)
    def generate_sales_report(self):
        payload = {
            "reportType": "sales_summary",
            "dateRange": {
                "from": "2026-03-01",
                "to": "2026-03-31"
            },
            "filters": {
                "region": random.choice(["us-east", "eu-west", "ap-southeast"]),
                "channel": random.choice(["web", "mobile", "partner"])
            },
            "format": "json"
        }
        with self.client.post(
            "/api/v1/reports/generate",
            json=payload,
            name="POST /api/v1/reports/generate",
            catch_response=True
        ) as response:
            if response.status_code == 202:
                job_id = response.json().get("jobId")
                if job_id:
                    response.success()
                else:
                    response.failure("Report job accepted but no jobId returned")
            else:
                response.failure(f"Report generation failed: {response.status_code}")

    @task(1)
    def query_analytics_dashboard(self):
        with self.client.get(
            "/api/v1/analytics/dashboard?widgets=revenue,orders,aov,retention",
            name="GET /api/v1/analytics/dashboard",
            catch_response=True
        ) as response:
            if response.status_code == 200 and "widgets" in response.text:
                response.success()
            else:
                response.failure("Analytics dashboard failed")

    @task(1)
    def fetch_large_customer_segment(self):
        with self.client.post(
            "/api/v1/customers/segments/query",
            json={
                "segmentName": "high_value_repeat_buyers",
                "conditions": [
                    {"field": "lifetimeValue", "operator": "gte", "value": 1000},
                    {"field": "orderCount", "operator": "gte", "value": 5}
                ],
                "limit": 500
            },
            name="POST /api/v1/customers/segments/query",
            catch_response=True
        ) as response:
            if response.status_code == 200 and "customers" in response.text:
                response.success()
            else:
                response.failure("Customer segment query failed")
```
This is a strong example of continuous load testing beyond simple endpoint checks. Nightly jobs like this help teams detect database and query planner regressions without slowing down every merge.
Analyzing Your Results
Running a load test in CI/CD is only useful if you know how to interpret the outcome. LoadForge provides real-time reporting and historical visibility, which makes it easier to compare performance across builds and environments.
Key metrics to watch
For continuous load testing, focus on these core metrics:
- Response time percentiles:
  - Median for general health
  - 95th percentile for user experience
  - 99th percentile for tail latency
- Error rate:
  - HTTP 5xx responses
  - Authentication failures
  - Timeouts
  - Assertion failures in Locust scripts
- Requests per second:
  - Useful for validating throughput consistency
- Concurrent user behavior:
  - Check whether performance degrades sharply as load increases
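The difference between the median, 95th, and 99th percentiles is easy to make concrete with a small nearest-rank calculation over raw latency samples (a sketch for intuition; LoadForge and Locust compute these for you):

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    `pct` percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# 100 evenly spread samples from 1ms to 100ms make the metrics obvious:
latencies = list(range(1, 101))
median, p95, p99 = (percentile(latencies, p) for p in (50, 95, 99))
# median=50, p95=95, p99=99
```

The takeaway: a healthy median can coexist with a painful p99, which is why CI thresholds should never be written against averages alone.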
Build pass/fail criteria
In CI/CD, every test should have explicit thresholds. For example:
- 95th percentile of POST /api/v1/auth/login must stay under 700ms
- 95th percentile of GET /api/v1/search must stay under 500ms
- Error rate must remain below 1%
- No endpoint may exceed 3% failed assertions
These thresholds can become release gates. If a deployment violates them, the pipeline should fail or require manual approval.
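A release gate like this can be expressed as a small, pure function that a CI step runs against exported run statistics. The result shape below (`endpoints`, `p95_ms`, `error_rate`) is an illustrative assumption; adapt the field names to however your pipeline exports results:

```python
def evaluate_gate(results: dict) -> list:
    """Return a list of threshold violations; an empty list means the gate passes.

    `results` uses an illustrative shape -- adapt the field names to the
    way your own pipeline exports load test statistics.
    """
    p95_budgets_ms = {
        "POST /api/v1/auth/login": 700,
        "GET /api/v1/search": 500,
    }
    violations = []
    for endpoint, budget in p95_budgets_ms.items():
        p95 = results.get("endpoints", {}).get(endpoint, {}).get("p95_ms")
        if p95 is not None and p95 > budget:
            violations.append(f"{endpoint}: p95 {p95}ms exceeds {budget}ms budget")
    if results.get("error_rate", 0) > 0.01:
        violations.append(f"error rate {results['error_rate']:.2%} exceeds 1%")
    return violations
```

If the returned list is non-empty, the CI step should print each violation and exit non-zero so the pipeline fails or routes to manual approval.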
Compare results over time
A single load testing run only tells part of the story. Continuous load testing becomes powerful when you compare trends across builds:
- Did response time increase after a framework upgrade?
- Did a new feature cause higher CPU-related latency?
- Did database-heavy endpoints slow down after schema changes?
LoadForge’s cloud-based infrastructure and centralized reporting make this easier, especially when you run tests from the same regions and with the same user profiles over time.
Distinguish environment noise from real regressions
CI/CD environments can be noisy. Shared staging clusters, autoscaling lag, and background jobs can distort results. To reduce false alarms:
- Run the same test multiple times before tightening thresholds
- Use stable staging environments where possible
- Keep test datasets consistent
- Separate quick PR checks from deeper nightly performance testing
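One way to apply the "run multiple times" advice mechanically is to compare the median of several runs against a baseline with explicit headroom, so a single noisy run cannot flip the verdict. A minimal sketch, assuming you store per-run p95 values from previous builds:

```python
import statistics


def is_regression(baseline_p95s_ms, candidate_p95s_ms, tolerance=0.15):
    """Flag a regression only when the median of the candidate runs
    exceeds the median of the baseline runs by more than `tolerance`
    (a fraction, e.g. 0.15 = 15% headroom for environment noise)."""
    baseline = statistics.median(baseline_p95s_ms)
    candidate = statistics.median(candidate_p95s_ms)
    return candidate > baseline * (1 + tolerance)
```

The 15% default is an assumption to calibrate: tighten it as your staging environment becomes more stable, and loosen it for stages that share infrastructure.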
Performance Optimization Tips
Once your continuous load testing starts surfacing issues, these are the most common optimization areas to investigate.
Optimize authentication flows
If login or token refresh is slow:
- Cache public keys for JWT validation
- Reduce unnecessary database lookups during auth
- Reuse session state when appropriate
- Avoid calling third-party identity providers on every request
Improve database performance
If reporting or search endpoints degrade under load:
- Add indexes for commonly filtered fields
- Eliminate N+1 query patterns
- Use pagination instead of returning oversized payloads
- Cache expensive aggregate queries
- Review ORM-generated SQL in hot paths
Reduce payload and serialization overhead
If APIs are returning large responses:
- Remove unused fields from JSON
- Compress responses where appropriate
- Use selective field queries
- Avoid expensive object serialization in high-traffic endpoints
Tune infrastructure for CI/CD environments
Sometimes the issue is not the code but the environment:
- Ensure staging resource limits resemble production enough to be meaningful
- Warm caches before heavier tests
- Verify autoscaling settings
- Check container CPU and memory throttling
Use the right test type at the right pipeline stage
Not every pipeline needs full stress testing:
- PR checks: short smoke load tests
- Staging deploys: realistic workflow validation
- Nightly builds: broader and database-heavy scenarios
- Release gates: pre-production distributed load testing from multiple global test locations
LoadForge is especially useful here because you can scale from small validation runs to larger distributed testing without changing the core Locust approach.
Common Pitfalls to Avoid
Continuous load testing in CI/CD is extremely valuable, but only if it is implemented thoughtfully.
Treating every pipeline run like a full-scale stress test
This is one of the most common mistakes. Full stress testing on every commit slows delivery and may overload shared environments. Use lightweight performance testing for fast feedback and reserve larger tests for scheduled or release stages.
Using unrealistic scripts
A script that only hits /health 10,000 times is not meaningful. Your load testing should reflect real user behavior, including:
- Authentication
- Search and browse flows
- Writes and updates
- Reporting and heavy queries
Ignoring assertions
A 200 response does not always mean success. Validate response content, tokens, IDs, and expected fields. Locust’s catch_response=True is essential for realistic CI/CD performance testing.
Failing to manage test data
Continuous load testing can create noisy or invalid data if you repeatedly add carts, users, reports, or transactions. Use isolated test accounts, seeded datasets, or cleanup jobs.
Running tests against unstable environments
If your staging environment is constantly changing, results may be inconsistent. Try to control variables as much as possible so regressions are easier to identify.
Not integrating results into release decisions
If performance testing runs but nobody acts on failures, it becomes background noise. Define thresholds, alerts, and ownership so results drive decisions.
Conclusion
Continuous load testing in CI/CD with LoadForge helps teams catch performance issues early, reduce release risk, and build confidence in every deployment. By embedding realistic Locust-based load testing scripts into your delivery pipeline, you can validate critical user journeys, monitor API regressions, and uncover scaling problems long before production users are affected.
Start small with a smoke performance test, expand into authenticated workflows and database-heavy scenarios, and then use historical comparisons to make performance testing a standard part of engineering quality. With LoadForge’s cloud-based infrastructure, distributed testing, real-time reporting, CI/CD integration, and global test locations, it’s straightforward to build a continuous performance testing practice that scales with your team.
If you’re ready to make load testing a natural part of your DevOps workflow, try LoadForge and start turning performance testing into an automated release safeguard.
LoadForge Team
LoadForge is a load and performance testing platform built on Locust. Our team has been shipping load tests against production systems since 2018, and we write these guides from real customer engagements.