How to Fail CI Builds on Load Test Performance Regressions

Introduction

Modern teams rely on CI/CD pipelines to catch problems before they reach production. Unit tests, integration tests, linting, and security scans are now standard—but performance regressions often still slip through because load testing happens manually or too late in the release process.

That’s where automated load testing in CI/CD becomes powerful. If a pull request makes an API 40% slower, increases error rates, or causes timeouts under concurrency, your pipeline should be able to detect it and fail the build automatically. This guide shows you how to use LoadForge to set pass/fail thresholds and stop deployments when performance degrades.

In this guide, you’ll learn how to build realistic Locust-based load tests for CI/CD workflows, define performance expectations, and use LoadForge as a gate in your delivery pipeline. We’ll cover practical examples including authenticated API testing, regression checks for critical endpoints, and environment-specific performance validation.

If you want to implement performance testing, stress testing, and load testing as part of your DevOps workflow, this is the pattern to follow.

Prerequisites

Before you start, make sure you have:

  • A LoadForge account
  • A web application or API deployed in a test, staging, or ephemeral CI environment
  • Access to the application’s authentication method, such as:
    • Bearer tokens
    • OAuth client credentials
    • Session login endpoints
  • A CI/CD platform such as:
    • GitHub Actions
    • GitLab CI
    • Jenkins
    • CircleCI
  • A set of performance expectations for your application, such as:
    • 95th percentile response time under 500ms
    • Error rate below 1%
    • Specific endpoint latency thresholds
  • Basic familiarity with Python and Locust

You’ll also want to identify your most important user journeys. For CI/CD load testing, focus on a small number of critical flows that are likely to regress:

  • User login
  • Product search
  • Checkout or order creation
  • Dashboard or reporting APIs
  • File upload or export endpoints
  • Internal service-to-service APIs

LoadForge is especially useful here because it gives you cloud-based infrastructure, distributed testing, global test locations, real-time reporting, and CI/CD integration, making it practical to run automated performance checks on every release candidate.

Understanding CI/CD & DevOps Under Load

When teams talk about load testing in DevOps, they usually mean one of two things:

  1. Running performance tests as part of the pipeline
  2. Using performance results to make deployment decisions

The second part is what turns load testing into a real quality gate.

Why performance regressions happen in CI/CD

Even small changes can introduce measurable slowdowns:

  • A new database join adds 100ms to every request
  • An external API call is now made synchronously
  • Caching headers are removed accidentally
  • A search query becomes unindexed
  • Authentication middleware performs extra lookups
  • Serialization logic increases CPU usage

These changes may not break functionality, so traditional tests pass. But under load, they can create queueing, timeouts, and cascading failures.

Common bottlenecks you’ll catch with automated load testing

In CI/CD and DevOps workflows, performance testing often reveals:

  • Slow application startup after deployment
  • Database contention in newly added endpoints
  • CPU spikes from expensive business logic
  • Memory pressure causing increased response times
  • Authentication bottlenecks under concurrent login bursts
  • Rate limiting or upstream dependency saturation
  • Misconfigured autoscaling thresholds

What makes CI-based load testing different?

A pipeline load test is usually:

  • Shorter than a full-scale performance test
  • Focused on critical endpoints
  • Threshold-driven
  • Repeatable across commits
  • Designed to detect regressions rather than find absolute max capacity

For example, your nightly stress testing may run for 30 minutes across multiple regions, while your CI load testing job may run for 3–5 minutes with a smaller user count and strict pass/fail criteria.

Writing Your First Load Test

Let’s start with a realistic regression test for a common SaaS API. Imagine your CI pipeline deploys a staging build of an application with these endpoints:

  • POST /api/v1/auth/login
  • GET /api/v1/projects
  • GET /api/v1/projects/{id}/builds
  • POST /api/v1/builds/{id}/retry

This first Locust script logs in, fetches projects, inspects recent builds, and retries a build occasionally. These are realistic actions for a DevOps platform or internal CI dashboard.

```python
from locust import HttpUser, task, between
import random

class DevOpsApiUser(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        response = self.client.post(
            "/api/v1/auth/login",
            json={
                "email": "ci-bot@example.com",
                "password": "SuperSecurePassword123!"
            },
            headers={"Content-Type": "application/json"},
            name="POST /api/v1/auth/login"
        )

        if response.status_code == 200:
            data = response.json()
            self.token = data["access_token"]
            self.headers = {
                "Authorization": f"Bearer {self.token}",
                "Accept": "application/json"
            }
        else:
            self.token = None
            self.headers = {}

    @task(3)
    def list_projects(self):
        self.client.get(
            "/api/v1/projects?limit=20&sort=updated_at:desc",
            headers=self.headers,
            name="GET /api/v1/projects"
        )

    @task(2)
    def list_project_builds(self):
        project_id = random.choice([101, 102, 103, 104])
        self.client.get(
            f"/api/v1/projects/{project_id}/builds?status=failed&limit=10",
            headers=self.headers,
            name="GET /api/v1/projects/:id/builds"
        )

    @task(1)
    def retry_build(self):
        build_id = random.choice([9001, 9002, 9003])
        self.client.post(
            f"/api/v1/builds/{build_id}/retry",
            json={
                "reason": "Regression validation from CI pipeline",
                "triggered_by": "loadforge-ci-check"
            },
            headers={**self.headers, "Content-Type": "application/json"},
            name="POST /api/v1/builds/:id/retry"
        )
```

Why this script works well in CI/CD

This test is a good first step because it:

  • Authenticates the same way a real client would
  • Exercises business-critical API paths
  • Uses realistic payloads
  • Mixes read-heavy and write-heavy operations
  • Produces endpoint-level metrics you can threshold in LoadForge

How to use this in a CI pipeline

In LoadForge, you would configure this test to run against your staging or preview environment after deployment. Then set pass/fail conditions such as:

  • Overall error rate less than 1%
  • POST /api/v1/auth/login p95 less than 800ms
  • GET /api/v1/projects p95 less than 500ms
  • GET /api/v1/projects/:id/builds p95 less than 700ms

If any threshold is exceeded, the test fails and your CI build can be marked as failed.

Advanced Load Testing Scenarios

Once you have a basic smoke-level performance gate, you can add more realistic regression scenarios.

Scenario 1: Authenticated pipeline health checks with token refresh

Many CI/CD systems use short-lived access tokens. If your application refreshes tokens during active sessions, you should test that flow too.

This example simulates a user session that logs in, refreshes the token, and accesses deployment endpoints.

```python
from locust import HttpUser, task, between

class DeploymentApiUser(HttpUser):
    wait_time = between(1, 2)

    def on_start(self):
        self.login()

    def login(self):
        response = self.client.post(
            "/api/v1/auth/login",
            json={
                "email": "release-manager@example.com",
                "password": "ReleasePassword456!"
            },
            headers={"Content-Type": "application/json"},
            name="POST /api/v1/auth/login"
        )
        response.raise_for_status()

        data = response.json()
        self.access_token = data["access_token"]
        self.refresh_token = data["refresh_token"]
        self.headers = {
            "Authorization": f"Bearer {self.access_token}",
            "Accept": "application/json"
        }

    @task(4)
    def list_deployments(self):
        self.client.get(
            "/api/v1/deployments?environment=staging&status=in_progress",
            headers=self.headers,
            name="GET /api/v1/deployments"
        )

    @task(2)
    def get_deployment_details(self):
        deployment_id = 55021
        self.client.get(
            f"/api/v1/deployments/{deployment_id}",
            headers=self.headers,
            name="GET /api/v1/deployments/:id"
        )

    @task(1)
    def refresh_session(self):
        response = self.client.post(
            "/api/v1/auth/refresh",
            json={"refresh_token": self.refresh_token},
            headers={"Content-Type": "application/json"},
            name="POST /api/v1/auth/refresh"
        )

        if response.status_code == 200:
            data = response.json()
            self.access_token = data["access_token"]
            self.headers["Authorization"] = f"Bearer {self.access_token}"
```

This is useful when performance regressions affect session handling, auth middleware, or token storage systems like Redis or database-backed session tables.

Scenario 2: Regression testing a build trigger and artifact workflow

A common DevOps use case is validating build and artifact endpoints. These APIs often become slower over time because they touch databases, queues, object storage, and audit logging systems.

```python
from locust import HttpUser, task, between
import random
import uuid

class BuildPipelineUser(HttpUser):
    wait_time = between(2, 4)

    def on_start(self):
        response = self.client.post(
            "/api/v1/auth/token",
            json={
                "client_id": "ci-runner",
                "client_secret": "ci-runner-secret",
                "grant_type": "client_credentials",
                "scope": "builds:read builds:write artifacts:read"
            },
            headers={"Content-Type": "application/json"},
            name="POST /api/v1/auth/token"
        )
        response.raise_for_status()
        token = response.json()["access_token"]
        self.headers = {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json"
        }

    @task(3)
    def trigger_build(self):
        branch = random.choice(["main", "develop", "release/2026.04"])
        payload = {
            "project_id": 2001,
            "branch": branch,
            "commit_sha": str(uuid.uuid4()).replace("-", "")[:12],
            "pipeline_source": "ci_regression_test",
            "variables": {
                "RUN_E2E": "false",
                "RUN_PERF_SMOKE": "true"
            }
        }

        self.client.post(
            "/api/v1/builds",
            json=payload,
            headers=self.headers,
            name="POST /api/v1/builds"
        )

    @task(2)
    def get_build_status(self):
        build_id = random.choice([78110, 78111, 78112, 78113])
        self.client.get(
            f"/api/v1/builds/{build_id}/status",
            headers=self.headers,
            name="GET /api/v1/builds/:id/status"
        )

    @task(1)
    def list_artifacts(self):
        build_id = random.choice([78110, 78111, 78112])
        self.client.get(
            f"/api/v1/builds/{build_id}/artifacts?type=report",
            headers=self.headers,
            name="GET /api/v1/builds/:id/artifacts"
        )
```

This script is ideal for catching regressions in:

  • Build creation latency
  • Queue-backed status lookups
  • Artifact metadata retrieval
  • Auth and permission checks
  • Audit/event publishing overhead

Scenario 3: Testing a database-heavy reporting endpoint in CI

Reporting endpoints are frequent regression hotspots because they aggregate data across builds, deployments, and environments. These APIs may still return 200 responses while becoming unacceptably slow.

```python
from locust import HttpUser, task, between

class ReportingUser(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        response = self.client.post(
            "/api/v1/session/login",
            json={
                "username": "analytics-bot",
                "password": "AnalyticsPassword789!"
            },
            headers={"Content-Type": "application/json"},
            name="POST /api/v1/session/login"
        )
        response.raise_for_status()
        session_cookie = response.cookies.get("session_id")
        self.headers = {
            "Accept": "application/json",
            "X-Requested-With": "XMLHttpRequest"
        }
        self.cookies = {"session_id": session_cookie}

    @task(3)
    def deployment_frequency_report(self):
        self.client.get(
            "/api/v1/reports/deployment-frequency?team=platform&window=30d",
            headers=self.headers,
            cookies=self.cookies,
            name="GET /api/v1/reports/deployment-frequency"
        )

    @task(2)
    def change_failure_rate_report(self):
        self.client.get(
            "/api/v1/reports/change-failure-rate?service=payments&window=90d",
            headers=self.headers,
            cookies=self.cookies,
            name="GET /api/v1/reports/change-failure-rate"
        )

    @task(1)
    def lead_time_report(self):
        self.client.get(
            "/api/v1/reports/lead-time?repository=checkout-service&branch=main",
            headers=self.headers,
            cookies=self.cookies,
            name="GET /api/v1/reports/lead-time"
        )
```

This scenario helps identify:

  • Slow SQL queries
  • Missing indexes
  • Expensive report generation logic
  • Cache misses
  • N+1 query issues
  • Data warehouse or analytics backend latency

These database-heavy endpoints are perfect candidates for threshold-based build failure because they often degrade gradually over time.

Analyzing Your Results

After running your load test in LoadForge, the next step is deciding whether the build should pass or fail.

Key metrics to watch

For CI/CD performance testing, focus on these metrics:

  • Response time percentiles:
    • Median can hide problems
    • p95 and p99 are better indicators of user experience under load
  • Error rate:
    • Watch both HTTP failures and application-level failures
  • Requests per second:
    • Useful for checking throughput consistency
  • Endpoint-specific latency:
    • Critical for identifying which route regressed
  • Response distribution over time:
    • Reveals warm-up issues, memory leaks, or resource exhaustion

Good threshold examples

A practical set of pass/fail conditions might look like:

  • Total error rate < 1%
  • POST /api/v1/auth/login p95 < 800ms
  • GET /api/v1/projects p95 < 500ms
  • POST /api/v1/builds p95 < 1200ms
  • GET /api/v1/reports/deployment-frequency p95 < 1500ms

You can also set stricter rules for core business flows and looser ones for secondary endpoints.
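If you ever need to enforce rules like these yourself, for example against exported results rather than LoadForge's built-in pass/fail conditions, the gate logic boils down to a small comparison. This is a hedged sketch: the result structure, metric names, and numbers are illustrative, not a LoadForge API.

```python
# Hypothetical threshold gate: compare endpoint-level results to limits.
# Structure and numbers are illustrative only, not a LoadForge API.
THRESHOLDS = {
    "POST /api/v1/auth/login": {"p95_ms": 800},
    "GET /api/v1/projects": {"p95_ms": 500},
    "POST /api/v1/builds": {"p95_ms": 1200},
}
MAX_ERROR_RATE = 0.01  # 1% overall

def evaluate(results, total_error_rate):
    """Return a list of human-readable threshold violations (empty = pass)."""
    violations = []
    if total_error_rate > MAX_ERROR_RATE:
        violations.append(
            f"error rate {total_error_rate:.2%} > {MAX_ERROR_RATE:.0%}"
        )
    for endpoint, limits in THRESHOLDS.items():
        p95 = results.get(endpoint, {}).get("p95_ms")
        if p95 is not None and p95 > limits["p95_ms"]:
            violations.append(f"{endpoint} p95 {p95}ms > {limits['p95_ms']}ms")
    return violations
```

An empty list means the gate passes; anything else gives you a concrete message to surface in the CI log before failing the build.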

Comparing results against previous runs

The real value of CI/CD load testing is regression detection. A single test result matters less than the trend.

For example:

  • Last week: GET /api/v1/builds/:id/status p95 = 220ms
  • Current branch: p95 = 680ms

Even if 680ms is technically “acceptable,” that jump may indicate a serious regression. LoadForge’s real-time reporting and historical test visibility make these changes easier to spot before they hit production.
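Ratio-based checks catch exactly this kind of jump even when absolute thresholds still pass. A minimal sketch, using the endpoint and numbers from the example above (this is plain comparison logic, not a LoadForge API):

```python
# Flag endpoints whose p95 grew more than an allowed ratio vs. a baseline run.
REGRESSION_RATIO = 1.5  # fail if p95 is more than 50% worse than baseline

def regressions(baseline_p95, current_p95, ratio=REGRESSION_RATIO):
    """Return {endpoint: (baseline_ms, current_ms)} for regressed endpoints."""
    flagged = {}
    for endpoint, base in baseline_p95.items():
        cur = current_p95.get(endpoint)
        if cur is not None and cur > base * ratio:
            flagged[endpoint] = (base, cur)
    return flagged

baseline = {"GET /api/v1/builds/:id/status": 220}  # last week's p95 (ms)
current = {"GET /api/v1/builds/:id/status": 680}   # current branch's p95 (ms)
# regressions(baseline, current) flags this endpoint: 680 > 220 * 1.5
```

The 1.5x ratio is a starting point; tighten it once your baselines are stable run-to-run.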

Using LoadForge in CI/CD gates

A typical workflow looks like this:

  1. Deploy application to staging or preview environment
  2. Trigger LoadForge test via CI job
  3. Wait for test completion
  4. Check pass/fail status from LoadForge thresholds
  5. Fail the pipeline if thresholds are breached

Here is an example GitHub Actions step pattern for a CI gate:

```yaml
name: performance-regression-check

on:
  pull_request:
    branches: [main]

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy preview environment
        run: ./scripts/deploy-preview.sh

      - name: Trigger LoadForge test
        id: trigger
        run: |
          RESPONSE=$(curl -s -X POST "https://api.loadforge.com/v1/tests/123456/run" \
            -H "Authorization: Bearer ${{ secrets.LOADFORGE_API_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{
              "environment_url": "https://pr-482.staging.example.com",
              "notes": "PR #482 performance regression check"
            }')
          # capture the run id so the polling step can look up this run
          # (the exact field name may differ in your LoadForge setup)
          echo "run_id=$(echo "$RESPONSE" | jq -r '.id')" >> "$GITHUB_OUTPUT"

      - name: Poll for result
        run: ./scripts/check-loadforge-result.sh "${{ steps.trigger.outputs.run_id }}"
```

And an example shell script that exits non-zero if the test failed:

```bash
#!/usr/bin/env bash
set -euo pipefail

TEST_RUN_ID="$1"
API_TOKEN="$LOADFORGE_API_TOKEN"
MAX_ATTEMPTS=60  # give up after ~15 minutes instead of polling forever

for ((attempt = 1; attempt <= MAX_ATTEMPTS; attempt++)); do
  RESPONSE=$(curl -s "https://api.loadforge.com/v1/test-runs/${TEST_RUN_ID}" \
    -H "Authorization: Bearer ${API_TOKEN}")

  STATUS=$(echo "$RESPONSE" | jq -r '.status')
  RESULT=$(echo "$RESPONSE" | jq -r '.result')

  if [[ "$STATUS" == "completed" ]]; then
    if [[ "$RESULT" == "passed" ]]; then
      echo "Load test passed"
      exit 0
    else
      echo "Load test failed"
      exit 1
    fi
  fi

  echo "Waiting for test completion..."
  sleep 15
done

echo "Timed out waiting for test result"
exit 1
```

The exact API details may vary based on your LoadForge setup, but the pattern remains the same: trigger, poll, evaluate, fail the build if performance regressed.

Performance Optimization Tips

When your CI build fails due to load test thresholds, use that as a signal to investigate systematically.

Optimize the slowest endpoints first

Look at the endpoint-level breakdown in LoadForge and prioritize:

  • Highest p95 latency
  • Largest regression from baseline
  • Highest error-producing routes

Review database access patterns

For database-heavy APIs:

  • Add indexes for new filters or joins
  • Reduce query count per request
  • Avoid N+1 ORM behavior
  • Cache frequently requested aggregates
  • Paginate large result sets
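The N+1 pattern in particular is easiest to see as query counts. This toy sketch (no real ORM or database, just a query counter) shows why fetching builds one project at a time multiplies round trips, while a batched query keeps the count flat:

```python
# Toy illustration of N+1 query behavior; CountingDb stands in for a real DB.
class CountingDb:
    def __init__(self):
        self.query_count = 0

    def query(self, sql):
        self.query_count += 1  # each call models one database round trip
        return []

def list_builds_n_plus_1(db, project_ids):
    db.query("SELECT * FROM projects")  # 1 query
    for pid in project_ids:             # then N more, one per project
        db.query(f"SELECT * FROM builds WHERE project_id = {pid}")

def list_builds_batched(db, project_ids):
    db.query("SELECT * FROM projects")  # 1 query
    ids = ",".join(map(str, project_ids))
    db.query(f"SELECT * FROM builds WHERE project_id IN ({ids})")  # 1 query
```

With 20 projects, the first version issues 21 queries and the second issues 2; real ORMs fix this with eager loading (for example, join- or select-related features).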

Improve authentication performance

If login or token refresh is slow:

  • Cache user/session lookups
  • Reduce repeated permission checks
  • Optimize JWT validation or key fetching
  • Move expensive auth hooks off the request path where possible
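As one concrete example of caching session lookups, a small TTL cache in front of the session store turns a per-request backend hit into one hit per TTL window. A generic sketch with illustrative names and TTL:

```python
# Minimal TTL cache sketch for session/user lookups (names are illustrative).
import time

class TtlCache:
    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key, loader):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry and now - entry[1] < self.ttl:
            return entry[0]             # cache hit: no backend lookup
        value = loader(key)             # cache miss: hit the session store once
        self._store[key] = (value, now)
        return value
```

In production you would more likely use Redis or your framework's cache layer, and you must invalidate on logout or permission changes; the point is simply that repeated auth lookups are a cheap win to eliminate.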

Reduce payload and serialization overhead

Large JSON responses can become a hidden bottleneck:

  • Return only fields needed by the client
  • Compress responses
  • Avoid deeply nested objects
  • Stream large exports instead of generating them synchronously
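Streaming is the usual fix for the last point: yield the export in chunks instead of building the whole payload in memory. A framework-agnostic sketch with fabricated rows:

```python
# Stream a CSV export chunk by chunk instead of materializing it in memory.
import csv
import io

def export_rows():
    # stand-in for a database cursor yielding rows one at a time
    for i in range(3):
        yield {"build_id": i, "status": "passed"}

def stream_csv(rows):
    """Yield CSV text one line at a time; memory use stays constant."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["build_id", "status"])
    writer.writeheader()
    yield buf.getvalue()
    for row in rows:
        buf.seek(0)
        buf.truncate(0)
        writer.writerow(row)
        yield buf.getvalue()
```

Most web frameworks accept a generator like this as a response body, so a multi-gigabyte export never has to exist as one string.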

Use environment-appropriate test sizes

Your CI performance test should be small enough to run quickly, but large enough to expose regressions. Then use broader LoadForge distributed testing in nightly or pre-release stages for deeper stress testing across cloud-based infrastructure and global test locations.

Common Pitfalls to Avoid

Testing too many endpoints in CI

Your pipeline should not run a full-scale performance testing suite on every commit. Keep CI tests focused on critical paths and known regression hotspots.

Using unrealistic user behavior

If your load test only hammers one endpoint with no auth, no session handling, and no realistic pacing, the results may not reflect real application behavior. Use actual login flows, realistic headers, and production-like payloads.

Ignoring percentiles

Average response time is not enough. A build can have a good average and still produce terrible tail latency. Always track p95 and p99.
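A tiny arithmetic example makes the point. Here 5% of requests take 3 seconds, yet the mean barely moves (the sample numbers are invented):

```python
# Why averages hide tail latency: 5 slow requests out of 100 barely move the mean.
import statistics

latencies_ms = [120] * 95 + [3000] * 5  # 95 fast requests, 5 very slow ones

mean_ms = statistics.mean(latencies_ms)                       # 264 ms: looks fine
p95_ms = sorted(latencies_ms)[int(0.95 * len(latencies_ms))]  # 3000 ms: the real story
```

A 264ms average would sail past most thresholds while one in twenty users waits 3 seconds, which is exactly why the thresholds earlier in this guide are written against p95.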

Running against unstable environments

If your staging environment is noisy, underpowered, or shared with unrelated work, your CI load testing results may be inconsistent. Try to test against predictable environments or ephemeral deployments where possible.

Failing builds on overly aggressive thresholds

If your thresholds are too strict, teams will start ignoring failures. Start with realistic baselines and tighten over time.

Not separating smoke, load, and stress testing

These are related but different:

  • Smoke performance test: quick regression check in CI
  • Load testing: validate expected concurrency
  • Stress testing: push beyond expected capacity

Use CI for regression-oriented checks and reserve larger tests for scheduled or pre-release pipelines.

Forgetting warm-up effects

Some applications perform poorly just after deployment due to cold caches, JIT compilation, or lazy initialization. Decide whether your threshold should include or exclude warm-up behavior based on your production expectations.
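If you decide to exclude warm-up, one simple option is to discard the warm-up window before computing percentiles. This pure-Python sketch (no Locust or LoadForge API involved; the sample data is fabricated) drops the first 30 seconds of samples:

```python
# Exclude a warm-up window from p95 calculation; sample data is fabricated.
WARMUP_SECONDS = 30

# (seconds since test start, latency in ms) pairs, as you might export from
# a run; the first 30s are slow because caches are cold
samples = [(t, 900 if t < WARMUP_SECONDS else 180) for t in range(0, 240, 5)]

def p95_excluding_warmup(samples, warmup=WARMUP_SECONDS):
    latencies = sorted(ms for t, ms in samples if t >= warmup)
    if not latencies:
        return None
    return latencies[int(0.95 * len(latencies)) - 1]
```

In this fabricated data the full-run p95 is 900ms while the post-warm-up p95 is 180ms. Whether you include the window should match production reality: if real users hit cold instances right after deployment, the warm-up numbers count.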

Conclusion

Automating load testing in CI/CD is one of the most effective ways to prevent performance regressions from reaching production. By defining pass/fail thresholds in LoadForge, you can turn performance testing into a real deployment gate instead of a manual afterthought.

Start with a focused Locust script that covers your most important endpoints, add realistic authentication and payloads, and define thresholds around p95 latency and error rate. As your process matures, expand into more advanced scenarios like token refresh, build orchestration, and reporting APIs. With LoadForge, you can run these tests on cloud-based infrastructure, scale them with distributed testing, view real-time reporting, and integrate them directly into your CI/CD pipeline.

If you’re ready to make load testing a standard part of your DevOps workflow, try LoadForge and start failing builds before performance regressions reach your users.

Try LoadForge free for 7 days

Set up your first load test in under 2 minutes. No commitment.