
Introduction
Modern delivery teams ship faster than ever, which means performance regressions can slip into production just as quickly as functional bugs. That’s why adding load testing to your GitLab CI pipeline is such a practical step: every deployment can be validated not just for correctness, but for speed, scalability, and stability.
In this guide, you’ll learn how to build a GitLab CI load testing pipeline with LoadForge so you can automate performance testing as part of your CI/CD workflow. We’ll cover how to trigger realistic Locust-based tests against your application, validate authenticated API flows, test deployment candidates, and interpret the results inside a DevOps-friendly process.
Because LoadForge uses Locust under the hood, your scripts are written in Python and remain flexible enough for everything from simple smoke-style performance checks to full stress testing scenarios. Combined with LoadForge’s cloud-based infrastructure, distributed testing, real-time reporting, global test locations, and CI/CD integration, GitLab teams can add meaningful performance gates without maintaining their own load generators.
Prerequisites
Before setting up a GitLab CI performance testing pipeline with LoadForge, make sure you have the following:
- A GitLab repository with a working .gitlab-ci.yml
- A deployed application or review environment to test
- A LoadForge account
- A LoadForge test created for your target application
- A LoadForge API token stored securely in GitLab CI/CD variables
- Basic familiarity with:
  - GitLab CI jobs and stages
  - REST APIs
  - Python and Locust basics
It also helps to define what you want your automated load testing to prove. Common goals include:
- Catching performance regressions after deployment
- Verifying API response times under expected traffic
- Running stress testing before releases
- Validating login, search, checkout, or other critical user journeys
- Ensuring infrastructure changes do not reduce throughput
In GitLab, you’ll typically store these variables under your project’s CI/CD settings:
- LOADFORGE_API_TOKEN
- LOADFORGE_TEST_ID
- TARGET_HOST
- APP_USERNAME
- APP_PASSWORD
- API_CLIENT_ID
- API_CLIENT_SECRET
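As a lightweight guard, a locustfile can fail fast when a required variable is missing. This is an illustrative sketch (the helper and its behavior are my own, not part of LoadForge); the names match the CI/CD variables listed above:

```python
import os

# Names match the CI/CD variables listed above.
REQUIRED_VARS = ["LOADFORGE_API_TOKEN", "LOADFORGE_TEST_ID", "TARGET_HOST"]

def missing_ci_variables(env=os.environ):
    """Return the names of required CI/CD variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# At the top of a locustfile you might fail fast like this:
#
#   missing = missing_ci_variables()
#   if missing:
#       raise SystemExit(f"Missing CI/CD variables: {', '.join(missing)}")
```

Failing at import time keeps misconfigured pipelines from burning a full test run before anyone notices the problem.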
Understanding GitLab CI Under Load
GitLab CI itself is not the system being load tested in most cases. Instead, GitLab CI acts as the orchestration layer that automatically triggers performance testing against your application.
That distinction matters.
When people talk about GitLab CI load testing, they usually mean one of these workflows:
- GitLab CI triggers load tests against a deployed app after a build or deploy
- GitLab CI runs lightweight Locust checks directly in the pipeline
- GitLab CI calls LoadForge via API to launch larger distributed tests in the cloud
For serious performance testing and stress testing, the third option is usually best. Running high-concurrency tests directly inside a CI runner can create false results because:
- Shared runners have limited CPU and memory
- Network throughput is inconsistent
- The runner itself becomes the bottleneck
- You cannot easily scale to thousands of users
That’s where LoadForge is valuable. GitLab CI can trigger a test, while LoadForge provides distributed cloud-based infrastructure to generate realistic traffic from multiple regions.
Common Bottlenecks Found in CI-Driven Load Testing
When teams automate load testing in GitLab CI, they often uncover issues such as:
- Slow authentication endpoints under burst traffic
- Database contention after deployments
- Cache warm-up delays in new environments
- API gateway rate limiting
- Session handling problems
- Background jobs overwhelming app servers
- File upload or report generation endpoints timing out
A good GitLab CI load testing pipeline focuses on realistic flows, not synthetic ping tests. That means authenticated requests, real payloads, meaningful waits, and assertions that reflect business-critical behavior.
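For example, a business-level assertion in Locust can use `catch_response` to fail a request that returns 200 but carries the wrong payload. The endpoint and field name below are hypothetical; the pass/fail rule is pulled into a plain function so it stays easy to unit test:

```python
# A pure helper keeps the pass/fail rule testable outside of Locust.
def assess_cart_response(status_code, body):
    """Return None when the response is acceptable, otherwise a failure message."""
    if status_code != 200:
        return f"Unexpected status: {status_code}"
    if "items" not in body:
        return "Cart payload missing 'items' field"
    return None

# Inside a Locust task it would be wired up roughly like this:
#
#   with self.client.get("/api/cart", name="GET /api/cart", catch_response=True) as resp:
#       problem = assess_cart_response(resp.status_code, resp.text)
#       resp.failure(problem) if problem else resp.success()
```

A 200 with an empty or malformed cart is still a failed user journey, which is exactly the kind of regression status codes alone won't catch.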
Writing Your First Load Test
Let’s start with a practical Locust script for a typical web application deployed through GitLab CI. This example simulates users logging in, loading a dashboard, viewing projects, and calling an API endpoint.
Basic authenticated application load test
```python
from locust import HttpUser, task, between
import os


class GitLabCIPipelineUser(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        username = os.getenv("APP_USERNAME", "test.user@example.com")
        password = os.getenv("APP_PASSWORD", "ChangeMe123!")

        login_payload = {
            "email": username,
            "password": password,
            "remember_me": True
        }

        with self.client.post(
            "/api/v1/auth/login",
            json=login_payload,
            name="POST /api/v1/auth/login",
            catch_response=True
        ) as response:
            if response.status_code == 200 and "token" in response.text:
                self.token = response.json()["token"]
                self.client.headers.update({
                    "Authorization": f"Bearer {self.token}",
                    "Content-Type": "application/json"
                })
                response.success()
            else:
                response.failure(f"Login failed: {response.status_code} {response.text}")

    @task(3)
    def load_dashboard(self):
        self.client.get("/dashboard", name="GET /dashboard")

    @task(2)
    def list_projects(self):
        self.client.get("/api/v1/projects?per_page=20&page=1", name="GET /api/v1/projects")

    @task(1)
    def get_account_profile(self):
        self.client.get("/api/v1/account/profile", name="GET /api/v1/account/profile")
```

This is a strong starting point for CI/CD performance testing because it models a real authenticated user session. It does a few important things right:
- Uses `on_start()` to authenticate once per virtual user
- Stores a bearer token for subsequent requests
- Names requests clearly for reporting in LoadForge
- Exercises both HTML and API endpoints
- Includes realistic pacing with `between(1, 3)`
Running this test in LoadForge from GitLab CI
A common approach is to store the script in your repository and keep the test configuration in LoadForge. Then your GitLab pipeline triggers the test after deployment.
Here’s a simple GitLab CI job that kicks off a LoadForge test through the API:
```yaml
stages:
  - build
  - deploy
  - performance

load_test_production:
  stage: performance
  image: alpine:3.20
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
  before_script:
    - apk add --no-cache curl jq
  script:
    - |
      response=$(curl -s -X POST "https://app.loadforge.com/api/v1/tests/${LOADFORGE_TEST_ID}/start/" \
        -H "Authorization: Token ${LOADFORGE_API_TOKEN}" \
        -H "Content-Type: application/json" \
        -d "{
          \"host\": \"${TARGET_HOST}\",
          \"users\": 100,
          \"spawn_rate\": 10,
          \"run_time\": 300
        }")
      echo "$response" | jq .
```

This job is useful for post-deployment load testing. After your deploy stage completes, GitLab CI starts a LoadForge test against the new environment.
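If your runner image already has Python, the same start call can be scripted instead of inlined with curl. This is a hedged sketch using only the standard library; the endpoint path and payload fields mirror the curl example and should be confirmed against the current LoadForge API documentation:

```python
import json
import os
import urllib.request

def build_start_request(base_url, test_id, token, host, users, spawn_rate, run_time):
    """Assemble the HTTP request that starts a LoadForge test run."""
    payload = {
        "host": host,
        "users": users,
        "spawn_rate": spawn_rate,
        "run_time": run_time,
    }
    return urllib.request.Request(
        f"{base_url}/api/v1/tests/{test_id}/start/",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Only fires inside a CI job where the variables are actually set.
if __name__ == "__main__" and "LOADFORGE_API_TOKEN" in os.environ:
    req = build_start_request(
        "https://app.loadforge.com",
        os.environ["LOADFORGE_TEST_ID"],
        os.environ["LOADFORGE_API_TOKEN"],
        os.environ["TARGET_HOST"],
        users=100, spawn_rate=10, run_time=300,
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())
```

Separating request construction from sending also makes the payload easy to assert on in your own tests.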
Advanced Load Testing Scenarios
Once your basic test is working, the next step is to model the kinds of user behavior and API traffic that actually create load in production.
Scenario 1: OAuth token flow for API-heavy applications
Many modern services use OAuth2 or machine-to-machine authentication rather than simple login forms. If your GitLab CI pipeline validates backend APIs after deployment, this pattern is common.
```python
from locust import HttpUser, task, between
import os


class OAuthAPIUser(HttpUser):
    wait_time = between(0.5, 2)

    def on_start(self):
        client_id = os.getenv("API_CLIENT_ID", "web-frontend")
        client_secret = os.getenv("API_CLIENT_SECRET", "super-secret-value")

        token_payload = {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "audience": "https://api.example.internal"
        }

        with self.client.post(
            "/oauth/token",
            json=token_payload,
            name="POST /oauth/token",
            catch_response=True
        ) as response:
            if response.status_code == 200:
                access_token = response.json().get("access_token")
                if access_token:
                    self.client.headers.update({
                        "Authorization": f"Bearer {access_token}",
                        "Content-Type": "application/json"
                    })
                    response.success()
                else:
                    response.failure("No access_token in response")
            else:
                response.failure(f"OAuth token request failed: {response.status_code}")

    @task(4)
    def get_orders(self):
        self.client.get(
            "/api/v2/orders?status=open&limit=50",
            name="GET /api/v2/orders"
        )

    @task(2)
    def get_order_metrics(self):
        self.client.get(
            "/api/v2/metrics/orders?window=24h",
            name="GET /api/v2/metrics/orders"
        )

    @task(1)
    def create_order_draft(self):
        payload = {
            "customer_id": "cust_10482",
            "currency": "USD",
            "items": [
                {"sku": "SKU-CHAIR-BLK", "quantity": 2, "unit_price": 149.99},
                {"sku": "SKU-DESK-OAK", "quantity": 1, "unit_price": 399.00}
            ],
            "source": "web",
            "notes": "Created by automated performance test"
        }
        self.client.post(
            "/api/v2/orders/drafts",
            json=payload,
            name="POST /api/v2/orders/drafts"
        )
```

This script is ideal for API performance testing in CI/CD because it reflects how service-to-service traffic often behaves in production.
Scenario 2: Testing a deployment candidate with search and write-heavy flows
Suppose each merge request creates a review app. You want GitLab CI to run load testing against that review environment before promoting it.
This script simulates a more realistic user journey: login, search, fetch product details, and add items to a cart.
```python
from locust import HttpUser, task, between
import random
import os


class ReviewAppUser(HttpUser):
    wait_time = between(1, 4)
    product_ids = [1012, 1044, 1098, 1121, 1203]
    search_terms = ["office chair", "standing desk", "monitor arm", "desk lamp"]

    def on_start(self):
        self.cart_id = None  # ensure the attribute exists even if login fails
        credentials = {
            "email": os.getenv("APP_USERNAME", "buyer@example.com"),
            "password": os.getenv("APP_PASSWORD", "SecurePass123!")
        }
        with self.client.post(
            "/api/auth/session",
            json=credentials,
            name="POST /api/auth/session",
            catch_response=True
        ) as response:
            if response.status_code == 200:
                data = response.json()
                token = data.get("access_token")
                if token:
                    self.client.headers.update({
                        "Authorization": f"Bearer {token}",
                        "Content-Type": "application/json"
                    })
                    self.cart_id = data.get("cart_id")
                    response.success()
                else:
                    response.failure("Missing access token")
            else:
                response.failure(f"Authentication failed: {response.text}")

    @task(3)
    def search_products(self):
        term = random.choice(self.search_terms)
        self.client.get(
            f"/api/catalog/search?q={term}&sort=relevance&page=1",
            name="GET /api/catalog/search"
        )

    @task(2)
    def view_product(self):
        product_id = random.choice(self.product_ids)
        self.client.get(
            f"/api/catalog/products/{product_id}",
            name="GET /api/catalog/products/:id"
        )

    @task(1)
    def add_to_cart(self):
        if self.cart_id is None:
            return  # skip cart writes for users whose login failed
        product_id = random.choice(self.product_ids)
        payload = {
            "product_id": product_id,
            "quantity": random.randint(1, 3)
        }
        self.client.post(
            f"/api/carts/{self.cart_id}/items",
            json=payload,
            name="POST /api/carts/:id/items"
        )
```

This kind of test is especially useful for catching regressions caused by:
- Search index changes
- Slow product detail queries
- Cart write contention
- Session or token bugs introduced during deployment
Scenario 3: Polling LoadForge results and failing the GitLab pipeline on thresholds
A mature GitLab CI load testing pipeline should not just start a test. It should also evaluate the outcome and fail the pipeline when performance degrades.
Below is a practical GitLab CI job that starts a LoadForge test, polls for completion, and enforces a response time threshold.
```yaml
performance_gate:
  stage: performance
  image: alpine:3.20
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
  before_script:
    - apk add --no-cache curl jq
  script:
    - |
      start_response=$(curl -s -X POST "https://app.loadforge.com/api/v1/tests/${LOADFORGE_TEST_ID}/start/" \
        -H "Authorization: Token ${LOADFORGE_API_TOKEN}" \
        -H "Content-Type: application/json" \
        -d "{
          \"host\": \"${TARGET_HOST}\",
          \"users\": 200,
          \"spawn_rate\": 20,
          \"run_time\": 600
        }")
      echo "$start_response" | jq .
      run_id=$(echo "$start_response" | jq -r '.id')

      if [ "$run_id" = "null" ] || [ -z "$run_id" ]; then
        echo "Failed to start LoadForge test"
        exit 1
      fi

      echo "Waiting for test run ${run_id} to complete..."
      while true; do
        status_response=$(curl -s -X GET "https://app.loadforge.com/api/v1/test-runs/${run_id}/" \
          -H "Authorization: Token ${LOADFORGE_API_TOKEN}")
        status=$(echo "$status_response" | jq -r '.status')
        echo "Current status: $status"

        if [ "$status" = "completed" ]; then
          avg_response_time=$(echo "$status_response" | jq -r '.avg_response_time')
          error_rate=$(echo "$status_response" | jq -r '.error_rate')
          echo "Average response time: ${avg_response_time} ms"
          echo "Error rate: ${error_rate}%"

          if [ "$(printf "%.0f" "$avg_response_time")" -gt 800 ]; then
            echo "Performance gate failed: average response time exceeded 800 ms"
            exit 1
          fi

          if awk "BEGIN {exit !($error_rate > 1.5)}"; then
            echo "Performance gate failed: error rate exceeded 1.5%"
            exit 1
          fi

          echo "Performance gate passed"
          break
        fi

        if [ "$status" = "failed" ]; then
          echo "LoadForge test run failed"
          exit 1
        fi

        sleep 15
      done
```

This is where GitLab CI and LoadForge become especially powerful together. Your deployment pipeline can automatically stop a rollout if load testing reveals unacceptable latency or errors.
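POSIX sh makes numeric threshold checks easy to get subtly wrong (note the printf rounding and the awk float comparison needed above). An alternative is to move the gate decision into a small Python helper and call it from the job; the 800 ms and 1.5% limits here match the shell example:

```python
import sys

def evaluate_gate(avg_response_time_ms, error_rate_pct,
                  max_avg_ms=800.0, max_error_pct=1.5):
    """Return a list of human-readable gate violations (empty list means pass)."""
    violations = []
    if avg_response_time_ms > max_avg_ms:
        violations.append(
            f"average response time {avg_response_time_ms:.0f} ms exceeded {max_avg_ms:.0f} ms"
        )
    if error_rate_pct > max_error_pct:
        violations.append(
            f"error rate {error_rate_pct:.2f}% exceeded {max_error_pct:.2f}%"
        )
    return violations

# Usage from a CI job: python3 gate.py "$avg_response_time" "$error_rate"
if __name__ == "__main__" and len(sys.argv) == 3:
    problems = evaluate_gate(float(sys.argv[1]), float(sys.argv[2]))
    for p in problems:
        print(f"Performance gate failed: {p}")
    sys.exit(1 if problems else 0)
```

Because floats are compared as floats, there is no rounding step to second-guess, and the thresholds live in one place if you later want to read them from CI/CD variables.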
Scenario 4: Multi-step admin workflow with report generation
Some applications have expensive backend operations that only appear under realistic admin usage, such as report generation, exports, or audit log queries.
```python
from locust import HttpUser, task, between
import os
import time


class AdminWorkflowUser(HttpUser):
    wait_time = between(2, 5)

    def on_start(self):
        payload = {
            "username": os.getenv("APP_USERNAME", "admin@example.com"),
            "password": os.getenv("APP_PASSWORD", "AdminPass123!")
        }
        with self.client.post(
            "/api/admin/login",
            json=payload,
            name="POST /api/admin/login",
            catch_response=True
        ) as response:
            if response.status_code == 200:
                token = response.json().get("jwt")
                if token:
                    self.client.headers.update({
                        "Authorization": f"Bearer {token}",
                        "Content-Type": "application/json"
                    })
                    response.success()
                else:
                    response.failure("JWT token missing")
            else:
                response.failure(f"Admin login failed: {response.status_code}")

    @task(2)
    def query_audit_logs(self):
        self.client.get(
            "/api/admin/audit-logs?actor_type=user&from=2026-04-01&to=2026-04-06&page=1&page_size=100",
            name="GET /api/admin/audit-logs"
        )

    @task(1)
    def generate_usage_report(self):
        report_request = {
            "report_type": "usage_summary",
            "date_range": {
                "from": "2026-03-01",
                "to": "2026-03-31"
            },
            "format": "csv",
            "filters": {
                "region": ["us-east-1", "eu-west-1"],
                "plan": ["team", "enterprise"]
            }
        }
        with self.client.post(
            "/api/admin/reports",
            json=report_request,
            name="POST /api/admin/reports",
            catch_response=True
        ) as response:
            if response.status_code == 202:
                report_id = response.json().get("report_id")
                if report_id:
                    time.sleep(2)
                    self.client.get(
                        f"/api/admin/reports/{report_id}/status",
                        name="GET /api/admin/reports/:id/status"
                    )
                    response.success()
                else:
                    response.failure("No report_id returned")
            else:
                response.failure(f"Report generation failed: {response.status_code}")
```

This is a strong example of realistic performance testing for internal tools or SaaS admin backends, especially after infrastructure or query-layer changes.
Analyzing Your Results
Once GitLab CI triggers your LoadForge test, the next step is understanding what the data tells you.
LoadForge provides real-time reporting that makes it easier to evaluate test runs without manually aggregating Locust output. For GitLab CI workflows, focus on these metrics:
Response time percentiles
Average response time is useful, but percentiles are more meaningful.
Look at:
- P50: typical experience
- P95: slow but common requests
- P99: worst-case user experience under load
A deployment may look healthy on average while P95 and P99 degrade sharply. That often indicates database contention, lock waits, or overloaded downstream services.
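If you want to sanity-check percentile figures against raw response time samples (for example from a Locust CSV export), the Python standard library can compute them. This sketch assumes a flat list of response times in milliseconds:

```python
from statistics import quantiles

def latency_percentiles(samples_ms):
    """Return (p50, p95, p99) for a list of response times in milliseconds."""
    # n=100 splits the distribution into percentiles; cut points are 1..99.
    cuts = quantiles(samples_ms, n=100)
    return cuts[49], cuts[94], cuts[98]

# Example: a mostly-fast distribution with a slow tail.
samples = [120] * 90 + [450] * 8 + [2200] * 2
p50, p95, p99 = latency_percentiles(samples)
# p50 stays at 120 ms while p99 lands at 2200 ms: the "average looks fine,
# tail is terrible" pattern described above.
```

This is also a handy way to verify that whatever dashboard you read percentiles from agrees with the raw data.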
Error rate
Even a small increase in errors can signal a serious regression. Investigate:
- 401 or 403 spikes after auth changes
- 429 responses from rate limiting
- 500 errors from application crashes
- 502 or 504 errors from gateway or upstream timeouts
Throughput
Requests per second helps you understand whether the system is scaling as expected. If user count rises but throughput plateaus, your app may have hit a bottleneck.
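Little's Law gives a quick back-of-the-envelope check here: concurrency ≈ throughput × (response time + think time). This illustrative sketch estimates the throughput a given user count should sustain if nothing is bottlenecked:

```python
def expected_rps(users, avg_response_s, avg_wait_s):
    """Little's Law: each user completes one request per (response + wait) seconds."""
    return users / (avg_response_s + avg_wait_s)

# 200 users, 0.5 s responses, 2 s average think time between tasks:
rps = expected_rps(200, 0.5, 2.0)  # 80 requests/second
# If measured throughput sits far below this while response times stay flat,
# look at the load generator or connection limits rather than the app itself.
```

Conversely, if throughput plateaus while response times climb as users increase, the application side has saturated.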
Endpoint-level breakdown
Because the Locust scripts above use explicit request names, LoadForge can show exactly which routes degrade:
- POST /api/v1/auth/login
- GET /api/catalog/search
- POST /api/carts/:id/items
- POST /api/admin/reports
This is critical for CI/CD performance testing because you want to know whether a deployment hurt login, search, checkout, or admin operations.
Compare runs over time
One of the most valuable practices is comparing current results to previous GitLab CI pipeline runs. If the same test suddenly shows:
- 30% slower search responses
- doubled login latency
- higher cart write failures
you’ve likely introduced a regression. Automated load testing is most powerful when used as a trend-monitoring tool, not just a one-off stress test.
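That comparison can be automated. Below is a hedged sketch that flags endpoints whose P95 degraded more than 30% against a baseline run; the dictionary layout is illustrative, not a fixed LoadForge export format:

```python
def find_regressions(baseline_p95, current_p95, threshold=0.30):
    """Return {endpoint: relative_slowdown} for endpoints slower than threshold."""
    regressions = {}
    for endpoint, base in baseline_p95.items():
        cur = current_p95.get(endpoint)
        if cur is None or base <= 0:
            continue  # new or unmeasured endpoints need a baseline first
        slowdown = (cur - base) / base
        if slowdown > threshold:
            regressions[endpoint] = slowdown
    return regressions

baseline = {"GET /api/catalog/search": 420, "POST /api/v1/auth/login": 310}
current = {"GET /api/catalog/search": 600, "POST /api/v1/auth/login": 320}
# search is ~43% slower and gets flagged; login is ~3% slower and passes
```

Storing the baseline as a pipeline artifact (or fetching the previous run's numbers via API) turns this into a trend gate rather than a one-off check.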
Performance Optimization Tips
If your GitLab CI load testing pipeline uncovers problems, these are common areas to optimize:
Cache expensive reads
Search endpoints, dashboards, and reporting APIs often benefit from caching. If load testing shows repeated slow reads, verify that cache keys, TTLs, and invalidation logic are working correctly.
Reduce authentication overhead
If login or token endpoints are slow, consider:
- token reuse where appropriate
- shorter auth database paths
- optimized session storage
- reduced external identity provider latency
Tune database queries
Write-heavy flows like cart updates, order creation, and report generation often expose:
- missing indexes
- N+1 queries
- lock contention
- poor pagination strategies
Use your slow query logs alongside LoadForge results.
Warm up new environments
Fresh deployments can suffer from cold caches, autoscaling lag, or just-in-time compilation overhead. If GitLab CI runs tests immediately after deployment, consider a short warm-up phase before the main load test.
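A warm-up phase can be a short script that primes key routes before the main test starts. This stdlib-only sketch is illustrative (the URLs and paths are placeholders); the fetch parameter exists so the loop can be exercised without a network:

```python
import time
import urllib.request

def warm_up(base_url, paths, rounds=3, pause_s=2.0, fetch=None):
    """Issue a few sequential requests per path to prime caches before load testing."""
    if fetch is None:
        fetch = lambda url: urllib.request.urlopen(url, timeout=10).read()
    hits = []
    for _ in range(rounds):
        for path in paths:
            url = f"{base_url}{path}"
            try:
                fetch(url)
                hits.append(url)
            except OSError:
                pass  # warm-up is best-effort; the real test will surface failures
        time.sleep(pause_s)
    return hits
```

Running something like `warm_up(TARGET_HOST, ["/", "/dashboard", "/api/v1/projects"])` in the CI job before triggering LoadForge keeps cold-start noise out of the measured results.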
Separate smoke tests from full load tests
Not every pipeline needs a 10-minute, 5,000-user test. A practical strategy is:
- Merge requests: lightweight performance smoke tests
- Main branch deploys: moderate load testing
- Release candidates: full stress testing
LoadForge makes this easier by letting you run different test profiles from the same CI/CD workflow.
Test from relevant geographies
If your users are global, performance can vary by region. LoadForge’s global test locations are useful for validating latency and throughput from the markets that matter most.
Common Pitfalls to Avoid
Teams often add load testing to GitLab CI but get misleading or low-value results. Avoid these mistakes:
Running heavy tests directly on CI runners
This is one of the biggest mistakes in CI/CD performance testing. Shared runners are not reliable load generators. Use LoadForge’s distributed testing instead.
Testing unrealistic endpoints only
A health check like /health or /ping tells you almost nothing about real application performance. Focus on business-critical flows.
Ignoring authentication
Many production bottlenecks happen during login, token issuance, session validation, and permission checks. If your test skips auth, you may miss major issues.
Using fake traffic patterns
If every virtual user requests the same endpoint at the same interval, results can be misleading. Use weighted tasks, realistic waits, and varied payloads.
Failing to set pass/fail thresholds
A load test that always “passes” is just reporting, not quality control. Define thresholds for latency, error rate, and throughput in your GitLab CI pipeline.
Overloading fragile test environments
Review apps and staging systems may not match production capacity. That’s fine, but calibrate expectations. Use them for regression detection, not necessarily for maximum-scale stress testing.
Not versioning your Locust scripts
Store your Locust files in the same GitLab repository as your application or infrastructure code. That way, your performance tests evolve with your app.
Conclusion
A GitLab CI load testing pipeline with LoadForge gives your team a practical way to catch performance regressions before users do. By combining GitLab’s automation with LoadForge’s cloud-based infrastructure, distributed testing, real-time reporting, global test locations, and CI/CD integration, you can turn load testing and performance testing into a repeatable part of every deployment.
Start simple with an authenticated Locust script, then expand into realistic API flows, admin workflows, and pipeline-based performance gates. Over time, this approach helps your team ship faster with more confidence.
If you’re ready to automate performance testing in GitLab CI, try LoadForge and build a pipeline that validates speed and scalability on every release.
LoadForge Team
LoadForge is a load and performance testing platform built on Locust. Our team has been shipping load tests against production systems since 2018, and we write these guides from real customer engagements.