
GitLab CI Load Testing Pipeline with LoadForge

Introduction

Modern delivery teams ship faster than ever, which means performance regressions can slip into production just as quickly as functional bugs. That’s why adding load testing to your GitLab CI pipeline is such a practical step: every deployment can be validated not just for correctness, but for speed, scalability, and stability.

In this guide, you’ll learn how to build a GitLab CI load testing pipeline with LoadForge so you can automate performance testing as part of your CI/CD workflow. We’ll cover how to trigger realistic Locust-based tests against your application, validate authenticated API flows, test deployment candidates, and interpret the results inside a DevOps-friendly process.

Because LoadForge uses Locust under the hood, your scripts are written in Python and remain flexible enough for everything from simple smoke-style performance checks to full stress testing scenarios. Combined with LoadForge’s cloud-based infrastructure, distributed testing, real-time reporting, global test locations, and CI/CD integration, GitLab teams can add meaningful performance gates without maintaining their own load generators.

Prerequisites

Before setting up a GitLab CI performance testing pipeline with LoadForge, make sure you have the following:

  • A GitLab repository with a working .gitlab-ci.yml
  • A deployed application or review environment to test
  • A LoadForge account
  • A LoadForge test created for your target application
  • A LoadForge API token stored securely in GitLab CI/CD variables
  • Basic familiarity with:
    • GitLab CI jobs and stages
    • REST APIs
    • Python and Locust basics

It also helps to define what you want your automated load testing to prove. Common goals include:

  • Catching performance regressions after deployment
  • Verifying API response times under expected traffic
  • Running stress testing before releases
  • Validating login, search, checkout, or other critical user journeys
  • Ensuring infrastructure changes do not reduce throughput

In GitLab, you’ll typically store these variables under your project’s CI/CD settings:

  • LOADFORGE_API_TOKEN
  • LOADFORGE_TEST_ID
  • TARGET_HOST
  • APP_USERNAME
  • APP_PASSWORD
  • API_CLIENT_ID
  • API_CLIENT_SECRET
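As a quick sanity check before triggering any test, you can fail fast when one of these variables is missing. The sketch below is illustrative; the variable names come from this guide, so adapt the list to your own project:

```python
import os

# Names of the CI/CD variables this guide stores in GitLab project settings.
REQUIRED_VARS = ["LOADFORGE_API_TOKEN", "LOADFORGE_TEST_ID", "TARGET_HOST"]

def missing_variables(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example with a partial environment (values redacted):
missing = missing_variables({"LOADFORGE_API_TOKEN": "redacted"})
print("Missing:", missing)
```

Run this at the start of your pipeline job so a misconfigured variable produces a clear error instead of a confusing API failure later.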

Understanding GitLab CI Under Load

In most cases, GitLab CI itself is not the system being load tested. Instead, GitLab CI acts as the orchestration layer that automatically triggers performance testing against your application.

That distinction matters.

When people talk about GitLab CI load testing, they usually mean one of these workflows:

  1. GitLab CI triggers load tests against a deployed app after a build or deploy
  2. GitLab CI runs lightweight Locust checks directly in the pipeline
  3. GitLab CI calls LoadForge via API to launch larger distributed tests in the cloud

For serious performance testing and stress testing, the third option is usually best. Running high-concurrency tests directly inside a CI runner can create false results because:

  • Shared runners have limited CPU and memory
  • Network throughput is inconsistent
  • The runner itself becomes the bottleneck
  • You cannot easily scale to thousands of users

That’s where LoadForge is valuable. GitLab CI can trigger a test, while LoadForge provides distributed cloud-based infrastructure to generate realistic traffic from multiple regions.

Common Bottlenecks Found in CI-Driven Load Testing

When teams automate load testing in GitLab CI, they often uncover issues such as:

  • Slow authentication endpoints under burst traffic
  • Database contention after deployments
  • Cache warm-up delays in new environments
  • API gateway rate limiting
  • Session handling problems
  • Background jobs overwhelming app servers
  • File upload or report generation endpoints timing out

A good GitLab CI load testing pipeline focuses on realistic flows, not synthetic ping tests. That means authenticated requests, real payloads, meaningful waits, and assertions that reflect business-critical behavior.

Writing Your First Load Test

Let’s start with a practical Locust script for a typical web application deployed through GitLab CI. This example simulates users logging in, loading a dashboard, viewing projects, and calling an API endpoint.

Basic authenticated application load test

```python
from locust import HttpUser, task, between
import os

class GitLabCIPipelineUser(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        username = os.getenv("APP_USERNAME", "test.user@example.com")
        password = os.getenv("APP_PASSWORD", "ChangeMe123!")

        login_payload = {
            "email": username,
            "password": password,
            "remember_me": True
        }

        with self.client.post(
            "/api/v1/auth/login",
            json=login_payload,
            name="POST /api/v1/auth/login",
            catch_response=True
        ) as response:
            if response.status_code == 200 and "token" in response.text:
                self.token = response.json()["token"]
                self.client.headers.update({
                    "Authorization": f"Bearer {self.token}",
                    "Content-Type": "application/json"
                })
                response.success()
            else:
                response.failure(f"Login failed: {response.status_code} {response.text}")

    @task(3)
    def load_dashboard(self):
        self.client.get("/dashboard", name="GET /dashboard")

    @task(2)
    def list_projects(self):
        self.client.get("/api/v1/projects?per_page=20&page=1", name="GET /api/v1/projects")

    @task(1)
    def get_account_profile(self):
        self.client.get("/api/v1/account/profile", name="GET /api/v1/account/profile")
```

This is a strong starting point for CI/CD performance testing because it models a real authenticated user session. It does a few important things right:

  • Uses on_start() to authenticate once per virtual user
  • Stores a bearer token for subsequent requests
  • Names requests clearly for reporting in LoadForge
  • Exercises both HTML and API endpoints
  • Includes realistic pacing with between(1, 3)

Running this test in LoadForge from GitLab CI

A common approach is to store the script in your repository and keep the test configuration in LoadForge. Then your GitLab pipeline triggers the test after deployment.

Here’s a simple GitLab CI job that kicks off a LoadForge test through the API:

```yaml
stages:
  - build
  - deploy
  - performance

load_test_production:
  stage: performance
  image: alpine:3.20
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
  before_script:
    - apk add --no-cache curl jq
  script:
    - |
      response=$(curl -s -X POST "https://app.loadforge.com/api/v1/tests/${LOADFORGE_TEST_ID}/start/" \
        -H "Authorization: Token ${LOADFORGE_API_TOKEN}" \
        -H "Content-Type: application/json" \
        -d "{
          \"host\": \"${TARGET_HOST}\",
          \"users\": 100,
          \"spawn_rate\": 10,
          \"run_time\": 300
        }")
      echo "$response" | jq .
```

This job is useful for post-deployment load testing. After your deploy stage completes, GitLab CI starts a LoadForge test against the new environment.
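If you prefer assembling the request body in Python, for example in a small trigger script kept alongside your Locust files, the same payload can be built like this. Field names mirror the curl example above; treat the LoadForge API documentation as the authoritative schema:

```python
import json

def build_start_payload(host, users=100, spawn_rate=10, run_time=300):
    """Build the JSON body used to start a LoadForge test run.

    Field names mirror the curl example in this guide; check the
    LoadForge API docs for the authoritative schema.
    """
    return {
        "host": host,
        "users": users,
        "spawn_rate": spawn_rate,
        "run_time": run_time,
    }

body = json.dumps(build_start_payload("https://staging.example.com", users=100))
print(body)
```

Keeping the payload in one function makes it easy to vary user counts per branch without duplicating curl commands.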

Advanced Load Testing Scenarios

Once your basic test is working, the next step is to model the kinds of user behavior and API traffic that actually create load in production.

Scenario 1: OAuth token flow for API-heavy applications

Many modern services use OAuth2 or machine-to-machine authentication rather than simple login forms. If your GitLab CI pipeline validates backend APIs after deployment, this pattern is common.

```python
from locust import HttpUser, task, between
import os

class OAuthAPIUser(HttpUser):
    wait_time = between(0.5, 2)

    def on_start(self):
        client_id = os.getenv("API_CLIENT_ID", "web-frontend")
        client_secret = os.getenv("API_CLIENT_SECRET", "super-secret-value")

        token_payload = {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "audience": "https://api.example.internal"
        }

        with self.client.post(
            "/oauth/token",
            json=token_payload,
            name="POST /oauth/token",
            catch_response=True
        ) as response:
            if response.status_code == 200:
                access_token = response.json().get("access_token")
                if access_token:
                    self.client.headers.update({
                        "Authorization": f"Bearer {access_token}",
                        "Content-Type": "application/json"
                    })
                    response.success()
                else:
                    response.failure("No access_token in response")
            else:
                response.failure(f"OAuth token request failed: {response.status_code}")

    @task(4)
    def get_orders(self):
        self.client.get(
            "/api/v2/orders?status=open&limit=50",
            name="GET /api/v2/orders"
        )

    @task(2)
    def get_order_metrics(self):
        self.client.get(
            "/api/v2/metrics/orders?window=24h",
            name="GET /api/v2/metrics/orders"
        )

    @task(1)
    def create_order_draft(self):
        payload = {
            "customer_id": "cust_10482",
            "currency": "USD",
            "items": [
                {"sku": "SKU-CHAIR-BLK", "quantity": 2, "unit_price": 149.99},
                {"sku": "SKU-DESK-OAK", "quantity": 1, "unit_price": 399.00}
            ],
            "source": "web",
            "notes": "Created by automated performance test"
        }

        self.client.post(
            "/api/v2/orders/drafts",
            json=payload,
            name="POST /api/v2/orders/drafts"
        )
```

This script is ideal for API performance testing in CI/CD because it reflects how service-to-service traffic often behaves in production.
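When the token endpoint itself is not what you are measuring, one way to reduce pressure on it is to fetch a token once per worker process and share it across virtual users. The sketch below is generic: `fetch` is any callable you supply, for example a wrapper around the POST /oauth/token request above. Under Locust's gevent monkey patching, `threading.Lock` remains safe to use:

```python
import threading
import time

class TokenCache:
    """Cache one OAuth access token per worker process.

    `fetch` is any callable returning (token, expires_in_seconds);
    in a Locust script it would wrap the token request shown above.
    """
    def __init__(self, fetch, skew=30):
        self._fetch = fetch
        self._skew = skew          # refresh this many seconds before expiry
        self._lock = threading.Lock()
        self._token = None
        self._expires_at = 0.0

    def get(self):
        with self._lock:
            if self._token is None or time.time() >= self._expires_at - self._skew:
                self._token, ttl = self._fetch()
                self._expires_at = time.time() + ttl
            return self._token
```

A module-level `cache = TokenCache(fetch_token)` lets every `on_start` call `cache.get()` instead of hitting `/oauth/token` once per virtual user.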

Scenario 2: Testing a deployment candidate with search and write-heavy flows

Suppose each merge request creates a review app. You want GitLab CI to run load testing against that review environment before promoting it.

This script simulates a more realistic user journey: login, search, fetch product details, and add items to a cart.

```python
from locust import HttpUser, task, between
import random
import os

class ReviewAppUser(HttpUser):
    wait_time = between(1, 4)

    product_ids = [1012, 1044, 1098, 1121, 1203]
    search_terms = ["office chair", "standing desk", "monitor arm", "desk lamp"]

    def on_start(self):
        # Default until login succeeds, so tasks can skip safely instead of raising
        self.cart_id = None

        credentials = {
            "email": os.getenv("APP_USERNAME", "buyer@example.com"),
            "password": os.getenv("APP_PASSWORD", "SecurePass123!")
        }

        with self.client.post(
            "/api/auth/session",
            json=credentials,
            name="POST /api/auth/session",
            catch_response=True
        ) as response:
            if response.status_code == 200:
                data = response.json()
                token = data.get("access_token")
                if token:
                    self.client.headers.update({
                        "Authorization": f"Bearer {token}",
                        "Content-Type": "application/json"
                    })
                    self.cart_id = data.get("cart_id")
                    response.success()
                else:
                    response.failure("Missing access token")
            else:
                response.failure(f"Authentication failed: {response.text}")

    @task(3)
    def search_products(self):
        term = random.choice(self.search_terms)
        self.client.get(
            f"/api/catalog/search?q={term}&sort=relevance&page=1",
            name="GET /api/catalog/search"
        )

    @task(2)
    def view_product(self):
        product_id = random.choice(self.product_ids)
        self.client.get(
            f"/api/catalog/products/{product_id}",
            name="GET /api/catalog/products/:id"
        )

    @task(1)
    def add_to_cart(self):
        if not self.cart_id:
            return  # login failed or returned no cart; skip rather than error

        product_id = random.choice(self.product_ids)
        payload = {
            "product_id": product_id,
            "quantity": random.randint(1, 3)
        }

        self.client.post(
            f"/api/carts/{self.cart_id}/items",
            json=payload,
            name="POST /api/carts/:id/items"
        )
```

This kind of test is especially useful for catching regressions caused by:

  • Search index changes
  • Slow product detail queries
  • Cart write contention
  • Session or token bugs introduced during deployment

Scenario 3: Polling LoadForge results and failing the GitLab pipeline on thresholds

A mature GitLab CI load testing pipeline should not just start a test. It should also evaluate the outcome and fail the pipeline when performance degrades.

Below is a practical GitLab CI job that starts a LoadForge test, polls for completion, and enforces a response time threshold.

```yaml
performance_gate:
  stage: performance
  image: alpine:3.20
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
  before_script:
    - apk add --no-cache curl jq
  script:
    - |
      start_response=$(curl -s -X POST "https://app.loadforge.com/api/v1/tests/${LOADFORGE_TEST_ID}/start/" \
        -H "Authorization: Token ${LOADFORGE_API_TOKEN}" \
        -H "Content-Type: application/json" \
        -d "{
          \"host\": \"${TARGET_HOST}\",
          \"users\": 200,
          \"spawn_rate\": 20,
          \"run_time\": 600
        }")

      echo "$start_response" | jq .
      run_id=$(echo "$start_response" | jq -r '.id')

      if [ "$run_id" = "null" ] || [ -z "$run_id" ]; then
        echo "Failed to start LoadForge test"
        exit 1
      fi

      echo "Waiting for test run ${run_id} to complete..."

      while true; do
        status_response=$(curl -s -X GET "https://app.loadforge.com/api/v1/test-runs/${run_id}/" \
          -H "Authorization: Token ${LOADFORGE_API_TOKEN}")
        status=$(echo "$status_response" | jq -r '.status')
        echo "Current status: $status"

        if [ "$status" = "completed" ]; then
          avg_response_time=$(echo "$status_response" | jq -r '.avg_response_time')
          error_rate=$(echo "$status_response" | jq -r '.error_rate')

          echo "Average response time: ${avg_response_time} ms"
          echo "Error rate: ${error_rate}%"

          # Use awk for both float comparisons; BusyBox printf on Alpine
          # does not reliably support floating-point formats.
          if awk "BEGIN {exit !($avg_response_time > 800)}"; then
            echo "Performance gate failed: average response time exceeded 800 ms"
            exit 1
          fi

          if awk "BEGIN {exit !($error_rate > 1.5)}"; then
            echo "Performance gate failed: error rate exceeded 1.5%"
            exit 1
          fi

          echo "Performance gate passed"
          break
        fi

        if [ "$status" = "failed" ]; then
          echo "LoadForge test run failed"
          exit 1
        fi

        sleep 15
      done
```

This is where GitLab CI and LoadForge become especially powerful together. Your deployment pipeline can automatically stop a rollout if load testing reveals unacceptable latency or errors.
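The same pass/fail logic can be expressed as a small Python function, which is handy if you later move the gate into a Python trigger script or want to unit test your thresholds. The numbers mirror the shell job above and are examples, not recommendations:

```python
def performance_gate(avg_response_time_ms, error_rate_pct,
                     max_avg_ms=800.0, max_error_pct=1.5):
    """Evaluate the same thresholds the shell job enforces.

    Returns (passed, reasons): passed is a bool, reasons lists
    every threshold that was exceeded.
    """
    reasons = []
    if avg_response_time_ms > max_avg_ms:
        reasons.append(
            f"average response time {avg_response_time_ms} ms > {max_avg_ms} ms")
    if error_rate_pct > max_error_pct:
        reasons.append(
            f"error rate {error_rate_pct}% > {max_error_pct}%")
    return (not reasons, reasons)
```

Returning every violated threshold, rather than failing on the first, makes pipeline logs much easier to triage.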

Scenario 4: Multi-step admin workflow with report generation

Some applications have expensive backend operations that only appear under realistic admin usage, such as report generation, exports, or audit log queries.

```python
from locust import HttpUser, task, between
import os
import time

class AdminWorkflowUser(HttpUser):
    wait_time = between(2, 5)

    def on_start(self):
        payload = {
            "username": os.getenv("APP_USERNAME", "admin@example.com"),
            "password": os.getenv("APP_PASSWORD", "AdminPass123!")
        }

        with self.client.post(
            "/api/admin/login",
            json=payload,
            name="POST /api/admin/login",
            catch_response=True
        ) as response:
            if response.status_code == 200:
                token = response.json().get("jwt")
                if token:
                    self.client.headers.update({
                        "Authorization": f"Bearer {token}",
                        "Content-Type": "application/json"
                    })
                    response.success()
                else:
                    response.failure("JWT token missing")
            else:
                response.failure(f"Admin login failed: {response.status_code}")

    @task(2)
    def query_audit_logs(self):
        self.client.get(
            "/api/admin/audit-logs?actor_type=user&from=2026-04-01&to=2026-04-06&page=1&page_size=100",
            name="GET /api/admin/audit-logs"
        )

    @task(1)
    def generate_usage_report(self):
        report_request = {
            "report_type": "usage_summary",
            "date_range": {
                "from": "2026-03-01",
                "to": "2026-03-31"
            },
            "format": "csv",
            "filters": {
                "region": ["us-east-1", "eu-west-1"],
                "plan": ["team", "enterprise"]
            }
        }

        with self.client.post(
            "/api/admin/reports",
            json=report_request,
            name="POST /api/admin/reports",
            catch_response=True
        ) as response:
            if response.status_code == 202:
                report_id = response.json().get("report_id")
                if report_id:
                    time.sleep(2)
                    self.client.get(
                        f"/api/admin/reports/{report_id}/status",
                        name="GET /api/admin/reports/:id/status"
                    )
                    response.success()
                else:
                    response.failure("No report_id returned")
            else:
                response.failure(f"Report generation failed: {response.status_code}")
```

This is a strong example of realistic performance testing for internal tools or SaaS admin backends, especially after infrastructure or query-layer changes.

Analyzing Your Results

Once GitLab CI triggers your LoadForge test, the next step is understanding what the data tells you.

LoadForge provides real-time reporting that makes it easier to evaluate test runs without manually aggregating Locust output. For GitLab CI workflows, focus on these metrics:

Response time percentiles

Average response time is useful, but percentiles are more meaningful.

Look at:

  • P50: typical experience
  • P95: slow but common requests
  • P99: worst-case user experience under load

A deployment may look healthy on average while P95 and P99 degrade sharply. That often indicates database contention, lock waits, or overloaded downstream services.
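To see why averages hide tail latency, here is a minimal nearest-rank percentile calculation (one common convention; LoadForge's reporting may use a different interpolation method) applied to a hypothetical sample with two slow outliers:

```python
import math
from statistics import mean

def percentile(samples, p):
    """Nearest-rank percentile: the smallest value with at least
    p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# Hypothetical response times in ms: mostly fast, two slow outliers
latencies_ms = [120, 135, 128, 142, 960, 131, 125, 138, 133, 2400]

print("mean:", mean(latencies_ms))          # pulled up, but still looks moderate
print("P50:", percentile(latencies_ms, 50)) # typical user: fast
print("P95:", percentile(latencies_ms, 95)) # tail user: very slow
```

In this sample the median is 133 ms while P95 is 2400 ms: a report showing only the average would miss the users having a terrible experience.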

Error rate

Even a small increase in errors can signal a serious regression. Investigate:

  • 401 or 403 spikes after auth changes
  • 429 responses from rate limiting
  • 500 errors from application crashes
  • 502 or 504 errors from gateway or upstream timeouts

Throughput

Requests per second helps you understand whether the system is scaling as expected. If user count rises but throughput plateaus, your app may have hit a bottleneck.
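A simple way to spot a plateau is to compare successive runs at increasing user counts. This illustrative helper flags the first run where requests per second stopped scaling with users; the 50% `min_gain` factor is an arbitrary example, not a standard:

```python
def throughput_plateau(results, min_gain=0.5):
    """Detect where throughput stops scaling with user count.

    `results` is a list of (users, requests_per_second) pairs from
    successive runs. Returns the first point where the RPS increase
    fell below `min_gain` of the proportional increase in users,
    or None if throughput kept scaling.
    """
    for (u1, r1), (u2, r2) in zip(results, results[1:]):
        expected_ratio = u2 / u1          # e.g. 2.0 when users doubled
        actual_ratio = r2 / r1
        if actual_ratio < 1 + min_gain * (expected_ratio - 1):
            return (u2, r2)
    return None
```

Feeding it throughput numbers from three pipeline runs makes the bottleneck point explicit instead of eyeballing charts.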

Endpoint-level breakdown

Because the Locust scripts above use explicit request names, LoadForge can show exactly which routes degrade:

  • POST /api/v1/auth/login
  • GET /api/catalog/search
  • POST /api/carts/:id/items
  • POST /api/admin/reports

This is critical for CI/CD performance testing because you want to know whether a deployment hurt login, search, checkout, or admin operations.

Compare runs over time

One of the most valuable practices is comparing current results to previous GitLab CI pipeline runs. If the same test suddenly shows:

  • 30% slower search responses
  • doubled login latency
  • higher cart write failures

you’ve likely introduced a regression. Automated load testing is most powerful when used as a trend-monitoring tool, not just a one-off stress test.
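Trend comparison can be automated with a small helper that diffs per-endpoint latency between a baseline run and the current run. The endpoint names and 20% threshold below are illustrative:

```python
def find_regressions(baseline, current, threshold_pct=20.0):
    """Compare per-endpoint latency between two runs.

    `baseline` and `current` map endpoint name -> average latency in ms.
    Returns endpoints whose latency grew by more than threshold_pct,
    mapped to their percentage increase.
    """
    regressions = {}
    for endpoint, base_ms in baseline.items():
        now_ms = current.get(endpoint)
        if now_ms is None:
            continue  # endpoint not exercised in the current run
        change_pct = (now_ms - base_ms) / base_ms * 100
        if change_pct > threshold_pct:
            regressions[endpoint] = round(change_pct, 1)
    return regressions
```

Storing each run's per-endpoint averages as a pipeline artifact gives you the baseline input for the next run's comparison.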

Performance Optimization Tips

If your GitLab CI load testing pipeline uncovers problems, these are common areas to optimize:

Cache expensive reads

Search endpoints, dashboards, and reporting APIs often benefit from caching. If load testing shows repeated slow reads, verify that cache keys, TTLs, and invalidation logic are working correctly.

Reduce authentication overhead

If login or token endpoints are slow, consider:

  • token reuse where appropriate
  • shorter auth database paths
  • optimized session storage
  • reduced external identity provider latency

Tune database queries

Write-heavy flows like cart updates, order creation, and report generation often expose:

  • missing indexes
  • N+1 queries
  • lock contention
  • poor pagination strategies

Use your slow query logs alongside LoadForge results.

Warm up new environments

Fresh deployments can suffer from cold caches, autoscaling lag, or just-in-time compilation overhead. If GitLab CI runs tests immediately after deployment, consider a short warm-up phase before the main load test.

Separate smoke tests from full load tests

Not every pipeline needs a 10-minute, 5,000-user test. A practical strategy is:

  • Merge requests: lightweight performance smoke tests
  • Main branch deploys: moderate load testing
  • Release candidates: full stress testing

LoadForge makes this easier by letting you run different test profiles from the same CI/CD workflow.
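One lightweight way to implement this strategy is a small profile map keyed on pipeline context. The user counts and durations below are placeholders to tune for your own application:

```python
# Illustrative profiles only; tune the numbers for your application.
PROFILES = {
    "smoke":  {"users": 25,   "spawn_rate": 5,  "run_time": 60},
    "load":   {"users": 200,  "spawn_rate": 20, "run_time": 300},
    "stress": {"users": 1000, "spawn_rate": 50, "run_time": 900},
}

def select_profile(branch, is_merge_request, is_release=False):
    """Map pipeline context to a test profile, following the strategy above."""
    if is_release:
        return PROFILES["stress"]
    if is_merge_request:
        return PROFILES["smoke"]
    if branch == "main":
        return PROFILES["load"]
    return PROFILES["smoke"]
```

In GitLab CI, `branch` and `is_merge_request` would come from predefined variables such as `CI_COMMIT_BRANCH` and `CI_PIPELINE_SOURCE`.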

Test from relevant geographies

If your users are global, performance can vary by region. LoadForge’s global test locations are useful for validating latency and throughput from the markets that matter most.

Common Pitfalls to Avoid

Teams often add load testing to GitLab CI but get misleading or low-value results. Avoid these mistakes:

Running heavy tests directly on CI runners

This is one of the biggest mistakes in CI/CD performance testing. Shared runners are not reliable load generators. Use LoadForge’s distributed testing instead.

Testing unrealistic endpoints only

A health check like /health or /ping tells you almost nothing about real application performance. Focus on business-critical flows.

Ignoring authentication

Many production bottlenecks happen during login, token issuance, session validation, and permission checks. If your test skips auth, you may miss major issues.

Using fake traffic patterns

If every virtual user requests the same endpoint at the same interval, results can be misleading. Use weighted tasks, realistic waits, and varied payloads.

Failing to set pass/fail thresholds

A load test that always “passes” is just reporting, not quality control. Define thresholds for latency, error rate, and throughput in your GitLab CI pipeline.

Overloading fragile test environments

Review apps and staging systems may not match production capacity. That’s fine, but calibrate expectations. Use them for regression detection, not necessarily for maximum-scale stress testing.

Not versioning your Locust scripts

Store your Locust files in the same GitLab repository as your application or infrastructure code. That way, your performance tests evolve with your app.

Conclusion

A GitLab CI load testing pipeline with LoadForge gives your team a practical way to catch performance regressions before users do. By combining GitLab’s automation with LoadForge’s cloud-based infrastructure, distributed testing, real-time reporting, global test locations, and CI/CD integration, you can turn load testing and performance testing into a repeatable part of every deployment.

Start simple with an authenticated Locust script, then expand into realistic API flows, admin workflows, and pipeline-based performance gates. Over time, this approach helps your team ship faster with more confidence.

If you’re ready to automate performance testing in GitLab CI, try LoadForge and build a pipeline that validates speed and scalability on every release.

Try LoadForge free for 7 days

Set up your first load test in under 2 minutes. No commitment.