How to Load Test an API: A Step-by-Step Tutorial

APIs power nearly every modern application. Whether you are building a mobile app, a single-page frontend, a microservices architecture, or exposing data to third-party partners, your API is the critical layer that holds everything together. A single slow or failing endpoint can cascade into broken user experiences, lost revenue, and damaged trust. Load testing your API is how you find those failure points before your users do.

This tutorial walks you through the entire process of load testing a REST API, from identifying which endpoints to target all the way through interpreting your results. Every step includes practical Python code you can adapt to your own API.

Why Load Test Your API?

An API that responds in 50 milliseconds for a single request might take 5 seconds -- or time out entirely -- when 1,000 users hit it simultaneously. Functional tests tell you whether your API returns the right data. Load tests tell you whether it can keep doing that under pressure.

Specifically, load testing your API reveals:

  • Throughput ceiling: The maximum number of requests per second your API can sustain before performance degrades.
  • Latency under load: How response times change as concurrent users increase. An endpoint that averages 100ms at 10 users might climb to 2 seconds at 500 users.
  • Error behavior: Which endpoints start returning 500 errors first, and at what concurrency level. Some endpoints will fail long before others.
  • Resource bottlenecks: Whether the limiting factor is CPU, memory, database connections, or a downstream service. Pairing load test data with server metrics pinpoints exactly where to invest optimization effort.

Without load testing, you are guessing about your API's capacity. With it, you have hard numbers to inform architecture decisions, capacity planning, and SLA commitments.

Prerequisites

Before you begin, make sure you have the following:

  • An API to test: A REST API with documented endpoints. You need to know the base URL, available routes, expected request payloads, and authentication requirements. A staging or development environment is strongly recommended over production.
  • A LoadForge account or local Locust installation: LoadForge runs Locust-based tests in the cloud with distributed load generation across multiple regions. You can also run Locust locally for initial script development. Install it with pip install locust if you want to test scripts on your machine first.
  • Basic Python knowledge: Locust test scripts are written in Python. You do not need to be an expert -- the examples in this tutorial cover everything you need -- but familiarity with classes, functions, and dictionaries will help.

Step 1: Identify Critical Endpoints

Not all endpoints deserve equal attention. Start by cataloging your API's endpoints and prioritizing them based on real-world usage patterns.

What to Prioritize

Focus your load testing effort on endpoints that meet one or more of these criteria:

  • High-traffic endpoints: The routes that receive the most requests in production. Check your access logs or analytics to identify these.
  • Resource-intensive operations: Endpoints that trigger complex database queries, large joins, aggregations, or heavy computation.
  • Authentication flows: Login and token refresh endpoints are critical. If authentication breaks under load, every authenticated endpoint is effectively down.
  • Search and filter endpoints: These often involve full-text search, multiple joins, or external service calls. They tend to degrade faster than simple CRUD operations.

Document Your Test Plan

Create a simple table mapping out the endpoints you intend to test. This becomes your test plan and helps you assign realistic traffic weights later.

Priority   Method   Endpoint             Description
High       GET      /api/products        Product listing (most traffic)
High       POST     /api/auth/login      Authentication
Medium     GET      /api/products/[id]   Single product
Medium     POST     /api/orders          Create order
Low        GET      /api/user/profile    User profile

With this table in hand, you can write test scripts that mirror actual usage patterns rather than hammering a single endpoint in isolation.

Step 2: Write a Basic GET Test

Let's start with the simplest possible load test: hitting GET endpoints with simulated users. The following Locust script creates virtual users that browse product listings and view individual products.

from locust import HttpUser, task, between

class APIUser(HttpUser):
    wait_time = between(1, 2)

    @task(3)
    def list_products(self):
        with self.client.get("/api/products", catch_response=True) as response:
            if response.status_code != 200:
                response.failure(f"Got {response.status_code}")

    @task(1)
    def get_product(self):
        with self.client.get("/api/products/1", catch_response=True) as response:
            if response.status_code != 200:
                response.failure(f"Got {response.status_code}")

Key Concepts

Task weighting controls how often each task runs relative to the others. The @task(3) decorator on list_products means it will be called roughly three times for every one call to get_product. This lets you model realistic traffic distributions. If your product listing page gets 75% of traffic, weight it accordingly.

catch_response=True gives you control over what counts as a success or failure. By default, Locust considers any non-exception response a success -- even a 500 error. With catch_response=True, you can inspect the response and explicitly mark it as a failure using response.failure(). This is essential for API testing, where a 200 status code with an empty body or error message in the JSON payload should still count as a failure.
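
That body-level validation can be factored into a small helper. The function below is a hypothetical sketch (not part of Locust); its return value would be passed to response.failure() inside a catch_response block, with None meaning the response counts as a success:

```python
import json
from typing import Optional

def validate_api_response(status_code: int, body_text: str) -> Optional[str]:
    """Return a failure reason for response.failure(), or None for success.

    Hypothetical helper -- treats a 200 with an empty body or an error
    field in the JSON payload as a failure, per the guidance above.
    """
    if status_code != 200:
        return f"Got {status_code}"
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        return "Response body was not valid JSON"
    if not body:
        return "200 OK but empty body"
    if "error" in body:
        return f"200 OK but error in payload: {body['error']}"
    return None
```

Inside a task you would compute reason = validate_api_response(response.status_code, response.text) and call response.failure(reason) whenever reason is not None.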

wait_time = between(1, 2) adds a random pause of 1 to 2 seconds between each user's requests. This simulates realistic human behavior rather than flooding the API with zero-delay requests, which would not represent real-world usage.

Step 3: Testing POST Endpoints

GET requests are only half the picture. Most APIs accept data through POST, PUT, and PATCH endpoints. Testing these is critical because write operations typically involve database transactions, validation logic, and side effects that are more resource-intensive than reads.

import json
from locust import HttpUser, task, between

class APIUser(HttpUser):
    wait_time = between(1, 2)

    @task
    def create_order(self):
        payload = {
            "product_id": 1,
            "quantity": 2,
            "shipping_address": {
                "street": "123 Test Street",
                "city": "Loadville",
                "zip": "10001"
            }
        }
        headers = {"Content-Type": "application/json"}
        with self.client.post("/api/orders", json=payload, headers=headers, catch_response=True) as response:
            if response.status_code not in [200, 201]:
                response.failure(f"Order failed: {response.status_code}")

What to Watch For

JSON payloads: The json=payload parameter automatically serializes your Python dictionary to JSON and sets the appropriate content type. You can also pass the headers dictionary explicitly if your API requires additional headers like API keys or custom content types.

Status code validation: POST endpoints typically return either 200 OK or 201 Created. Your validation logic should accept both. Some APIs return 202 Accepted for asynchronous operations -- adjust your expected codes accordingly.

Side effects: Every successful POST request in your load test creates real data. If you are running 500 virtual users each creating orders every few seconds, you will generate thousands of records. Plan for this by using a dedicated test database, implementing cleanup scripts, or testing against an environment where data pollution is acceptable.
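
One way to keep that data manageable is to record the IDs your test creates so a teardown step can delete them afterward. The class below is a hypothetical sketch: you would call record() after each successful order POST and issue DELETE requests for cleanup_paths() from Locust's on_stop hook or a separate cleanup script.

```python
class CreatedResourceTracker:
    """Collects IDs of resources created during a load test run.

    Hypothetical helper: pair with DELETE requests in a teardown step
    to undo the data created by POST tasks.
    """

    def __init__(self, base_path="/api/orders"):
        self.base_path = base_path
        self._ids = []

    def record(self, resource_id):
        # Call after each successful POST, e.g. with response.json()["id"]
        self._ids.append(resource_id)

    def cleanup_paths(self):
        # Paths to DELETE once the test finishes
        return [f"{self.base_path}/{rid}" for rid in self._ids]
```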

Step 4: Handling Authentication

Most production APIs require authentication. The most common pattern for REST APIs is token-based authentication, where you send credentials to a login endpoint and receive a token that you include in subsequent requests.

from locust import HttpUser, task, between

class AuthenticatedAPIUser(HttpUser):
    wait_time = between(1, 2)
    token = None

    def on_start(self):
        """Login and store the auth token"""
        response = self.client.post("/api/auth/login", json={
            "email": "testuser@example.com",
            "password": "testpassword123"
        })
        self.token = response.json()["access_token"]

    @task
    def get_profile(self):
        headers = {"Authorization": f"Bearer {self.token}"}
        self.client.get("/api/user/profile", headers=headers)

    @task
    def list_orders(self):
        headers = {"Authorization": f"Bearer {self.token}"}
        self.client.get("/api/user/orders", headers=headers)

How This Works

on_start is a lifecycle method that Locust calls once for each simulated user before it begins executing tasks. This is the right place to perform login, session setup, or any other initialization logic. Each virtual user gets its own instance of the class, so each user logs in independently and stores its own token.

Token storage: The self.token attribute persists across all task executions for that virtual user. Every subsequent request includes the token in the Authorization header using the standard Bearer token pattern.

Important Considerations

  • Use a dedicated test account. Never use production user credentials in load tests. Create one or more test accounts specifically for this purpose.
  • Handle token expiration. If your tokens expire during the test, you will need to implement refresh logic. You can add a check at the beginning of each task, or catch 401 responses and re-authenticate.
  • Multiple test users. If your API enforces per-user rate limits, you may need to cycle through multiple test accounts. You can load credentials from a CSV file or use parameterized usernames.
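
For the multiple-accounts case, a simple credential pool works well. The sketch below uses an inline CSV for illustration; in practice you would load a real file of dedicated test accounts (the emails and passwords here are placeholders):

```python
import csv
import io
import itertools

# Placeholder credentials -- replace with a file of dedicated test accounts
CREDENTIALS_CSV = """email,password
loadtest1@example.com,secret1
loadtest2@example.com,secret2
loadtest3@example.com,secret3
"""

def load_credentials(csv_text):
    """Parse credential rows from CSV text."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# Cycle through accounts so concurrent users spread across them,
# sidestepping per-user rate limits
_pool = itertools.cycle(load_credentials(CREDENTIALS_CSV))

def next_credentials():
    """Each virtual user would call this once in on_start before logging in."""
    return next(_pool)
```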

Step 5: Dynamic Data and Parameterization

Hitting the same endpoint with identical parameters every time is not realistic. Real users browse different products, search for different terms, and submit different data. Parameterization makes your load test more realistic and prevents your API's caching layer from masking performance problems.

import random
from locust import HttpUser, task, between

class DynamicAPIUser(HttpUser):
    wait_time = between(1, 2)

    product_ids = list(range(1, 101))

    @task
    def get_random_product(self):
        product_id = random.choice(self.product_ids)
        self.client.get(f"/api/products/{product_id}", name="/api/products/[id]")

    @task
    def search_products(self):
        queries = ["laptop", "phone", "headphones", "keyboard", "monitor"]
        query = random.choice(queries)
        self.client.get(f"/api/products/search?q={query}", name="/api/products/search")

The name Parameter

The name parameter is critical when testing parameterized URLs. Without it, Locust would treat /api/products/1, /api/products/2, and every other product ID as separate endpoints -- 100 distinct entries in the results. By setting name="/api/products/[id]", all requests to individual product endpoints are grouped into a single statistics entry. This gives you a clean, readable results table instead of hundreds of individual rows.


Expanding to Larger Datasets

For more sophisticated parameterization, load test data from a CSV file:

import csv
import random
from locust import HttpUser, task, between

class CSVDrivenUser(HttpUser):
    wait_time = between(1, 2)
    test_data = []

    def on_start(self):
        if not CSVDrivenUser.test_data:
            with open("test_products.csv", "r") as f:
                reader = csv.DictReader(f)
                CSVDrivenUser.test_data = list(reader)

    @task
    def get_product(self):
        item = random.choice(self.test_data)
        self.client.get(
            f"/api/products/{item['id']}",
            name="/api/products/[id]"
        )

This approach lets you test with production-representative data -- real product IDs, realistic search queries, or actual user identifiers -- without hardcoding values in your script.

Step 6: Configuring Load Profiles

How you ramp up and sustain virtual users has a significant impact on what your test reveals. Different load profiles answer different questions.

Ramp-Up (Gradual Increase)

Start with a small number of users and increase over time. This is the best approach for finding your API's breaking point. You can observe exactly when response times start degrading and when errors begin appearing. A typical configuration might start at 10 users and add 5 users per second until you reach 500.
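
Locust supports custom profiles through a LoadTestShape class whose tick() method returns the target (user_count, spawn_rate) at each moment. The framework-agnostic sketch below captures just the schedule arithmetic for the ramp described above (the start, rate, and cap values are the illustrative ones from this paragraph):

```python
def ramp_user_count(elapsed_seconds, start_users=10, users_per_second=5, max_users=500):
    """Target virtual-user count for a gradual ramp-up at a given elapsed time.

    This is the value a LoadTestShape.tick() implementation would return
    as the user_count component; the parameter defaults mirror the example
    ramp above (start at 10 users, add 5 per second, cap at 500).
    """
    target = start_users + users_per_second * elapsed_seconds
    return min(target, max_users)
```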

Steady State (Constant Load)

Hold a fixed number of users for an extended period. This is ideal for endurance testing -- verifying that your API can sustain a known load without degradation over time. Memory leaks, connection pool exhaustion, and database lock contention often only surface after minutes or hours of sustained load.

Spike (Sudden Burst)

Jump from a low user count to a high one instantly. This tests your API's ability to handle sudden traffic surges and is particularly useful for validating auto-scaling configurations. If your infrastructure takes 3 minutes to scale up but your traffic spike happens in 10 seconds, your users will experience failures during that gap.

If you are testing an API for the first time, start with this configuration:

  • Users: 50
  • Spawn rate: 5 users per second
  • Duration: 10 minutes

This gives you a gentle ramp-up (reaching 50 users in 10 seconds) and enough sustained load to see meaningful patterns. Once this baseline test completes cleanly, double the user count in each subsequent run. Continue until you find the point where response times exceed your SLA threshold or error rates climb above acceptable levels.
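
The doubling strategy above produces a simple run schedule; a minimal sketch (the cap is illustrative -- in practice you stop as soon as SLA or error thresholds break):

```python
def run_schedule(start_users=50, max_users=1600):
    """User counts for successive test runs: start conservatively and
    double each run until reaching a cap."""
    plan, users = [], start_users
    while users <= max_users:
        plan.append(users)
        users *= 2
    return plan
```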

On LoadForge, you can configure these parameters directly in the test settings, and distribute the load across multiple geographic regions to simulate realistic global traffic patterns.

Step 7: Analyzing API Load Test Results

Running the test is only half the work. The real value comes from interpreting the results and turning them into actionable improvements.

Response Times by Endpoint

Look at median, p95, and p99 response times for each endpoint individually. The average is often misleading because the bulk of fast responses can hide a long tail of slow ones. Your slowest endpoints are where optimization effort will have the greatest impact.

p95 vs p99 Latency

If your SLA promises responses under 500 milliseconds, check the p99 latency, not the average. The p99 value tells you the response time that 99% of requests were faster than. If your p99 is 2 seconds, that means 1 in 100 requests takes over 2 seconds -- and at 10,000 requests per minute, that is 100 users per minute having a bad experience.
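
If you want to sanity-check percentile figures against raw latency samples exported from a test run, a nearest-rank percentile is easy to compute. This is a minimal sketch, not necessarily how Locust derives its reported percentiles internally:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the value that `pct` percent of samples
    are at or below. `samples` are latencies in milliseconds."""
    ordered = sorted(samples)
    k = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[k]
```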

Throughput Ceiling

Watch your requests per second (RPS) as user count increases. In a healthy system, RPS climbs proportionally with users. When RPS plateaus or starts declining despite adding more users, you have found a bottleneck. The system is saturated and additional users are just queuing up.

Error Rate by Endpoint

Not all endpoints fail at the same time. Your product listing endpoint might handle 1,000 concurrent users while your order creation endpoint starts failing at 200. Breaking down error rates by endpoint tells you exactly which parts of your API need attention.

Correlating with Server Metrics

The most powerful analysis combines load test results with server-side metrics: CPU utilization, memory usage, database connection pool usage, and disk I/O. When response times spike at 300 users and your database CPU hits 100% at the same moment, you know the database is the bottleneck. Without this correlation, you are left guessing about root causes.
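
A crude version of that correlation can be scripted once you export both series. The sketch below assumes per-minute buckets of p95 latency (in milliseconds) and database CPU (percent); the threshold values are illustrative:

```python
def correlated_spikes(latency_p95_ms, db_cpu_pct, lat_threshold=500, cpu_threshold=90):
    """Minutes where latency and database CPU spike together -- evidence
    that the database is the bottleneck at that load level."""
    return sorted(
        minute
        for minute, lat in latency_p95_ms.items()
        if lat > lat_threshold and db_cpu_pct.get(minute, 0) > cpu_threshold
    )
```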

LoadForge provides detailed graphs of response times, throughput, and error rates over the course of each test run. You can overlay these with your infrastructure monitoring tools (Datadog, Grafana, CloudWatch, etc.) to get the complete picture.

Common API Load Testing Pitfalls

Even experienced developers make these mistakes when load testing APIs. Avoid them to get reliable, actionable results.

  • Rate limiting: Your API's rate limiter will treat load test traffic the same as any other traffic and block requests once thresholds are exceeded. Before testing, whitelist your test IPs or temporarily increase rate limits on your staging environment. Otherwise, your results will reflect rate limiter behavior rather than actual API performance.

  • Test data pollution: POST, PUT, and DELETE requests create, modify, and remove real data. A load test that creates 50,000 orders will leave 50,000 orders in your database. Always test against a dedicated test environment, and have a cleanup strategy -- whether that is database snapshots you can restore, truncation scripts, or an environment that gets rebuilt automatically.

  • Ignoring dependent services: Your API does not exist in isolation. It calls databases, caches, message queues, third-party APIs, and other microservices. Load testing your API gateway while the downstream payment service cannot handle the load will give you misleading results. Ensure all dependencies can sustain the test load, or mock external services that you do not control.

  • Unrealistic payloads: Sending the same tiny JSON payload on every request does not reflect production reality. If your real users submit product reviews with 2,000 characters of text and three image URLs, your test payloads should be similar. Data size affects serialization time, validation cost, database write performance, and network transfer.

  • Single endpoint testing: Testing only your most popular endpoint in isolation tells you very little about real-world performance. In production, your API handles a mix of reads, writes, authentication requests, and searches simultaneously. These operations compete for database connections, CPU, and memory. Test the full mix with realistic proportions.
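
For the payload-size pitfall, generating production-sized bodies is straightforward. The sketch below builds a review-like payload of roughly 2,000 characters with a few image URLs, matching the example profile above; the word pool and URL pattern are placeholders, and sampling from real production content is better where possible:

```python
import random

# Placeholder vocabulary -- in practice, sample text from real reviews
WORDS = ["great", "battery", "quality", "shipping", "works", "value", "sturdy"]

def review_payload(target_chars=2000, image_count=3):
    """Build a POST body sized like a real product review so the test
    exercises realistic serialization, validation, and write costs."""
    words = []
    while sum(len(w) + 1 for w in words) < target_chars:
        words.append(random.choice(WORDS))
    return {
        "product_id": random.randint(1, 100),
        "text": " ".join(words),
        "images": [
            f"https://cdn.example.com/img/{random.randint(1, 9999)}.jpg"
            for _ in range(image_count)
        ],
    }
```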

Conclusion

Load testing your API is not optional if you are building anything that needs to handle real traffic. This tutorial has taken you from identifying critical endpoints through writing parameterized test scripts with authentication, all the way to analyzing results and avoiding common mistakes.

The key takeaways are straightforward: prioritize the endpoints that matter most, write test scripts that reflect realistic user behavior, start with conservative load levels and scale up methodically, and always analyze results at the per-endpoint level rather than relying on aggregate numbers.

If you are new to load testing concepts, our guide on what is load testing covers the fundamentals in depth. For testing full websites with page navigation and browser-like behavior rather than raw API endpoints, see our load testing tutorial.

When you are ready to move beyond local testing, LoadForge lets you run these same Locust scripts at scale from distributed cloud infrastructure. You can simulate thousands of concurrent API users from multiple global regions, schedule recurring tests to catch regressions, and track performance trends over time -- all without managing your own load generation servers.
