
How to Load Test Serverless Applications

Introduction

Serverless applications promise automatic scaling, lower operational overhead, and fast iteration cycles. But “auto-scaling” does not mean “infinitely fast” or “immune to failure.” Whether you are running AWS Lambda behind API Gateway, Azure Functions, Google Cloud Functions, or edge runtimes such as Cloudflare Workers and Vercel Edge Functions, you still need load testing to understand how your application behaves under real traffic.

Load testing serverless applications is especially important because serverless platforms introduce unique performance characteristics: cold starts, concurrency limits, burst scaling behavior, downstream dependency bottlenecks, and provider-specific throttling. A serverless app may look fine with a few users, then suddenly show elevated latency, 429 responses, or timeout errors as traffic ramps up.

In this guide, you’ll learn how to load test serverless applications using LoadForge and Locust. We’ll cover realistic scenarios for public APIs, authenticated endpoints, asynchronous job workflows, and edge function traffic. You’ll also see how to interpret performance testing results and avoid common mistakes when stress testing serverless systems.

Because LoadForge is built on Locust, every example uses practical Python scripts you can run in a cloud-based, distributed load testing environment. That means you can simulate traffic from multiple global test locations, watch real-time reporting, and integrate performance testing into your CI/CD pipeline.

Prerequisites

Before you start load testing your serverless application, make sure you have:

  • A deployed serverless application with publicly accessible endpoints
  • Permission to test the environment, ideally a staging or performance environment
  • Knowledge of your API routes, authentication requirements, and expected traffic patterns
  • Test credentials such as API keys, JWT tokens, or OAuth client credentials
  • A list of downstream dependencies your serverless functions call, such as:
    • DynamoDB, Firestore, Cosmos DB, or other databases
    • S3, Blob Storage, or object storage
    • Third-party APIs like Stripe, SendGrid, or Auth0
    • Queues such as SQS, Pub/Sub, or Service Bus
  • Expected service limits, including:
    • Function timeout settings
    • Memory allocation
    • Reserved concurrency or burst concurrency
    • API Gateway or edge rate limits

For LoadForge specifically, it helps to have:

  • A LoadForge account
  • Your Locust script ready for upload
  • Environment-specific variables such as host URLs, tokens, and test data
  • A target user count and ramp-up plan for load testing and stress testing

Understanding Serverless Under Load

Serverless applications behave differently from traditional monoliths or containerized services. To run meaningful performance testing, you need to understand the main bottlenecks.

Cold starts

When a function instance is not already running, the platform may need to initialize it before serving a request. This cold start can add noticeable latency, especially for:

  • Large deployment packages
  • Functions with heavy framework initialization
  • VPC-attached Lambdas
  • Functions using large dependency trees
  • Edge runtimes loading dynamic logic

During load testing, cold starts often appear as a latency spike at the beginning of a test or during sudden traffic bursts.

Concurrency scaling

Serverless platforms scale by creating more function instances, but there are still limits. For example:

  • AWS Lambda has account-level and function-level concurrency limits
  • API Gateway may throttle requests
  • Azure Functions may scale more gradually depending on the hosting plan
  • Edge runtimes may have CPU or execution limits per request

A load test helps you identify when concurrency scaling stops being smooth and starts causing throttling or timeouts.
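To find that point, ramp in steps rather than all at once. The sketch below expresses a stepped schedule as a plain function; in self-hosted Locust it could back a custom `LoadTestShape.tick()`, while in LoadForge you would typically configure the ramp in the test settings instead. The stage values are illustrative assumptions:

```python
# Illustrative stepped ramp for probing concurrency limits.
# Each stage: (run until this many elapsed seconds, target users,
# users spawned per second). Values are assumptions — adjust them.
STAGES = [
    (120, 50, 5),     # gentle warm-up
    (240, 200, 10),   # steady growth
    (360, 500, 25),   # burst: watch for 429s and timeouts here
]

def stage_for(elapsed_seconds):
    """Return (users, spawn_rate) for the current stage, or None
    once all stages are done (None tells Locust to stop)."""
    for end, users, spawn_rate in STAGES:
        if elapsed_seconds < end:
            return (users, spawn_rate)
    return None
```

Stepped ramps make it much easier to pinpoint the concurrency band where throttling begins, because each plateau gives the platform time to settle before the next jump.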

Downstream services become the real bottleneck

In many serverless architectures, the function itself is not the slowest part. The real bottleneck is often:

  • A database query
  • A call to an external payment API
  • A cache miss
  • Object storage reads or writes
  • Queue backlog growth

This is why realistic load testing matters. If your function triggers multiple downstream calls, your test should reflect that usage pattern.

Event-driven and asynchronous processing

Many serverless apps accept a request quickly, enqueue work, and return a job ID. That means performance testing should measure both:

  • Front-door responsiveness
  • Completion time of the background workflow

Edge runtimes and geographic behavior

Edge functions run close to users, but latency still depends on origin calls, cache hit rates, and regional routing. Distributed testing from multiple geographies is particularly useful here, and LoadForge’s global test locations can help uncover region-specific issues.

Writing Your First Load Test

Let’s begin with a simple but realistic serverless API test. Imagine an e-commerce backend built on AWS Lambda + API Gateway. It exposes public endpoints for product browsing and search.

We’ll test:

  • GET /prod/health
  • GET /prod/products
  • GET /prod/products/{id}
  • POST /prod/search

This baseline script is useful for measuring response times, throughput, and error rates under normal traffic.

python
from locust import HttpUser, task, between
import random
 
PRODUCT_IDS = [
    "sku_1001", "sku_1002", "sku_1003", "sku_1004", "sku_1005"
]
 
SEARCH_TERMS = [
    "wireless headphones",
    "mechanical keyboard",
    "usb-c hub",
    "4k monitor",
    "gaming mouse"
]
 
class ServerlessStoreUser(HttpUser):
    wait_time = between(1, 3)
 
    def on_start(self):
        self.client.headers.update({
            "Accept": "application/json",
            "User-Agent": "LoadForge-Locust-Serverless-Test/1.0"
        })
 
    @task(2)
    def health_check(self):
        self.client.get("/prod/health", name="GET /health")
 
    @task(5)
    def list_products(self):
        self.client.get("/prod/products?category=electronics&limit=20", name="GET /products")
 
    @task(4)
    def product_detail(self):
        product_id = random.choice(PRODUCT_IDS)
        self.client.get(f"/prod/products/{product_id}", name="GET /products/:id")
 
    @task(3)
    def search_products(self):
        payload = {
            "query": random.choice(SEARCH_TERMS),
            "filters": {
                "inStock": True,
                "priceMin": 25,
                "priceMax": 500
            },
            "sort": "relevance",
            "limit": 10
        }
        self.client.post("/prod/search", json=payload, name="POST /search")

What this test does

This script simulates realistic browsing behavior against a serverless API. It mixes low-cost endpoints like health checks with more meaningful requests such as product lookups and search payloads.

Why this matters for serverless load testing

Even a basic test can reveal:

  • Cold start latency on infrequently called routes
  • API Gateway throttling
  • Search function performance under concurrent JSON payload processing
  • Slow product detail lookups caused by database reads

When you run this in LoadForge, start with a moderate ramp, such as 25 to 100 users, then increase gradually. Watch p95 and p99 latency, not just averages.
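The gap between averages and percentiles is easy to demonstrate with a simulated cold-start tail. A small nearest-rank percentile helper with illustrative numbers:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of response times (ms)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# 95 warm requests at 90 ms plus 5 cold starts at 1400 ms
latencies = [90] * 95 + [1400] * 5

average = sum(latencies) / len(latencies)  # 155.5 ms — looks acceptable
p95 = percentile(latencies, 95)            # 90 ms
p99 = percentile(latencies, 99)            # 1400 ms — the cold-start tail
```

Here 5% of users wait over a second, yet the average barely moves. This is exactly why serverless results should be read at p95 and p99.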

Advanced Load Testing Scenarios

Basic endpoint testing is a good start, but real serverless performance testing needs to reflect authentication, asynchronous workflows, and edge-specific patterns.

Scenario 1: Testing authenticated serverless APIs with JWT tokens

Many serverless applications protect endpoints using OAuth 2.0 or JWT authorizers. In this example, we authenticate against an identity endpoint, then call user-specific routes.

Imagine a Lambda-backed API with these routes:

  • POST /auth/token
  • GET /prod/account/profile
  • GET /prod/account/orders
  • POST /prod/cart/items
python
from locust import HttpUser, task, between
from locust.exception import StopUser
import random
 
class AuthenticatedServerlessUser(HttpUser):
    wait_time = between(1, 2)
 
    def on_start(self):
        token_response = self.client.post(
            "/auth/token",
            json={
                "client_id": "load-test-client",
                "client_secret": "super-secret-for-staging",
                "audience": "serverless-store-api",
                "grant_type": "client_credentials"
            },
            name="POST /auth/token"
        )
 
        if token_response.status_code == 200:
            access_token = token_response.json().get("access_token")
            self.client.headers.update({
                "Authorization": f"Bearer {access_token}",
                "Accept": "application/json",
                "Content-Type": "application/json"
            })
        else:
            # Stop this simulated user rather than flooding the
            # protected routes with unauthenticated requests
            raise StopUser()
 
    @task(3)
    def get_profile(self):
        self.client.get("/prod/account/profile", name="GET /account/profile")
 
    @task(4)
    def get_orders(self):
        self.client.get("/prod/account/orders?limit=10", name="GET /account/orders")
 
    @task(2)
    def add_to_cart(self):
        payload = {
            "productId": random.choice(["sku_1001", "sku_1002", "sku_1003"]),
            "quantity": random.randint(1, 3),
            "currency": "USD"
        }
        self.client.post("/prod/cart/items", json=payload, name="POST /cart/items")

What this scenario validates

This type of load test is useful for measuring:

  • Authentication overhead
  • JWT authorizer performance
  • Latency added by identity providers
  • Per-user API path behavior
  • How authenticated traffic differs from anonymous traffic

For serverless applications, authentication can become a hidden bottleneck. A Lambda authorizer or token validation step may add significant latency under load, especially if it depends on external key lookups or identity APIs.

Scenario 2: Testing asynchronous job processing in serverless workflows

A common serverless pattern is to accept a request, enqueue work, and return a job ID. The client then polls for completion. This is common in:

  • Report generation
  • Video or image processing
  • Document conversion
  • Data exports
  • AI inference workflows

Let’s test a report generation API:

  • POST /prod/reports
  • GET /prod/reports/{jobId}
python
from locust import HttpUser, task, between
import random
import time
 
class ReportGenerationUser(HttpUser):
    wait_time = between(2, 5)
 
    def on_start(self):
        self.client.headers.update({
            "x-api-key": "staging-serverless-api-key",
            "Accept": "application/json",
            "Content-Type": "application/json"
        })
 
    @task
    def generate_and_poll_report(self):
        payload = {
            "reportType": "sales_summary",
            "dateRange": {
                "from": "2026-03-01",
                "to": "2026-03-31"
            },
            "filters": {
                "region": random.choice(["us-east-1", "eu-west-1", "ap-southeast-1"]),
                "channel": random.choice(["web", "mobile", "partner"])
            },
            "format": random.choice(["pdf", "csv"])
        }
 
        with self.client.post("/prod/reports", json=payload, name="POST /reports", catch_response=True) as response:
            if response.status_code != 202:
                response.failure(f"Unexpected status code: {response.status_code}")
                return
 
            job_id = response.json().get("jobId")
            if not job_id:
                response.failure("Missing jobId in response")
                return
 
        for _ in range(5):
            time.sleep(2)
            with self.client.get(f"/prod/reports/{job_id}", name="GET /reports/:jobId", catch_response=True) as poll_response:
                if poll_response.status_code == 200:
                    body = poll_response.json()
                    status = body.get("status")
 
                    if status == "completed":
                        poll_response.success()
                        return
                    elif status in ["queued", "processing"]:
                        poll_response.success()
                    else:
                        poll_response.failure(f"Unexpected report status: {status}")
                        return
                else:
                    poll_response.failure(f"Unexpected polling status code: {poll_response.status_code}")
                    return

Why asynchronous testing matters

If you only test the initial POST request, you may completely miss the real bottleneck. The function that enqueues a job may be fast, while the queue consumer, background Lambda, or storage layer becomes overloaded.

This scenario helps you evaluate:

  • Queue depth pressure
  • Background worker scaling
  • Time to completion
  • Retry behavior
  • End-user experience for long-running processes

In LoadForge, this is especially useful with distributed testing because you can simulate realistic bursts of report requests from many concurrent users.
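To turn workflow completion into a measurable number, track the elapsed time from submission to the poll that reports completed. A minimal tracker (a hypothetical helper, not part of Locust):

```python
import time

class JobTimer:
    """Track end-to-end duration of async jobs across submit and poll."""

    def __init__(self):
        self.started = {}      # job_id -> submit timestamp
        self.durations = []    # completed end-to-end times (seconds)

    def submitted(self, job_id, now=None):
        self.started[job_id] = time.monotonic() if now is None else now

    def completed(self, job_id, now=None):
        start = self.started.pop(job_id, None)
        if start is None:
            return None  # unknown or already-recorded job
        end = time.monotonic() if now is None else now
        duration = end - start
        self.durations.append(duration)
        return duration
```

In the script above, you would call `timer.submitted(job_id)` after the 202 response and `timer.completed(job_id)` when polling returns a completed status; the collected durations can then be reported as a custom metric, for example via Locust's `events.request.fire`.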

Scenario 3: Load testing edge runtimes with personalization and origin fallback

Edge functions often handle request personalization, redirects, A/B testing, and lightweight API logic. They are fast when cache-friendly, but can slow down significantly when they call an origin service.

Imagine an edge application with these routes:

  • GET /edge/home
  • GET /edge/content/{slug}
  • POST /edge/track

The edge layer reads headers and cookies, applies personalization, and may fall back to an origin API.

python
from locust import HttpUser, task, between
import random
import uuid
 
SLUGS = [
    "spring-launch",
    "pricing",
    "docs-getting-started",
    "blog-serverless-performance",
    "enterprise-security"
]
 
class EdgeRuntimeUser(HttpUser):
    wait_time = between(1, 2)
 
    def on_start(self):
        session_id = str(uuid.uuid4())
        variant = random.choice(["A", "B"])
        self.client.headers.update({
            "Accept": "application/json",
            "X-Geo-Country": random.choice(["US", "DE", "JP", "AU"]),
            "X-Device-Type": random.choice(["desktop", "mobile"]),
            "Cookie": f"session_id={session_id}; ab_variant={variant}",
            "User-Agent": "LoadForge-Edge-Test/1.0"
        })
 
    @task(5)
    def homepage(self):
        self.client.get("/edge/home", name="GET /edge/home")
 
    @task(4)
    def content_page(self):
        slug = random.choice(SLUGS)
        self.client.get(f"/edge/content/{slug}", name="GET /edge/content/:slug")
 
    @task(2)
    def track_event(self):
        payload = {
            "event": "page_view",
            "path": random.choice(["/edge/home", "/edge/pricing", "/edge/docs"]),
            "timestamp": "2026-04-06T12:00:00Z",
            "metadata": {
                "campaign": random.choice(["spring-2026", "retargeting", "organic"]),
                "referrer": random.choice(["google", "linkedin", "newsletter"])
            }
        }
        self.client.post("/edge/track", json=payload, name="POST /edge/track")

What this reveals

This test is useful for edge and serverless performance testing because it surfaces:

  • Cache effectiveness versus origin fallback latency
  • Geographic routing differences
  • Header and cookie processing overhead
  • Personalization logic costs
  • Event ingestion performance

When combined with LoadForge’s global test locations, you can compare how edge behavior changes by region and identify where users experience higher latency.

Analyzing Your Results

After running your load test, the next step is understanding what the data means for a serverless architecture.

Focus on percentile latency

Average response time can hide serious issues. For serverless applications, p95 and p99 are often more important because cold starts and scaling delays disproportionately affect tail latency.

Look for:

  • Low average latency but high p95 or p99
  • Sudden latency jumps during ramp-up
  • Specific routes with unstable response times

This often indicates cold starts, concurrency exhaustion, or downstream bottlenecks.

Watch error rates by endpoint

Different error patterns usually point to different problems:

  • 429 Too Many Requests: API Gateway or function throttling
  • 502/503: upstream integration failures or platform instability
  • 504: function timeout or slow downstream service
  • 401/403: authentication or token handling issues
  • 500: application logic failures under concurrent load

LoadForge’s real-time reporting helps you spot these endpoint-level failures quickly as the test runs.

Compare throughput to concurrency

If users increase but requests per second flatten, your serverless application may be hitting a scaling limit. Investigate:

  • Reserved concurrency settings
  • Provider burst limits
  • Database connection exhaustion
  • Third-party API rate limits
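A quick sanity check is to compare throughput growth against user growth between two stages of the test. A ratio near 1.0 means roughly linear scaling; a ratio well below 1.0 suggests something is saturated (illustrative helper, reading values from your test report):

```python
def scaling_efficiency(before, after):
    """Ratio of throughput growth to load growth between two stages.

    before and after are (users, requests_per_second) pairs.
    ~1.0 => scaling roughly linearly; well below 1.0 => hitting a limit.
    """
    (users_1, rps_1), (users_2, rps_2) = before, after
    load_growth = users_2 / users_1
    throughput_growth = rps_2 / rps_1
    return throughput_growth / load_growth

# Doubling users doubled RPS: healthy
healthy = scaling_efficiency((100, 500), (200, 1000))  # 1.0
# Doubling users added only 20% more RPS: something is saturated
limited = scaling_efficiency((100, 500), (200, 600))   # 0.6
```

When the ratio drops sharply between two ramp stages, the concurrency band between those stages is where to focus your investigation.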

Identify cold start signatures

Cold starts often appear as:

  • Very high latency for the first requests
  • Spikes after idle periods
  • More severe delays on less frequently used routes

If cold starts are a major issue, consider warming strategies, runtime optimization, or reducing package size.

Correlate with cloud metrics

Your load testing results are much more useful when paired with platform telemetry such as:

  • AWS CloudWatch Lambda duration, concurrent executions, throttles, init duration
  • Azure Monitor function execution count and duration
  • Google Cloud Monitoring request count and latency
  • Edge provider analytics for cache hit ratio and origin fetch latency

LoadForge gives you the client-side performance testing view, while cloud metrics reveal what happened inside the platform.

Performance Optimization Tips

Once your load test exposes bottlenecks, these are common ways to improve serverless performance.

Reduce cold start time

  • Minimize deployment package size
  • Remove unused dependencies
  • Use lighter frameworks where possible
  • Avoid expensive initialization in global scope
  • Consider provisioned concurrency for critical paths

Optimize memory and CPU allocation

In many serverless platforms, more memory also means more CPU. If functions are CPU-bound, increasing memory can reduce latency significantly.

Cache aggressively

For serverless and edge workloads, caching can dramatically improve performance:

  • Cache API responses where appropriate
  • Use CDN and edge caching
  • Store frequently accessed data in Redis or managed cache
  • Cache auth metadata or JWKS keys carefully

Protect downstream systems

Your function may scale faster than your database can handle. To avoid this:

  • Use connection pooling or serverless-friendly data access patterns
  • Add queues for burst absorption
  • Batch writes where possible
  • Use read replicas or caching layers for hot reads

Tune timeouts and retries

Retries can amplify load during incidents. Make sure your:

  • Function timeouts are realistic
  • Client retry policies are controlled
  • Queue retry behavior does not create cascading failures
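The amplification is multiplicative: each layer's retries multiply with the next. A quick way to estimate the worst case:

```python
def worst_case_attempts(*retry_limits):
    """Worst-case attempts against the backend when every layer
    retries independently on failure. Each argument is the maximum
    retry count at one layer (client, SDK, queue consumer, ...)."""
    total = 1
    for retries in retry_limits:
        total *= 1 + retries
    return total

# A client that retries twice in front of a queue consumer that
# retries three times can hit the backend up to 12 times:
attempts = worst_case_attempts(2, 3)  # (1 + 2) * (1 + 3) = 12
```

Running this arithmetic for your own stack often reveals that a single failing downstream call can be multiplied into dozens of attempts during an incident.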

Test with realistic traffic patterns

Steady-state load testing is useful, but serverless systems also need burst testing and stress testing. Real traffic often arrives in spikes, especially for APIs, webhooks, and event-driven workloads.

Common Pitfalls to Avoid

Testing only warm functions

If you run repeated tests back-to-back, you may underestimate latency because your functions stay warm. Include tests that simulate idle periods or sudden bursts.

Ignoring authentication overhead

Many teams load test only public endpoints, then discover later that authenticated routes are much slower due to token validation or authorizer logic.

Using unrealistic payloads

Tiny JSON payloads may not reflect production behavior. Use realistic request bodies, query parameters, and headers.

Forgetting downstream dependencies

A Lambda function that performs well in isolation may still fail when its database, queue, or third-party API is under pressure. Good performance testing should reflect the full request path.

Overlooking provider limits

Serverless platforms are elastic, but not limitless. Always account for:

  • Concurrency quotas
  • API rate limits
  • Execution duration limits
  • Payload size restrictions

Running tests from a single region

This is especially problematic for edge runtimes and globally distributed APIs. Use distributed testing to understand how users experience your application from different locations.

Measuring only request success

For asynchronous serverless workflows, a 202 Accepted response does not mean the user’s task completed successfully. Measure end-to-end workflow behavior whenever possible.

Conclusion

Load testing serverless applications is essential if you want to validate scaling behavior, catch cold start issues, and understand how your APIs, functions, and edge runtimes perform under real traffic. Whether you are testing AWS Lambda, Azure Functions, Google Cloud Functions, or edge platforms, the key is to use realistic scenarios that include authentication, asynchronous workflows, and downstream dependencies.

With LoadForge, you can run cloud-based, distributed load testing using Locust scripts like the ones in this guide, monitor real-time reporting, test from global locations, and integrate performance testing into your CI/CD process. If you want to confidently load test and stress test your serverless architecture before production traffic does it for you, try LoadForge.
