
How to Load Test Serverless Applications

Introduction

Serverless applications promise automatic scaling, lower operational overhead, and fast iteration cycles. But “auto-scaling” does not mean “infinitely fast” or “immune to failure.” Whether you are running AWS Lambda behind API Gateway, Azure Functions, Google Cloud Functions, or edge runtimes such as Cloudflare Workers and Vercel Edge Functions, you still need load testing to understand how your application behaves under real traffic.

Load testing serverless applications is especially important because serverless platforms introduce unique performance characteristics: cold starts, concurrency limits, burst scaling behavior, downstream dependency bottlenecks, and provider-specific throttling. A serverless app may look fine with a few users, then suddenly show elevated latency, 429 responses, or timeout errors as traffic ramps up.

In this guide, you’ll learn how to load test serverless applications using LoadForge and Locust. We’ll cover realistic scenarios for public APIs, authenticated endpoints, asynchronous job workflows, and edge function traffic. You’ll also see how to interpret performance testing results and avoid common mistakes when stress testing serverless systems.

Because LoadForge is built on Locust, every example uses practical Python scripts you can run in a cloud-based, distributed load testing environment. That means you can simulate traffic from multiple global test locations, watch real-time reporting, and integrate performance testing into your CI/CD pipeline.

Prerequisites

Before you start load testing your serverless application, make sure you have:

  • A deployed serverless application with publicly accessible endpoints
  • Permission to test the environment, ideally a staging or performance environment
  • Knowledge of your API routes, authentication requirements, and expected traffic patterns
  • Test credentials such as API keys, JWT tokens, or OAuth client credentials
  • A list of downstream dependencies your serverless functions call, such as:
    • DynamoDB, Firestore, Cosmos DB, or other databases
    • S3, Blob Storage, or object storage
    • Third-party APIs like Stripe, SendGrid, or Auth0
    • Queues such as SQS, Pub/Sub, or Service Bus
  • Expected service limits, including:
    • Function timeout settings
    • Memory allocation
    • Reserved concurrency or burst concurrency
    • API Gateway or edge rate limits

For LoadForge specifically, it helps to have:

  • A LoadForge account
  • Your Locust script ready for upload
  • Environment-specific variables such as host URLs, tokens, and test data
  • A target user count and ramp-up plan for load testing and stress testing

Understanding Serverless Under Load

Serverless applications behave differently from traditional monoliths or containerized services. To run meaningful performance testing, you need to understand the main bottlenecks.

Cold starts

When a function instance is not already running, the platform may need to initialize it before serving a request. This cold start can add noticeable latency, especially for:

  • Large deployment packages
  • Functions with heavy framework initialization
  • VPC-attached Lambdas
  • Functions using large dependency trees
  • Edge runtimes loading dynamic logic

During load testing, cold starts often appear as a latency spike at the beginning of a test or during sudden traffic bursts.

Concurrency scaling

Serverless platforms scale by creating more function instances, but there are still limits. For example:

  • AWS Lambda has account-level and function-level concurrency limits
  • API Gateway may throttle requests
  • Azure Functions may scale more gradually depending on the hosting plan
  • Edge runtimes may have CPU or execution limits per request

A load test helps you identify when concurrency scaling stops being smooth and starts causing throttling or timeouts.
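To find that point, ramp in steps rather than all at once. The sketch below expresses a stepped schedule as a plain function; in self-hosted Locust it could back a custom `LoadTestShape.tick()`, while in LoadForge you would typically configure the ramp in the test settings instead. The stage values are illustrative assumptions:

```python
# Illustrative stepped ramp for probing concurrency limits.
# Each stage: (run until this many elapsed seconds, target users,
# users spawned per second). Values are assumptions — adjust them.
STAGES = [
    (120, 50, 5),     # gentle warm-up
    (240, 200, 10),   # steady growth
    (360, 500, 25),   # burst: watch for 429s and timeouts here
]

def stage_for(elapsed_seconds):
    """Return (users, spawn_rate) for the current stage, or None
    once all stages are done (None tells Locust to stop)."""
    for end, users, spawn_rate in STAGES:
        if elapsed_seconds < end:
            return (users, spawn_rate)
    return None
```

Stepped ramps make it much easier to pinpoint the concurrency band where throttling begins, because each plateau gives the platform time to settle before the next jump.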

Downstream services become the real bottleneck

In many serverless architectures, the function itself is not the slowest part. The real bottleneck is often:

  • A database query
  • A call to an external payment API
  • A cache miss
  • Object storage reads or writes
  • Queue backlog growth

This is why realistic load testing matters. If your function triggers multiple downstream calls, your test should reflect that usage pattern.

Event-driven and asynchronous processing

Many serverless apps accept a request quickly, enqueue work, and return a job ID. That means performance testing should measure both:

  • Front-door responsiveness
  • Completion time of the background workflow

Edge runtimes and geographic behavior

Edge functions run close to users, but latency still depends on origin calls, cache hit rates, and regional routing. Distributed testing from multiple geographies is particularly useful here, and LoadForge’s global test locations can help uncover region-specific issues.

Writing Your First Load Test

Let’s begin with a simple but realistic serverless API test. Imagine an e-commerce backend built on AWS Lambda + API Gateway. It exposes public endpoints for product browsing and search.

We’ll test:

  • GET /prod/health
  • GET /prod/products
  • GET /prod/products/{id}
  • POST /prod/search

This baseline script is useful for measuring response times, throughput, and error rates under normal traffic.

python
from locust import HttpUser, task, between
import random
 
PRODUCT_IDS = [
    "sku_1001", "sku_1002", "sku_1003", "sku_1004", "sku_1005"
]
 
SEARCH_TERMS = [
    "wireless headphones",
    "mechanical keyboard",
    "usb-c hub",
    "4k monitor",
    "gaming mouse"
]
 
class ServerlessStoreUser(HttpUser):
    wait_time = between(1, 3)
 
    def on_start(self):
        self.client.headers.update({
            "Accept": "application/json",
            "User-Agent": "LoadForge-Locust-Serverless-Test/1.0"
        })
 
    @task(2)
    def health_check(self):
        self.client.get("/prod/health", name="GET /health")
 
    @task(5)
    def list_products(self):
        self.client.get("/prod/products?category=electronics&limit=20", name="GET /products")
 
    @task(4)
    def product_detail(self):
        product_id = random.choice(PRODUCT_IDS)
        self.client.get(f"/prod/products/{product_id}", name="GET /products/:id")
 
    @task(3)
    def search_products(self):
        payload = {
            "query": random.choice(SEARCH_TERMS),
            "filters": {
                "inStock": True,
                "priceMin": 25,
                "priceMax": 500
            },
            "sort": "relevance",
            "limit": 10
        }
        self.client.post("/prod/search", json=payload, name="POST /search")

What this test does

This script simulates realistic browsing behavior against a serverless API. It mixes low-cost endpoints like health checks with more meaningful requests such as product lookups and search payloads.

Why this matters for serverless load testing

Even a basic test can reveal:

  • Cold start latency on infrequently called routes
  • API Gateway throttling
  • Search function performance under concurrent JSON payload processing
  • Slow product detail lookups caused by database reads

When you run this in LoadForge, start with a moderate ramp, such as 25 to 100 users, then increase gradually. Watch p95 and p99 latency, not just averages.
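The gap between averages and percentiles is easy to demonstrate with a simulated cold-start tail. A small nearest-rank percentile helper with illustrative numbers:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of response times (ms)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# 95 warm requests at 90 ms plus 5 cold starts at 1400 ms
latencies = [90] * 95 + [1400] * 5

average = sum(latencies) / len(latencies)  # 155.5 ms — looks acceptable
p95 = percentile(latencies, 95)            # 90 ms
p99 = percentile(latencies, 99)            # 1400 ms — the cold-start tail
```

Here 5% of users wait over a second, yet the average barely moves. This is exactly why serverless results should be read at p95 and p99.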

Advanced Load Testing Scenarios

Basic endpoint testing is a good start, but real serverless performance testing needs to reflect authentication, asynchronous workflows, and edge-specific patterns.

Scenario 1: Testing authenticated serverless APIs with JWT tokens

Many serverless applications protect endpoints using OAuth 2.0 or JWT authorizers. In this example, we authenticate against an identity endpoint, then call user-specific routes.

Imagine a Lambda-backed API with these routes:

  • POST /auth/token
  • GET /prod/account/profile
  • GET /prod/account/orders
  • POST /prod/cart/items
python
from locust import HttpUser, task, between
from locust.exception import StopUser
import random
 
class AuthenticatedServerlessUser(HttpUser):
    wait_time = between(1, 2)
 
    def on_start(self):
        token_response = self.client.post(
            "/auth/token",
            json={
                "client_id": "load-test-client",
                "client_secret": "super-secret-for-staging",
                "audience": "serverless-store-api",
                "grant_type": "client_credentials"
            },
            name="POST /auth/token"
        )
 
        if token_response.status_code == 200:
            access_token = token_response.json().get("access_token")
            self.client.headers.update({
                "Authorization": f"Bearer {access_token}",
                "Accept": "application/json",
                "Content-Type": "application/json"
            })
        else:
            # Stop this simulated user rather than flooding the
            # protected routes with unauthenticated requests
            raise StopUser()
 
    @task(3)
    def get_profile(self):
        self.client.get("/prod/account/profile", name="GET /account/profile")
 
    @task(4)
    def get_orders(self):
        self.client.get("/prod/account/orders?limit=10", name="GET /account/orders")
 
    @task(2)
    def add_to_cart(self):
        payload = {
            "productId": random.choice(["sku_1001", "sku_1002", "sku_1003"]),
            "quantity": random.randint(1, 3),
            "currency": "USD"
        }
        self.client.post("/prod/cart/items", json=payload, name="POST /cart/items")

What this scenario validates

This type of load test is useful for measuring:

  • Authentication overhead
  • JWT authorizer performance
  • Latency added by identity providers
  • Per-user API path behavior
  • How authenticated traffic differs from anonymous traffic

For serverless applications, authentication can become a hidden bottleneck. A Lambda authorizer or token validation step may add significant latency under load, especially if it depends on external key lookups or identity APIs.

Scenario 2: Testing asynchronous job processing in serverless workflows

A common serverless pattern is to accept a request, enqueue work, and return a job ID. The client then polls for completion. This is common in:

  • Report generation
  • Video or image processing
  • Document conversion
  • Data exports
  • AI inference workflows

Let’s test a report generation API:

  • POST /prod/reports
  • GET /prod/reports/{jobId}
python
from locust import HttpUser, task, between
import random
import time
 
class ReportGenerationUser(HttpUser):
    wait_time = between(2, 5)
 
    def on_start(self):
        self.client.headers.update({
            "x-api-key": "staging-serverless-api-key",
            "Accept": "application/json",
            "Content-Type": "application/json"
        })
 
    @task
    def generate_and_poll_report(self):
        payload = {
            "reportType": "sales_summary",
            "dateRange": {
                "from": "2026-03-01",
                "to": "2026-03-31"
            },
            "filters": {
                "region": random.choice(["us-east-1", "eu-west-1", "ap-southeast-1"]),
                "channel": random.choice(["web", "mobile", "partner"])
            },
            "format": random.choice(["pdf", "csv"])
        }
 
        with self.client.post("/prod/reports", json=payload, name="POST /reports", catch_response=True) as response:
            if response.status_code != 202:
                response.failure(f"Unexpected status code: {response.status_code}")
                return
 
            job_id = response.json().get("jobId")
            if not job_id:
                response.failure("Missing jobId in response")
                return
 
        for _ in range(5):
            time.sleep(2)
            with self.client.get(f"/prod/reports/{job_id}", name="GET /reports/:jobId", catch_response=True) as poll_response:
                if poll_response.status_code == 200:
                    body = poll_response.json()
                    status = body.get("status")
 
                    if status == "completed":
                        poll_response.success()
                        return
                    elif status in ["queued", "processing"]:
                        poll_response.success()
                    else:
                        poll_response.failure(f"Unexpected report status: {status}")
                        return
                else:
                    poll_response.failure(f"Unexpected polling status code: {poll_response.status_code}")
                    return

Why asynchronous testing matters

If you only test the initial POST request, you may completely miss the real bottleneck. The function that enqueues a job may be fast, while the queue consumer, background Lambda, or storage layer becomes overloaded.

This scenario helps you evaluate:

  • Queue depth pressure
  • Background worker scaling
  • Time to completion
  • Retry behavior
  • End-user experience for long-running processes

In LoadForge, this is especially useful with distributed testing because you can simulate realistic bursts of report requests from many concurrent users.
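To turn workflow completion into a measurable number, track the elapsed time from submission to the poll that reports completed. A minimal tracker (a hypothetical helper, not part of Locust):

```python
import time

class JobTimer:
    """Track end-to-end duration of async jobs across submit and poll."""

    def __init__(self):
        self.started = {}      # job_id -> submit timestamp
        self.durations = []    # completed end-to-end times (seconds)

    def submitted(self, job_id, now=None):
        self.started[job_id] = time.monotonic() if now is None else now

    def completed(self, job_id, now=None):
        start = self.started.pop(job_id, None)
        if start is None:
            return None  # unknown or already-recorded job
        end = time.monotonic() if now is None else now
        duration = end - start
        self.durations.append(duration)
        return duration
```

In the script above, you would call `timer.submitted(job_id)` after the 202 response and `timer.completed(job_id)` when polling returns a completed status; the collected durations can then be reported as a custom metric, for example via Locust's `events.request.fire`.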

Scenario 3: Load testing edge runtimes with personalization and origin fallback

Edge functions often handle request personalization, redirects, A/B testing, and lightweight API logic. They are fast when cache-friendly, but can slow down significantly when they call an origin service.

Imagine an edge application with these routes:

  • GET /edge/home
  • GET /edge/content/{slug}
  • POST /edge/track

The edge layer reads headers and cookies, applies personalization, and may fall back to an origin API.

python
from locust import HttpUser, task, between
import random
import uuid
 
SLUGS = [
    "spring-launch",
    "pricing",
    "docs-getting-started",
    "blog-serverless-performance",
    "enterprise-security"
]
 
class EdgeRuntimeUser(HttpUser):
    wait_time = between(1, 2)
 
    def on_start(self):
        session_id = str(uuid.uuid4())
        variant = random.choice(["A", "B"])
        self.client.headers.update({
            "Accept": "application/json",
            "X-Geo-Country": random.choice(["US", "DE", "JP", "AU"]),
            "X-Device-Type": random.choice(["desktop", "mobile"]),
            "Cookie": f"session_id={session_id}; ab_variant={variant}",
            "User-Agent": "LoadForge-Edge-Test/1.0"
        })
 
    @task(5)
    def homepage(self):
        self.client.get("/edge/home", name="GET /edge/home")
 
    @task(4)
    def content_page(self):
        slug = random.choice(SLUGS)
        self.client.get(f"/edge/content/{slug}", name="GET /edge/content/:slug")
 
    @task(2)
    def track_event(self):
        payload = {
            "event": "page_view",
            "path": random.choice(["/edge/home", "/edge/pricing", "/edge/docs"]),
            "timestamp": "2026-04-06T12:00:00Z",
            "metadata": {
                "campaign": random.choice(["spring-2026", "retargeting", "organic"]),
                "referrer": random.choice(["google", "linkedin", "newsletter"])
            }
        }
        self.client.post("/edge/track", json=payload, name="POST /edge/track")

What this reveals

This test is useful for edge and serverless performance testing because it surfaces:

  • Cache effectiveness versus origin fallback latency
  • Geographic routing differences
  • Header and cookie processing overhead
  • Personalization logic costs
  • Event ingestion performance

When combined with LoadForge’s global test locations, you can compare how edge behavior changes by region and identify where users experience higher latency.

Analyzing Your Results

After running your load test, the next step is understanding what the data means for a serverless architecture.

Focus on percentile latency

Average response time can hide serious issues. For serverless applications, p95 and p99 are often more important because cold starts and scaling delays disproportionately affect tail latency.

Look for:

  • Low average latency but high p95 or p99
  • Sudden latency jumps during ramp-up
  • Specific routes with unstable response times

This often indicates cold starts, concurrency exhaustion, or downstream bottlenecks.

Watch error rates by endpoint

Different error patterns usually point to different problems:

  • 429 Too Many Requests: API Gateway or function throttling
  • 502/503: upstream integration failures or platform instability
  • 504: function timeout or slow downstream service
  • 401/403: authentication or token handling issues
  • 500: application logic failures under concurrent load

LoadForge’s real-time reporting helps you spot these endpoint-level failures quickly as the test runs.

Compare throughput to concurrency

If users increase but requests per second flatten, your serverless application may be hitting a scaling limit. Investigate:

  • Reserved concurrency settings
  • Provider burst limits
  • Database connection exhaustion
  • Third-party API rate limits
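A quick sanity check is to compare throughput growth against user growth between two stages of the test. A ratio near 1.0 means roughly linear scaling; a ratio well below 1.0 suggests something is saturated (illustrative helper, reading values from your test report):

```python
def scaling_efficiency(before, after):
    """Ratio of throughput growth to load growth between two stages.

    before and after are (users, requests_per_second) pairs.
    ~1.0 => scaling roughly linearly; well below 1.0 => hitting a limit.
    """
    (users_1, rps_1), (users_2, rps_2) = before, after
    load_growth = users_2 / users_1
    throughput_growth = rps_2 / rps_1
    return throughput_growth / load_growth

# Doubling users doubled RPS: healthy
healthy = scaling_efficiency((100, 500), (200, 1000))  # 1.0
# Doubling users added only 20% more RPS: something is saturated
limited = scaling_efficiency((100, 500), (200, 600))   # 0.6
```

When the ratio drops sharply between two ramp stages, the concurrency band between those stages is where to focus your investigation.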

Identify cold start signatures

Cold starts often appear as:

  • Very high latency for the first requests
  • Spikes after idle periods
  • More severe delays on less frequently used routes

If cold starts are a major issue, consider warming strategies, runtime optimization, or reducing package size.

Correlate with cloud metrics

Your load testing results are much more useful when paired with platform telemetry such as:

  • AWS CloudWatch Lambda duration, concurrent executions, throttles, init duration
  • Azure Monitor function execution count and duration
  • Google Cloud Monitoring request count and latency
  • Edge provider analytics for cache hit ratio and origin fetch latency

LoadForge gives you the client-side performance testing view, while cloud metrics reveal what happened inside the platform.

Performance Optimization Tips

Once your load test exposes bottlenecks, these are common ways to improve serverless performance.

Reduce cold start time

  • Minimize deployment package size
  • Remove unused dependencies
  • Use lighter frameworks where possible
  • Avoid expensive initialization in global scope
  • Consider provisioned concurrency for critical paths

Optimize memory and CPU allocation

In many serverless platforms, more memory also means more CPU. If functions are CPU-bound, increasing memory can reduce latency significantly.

Cache aggressively

For serverless and edge workloads, caching can dramatically improve performance:

  • Cache API responses where appropriate
  • Use CDN and edge caching
  • Store frequently accessed data in Redis or managed cache
  • Cache auth metadata or JWKS keys carefully

Protect downstream systems

Your function may scale faster than your database can handle. To avoid this:

  • Use connection pooling or serverless-friendly data access patterns
  • Add queues for burst absorption
  • Batch writes where possible
  • Use read replicas or caching layers for hot reads

Tune timeouts and retries

Retries can amplify load during incidents. Make sure your:

  • Function timeouts are realistic
  • Client retry policies are controlled
  • Queue retry behavior does not create cascading failures
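The amplification is multiplicative: each layer's retries multiply with the next. A quick way to estimate the worst case:

```python
def worst_case_attempts(*retry_limits):
    """Worst-case attempts against the backend when every layer
    retries independently on failure. Each argument is the maximum
    retry count at one layer (client, SDK, queue consumer, ...)."""
    total = 1
    for retries in retry_limits:
        total *= 1 + retries
    return total

# A client that retries twice in front of a queue consumer that
# retries three times can hit the backend up to 12 times:
attempts = worst_case_attempts(2, 3)  # (1 + 2) * (1 + 3) = 12
```

Running this arithmetic for your own stack often reveals that a single failing downstream call can be multiplied into dozens of attempts during an incident.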

Test with realistic traffic patterns

Steady-state load testing is useful, but serverless systems also need burst testing and stress testing. Real traffic often arrives in spikes, especially for APIs, webhooks, and event-driven workloads.

Common Pitfalls to Avoid

Testing only warm functions

If you run repeated tests back-to-back, you may underestimate latency because your functions stay warm. Include tests that simulate idle periods or sudden bursts.

Ignoring authentication overhead

Many teams load test only public endpoints, then discover later that authenticated routes are much slower due to token validation or authorizer logic.

Using unrealistic payloads

Tiny JSON payloads may not reflect production behavior. Use realistic request bodies, query parameters, and headers.

Forgetting downstream dependencies

A Lambda function that performs well in isolation may still fail when its database, queue, or third-party API is under pressure. Good performance testing should reflect the full request path.

Overlooking provider limits

Serverless platforms are elastic, but not limitless. Always account for:

  • Concurrency quotas
  • API rate limits
  • Execution duration limits
  • Payload size restrictions

Running tests from a single region

This is especially problematic for edge runtimes and globally distributed APIs. Use distributed testing to understand how users experience your application from different locations.

Measuring only request success

For asynchronous serverless workflows, a 202 Accepted response does not mean the user’s task completed successfully. Measure end-to-end workflow behavior whenever possible.

Conclusion

Load testing serverless applications is essential if you want to validate scaling behavior, catch cold start issues, and understand how your APIs, functions, and edge runtimes perform under real traffic. Whether you are testing AWS Lambda, Azure Functions, Google Cloud Functions, or edge platforms, the key is to use realistic scenarios that include authentication, asynchronous workflows, and downstream dependencies.

With LoadForge, you can run cloud-based, distributed load testing using Locust scripts like the ones in this guide, monitor real-time reporting, test from global locations, and integrate performance testing into your CI/CD process. If you want to confidently load test and stress test your serverless architecture before production traffic does it for you, try LoadForge.
