
Introduction
HTTP/2 changed how modern web applications and APIs handle traffic. With multiplexing, header compression, and connection reuse, HTTP/2 can significantly improve latency and throughput compared to HTTP/1.1. But those benefits only show up when your application, reverse proxy, CDN, and backend services are configured correctly under real-world load.
That’s why load testing HTTP/2 applications matters. It’s not enough to confirm that an endpoint returns a 200 OK in development. You need to understand how your HTTP/2 stack behaves when hundreds or thousands of users share persistent connections, send concurrent requests, and stress your API gateway, TLS termination layer, and upstream services.
In this guide, you’ll learn how to load test HTTP/2 applications with LoadForge using Locust-based Python scripts. We’ll cover realistic HTTP/2 scenarios including authenticated API traffic, multiplexed requests, large JSON payloads, and mixed read/write workloads. You’ll also learn how to analyze performance testing results and identify bottlenecks related to connection efficiency, latency, and server-side concurrency.
LoadForge makes this easier with cloud-based infrastructure, distributed testing, real-time reporting, CI/CD integration, and global test locations—so you can validate HTTP/2 performance from multiple regions and at meaningful scale.
Prerequisites
Before you start load testing an HTTP/2 application with LoadForge, make sure you have the following:
- A target application or API that supports HTTP/2
- A staging or pre-production environment that mirrors production as closely as possible
- Test credentials for authenticated endpoints
- Knowledge of your key user flows and API paths
- Expected performance goals, such as:
  - p95 latency under 300 ms
  - error rate below 1%
  - stable throughput at 2,000 requests per second
  - efficient connection reuse under sustained load
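Goals like these are easiest to enforce when they are written down as code. As a sketch (the threshold values and the shape of the `results` dict mirror the example targets above; this is not a LoadForge API), a CI gate might look like:

```python
# Hypothetical pass/fail gate for the example goals above. The results dict
# shape is an assumption — adapt it to however you export your test metrics.
def meets_goals(results: dict) -> bool:
    """Return True when a test run satisfies the example targets."""
    return (
        results["p95_ms"] < 300           # p95 latency under 300 ms
        and results["error_rate"] < 0.01  # error rate below 1%
        and results["rps"] >= 2000        # sustained throughput target
    )


run = {"p95_ms": 240, "error_rate": 0.004, "rps": 2150}
print(meets_goals(run))  # True for this sample run
```

Failing the build when a run misses its targets turns these goals from documentation into a regression guard.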
You should also confirm that your HTTP/2 support is actually enabled end-to-end. In many environments, HTTP/2 is terminated at a load balancer or reverse proxy such as NGINX, Envoy, AWS ALB, or Cloudflare, while backend services still communicate over HTTP/1.1. That’s not necessarily a problem, but it does affect what exactly you are testing.
Useful things to verify before running a performance test:
```bash
curl -I --http2 https://api.example.com/v1/health
```

You may also want to inspect TLS and ALPN negotiation using tools like openssl, browser developer tools, or your ingress controller logs.
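To script the same verification, Python's standard ssl module can report the negotiated ALPN protocol directly. A minimal sketch, reusing the placeholder hostname from above:

```python
# ALPN check using only the standard library. "api.example.com" is the
# placeholder hostname from the curl example — substitute your own host.
import socket
import ssl


def negotiated_protocol(host, port=443):
    """Return the ALPN protocol the server selects ("h2", "http/1.1", or None)."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

A result of "h2" confirms HTTP/2 is negotiated at the edge you are testing; as noted above, backends behind the proxy may still speak HTTP/1.1.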
For LoadForge specifically, you’ll create a Locust test script and upload it as your test scenario. LoadForge handles the distributed execution and reporting, so you can focus on modeling realistic HTTP/2 traffic patterns.
Understanding HTTP/2 Under Load
HTTP/2 behaves differently from HTTP/1.1, especially under concurrency. When you load test HTTP/2 applications, you are not just measuring raw request speed—you are evaluating how effectively the stack handles multiplexed streams, connection persistence, and protocol-level efficiency.
Key HTTP/2 behaviors that affect load testing
Multiplexing
HTTP/2 allows multiple requests and responses to share a single TCP connection. This reduces head-of-line blocking at the application layer and can improve performance for APIs and web apps that make many parallel requests.
Under load, you want to observe:
- Whether latency remains stable as concurrent streams increase
- Whether the server enforces stream limits too aggressively
- Whether upstream services become the real bottleneck even if the frontend connection looks efficient
Header compression
HTTP/2 uses HPACK to compress headers. This can reduce overhead, especially for APIs with repeated authorization or tracing headers.
However, large or highly dynamic headers can still create pressure on proxies and gateways. Watch for:
- Increased CPU usage at the edge
- Latency spikes on authenticated endpoints
- Issues with large JWT tokens or tracing metadata
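To build intuition for why token size matters, it helps to measure it. The snippet below uses a synthetic token (real JWTs vary in size and structure) to show how many header bytes each request carries before compression:

```python
# Synthetic illustration of header weight: the fake token below stands in
# for a large JWT — it is not a real or valid token.
fake_jwt = "eyJhbGciOiJSUzI1NiJ9." + "a" * 900 + "." + "b" * 340

headers = {
    "authorization": f"Bearer {fake_jwt}",   # dominates total size
    "accept": "application/json",
    "x-request-id": "7f3c9a1e-demo",         # unique per request
}

# Approximate uncompressed wire size: "name: value\r\n" per header field.
raw_bytes = sum(len(k) + len(v) + 4 for k, v in headers.items())
print(f"~{raw_bytes} bytes of headers per request before compression")
```

HPACK can replace a repeated, unchanged Authorization header with a short dynamic-table index on later requests over the same connection, but values that change on every request, such as trace IDs, are transmitted each time.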
Connection reuse
HTTP/2 is designed to reduce the need for many parallel TCP connections. In theory, fewer connections can support more work. In practice, connection reuse depends on:
- Client behavior
- Load balancer configuration
- Idle timeout settings
- TLS termination performance
- Proxy buffering and stream settings
TLS overhead
Most HTTP/2 traffic runs over TLS. That means your performance testing should account for certificate handling, handshake behavior, and connection lifetime. A poorly tuned TLS layer can erase many HTTP/2 benefits.
Common bottlenecks in HTTP/2 systems
When load testing HTTP/2 APIs, the bottleneck is often not the protocol itself. Common problem areas include:
- API gateways with low concurrent stream settings
- Reverse proxies with insufficient worker processes
- CPU saturation during TLS termination
- Backend services unable to keep up with frontend concurrency
- Database contention behind highly efficient API layers
- Misconfigured keepalive or timeout values causing unnecessary reconnects
A good HTTP/2 load test should therefore simulate realistic user behavior rather than just hammering one endpoint in isolation.
Writing Your First Load Test
Let’s start with a basic Locust script for an HTTP/2-enabled REST API. This example targets a fictional SaaS platform with common endpoints such as health checks, product listings, and account summaries.
Even though your application uses HTTP/2, the focus in Locust remains modeling realistic HTTP traffic patterns. LoadForge then helps you scale that script across distributed workers to observe behavior under meaningful load.
Basic HTTP/2 API load test
```python
from locust import HttpUser, task, between


class Http2ApiUser(HttpUser):
    wait_time = between(1, 3)
    host = "https://api.shopstream.example"

    default_headers = {
        "Accept": "application/json",
        "User-Agent": "LoadForge-Locust-HTTP2-Test/1.0"
    }

    @task(3)
    def health_check(self):
        self.client.get(
            "/v1/health",
            headers=self.default_headers,
            name="GET /v1/health"
        )

    @task(5)
    def list_products(self):
        self.client.get(
            "/v1/products?category=electronics&limit=24&sort=popular",
            headers=self.default_headers,
            name="GET /v1/products"
        )

    @task(2)
    def account_summary(self):
        self.client.get(
            "/v1/account/summary",
            headers=self.default_headers,
            name="GET /v1/account/summary"
        )
```

What this script does
This first script models a lightweight read-heavy workload:
- GET /v1/health checks service responsiveness
- GET /v1/products simulates a catalog browsing request
- GET /v1/account/summary represents a personalized API call
The @task weights define relative frequency. Product listing runs more often than account summary or health checks, which is more realistic for many public-facing APIs.
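Locust schedules tasks in proportion to their weights, so the mix implied by the script above is easy to check with plain Python:

```python
# Expected request mix implied by the @task weights in the script above.
weights = {"products": 5, "health": 3, "account": 2}
total = sum(weights.values())

mix = {name: w / total for name, w in weights.items()}
print(mix)  # {'products': 0.5, 'health': 0.3, 'account': 0.2}
```

If your production access logs show a different ratio, adjust the weights until the simulated mix matches reality.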
Why this matters for HTTP/2
With HTTP/2, these requests are typically multiplexed over far fewer persistent connections than they would use under HTTP/1.1. During load testing, pay attention to:
- Whether response times stay consistent as user count rises
- Whether the product listing endpoint causes backend saturation
- Whether personalized endpoints perform worse due to auth, caching, or database lookups
This script is a good baseline for initial performance testing in LoadForge before moving to more realistic authenticated and write-heavy scenarios.
Advanced Load Testing Scenarios
Basic tests are useful, but most HTTP/2 applications need deeper coverage. Below are more realistic scenarios that reflect how modern APIs behave in production.
Scenario 1: Authenticated HTTP/2 API traffic with bearer tokens
Many HTTP/2 applications are secured behind OAuth 2.0 or JWT-based authentication. This example logs in once per user session, stores an access token, and exercises authenticated endpoints.
```python
from locust import HttpUser, task, between
import json


class AuthenticatedHttp2User(HttpUser):
    wait_time = between(1, 2)
    host = "https://api.shopstream.example"

    def on_start(self):
        # Start with no token so a failed login doesn't leave the
        # attribute undefined for later tasks.
        self.access_token = None
        login_payload = {
            "email": "loadtest.user@example.com",
            "password": "SuperSecurePass123!",
            "device_id": "lf-http2-user-001"
        }
        headers = {
            "Content-Type": "application/json",
            "Accept": "application/json",
            "User-Agent": "LoadForge-Locust-HTTP2-Test/1.0"
        }
        with self.client.post(
            "/v1/auth/login",
            data=json.dumps(login_payload),
            headers=headers,
            name="POST /v1/auth/login",
            catch_response=True
        ) as response:
            if response.status_code == 200:
                body = response.json()
                self.access_token = body.get("access_token")
                if not self.access_token:
                    response.failure("No access_token returned")
            else:
                response.failure(f"Login failed: {response.status_code}")

    def auth_headers(self):
        return {
            "Accept": "application/json",
            "Authorization": f"Bearer {self.access_token}",
            "User-Agent": "LoadForge-Locust-HTTP2-Test/1.0"
        }

    @task(4)
    def get_profile(self):
        self.client.get(
            "/v1/users/me",
            headers=self.auth_headers(),
            name="GET /v1/users/me"
        )

    @task(3)
    def get_orders(self):
        self.client.get(
            "/v1/orders?status=processing&limit=10",
            headers=self.auth_headers(),
            name="GET /v1/orders"
        )

    @task(2)
    def get_notifications(self):
        self.client.get(
            "/v1/notifications?unread=true",
            headers=self.auth_headers(),
            name="GET /v1/notifications"
        )
```

What to watch in this scenario
This test is useful for measuring:
- Authentication overhead under concurrent sessions
- Header compression effectiveness with repeated Authorization headers
- Latency differences between cached and uncached authenticated endpoints
- Whether token validation or identity middleware becomes a bottleneck
If your API gateway performs JWT verification on every request, HTTP/2 can reduce connection overhead, but CPU usage may still spike due to auth processing.
Scenario 2: Mixed read/write workload with realistic JSON payloads
Read-only testing rarely tells the full story. Many HTTP/2 APIs support writes, updates, and multi-step workflows. This script simulates cart activity and checkout operations in an e-commerce API.
```python
from locust import HttpUser, task, between
import json
import random
import uuid


class EcommerceHttp2User(HttpUser):
    wait_time = between(1, 4)
    host = "https://api.shopstream.example"

    def on_start(self):
        self.access_token = None
        login_payload = {
            "email": "checkout.tester@example.com",
            "password": "CheckoutPass456!"
        }
        headers = {
            "Content-Type": "application/json",
            "Accept": "application/json"
        }
        response = self.client.post(
            "/v1/auth/login",
            data=json.dumps(login_payload),
            headers=headers,
            name="POST /v1/auth/login"
        )
        if response.status_code == 200:
            self.access_token = response.json().get("access_token")

    def headers(self):
        return {
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {self.access_token}"
        }

    @task(5)
    def browse_product(self):
        product_id = random.choice([1012, 1018, 1044, 1099, 1107])
        self.client.get(
            f"/v1/products/{product_id}",
            headers=self.headers(),
            name="GET /v1/products/:id"
        )

    @task(3)
    def add_to_cart(self):
        payload = {
            "product_id": random.choice([1012, 1018, 1044, 1099, 1107]),
            "quantity": random.randint(1, 3),
            "currency": "USD"
        }
        self.client.post(
            "/v1/cart/items",
            data=json.dumps(payload),
            headers=self.headers(),
            name="POST /v1/cart/items"
        )

    @task(1)
    def checkout(self):
        payload = {
            "cart_id": str(uuid.uuid4()),
            "payment_method": {
                "type": "card",
                "token": "tok_visa_4242_http2_test"
            },
            "shipping_address": {
                "first_name": "Load",
                "last_name": "Tester",
                "line1": "123 Performance Ave",
                "city": "Austin",
                "state": "TX",
                "postal_code": "78701",
                "country": "US"
            }
        }
        with self.client.post(
            "/v1/checkout",
            data=json.dumps(payload),
            headers=self.headers(),
            name="POST /v1/checkout",
            catch_response=True
        ) as response:
            if response.status_code not in [200, 201, 202]:
                response.failure(f"Unexpected checkout status: {response.status_code}")
```

Why this scenario is important
This script exercises:
- Read-heavy traffic on product endpoints
- Write contention on cart operations
- More expensive transaction processing during checkout
For HTTP/2 load testing, this is where you can start seeing whether efficient connection handling exposes backend bottlenecks faster. If HTTP/2 reduces network overhead, your application may drive more traffic into databases, caches, and payment orchestration layers than before.
Scenario 3: Parallel resource loading and dashboard APIs
HTTP/2 shines when clients request multiple resources concurrently. A common example is a dashboard or SPA making several API requests at once after login. This Locust example simulates a user loading a dashboard by issuing a burst of dependent API calls in a single task.
```python
from locust import HttpUser, task, between
import json


class DashboardHttp2User(HttpUser):
    wait_time = between(2, 5)
    host = "https://api.analytics.example"

    def on_start(self):
        self.token = None
        payload = {
            "client_id": "dashboard-web",
            "client_secret": "lf-secret-demo",
            "audience": "https://api.analytics.example",
            "grant_type": "client_credentials"
        }
        headers = {
            "Content-Type": "application/json",
            "Accept": "application/json"
        }
        response = self.client.post(
            "/oauth/token",
            data=json.dumps(payload),
            headers=headers,
            name="POST /oauth/token"
        )
        # Only parse the body on success; a failed token request may
        # not return JSON at all.
        if response.status_code == 200:
            self.token = response.json().get("access_token")

    def auth_headers(self):
        return {
            "Accept": "application/json",
            "Authorization": f"Bearer {self.token}"
        }

    @task
    def load_dashboard(self):
        self.client.get(
            "/v2/dashboard/summary",
            headers=self.auth_headers(),
            name="GET /v2/dashboard/summary"
        )
        self.client.get(
            "/v2/dashboard/traffic?range=24h",
            headers=self.auth_headers(),
            name="GET /v2/dashboard/traffic"
        )
        self.client.get(
            "/v2/dashboard/errors?range=24h&limit=50",
            headers=self.auth_headers(),
            name="GET /v2/dashboard/errors"
        )
        self.client.get(
            "/v2/dashboard/top-services?range=24h",
            headers=self.auth_headers(),
            name="GET /v2/dashboard/top-services"
        )
        self.client.get(
            "/v2/dashboard/alerts?status=open",
            headers=self.auth_headers(),
            name="GET /v2/dashboard/alerts"
        )
```

How to use this scenario
This is a strong candidate for stress testing because dashboard-style traffic often creates bursts of parallel requests. With LoadForge, you can scale this pattern across many workers and regions to see:
- Whether API gateway stream limits cause queuing
- Whether dashboard endpoints compete for the same database or cache resources
- Whether p95 and p99 latency rise sharply during bursty traffic
- Whether HTTP/2 connection reuse improves efficiency compared to older HTTP/1.1 behavior
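One way to drive bursty traffic is a staged ramp. The helper below is a sketch of the stage lookup that a Locust LoadTestShape.tick() could use; the stage boundaries and user counts are example values, not recommendations:

```python
# Staged ramp sketch for bursty dashboard traffic. In Locust, a
# LoadTestShape.tick() would return the same (users, spawn_rate) tuple.
STAGES = [
    # (runs until second, target users, spawn rate per second)
    (120, 50, 5),     # warm-up
    (420, 300, 10),   # sustained load
    (540, 800, 50),   # burst
    (660, 100, 10),   # recovery
]


def tick(run_time: float):
    """Return (users, spawn_rate) for the elapsed time, or None to stop."""
    for end, users, rate in STAGES:
        if run_time < end:
            return users, rate
    return None


print(tick(60))   # (50, 5) during warm-up
print(tick(500))  # (800, 50) during the burst
```

The burst stage is where stream limits, gateway queuing, and tail latency problems usually surface, so compare p95/p99 between the sustained and burst windows.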
Analyzing Your Results
Once your test runs in LoadForge, the next step is interpreting the results correctly. For HTTP/2 applications, average response time alone is not enough.
Metrics that matter most
Response time percentiles
Look closely at:
- p50 for typical user experience
- p95 for degraded but still common behavior
- p99 for tail latency and burst sensitivity
HTTP/2 often improves averages, but tail latency can still be poor if backend services are overloaded.
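A small synthetic example shows how a healthy-looking average can coexist with a painful tail. The timings below are made up for illustration; real numbers come from your LoadForge report:

```python
# Why averages hide tail latency: p50 vs p95/p99 on a skewed sample.
from statistics import mean, quantiles

# 90 fast responses, 8 degraded, 2 very slow — a common production shape.
latencies_ms = [80] * 90 + [400] * 8 + [2500] * 2

cuts = quantiles(latencies_ms, n=100)      # 99 percentile cut points
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"mean={mean(latencies_ms):.0f}ms p50={p50:.0f}ms "
      f"p95={p95:.0f}ms p99={p99:.0f}ms")
```

Here the mean (154 ms) looks acceptable while p99 sits at 2500 ms, which is exactly the kind of gap that percentile charts expose and averages conceal.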
Requests per second
A higher request rate with stable latency is usually a good sign. But if throughput rises while errors also rise, you may simply be overwhelming the application more efficiently.
Error rate
Watch for:
- 429 Too Many Requests
- 502 Bad Gateway
- 503 Service Unavailable
- 504 Gateway Timeout
In HTTP/2 environments, these often point to overloaded proxies, gateways, or upstream pools rather than protocol failure itself.
Response distribution by endpoint
Break down results per route:
- /v1/auth/login
- /v1/products
- /v1/cart/items
- /v1/checkout
This helps you identify whether slow performance is isolated to expensive operations or systemic across the API.
Connection and infrastructure signals
LoadForge gives you real-time reporting, but you should also correlate with infrastructure metrics from your app stack:
- CPU and memory on load balancers and API gateways
- TLS handshake rate
- Active connections and stream counts
- Backend service latency
- Database query duration and lock contention
- Cache hit ratio
What healthy HTTP/2 behavior looks like
A well-performing HTTP/2 application under load often shows:
- Stable latency as concurrency increases gradually
- Lower connection churn than equivalent HTTP/1.1 tests
- Efficient handling of repeated authenticated requests
- Better throughput for multi-request workflows like dashboards or SPAs
What unhealthy HTTP/2 behavior looks like
Potential warning signs include:
- Rising p95/p99 latency even when average latency looks fine
- Sharp error spikes at moderate concurrency
- Login endpoints becoming slow due to auth provider bottlenecks
- Gateway or ingress CPU saturation before app servers are busy
- Write endpoints degrading much faster than read endpoints
LoadForge’s distributed testing is especially useful here. If performance differs by region, the issue may involve CDN routing, TLS termination geography, or cross-region backend calls.
Performance Optimization Tips
When your HTTP/2 load testing reveals problems, these are some of the most common fixes to investigate.
Tune your reverse proxy or gateway
Check settings for:
- maximum concurrent streams
- keepalive timeouts
- worker processes
- upstream connection pools
- header size limits
A default gateway configuration may not be suitable for high-volume HTTP/2 traffic.
Reduce auth overhead
If JWT verification or session lookup is expensive:
- cache token validation results where appropriate
- reduce unnecessary claims in tokens
- avoid oversized headers
- optimize middleware chains
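As an illustration of the first point, a minimal TTL cache in front of token verification might look like this. verify_jwt() is a placeholder for your real verifier, and the TTL must stay shorter than your tolerance for serving revoked tokens:

```python
# TTL cache sketch for token-validation results. verify_jwt() stands in
# for an expensive signature check (auth library, gateway plugin, etc.).
import time

_cache: dict = {}          # token -> (validated_at, result)
TTL_SECONDS = 60.0


def verify_jwt(token: str) -> bool:
    """Placeholder for an expensive cryptographic verification."""
    return token.startswith("eyJ")


def is_token_valid(token: str) -> bool:
    now = time.monotonic()
    hit = _cache.get(token)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]                  # cache hit: no re-verification
    result = verify_jwt(token)
    _cache[token] = (now, result)
    return result
```

Under load, repeated requests with the same bearer token then pay the verification cost once per TTL window instead of once per request.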
Optimize backend dependencies
HTTP/2 can make your frontend more efficient, which may expose slow databases or caches faster. Review:
- slow SQL queries
- N+1 query patterns
- cache miss rates
- lock contention
- downstream API latency
Separate heavy endpoints
Endpoints like checkout, report generation, or analytics aggregation often need different scaling strategies than lightweight reads.
Consider:
- queueing expensive work
- adding endpoint-specific rate limits
- precomputing common dashboard queries
- isolating heavy services from general API traffic
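On the client side of a rate limit, well-behaved consumers back off instead of retrying immediately. A generic sketch of that pattern (the helper name is hypothetical, and this is not a LoadForge feature):

```python
# Backoff sketch for 429 responses: honors a numeric Retry-After header
# when the server sends one, otherwise uses capped exponential delays
# with jitter so retries don't arrive in synchronized waves.
import random


def backoff_delay(attempt, retry_after=None):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after and str(retry_after).isdigit():
        return float(retry_after)          # server-specified delay wins
    base = min(2 ** attempt, 30)           # cap exponential growth
    return base + random.uniform(0, 0.5)   # small jitter

print(backoff_delay(0, retry_after="5"))  # 5.0
```

Modeling this in your load test scripts keeps 429 responses from cascading into artificial retry storms that distort your results.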
Validate TLS performance
Since HTTP/2 usually runs over TLS, optimize:
- certificate chain configuration
- cipher selection
- session resumption
- load balancer TLS offload capacity
Test from multiple locations
Use LoadForge’s global test locations to identify whether latency or throughput issues are region-specific. This is especially important for APIs behind CDNs, global load balancers, or regionally distributed backends.
Common Pitfalls to Avoid
Load testing HTTP/2 applications has a few traps that can lead to misleading results.
Assuming HTTP/2 automatically means better performance
HTTP/2 improves transport efficiency, but it does not fix slow code, overloaded databases, or poor caching. Always interpret protocol gains in the context of the whole stack.
Testing only one endpoint
A single fast endpoint tells you very little about real-world performance. Use mixed workloads that reflect production traffic patterns.
Ignoring authentication and headers
Auth flows, bearer tokens, cookies, and tracing headers can significantly affect HTTP/2 performance. Include them in your tests when they exist in production.
Using unrealistic user behavior
If your real users browse products, update carts, and load dashboards, your load test should too. Synthetic tests that only hit /health or a cached endpoint will not reveal meaningful bottlenecks.
Not correlating with server metrics
Locust and LoadForge show request-level performance, but you also need logs and infrastructure telemetry to understand why latency or errors increased.
Running tests against production without safeguards
Stress testing live systems can impact real users. Prefer staging environments, controlled traffic windows, and clear rollback plans.
Forgetting regional differences
HTTP/2 performance can vary by geography due to TLS termination points, CDN routing, and backend placement. Distributed testing helps uncover this.
Overlooking CI/CD performance regression testing
HTTP/2 regressions often appear after changes to ingress config, auth middleware, or API gateway rules. Add LoadForge to your CI/CD pipeline so performance testing becomes part of regular delivery.
Conclusion
HTTP/2 can deliver major gains in latency, multiplexing efficiency, and connection reuse—but only when your full application stack is ready for real traffic. By load testing realistic authenticated flows, mixed read/write APIs, and bursty multi-request dashboards, you can uncover the bottlenecks that matter before they affect users.
With LoadForge, you can run cloud-based HTTP/2 load testing at scale using Locust scripts, analyze results in real time, test from global locations, and integrate performance testing into your CI/CD workflow. If you want to validate how your HTTP/2 APIs behave under real-world load, now is the perfect time to try LoadForge.
LoadForge Team
LoadForge is a load and performance testing platform built on Locust. Our team has been shipping load tests against production systems since 2018, and we write these guides from real customer engagements.