
Introduction
FastAPI is one of the most popular Python web frameworks for building high-performance APIs. Its asynchronous request handling, automatic OpenAPI generation, and strong typing make it especially attractive for teams building modern backend services. But even though FastAPI is designed for speed, that does not mean your application will automatically perform well under real-world traffic.
A FastAPI app can still suffer from slow database queries, blocking I/O, inefficient dependency injection, overloaded authentication flows, or poor horizontal scaling behavior. That is why load testing FastAPI applications is essential before releasing new features, onboarding large customers, or preparing for traffic spikes.
In this FastAPI load testing guide, you will learn how to use LoadForge to run realistic load testing, performance testing, and stress testing scenarios against FastAPI services. We will cover basic endpoint validation, authenticated API traffic, mixed user journeys, and heavier workflows such as file uploads and report generation. Since LoadForge uses Locust under the hood, every example in this guide uses practical Python-based Locust scripts you can run and extend easily.
With LoadForge, you can execute distributed testing from global test locations, monitor real-time reporting, and integrate performance tests into your CI/CD pipeline so FastAPI regressions are caught early.
Prerequisites
Before you start load testing FastAPI with LoadForge, make sure you have the following:
- A running FastAPI application in a test or staging environment
- Base URL for your API, such as https://api-staging.example.com
- Test accounts or a way to generate authentication tokens
- Sample data seeded into your database
- A list of important endpoints to validate
- Expected performance targets, such as:
- 95th percentile latency under 300 ms
- Error rate below 1%
- Stable throughput at 500 requests per second
You should also know a few implementation details about your FastAPI app:
- Whether endpoints are async or sync
- Which routes require OAuth2 or JWT authentication
- Whether background tasks or Celery workers are involved
- Which endpoints are database-heavy
- Whether file upload, report export, or search endpoints are critical
For realistic results, avoid testing against local development servers like uvicorn --reload. Instead, use an environment that resembles production, including your ASGI server setup, reverse proxy, database, cache, and authentication service.
Understanding FastAPI Under Load
FastAPI is built on Starlette and commonly served with Uvicorn or Gunicorn/Uvicorn workers. It performs very well for concurrent I/O-bound workloads, especially when endpoints are implemented with proper async patterns. However, load testing FastAPI is not just about measuring raw framework speed. It is about understanding how your full application stack behaves under concurrency.
How FastAPI handles concurrent requests
FastAPI can process many concurrent requests efficiently when:
- Endpoints are truly asynchronous
- Database drivers support async I/O
- External API calls are non-blocking
- Long-running tasks are offloaded properly
This makes FastAPI a strong choice for APIs with many simultaneous users. But if your code includes blocking database calls, synchronous HTTP requests, CPU-heavy serialization, or expensive authentication checks, performance can degrade quickly.
Common FastAPI bottlenecks
When running load testing or stress testing on FastAPI, these are the most common issues teams discover:
Blocking calls inside async endpoints
An endpoint defined with async def can still block the event loop if it uses synchronous libraries. For example:
- requests instead of httpx.AsyncClient
- synchronous SQLAlchemy sessions
- slow filesystem access
- CPU-heavy JSON processing
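A quick way to see why this matters is to simulate both patterns with nothing but the standard library. In the sketch below, time.sleep stands in for a blocking call (like a synchronous requests call) and asyncio.sleep stands in for a properly awaited one (like httpx.AsyncClient) — the timings are illustrative, not from any real app:

```python
import asyncio
import time

async def blocking_handler():
    # Stand-in for a sync library call (e.g. requests) inside async def:
    # it blocks the event loop, so concurrent requests run one at a time.
    time.sleep(0.2)

async def async_handler():
    # Stand-in for a properly awaited call (e.g. httpx.AsyncClient):
    # the event loop is free to serve other requests while waiting.
    await asyncio.sleep(0.2)

async def total_time(handler):
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(5)))
    return time.perf_counter() - start

blocking_total = asyncio.run(total_time(blocking_handler))
async_total = asyncio.run(total_time(async_handler))
print(f"blocking: {blocking_total:.2f}s, non-blocking: {async_total:.2f}s")
```

Five concurrent 200 ms waits finish in roughly 200 ms when awaited, but take around a full second when they block the loop — the same multiplier shows up as tail latency once real traffic arrives.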
Database connection exhaustion
FastAPI itself may be fast, but your database pool may not be. Under load, symptoms include:
- rising response times
- timeout errors
- 500 responses from overloaded connection pools
- increased lock contention
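The queueing effect behind these symptoms can be modeled with an asyncio semaphore standing in for the connection pool. The pool size and query time below are arbitrary illustration values, not tuning advice:

```python
import asyncio
import time

POOL_SIZE = 5           # hypothetical max connections, like a small pool_size
QUERY_TIME = 0.05       # each "query" holds a connection for 50 ms
CONCURRENT_REQUESTS = 20

async def handle_request(pool):
    start = time.perf_counter()
    async with pool:                      # wait for a free connection
        await asyncio.sleep(QUERY_TIME)   # hold it for the query duration
    return time.perf_counter() - start    # latency includes time spent queueing

async def run():
    pool = asyncio.Semaphore(POOL_SIZE)
    return await asyncio.gather(
        *(handle_request(pool) for _ in range(CONCURRENT_REQUESTS))
    )

latencies = asyncio.run(run())
print(f"fastest: {min(latencies)*1000:.0f} ms, slowest: {max(latencies)*1000:.0f} ms")
```

With 20 concurrent requests and 5 connections, the first wave finishes in about 50 ms while the last wave waits through several rounds — latency multiplies even though each individual query is fast. That is exactly the "rising response times" signature you will see in a load test when the pool is undersized.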
Authentication overhead
JWT validation, token refresh, and permission lookups can become expensive if every request triggers database access or external identity service calls.
Serialization and validation costs
FastAPI relies heavily on Pydantic models. Large nested payloads or high-volume response serialization can add measurable latency under heavy traffic.
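The effect is easy to measure with the standard library's json module as a rough stand-in for Pydantic's serialization work (payload sizes here are arbitrary):

```python
import json
import time

small = {"id": 1, "name": "laptop", "price": 999}
# A large nested payload, roughly like an unpaginated list response
large = {
    "items": [
        {"id": i, "name": f"item-{i}", "attrs": {f"k{j}": j for j in range(20)}}
        for i in range(1000)
    ]
}

def dumps_time(payload, iterations=20):
    start = time.perf_counter()
    for _ in range(iterations):
        json.dumps(payload)
    return time.perf_counter() - start

t_small = dumps_time(small)
t_large = dumps_time(large)
print(f"small: {t_small*1000:.2f} ms, large: {t_large*1000:.2f} ms")
```

Every response pays this cost on the request path, so at hundreds of requests per second a few extra milliseconds of serialization per response becomes a visible latency and CPU budget.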
Background task contention
If your API triggers report generation, image processing, or notifications, background workers may become the actual bottleneck even when HTTP responses look healthy.
That is why realistic performance testing for FastAPI must include both lightweight and heavy endpoints, authenticated traffic, and workflows that resemble real user behavior.
Writing Your First Load Test
Let’s start with a simple but realistic FastAPI load test. Imagine your application exposes these endpoints:
- GET /health
- GET /api/v1/products
- GET /api/v1/products/{product_id}
This first Locust script validates basic read traffic and provides a baseline for response times.
```python
from locust import HttpUser, task, between

class FastAPIBasicUser(HttpUser):
    wait_time = between(1, 3)

    @task(2)
    def health_check(self):
        self.client.get("/health", name="/health")

    @task(3)
    def list_products(self):
        params = {
            "category": "laptops",
            "limit": 20,
            "sort": "price_asc"
        }
        self.client.get("/api/v1/products", params=params, name="/api/v1/products")

    @task(1)
    def get_product_detail(self):
        product_id = 1001
        self.client.get(f"/api/v1/products/{product_id}", name="/api/v1/products/:id")
```

What this script does
This script simulates a user who:
- checks service health
- browses a product listing endpoint
- opens a product detail page
The task weights make product listing more frequent than product detail requests, which is common in e-commerce and catalog APIs.
Why this matters for FastAPI
This basic test helps you answer several questions:
- Can FastAPI maintain low latency on common GET endpoints?
- Are lightweight endpoints truly fast under concurrency?
- Is routing, validation, and serialization overhead acceptable?
- Does performance stay consistent as virtual users increase?
Running this in LoadForge
In LoadForge, you can paste this Locust script into a test, set your target host, and scale users gradually. Since LoadForge provides cloud-based infrastructure and distributed testing, you can simulate traffic from multiple regions if your FastAPI API serves a global audience.
A good first test plan might be:
- 25 users for 2 minutes
- 100 users for 5 minutes
- 300 users for 10 minutes
This gives you a baseline before moving into more advanced authenticated and write-heavy scenarios.
Advanced Load Testing Scenarios
Basic endpoint testing is useful, but real FastAPI performance testing should include authentication, writes, and heavier business workflows. Below are several realistic Locust examples tailored to common FastAPI application patterns.
Authenticated JWT workflow with FastAPI OAuth2
FastAPI apps often use OAuth2 password flow with JWT bearer tokens. The following test logs users in through /api/v1/auth/login, stores the token, and uses it for subsequent requests.
```python
from locust import HttpUser, task, between
import random

class FastAPIAuthenticatedUser(HttpUser):
    wait_time = between(1, 2)

    def on_start(self):
        credentials = random.choice([
            {"username": "loadtest1@example.com", "password": "TestPass123!"},
            {"username": "loadtest2@example.com", "password": "TestPass123!"},
            {"username": "loadtest3@example.com", "password": "TestPass123!"},
        ])
        response = self.client.post(
            "/api/v1/auth/login",
            json=credentials,
            name="/api/v1/auth/login"
        )
        if response.status_code == 200:
            token = response.json().get("access_token")
            self.client.headers.update({
                "Authorization": f"Bearer {token}"
            })

    @task(4)
    def get_profile(self):
        self.client.get("/api/v1/users/me", name="/api/v1/users/me")

    @task(3)
    def list_orders(self):
        self.client.get(
            "/api/v1/orders",
            params={"status": "completed", "limit": 10},
            name="/api/v1/orders"
        )

    @task(2)
    def get_notifications(self):
        self.client.get(
            "/api/v1/notifications",
            params={"unread_only": "true"},
            name="/api/v1/notifications"
        )

    @task(1)
    def refresh_token(self):
        self.client.post("/api/v1/auth/refresh", name="/api/v1/auth/refresh")
```

What this test reveals
This scenario is useful for measuring:
- login endpoint performance under concurrent authentication
- JWT issuance and validation overhead
- latency added by protected routes
- whether user-specific queries cause database contention
For FastAPI applications, this is especially important because authentication dependencies often run on every protected request. If your dependency chain performs database lookups or external calls, latency can rise quickly under load.
API write operations and validation-heavy endpoints
FastAPI is often used for internal business APIs, SaaS backends, and workflow systems. In these cases, POST and PATCH endpoints matter just as much as GET requests. The next script simulates users creating support tickets, updating preferences, and searching records.
```python
from locust import HttpUser, task, between
import random
import uuid

class FastAPIWorkflowUser(HttpUser):
    wait_time = between(2, 5)

    def on_start(self):
        response = self.client.post(
            "/api/v1/auth/login",
            json={
                "username": "agent@example.com",
                "password": "SupportPass123!"
            },
            name="/api/v1/auth/login"
        )
        if response.status_code == 200:
            token = response.json()["access_token"]
            self.client.headers.update({
                "Authorization": f"Bearer {token}"
            })

    @task(3)
    def search_customers(self):
        params = {
            "query": "smith",
            "page": 1,
            "page_size": 25,
            "include_inactive": "false"
        }
        self.client.get("/api/v1/customers/search", params=params, name="/api/v1/customers/search")

    @task(2)
    def create_ticket(self):
        payload = {
            "customer_id": random.randint(1000, 1500),
            "subject": f"Payment issue {uuid.uuid4().hex[:8]}",
            "priority": random.choice(["low", "medium", "high"]),
            "category": "billing",
            "message": "Customer reports duplicate charge on latest invoice.",
            "tags": ["billing", "chargeback-review"]
        }
        self.client.post("/api/v1/tickets", json=payload, name="/api/v1/tickets")

    @task(1)
    def update_preferences(self):
        payload = {
            "email_notifications": random.choice([True, False]),
            "sms_notifications": False,
            "theme": random.choice(["light", "dark"]),
            "timezone": "America/New_York"
        }
        self.client.patch("/api/v1/users/me/preferences", json=payload, name="/api/v1/users/me/preferences")
```

Why this scenario is realistic
This test is valuable because it exercises:
- request body parsing
- Pydantic validation
- authenticated writes
- database inserts and updates
- search endpoints with filtering
These are all common pressure points in FastAPI applications. If performance degrades here, the issue may not be FastAPI itself but your validation models, ORM usage, indexing strategy, or transaction handling.
File upload and report generation scenario
Many FastAPI apps support document uploads, CSV imports, or async report generation. These workflows are often much heavier than standard API requests. They can also expose bottlenecks in reverse proxies, object storage integrations, and background workers.
```python
from locust import HttpUser, task, between
import io
import csv
import random
import time

class FastAPIFileProcessingUser(HttpUser):
    wait_time = between(3, 6)

    def on_start(self):
        response = self.client.post(
            "/api/v1/auth/login",
            json={
                "username": "ops@example.com",
                "password": "OpsPass123!"
            },
            name="/api/v1/auth/login"
        )
        if response.status_code == 200:
            token = response.json()["access_token"]
            self.client.headers.update({
                "Authorization": f"Bearer {token}"
            })

    def generate_csv_file(self):
        output = io.StringIO()
        writer = csv.writer(output)
        writer.writerow(["email", "first_name", "last_name", "plan"])
        for i in range(100):
            writer.writerow([
                f"user{i}@example.com",
                f"User{i}",
                "LoadTest",
                random.choice(["free", "pro", "enterprise"])
            ])
        return output.getvalue().encode("utf-8")

    @task(2)
    def upload_customer_import(self):
        csv_content = self.generate_csv_file()
        files = {
            "file": ("customers.csv", csv_content, "text/csv")
        }
        data = {
            "source": "dashboard",
            "send_welcome_email": "false"
        }
        self.client.post(
            "/api/v1/imports/customers",
            files=files,
            data=data,
            name="/api/v1/imports/customers"
        )

    @task(1)
    def generate_sales_report(self):
        payload = {
            "start_date": "2025-01-01",
            "end_date": "2025-01-31",
            "group_by": "region",
            "format": "csv"
        }
        response = self.client.post(
            "/api/v1/reports/sales",
            json=payload,
            name="/api/v1/reports/sales"
        )
        if response.status_code == 202:
            report_id = response.json().get("report_id")
            for _ in range(5):
                status_response = self.client.get(
                    f"/api/v1/reports/{report_id}/status",
                    name="/api/v1/reports/:id/status"
                )
                if status_response.status_code == 200:
                    status = status_response.json().get("status")
                    if status == "completed":
                        self.client.get(
                            f"/api/v1/reports/{report_id}/download",
                            name="/api/v1/reports/:id/download"
                        )
                        break
                time.sleep(1)
```

What this test helps you uncover
This scenario is excellent for stress testing FastAPI systems that depend on:
- multipart file parsing
- object storage
- background jobs
- polling APIs
- large response generation
It also helps you distinguish between synchronous API responsiveness and downstream processing capacity. A FastAPI endpoint may return 202 Accepted quickly, but the real bottleneck may appear in job queues, worker pools, or report generation services.
Analyzing Your Results
After running your FastAPI load test in LoadForge, focus on the metrics that actually tell you whether the application is healthy under load.
Response time percentiles
Do not rely only on average response time. Percentiles are more useful:
- 50th percentile shows typical experience
- 95th percentile shows how slower requests behave
- 99th percentile exposes tail latency problems
FastAPI APIs often look fine at the median while authenticated or database-heavy endpoints degrade at the tail.
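If you want to sanity-check percentile math against raw latency samples (for example, numbers exported from a test run), the standard library can compute them directly. The distribution below is synthetic — mostly fast responses with a small slow tail, the shape described above:

```python
import random
import statistics

random.seed(7)
# 95% fast responses around 80 ms, 5% slow database-heavy requests around 600 ms
samples = (
    [random.gauss(80, 15) for _ in range(950)]
    + [random.gauss(600, 100) for _ in range(50)]
)

cuts = statistics.quantiles(samples, n=100)  # 99 percentile cut points
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
mean = statistics.mean(samples)
print(f"mean={mean:.0f} ms  p50={p50:.0f} ms  p95={p95:.0f} ms  p99={p99:.0f} ms")
```

Here the mean and median both look healthy while p99 sits near the slow cluster — exactly the pattern that stays invisible if you only watch averages.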
Throughput
Look at requests per second across key endpoints:
- Is throughput stable as user count rises?
- Does it flatten earlier than expected?
- Does one endpoint dominate server resources?
If throughput stops increasing while latency spikes, you may have reached a bottleneck in your app server, database, or external dependencies.
Error rate
Watch for:
- 401 or 403 errors from broken auth flows
- 422 validation errors from malformed test payloads
- 429 responses from rate limiting
- 500 errors from application failures
- 502/504 errors from reverse proxies or upstream timeouts
In FastAPI specifically, a high number of 422 responses may indicate your load test payloads do not match the schema expected by your Pydantic models.
Endpoint-level comparison
Compare endpoint performance side by side:
- /health should remain fast even under load
- list endpoints should scale predictably
- authenticated endpoints may show moderate overhead
- write-heavy or file-processing endpoints will usually degrade first
LoadForge’s real-time reporting makes it easier to spot which FastAPI routes are causing latency spikes or failures during a test run.
Ramp-up behavior
A gradual increase in latency is normal. A sudden jump often indicates:
- exhausted database pool
- event loop saturation
- worker process limits
- cache misses under concurrency
- queueing at reverse proxy or load balancer level
With LoadForge’s distributed testing, you can also compare whether latency varies by region, which is useful for globally deployed FastAPI APIs.
Performance Optimization Tips
Once your load testing identifies weak points, these optimizations commonly improve FastAPI performance.
Use async-compatible libraries consistently
If your FastAPI endpoints are async, make sure the rest of the stack is too:
- use httpx for outbound HTTP
- use async database drivers where appropriate
- avoid blocking file or network operations in request handlers
Tune ASGI worker configuration
FastAPI performance depends heavily on deployment setup. Test different combinations of:
- Uvicorn workers
- Gunicorn worker count
- keep-alive settings
- timeout values
Load testing helps you find the right balance for CPU and memory usage.
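As a hypothetical starting point — these values are common heuristics, not recommendations for your workload — a gunicorn.conf.py for a FastAPI service might look like this, with load test results driving the tuning from there:

```python
# gunicorn.conf.py -- illustrative starting values, assuming a typical multi-core host
import multiprocessing

bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2 + 1   # common starting heuristic
worker_class = "uvicorn.workers.UvicornWorker"  # serve FastAPI via Uvicorn workers
keepalive = 5          # seconds to hold idle keep-alive connections open
timeout = 30           # hard-kill workers stuck longer than this
graceful_timeout = 20  # time allowed for in-flight requests during restarts
```

Re-run the same load test after each change to a single setting; changing several at once makes it impossible to attribute the improvement or regression.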
Optimize database access
A large percentage of FastAPI performance issues come from the data layer. Focus on:
- query indexing
- reducing N+1 queries
- connection pool sizing
- caching hot reads
- limiting payload size for list endpoints
Reduce response payload overhead
Large JSON responses increase serialization time and bandwidth usage. Consider:
- pagination
- field filtering
- compressed responses
- avoiding deeply nested models where not needed
Cache expensive dependencies
If authentication, permissions, or configuration lookups are repeated on every request, caching can significantly reduce latency.
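One low-risk pattern is a small in-process TTL cache in front of the expensive lookup. The sketch below is stdlib-only and hypothetical — in a real FastAPI app you would call get_or_load from inside your auth dependency, and the loader would be your database or identity-service query:

```python
import time

class TTLCache:
    """Minimal TTL cache you could wrap around an expensive per-request lookup."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}

    def get_or_load(self, key, loader):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]              # fresh cache hit: skip the real lookup
        value = loader(key)              # miss or expired: do the expensive work
        self._store[key] = (value, now)
        return value

calls = 0
def load_permissions(user_id):
    # Hypothetical stand-in for a database or identity-service query
    global calls
    calls += 1
    return {"user_id": user_id, "roles": ["admin"]}

cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load(42, load_permissions)
second = cache.get_or_load(42, load_permissions)  # served from cache, no second query
print(calls)  # -> 1
```

Note that an in-process cache is per worker: with multiple Gunicorn workers, each keeps its own copy, and cached permissions can be stale for up to the TTL — choose a TTL your security model can tolerate, or use a shared cache such as Redis.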
Offload long-running work
Use background workers for:
- exports
- image processing
- email sending
- data imports
- analytics jobs
Then load test both the API layer and the worker pipeline separately.
Add performance testing to CI/CD
FastAPI teams move quickly, and regressions can happen after schema, ORM, or dependency changes. LoadForge supports CI/CD integration so you can run repeatable performance testing as part of your deployment workflow.
Common Pitfalls to Avoid
When load testing FastAPI, avoid these common mistakes.
Testing only /health or trivial GET endpoints
FastAPI can handle simple routes very well, but that does not reflect real production usage. Include authentication, writes, search, and expensive endpoints.
Using unrealistic test data
If all users hit the same record or reuse the same payload, results may be distorted by database caching or lock contention. Use varied IDs, search terms, and request bodies where possible.
Ignoring authentication flows
Protected endpoints often behave very differently from public ones. Always include realistic token generation and authenticated traffic in your performance testing.
Overlooking background systems
Your FastAPI app may return quickly while queues, workers, storage, or third-party APIs become overloaded. Measure the full workflow, not just the initial HTTP response.
Running tests against development configuration
Local dev servers, debug mode, and auto-reload settings do not represent production performance. Test against a production-like deployment.
Treating async as automatically fast
Async helps with concurrency, but it does not fix blocking code, slow SQL, or bad architecture. Load testing is how you verify whether your FastAPI implementation truly benefits from asynchronous design.
Skipping gradual ramp-up
Jumping directly to extreme traffic can make results harder to interpret. Start with baseline load testing, then move to stress testing and spike testing.
Conclusion
FastAPI is capable of excellent performance, but real-world speed depends on much more than the framework itself. Authentication, database access, serialization, background jobs, and deployment configuration all affect how your API behaves under load. By using realistic Locust scripts and running them on LoadForge, you can validate async performance, measure latency accurately, and scale with confidence.
LoadForge makes FastAPI load testing easier with cloud-based infrastructure, distributed testing, global test locations, real-time reporting, and CI/CD integration. If you are ready to uncover bottlenecks and improve your FastAPI application before users do, try LoadForge and start building performance tests that reflect real production traffic.
LoadForge Team
LoadForge is a load and performance testing platform built on Locust. Our team has been shipping load tests against production systems since 2018, and we write these guides from real customer engagements.