
Introduction
Running applications on Fly.io gives developers a powerful edge-first deployment model, global regions, and the ability to place workloads close to users. That architecture can dramatically improve responsiveness, but it also changes how you should approach load testing. When your app is distributed across regions, traditional single-origin performance testing is no longer enough. You need to understand global latency, concurrency behavior, cache effectiveness, cold-start patterns, and how your Fly.io app performs when traffic spikes hit multiple edge locations at once.
This Fly.io load testing guide with LoadForge shows you how to measure the real-world performance of applications deployed on Fly.io. We’ll cover how Fly.io applications behave under load, how to write Locust-based test scripts for realistic scenarios, and how to analyze the results to identify bottlenecks before they affect users.
Because LoadForge is built on Locust, you can create flexible Python-based load testing and stress testing scenarios while taking advantage of cloud-based infrastructure, distributed testing, real-time reporting, CI/CD integration, and global test locations. That makes it especially useful for testing Fly.io applications, where geography and edge routing are part of the performance story.
Prerequisites
Before you start load testing your Fly.io application, make sure you have:
- A Fly.io application deployed and reachable over HTTPS
- The application hostname, such as https://myapp.fly.dev, or a custom domain mapped to Fly.io
- A basic understanding of your application’s key user flows
- Any authentication credentials or API tokens needed for testing
- Test data that can safely be used in a staging or production-like environment
- A LoadForge account to run distributed load tests from multiple regions
It also helps to know:
- Which Fly.io regions your app is deployed in
- Whether your app uses:
  - Fly Machines
  - autoscaling
  - edge caching
  - Postgres or external databases
  - WebSocket or API-heavy traffic
- Any rate limiting, WAF, or auth middleware that may affect test users
For the examples below, we’ll assume a realistic Fly.io-hosted SaaS API with endpoints like:
- GET /health
- POST /api/v1/auth/login
- GET /api/v1/projects
- GET /api/v1/projects/{id}/metrics
- POST /api/v1/uploads/presign
- POST /api/v1/events/ingest
These patterns are common for applications deployed on Fly.io, especially globally distributed APIs and edge-facing services.
Understanding Fly.io Under Load
Fly.io is designed to run apps close to users, but that doesn’t automatically guarantee good performance under heavy traffic. Load testing Fly.io applications requires thinking about several layers of behavior.
Regional routing and latency
A Fly.io app may route users to the nearest healthy region, but latency still depends on:
- where the request originates
- whether the app instance is warm
- how traffic is balanced across regions
- whether backend services are centralized elsewhere
If your app runs in multiple Fly.io regions but your database lives in only one, users may see fast edge connection times but slower full response times for database-heavy endpoints.
Concurrency and instance saturation
Fly.io applications can handle concurrency differently depending on:
- runtime and framework
- CPU and memory allocation
- connection pooling
- autoscaling thresholds
- per-instance request limits
A lightweight health endpoint may scale well, while authenticated API requests with database queries may degrade quickly once instance concurrency is saturated.
Cold starts and machine startup behavior
If you use Fly Machines or scaled-to-zero patterns, sudden bursts of traffic may trigger startup delays. A load test can reveal:
- how long new instances take to become responsive
- whether cold starts affect only certain endpoints
- how latency changes during rapid ramp-up
Edge performance versus origin performance
Some Fly.io apps benefit from edge caching or static asset acceleration, while dynamic API endpoints still depend on application logic and data access. This means your performance testing strategy should separate:
- cacheable requests
- authenticated API traffic
- write-heavy workloads
- background event ingestion
Common bottlenecks in Fly.io deployments
When load testing Fly.io applications, the most common bottlenecks include:
- database latency from a single-region backend
- insufficient connection pooling
- CPU exhaustion on small VM sizes
- slow startup time during scaling events
- application-level locks or synchronous processing
- rate limiting misconfiguration
- file upload flows that depend on object storage latency
The goal of load testing is not just to find the maximum requests per second. It’s to identify where the Fly.io architecture performs well and where edge distribution stops helping because another dependency becomes the bottleneck.
Writing Your First Load Test
Let’s start with a simple Fly.io load test that validates baseline responsiveness. This is useful for smoke testing, health checks, and measuring latency from different LoadForge regions.
Basic health and homepage test
from locust import HttpUser, task, between


class FlyAppSmokeUser(HttpUser):
    wait_time = between(1, 3)

    @task(3)
    def health_check(self):
        self.client.get("/health", name="GET /health")

    @task(1)
    def homepage(self):
        self.client.get("/", name="GET /")

How this test works
This script simulates a lightweight user checking two common endpoints:
- GET /health for service health
- GET / for the public landing page or root route
In LoadForge, set the host to your Fly.io app, for example https://myapp.fly.dev.
This basic load test is useful for:
- verifying the app is reachable from multiple regions
- measuring baseline latency
- checking whether Fly.io edge routing is working as expected
- spotting cold-start or startup delays during ramp-up
What to look for
When you run this in LoadForge, pay attention to:
- median and p95 response times
- failures during rapid user ramp-up
- differences between test regions
- whether the health endpoint remains stable even as the homepage slows down
If GET /health stays fast but GET / degrades, the issue is likely application rendering, upstream dependencies, or dynamic content generation rather than Fly.io network routing itself.
Advanced Load Testing Scenarios
Once the basics are covered, you should test realistic user behavior. For Fly.io apps, that often means authenticated APIs, write-heavy traffic, and geographically sensitive workloads.
Scenario 1: Authenticated API workflow
This example simulates a user logging in, retrieving projects, and fetching project metrics. This is a common SaaS pattern and a strong test of app logic, session handling, and database-backed reads.
from locust import HttpUser, task, between
import random


class FlyAuthenticatedApiUser(HttpUser):
    wait_time = between(1, 2)

    def on_start(self):
        response = self.client.post(
            "/api/v1/auth/login",
            json={
                "email": "loadtest.user@example.com",
                "password": "SuperSecurePass123!"
            },
            name="POST /api/v1/auth/login"
        )
        if response.status_code == 200:
            data = response.json()
            self.token = data.get("access_token")
            self.headers = {
                "Authorization": f"Bearer {self.token}",
                "Content-Type": "application/json"
            }
        else:
            self.token = None
            self.headers = {}

    @task(3)
    def list_projects(self):
        self.client.get(
            "/api/v1/projects",
            headers=self.headers,
            name="GET /api/v1/projects"
        )

    @task(2)
    def project_metrics(self):
        project_id = random.choice([101, 102, 103, 104])
        self.client.get(
            f"/api/v1/projects/{project_id}/metrics?range=24h&granularity=5m",
            headers=self.headers,
            name="GET /api/v1/projects/:id/metrics"
        )

    @task(1)
    def user_profile(self):
        self.client.get(
            "/api/v1/me",
            headers=self.headers,
            name="GET /api/v1/me"
        )

Why this matters for Fly.io
This test reveals how your Fly.io app handles:
- JWT or bearer-token authentication
- database-backed list endpoints
- repeated metrics queries
- session-independent API traffic across many concurrent users
It’s especially useful if your app runs globally on Fly.io but reads from a centralized database. You may find that auth succeeds quickly at the edge, but metrics endpoints slow down because data must travel to another region.
Scenario 2: Event ingestion and edge write traffic
Many Fly.io deployments act as globally distributed ingestion endpoints for telemetry, webhooks, or analytics. This scenario simulates clients sending events to an ingestion API.
from locust import HttpUser, task, between
import random
import uuid
from datetime import datetime, timezone


class FlyEventIngestionUser(HttpUser):
    wait_time = between(0.2, 1.0)

    def on_start(self):
        self.headers = {
            "Authorization": "Bearer lf_ingest_test_token_abc123",
            "Content-Type": "application/json",
            "User-Agent": "LoadForge-Fly-Ingest-Test/1.0"
        }

    @task
    def ingest_event(self):
        payload = {
            "event_id": str(uuid.uuid4()),
            "tenant_id": "tenant_demo_001",
            "source": "web",
            "event_type": random.choice([
                "page_view",
                "signup_started",
                "checkout_completed",
                "api_error"
            ]),
            # timezone-aware UTC timestamp in RFC 3339 "Z" form
            "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
            "region_hint": random.choice(["iad", "ord", "lhr", "fra", "sin"]),
            "properties": {
                "path": random.choice([
                    "/pricing",
                    "/signup",
                    "/dashboard",
                    "/api/v1/projects"
                ]),
                "response_time_ms": random.randint(45, 1200),
                "plan": random.choice(["free", "pro", "enterprise"])
            }
        }
        self.client.post(
            "/api/v1/events/ingest",
            json=payload,
            headers=self.headers,
            name="POST /api/v1/events/ingest"
        )

What this test uncovers
This kind of stress testing is ideal for Fly.io because it measures:
- edge write performance from global locations
- request validation overhead
- queueing or async processing behavior
- regional spikes and burst handling
- CPU and memory pressure under high event throughput
If ingestion latency rises sharply under moderate concurrency, the bottleneck may be synchronous writes, limited worker capacity, or downstream queue/database contention rather than Fly.io itself.
Scenario 3: File upload preparation and signed URL flow
A very common pattern on Fly.io is using the app as an API gateway for upload flows. The app generates a signed upload URL, and the client then uploads to object storage. You should load test the app-controlled part of that workflow.
from locust import HttpUser, task, between
import uuid
import random


class FlyUploadWorkflowUser(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        login_response = self.client.post(
            "/api/v1/auth/login",
            json={
                "email": "uploader@example.com",
                "password": "UploadFlowPass456!"
            },
            name="POST /api/v1/auth/login"
        )
        # Guard against a failed login so json() cannot raise mid-test
        token = None
        if login_response.status_code == 200:
            token = login_response.json().get("access_token")
        self.headers = {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json"
        }

    @task(2)
    def create_presigned_upload(self):
        filename = f"report-{uuid.uuid4()}.csv"
        payload = {
            "filename": filename,
            "content_type": "text/csv",
            "size_bytes": random.randint(50_000, 5_000_000),
            "folder": "customer-exports"
        }
        with self.client.post(
            "/api/v1/uploads/presign",
            json=payload,
            headers=self.headers,
            name="POST /api/v1/uploads/presign",
            catch_response=True
        ) as response:
            if response.status_code != 200:
                response.failure(f"Unexpected status code: {response.status_code}")
                return
            data = response.json()
            if "upload_url" not in data or "file_id" not in data:
                response.failure("Missing upload_url or file_id in response")

    @task(1)
    def list_recent_uploads(self):
        self.client.get(
            "/api/v1/uploads?limit=20&status=pending,complete",
            headers=self.headers,
            name="GET /api/v1/uploads"
        )

Why this is realistic
This scenario reflects how many modern Fly.io apps actually work:
- the application handles auth and upload authorization
- object storage receives the actual file bytes
- the app stores metadata and status in a database
This load test focuses on the Fly.io-hosted application layer, where performance issues often show up in:
- auth checks
- signed URL generation
- metadata persistence
- upload listing queries
If these endpoints are slow, users will perceive the upload flow as sluggish even if object storage is fast.
Analyzing Your Results
After running your Fly.io load test in LoadForge, the next step is understanding what the metrics actually mean.
Key metrics to monitor
For Fly.io performance testing, focus on:
- response time percentiles: p50, p95, p99
- requests per second
- error rate
- timeouts
- throughput consistency during ramp-up
- latency differences by region
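LoadForge reports percentiles for you, but if you export raw response-time samples for your own analysis, one simple convention is the nearest-rank percentile. A minimal stdlib-only sketch (the sample values are invented for illustration):

```python
# Nearest-rank percentile helper for analyzing exported response-time samples.
import math

def percentile(samples, p):
    """Nearest-rank percentile: p in (0, 100], samples e.g. in milliseconds."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# Invented latencies: a mostly-fast endpoint with a slow tail
latencies_ms = [82, 91, 88, 1240, 95, 87, 90, 93, 1180, 89]
print("p50:", percentile(latencies_ms, 50))  # -> 90
print("p95:", percentile(latencies_ms, 95))  # -> 1240
```

Note how the median looks healthy while p95 is dominated by the slow outliers; this is exactly the pattern described next.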
Interpreting latency patterns
A few common result patterns are especially important for Fly.io apps:
Fast median, slow p95 or p99
This usually means:
- some requests hit warm instances while others hit cold or overloaded ones
- certain regions are slower than others
- backend dependencies are inconsistent
Errors only during ramp-up
This often suggests:
- autoscaling lag
- startup delays
- insufficient instance capacity
- connection pool exhaustion
Read endpoints are fast, write endpoints degrade
This may indicate:
- database write contention
- queue bottlenecks
- synchronous processing in request handlers
- regional replication delays
Regional differences
If LoadForge is running traffic from multiple global test locations, compare:
- North America vs Europe vs Asia latency
- authenticated vs anonymous endpoint behavior
- static vs dynamic response times
This is one of the biggest advantages of LoadForge for Fly.io load testing. Because Fly.io is globally distributed, you need distributed testing to validate the architecture. A single-region test won’t tell you whether your edge deployment is truly helping global users.
Correlate with Fly.io observability
As you review LoadForge’s real-time reporting, compare the results with Fly.io metrics and logs:
- instance CPU and memory usage
- request concurrency
- app restarts
- scaling events
- backend database metrics
- per-region instance distribution
This helps you determine whether the bottleneck is:
- edge routing
- application code
- VM sizing
- autoscaling configuration
- downstream services
Performance Optimization Tips
Once your load testing reveals bottlenecks, these are the most common ways to improve Fly.io performance.
Place stateful dependencies closer to users
If your app is globally distributed but your database is in one region, dynamic endpoints may still be slow. Consider:
- regional read replicas
- caching hot data
- moving latency-sensitive services closer to users
Tune connection pooling
Many performance issues on Fly.io come from database or upstream connection bottlenecks. Make sure your app has:
- appropriate DB pool sizes
- keep-alive enabled where relevant
- async or nonblocking request handling if supported by your stack
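The core idea behind pool tuning is bounding concurrency so instances cannot open unlimited upstream connections. The toy sketch below shows the mechanism with an asyncio semaphore; real applications should use their driver's pool (asyncpg, psycopg_pool, HikariCP, and so on) rather than hand-rolling one.

```python
# Toy bounded connection pool: illustrates why pool limits protect the
# database when many Fly.io instances scale out at once. Not production code.
import asyncio

class BoundedPool:
    def __init__(self, connect, max_size=10):
        self._connect = connect          # coroutine that opens a connection
        self._sem = asyncio.Semaphore(max_size)  # hard cap on checked-out conns
        self._idle = []                  # released connections, reused first

    async def acquire(self):
        await self._sem.acquire()        # block here instead of at the DB
        return self._idle.pop() if self._idle else await self._connect()

    def release(self, conn):
        self._idle.append(conn)
        self._sem.release()
```

Size the cap so that (pool size × instance count) stays safely below your database's connection limit; otherwise a scale-out event during a load test can exhaust the server.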
Reduce cold-start impact
If load tests show ramp-up latency spikes:
- keep a minimum number of instances warm
- reduce startup time
- preload dependencies at boot
- avoid expensive initialization on first request
Separate ingestion from processing
For write-heavy APIs like event ingestion:
- accept requests quickly
- enqueue work asynchronously
- process downstream jobs outside the request path
This improves perceived edge performance and makes stress testing results much more stable.
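The accept-then-enqueue pattern can be sketched with nothing but the standard library; the handler name and queue size below are illustrative, and in production you would back this with a durable queue (SQS, NATS, Redis streams) rather than an in-process one.

```python
# Sketch: fast ingestion path that enqueues work for a background worker.
import queue
import threading

EVENTS: "queue.Queue[dict]" = queue.Queue(maxsize=10_000)
PROCESSED = []

def handle_ingest(payload: dict) -> int:
    """Fast path: validate minimally, enqueue, return an HTTP-style status."""
    if "event_id" not in payload:
        return 400
    try:
        EVENTS.put_nowait(payload)   # never block the request path
    except queue.Full:
        return 503                   # shed load instead of queueing forever
    return 202                       # accepted; processed out of band

def worker():
    while True:
        event = EVENTS.get()
        if event is None:            # shutdown sentinel
            break
        PROCESSED.append(event)      # stand-in for the slow database write
        EVENTS.task_done()

threading.Thread(target=worker, daemon=True).start()
```

Under load, the client sees the cheap 202 path, and the slow write happens off the request path; returning 503 on a full queue gives you explicit backpressure instead of unbounded memory growth.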
Cache aggressively where safe
For endpoints like:
- project summaries
- metrics dashboards
- public pages
- configuration lookups
Use caching to reduce repeated database reads and improve Fly.io edge responsiveness.
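A per-instance TTL cache is the simplest starting point; the sketch below is a toy, and multi-region Fly.io apps usually want a shared cache (such as Redis) so each region does not warm its own copy against the database independently.

```python
# Toy TTL cache for read-heavy endpoints (summaries, config lookups).
import time

class TTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit and hit[0] > now:
            return hit[1]                      # fresh hit: skip the database
        value = loader()                       # stale or missing: reload
        self._store[key] = (now + self.ttl, value)
        return value
```

A load test is a good way to validate the TTL choice: rerun the same read-heavy scenario and confirm database query rates drop while p95 holds steady.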
Test from multiple regions regularly
Because Fly.io is built for geographic distribution, performance optimization should always include global validation. LoadForge’s cloud-based infrastructure and global test locations make it easy to repeat the same test from different parts of the world and compare results over time.
Common Pitfalls to Avoid
Load testing Fly.io applications is straightforward, but there are several mistakes that can lead to misleading results.
Testing only one region
This is the biggest mistake. A Fly.io app may perform well from Virginia and poorly from Singapore. Always include distributed load testing if your users are global.
Ignoring backend geography
Even if Fly.io routes traffic to the nearest region, your app may still depend on:
- a single-region Postgres instance
- centralized Redis
- third-party APIs hosted elsewhere
If you ignore those dependencies, you may misinterpret edge performance.
Using unrealistic traffic patterns
A health-check-only test won’t tell you much about real application behavior. Include:
- authentication
- database-backed reads
- writes
- upload flows
- burst traffic
Reusing one auth token for all users
That can hide auth bottlenecks and produce unrealistic caching behavior. In most cases, each Locust user should log in independently or use a realistic token pool.
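A simple way to do this is a shared token pool that each simulated user draws from. The token values below are invented for illustration; in practice you would load tokens for pre-provisioned test accounts from a file or environment variable.

```python
# Sketch: hand each simulated user a distinct bearer token from a pool.
import itertools
import threading

TOKENS = [f"lf_test_token_{i:04d}" for i in range(500)]  # illustrative values
_cycle = itertools.cycle(TOKENS)                         # wraps when exhausted
_lock = threading.Lock()

def next_token() -> str:
    """Hand out the next token; wraps around when the pool is exhausted."""
    with _lock:
        return next(_cycle)

# In a Locust user this would typically be called from on_start:
#   self.headers = {"Authorization": f"Bearer {next_token()}"}
```

If the pool is smaller than your peak user count, tokens will be shared once the cycle wraps, so size the pool to match the concurrency you plan to test.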
Forgetting warm-up effects
Fly.io apps may behave differently during the first few minutes of a test. Watch for:
- startup delays
- autoscaling transitions
- connection pool initialization
- cache warming
Load testing production without safeguards
If you test a live Fly.io production app:
- use safe test accounts
- avoid destructive endpoints
- coordinate with your team
- monitor scaling costs
- set clear stop conditions
Focusing only on average response time
Average latency can look acceptable while p95 and p99 are terrible. For real user experience, percentiles matter far more than averages.
Conclusion
Fly.io gives developers a compelling platform for globally distributed applications, but edge deployment only delivers value if your app can handle real concurrency, regional traffic patterns, and backend dependency pressure. With the right load testing strategy, you can measure global latency, uncover scaling bottlenecks, validate edge performance, and improve reliability before users feel the impact.
Using LoadForge, you can build realistic Locust-based Fly.io load tests, run them from global locations, and analyze results with real-time reporting. Whether you’re testing a simple public app, an authenticated API, an event ingestion service, or an upload workflow, LoadForge makes it easier to understand how your Fly.io deployment performs under real-world load.
If you’re ready to validate your Fly.io architecture with practical performance testing and stress testing, try LoadForge and start building distributed tests that reflect how your users actually experience your application.
LoadForge Team
LoadForge is a load and performance testing platform built on Locust. Our team has been shipping load tests against production systems since 2018, and we write these guides from real customer engagements.