
Introduction
Elasticsearch is built for speed, but search performance can change dramatically under real-world traffic. A cluster that feels fast during manual testing may struggle when hundreds or thousands of users run queries at the same time, when bulk indexing jobs overlap with search traffic, or when expensive aggregations hit large datasets. That’s why Elasticsearch load testing is essential before production releases, infrastructure changes, or major schema updates.
With LoadForge, you can run realistic Elasticsearch performance testing at scale using Locust-based Python scripts. This makes it easy to simulate concurrent search requests, bulk indexing operations, authenticated API traffic, and mixed workloads across distributed cloud infrastructure. Whether you want to benchmark search latency, identify indexing bottlenecks, or stress test your cluster under peak traffic, LoadForge gives you the tools to do it with real-time reporting, global test locations, and CI/CD integration.
In this guide, you’ll learn how to load test Elasticsearch with LoadForge, including practical Locust scripts that target realistic Elasticsearch endpoints and payloads.
Prerequisites
Before you start load testing Elasticsearch, make sure you have the following:
- A running Elasticsearch cluster or managed deployment
- Network access to the Elasticsearch HTTP API
- Valid credentials if security is enabled
- One or more test indices with representative mappings and data
- A clear goal for your test:
  - Search latency benchmarking
  - Bulk indexing throughput measurement
  - Stress testing cluster limits
  - Comparing performance before and after tuning
- LoadForge account access for cloud-based distributed testing
You should also know:
- Your Elasticsearch version
- Whether you use Basic Auth, API keys, or another authentication method
- Which endpoints and query patterns represent real production traffic
- Any index lifecycle, shard, or replica settings that could affect performance
For safe performance testing, avoid running destructive tests against production unless you fully understand the impact. Heavy indexing, force refreshes, and expensive aggregations can affect cluster stability.
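Part of testing safely is starting from a known-good baseline. As a minimal sketch, the JSON body returned by Elasticsearch's GET /_cluster/health endpoint can be checked before a run begins (the helper name and the exact readiness criteria here are illustrative assumptions, not an official check):

```python
def cluster_is_ready(health: dict) -> bool:
    """Decide whether a load test should start, based on the JSON body
    returned by GET /_cluster/health.

    A 'red' status, or shards still relocating or initializing, suggests
    the cluster is not at a stable baseline for benchmarking.
    """
    if health.get("status") not in ("green", "yellow"):
        return False
    if health.get("relocating_shards", 0) > 0:
        return False
    if health.get("initializing_shards", 0) > 0:
        return False
    return True


# Abbreviated example _cluster/health bodies:
baseline = {"status": "green", "relocating_shards": 0, "initializing_shards": 0}
rebalancing = {"status": "yellow", "relocating_shards": 2, "initializing_shards": 0}
```

Gating a test on a check like this avoids attributing pre-existing cluster stress to your load scenario.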
Understanding Elasticsearch Under Load
Elasticsearch performance depends on much more than raw CPU and memory. Under concurrent traffic, several internal behaviors shape how the cluster responds.
Search workload characteristics
Search requests can vary from simple term queries to expensive multi-field full-text searches with filters, sorting, highlighting, and aggregations. Query latency often increases when:
- Queries touch many shards
- Large result windows are requested
- Deep pagination is used
- Aggregations scan large datasets
- Sorts run on unanalyzed or high-cardinality fields
- Cache hit rates are low
A query that returns in 30 ms during single-user testing may take much longer when many users trigger the same or similar searches concurrently.
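Because of caching, sending the exact same query body on every request can make a test unrealistically fast. One common technique is to vary the search term per request; a sketch (the term list and field boosts are illustrative assumptions):

```python
import random

SEARCH_TERMS = ["wireless headphones", "usb-c dock", "4k monitor", "gaming keyboard"]


def build_search_payload(terms=SEARCH_TERMS, rng=random):
    """Build a multi_match search body with a randomly chosen term, so
    repeated requests don't all resolve from the shard request cache."""
    return {
        "query": {
            "multi_match": {
                "query": rng.choice(terms),
                "fields": ["name^3", "description", "category"],
            }
        },
        "size": 20,
    }


payload = build_search_payload()
```

Drawing terms from a pool of real production queries gives more representative cache-hit behavior than either a single fixed query or fully random strings.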
Indexing workload characteristics
Indexing performance is affected by:
- Bulk request size
- Refresh interval
- Number of shards and replicas
- Mapping complexity
- Ingest pipelines
- Disk throughput
- Merge pressure
If your application indexes documents while users are searching, you need a mixed workload test. Search and indexing compete for CPU, memory, and I/O, and this is where Elasticsearch stress testing becomes especially valuable.
Common Elasticsearch bottlenecks
When load testing Elasticsearch, watch for these common bottlenecks:
- High CPU on data nodes from query execution or aggregation processing
- Heap pressure from field data, aggregations, or large result sets
- Slow disk I/O during indexing and segment merges
- Thread pool saturation for search or write operations
- Hot shards receiving disproportionate traffic
- Slow ingest pipelines with enrichments or scripts
- Network latency between clients and cluster nodes
A good Elasticsearch load test should reflect your real usage patterns rather than only sending a single repeated request.
Writing Your First Load Test
Let’s start with a basic Elasticsearch search latency test. This example simulates users searching a products index using a realistic e-commerce query. It uses Basic Authentication and hits the _search endpoint directly.
Basic search load test
from locust import HttpUser, task, between
import json


class ElasticsearchSearchUser(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        self.client.auth = ("elastic", "your-password")
        self.headers = {
            "Content-Type": "application/json"
        }

    @task
    def search_products(self):
        payload = {
            "query": {
                "bool": {
                    "must": [
                        {
                            "multi_match": {
                                "query": "wireless headphones",
                                "fields": ["name^3", "description", "category"]
                            }
                        }
                    ],
                    "filter": [
                        {"term": {"in_stock": True}},
                        {"range": {"price": {"gte": 50, "lte": 300}}}
                    ]
                }
            },
            "sort": [
                {"_score": "desc"},
                {"popularity": "desc"}
            ],
            "size": 20
        }

        with self.client.post(
            "/products/_search",
            data=json.dumps(payload),
            headers=self.headers,
            name="POST /products/_search",
            catch_response=True
        ) as response:
            if response.status_code != 200:
                response.failure(f"Search failed with status {response.status_code}")
                return

            data = response.json()
            hits = data.get("hits", {}).get("hits", [])
            if not isinstance(hits, list):
                response.failure("Invalid Elasticsearch search response")

What this test does
This script simulates a user who:
- Waits 1 to 3 seconds between actions
- Authenticates with Elasticsearch using Basic Auth
- Sends a realistic JSON search request to /products/_search
- Validates the response structure
This is a good starting point for measuring baseline search latency. In LoadForge, you can scale this script across many virtual users and use distributed testing to see how latency behaves as concurrency increases.
Running this test effectively
For a useful benchmark:
- Start with 10–25 users
- Increase gradually to 100, 250, or more
- Watch median, p95, and p99 response times
- Compare latency as you increase user counts
- Monitor Elasticsearch node metrics alongside LoadForge results
A single average response time is not enough. Tail latency is often where Elasticsearch issues become visible.
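To make the tail-latency point concrete, p95 and p99 can be computed from raw latency samples with the common nearest-rank method (a sketch; LoadForge reports these for you, this just shows what the numbers mean):

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the value at position ceil(pct/100 * n)
    in the sorted sample list."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


# Ten response times in milliseconds; one slow outlier.
latencies_ms = [28, 30, 31, 33, 35, 40, 42, 55, 120, 480]

p50 = percentile(latencies_ms, 50)  # 35
p95 = percentile(latencies_ms, 95)  # 480
```

Here the median is a healthy 35 ms while p95 is 480 ms: averages hide exactly the requests that hurt real users.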
Advanced Load Testing Scenarios
Basic search tests are useful, but most production clusters handle more complex traffic. Below are several advanced Elasticsearch load testing scenarios you can use in LoadForge.
Authenticated search with API key and aggregations
Many Elasticsearch deployments use API keys instead of Basic Auth. This example tests a dashboard-style query with filters and aggregations, which is common in analytics applications.
from locust import HttpUser, task, between
import json
import random


class ElasticsearchAnalyticsUser(HttpUser):
    wait_time = between(2, 5)

    def on_start(self):
        api_key = "your-base64-encoded-api-key"
        self.headers = {
            "Content-Type": "application/json",
            "Authorization": f"ApiKey {api_key}"
        }

    @task
    def search_orders_dashboard(self):
        regions = ["us-east", "us-west", "eu-central", "ap-southeast"]
        statuses = ["completed", "processing", "shipped"]

        payload = {
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"region.keyword": random.choice(regions)}},
                        {"terms": {"status.keyword": statuses}},
                        {
                            "range": {
                                "order_date": {
                                    "gte": "now-30d/d",
                                    "lte": "now/d"
                                }
                            }
                        }
                    ]
                }
            },
            "aggs": {
                "sales_by_status": {
                    "terms": {
                        "field": "status.keyword"
                    }
                },
                "revenue_over_time": {
                    "date_histogram": {
                        "field": "order_date",
                        "calendar_interval": "day"
                    },
                    "aggs": {
                        "total_revenue": {
                            "sum": {
                                "field": "total_amount"
                            }
                        }
                    }
                }
            },
            "size": 0
        }

        with self.client.post(
            "/orders/_search",
            data=json.dumps(payload),
            headers=self.headers,
            name="POST /orders/_search with aggs",
            catch_response=True
        ) as response:
            if response.status_code != 200:
                response.failure(f"Aggregation query failed: {response.text}")
                return

            data = response.json()
            aggs = data.get("aggregations")
            if not aggs or "sales_by_status" not in aggs:
                response.failure("Missing expected aggregations in response")

Why this scenario matters
Aggregations can be much more expensive than standard searches. This kind of Elasticsearch performance testing helps you answer questions like:
- How many concurrent dashboard users can the cluster handle?
- Do aggregations create CPU spikes?
- Does p95 latency remain acceptable under load?
- Are certain fields or date histograms too expensive?
Bulk indexing performance test
Indexing speed is just as important as search speed. The next example simulates a service sending bulk product updates to Elasticsearch using the _bulk API.
from locust import HttpUser, task, between
import json
import random
import time
import uuid


class ElasticsearchBulkIndexUser(HttpUser):
    wait_time = between(1, 2)

    def on_start(self):
        self.client.auth = ("elastic", "your-password")
        self.headers = {
            "Content-Type": "application/x-ndjson"
        }

    @task
    def bulk_index_products(self):
        actions = []
        for _ in range(100):
            doc_id = str(uuid.uuid4())
            timestamp = int(time.time() * 1000)

            action_meta = {
                "index": {
                    "_index": "products",
                    "_id": doc_id
                }
            }
            document = {
                "name": random.choice([
                    "Wireless Mouse",
                    "Gaming Keyboard",
                    "4K Monitor",
                    "USB-C Dock",
                    "Bluetooth Speaker"
                ]),
                "category": random.choice(["electronics", "accessories", "office"]),
                "price": round(random.uniform(19.99, 499.99), 2),
                "in_stock": random.choice([True, True, True, False]),
                "brand": random.choice(["LogiTech", "KeyMaster", "ViewPro", "DockHub"]),
                "popularity": random.randint(1, 1000),
                "updated_at": timestamp
            }

            actions.append(json.dumps(action_meta))
            actions.append(json.dumps(document))

        bulk_payload = "\n".join(actions) + "\n"

        with self.client.post(
            "/_bulk?refresh=false",
            data=bulk_payload,
            headers=self.headers,
            name="POST /_bulk",
            catch_response=True
        ) as response:
            if response.status_code != 200:
                response.failure(f"Bulk indexing failed: {response.text}")
                return

            data = response.json()
            if data.get("errors") is True:
                response.failure("Bulk request contained indexing errors")

What this test reveals
This script measures how your cluster handles sustained write traffic. It helps uncover:
- Bulk indexing throughput limits
- Slow ingest or mapping operations
- Disk and merge pressure
- Write thread pool saturation
- Impact of indexing on search responsiveness
In LoadForge, you can run this as a standalone indexing benchmark or combine it with search users in a mixed workload test.
Mixed search and indexing workload
Most real Elasticsearch environments don’t only search or only index. They do both. This example combines search and indexing behaviors in a single test user class, weighted toward search traffic.
from locust import HttpUser, task, between
import random
import uuid
import time


class ElasticsearchMixedWorkloadUser(HttpUser):
    wait_time = between(1, 4)

    def on_start(self):
        self.client.auth = ("elastic", "your-password")

    @task(4)
    def search_catalog(self):
        headers = {"Content-Type": "application/json"}
        query_terms = [
            "laptop stand",
            "mechanical keyboard",
            "noise cancelling earbuds",
            "webcam",
            "portable SSD"
        ]
        payload = {
            "query": {
                "bool": {
                    "must": [
                        {
                            "multi_match": {
                                "query": random.choice(query_terms),
                                "fields": ["name^2", "description", "brand", "category"]
                            }
                        }
                    ],
                    "filter": [
                        {"term": {"in_stock": True}}
                    ]
                }
            },
            "size": 10
        }
        self.client.post(
            "/products/_search",
            json=payload,
            headers=headers,
            name="Mixed: product search"
        )

    @task(1)
    def index_inventory_update(self):
        headers = {"Content-Type": "application/json"}
        doc_id = str(uuid.uuid4())
        payload = {
            "sku": f"SKU-{random.randint(100000, 999999)}",
            "name": random.choice(["Laptop Stand", "USB Hub", "Office Chair", "Desk Lamp"]),
            "category": random.choice(["office", "electronics", "furniture"]),
            "price": round(random.uniform(25.0, 250.0), 2),
            "in_stock": random.choice([True, False]),
            "inventory_count": random.randint(0, 500),
            "last_synced_at": int(time.time() * 1000)
        }
        self.client.put(
            f"/inventory/_doc/{doc_id}?refresh=false",
            json=payload,
            headers=headers,
            name="Mixed: inventory index"
        )

Why mixed workload testing is important
This is often the most realistic Elasticsearch stress testing scenario. It helps you understand:
- Whether indexing degrades search latency
- If search traffic slows write throughput
- How the cluster behaves during normal business activity
- Whether node roles or shard allocation need adjustment
LoadForge is especially useful here because you can scale mixed workloads across distributed generators and observe real-time reporting while the test runs.
Analyzing Your Results
After running your Elasticsearch load test, focus on more than just “requests per second.”
Key LoadForge metrics to review
In LoadForge, pay close attention to:
- Average response time
- Median response time
- p95 and p99 latency
- Requests per second
- Failure rate
- Response time trends during ramp-up and sustained load
For Elasticsearch, p95 and p99 are especially important. Search systems often appear healthy on average while a subset of requests becomes very slow under contention.
Elasticsearch-specific signals to correlate
Alongside LoadForge metrics, review cluster telemetry such as:
- Search latency and query throughput
- Indexing rate
- JVM heap usage
- CPU utilization
- Disk I/O and merge activity
- Search and write thread pool queue sizes
- Garbage collection frequency
- Rejected requests
- Hot shard patterns
If response times rise sharply during high concurrency, check whether:
- Search queues are filling up
- Heap usage is too high
- Aggregations are causing CPU saturation
- Bulk indexing is increasing disk pressure
- Too many shards are being queried per request
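Thread pool state is easy to inspect with the _cat/thread_pool API, which returns plain whitespace-separated columns. A small sketch for turning that text into something you can alert on (the endpoint and column list are real; the helper and thresholds are illustrative assumptions):

```python
def parse_thread_pool(cat_text: str):
    """Parse the text output of
    GET _cat/thread_pool/search,write?h=node_name,name,active,queue,rejected
    into dicts, so queue depth and rejection counts can be checked."""
    rows = []
    for line in cat_text.strip().splitlines():
        node, name, active, queue, rejected = line.split()
        rows.append({
            "node": node,
            "pool": name,
            "active": int(active),
            "queue": int(queue),
            "rejected": int(rejected),
        })
    return rows


# Example _cat output captured during a test run (fabricated sample data):
sample = """node-1 search 12 48 0
node-1 write 4 0 0
node-2 search 13 210 57"""

# Flag pools with deep queues or any rejections (thresholds are arbitrary here).
saturated = [r for r in parse_thread_pool(sample) if r["queue"] > 100 or r["rejected"] > 0]
```

In this sample, node-2's search pool is queueing deeply and already rejecting work, which would show up in LoadForge as rising tail latency and failures.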
Interpreting common result patterns
Low average latency but high p99
This usually means most requests are fine, but some queries are expensive or blocked by contention. Look for complex searches, large aggregations, or hot shards.
Good search performance until indexing starts
This often points to disk I/O contention, merge pressure, or insufficient cluster resources for mixed workloads.
Rising failures under load
Elasticsearch may be rejecting work due to thread pool saturation, request timeouts, or circuit breaker limits.
Flat throughput with rising latency
This suggests the cluster has reached capacity and is queueing requests rather than processing them faster.
Performance Optimization Tips
Once your Elasticsearch performance testing reveals bottlenecks, these optimization steps often help.
Optimize query patterns
- Avoid deep pagination with from and size; use search_after where possible
- Prefer filters for exact matches instead of scoring queries
- Limit expensive wildcard and regex queries
- Reduce aggregation cardinality when possible
- Return only required fields using _source filtering
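As a sketch of the first tip, search_after pagination passes the sort values of the previous page's last hit instead of a growing from offset (the helper, field names, and tiebreaker field here are illustrative assumptions; Elasticsearch documentation recommends including a tiebreaker in the sort):

```python
def next_page_payload(base_query: dict, sort_fields, last_hit_sort=None, size=20):
    """Build a search body that pages with search_after instead of
    from/size. last_hit_sort is the 'sort' array of the final hit on
    the previous page (None for the first page)."""
    payload = {
        "query": base_query,
        "sort": sort_fields,
        "size": size,
    }
    if last_hit_sort is not None:
        payload["search_after"] = last_hit_sort
    return payload


base = {"term": {"in_stock": True}}
sort_spec = [{"popularity": "desc"}, {"_id": "asc"}]  # second field is the tiebreaker

page1 = next_page_payload(base, sort_spec)
# After fetching page1, pass the last hit's "sort" values to get the next page:
page2 = next_page_payload(base, sort_spec, last_hit_sort=[731, "abc123"])
```

Unlike from/size, the cost of each page stays roughly constant no matter how deep the user paginates, which is why deep pagination under load is so much cheaper this way.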
Improve indexing efficiency
- Use the _bulk API instead of single-document writes
- Tune bulk batch sizes based on cluster behavior
- Increase refresh intervals during heavy indexing
- Avoid unnecessary replicas in write-heavy test environments
- Keep mappings explicit to prevent dynamic mapping overhead
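To make batch size an easy knob to tune, bulk payloads can be generated from a single document list; a sketch (the helper name and sample documents are illustrative):

```python
import json


def bulk_batches(index, docs, batch_size=500):
    """Yield NDJSON bodies for the _bulk API, one body per batch_size
    documents, so the batch size can be varied as a test parameter."""
    for start in range(0, len(docs), batch_size):
        lines = []
        for doc in docs[start:start + batch_size]:
            lines.append(json.dumps({"index": {"_index": index}}))
            lines.append(json.dumps(doc))
        yield "\n".join(lines) + "\n"  # _bulk bodies must end with a newline


docs = [{"sku": f"SKU-{i}"} for i in range(1200)]
bodies = list(bulk_batches("products", docs, batch_size=500))
# 1200 docs at 500 per batch -> 3 request bodies (500, 500, 200 docs)
```

Running the same test with batch sizes of, say, 100, 500, and 2000 documents makes it straightforward to find where throughput plateaus on your cluster.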
Tune cluster design
- Review shard counts and avoid oversharding
- Separate hot and warm workloads if needed
- Use faster disks for indexing-heavy clusters
- Size heap carefully, but don’t over-allocate beyond best practices
- Distribute traffic evenly to avoid hot nodes or hot shards
Test with realistic workloads
Elasticsearch load testing is only useful if it reflects actual user behavior. Include:
- Real search terms
- Typical filters and sorts
- Actual document sizes
- Production-like bulk sizes
- A realistic mix of reads and writes
With LoadForge, you can create these scenarios and run them from multiple global test locations to simulate traffic closer to your real user base.
Common Pitfalls to Avoid
Elasticsearch performance testing can go wrong if the test design is unrealistic or incomplete.
Testing only one simple query
A single lightweight query rarely represents production traffic. Include a variety of query types, filters, and aggregations.
Ignoring cluster state
If the cluster is already under pressure before the test starts, your results will be misleading. Begin from a known baseline.
Using tiny datasets
Elasticsearch behaves very differently on small datasets than on production-scale indices. Test against representative data volumes whenever possible.
Forgetting authentication overhead
If your real deployment uses Basic Auth or API keys, include that in the test. Authentication can affect performance and request routing.
Overusing refresh during indexing tests
Forcing frequent refreshes can drastically reduce indexing throughput. Unless your application truly requires immediate search visibility, avoid unrealistic refresh settings.
Not correlating application and cluster metrics
LoadForge shows request-level performance, but Elasticsearch node metrics explain why the performance looks that way. Use both.
Running search-only tests for write-heavy systems
If your application continuously writes logs, products, metrics, or events, pure search tests won’t reveal real bottlenecks. Use mixed workload testing.
Conclusion
Elasticsearch load testing is one of the best ways to validate search latency, indexing speed, and overall cluster resilience before traffic spikes expose weaknesses in production. By testing realistic queries, bulk indexing workflows, and mixed workloads, you can identify bottlenecks in shards, thread pools, heap usage, disk I/O, and query design long before they become incidents.
LoadForge makes Elasticsearch performance testing practical with Locust-based scripting, cloud-based infrastructure, distributed testing, real-time reporting, and CI/CD integration. Whether you’re benchmarking a new cluster, validating a schema change, or running a full Elasticsearch stress testing exercise, LoadForge helps you simulate real traffic patterns at scale.
If you’re ready to benchmark your Elasticsearch cluster with realistic load, try LoadForge and start building tests that reflect how your users actually search and index data.
LoadForge Team
LoadForge is a load and performance testing platform built on Locust. Our team has been shipping load tests against production systems since 2018, and we write these guides from real customer engagements.