
Introduction
Elasticsearch is built for speed, but search performance can change dramatically under real-world traffic. A cluster that feels fast during manual testing may struggle when hundreds or thousands of users run queries at the same time, when bulk indexing jobs overlap with search traffic, or when expensive aggregations hit large datasets. That’s why Elasticsearch load testing is essential before production releases, infrastructure changes, or major schema updates.
With LoadForge, you can run realistic Elasticsearch performance testing at scale using Locust-based Python scripts. This makes it easy to simulate concurrent search requests, bulk indexing operations, authenticated API traffic, and mixed workloads across distributed cloud infrastructure. Whether you want to benchmark search latency, identify indexing bottlenecks, or stress test your cluster under peak traffic, LoadForge gives you the tools to do it with real-time reporting, global test locations, and CI/CD integration.
In this guide, you’ll learn how to load test Elasticsearch with LoadForge, including practical Locust scripts that target realistic Elasticsearch endpoints and payloads.
Prerequisites
Before you start load testing Elasticsearch, make sure you have the following:
- A running Elasticsearch cluster or managed deployment
- Network access to the Elasticsearch HTTP API
- Valid credentials if security is enabled
- One or more test indices with representative mappings and data
- A clear goal for your test:
  - Search latency benchmarking
  - Bulk indexing throughput measurement
  - Stress testing cluster limits
  - Comparing performance before and after tuning
- LoadForge account access for cloud-based distributed testing
You should also know:
- Your Elasticsearch version
- Whether you use Basic Auth, API keys, or another authentication method
- Which endpoints and query patterns represent real production traffic
- Any index lifecycle, shard, or replica settings that could affect performance
For safe performance testing, avoid running destructive tests against production unless you fully understand the impact. Heavy indexing, force refreshes, and expensive aggregations can affect cluster stability.
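Part of testing safely is starting from a known-good baseline. As a minimal sketch, the JSON body returned by Elasticsearch's GET /_cluster/health endpoint can be checked before a run begins (the helper name and the exact readiness criteria here are illustrative assumptions, not an official check):

```python
def cluster_is_ready(health: dict) -> bool:
    """Decide whether a load test should start, based on the JSON body
    returned by GET /_cluster/health.

    A 'red' status, or shards still relocating or initializing, suggests
    the cluster is not at a stable baseline for benchmarking.
    """
    if health.get("status") not in ("green", "yellow"):
        return False
    if health.get("relocating_shards", 0) > 0:
        return False
    if health.get("initializing_shards", 0) > 0:
        return False
    return True


# Abbreviated example _cluster/health bodies:
baseline = {"status": "green", "relocating_shards": 0, "initializing_shards": 0}
rebalancing = {"status": "yellow", "relocating_shards": 2, "initializing_shards": 0}
```

Gating a test on a check like this avoids attributing pre-existing cluster stress to your load scenario.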
Understanding Elasticsearch Under Load
Elasticsearch performance depends on much more than raw CPU and memory. Under concurrent traffic, several internal behaviors shape how the cluster responds.
Search workload characteristics
Search requests can vary from simple term queries to expensive multi-field full-text searches with filters, sorting, highlighting, and aggregations. Query latency often increases when:
- Queries touch many shards
- Large result windows are requested
- Deep pagination is used
- Aggregations scan large datasets
- Sorts run on unanalyzed or high-cardinality fields
- Cache hit rates are low
A query that returns in 30 ms during single-user testing may take much longer when many users trigger the same or similar searches concurrently.
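Because of caching, sending the exact same query body on every request can make a test unrealistically fast. One common technique is to vary the search term per request; a sketch (the term list and field boosts are illustrative assumptions):

```python
import random

SEARCH_TERMS = ["wireless headphones", "usb-c dock", "4k monitor", "gaming keyboard"]


def build_search_payload(terms=SEARCH_TERMS, rng=random):
    """Build a multi_match search body with a randomly chosen term, so
    repeated requests don't all resolve from the shard request cache."""
    return {
        "query": {
            "multi_match": {
                "query": rng.choice(terms),
                "fields": ["name^3", "description", "category"],
            }
        },
        "size": 20,
    }


payload = build_search_payload()
```

Drawing terms from a pool of real production queries gives more representative cache-hit behavior than either a single fixed query or fully random strings.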
Indexing workload characteristics
Indexing performance is affected by:
- Bulk request size
- Refresh interval
- Number of shards and replicas
- Mapping complexity
- Ingest pipelines
- Disk throughput
- Merge pressure
If your application indexes documents while users are searching, you need a mixed workload test. Search and indexing compete for CPU, memory, and I/O, and this is where Elasticsearch stress testing becomes especially valuable.
Common Elasticsearch bottlenecks
When load testing Elasticsearch, watch for these common bottlenecks:
- High CPU on data nodes from query execution or aggregation processing
- Heap pressure from field data, aggregations, or large result sets
- Slow disk I/O during indexing and segment merges
- Thread pool saturation for search or write operations
- Hot shards receiving disproportionate traffic
- Slow ingest pipelines with enrichments or scripts
- Network latency between clients and cluster nodes
A good Elasticsearch load test should reflect your real usage patterns rather than only sending a single repeated request.
Writing Your First Load Test
Let’s start with a basic Elasticsearch search latency test. This example simulates users searching a products index using a realistic e-commerce query. It uses Basic Authentication and hits the _search endpoint directly.
Basic search load test
from locust import HttpUser, task, between
import json


class ElasticsearchSearchUser(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        self.client.auth = ("elastic", "your-password")
        self.headers = {
            "Content-Type": "application/json"
        }

    @task
    def search_products(self):
        payload = {
            "query": {
                "bool": {
                    "must": [
                        {
                            "multi_match": {
                                "query": "wireless headphones",
                                "fields": ["name^3", "description", "category"]
                            }
                        }
                    ],
                    "filter": [
                        {"term": {"in_stock": True}},
                        {"range": {"price": {"gte": 50, "lte": 300}}}
                    ]
                }
            },
            "sort": [
                {"_score": "desc"},
                {"popularity": "desc"}
            ],
            "size": 20
        }

        with self.client.post(
            "/products/_search",
            data=json.dumps(payload),
            headers=self.headers,
            name="POST /products/_search",
            catch_response=True
        ) as response:
            if response.status_code != 200:
                response.failure(f"Search failed with status {response.status_code}")
                return

            data = response.json()
            hits = data.get("hits", {}).get("hits", [])
            if not isinstance(hits, list):
                response.failure("Invalid Elasticsearch search response")

What this test does
This script simulates a user who:
- Waits 1 to 3 seconds between actions
- Authenticates with Elasticsearch using Basic Auth
- Sends a realistic JSON search request to /products/_search
- Validates the response structure
This is a good starting point for measuring baseline search latency. In LoadForge, you can scale this script across many virtual users and use distributed testing to see how latency behaves as concurrency increases.
Running this test effectively
For a useful benchmark:
- Start with 10–25 users
- Increase gradually to 100, 250, or more
- Watch median, p95, and p99 response times
- Compare latency as you increase user counts
- Monitor Elasticsearch node metrics alongside LoadForge results
A single average response time is not enough. Tail latency is often where Elasticsearch issues become visible.
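To make the tail-latency point concrete, p95 and p99 can be computed from raw latency samples with the common nearest-rank method (a sketch; LoadForge reports these for you, this just shows what the numbers mean):

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the value at position ceil(pct/100 * n)
    in the sorted sample list."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


# Ten response times in milliseconds; one slow outlier.
latencies_ms = [28, 30, 31, 33, 35, 40, 42, 55, 120, 480]

p50 = percentile(latencies_ms, 50)  # 35
p95 = percentile(latencies_ms, 95)  # 480
```

Here the median is a healthy 35 ms while p95 is 480 ms: averages hide exactly the requests that hurt real users.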
Advanced Load Testing Scenarios
Basic search tests are useful, but most production clusters handle more complex traffic. Below are several advanced Elasticsearch load testing scenarios you can use in LoadForge.
Authenticated search with API key and aggregations
Many Elasticsearch deployments use API keys instead of Basic Auth. This example tests a dashboard-style query with filters and aggregations, which is common in analytics applications.
from locust import HttpUser, task, between
import json
import random


class ElasticsearchAnalyticsUser(HttpUser):
    wait_time = between(2, 5)

    def on_start(self):
        api_key = "your-base64-encoded-api-key"
        self.headers = {
            "Content-Type": "application/json",
            "Authorization": f"ApiKey {api_key}"
        }

    @task
    def search_orders_dashboard(self):
        regions = ["us-east", "us-west", "eu-central", "ap-southeast"]
        statuses = ["completed", "processing", "shipped"]

        payload = {
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"region.keyword": random.choice(regions)}},
                        {"terms": {"status.keyword": statuses}},
                        {
                            "range": {
                                "order_date": {
                                    "gte": "now-30d/d",
                                    "lte": "now/d"
                                }
                            }
                        }
                    ]
                }
            },
            "aggs": {
                "sales_by_status": {
                    "terms": {
                        "field": "status.keyword"
                    }
                },
                "revenue_over_time": {
                    "date_histogram": {
                        "field": "order_date",
                        "calendar_interval": "day"
                    },
                    "aggs": {
                        "total_revenue": {
                            "sum": {
                                "field": "total_amount"
                            }
                        }
                    }
                }
            },
            "size": 0
        }

        with self.client.post(
            "/orders/_search",
            data=json.dumps(payload),
            headers=self.headers,
            name="POST /orders/_search with aggs",
            catch_response=True
        ) as response:
            if response.status_code != 200:
                response.failure(f"Aggregation query failed: {response.text}")
                return

            data = response.json()
            aggs = data.get("aggregations")
            if not aggs or "sales_by_status" not in aggs:
                response.failure("Missing expected aggregations in response")

Why this scenario matters
Aggregations can be much more expensive than standard searches. This kind of Elasticsearch performance testing helps you answer questions like:
- How many concurrent dashboard users can the cluster handle?
- Do aggregations create CPU spikes?
- Does p95 latency remain acceptable under load?
- Are certain fields or date histograms too expensive?
Bulk indexing performance test
Indexing speed is just as important as search speed. The next example simulates a service sending bulk product updates to Elasticsearch using the _bulk API.
from locust import HttpUser, task, between
import json
import random
import time
import uuid


class ElasticsearchBulkIndexUser(HttpUser):
    wait_time = between(1, 2)

    def on_start(self):
        self.client.auth = ("elastic", "your-password")
        self.headers = {
            "Content-Type": "application/x-ndjson"
        }

    @task
    def bulk_index_products(self):
        actions = []
        for _ in range(100):
            doc_id = str(uuid.uuid4())
            timestamp = int(time.time() * 1000)

            action_meta = {
                "index": {
                    "_index": "products",
                    "_id": doc_id
                }
            }
            document = {
                "name": random.choice([
                    "Wireless Mouse",
                    "Gaming Keyboard",
                    "4K Monitor",
                    "USB-C Dock",
                    "Bluetooth Speaker"
                ]),
                "category": random.choice(["electronics", "accessories", "office"]),
                "price": round(random.uniform(19.99, 499.99), 2),
                "in_stock": random.choice([True, True, True, False]),
                "brand": random.choice(["LogiTech", "KeyMaster", "ViewPro", "DockHub"]),
                "popularity": random.randint(1, 1000),
                "updated_at": timestamp
            }

            actions.append(json.dumps(action_meta))
            actions.append(json.dumps(document))

        bulk_payload = "\n".join(actions) + "\n"

        with self.client.post(
            "/_bulk?refresh=false",
            data=bulk_payload,
            headers=self.headers,
            name="POST /_bulk",
            catch_response=True
        ) as response:
            if response.status_code != 200:
                response.failure(f"Bulk indexing failed: {response.text}")
                return

            data = response.json()
            if data.get("errors") is True:
                response.failure("Bulk request contained indexing errors")

What this test reveals
This script measures how your cluster handles sustained write traffic. It helps uncover:
- Bulk indexing throughput limits
- Slow ingest or mapping operations
- Disk and merge pressure
- Write thread pool saturation
- Impact of indexing on search responsiveness
In LoadForge, you can run this as a standalone indexing benchmark or combine it with search users in a mixed workload test.
Mixed search and indexing workload
Most real Elasticsearch environments don’t only search or only index. They do both. This example combines search and indexing behaviors in a single test user class, weighted toward search traffic.
from locust import HttpUser, task, between
import random
import uuid
import time


class ElasticsearchMixedWorkloadUser(HttpUser):
    wait_time = between(1, 4)

    def on_start(self):
        self.client.auth = ("elastic", "your-password")

    @task(4)
    def search_catalog(self):
        headers = {"Content-Type": "application/json"}
        query_terms = [
            "laptop stand",
            "mechanical keyboard",
            "noise cancelling earbuds",
            "webcam",
            "portable SSD"
        ]
        payload = {
            "query": {
                "bool": {
                    "must": [
                        {
                            "multi_match": {
                                "query": random.choice(query_terms),
                                "fields": ["name^2", "description", "brand", "category"]
                            }
                        }
                    ],
                    "filter": [
                        {"term": {"in_stock": True}}
                    ]
                }
            },
            "size": 10
        }
        self.client.post(
            "/products/_search",
            json=payload,
            headers=headers,
            name="Mixed: product search"
        )

    @task(1)
    def index_inventory_update(self):
        headers = {"Content-Type": "application/json"}
        doc_id = str(uuid.uuid4())
        payload = {
            "sku": f"SKU-{random.randint(100000, 999999)}",
            "name": random.choice(["Laptop Stand", "USB Hub", "Office Chair", "Desk Lamp"]),
            "category": random.choice(["office", "electronics", "furniture"]),
            "price": round(random.uniform(25.0, 250.0), 2),
            "in_stock": random.choice([True, False]),
            "inventory_count": random.randint(0, 500),
            "last_synced_at": int(time.time() * 1000)
        }
        self.client.put(
            f"/inventory/_doc/{doc_id}?refresh=false",
            json=payload,
            headers=headers,
            name="Mixed: inventory index"
        )

Why mixed workload testing is important
This is often the most realistic Elasticsearch stress testing scenario. It helps you understand:
- Whether indexing degrades search latency
- If search traffic slows write throughput
- How the cluster behaves during normal business activity
- Whether node roles or shard allocation need adjustment
LoadForge is especially useful here because you can scale mixed workloads across distributed generators and observe real-time reporting while the test runs.
Analyzing Your Results
After running your Elasticsearch load test, focus on more than just “requests per second.”
Key LoadForge metrics to review
In LoadForge, pay close attention to:
- Average response time
- Median response time
- p95 and p99 latency
- Requests per second
- Failure rate
- Response time trends during ramp-up and sustained load
For Elasticsearch, p95 and p99 are especially important. Search systems often appear healthy on average while a subset of requests becomes very slow under contention.
Elasticsearch-specific signals to correlate
Alongside LoadForge metrics, review cluster telemetry such as:
- Search latency and query throughput
- Indexing rate
- JVM heap usage
- CPU utilization
- Disk I/O and merge activity
- Search and write thread pool queue sizes
- Garbage collection frequency
- Rejected requests
- Hot shard patterns
If response times rise sharply during high concurrency, check whether:
- Search queues are filling up
- Heap usage is too high
- Aggregations are causing CPU saturation
- Bulk indexing is increasing disk pressure
- Too many shards are being queried per request
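Thread pool state is easy to inspect with the _cat/thread_pool API, which returns plain whitespace-separated columns. A small sketch for turning that text into something you can alert on (the endpoint and column list are real; the helper and thresholds are illustrative assumptions):

```python
def parse_thread_pool(cat_text: str):
    """Parse the text output of
    GET _cat/thread_pool/search,write?h=node_name,name,active,queue,rejected
    into dicts, so queue depth and rejection counts can be checked."""
    rows = []
    for line in cat_text.strip().splitlines():
        node, name, active, queue, rejected = line.split()
        rows.append({
            "node": node,
            "pool": name,
            "active": int(active),
            "queue": int(queue),
            "rejected": int(rejected),
        })
    return rows


# Example _cat output captured during a test run (fabricated sample data):
sample = """node-1 search 12 48 0
node-1 write 4 0 0
node-2 search 13 210 57"""

# Flag pools with deep queues or any rejections (thresholds are arbitrary here).
saturated = [r for r in parse_thread_pool(sample) if r["queue"] > 100 or r["rejected"] > 0]
```

In this sample, node-2's search pool is queueing deeply and already rejecting work, which would show up in LoadForge as rising tail latency and failures.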
Interpreting common result patterns
Low average latency but high p99
This usually means most requests are fine, but some queries are expensive or blocked by contention. Look for complex searches, large aggregations, or hot shards.
Good search performance until indexing starts
This often points to disk I/O contention, merge pressure, or insufficient cluster resources for mixed workloads.
Rising failures under load
Elasticsearch may be rejecting work due to thread pool saturation, request timeouts, or circuit breaker limits.
Flat throughput with rising latency
This suggests the cluster has reached capacity and is queueing requests rather than processing them faster.
Performance Optimization Tips
Once your Elasticsearch performance testing reveals bottlenecks, these optimization steps often help.
Optimize query patterns
- Avoid deep pagination with from and size; use search_after where possible
- Prefer filters for exact matches instead of scoring queries
- Limit expensive wildcard and regex queries
- Reduce aggregation cardinality when possible
- Return only required fields using _source filtering
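As a sketch of the first tip, search_after pagination passes the sort values of the previous page's last hit instead of a growing from offset (the helper, field names, and tiebreaker field here are illustrative assumptions; Elasticsearch documentation recommends including a tiebreaker in the sort):

```python
def next_page_payload(base_query: dict, sort_fields, last_hit_sort=None, size=20):
    """Build a search body that pages with search_after instead of
    from/size. last_hit_sort is the 'sort' array of the final hit on
    the previous page (None for the first page)."""
    payload = {
        "query": base_query,
        "sort": sort_fields,
        "size": size,
    }
    if last_hit_sort is not None:
        payload["search_after"] = last_hit_sort
    return payload


base = {"term": {"in_stock": True}}
sort_spec = [{"popularity": "desc"}, {"_id": "asc"}]  # second field is the tiebreaker

page1 = next_page_payload(base, sort_spec)
# After fetching page1, pass the last hit's "sort" values to get the next page:
page2 = next_page_payload(base, sort_spec, last_hit_sort=[731, "abc123"])
```

Unlike from/size, the cost of each page stays roughly constant no matter how deep the user paginates, which is why deep pagination under load is so much cheaper this way.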
Improve indexing efficiency
- Use the _bulk API instead of single-document writes
- Tune bulk batch sizes based on cluster behavior
- Increase refresh intervals during heavy indexing
- Avoid unnecessary replicas in write-heavy test environments
- Keep mappings explicit to prevent dynamic mapping overhead
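To make batch size an easy knob to tune, bulk payloads can be generated from a single document list; a sketch (the helper name and sample documents are illustrative):

```python
import json


def bulk_batches(index, docs, batch_size=500):
    """Yield NDJSON bodies for the _bulk API, one body per batch_size
    documents, so the batch size can be varied as a test parameter."""
    for start in range(0, len(docs), batch_size):
        lines = []
        for doc in docs[start:start + batch_size]:
            lines.append(json.dumps({"index": {"_index": index}}))
            lines.append(json.dumps(doc))
        yield "\n".join(lines) + "\n"  # _bulk bodies must end with a newline


docs = [{"sku": f"SKU-{i}"} for i in range(1200)]
bodies = list(bulk_batches("products", docs, batch_size=500))
# 1200 docs at 500 per batch -> 3 request bodies (500, 500, 200 docs)
```

Running the same test with batch sizes of, say, 100, 500, and 2000 documents makes it straightforward to find where throughput plateaus on your cluster.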
Tune cluster design
- Review shard counts and avoid oversharding
- Separate hot and warm workloads if needed
- Use faster disks for indexing-heavy clusters
- Size heap carefully, but don’t over-allocate beyond best practices
- Distribute traffic evenly to avoid hot nodes or hot shards
Test with realistic workloads
Elasticsearch load testing is only useful if it reflects actual user behavior. Include:
- Real search terms
- Typical filters and sorts
- Actual document sizes
- Production-like bulk sizes
- A realistic mix of reads and writes
With LoadForge, you can create these scenarios and run them from multiple global test locations to simulate traffic closer to your real user base.
Common Pitfalls to Avoid
Elasticsearch performance testing can go wrong if the test design is unrealistic or incomplete.
Testing only one simple query
A single lightweight query rarely represents production traffic. Include a variety of query types, filters, and aggregations.
Ignoring cluster state
If the cluster is already under pressure before the test starts, your results will be misleading. Begin from a known baseline.
Using tiny datasets
Elasticsearch behaves very differently on small datasets than on production-scale indices. Test against representative data volumes whenever possible.
Forgetting authentication overhead
If your real deployment uses Basic Auth or API keys, include that in the test. Authentication can affect performance and request routing.
Overusing refresh during indexing tests
Forcing frequent refreshes can drastically reduce indexing throughput. Unless your application truly requires immediate search visibility, avoid unrealistic refresh settings.
Not correlating application and cluster metrics
LoadForge shows request-level performance, but Elasticsearch node metrics explain why the performance looks that way. Use both.
Running search-only tests for write-heavy systems
If your application continuously writes logs, products, metrics, or events, pure search tests won’t reveal real bottlenecks. Use mixed workload testing.
Conclusion
Elasticsearch load testing is one of the best ways to validate search latency, indexing speed, and overall cluster resilience before traffic spikes expose weaknesses in production. By testing realistic queries, bulk indexing workflows, and mixed workloads, you can identify bottlenecks in shards, thread pools, heap usage, disk I/O, and query design long before they become incidents.
LoadForge makes Elasticsearch performance testing practical with Locust-based scripting, cloud-based infrastructure, distributed testing, real-time reporting, and CI/CD integration. Whether you’re benchmarking a new cluster, validating a schema change, or running a full Elasticsearch stress testing exercise, LoadForge helps you simulate real traffic patterns at scale.
If you’re ready to benchmark your Elasticsearch cluster with realistic load, try LoadForge and start building tests that reflect how your users actually search and index data.
LoadForge Team
LoadForge is a load and performance testing platform built on Locust. Our team has been shipping load tests against production systems since 2018, and we write these guides from real customer engagements.