Elasticsearch Load Testing with LoadForge

Introduction

Elasticsearch is built for speed, but search performance can change dramatically under real-world traffic. A cluster that feels fast during manual testing may struggle when hundreds or thousands of users run queries at the same time, when bulk indexing jobs overlap with search traffic, or when expensive aggregations hit large datasets. That’s why Elasticsearch load testing is essential before production releases, infrastructure changes, or major schema updates.

With LoadForge, you can run realistic Elasticsearch performance testing at scale using Locust-based Python scripts. This makes it easy to simulate concurrent search requests, bulk indexing operations, authenticated API traffic, and mixed workloads across distributed cloud infrastructure. Whether you want to benchmark search latency, identify indexing bottlenecks, or stress test your cluster under peak traffic, LoadForge gives you the tools to do it with real-time reporting, global test locations, and CI/CD integration.

In this guide, you’ll learn how to load test Elasticsearch with LoadForge, including practical Locust scripts that target realistic Elasticsearch endpoints and payloads.

Prerequisites

Before you start load testing Elasticsearch, make sure you have the following:

  • A running Elasticsearch cluster or managed deployment
  • Network access to the Elasticsearch HTTP API
  • Valid credentials if security is enabled
  • One or more test indices with representative mappings and data
  • A clear goal for your test:
    • Search latency benchmarking
    • Bulk indexing throughput measurement
    • Stress testing cluster limits
    • Comparing performance before and after tuning
  • LoadForge account access for cloud-based distributed testing

You should also know:

  • Your Elasticsearch version
  • Whether you use Basic Auth, API keys, or another authentication method
  • Which endpoints and query patterns represent real production traffic
  • Any index lifecycle, shard, or replica settings that could affect performance

For safe performance testing, avoid running destructive tests against production unless you fully understand the impact. Heavy indexing, force refreshes, and expensive aggregations can affect cluster stability.

Understanding Elasticsearch Under Load

Elasticsearch performance depends on much more than raw CPU and memory. Under concurrent traffic, several internal behaviors shape how the cluster responds.

Search workload characteristics

Search requests can vary from simple term queries to expensive multi-field full-text searches with filters, sorting, highlighting, and aggregations. Query latency often increases when:

  • Queries touch many shards
  • Large result windows are requested
  • Deep pagination is used
  • Aggregations scan large datasets
  • Sorts run on unanalyzed or high-cardinality fields
  • Cache hit rates are low

A query that returns in 30 ms during single-user testing may take much longer when many users trigger the same or similar searches concurrently.

Indexing workload characteristics

Indexing performance is affected by:

  • Bulk request size
  • Refresh interval
  • Number of shards and replicas
  • Mapping complexity
  • Ingest pipelines
  • Disk throughput
  • Merge pressure

If your application indexes documents while users are searching, you need a mixed workload test. Search and indexing compete for CPU, memory, and I/O, and this is where Elasticsearch stress testing becomes especially valuable.

Common Elasticsearch bottlenecks

When load testing Elasticsearch, watch for these common bottlenecks:

  • High CPU on data nodes from query execution or aggregation processing
  • Heap pressure from field data, aggregations, or large result sets
  • Slow disk I/O during indexing and segment merges
  • Thread pool saturation for search or write operations
  • Hot shards receiving disproportionate traffic
  • Slow ingest pipelines with enrichments or scripts
  • Network latency between clients and cluster nodes

A good Elasticsearch load test should reflect your real usage patterns rather than only sending a single repeated request.

Writing Your First Load Test

Let’s start with a basic Elasticsearch search latency test. This example simulates users searching a products index using a realistic e-commerce query. It uses Basic Authentication and hits the _search endpoint directly.

Basic search load test

python
from locust import HttpUser, task, between
import json
 
class ElasticsearchSearchUser(HttpUser):
    wait_time = between(1, 3)
 
    def on_start(self):
        self.client.auth = ("elastic", "your-password")
        self.headers = {
            "Content-Type": "application/json"
        }
 
    @task
    def search_products(self):
        payload = {
            "query": {
                "bool": {
                    "must": [
                        {
                            "multi_match": {
                                "query": "wireless headphones",
                                "fields": ["name^3", "description", "category"]
                            }
                        }
                    ],
                    "filter": [
                        {"term": {"in_stock": True}},
                        {"range": {"price": {"gte": 50, "lte": 300}}}
                    ]
                }
            },
            "sort": [
                {"_score": "desc"},
                {"popularity": "desc"}
            ],
            "size": 20
        }
 
        with self.client.post(
            "/products/_search",
            data=json.dumps(payload),
            headers=self.headers,
            name="POST /products/_search",
            catch_response=True
        ) as response:
            if response.status_code != 200:
                response.failure(f"Search failed with status {response.status_code}")
                return
 
            data = response.json()
            hits = data.get("hits", {}).get("hits", [])
            if not isinstance(hits, list):
                response.failure("Invalid Elasticsearch search response")

What this test does

This script simulates a user who:

  • Waits 1 to 3 seconds between actions
  • Authenticates with Elasticsearch using Basic Auth
  • Sends a realistic JSON search request to /products/_search
  • Validates the response structure

This is a good starting point for measuring baseline search latency. In LoadForge, you can scale this script across many virtual users and use distributed testing to see how latency behaves as concurrency increases.

Running this test effectively

For a useful benchmark:

  • Start with 10–25 users
  • Increase gradually to 100, 250, or more
  • Watch median, p95, and p99 response times
  • Compare latency as you increase user counts
  • Monitor Elasticsearch node metrics alongside LoadForge results

A single average response time is not enough. Tail latency is often where Elasticsearch issues become visible.
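LoadForge configures ramp-up in the test settings, but if you want to reason about explicit stages, the logic can be sketched as a plain function. The stage durations and user counts below are illustrative examples, not recommendations for your cluster.

```python
# Sketch of a staged ramp. Durations and user counts are illustrative.
STAGES = [
    # (duration_seconds, target_users)
    (120, 25),    # warm up caches at low concurrency
    (240, 100),   # moderate load
    (360, 250),   # peak load
]

def staged_user_count(elapsed_seconds):
    """Return the target user count for the elapsed test time,
    or None once all stages are complete (test should stop)."""
    cumulative = 0
    for duration, users in STAGES:
        cumulative += duration
        if elapsed_seconds < cumulative:
            return users
    return None
```

In a local Locust run, a LoadTestShape subclass could return this value from its tick() method together with a spawn rate; in LoadForge, the equivalent ramp is configured directly in the test settings.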

Advanced Load Testing Scenarios

Basic search tests are useful, but most production clusters handle more complex traffic. Below are several advanced Elasticsearch load testing scenarios you can use in LoadForge.

Authenticated search with API key and aggregations

Many Elasticsearch deployments use API keys instead of Basic Auth. This example tests a dashboard-style query with filters and aggregations, which is common in analytics applications.

python
from locust import HttpUser, task, between
import json
import random
 
class ElasticsearchAnalyticsUser(HttpUser):
    wait_time = between(2, 5)
 
    def on_start(self):
        api_key = "your-base64-encoded-api-key"
        self.headers = {
            "Content-Type": "application/json",
            "Authorization": f"ApiKey {api_key}"
        }
 
    @task
    def search_orders_dashboard(self):
        regions = ["us-east", "us-west", "eu-central", "ap-southeast"]
        statuses = ["completed", "processing", "shipped"]
 
        payload = {
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"region.keyword": random.choice(regions)}},
                        {"terms": {"status.keyword": statuses}},
                        {
                            "range": {
                                "order_date": {
                                    "gte": "now-30d/d",
                                    "lte": "now/d"
                                }
                            }
                        }
                    ]
                }
            },
            "aggs": {
                "sales_by_status": {
                    "terms": {
                        "field": "status.keyword"
                    }
                },
                "revenue_over_time": {
                    "date_histogram": {
                        "field": "order_date",
                        "calendar_interval": "day"
                    },
                    "aggs": {
                        "total_revenue": {
                            "sum": {
                                "field": "total_amount"
                            }
                        }
                    }
                }
            },
            "size": 0
        }
 
        with self.client.post(
            "/orders/_search",
            data=json.dumps(payload),
            headers=self.headers,
            name="POST /orders/_search with aggs",
            catch_response=True
        ) as response:
            if response.status_code != 200:
                response.failure(f"Aggregation query failed: {response.text}")
                return
 
            data = response.json()
            aggs = data.get("aggregations")
            if not aggs or "sales_by_status" not in aggs:
                response.failure("Missing expected aggregations in response")

Why this scenario matters

Aggregations can be much more expensive than standard searches. This kind of Elasticsearch performance testing helps you answer questions like:

  • How many concurrent dashboard users can the cluster handle?
  • Do aggregations create CPU spikes?
  • Does p95 latency remain acceptable under load?
  • Are certain fields or date histograms too expensive?

Bulk indexing performance test

Indexing speed is just as important as search speed. The next example simulates a service sending bulk product updates to Elasticsearch using the _bulk API.

python
from locust import HttpUser, task, between
import json
import random
import time
import uuid
 
class ElasticsearchBulkIndexUser(HttpUser):
    wait_time = between(1, 2)
 
    def on_start(self):
        self.client.auth = ("elastic", "your-password")
        self.headers = {
            "Content-Type": "application/x-ndjson"
        }
 
    @task
    def bulk_index_products(self):
        actions = []
 
        for _ in range(100):
            doc_id = str(uuid.uuid4())
            timestamp = int(time.time() * 1000)
 
            action_meta = {
                "index": {
                    "_index": "products",
                    "_id": doc_id
                }
            }
 
            document = {
                "name": random.choice([
                    "Wireless Mouse",
                    "Gaming Keyboard",
                    "4K Monitor",
                    "USB-C Dock",
                    "Bluetooth Speaker"
                ]),
                "category": random.choice(["electronics", "accessories", "office"]),
                "price": round(random.uniform(19.99, 499.99), 2),
                "in_stock": random.choice([True, True, True, False]),  # ~75% in stock
                "brand": random.choice(["LogiTech", "KeyMaster", "ViewPro", "DockHub"]),
                "popularity": random.randint(1, 1000),
                "updated_at": timestamp
            }
 
            actions.append(json.dumps(action_meta))
            actions.append(json.dumps(document))
 
        bulk_payload = "\n".join(actions) + "\n"
 
        with self.client.post(
            "/_bulk?refresh=false",
            data=bulk_payload,
            headers=self.headers,
            name="POST /_bulk",
            catch_response=True
        ) as response:
            if response.status_code != 200:
                response.failure(f"Bulk indexing failed: {response.text}")
                return
 
            data = response.json()
            if data.get("errors") is True:
                response.failure("Bulk request contained indexing errors")

What this test reveals

This script measures how your cluster handles sustained write traffic. It helps uncover:

  • Bulk indexing throughput limits
  • Slow ingest or mapping operations
  • Disk and merge pressure
  • Write thread pool saturation
  • Impact of indexing on search responsiveness

In LoadForge, you can run this as a standalone indexing benchmark or combine it with search users in a mixed workload test.

Mixed search and indexing workload

Most real Elasticsearch environments don’t only search or only index. They do both. This example combines search and indexing behaviors in a single test user class, weighted toward search traffic.

python
from locust import HttpUser, task, between
import json
import random
import uuid
import time
 
class ElasticsearchMixedWorkloadUser(HttpUser):
    wait_time = between(1, 4)
 
    def on_start(self):
        self.client.auth = ("elastic", "your-password")
 
    @task(4)
    def search_catalog(self):
        headers = {"Content-Type": "application/json"}
        query_terms = [
            "laptop stand",
            "mechanical keyboard",
            "noise cancelling earbuds",
            "webcam",
            "portable SSD"
        ]
 
        payload = {
            "query": {
                "bool": {
                    "must": [
                        {
                            "multi_match": {
                                "query": random.choice(query_terms),
                                "fields": ["name^2", "description", "brand", "category"]
                            }
                        }
                    ],
                    "filter": [
                        {"term": {"in_stock": True}}
                    ]
                }
            },
            "size": 10
        }
 
        self.client.post(
            "/products/_search",
            json=payload,
            headers=headers,
            name="Mixed: product search"
        )
 
    @task(1)
    def index_inventory_update(self):
        headers = {"Content-Type": "application/json"}
        doc_id = str(uuid.uuid4())
 
        payload = {
            "sku": f"SKU-{random.randint(100000, 999999)}",
            "name": random.choice(["Laptop Stand", "USB Hub", "Office Chair", "Desk Lamp"]),
            "category": random.choice(["office", "electronics", "furniture"]),
            "price": round(random.uniform(25.0, 250.0), 2),
            "in_stock": random.choice([True, False]),
            "inventory_count": random.randint(0, 500),
            "last_synced_at": int(time.time() * 1000)
        }
 
        self.client.put(
            f"/inventory/_doc/{doc_id}?refresh=false",
            json=payload,
            headers=headers,
            name="Mixed: inventory index"
        )

Why mixed workload testing is important

This is often the most realistic Elasticsearch stress testing scenario. It helps you understand:

  • Whether indexing degrades search latency
  • If search traffic slows write throughput
  • How the cluster behaves during normal business activity
  • Whether node roles or shard allocation need adjustment

LoadForge is especially useful here because you can scale mixed workloads across distributed generators and observe real-time reporting while the test runs.

Analyzing Your Results

After running your Elasticsearch load test, focus on more than just “requests per second.”

Key LoadForge metrics to review

In LoadForge, pay close attention to:

  • Average response time
  • Median response time
  • p95 and p99 latency
  • Requests per second
  • Failure rate
  • Response time trends during ramp-up and sustained load

For Elasticsearch, p95 and p99 are especially important. Search systems often appear healthy on average while a subset of requests becomes very slow under contention.

Elasticsearch-specific signals to correlate

Alongside LoadForge metrics, review cluster telemetry such as:

  • Search latency and query throughput
  • Indexing rate
  • JVM heap usage
  • CPU utilization
  • Disk I/O and merge activity
  • Search and write thread pool queue sizes
  • Garbage collection frequency
  • Rejected requests
  • Hot shard patterns

If response times rise sharply during high concurrency, check whether:

  • Search queues are filling up
  • Heap usage is too high
  • Aggregations are causing CPU saturation
  • Bulk indexing is increasing disk pressure
  • Too many shards are being queried per request
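A quick way to spot queue growth and rejections during a test is the _cat/thread_pool API. Below is a minimal sketch of parsing its output; the column set assumes a request like GET /_cat/thread_pool/search,write?v&h=node_name,name,active,queue,rejected, and the warning threshold is an arbitrary example.

```python
def parse_thread_pools(cat_output):
    """Parse _cat/thread_pool output (with a header row) into dicts."""
    lines = [line for line in cat_output.strip().splitlines() if line.strip()]
    headers = lines[0].split()
    return [dict(zip(headers, line.split())) for line in lines[1:]]

def saturated_pools(rows, queue_warn=50):
    """Flag pools with any rejections or a deep queue."""
    flagged = []
    for row in rows:
        if int(row["rejected"]) > 0 or int(row["queue"]) >= queue_warn:
            flagged.append((row["node_name"], row["name"]))
    return flagged
```

Polling this while a LoadForge test runs makes it easy to correlate rising p95/p99 latency with search or write queue growth on specific nodes.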

Interpreting common result patterns

Low average latency but high p99

This usually means most requests are fine, but some queries are expensive or blocked by contention. Look for complex searches, large aggregations, or hot shards.

Good search performance until indexing starts

This often points to disk I/O contention, merge pressure, or insufficient cluster resources for mixed workloads.

Rising failures under load

Elasticsearch may be rejecting work due to thread pool saturation, request timeouts, or circuit breaker limits.

Flat throughput with rising latency

This suggests the cluster has reached capacity and is queueing requests rather than processing them faster.

Performance Optimization Tips

Once your Elasticsearch performance testing reveals bottlenecks, these optimization steps often help.

Optimize query patterns

  • Avoid deep pagination with from and size; use search_after where possible
  • Prefer filters for exact matches instead of scoring queries
  • Limit expensive wildcard and regex queries
  • Reduce aggregation cardinality when possible
  • Return only required fields using _source filtering
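To make the search_after recommendation concrete, here is a hedged sketch of building a next-page query from the previous page's last hit. It assumes the sort includes a unique tiebreaker field (product_id here is a hypothetical keyword field); newer Elasticsearch versions also support a point-in-time (PIT) with the built-in _shard_doc tiebreaker.

```python
def first_page_query(query, page_size=20):
    """Initial query; the sort must end with a unique tiebreaker."""
    return {
        "query": query,
        "size": page_size,
        "sort": [
            {"popularity": "desc"},
            {"product_id": "asc"},  # hypothetical unique field as tiebreaker
        ],
    }

def next_page_query(prev_query, last_hit):
    """Build the follow-up page from the last hit's sort values,
    instead of deep pagination with from/size."""
    next_query = dict(prev_query)
    next_query["search_after"] = last_hit["sort"]
    return next_query
```

Each hit in the response carries a "sort" array, so paginating is just feeding the last hit of one page into the next request.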

Improve indexing efficiency

  • Use the _bulk API instead of single-document writes
  • Tune bulk batch sizes based on cluster behavior
  • Increase refresh intervals during heavy indexing
  • Avoid unnecessary replicas in write-heavy test environments
  • Keep mappings explicit to prevent dynamic mapping overhead
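Bulk batch size tuning can be roughly automated by reacting to the took time reported in each _bulk response. This is a heuristic sketch; the target latency, step size, and bounds are assumptions, not Elasticsearch guidance.

```python
def adjust_batch_size(current_size, took_ms, errors,
                      target_ms=500, min_size=50, max_size=2000):
    """Grow the batch while the cluster keeps up; back off on slow
    responses or partial failures. All thresholds are illustrative."""
    if errors or took_ms > 2 * target_ms:
        return max(min_size, current_size // 2)  # back off
    if took_ms < target_ms // 2:
        return min(max_size, current_size + 100)  # room to grow
    return current_size
```

In a Locust task, you would feed in the "took" and "errors" fields from each bulk response and use the returned size for the next batch.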

Tune cluster design

  • Review shard counts and avoid oversharding
  • Separate hot and warm workloads if needed
  • Use faster disks for indexing-heavy clusters
  • Size heap carefully, but don’t over-allocate beyond best practices
  • Distribute traffic evenly to avoid hot nodes or hot shards

Test with realistic workloads

Elasticsearch load testing is only useful if it reflects actual user behavior. Include:

  • Real search terms
  • Typical filters and sorts
  • Actual document sizes
  • Production-like bulk sizes
  • A realistic mix of reads and writes

With LoadForge, you can create these scenarios and run them from multiple global test locations to simulate traffic closer to your real user base.

Common Pitfalls to Avoid

Elasticsearch performance testing can go wrong if the test design is unrealistic or incomplete.

Testing only one simple query

A single lightweight query rarely represents production traffic. Include a variety of query types, filters, and aggregations.

Ignoring cluster state

If the cluster is already under pressure before the test starts, your results will be misleading. Begin from a known baseline.

Using tiny datasets

Elasticsearch behaves very differently on small datasets than on production-scale indices. Test against representative data volumes whenever possible.

Forgetting authentication overhead

If your real deployment uses Basic Auth or API keys, include that in the test. Authentication can affect performance and request routing.

Overusing refresh during indexing tests

Forcing frequent refreshes can drastically reduce indexing throughput. Unless your application truly requires immediate search visibility, avoid unrealistic refresh settings.
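When heavy indexing genuinely benefits from a longer refresh interval, the conventional approach is a temporary index settings change, restored afterward. A sketch of the request bodies for PUT /products/_settings (the index name and the 30s value are examples, not recommendations):

```python
import json

# Temporarily relax refresh during a bulk-load benchmark...
RELAXED_REFRESH = {"index": {"refresh_interval": "30s"}}
# ...then restore the Elasticsearch default afterward.
DEFAULT_REFRESH = {"index": {"refresh_interval": "1s"}}

def settings_request(payload, index="products"):
    """Return the method, path, and JSON body for the settings change."""
    return ("PUT", f"/{index}/_settings", json.dumps(payload))
```

Always restore the original setting when the test finishes, so later test runs start from the same baseline.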

Not correlating application and cluster metrics

LoadForge shows request-level performance, but Elasticsearch node metrics explain why the performance looks that way. Use both.

Running search-only tests for write-heavy systems

If your application continuously writes logs, products, metrics, or events, pure search tests won’t reveal real bottlenecks. Use mixed workload testing.

Conclusion

Elasticsearch load testing is one of the best ways to validate search latency, indexing speed, and overall cluster resilience before traffic spikes expose weaknesses in production. By testing realistic queries, bulk indexing workflows, and mixed workloads, you can identify bottlenecks in shards, thread pools, heap usage, disk I/O, and query design long before they become incidents.

LoadForge makes Elasticsearch performance testing practical with Locust-based scripting, cloud-based infrastructure, distributed testing, real-time reporting, and CI/CD integration. Whether you’re benchmarking a new cluster, validating a schema change, or running a full Elasticsearch stress testing exercise, LoadForge helps you simulate real traffic patterns at scale.

If you’re ready to benchmark your Elasticsearch cluster with realistic load, try LoadForge and start building tests that reflect how your users actually search and index data.

Try LoadForge free for 7 days

Set up your first load test in under 2 minutes. No commitment.