How to interpret your load test results, what to focus on, and how to improve your scalability
Reading and understanding your load tests is as important as running them. Knowing what to look for, based on what you are testing, is critical. Remember that you don't always have to break your service to find a bottleneck; you can predict one by watching for rising error rates, slower response times, and more.
Response times are key indicators of your application's ability to scale. They often include the time taken to fetch data from your database, they highlight the effectiveness of your caching, and they show how the service slows down as the number of users grows. There are two graphs we advise looking at: the response time graph and the P9x graph.
When looking at response times, it's important to have designed the right test as well. Run a longer (10 minute) test with a low "rate" so that users build up slowly. That lets you watch for degradation. Let's look at a good example of a Response Time Graph below:
This looks good! The graph above shows a small spike at launch, but a solid, reliable, low response time without spikes.
This looks bad! The graph above shows a constant deterioration. It gets slower and slower until ultimately it's taking almost a second per response.
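If you export the response times from a test, the same "steady versus deteriorating" judgement can be roughly approximated in code. The sketch below is only an illustration: the `degradation_ratio` helper and both sample series are hypothetical, not part of LoadForge. It compares the average response time at the end of a test with the average at the start.

```python
from statistics import mean


def degradation_ratio(samples_ms, window_fraction=0.2):
    """Ratio of the late average response time to the early average.

    A value near (or below) 1.0 suggests a steady test; a value well
    above 1.0 suggests responses slowed down as users built up.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    # Use the first and last 20% of samples (at least one sample each).
    window = max(1, int(len(samples_ms) * window_fraction))
    early = mean(samples_ms[:window])
    late = mean(samples_ms[-window:])
    return late / early


# Hypothetical per-minute median response times (ms) for two 10 minute tests.
steady = [180, 120, 118, 121, 119, 122, 120, 118, 121, 120]      # small spike at launch
degrading = [110, 150, 210, 290, 380, 470, 580, 690, 800, 950]   # slower and slower

print(round(degradation_ratio(steady), 2))
print(round(degradation_ratio(degrading), 2))
```

The steady series scores below 1.0 because of its launch spike, while the deteriorating series scores several times higher. The threshold you treat as a problem is a judgement call for your own service.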
The deteriorating graph above is nice and obvious. But sometimes poor performance is hidden in only a small percentage of your responses. That's what the P9x graphs are for. They show the P95 and P99 response times, which mark where the slowest 5% and slowest 1% of responses begin, and can be illuminating. Let's look at an example below, from the good test we had above.
As you can see here, at least 1% of our users at the start of the test experienced an extreme delay in getting a reply: over 3800ms (3.8 seconds). The rest of the test looks quite good, but here's the P9x from the bad response time test above:
Clear as day! Our bottom 5% of users are getting a terrible service level here, over 20 seconds for a response. Remember that our median response time was just 800ms. These are the critical components to check for!
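To see how a reasonable median can hide a terrible tail, here is a minimal sketch of a percentile calculation. It uses the nearest-rank method, which is an assumption and may differ from how LoadForge computes its P9x graphs. The sample data is hypothetical, chosen to mirror the numbers above.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in ms: 94 responses at 800 ms, 6 at 20 seconds.
samples = [800] * 94 + [20000] * 6

print(percentile(samples, 50))  # 800
print(percentile(samples, 95))  # 20000
print(percentile(samples, 99))  # 20000
```

The median reports 800ms even though 6 in every 100 responses take 20 seconds, which is why the P95 and P99 values are worth checking alongside it.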
In this section we'll talk about monitoring your request rate (requests per second) and error rate (errors per second) as you scale up your test. For most users, the service will eventually fail under enough load. The important thing to know is when, how, and how badly the failure cascades.
Let's look at an example of a test that overloaded the webservers.
Here you can see that initially things were going well. We hit 50 requests per second (about 300 active users in this test case) and then the errors started piling up. This type of spike typically indicates a backend failure - databases locking up, etc. At some points around 40% of the responses to our users were errors.
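The relationship between the two rates can be sketched with a little arithmetic. The example below assumes that the requests per second figure includes failed requests, and the per-interval numbers are hypothetical rather than taken from the test above. It converts the two rates into an error percentage and finds the first interval where errors pass a chosen threshold.

```python
def error_percentages(requests_per_sec, errors_per_sec):
    """Error percentage for each interval (0.0 when there was no traffic)."""
    return [
        (100.0 * errs / reqs) if reqs else 0.0
        for reqs, errs in zip(requests_per_sec, errors_per_sec)
    ]


def first_interval_over(percentages, threshold_pct):
    """Index of the first interval whose error percentage exceeds the
    threshold, or None if the threshold is never exceeded."""
    for index, pct in enumerate(percentages):
        if pct > threshold_pct:
            return index
    return None


# Hypothetical samples as the test ramps up to 50 requests per second.
requests_per_sec = [10, 20, 30, 40, 50, 50, 50]
errors_per_sec = [0, 0, 0, 0, 2, 12, 20]

pcts = error_percentages(requests_per_sec, errors_per_sec)
print(pcts)                            # [0.0, 0.0, 0.0, 0.0, 4.0, 24.0, 40.0]
print(first_interval_over(pcts, 5.0))  # 5
```

Knowing the interval where errors first cross your threshold tells you roughly how many requests per second the service handled before it started to fail.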
The test was for a site called WowStat, which lists World of Warcraft character information (for those wondering about the upcoming request paths). We decided to dig into what these errors were, and what was causing them. This is available under the "Errors" tab on your LoadForge report.
Here we can see some great data. Firstly, we're getting 502 Server Errors - so we can look at our server log. Secondly, certain URLs generated far more errors than others.
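If you want to do the same breakdown on your own server logs, a grouping like the one below is a reasonable starting point. This is a sketch only: the `failures` list and its paths are invented for illustration and are not the real WowStat URLs or a LoadForge export format.

```python
from collections import Counter

# Hypothetical failed requests as (status_code, path) pairs.
# The paths are made up for this example.
failures = [
    (502, "/character/example-a"),
    (502, "/character/example-a"),
    (502, "/character/example-a"),
    (502, "/guild/example-b"),
    (500, "/"),
]

# Count failures by status code and by request path.
by_status = Counter(status for status, _ in failures)
by_path = Counter(path for _, path in failures)

print(by_status.most_common())   # [(502, 4), (500, 1)]
print(by_path.most_common(1))    # [('/character/example-a', 3)]
```

Sorting by count puts the most common status code and the worst-affected URL first, which are usually the best places to start looking.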
As you can see, digging into your results is critical. Results are stored on LoadForge so you can always compare graphs as well.