This guide shows how to performance test DigitalOcean's AI platform using Locust. It is well suited to measuring response times, validating reliability, and planning capacity for AI workloads on DigitalOcean.
## Use Cases
- Test DigitalOcean AI API response times under load
- Validate AI service reliability and uptime
- Plan capacity for AI applications
- Compare different Llama 3 model variants
- Monitor API rate limits and quotas
## Simple Implementation
```python
import random

from locust import HttpUser, task, between


class Llama3ChatUser(HttpUser):
    wait_time = between(1, 5)

    QUESTIONS = [
        "What is the capital of France?",
        "Translate 'Hello, how are you?' into Spanish.",
        "Who wrote 'Pride and Prejudice'?",
        "What's 13 multiplied by 17?",
        "Name three benefits of a vegan diet.",
        "Give me a quick summary of the plot of '1984'.",
        "Explain the concept of machine learning in simple terms.",
        "What are the main differences between Python and JavaScript?",
        "How does photosynthesis work?",
        "What are some tips for effective time management?"
    ]

    @task
    def chat_completion(self):
        question = random.choice(self.QUESTIONS)
        # Short preview of the question, used as the request name in LoadForge stats
        preview = (question[:20] + "...") if len(question) > 20 else question
        payload = {
            "model": "llama3.3-70b-instruct",
            "messages": [{"role": "user", "content": question}],
            "stream": False,
            "include_functions_info": False,
            "include_retrieval_info": False,
            "include_guardrails_info": False
        }
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self.api_token}"
        }
        with self.client.post(
            "/api/v1/chat/completions",
            json=payload,
            headers=headers,
            name=f"chat: {preview}",
            catch_response=True
        ) as response:
            if response.status_code == 200:
                response.success()
            else:
                response.failure(f"Status {response.status_code}")

    def on_start(self):
        # Set your DigitalOcean AI API token here
        self.api_token = "your-digitalocean-ai-token"
```
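If you run the script outside LoadForge, for example with local Locust, you may prefer not to hardcode the credential. A minimal variation of `on_start` that reads the token from an environment variable instead (`DO_AI_TOKEN` is an illustrative name chosen here, not an official one):

```python
import os  # add alongside the existing imports


class Llama3ChatUser(HttpUser):
    # ... QUESTIONS and chat_completion unchanged ...

    def on_start(self):
        # "DO_AI_TOKEN" is an illustrative variable name; export it before the run
        self.api_token = os.environ["DO_AI_TOKEN"]
```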
## Setup Instructions
1. Get DigitalOcean AI Access:
   - Sign up for a DigitalOcean account
   - Enable AI services in your project
   - Generate an API token from the control panel
2. Configure the Script in LoadForge:
   - Copy the script into LoadForge's test editor
   - Replace `your-digitalocean-ai-token` with your actual API token
   - Set the target host URL to your DigitalOcean AI endpoint
3. Configure Load Test Settings:
   - Start with 1-5 virtual users to test connectivity
   - Set an appropriate ramp-up time to avoid rate limits
   - Monitor response times and error rates
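Optionally, you can sanity-check the script with open-source Locust on your own machine before uploading it to LoadForge. A headless smoke-test invocation might look like this (the host URL below is a placeholder; substitute your actual DigitalOcean AI endpoint):

```bash
# Placeholder host; substitute your actual DigitalOcean AI endpoint
locust -f locustfile.py \
  --host https://your-do-ai-endpoint.example.com \
  --users 5 --spawn-rate 1 --run-time 2m --headless
```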
## What This Tests
- Response Times: Measure latency for the Llama 3.3 70B model
- Throughput: Test concurrent request handling
- Rate Limits: Understand DigitalOcean AI quotas and limits
- Reliability: Check API stability under sustained load
- API Performance: Validate DigitalOcean AI service quality
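Locust records latency for every request automatically, but you can also fail requests that exceed a service-level threshold so slow responses show up as errors in the report. A sketch of the existing `catch_response` block with an illustrative 10-second cutoff added:

```python
with self.client.post(
    "/api/v1/chat/completions",
    json=payload,
    headers=headers,
    name=f"chat: {preview}",
    catch_response=True
) as response:
    if response.status_code != 200:
        response.failure(f"Status {response.status_code}")
    elif response.elapsed.total_seconds() > 10:  # illustrative SLA threshold
        response.failure(f"Too slow: {response.elapsed.total_seconds():.1f}s")
    else:
        response.success()
```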
## Expected Performance
Typical results for Llama 3.3 70B on DigitalOcean AI:
- Response Time: ~3-6 seconds per request
- Quality: High-quality responses with latest model improvements
- Throughput: Suitable for production workloads with proper scaling
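As a rough capacity-planning sanity check: with a 3-6 second response time plus the script's 1-5 second wait, each virtual user completes one request roughly every 4-11 seconds, or about 5-15 requests per minute, so a 20-user test generates on the order of 100-300 requests per minute.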
## Rate Limits & Pricing
- Request Limits: Vary by plan and model
- Token Limits: Based on input/output tokens
- Concurrent Requests: Limited per account tier
- Pricing: Pay-per-use model based on tokens consumed
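Because pricing is token-based, it can be worth logging token consumption during a test. The sketch below assumes the response body follows the OpenAI-compatible schema with a `usage` object; verify this against an actual DigitalOcean response before relying on it:

```python
if response.status_code == 200:
    response.success()
    try:
        # Assumes an OpenAI-style "usage" object in the response body
        usage = response.json().get("usage", {})
        print(
            f"prompt={usage.get('prompt_tokens')} "
            f"completion={usage.get('completion_tokens')} "
            f"total={usage.get('total_tokens')}"
        )
    except ValueError:
        pass  # body was not JSON; skip token accounting
else:
    response.failure(f"Status {response.status_code}")
```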
## Common Issues
- Authentication: Ensure API token has correct permissions
- Rate Limiting: Start with low user counts to avoid 429 errors
- Endpoint URLs: Verify the correct DigitalOcean AI endpoint
- Token Limits: Monitor usage to avoid exceeding quotas
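To make these issues easier to diagnose in test results, you can branch on the status code in the failure path so authentication problems and rate limiting are reported separately. A sketch extending the script's existing check:

```python
if response.status_code == 200:
    response.success()
elif response.status_code == 401:
    response.failure("Authentication failed: check the API token and its permissions")
elif response.status_code == 429:
    response.failure("Rate limited: reduce user count or slow the ramp-up")
else:
    response.failure(f"Status {response.status_code}")
```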
## Best Practices
- Gradual Ramp-up: Start with 1-5 users, increase gradually
- Monitor Costs: Track token usage to avoid unexpected charges
- Error Handling: Implement proper retry logic for production use
- Caching: Consider caching responses for repeated queries
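In LoadForge the ramp-up is configured in the test settings, but if you run the script with plain Locust you can encode a gradual ramp directly in the file using a custom `LoadTestShape`. A sketch with illustrative stage values:

```python
from locust import LoadTestShape


class GradualRamp(LoadTestShape):
    # Illustrative stages: ramp from 1 to 20 users over about 4 minutes
    stages = [
        {"duration": 60,  "users": 1,  "spawn_rate": 1},
        {"duration": 120, "users": 5,  "spawn_rate": 1},
        {"duration": 180, "users": 10, "spawn_rate": 2},
        {"duration": 240, "users": 20, "spawn_rate": 2},
    ]

    def tick(self):
        run_time = self.get_run_time()
        for stage in self.stages:
            if run_time < stage["duration"]:
                return stage["users"], stage["spawn_rate"]
        return None  # stop the test after the last stage
```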