# Dify Stress Test Suite

A high-performance stress test suite for Dify workflow execution using **Locust** - optimized for measuring Server-Sent Events (SSE) streaming performance.

## Key Metrics Tracked

The stress test focuses on four critical SSE performance indicators:

1. **Active SSE Connections** - Real-time count of open SSE connections
1. **New Connection Rate** - Connections per second (conn/sec)
1. **Time to First Event (TTFE)** - Latency until the first SSE event arrives
1. **Event Throughput** - Events per second (events/sec)

## Features

- **True SSE Support**: Properly handles Server-Sent Events streaming without premature connection closure
- **Real-time Metrics**: Live reporting every 5 seconds during tests
- **Comprehensive Tracking**:
  - Active connection monitoring
  - Connection establishment rate
  - Event processing throughput
  - TTFE distribution analysis
- **Multiple Interfaces**:
  - Web UI for real-time monitoring
  - Headless mode with periodic console updates
- **Detailed Reports**: Final statistics with overall rates and averages
- **Easy Configuration**: Uses the existing API key configuration from setup

## What Gets Measured

The stress test focuses on SSE streaming performance with these key metrics:

### Primary Endpoint: `/v1/workflows/run`

The stress test exercises a single endpoint with comprehensive SSE metrics tracking:

- **Request Type**: POST request to the workflow execution API
- **Response Type**: Server-Sent Events (SSE) stream
- **Payload**: Random questions from a configurable pool
- **Concurrency**: Configurable from 1 to 1000+ simultaneous users

### Key Performance Metrics

#### 1. **Active Connections**

- **What it measures**: Number of concurrent SSE connections open at any moment
- **Why it matters**: Shows the system's ability to handle parallel streams
- **Good values**: Should remain stable under load without drops

#### 2. **Connection Rate (conn/sec)**

- **What it measures**: How fast new SSE connections are established
- **Why it matters**: Indicates the system's ability to handle connection spikes
- **Good values**:
  - Light load: 5-10 conn/sec
  - Medium load: 20-50 conn/sec
  - Heavy load: 100+ conn/sec

#### 3. **Time to First Event (TTFE)**

- **What it measures**: Latency from request sent to first SSE event received
- **Why it matters**: Critical for user experience - faster TTFE means better perceived performance
- **Good values**:
  - Excellent: < 50ms
  - Good: 50-100ms
  - Acceptable: 100-500ms
  - Poor: > 500ms

#### 4. **Event Throughput (events/sec)**

- **What it measures**: Rate of SSE events being delivered across all connections
- **Why it matters**: Shows actual data delivery performance
- **Expected values**: Depends on workflow complexity and number of connections
  - Single connection: 10-20 events/sec
  - 10 connections: 50-100 events/sec
  - 100 connections: 200-500 events/sec

#### 5. **Request/Response Times**

- **P50 (Median)**: 50% of requests complete within this time
- **P95**: 95% of requests complete within this time
- **P99**: 99% of requests complete within this time
- **Min/Max**: Best and worst case response times
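To make these metrics concrete, here is a minimal sketch of how a Locust task can measure TTFE and count events on this endpoint. The shipped `sse_benchmark.py` is the authoritative implementation; the `question` input key and the API key value below are placeholders for your workflow:

```python
import time

from locust import HttpUser, between, task


class WorkflowSSEUser(HttpUser):
    wait_time = between(1, 3)

    @task
    def run_workflow(self):
        start = time.perf_counter()
        ttfe_reported = False
        event_count = 0
        # stream=True keeps the SSE connection open instead of closing it
        # as soon as the response headers arrive
        with self.client.post(
            "/v1/workflows/run",
            json={
                "inputs": {"question": "What is Dify?"},  # placeholder input key
                "response_mode": "streaming",
                "user": "stress-test",
            },
            headers={"Authorization": "Bearer app-your-api-key"},  # placeholder key
            stream=True,
            catch_response=True,
            name="/v1/workflows/run",
        ) as response:
            for line in response.iter_lines():
                if not line or not line.startswith(b"data:"):
                    continue  # skip keep-alives and blank separator lines
                if not ttfe_reported:
                    ttfe_reported = True
                    # Report TTFE as its own entry in the Locust statistics
                    self.environment.events.request.fire(
                        request_type="SSE",
                        name="time_to_first_event",
                        response_time=(time.perf_counter() - start) * 1000,
                        response_length=0,
                        exception=None,
                        context={},
                    )
                event_count += 1
            if event_count == 0:
                response.failure("stream closed before any SSE event arrived")
            else:
                response.success()
```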
## Prerequisites

1. **Dependencies are automatically installed** when running setup:
   - Locust (load testing framework)
   - sseclient-py (SSE client library)

1. **Complete Dify setup**:

   ```bash
   # Run the complete setup
   python scripts/stress-test/setup_all.py
   ```

1. **Ensure services are running**:

   **IMPORTANT**: For accurate stress testing, run the API server with Gunicorn in production mode:

   ```bash
   # Run from the api directory
   cd api
   uv run gunicorn \
     --bind 0.0.0.0:5001 \
     --workers 4 \
     --worker-class gevent \
     --timeout 120 \
     --keep-alive 5 \
     --log-level info \
     --access-logfile - \
     --error-logfile - \
     app:app
   ```

   **Configuration options explained**:

   - `--workers 4`: Number of worker processes (adjust based on CPU cores)
   - `--worker-class gevent`: Async worker for handling concurrent connections
   - `--timeout 120`: Worker timeout for long-running requests
   - `--keep-alive 5`: Keep connections alive for SSE streaming

   **NOT RECOMMENDED for stress testing**:

   ```bash
   # Debug mode - DO NOT use for stress testing (slow performance)
   ./dev/start-api  # This runs Flask in debug mode with single-threaded execution
   ```

   **Also start the Mock OpenAI server**:

   ```bash
   python scripts/stress-test/setup/mock_openai_server.py
   ```

## Running the Stress Test

```bash
# Run with default configuration (headless mode)
./scripts/stress-test/run_locust_stress_test.sh

# Or run directly with uv
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001

# Run with Web UI (access at http://localhost:8089)
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001 --web-port 8089
```

The script will:

1. Validate that all required services are running
1. Check API token availability
1. Execute the Locust stress test with SSE support
1. Generate comprehensive reports in the `reports/` directory

## Configuration

The stress test configuration is in `locust.conf`:

```ini
users = 10        # Number of concurrent users
spawn-rate = 2    # Users spawned per second
run-time = 1m     # Test duration (30s, 5m, 1h)
headless = true   # Run without web UI
```

### Custom Question Sets

Modify the questions list in `sse_benchmark.py`:

```python
self.questions = [
    "Your custom question 1",
    "Your custom question 2",
    # Add more questions...
]
```
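If you would rather keep questions out of the source, a small loader along these lines can feed the list from a plain-text file. This is a sketch; the `questions.txt` path and the one-question-per-line format are assumptions, not part of the shipped suite:

```python
from pathlib import Path


def load_questions(path: str = "scripts/stress-test/questions.txt") -> list[str]:
    """Read one question per line, skipping blanks and # comments."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [q.strip() for q in lines if q.strip() and not q.startswith("#")]


# In sse_benchmark.py's on_start:
# self.questions = load_questions()
```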
## Understanding the Results

### Report Structure

After running the stress test, you'll find these files in the `reports/` directory:

- `locust_summary_YYYYMMDD_HHMMSS.txt` - Complete console output with metrics
- `locust_report_YYYYMMDD_HHMMSS.html` - Interactive HTML report with charts
- `locust_YYYYMMDD_HHMMSS_stats.csv` - CSV with detailed statistics
- `locust_YYYYMMDD_HHMMSS_stats_history.csv` - Time-series data

### Key Metrics

**Requests Per Second (RPS)**:

- **Excellent**: > 50 RPS
- **Good**: 20-50 RPS
- **Acceptable**: 10-20 RPS
- **Needs Improvement**: < 10 RPS

**Response Time Percentiles**:

- **P50 (Median)**: 50% of requests complete within this time
- **P95**: 95% of requests complete within this time
- **P99**: 99% of requests complete within this time

**Success Rate**:

- Should be > 99% for production readiness
- Lower rates indicate errors or timeouts

### Example Output

```text
============================================================
DIFY SSE STRESS TEST
============================================================
[2025-09-12 15:45:44,468] Starting test run with 10 users at 2 users/sec

============================================================
SSE Metrics | Active: 8 | Total Conn: 142 | Events: 2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
============================================================

Type     Name                           # reqs  # fails |    Avg    Min    Max    Med |  req/s failures/s
---------|------------------------------|--------|--------|--------|--------|--------|--------|--------|-----------
POST     /v1/workflows/run                 142 0(0.00%) |     41     18    192     38 |   2.37       0.00
---------|------------------------------|--------|--------|--------|--------|--------|--------|--------|-----------
         Aggregated                        142 0(0.00%) |     41     18    192     38 |   2.37       0.00

============================================================
FINAL RESULTS
============================================================
Total Connections: 142
Total Events: 2841
Average TTFE: 43 ms
============================================================
```

### How to Read the Results

**Live SSE Metrics Box (updates every 10 seconds):**

```text
SSE Metrics | Active: 8 | Total Conn: 142 | Events: 2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
```

- **Active**: Current number of open SSE connections
- **Total Conn**: Cumulative connections established
- **Events**: Total SSE events received
- **conn/s**: Connection establishment rate
- **events/s**: Event delivery rate
- **TTFE**: Average time to first event

**Standard Locust Table:**

```text
Type     Name                           # reqs  # fails |    Avg    Min    Max    Med |  req/s
POST     /v1/workflows/run                 142 0(0.00%) |     41     18    192     38 |   2.37
```

- **Type**: Always POST for our SSE requests
- **Name**: The API endpoint being tested
- **# reqs**: Total requests made
- **# fails**: Failed requests (should be 0)
- **Avg/Min/Max/Med**: Response time statistics in milliseconds
- **req/s**: Request throughput

**Performance Targets:**

✅ **Good Performance**:

- Zero failures (0.00%)
- TTFE < 100ms
- Stable active connections
- Consistent event throughput

⚠️ **Warning Signs**:

- Failures > 1%
- TTFE > 500ms
- Dropping active connections
- Declining event rate over time

## Test Scenarios

### Light Load

```yaml
concurrency: 10
iterations: 100
```

### Normal Load

```yaml
concurrency: 100
iterations: 1000
```

### Heavy Load

```yaml
concurrency: 500
iterations: 5000
```

### Stress Test

```yaml
concurrency: 1000
iterations: 10000
```

## Performance Tuning

### API Server Optimization

**Gunicorn tuning for different load levels**:

```bash
# Light load (10-50 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 2 --worker-class gevent app:app

# Medium load (50-200 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent --worker-connections 1000 app:app

# Heavy load (200-1000 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 8 --worker-class gevent --worker-connections 2000 --max-requests 1000 app:app
```

**Worker calculation formula** (worked example below):

- Workers = (2 × CPU cores) + 1
- For SSE/WebSocket workloads: use the gevent worker class
- For CPU-bound tasks: use sync workers
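The same rule of thumb can be computed at launch time, for example (a sketch; Gunicorn does not read this value automatically, so pass it via `--workers`):

```python
import multiprocessing

# Rule of thumb from above: workers = (2 x CPU cores) + 1
workers = 2 * multiprocessing.cpu_count() + 1
print(f"Suggested: gunicorn --workers {workers} --worker-class gevent ...")
```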
### Database Optimization

**PostgreSQL connection pool tuning**:

For high-concurrency stress testing, increase the PostgreSQL max connections in `docker/middleware.env`:

```bash
# Edit docker/middleware.env
POSTGRES_MAX_CONNECTIONS=200  # Default is 100

# Recommended values for different load levels:
# Light load (10-50 users):    100 (default)
# Medium load (50-200 users):  200
# Heavy load (200-1000 users): 500
```

After changing it, restart the PostgreSQL container:

```bash
docker compose -f docker/docker-compose.middleware.yaml down db
docker compose -f docker/docker-compose.middleware.yaml up -d db
```

**Note**: Each connection uses ~10MB of RAM. Ensure your database server has sufficient memory:

- 100 connections: ~1GB RAM
- 200 connections: ~2GB RAM
- 500 connections: ~5GB RAM

### System Optimizations

1. **Increase file descriptor limits**:

   ```bash
   ulimit -n 65536
   ```

1. **TCP tuning for high concurrency** (Linux):

   ```bash
   # Increase TCP buffer sizes
   sudo sysctl -w net.core.rmem_max=134217728
   sudo sysctl -w net.core.wmem_max=134217728

   # Enable TCP fast open
   sudo sysctl -w net.ipv4.tcp_fastopen=3
   ```

1. **macOS specific**:

   ```bash
   # Increase maximum connections
   sudo sysctl -w kern.ipc.somaxconn=2048
   ```

## Troubleshooting

### Common Issues

1. **"ModuleNotFoundError: No module named 'locust'"**:

   ```bash
   # Dependencies are installed automatically, but if needed:
   uv --project api add --dev locust sseclient-py
   ```

1. **"API key configuration not found"**:

   ```bash
   # Run setup
   python scripts/stress-test/setup_all.py
   ```

1. **Services not running**:

   ```bash
   # Start Dify API with Gunicorn (production mode)
   cd api
   uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent app:app

   # Start Mock OpenAI server
   python scripts/stress-test/setup/mock_openai_server.py
   ```

1. **High error rate**:

   - Reduce the concurrency level
   - Check system resources (CPU, memory)
   - Review API server logs for errors
   - Increase timeout values if needed

1. **Permission denied when running the script**:

   ```bash
   chmod +x scripts/stress-test/run_locust_stress_test.sh
   ```

## Advanced Usage

### Running Multiple Iterations

```bash
# Run stress test 3 times with 60-second intervals
for i in {1..3}; do
    echo "Run $i of 3"
    ./run_locust_stress_test.sh
    sleep 60
done
```

### Custom Locust Options

Run Locust directly with custom options:

```bash
# With specific user count and spawn rate
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
    --host http://localhost:5001 --users 50 --spawn-rate 5

# Generate CSV reports
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
    --host http://localhost:5001 --csv reports/results

# Run for a specific duration
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
    --host http://localhost:5001 --run-time 5m --headless
```

### Comparing Results

```bash
# List the most recent stress test summaries
ls -la reports/locust_summary_*.txt | tail -5
```
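Beyond eyeballing file listings, the `*_stats.csv` files lend themselves to a quick programmatic comparison. A sketch (the column names match recent Locust 2.x releases and may differ in older versions):

```python
import csv
from pathlib import Path


def summarize(stats_csv: Path) -> dict:
    """Pull the Aggregated row out of a Locust *_stats.csv file."""
    with stats_csv.open() as f:
        for row in csv.DictReader(f):
            if row["Name"] == "Aggregated":
                return {
                    "file": stats_csv.name,
                    "requests": row["Request Count"],
                    "failures": row["Failure Count"],
                    "median_ms": row["Median Response Time"],
                    "rps": row["Requests/s"],
                }
    return {"file": stats_csv.name}


# Print one summary line per run, oldest first
for path in sorted(Path("reports").glob("locust_*_stats.csv")):
    print(summarize(path))
```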
## Interpreting Performance Issues

### High Response Times

Possible causes:

- Database query performance
- External API latency
- Insufficient server resources
- Network congestion

### Low Throughput (RPS < 10)

Check for:

- CPU bottlenecks
- Memory constraints
- Database connection pooling
- API rate limiting

### High Error Rate

Investigate:

- Server error logs
- Resource exhaustion
- Timeout configurations
- Connection limits

## Why Locust?

Locust was chosen over Drill for this stress test because:

1. **Proper SSE Support**: Correctly handles streaming responses without premature closure
1. **Custom Metrics**: Can track SSE-specific metrics like TTFE and stream duration
1. **Web UI**: Real-time monitoring and control via a web interface
1. **Python Integration**: Integrates seamlessly with the existing Python setup code
1. **Extensibility**: Easy to customize for specific testing scenarios

## Contributing

To improve the stress test suite:

1. Edit `stress_test.yml` for configuration changes
1. Modify `run_locust_stress_test.sh` for workflow improvements
1. Update question sets for better coverage
1. Add new metrics or analysis features