A high-performance stress test suite for Dify workflow execution using Locust - optimized for measuring Server-Sent Events (SSE) streaming performance.
The stress test focuses on four critical SSE performance indicators:

- Time to First Event (TTFE): latency until the first SSE event arrives
- Total SSE connections established
- Total events received across all streams
- Connection and event throughput (conn/s and events/s)
The stress test exercises a single endpoint, POST /v1/workflows/run, with comprehensive SSE metrics tracking.
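To make the metrics concrete, here is a minimal standalone sketch of the measurement idea, assuming a streaming workflow request with sseclient-py (installed during setup); the API key and the workflow input field are placeholders, not the harness's actual values:

```python
# Sketch of TTFE measurement against /v1/workflows/run (not the actual harness).
import time

import requests
import sseclient  # sseclient-py, installed during setup


def measure_ttfe(host: str = "http://localhost:5001") -> None:
    start = time.perf_counter()
    resp = requests.post(
        f"{host}/v1/workflows/run",
        headers={"Authorization": "Bearer app-your-key-here"},  # placeholder key
        json={
            "inputs": {"question": "What is Dify?"},  # field name depends on your workflow
            "response_mode": "streaming",
            "user": "stress-test",
        },
        stream=True,
    )
    ttfe_ms = None
    events = 0
    for _event in sseclient.SSEClient(resp).events():
        if ttfe_ms is None:
            ttfe_ms = (time.perf_counter() - start) * 1000  # time to first event
        events += 1
    if ttfe_ms is None:
        print("no SSE events received")
    else:
        print(f"TTFE: {ttfe_ms:.0f} ms, events received: {events}")


if __name__ == "__main__":
    measure_ttfe()
```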
Dependencies are automatically installed when running setup:
Complete Dify setup:
   # Run the complete setup
   python scripts/stress-test/setup_all.py
IMPORTANT: For accurate stress testing, run the API server with Gunicorn in production mode:
   # Run from the api directory
   cd api
   uv run gunicorn \
     --bind 0.0.0.0:5001 \
     --workers 4 \
     --worker-class gevent \
     --timeout 120 \
     --keep-alive 5 \
     --log-level info \
     --access-logfile - \
     --error-logfile - \
     app:app
Configuration options explained:

- `--workers 4`: Number of worker processes (adjust based on CPU cores)
- `--worker-class gevent`: Async worker for handling concurrent connections
- `--timeout 120`: Worker timeout for long-running requests
- `--keep-alive 5`: Keep connections alive for SSE streaming

NOT RECOMMENDED for stress testing:
   # Debug mode - DO NOT use for stress testing (slow performance)
   ./dev/start-api  # This runs Flask in debug mode with single-threaded execution
Also start the Mock OpenAI server:
   python scripts/stress-test/setup/mock_openai_server.py
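Before launching load, it can help to confirm both servers answer. A small pre-flight sketch; port 5001 comes from this README, while 5002 for the mock server is an assumption to adjust:

```python
# Pre-flight check that both servers are reachable (mock port is assumed).
import requests

for name, url in [
    ("Dify API", "http://localhost:5001"),
    ("Mock OpenAI", "http://localhost:5002"),  # adjust to the mock server's actual port
]:
    try:
        requests.get(url, timeout=2)
        print(f"{name}: reachable")
    except requests.RequestException:
        print(f"{name}: NOT reachable")
```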
# Run with default configuration (headless mode)
./scripts/stress-test/run_locust_stress_test.sh
# Or run directly with uv
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001
# Run with Web UI (access at http://localhost:8089)
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001 --web-port 8089
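For orientation, a Locust user for this kind of SSE test roughly takes the following shape; this is a hypothetical sketch, not the contents of sse_benchmark.py, and the API key and input field are placeholders:

```python
# Sketch of an SSE-streaming Locust user (the real logic lives in sse_benchmark.py).
import time

from locust import HttpUser, between, task


class WorkflowUser(HttpUser):
    wait_time = between(1, 3)

    @task
    def run_workflow(self):
        start = time.perf_counter()
        with self.client.post(
            "/v1/workflows/run",
            headers={"Authorization": "Bearer app-your-key-here"},  # placeholder
            json={
                "inputs": {"question": "What is Dify?"},  # placeholder input
                "response_mode": "streaming",
                "user": "locust",
            },
            stream=True,
            catch_response=True,
        ) as resp:
            ttfe_ms = None
            for line in resp.iter_lines():
                if line and ttfe_ms is None:
                    ttfe_ms = (time.perf_counter() - start) * 1000
            if ttfe_ms is None:
                resp.failure("no SSE events received")
            else:
                resp.success()  # the real benchmark also records TTFE and event counts
```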
The run_locust_stress_test.sh script runs Locust in headless mode with the settings from locust.conf and saves all reports to the reports/ directory.

The stress test configuration is in locust.conf:
users = 10           # Number of concurrent users
spawn-rate = 2       # Users spawned per second
run-time = 1m        # Test duration (30s, 5m, 1h)
headless = true      # Run without web UI
Modify the questions list in sse_benchmark.py:
self.questions = [
    "Your custom question 1",
    "Your custom question 2",
    # Add more questions...
]
After running the stress test, you’ll find these files in the reports/ directory:
- locust_summary_YYYYMMDD_HHMMSS.txt - Complete console output with metrics
- locust_report_YYYYMMDD_HHMMSS.html - Interactive HTML report with charts
- locust_YYYYMMDD_HHMMSS_stats.csv - CSV with detailed statistics
- locust_YYYYMMDD_HHMMSS_stats_history.csv - Time-series data

When interpreting results, focus on:

- Requests Per Second (RPS): sustained throughput of workflow runs started per second
- Response Time Percentiles: median and tail latency of the initial responses (p50/p95/p99)
- Success Rate: percentage of requests completing without failures
============================================================
DIFY SSE STRESS TEST
============================================================
[2025-09-12 15:45:44,468] Starting test run with 10 users at 2 users/sec
============================================================
SSE Metrics | Active:   8 | Total Conn:   142 | Events:   2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
============================================================
Type     Name                          # reqs  # fails |    Avg     Min     Max    Med | req/s  failures/s
---------|------------------------------|--------|--------|--------|--------|--------|--------|--------|-----------
POST     /v1/workflows/run                  142   0(0.00%) |     41      18     192     38 |   2.37        0.00
---------|------------------------------|--------|--------|--------|--------|--------|--------|--------|-----------
         Aggregated                         142   0(0.00%) |     41      18     192     38 |   2.37        0.00
============================================================
FINAL RESULTS
============================================================
Total Connections: 142
Total Events:      2841
Average TTFE:      43 ms
============================================================
Live SSE Metrics Box (Updates every 10 seconds):
SSE Metrics | Active:   8 | Total Conn:   142 | Events:   2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
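Locust's init hook plus a gevent greenlet is enough to produce a box like this. A fragment of the pattern (it would live inside the locustfile; the real benchmark increments these counters from its SSE tasks):

```python
# Periodic SSE metrics reporter pattern (fragment of a locustfile).
import gevent
from locust import events

sse_stats = {"connections": 0, "events": 0}  # updated by the SSE tasks


def _report_loop() -> None:
    while True:
        gevent.sleep(10)  # matches the 10-second update interval above
        print(
            f"SSE Metrics | Total Conn: {sse_stats['connections']:6d} "
            f"| Events: {sse_stats['events']:6d}"
        )


@events.init.add_listener
def _on_locust_init(environment, **kwargs):
    gevent.spawn(_report_loop)
```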
Standard Locust Table:
Type     Name                # reqs  # fails |    Avg     Min     Max    Med | req/s
POST     /v1/workflows/run      142   0(0.00%) |     41      18     192     38 |   2.37
Performance Targets:

✅ Good Performance:

- TTFE stays low and stable (tens of milliseconds, as in the sample output above)
- 0% failure rate in the Locust table
- Connection and event rates hold steady for the whole run

⚠️ Warning Signs:

- TTFE or response-time percentiles climbing as users ramp up
- Any non-zero failure count
- Event throughput dropping while active connections stay constant
Suggested load levels:

| Load level | concurrency | iterations |
|---|---|---|
| Light | 10 | 100 |
| Medium | 100 | 1000 |
| Heavy | 500 | 5000 |
| Very heavy | 1000 | 10000 |
Gunicorn Tuning for Different Load Levels:
# Light load (10-50 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 2 --worker-class gevent app:app
# Medium load (50-200 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent --worker-connections 1000 app:app
# Heavy load (200-1000 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 8 --worker-class gevent --worker-connections 2000 --max-requests 1000 app:app
Worker calculation formula: the usual Gunicorn starting point is workers = (2 × CPU cores) + 1; with gevent workers serving long-lived SSE streams, prefer raising --worker-connections over adding many more workers.
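A quick way to compute that rule of thumb for the current machine:

```python
# Rule-of-thumb Gunicorn worker count: (2 x CPU cores) + 1
import multiprocessing

workers = 2 * multiprocessing.cpu_count() + 1
print(f"--workers {workers}")
```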
PostgreSQL Connection Pool Tuning:
For high-concurrency stress testing, increase the PostgreSQL max connections in docker/middleware.env:
# Edit docker/middleware.env
POSTGRES_MAX_CONNECTIONS=200  # Default is 100
# Recommended values for different load levels:
# Light load (10-50 users): 100 (default)
# Medium load (50-200 users): 200
# Heavy load (200-1000 users): 500
After changing, restart the PostgreSQL container:
docker compose -f docker/docker-compose.middleware.yaml down db
docker compose -f docker/docker-compose.middleware.yaml up -d db
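To confirm the new limit took effect, a small sketch assuming psycopg2 and Dify's default middleware credentials (adjust host, password, and database name to your docker/middleware.env):

```python
# Verify max_connections after the restart (credentials are assumptions).
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5432,
    user="postgres", password="difyai123456", dbname="dify",
)
with conn.cursor() as cur:
    cur.execute("SHOW max_connections;")
    print("max_connections =", cur.fetchone()[0])
conn.close()
```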
Note: Each connection uses ~10MB of RAM, so make sure the database server has sufficient memory (roughly 1GB per 100 connections).

OS-level tuning for high connection counts:

   # Increase the open file descriptor limit
   ulimit -n 65536

   # Increase TCP buffer sizes
   sudo sysctl -w net.core.rmem_max=134217728
   sudo sysctl -w net.core.wmem_max=134217728

   # Enable TCP fast open
   sudo sysctl -w net.ipv4.tcp_fastopen=3

   # Increase maximum pending connections (macOS; on Linux use net.core.somaxconn)
   sudo sysctl -w kern.ipc.somaxconn=2048
   # Dependencies are installed automatically, but if needed:
   uv --project api add --dev locust sseclient-py
   # Run setup
   python scripts/stress-test/setup_all.py
   # Start Dify API with Gunicorn (production mode)
   cd api
   uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent app:app
   # Start Mock OpenAI server
   python scripts/stress-test/setup/mock_openai_server.py
High error rate:

- Confirm the API is running under Gunicorn, not the Flask debug server
- Confirm the Mock OpenAI server is still running
- Check that PostgreSQL max connections can accommodate the concurrent users
Permission denied running script:
   chmod +x scripts/stress-test/run_locust_stress_test.sh
# Run stress test 3 times with 60-second intervals
for i in {1..3}; do
    echo "Run $i of 3"
    ./run_locust_stress_test.sh
    sleep 60
done
Run Locust directly with custom options:
# With specific user count and spawn rate
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
  --host http://localhost:5001 --users 50 --spawn-rate 5
# Generate CSV reports
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
  --host http://localhost:5001 --csv reports/results
# Run for specific duration
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
  --host http://localhost:5001 --run-time 5m --headless
# Compare multiple stress test runs
ls -la reports/locust_summary_*.txt | tail -5
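For a more structured comparison, the stats CSVs can be parsed directly. A small sketch using the aggregate row (column names follow Locust's CSV output):

```python
# Compare aggregate throughput and failures across saved runs.
import csv
import glob

for path in sorted(glob.glob("reports/locust_*_stats.csv")):
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["Name"] == "Aggregated":
                print(f"{path}: {row['Requests/s']} req/s, {row['Failure Count']} failures")
```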
For other anomalies, work through the stack described above. Possible causes: running the API in debug mode, or an exhausted PostgreSQL connection pool. Check for: too few Gunicorn workers or worker connections, and an unresponsive Mock OpenAI server. Investigate: OS limits on open files and the TCP settings from the tuning section.
Locust was chosen over Drill for this stress test because:

- It is Python-based, matching the project's tooling and allowing custom SSE parsing logic
- Its event hooks support tracking SSE-specific metrics such as TTFE and event counts
- It provides both headless and Web UI modes, plus built-in HTML and CSV reporting
To improve the stress test suite:
- Edit stress_test.yml for configuration changes
- Edit run_locust_stress_test.sh for workflow improvements