Dify Stress Test Suite

A high-performance stress test suite for Dify workflow execution using Locust, optimized for measuring Server-Sent Events (SSE) streaming performance.

Key Metrics Tracked

The stress test focuses on four critical SSE performance indicators:

  1. Active SSE Connections - Real-time count of open SSE connections
  2. New Connection Rate - Connections per second (conn/sec)
  3. Time to First Event (TTFE) - Latency until first SSE event arrives
  4. Event Throughput - Events per second (events/sec)

Features

  • True SSE Support: Properly handles Server-Sent Events streaming without premature connection closure
  • Real-time Metrics: Live reporting every 5 seconds during tests
  • Comprehensive Tracking:
    • Active connection monitoring
    • Connection establishment rate
    • Event processing throughput
    • TTFE distribution analysis
  • Multiple Interfaces: Headless CLI runs or the interactive Locust web UI
  • Detailed Reports: Final statistics with overall rates and averages
  • Easy Configuration: Uses existing API key configuration from setup

What Gets Measured

The stress test focuses on SSE streaming performance with these key metrics:

Primary Endpoint: /v1/workflows/run

The stress test exercises a single endpoint with comprehensive SSE metrics tracking:

  • Request Type: POST request to workflow execution API
  • Response Type: Server-Sent Events (SSE) stream
  • Payload: Random questions from a configurable pool
  • Concurrency: Configurable from 1 to 1000+ simultaneous users
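An SSE response body is a stream of `data:` lines, with a blank line terminating each event. A minimal stdlib-only sketch of how such a stream can be split into events (the helper name `iter_sse_events` is illustrative, not part of the suite):

```python
from typing import Iterable, Iterator

def iter_sse_events(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterator of text lines."""
    data: list[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("data:"):
            data.append(line[len("data:"):].lstrip())
        elif line == "" and data:
            # A blank line terminates the current event
            yield "\n".join(data)
            data = []
    if data:  # stream ended without a trailing blank line
        yield "\n".join(data)

raw = [
    'data: {"event": "workflow_started"}\n',
    "\n",
    'data: {"event": "workflow_finished"}\n',
    "\n",
]
events = list(iter_sse_events(raw))
```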

Key Performance Metrics

1. Active Connections

  • What it measures: Number of concurrent SSE connections open at any moment
  • Why it matters: Shows system’s ability to handle parallel streams
  • Good values: Should remain stable under load without drops

2. Connection Rate (conn/sec)

  • What it measures: How fast new SSE connections are established
  • Why it matters: Indicates system’s ability to handle connection spikes
  • Good values:
    • Light load: 5-10 conn/sec
    • Medium load: 20-50 conn/sec
    • Heavy load: 100+ conn/sec

3. Time to First Event (TTFE)

  • What it measures: Latency from request sent to first SSE event received
  • Why it matters: Critical for user experience - faster TTFE = better perceived performance
  • Good values:
    • Excellent: < 50ms
    • Good: 50-100ms
    • Acceptable: 100-500ms
    • Poor: > 500ms
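The TTFE bands above can be expressed as a small classification helper (a sketch; the function name is ours, not something the suite exposes):

```python
def classify_ttfe(ttfe_ms: float) -> str:
    """Map a time-to-first-event measurement (ms) to the bands listed above."""
    if ttfe_ms < 50:
        return "Excellent"
    if ttfe_ms <= 100:
        return "Good"
    if ttfe_ms <= 500:
        return "Acceptable"
    return "Poor"
```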

4. Event Throughput (events/sec)

  • What it measures: Rate of SSE events being delivered across all connections
  • Why it matters: Shows actual data delivery performance
  • Expected values: Depends on workflow complexity and number of connections
    • Single connection: 10-20 events/sec
    • 10 connections: 50-100 events/sec
    • 100 connections: 200-500 events/sec

5. Request/Response Times

  • P50 (Median): 50% of requests complete within this time
  • P95: 95% of requests complete within this time
  • P99: 99% of requests complete within this time
  • Min/Max: Best and worst case response times
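These percentiles come from the raw response-time samples; a minimal nearest-rank sketch (not necessarily the exact interpolation Locust uses internally):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample covering at least p% of the data."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# Hypothetical response times in milliseconds
times = [18, 25, 38, 41, 44, 52, 80, 95, 150, 192]
p50 = percentile(times, 50)  # median
p95 = percentile(times, 95)
p99 = percentile(times, 99)
```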

Prerequisites

  1. Dependencies are automatically installed when running setup:

    • Locust (load testing framework)
    • sseclient-py (SSE client library)
  2. Complete Dify setup:

   # Run the complete setup
   python scripts/stress-test/setup_all.py
  3. Ensure services are running:

IMPORTANT: For accurate stress testing, run the API server with Gunicorn in production mode:

   # Run from the api directory
   cd api
   uv run gunicorn \
     --bind 0.0.0.0:5001 \
     --workers 4 \
     --worker-class gevent \
     --timeout 120 \
     --keep-alive 5 \
     --log-level info \
     --access-logfile - \
     --error-logfile - \
     app:app

Configuration options explained:

  • --workers 4: Number of worker processes (adjust based on CPU cores)
  • --worker-class gevent: Async worker for handling concurrent connections
  • --timeout 120: Worker timeout for long-running requests
  • --keep-alive 5: Keep connections alive for SSE streaming

NOT RECOMMENDED for stress testing:

   # Debug mode - DO NOT use for stress testing (slow performance)
   ./dev/start-api  # This runs Flask in debug mode with single-threaded execution

Also start the Mock OpenAI server:

   python scripts/stress-test/setup/mock_openai_server.py

Running the Stress Test

# Run with default configuration (headless mode)
./scripts/stress-test/run_locust_stress_test.sh

# Or run directly with uv
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001

# Run with Web UI (access at http://localhost:8089)
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001 --web-port 8089

The script will:

  1. Validate that all required services are running
  2. Check API token availability
  3. Execute the Locust stress test with SSE support
  4. Generate comprehensive reports in the reports/ directory

Configuration

The stress test configuration is in locust.conf:

users = 10           # Number of concurrent users
spawn-rate = 2       # Users spawned per second
run-time = 1m        # Test duration (30s, 5m, 1h)
headless = true      # Run without web UI

Custom Question Sets

Modify the questions list in sse_benchmark.py:

self.questions = [
    "Your custom question 1",
    "Your custom question 2",
    # Add more questions...
]
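Each simulated user then picks one question at random per request. The exact payload fields are defined in sse_benchmark.py; the shape below is an assumed illustration, not the suite's actual schema:

```python
import random

questions = [
    "Your custom question 1",
    "Your custom question 2",
]

# Assumed payload shape for illustration only
payload = {
    "inputs": {},
    "query": random.choice(questions),
    "response_mode": "streaming",
}
```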

Understanding the Results

Report Structure

After running the stress test, you’ll find these files in the reports/ directory:

  • locust_summary_YYYYMMDD_HHMMSS.txt - Complete console output with metrics
  • locust_report_YYYYMMDD_HHMMSS.html - Interactive HTML report with charts
  • locust_YYYYMMDD_HHMMSS_stats.csv - CSV with detailed statistics
  • locust_YYYYMMDD_HHMMSS_stats_history.csv - Time-series data

Key Metrics

Requests Per Second (RPS):

  • Excellent: > 50 RPS
  • Good: 20-50 RPS
  • Acceptable: 10-20 RPS
  • Needs Improvement: < 10 RPS

Response Time Percentiles:

  • P50 (Median): 50% of requests complete within this time
  • P95: 95% of requests complete within this time
  • P99: 99% of requests complete within this time

Success Rate:

  • Should be > 99% for production readiness
  • Lower rates indicate errors or timeouts

Example Output

============================================================
DIFY SSE STRESS TEST
============================================================

[2025-09-12 15:45:44,468] Starting test run with 10 users at 2 users/sec

============================================================
SSE Metrics | Active:   8 | Total Conn:   142 | Events:   2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
============================================================

Type     Name                          # reqs  # fails |    Avg     Min     Max    Med | req/s  failures/s
---------|------------------------------|--------|--------|--------|--------|--------|--------|--------|-----------
POST     /v1/workflows/run                  142   0(0.00%) |     41      18     192     38 |   2.37        0.00
---------|------------------------------|--------|--------|--------|--------|--------|--------|--------|-----------
         Aggregated                         142   0(0.00%) |     41      18     192     38 |   2.37        0.00

============================================================
FINAL RESULTS
============================================================
Total Connections: 142
Total Events:      2841
Average TTFE:      43 ms
============================================================

How to Read the Results

Live SSE Metrics Box (Updates every 10 seconds):

SSE Metrics | Active:   8 | Total Conn:   142 | Events:   2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
  • Active: Current number of open SSE connections
  • Total Conn: Cumulative connections established
  • Events: Total SSE events received
  • conn/s: Connection establishment rate
  • events/s: Event delivery rate
  • TTFE: Average time to first event
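The rate line is consistent with cumulative counters divided by elapsed wall time. A sketch using the numbers from the example output, assuming the 1-minute run from locust.conf (the function name is ours):

```python
def live_rates(total_conn: int, total_events: int,
               ttfe_ms: list[float], elapsed_s: float) -> dict[str, float]:
    """Derive the live rate line from cumulative counters and elapsed time."""
    return {
        "conn_per_s": total_conn / elapsed_s,
        "events_per_s": total_events / elapsed_s,
        "avg_ttfe_ms": sum(ttfe_ms) / len(ttfe_ms),
    }

# 142 connections and 2841 events over an assumed 60-second run
r = live_rates(142, 2841, [43.0], 60.0)
```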

Standard Locust Table:

Type     Name                # reqs  # fails |    Avg     Min     Max    Med | req/s
POST     /v1/workflows/run      142   0(0.00%) |     41      18     192     38 |   2.37
  • Type: Always POST for our SSE requests
  • Name: The API endpoint being tested
  • # reqs: Total requests made
  • # fails: Failed requests (should be 0)
  • Avg/Min/Max/Med: Response-time statistics in milliseconds (average, minimum, maximum, median)
  • req/s: Request throughput

Performance Targets:

Good Performance:

  • Zero failures (0.00%)
  • TTFE < 100ms
  • Stable active connections
  • Consistent event throughput

⚠️ Warning Signs:

  • Failures > 1%
  • TTFE > 500ms
  • Dropping active connections
  • Declining event rate over time

Test Scenarios

Light Load

concurrency: 10
iterations: 100

Normal Load

concurrency: 100
iterations: 1000

Heavy Load

concurrency: 500
iterations: 5000

Stress Test

concurrency: 1000
iterations: 10000

Performance Tuning

API Server Optimization

Gunicorn Tuning for Different Load Levels:

# Light load (10-50 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 2 --worker-class gevent app:app

# Medium load (50-200 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent --worker-connections 1000 app:app

# Heavy load (200-1000 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 8 --worker-class gevent --worker-connections 2000 --max-requests 1000 app:app

Worker calculation formula:

  • Workers = (2 × CPU cores) + 1
  • For SSE/WebSocket: Use gevent worker class
  • For CPU-bound tasks: Use sync workers

Database Optimization

PostgreSQL Connection Pool Tuning:

For high-concurrency stress testing, increase the PostgreSQL max connections in docker/middleware.env:

# Edit docker/middleware.env
POSTGRES_MAX_CONNECTIONS=200  # Default is 100

# Recommended values for different load levels:
# Light load (10-50 users): 100 (default)
# Medium load (50-200 users): 200
# Heavy load (200-1000 users): 500

After changing, restart the PostgreSQL container:

docker compose -f docker/docker-compose.middleware.yaml down db
docker compose -f docker/docker-compose.middleware.yaml up -d db

Note: Each connection uses ~10MB of RAM. Ensure your database server has sufficient memory:

  • 100 connections: ~1GB RAM
  • 200 connections: ~2GB RAM
  • 500 connections: ~5GB RAM
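These figures follow from the ~10 MB-per-connection rule of thumb. A quick check (the constant is an estimate; actual usage varies with query load and PostgreSQL settings):

```python
def pg_ram_estimate_gb(max_connections: int, mb_per_conn: int = 10) -> float:
    """Rough RAM needed for PostgreSQL connections at ~10 MB each."""
    return max_connections * mb_per_conn / 1000
```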

System Optimizations

  1. Increase file descriptor limits:
   ulimit -n 65536
  2. TCP tuning for high concurrency (Linux):
   # Increase TCP buffer sizes
   sudo sysctl -w net.core.rmem_max=134217728
   sudo sysctl -w net.core.wmem_max=134217728

   # Enable TCP fast open
   sudo sysctl -w net.ipv4.tcp_fastopen=3
  3. macOS specific:
   # Increase maximum connections
   sudo sysctl -w kern.ipc.somaxconn=2048

Troubleshooting

Common Issues

  1. “ModuleNotFoundError: No module named ‘locust’”:
   # Dependencies are installed automatically, but if needed:
   uv --project api add --dev locust sseclient-py
  2. “API key configuration not found”:
   # Run setup
   python scripts/stress-test/setup_all.py
  3. Services not running:
   # Start Dify API with Gunicorn (production mode)
   cd api
   uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent app:app

   # Start Mock OpenAI server
   python scripts/stress-test/setup/mock_openai_server.py
  4. High error rate:

    • Reduce concurrency level
    • Check system resources (CPU, memory)
    • Review API server logs for errors
    • Increase timeout values if needed
  5. Permission denied running script:

   chmod +x run_locust_stress_test.sh

Advanced Usage

Running Multiple Iterations

# Run stress test 3 times with 60-second intervals
for i in {1..3}; do
    echo "Run $i of 3"
    ./run_locust_stress_test.sh
    sleep 60
done

Custom Locust Options

Run Locust directly with custom options:

# With specific user count and spawn rate
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
  --host http://localhost:5001 --users 50 --spawn-rate 5

# Generate CSV reports
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
  --host http://localhost:5001 --csv reports/results

# Run for specific duration
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
  --host http://localhost:5001 --run-time 5m --headless

Comparing Results

# Compare multiple stress test runs
ls -la reports/locust_summary_*.txt | tail -5

Interpreting Performance Issues

High Response Times

Possible causes:

  • Database query performance
  • External API latency
  • Insufficient server resources
  • Network congestion

Low Throughput (RPS < 10)

Check for:

  • CPU bottlenecks
  • Memory constraints
  • Database connection pooling
  • API rate limiting

High Error Rate

Investigate:

  • Server error logs
  • Resource exhaustion
  • Timeout configurations
  • Connection limits

Why Locust?

Locust was chosen over Drill for this stress test because:

  1. Proper SSE Support: Correctly handles streaming responses without premature closure
  2. Custom Metrics: Can track SSE-specific metrics like TTFE and stream duration
  3. Web UI: Real-time monitoring and control via web interface
  4. Python Integration: Seamlessly integrates with existing Python setup code
  5. Extensibility: Easy to customize for specific testing scenarios

Contributing

To improve the stress test suite:

  1. Edit locust.conf for configuration changes
  2. Modify run_locust_stress_test.sh for workflow improvements
  3. Update question sets for better coverage
  4. Add new metrics or analysis features