# Dify Stress Test Suite

A high-performance stress test suite for Dify workflow execution using **Locust**, optimized for measuring Server-Sent Events (SSE) streaming performance.

## Key Metrics Tracked

The stress test focuses on four critical SSE performance indicators:

1. **Active SSE Connections** - Real-time count of open SSE connections
1. **New Connection Rate** - Connections per second (conn/sec)
1. **Time to First Event (TTFE)** - Latency until the first SSE event arrives
1. **Event Throughput** - Events per second (events/sec)

## Features

- **True SSE Support**: Properly handles Server-Sent Events streaming without premature connection closure
- **Real-time Metrics**: Live reporting every 5 seconds during tests
- **Comprehensive Tracking**:
  - Active connection monitoring
  - Connection establishment rate
  - Event processing throughput
  - TTFE distribution analysis
- **Multiple Interfaces**:
  - Web UI for real-time monitoring (<http://localhost:8089>)
  - Headless mode with periodic console updates
- **Detailed Reports**: Final statistics with overall rates and averages
- **Easy Configuration**: Uses the existing API key configuration from setup

## What Gets Measured

### Primary Endpoint: `/v1/workflows/run`

The stress test exercises a single endpoint with comprehensive SSE metrics tracking:

- **Request Type**: POST request to the workflow execution API
- **Response Type**: Server-Sent Events (SSE) stream
- **Payload**: Random questions from a configurable pool
- **Concurrency**: Configurable from 1 to 1000+ simultaneous users
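
For reference, a single request against this endpoint can be reproduced outside Locust with `requests` plus `sseclient-py` (both installed by setup). This is a minimal sketch, assuming a workflow that takes a `question` input; the `app-...` key is a placeholder for the real app key produced during setup:

```python
import requests
import sseclient  # sseclient-py, installed by the setup scripts

# Placeholder: the real key comes from the setup configuration, and the
# input field names depend on how your workflow is defined.
API_KEY = "app-..."

response = requests.post(
    "http://localhost:5001/v1/workflows/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "inputs": {"question": "What is Dify?"},
        "response_mode": "streaming",  # ask for an SSE stream
        "user": "stress-test-user",
    },
    stream=True,  # read the body incrementally instead of buffering it
    timeout=120,
)
response.raise_for_status()

# Each iteration yields one SSE event as it arrives.
for event in sseclient.SSEClient(response).events():
    print(event.data)
```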

### Key Performance Metrics

#### 1. **Active Connections**

- **What it measures**: Number of concurrent SSE connections open at any moment
- **Why it matters**: Shows the system's ability to handle parallel streams
- **Good values**: Should remain stable under load without drops

#### 2. **Connection Rate (conn/sec)**

- **What it measures**: How fast new SSE connections are established
- **Why it matters**: Indicates the system's ability to handle connection spikes
- **Good values**:
  - Light load: 5-10 conn/sec
  - Medium load: 20-50 conn/sec
  - Heavy load: 100+ conn/sec

#### 3. **Time to First Event (TTFE)**

- **What it measures**: Latency from the request being sent to the first SSE event received
- **Why it matters**: Critical for user experience; a faster TTFE means better perceived performance
- **Good values**:
  - Excellent: < 50ms
  - Good: 50-100ms
  - Acceptable: 100-500ms
  - Poor: > 500ms

#### 4. **Event Throughput (events/sec)**

- **What it measures**: Rate of SSE events being delivered across all connections
- **Why it matters**: Shows actual data delivery performance
- **Expected values**: Depend on workflow complexity and the number of connections
  - Single connection: 10-20 events/sec
  - 10 connections: 50-100 events/sec
  - 100 connections: 200-500 events/sec

#### 5. **Request/Response Times**

- **P50 (Median)**: 50% of requests complete within this time
- **P95**: 95% of requests complete within this time
- **P99**: 99% of requests complete within this time
- **Min/Max**: Best and worst case response times
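
Locust reports these percentiles for you, but they can also be recomputed from raw samples (for example, per-connection TTFE values) with the standard library; an illustrative snippet:

```python
import statistics

# Hypothetical TTFE samples in milliseconds, e.g. one per connection.
ttfe_ms = [38, 41, 44, 47, 52, 58, 63, 71, 95, 180]

# quantiles() with n=100 returns the 1st..99th percentile cut points.
percentiles = statistics.quantiles(ttfe_ms, n=100)
p50, p95, p99 = percentiles[49], percentiles[94], percentiles[98]
print(f"P50={p50:.0f}ms P95={p95:.0f}ms P99={p99:.0f}ms")
```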

## Prerequisites

1. **Dependencies are automatically installed** when running setup:

   - Locust (load testing framework)
   - sseclient-py (SSE client library)

1. **Complete Dify setup**:

   ```bash
   # Run the complete setup
   python scripts/stress-test/setup_all.py
   ```

1. **Ensure services are running**:

   **IMPORTANT**: For accurate stress testing, run the API server with Gunicorn in production mode:

   ```bash
   # Run from the api directory
   cd api
   uv run gunicorn \
       --bind 0.0.0.0:5001 \
       --workers 4 \
       --worker-class gevent \
       --timeout 120 \
       --keep-alive 5 \
       --log-level info \
       --access-logfile - \
       --error-logfile - \
       app:app
   ```

   **Configuration options explained**:

   - `--workers 4`: Number of worker processes (adjust based on CPU cores)
   - `--worker-class gevent`: Async worker for handling concurrent connections
   - `--timeout 120`: Worker timeout for long-running requests
   - `--keep-alive 5`: Keep connections alive for SSE streaming

   **NOT RECOMMENDED for stress testing**:

   ```bash
   # Debug mode - DO NOT use for stress testing (slow performance)
   ./dev/start-api  # This runs Flask in debug mode with single-threaded execution
   ```

   **Also start the Mock OpenAI server**:

   ```bash
   python scripts/stress-test/setup/mock_openai_server.py
   ```

## Running the Stress Test

```bash
# Run with default configuration (headless mode)
./scripts/stress-test/run_locust_stress_test.sh

# Or run directly with uv
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001

# Run with Web UI (access at http://localhost:8089)
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001 --web-port 8089
```

The script will:

1. Validate that all required services are running
1. Check API token availability
1. Execute the Locust stress test with SSE support
1. Generate comprehensive reports in the `reports/` directory

## Configuration

The stress test configuration is in `locust.conf`:

```ini
users = 10        # Number of concurrent users
spawn-rate = 2    # Users spawned per second
run-time = 1m     # Test duration (30s, 5m, 1h)
headless = true   # Run without web UI
```

### Custom Question Sets

Modify the questions list in `sse_benchmark.py`:

```python
self.questions = [
    "Your custom question 1",
    "Your custom question 2",
    # Add more questions...
]
```
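
For orientation when editing the file, the core pattern `sse_benchmark.py` builds on looks roughly like the sketch below. This is an illustrative reconstruction rather than the actual source: class and field names are assumptions, and the real script additionally feeds the custom SSE metrics (active connections, connection rate, TTFE, event throughput) into the live report.

```python
# Illustrative sketch only; the real sse_benchmark.py differs in names
# and in how it reports custom SSE metrics.
import random
import time

import sseclient
from locust import HttpUser, between, task


class WorkflowUser(HttpUser):
    wait_time = between(1, 3)

    questions = [
        "Your custom question 1",
        "Your custom question 2",
    ]

    @task
    def run_workflow(self):
        start = time.perf_counter()
        with self.client.post(
            "/v1/workflows/run",
            headers={"Authorization": "Bearer app-..."},  # key loaded from setup config
            json={
                "inputs": {"question": random.choice(self.questions)},
                "response_mode": "streaming",
                "user": "locust-user",
            },
            stream=True,          # keep the SSE stream open while events arrive
            catch_response=True,  # mark success/failure ourselves
        ) as response:
            try:
                ttfe_ms = None
                events = 0
                for _ in sseclient.SSEClient(response).events():
                    if ttfe_ms is None:  # the first event gives us TTFE
                        ttfe_ms = (time.perf_counter() - start) * 1000
                    events += 1
                # The real script reports ttfe_ms and the event count
                # through its custom metrics hooks.
                if events == 0:
                    response.failure("stream closed before any event arrived")
                else:
                    response.success()
            except Exception as exc:
                response.failure(str(exc))
```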

## Understanding the Results

### Report Structure

After running the stress test, you'll find these files in the `reports/` directory:

- `locust_summary_YYYYMMDD_HHMMSS.txt` - Complete console output with metrics
- `locust_report_YYYYMMDD_HHMMSS.html` - Interactive HTML report with charts
- `locust_YYYYMMDD_HHMMSS_stats.csv` - CSV with detailed statistics
- `locust_YYYYMMDD_HHMMSS_stats_history.csv` - Time-series data

### Key Metrics

**Requests Per Second (RPS)**:

- **Excellent**: > 50 RPS
- **Good**: 20-50 RPS
- **Acceptable**: 10-20 RPS
- **Needs Improvement**: < 10 RPS

**Response Time Percentiles**:

- **P50 (Median)**: 50% of requests complete within this time
- **P95**: 95% of requests complete within this time
- **P99**: 99% of requests complete within this time

**Success Rate**:

- Should be > 99% for production readiness
- Lower rates indicate errors or timeouts

### Example Output

```text
============================================================
DIFY SSE STRESS TEST
============================================================

[2025-09-12 15:45:44,468] Starting test run with 10 users at 2 users/sec

============================================================
SSE Metrics | Active: 8 | Total Conn: 142 | Events: 2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
============================================================

Type     Name                           # reqs   # fails |    Avg    Min    Max    Med |  req/s failures/s
---------|------------------------------|--------|--------|--------|--------|--------|--------|--------|-----------
POST     /v1/workflows/run                  142  0(0.00%) |     41     18    192     38 |   2.37       0.00
---------|------------------------------|--------|--------|--------|--------|--------|--------|--------|-----------
         Aggregated                         142  0(0.00%) |     41     18    192     38 |   2.37       0.00

============================================================
FINAL RESULTS
============================================================
Total Connections: 142
Total Events: 2841
Average TTFE: 43 ms
============================================================
```

### How to Read the Results

**Live SSE Metrics Box (Updates every 10 seconds):**

```text
SSE Metrics | Active: 8 | Total Conn: 142 | Events: 2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
```

- **Active**: Current number of open SSE connections
- **Total Conn**: Cumulative connections established
- **Events**: Total SSE events received
- **conn/s**: Connection establishment rate
- **events/s**: Event delivery rate
- **TTFE**: Average time to first event

**Standard Locust Table:**

```text
Type     Name                           # reqs   # fails |    Avg    Min    Max    Med |  req/s
POST     /v1/workflows/run                  142  0(0.00%) |     41     18    192     38 |   2.37
```

- **Type**: Always POST for our SSE requests
- **Name**: The API endpoint being tested
- **# reqs**: Total requests made
- **# fails**: Failed requests (should be 0)
- **Avg/Min/Max/Med**: Response time summary statistics (ms)
- **req/s**: Request throughput

**Performance Targets:**

✅ **Good Performance**:

- Zero failures (0.00%)
- TTFE < 100ms
- Stable active connections
- Consistent event throughput

⚠️ **Warning Signs**:

- Failures > 1%
- TTFE > 500ms
- Dropping active connections
- Declining event rate over time

## Test Scenarios

Set these in `locust.conf`, or pass them per run with the `--users` and `--iterations` flags:

### Light Load

```ini
users = 10
iterations = 100
```

### Normal Load

```ini
users = 100
iterations = 1000
```

### Heavy Load

```ini
users = 500
iterations = 5000
```

### Stress Test

```ini
users = 1000
iterations = 10000
```

## Performance Tuning

### API Server Optimization

**Gunicorn Tuning for Different Load Levels**:

```bash
# Light load (10-50 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 2 --worker-class gevent app:app

# Medium load (50-200 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent --worker-connections 1000 app:app

# Heavy load (200-1000 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 8 --worker-class gevent --worker-connections 2000 --max-requests 1000 app:app
```

**Worker calculation formula**:

- Workers = (2 × CPU cores) + 1
- For SSE/WebSocket: Use the gevent worker class
- For CPU-bound tasks: Use sync workers
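
The formula is easy to apply programmatically; a one-off sanity check (illustrative only):

```python
import os

# Workers = (2 x CPU cores) + 1, per the formula above.
cores = os.cpu_count() or 1  # os.cpu_count() can return None
workers = 2 * cores + 1
print(f"{cores} cores -> --workers {workers}")
```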

### Database Optimization

**PostgreSQL Connection Pool Tuning**:

For high-concurrency stress testing, increase the PostgreSQL max connections in `docker/middleware.env`:

```bash
# Edit docker/middleware.env
POSTGRES_MAX_CONNECTIONS=200  # Default is 100

# Recommended values for different load levels:
# Light load (10-50 users): 100 (default)
# Medium load (50-200 users): 200
# Heavy load (200-1000 users): 500
```

After changing it, restart the PostgreSQL container:

```bash
docker compose -f docker/docker-compose.middleware.yaml down db
docker compose -f docker/docker-compose.middleware.yaml up -d db
```

**Note**: Each connection uses ~10MB of RAM. Ensure your database server has sufficient memory:

- 100 connections: ~1GB RAM
- 200 connections: ~2GB RAM
- 500 connections: ~5GB RAM

### System Optimizations

1. **Increase file descriptor limits**:

   ```bash
   ulimit -n 65536
   ```

1. **TCP tuning for high concurrency** (Linux):

   ```bash
   # Increase TCP buffer sizes
   sudo sysctl -w net.core.rmem_max=134217728
   sudo sysctl -w net.core.wmem_max=134217728

   # Enable TCP Fast Open
   sudo sysctl -w net.ipv4.tcp_fastopen=3
   ```

1. **macOS specific**:

   ```bash
   # Increase the listen queue size
   sudo sysctl -w kern.ipc.somaxconn=2048
   ```
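
Before a large run, it is worth verifying that the raised file descriptor limit actually applies to the process that will run Locust; each SSE connection consumes at least one descriptor. A small illustrative check using only the standard library (Unix only):

```python
import resource

# RLIMIT_NOFILE is the per-process open file (and socket) limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard}")
if soft < 65536:
    print("Consider raising it, e.g. `ulimit -n 65536` in the same shell.")
```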

## Troubleshooting

### Common Issues

1. **"ModuleNotFoundError: No module named 'locust'"**:

   ```bash
   # Dependencies are installed automatically, but if needed:
   uv --project api add --dev locust sseclient-py
   ```

1. **"API key configuration not found"**:

   ```bash
   # Run setup
   python scripts/stress-test/setup_all.py
   ```

1. **Services not running**:

   ```bash
   # Start the Dify API with Gunicorn (production mode)
   cd api
   uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent app:app

   # Start the Mock OpenAI server
   python scripts/stress-test/setup/mock_openai_server.py
   ```

1. **High error rate**:

   - Reduce the concurrency level
   - Check system resources (CPU, memory)
   - Review API server logs for errors
   - Increase timeout values if needed

1. **Permission denied running the script**:

   ```bash
   chmod +x scripts/stress-test/run_locust_stress_test.sh
   ```

## Advanced Usage

### Running Multiple Iterations

```bash
# Run the stress test 3 times with 60-second intervals
for i in {1..3}; do
    echo "Run $i of 3"
    ./scripts/stress-test/run_locust_stress_test.sh
    sleep 60
done
```

### Custom Locust Options

Run Locust directly with custom options:

```bash
# With a specific user count and spawn rate
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
    --host http://localhost:5001 --users 50 --spawn-rate 5

# Generate CSV reports
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
    --host http://localhost:5001 --csv reports/results

# Run for a specific duration
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
    --host http://localhost:5001 --run-time 5m --headless
```

### Comparing Results

```bash
# Compare the most recent stress test summaries
ls -la reports/locust_summary_*.txt | tail -5
```
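
For a more quantitative comparison, parse the `*_stats.csv` files instead. The sketch below prints the aggregated throughput and P95 for each run; the column names (`Requests/s`, `95%`, `Failure Count`) are those written by recent Locust versions and may differ in older releases:

```python
import csv
from pathlib import Path

# Compare the "Aggregated" row across runs.
for stats_file in sorted(Path("reports").glob("locust_*_stats.csv")):
    with stats_file.open(newline="") as f:
        for row in csv.DictReader(f):
            if row.get("Name") == "Aggregated":
                print(
                    f"{stats_file.name}: "
                    f"{row['Requests/s']} req/s, P95 {row['95%']} ms, "
                    f"{row['Failure Count']} failures"
                )
```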

## Interpreting Performance Issues

### High Response Times

Possible causes:

- Database query performance
- External API latency
- Insufficient server resources
- Network congestion

### Low Throughput (RPS < 10)

Check for:

- CPU bottlenecks
- Memory constraints
- Database connection pooling
- API rate limiting

### High Error Rate

Investigate:

- Server error logs
- Resource exhaustion
- Timeout configurations
- Connection limits

## Why Locust?

Locust was chosen over Drill for this stress test because:

1. **Proper SSE Support**: Correctly handles streaming responses without premature closure
1. **Custom Metrics**: Can track SSE-specific metrics like TTFE and stream duration
1. **Web UI**: Real-time monitoring and control via a web interface
1. **Python Integration**: Seamlessly integrates with the existing Python setup code
1. **Extensibility**: Easy to customize for specific testing scenarios

## Contributing

To improve the stress test suite:

1. Edit `locust.conf` for configuration changes
1. Modify `run_locust_stress_test.sh` for workflow improvements
1. Update question sets for better coverage
1. Add new metrics or analysis features