# Dify Stress Test Suite

A high-performance stress test suite for Dify workflow execution using **Locust**, optimized for measuring Server-Sent Events (SSE) streaming performance.

## Key Metrics Tracked

The stress test focuses on four critical SSE performance indicators:

1. **Active SSE Connections** - Real-time count of open SSE connections
1. **New Connection Rate** - Connections per second (conn/sec)
1. **Time to First Event (TTFE)** - Latency until the first SSE event arrives
1. **Event Throughput** - Events per second (events/sec)

## Features

- **True SSE Support**: Properly handles Server-Sent Events streaming without premature connection closure
- **Real-time Metrics**: Live reporting every 5 seconds during tests
- **Comprehensive Tracking**:
  - Active connection monitoring
  - Connection establishment rate
  - Event processing throughput
  - TTFE distribution analysis
- **Multiple Interfaces**:
  - Web UI for real-time monitoring (<http://localhost:8089>)
  - Headless mode with periodic console updates
- **Detailed Reports**: Final statistics with overall rates and averages
- **Easy Configuration**: Uses the existing API key configuration from setup

## What Gets Measured

The stress test focuses on SSE streaming performance with these key metrics:

### Primary Endpoint: `/v1/workflows/run`

The suite exercises a single endpoint with comprehensive SSE metrics tracking:

- **Request Type**: POST request to the workflow execution API
- **Response Type**: Server-Sent Events (SSE) stream
- **Payload**: Random questions from a configurable pool
- **Concurrency**: Configurable from 1 to 1000+ simultaneous users

### Key Performance Metrics

#### 1. **Active Connections**

- **What it measures**: Number of concurrent SSE connections open at any moment
- **Why it matters**: Shows the system's ability to handle parallel streams
- **Good values**: Should remain stable under load without drops

#### 2. **Connection Rate (conn/sec)**

- **What it measures**: How fast new SSE connections are established
- **Why it matters**: Indicates the system's ability to handle connection spikes
- **Good values**:
  - Light load: 5-10 conn/sec
  - Medium load: 20-50 conn/sec
  - Heavy load: 100+ conn/sec

#### 3. **Time to First Event (TTFE)**

- **What it measures**: Latency from the request being sent to the first SSE event received
- **Why it matters**: Critical for user experience - faster TTFE means better perceived performance
- **Good values**:
  - Excellent: < 50ms
  - Good: 50-100ms
  - Acceptable: 100-500ms
  - Poor: > 500ms

#### 4. **Event Throughput (events/sec)**

- **What it measures**: Rate of SSE events being delivered across all connections
- **Why it matters**: Shows actual data delivery performance
- **Expected values**: Depends on workflow complexity and the number of connections
  - Single connection: 10-20 events/sec
  - 10 connections: 50-100 events/sec
  - 100 connections: 200-500 events/sec

#### 5. **Request/Response Times**

- **P50 (Median)**: 50% of requests complete within this time
- **P95**: 95% of requests complete within this time
- **P99**: 99% of requests complete within this time
- **Min/Max**: Best and worst case response times
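In practice these metrics come from reading the SSE stream itself: a Locust user opens the stream, notes when the first `data:` line arrives (TTFE), and counts events until the stream closes. The sketch below only illustrates the idea; it is not the shipped `sse_benchmark.py`, and the payload fields and API key header are assumptions to adapt to your workflow.

```python
# Minimal illustration of TTFE and event counting over an SSE stream with
# Locust. The payload shape and the Bearer token below are placeholders,
# not the real sse_benchmark.py configuration.
import random
import time

from locust import HttpUser, between, task


class WorkflowSSEUser(HttpUser):
    wait_time = between(1, 3)
    questions = ["What is Dify?", "Explain SSE in one sentence."]  # sample pool

    @task
    def run_workflow(self):
        payload = {
            "inputs": {"question": random.choice(self.questions)},  # assumed input name
            "response_mode": "streaming",
            "user": "stress-test",
        }
        headers = {"Authorization": "Bearer app-your-api-key"}  # placeholder key
        start = time.perf_counter()
        ttfe_ms = None
        events = 0

        with self.client.post(
            "/v1/workflows/run",
            json=payload,
            headers=headers,
            stream=True,          # keep the SSE connection open while reading
            catch_response=True,
            name="/v1/workflows/run",
        ) as response:
            for line in response.iter_lines(decode_unicode=True):
                if line and line.startswith("data:"):
                    if ttfe_ms is None:
                        ttfe_ms = (time.perf_counter() - start) * 1000
                    events += 1
            response.success()

        if ttfe_ms is not None:
            print(f"TTFE={ttfe_ms:.0f} ms, events={events}")
```

The real suite aggregates these per-request numbers into the live metrics box and the final report rather than printing them.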
## Prerequisites

1. **Dependencies are automatically installed** when running setup:

   - Locust (load testing framework)
   - sseclient-py (SSE client library)

1. **Complete Dify setup**:

   ```bash
   # Run the complete setup
   python scripts/stress-test/setup_all.py
   ```

1. **Ensure services are running**:

   **IMPORTANT**: For accurate stress testing, run the API server with Gunicorn in production mode:

   ```bash
   # Run from the api directory
   cd api
   uv run gunicorn \
     --bind 0.0.0.0:5001 \
     --workers 4 \
     --worker-class gevent \
     --timeout 120 \
     --keep-alive 5 \
     --log-level info \
     --access-logfile - \
     --error-logfile - \
     app:app
   ```

   **Configuration options explained**:

   - `--workers 4`: Number of worker processes (adjust based on CPU cores)
   - `--worker-class gevent`: Async worker for handling concurrent connections
   - `--timeout 120`: Worker timeout for long-running requests
   - `--keep-alive 5`: Keep connections alive for SSE streaming

   **NOT RECOMMENDED for stress testing**:

   ```bash
   # Debug mode - DO NOT use for stress testing (slow performance)
   ./dev/start-api  # This runs Flask in debug mode with single-threaded execution
   ```

   **Also start the Mock OpenAI server**:

   ```bash
   python scripts/stress-test/setup/mock_openai_server.py
   ```

## Running the Stress Test

```bash
# Run with default configuration (headless mode)
./scripts/stress-test/run_locust_stress_test.sh

# Or run directly with uv
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001

# Run with Web UI (access at http://localhost:8089)
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001 --web-port 8089
```

The script will:

1. Validate that all required services are running
1. Check API token availability
1. Execute the Locust stress test with SSE support
1. Generate comprehensive reports in the `reports/` directory
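The first two steps can also be done by hand before a long run. A minimal, standalone sketch (port 5001 for the API comes from this guide; the mock OpenAI server port is an assumption to adjust for your setup):

```python
# Preflight check: confirm the services the test depends on are reachable
# before launching Locust. Port 5001 is the API port used throughout this
# guide; the mock server port is an assumed default - change it to match
# setup/mock_openai_server.py in your environment.
import socket
import sys

SERVICES = {
    "Dify API (gunicorn)": ("localhost", 5001),
    "Mock OpenAI server": ("localhost", 5004),  # assumed port
}


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    down = [name for name, addr in SERVICES.items() if not is_listening(*addr)]
    if down:
        print("Not reachable: " + ", ".join(down))
        sys.exit(1)
    print("All services reachable - ready to run the stress test.")
```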
## Configuration

The stress test configuration is in `locust.conf`:

```ini
users = 10        # Number of concurrent users
spawn-rate = 2    # Users spawned per second
run-time = 1m     # Test duration (30s, 5m, 1h)
headless = true   # Run without web UI
```

### Custom Question Sets

Modify the questions list in `sse_benchmark.py`:

```python
self.questions = [
    "Your custom question 1",
    "Your custom question 2",
    # Add more questions...
]
```

## Understanding the Results

### Report Structure

After running the stress test, you'll find these files in the `reports/` directory:

- `locust_summary_YYYYMMDD_HHMMSS.txt` - Complete console output with metrics
- `locust_report_YYYYMMDD_HHMMSS.html` - Interactive HTML report with charts
- `locust_YYYYMMDD_HHMMSS_stats.csv` - CSV with detailed statistics
- `locust_YYYYMMDD_HHMMSS_stats_history.csv` - Time-series data
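The CSV files are convenient for scripted post-processing. A small sketch that summarizes the most recent stats file (column names can differ between Locust versions, so only the columns that are actually present are read):

```python
# Print a one-line summary per endpoint from the newest Locust stats CSV.
# The file pattern follows the report naming above; column names may vary
# by Locust version, hence the defensive row.get() lookups.
import csv
import glob

files = sorted(glob.glob("reports/locust_*_stats.csv"))
if not files:
    raise SystemExit("No stats CSV found in reports/")

with open(files[-1], newline="") as f:
    for row in csv.DictReader(f):
        name = row.get("Name", "?")
        reqs = row.get("Request Count", "n/a")
        fails = row.get("Failure Count", "n/a")
        rps = row.get("Requests/s", "n/a")
        print(f"{name}: requests={reqs}, failures={fails}, rps={rps}")
```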
### Key Metrics

**Requests Per Second (RPS)**:

- **Excellent**: > 50 RPS
- **Good**: 20-50 RPS
- **Acceptable**: 10-20 RPS
- **Needs Improvement**: < 10 RPS

**Response Time Percentiles**:

- **P50 (Median)**: 50% of requests complete within this time
- **P95**: 95% of requests complete within this time
- **P99**: 99% of requests complete within this time

**Success Rate**:

- Should be > 99% for production readiness
- Lower rates indicate errors or timeouts
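For intuition, the percentile columns are just order statistics over the raw response times. A quick illustration using the nearest-rank definition (Locust computes its own percentiles internally; this is not Locust code):

```python
# P50/P95/P99 as nearest-rank percentiles over a sample of response times (ms).
# This only illustrates what the report columns mean.
import math


def percentile(samples: list[float], pct: float) -> float:
    """Smallest sample value that is >= pct percent of all samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


response_times_ms = [18, 22, 25, 31, 38, 41, 44, 52, 87, 192]
for p in (50, 95, 99):
    print(f"P{p}: {percentile(response_times_ms, p)} ms")
```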
### Example Output

```text
============================================================
DIFY SSE STRESS TEST
============================================================
[2025-09-12 15:45:44,468] Starting test run with 10 users at 2 users/sec
============================================================
SSE Metrics | Active: 8 | Total Conn: 142 | Events: 2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
============================================================
Type     Name                          # reqs      # fails |    Avg    Min    Max    Med |   req/s failures/s
---------|------------------------------|--------|----------|-------|------|------|------|--------|----------
POST     /v1/workflows/run                 142     0(0.00%) |     41     18    192     38 |    2.37       0.00
---------|------------------------------|--------|----------|-------|------|------|------|--------|----------
         Aggregated                         142     0(0.00%) |     41     18    192     38 |    2.37       0.00
============================================================
FINAL RESULTS
============================================================
Total Connections: 142
Total Events: 2841
Average TTFE: 43 ms
============================================================
```
### How to Read the Results

**Live SSE Metrics Box (updates every 10 seconds):**

```text
SSE Metrics | Active: 8 | Total Conn: 142 | Events: 2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
```

- **Active**: Current number of open SSE connections
- **Total Conn**: Cumulative connections established
- **Events**: Total SSE events received
- **conn/s**: Connection establishment rate
- **events/s**: Event delivery rate
- **TTFE**: Average time to first event

**Standard Locust Table:**

```text
Type     Name                   # reqs    # fails |   Avg   Min   Max   Med | req/s
POST     /v1/workflows/run         142   0(0.00%) |    41    18   192    38 |  2.37
```

- **Type**: Always POST for our SSE requests
- **Name**: The API endpoint being tested
- **# reqs**: Total requests made
- **# fails**: Failed requests (should be 0)
- **Avg/Min/Max/Med**: Response time statistics in milliseconds
- **req/s**: Request throughput

**Performance Targets:**

✅ **Good Performance**:

- Zero failures (0.00%)
- TTFE < 100ms
- Stable active connections
- Consistent event throughput

⚠️ **Warning Signs**:

- Failures > 1%
- TTFE > 500ms
- Dropping active connections
- Declining event rate over time
## Test Scenarios

### Light Load

```yaml
concurrency: 10
iterations: 100
```

### Normal Load

```yaml
concurrency: 100
iterations: 1000
```

### Heavy Load

```yaml
concurrency: 500
iterations: 5000
```

### Stress Test

```yaml
concurrency: 1000
iterations: 10000
```
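These profiles use the `concurrency`/`iterations` vocabulary of `stress_test.yml`. When driving Locust directly, `concurrency` corresponds to `--users`; Locust runs are bounded by duration rather than an iteration count, so a run time stands in for `iterations` here. A hypothetical helper (the spawn rates and run times below are illustrative choices, not values from the suite):

```python
# Launch Locust with one of the scenario profiles above via its CLI flags.
# Spawn rates and run times are illustrative; adjust to taste.
import subprocess

SCENARIOS = {
    "light":  {"users": 10,   "spawn_rate": 2,  "run_time": "1m"},
    "normal": {"users": 100,  "spawn_rate": 10, "run_time": "5m"},
    "heavy":  {"users": 500,  "spawn_rate": 25, "run_time": "10m"},
    "stress": {"users": 1000, "spawn_rate": 50, "run_time": "15m"},
}


def run_scenario(name: str) -> None:
    s = SCENARIOS[name]
    subprocess.run(
        [
            "uv", "run", "--project", "api", "python", "-m", "locust",
            "-f", "scripts/stress-test/sse_benchmark.py",
            "--host", "http://localhost:5001",
            "--headless",
            "--users", str(s["users"]),
            "--spawn-rate", str(s["spawn_rate"]),
            "--run-time", s["run_time"],
        ],
        check=True,
    )


if __name__ == "__main__":
    run_scenario("light")
```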
## Performance Tuning

### API Server Optimization

**Gunicorn Tuning for Different Load Levels**:

```bash
# Light load (10-50 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 2 --worker-class gevent app:app

# Medium load (50-200 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent --worker-connections 1000 app:app

# Heavy load (200-1000 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 8 --worker-class gevent --worker-connections 2000 --max-requests 1000 app:app
```

**Worker calculation formula**:

- Workers = (2 × CPU cores) + 1
- For SSE/WebSocket: Use the gevent worker class
- For CPU-bound tasks: Use sync workers
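A one-liner to apply the rule of thumb on the machine that will run Gunicorn:

```python
# Suggest a gunicorn --workers value using the (2 x CPU cores) + 1 rule.
import os

cores = os.cpu_count() or 1
print(f"{cores} CPU cores -> suggested gunicorn --workers {2 * cores + 1}")
```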
### Database Optimization

**PostgreSQL Connection Pool Tuning**:

For high-concurrency stress testing, increase the PostgreSQL max connections in `docker/middleware.env`:

```bash
# Edit docker/middleware.env
POSTGRES_MAX_CONNECTIONS=200    # Default is 100

# Recommended values for different load levels:
# Light load (10-50 users): 100 (default)
# Medium load (50-200 users): 200
# Heavy load (200-1000 users): 500
```

After changing it, restart the PostgreSQL container:

```bash
docker compose -f docker/docker-compose.middleware.yaml down db
docker compose -f docker/docker-compose.middleware.yaml up -d db
```

**Note**: Each connection uses ~10MB of RAM. Ensure your database server has sufficient memory:

- 100 connections: ~1GB RAM
- 200 connections: ~2GB RAM
- 500 connections: ~5GB RAM
### System Optimizations

1. **Increase file descriptor limits**:

   ```bash
   ulimit -n 65536
   ```

1. **TCP tuning for high concurrency** (Linux):

   ```bash
   # Increase TCP buffer sizes
   sudo sysctl -w net.core.rmem_max=134217728
   sudo sysctl -w net.core.wmem_max=134217728

   # Enable TCP fast open
   sudo sysctl -w net.ipv4.tcp_fastopen=3
   ```

1. **macOS specific**:

   ```bash
   # Increase maximum connections
   sudo sysctl -w kern.ipc.somaxconn=2048
   ```
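If the load generator itself is a Python process, the soft file-descriptor limit can also be raised from inside that process on Unix systems, which helps when you cannot change the shell's `ulimit`. A minimal sketch:

```python
# Raise the soft open-file limit for the current process (Unix only) so a
# high-concurrency run does not fail with "Too many open files". The hard
# limit still caps what can be requested; 65536 mirrors the ulimit example.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
target = 65536 if hard == resource.RLIM_INFINITY else min(65536, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
print(f"open-file limit: soft {soft} -> {target} (hard {hard})")
```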
## Troubleshooting

### Common Issues

1. **"ModuleNotFoundError: No module named 'locust'"**:

   ```bash
   # Dependencies are installed automatically, but if needed:
   uv --project api add --dev locust sseclient-py
   ```

1. **"API key configuration not found"**:

   ```bash
   # Run setup
   python scripts/stress-test/setup_all.py
   ```

1. **Services not running**:

   ```bash
   # Start Dify API with Gunicorn (production mode)
   cd api
   uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent app:app

   # Start Mock OpenAI server
   python scripts/stress-test/setup/mock_openai_server.py
   ```

1. **High error rate**:

   - Reduce the concurrency level
   - Check system resources (CPU, memory)
   - Review API server logs for errors
   - Increase timeout values if needed

1. **Permission denied running the script**:

   ```bash
   chmod +x scripts/stress-test/run_locust_stress_test.sh
   ```
## Advanced Usage

### Running Multiple Iterations

```bash
# Run the stress test 3 times with 60-second intervals
for i in {1..3}; do
  echo "Run $i of 3"
  ./run_locust_stress_test.sh
  sleep 60
done
```

### Custom Locust Options

Run Locust directly with custom options:

```bash
# With a specific user count and spawn rate
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
  --host http://localhost:5001 --users 50 --spawn-rate 5

# Generate CSV reports
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
  --host http://localhost:5001 --csv reports/results

# Run for a specific duration
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
  --host http://localhost:5001 --run-time 5m --headless
```
### Comparing Results

```bash
# List the summaries of the five most recent stress test runs
ls -la reports/locust_summary_*.txt | tail -5
```
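To compare runs on a single number, the summaries can be scanned for the final `Average TTFE` line shown in the example output above:

```python
# Print the average TTFE reported by the five most recent summary files.
# Assumes the "Average TTFE: <n> ms" line from the final results block.
import glob
import re

for path in sorted(glob.glob("reports/locust_summary_*.txt"))[-5:]:
    with open(path, encoding="utf-8") as f:
        match = re.search(r"Average TTFE:\s*(\d+)\s*ms", f.read())
    ttfe = f"{match.group(1)} ms" if match else "not found"
    print(f"{path}: average TTFE = {ttfe}")
```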
## Interpreting Performance Issues

### High Response Times

Possible causes:

- Database query performance
- External API latency
- Insufficient server resources
- Network congestion

### Low Throughput (RPS < 10)

Check for:

- CPU bottlenecks
- Memory constraints
- Database connection pooling
- API rate limiting

### High Error Rate

Investigate:

- Server error logs
- Resource exhaustion
- Timeout configurations
- Connection limits

## Why Locust?

Locust was chosen over Drill for this stress test because:

1. **Proper SSE Support**: Correctly handles streaming responses without premature closure
1. **Custom Metrics**: Can track SSE-specific metrics like TTFE and stream duration
1. **Web UI**: Real-time monitoring and control via the web interface
1. **Python Integration**: Integrates seamlessly with the existing Python setup code
1. **Extensibility**: Easy to customize for specific testing scenarios

## Contributing

To improve the stress test suite:

1. Edit `stress_test.yml` for configuration changes
1. Modify `run_locust_stress_test.sh` for workflow improvements
1. Update the question sets for better coverage
1. Add new metrics or analysis features