launch_backend_service.sh 2.8KB

Change launch backend script to handle errors gracefully (#3334)

### What problem does this PR solve?

The `launch_backend_service.sh` script enters infinite loops for both the task executors and the backend server. When an error occurs in any of these processes, the script continuously restarts them without properly handling termination signals. This behavior causes the script to even ignore interrupts, leading to persistent error messages and making it difficult to exit the script gracefully.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Explanation of Modifications

1. **Signal Trapping with `trap`:**
   - The `trap cleanup SIGINT SIGTERM` line ensures that when a `SIGINT` or `SIGTERM` signal is received, the cleanup function is invoked.
   - The `cleanup` function sets the `STOP` flag to `true`, iterates through all child process IDs stored in the `PIDS` array, and sends a `kill` signal to each process to terminate them gracefully.
2. **Retry Limits:**
   - Introduced a `MAX_RETRIES` variable to limit the number of restart attempts for both `task_executor.py` and `ragflow_server.py`.
   - The loops now check if the retry count has reached the maximum limit. If so, they invoke the `cleanup` function to terminate all processes and exit the script.
3. **Process Tracking with `PIDS` Array:**
   - After launching each background process (`task_exe` and `run_server`), their Process IDs (PIDs) are stored in the `PIDS` array.
   - This allows the `cleanup` function to terminate all child processes effectively when needed.
4. **Graceful Shutdown:**
   - When the `cleanup` function is called, it iterates over all child PIDs and sends a termination signal (`kill`) to each, ensuring that all subprocesses are stopped before the script exits.
5. **Logging Enhancements:**
   - Added `echo` statements to provide clearer logs about the state of each process, including attempts, successes, failures, and retries.
6. **Exit on Successful Completion:**
   - If `ragflow_server.py` or a `task_executor.py` process exits with a success code (0), the loop breaks, preventing unnecessary retries.

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
11 months ago
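
The signal-trapping and process-tracking points in the commit message can be illustrated in isolation. The following is a minimal sketch, not the PR's code: `CHILD_PIDS` and `stop_children` are hypothetical names, and the PIDs are kept in a space-separated string (rather than the `PIDS` array the commit describes) so that the sketch also runs under plain `sh`. It additionally calls `wait` on each child, which is an assumption about what "stopped before the script exits" should mean.

```shell
#!/bin/sh
# Sketch only: CHILD_PIDS and stop_children are hypothetical names.
CHILD_PIDS=""

stop_children() {
    # Ask every child that is still alive to terminate (kill sends SIGTERM by default).
    for pid in $CHILD_PIDS; do
        if kill -0 "$pid" 2>/dev/null; then
            kill "$pid" 2>/dev/null || true
        fi
    done
    # Reap the children so that none is still running when the caller exits.
    for pid in $CHILD_PIDS; do
        wait "$pid" 2>/dev/null || true
    done
}

# On Ctrl-C (SIGINT) or SIGTERM, stop the children and then leave the script.
trap 'stop_children; exit 0' INT TERM
```

In a launcher built this way, each `some_command &` would be followed by `CHILD_PIDS="$CHILD_PIDS $!"`, and the script would end with a plain `wait`.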
#!/bin/bash

# Exit immediately if a command exits with a non-zero status
set -e

# Clear HTTP proxies that might be set by Docker daemon
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""

export PYTHONPATH="$(pwd)"
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/
JEMALLOC_PATH="$(pkg-config --variable=libdir jemalloc)/libjemalloc.so"

PY=python3

# Set default number of workers if WS is not set or less than 1
if [[ -z "$WS" || $WS -lt 1 ]]; then
    WS=1
fi

# Maximum number of retries for each task executor and server
MAX_RETRIES=5

# Flag to control termination
STOP=false

# Array to keep track of child PIDs
PIDS=()

# Function to handle termination signals
# Note: when called from task_exe or run_server, this runs inside that
# background subshell and only sees the PIDs recorded before it was forked.
cleanup() {
    echo "Termination signal received. Shutting down..."
    STOP=true
    # Terminate all child processes
    for pid in "${PIDS[@]}"; do
        if kill -0 "$pid" 2>/dev/null; then
            echo "Killing process $pid"
            # Ignore failures (the process may have just exited) so that
            # "set -e" does not abort the cleanup halfway through
            kill "$pid" || true
        fi
    done
    exit 0
}

# Trap SIGINT and SIGTERM to invoke cleanup
trap cleanup SIGINT SIGTERM

# Function to execute task_executor with retry logic
task_exe(){
    local task_id=$1
    local retry_count=0
    local exit_code
    while ! $STOP && [ $retry_count -lt $MAX_RETRIES ]; do
        echo "Starting task_executor.py for task $task_id (Attempt $((retry_count+1)))"
        # Capture the exit code with "||": under "set -e" a bare failing
        # command would end this subshell before the retry logic runs
        exit_code=0
        LD_PRELOAD="$JEMALLOC_PATH" $PY rag/svr/task_executor.py "$task_id" || exit_code=$?
        if [ $exit_code -eq 0 ]; then
            echo "task_executor.py for task $task_id exited successfully."
            break
        else
            echo "task_executor.py for task $task_id failed with exit code $exit_code. Retrying..." >&2
            retry_count=$((retry_count + 1))
            sleep 2
        fi
    done

    if [ $retry_count -ge $MAX_RETRIES ]; then
        echo "task_executor.py for task $task_id failed after $MAX_RETRIES attempts. Exiting..." >&2
        cleanup
    fi
}

# Function to execute ragflow_server with retry logic
run_server(){
    local retry_count=0
    local exit_code
    while ! $STOP && [ $retry_count -lt $MAX_RETRIES ]; do
        echo "Starting ragflow_server.py (Attempt $((retry_count+1)))"
        # Same "||" capture as in task_exe, for the same "set -e" reason
        exit_code=0
        $PY api/ragflow_server.py || exit_code=$?
        if [ $exit_code -eq 0 ]; then
            echo "ragflow_server.py exited successfully."
            break
        else
            echo "ragflow_server.py failed with exit code $exit_code. Retrying..." >&2
            retry_count=$((retry_count + 1))
            sleep 2
        fi
    done

    if [ $retry_count -ge $MAX_RETRIES ]; then
        echo "ragflow_server.py failed after $MAX_RETRIES attempts. Exiting..." >&2
        cleanup
    fi
}

# Start task executors
for ((i=0;i<WS;i++))
do
    task_exe "$i" &
    PIDS+=("$!")
done

# Start the main server
run_server &
PIDS+=("$!")

# Wait for all background processes to finish
wait
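
The two retry loops in the script share one shape, which can be sketched as a single generic helper. This is an illustrative sketch under stated assumptions, not part of the script: `run_with_retries` is a hypothetical name, the 2-second `sleep` between attempts is omitted to keep the sketch short, and the exit status is captured with `||` so that the helper still reaches its retry logic when `set -e` is active.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or max_retries attempts
# have been made. Returns 0 on success, otherwise the last failing exit code.
run_with_retries() {
    local max_retries=$1
    shift
    local attempt=0
    local exit_code=0
    while [ "$attempt" -lt "$max_retries" ]; do
        attempt=$((attempt + 1))
        exit_code=0
        # "cmd || exit_code=$?" records the status without tripping "set -e".
        "$@" || exit_code=$?
        if [ "$exit_code" -eq 0 ]; then
            return 0
        fi
        echo "Attempt $attempt/$max_retries failed with exit code $exit_code" >&2
    done
    return "$exit_code"
}
```

A call such as `run_with_retries 5 python3 api/ragflow_server.py` would then stand in for the body of a loop like the one in `run_server`, leaving the caller to decide what to do when the helper returns a non-zero status.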