Feat: Adds OpenSearch2.19.1 as the vector_database support (#7140)

### What problem does this PR solve?

This PR adds support for the latest OpenSearch2.19.1 as a store engine and search engine option for RAGFlow.

### Main Benefit

1. OpenSearch2.19.1 is licensed under the [Apache v2.0 License], which is more permissive than Elasticsearch's license.
2. For search, OpenSearch2.19.1 supports full-text search, vector_search, and hybrid_search, with a schema similar to Elasticsearch's.
3. For storage, OpenSearch2.19.1 stores text and vectors, with a schema quite similar to Elasticsearch's.

### Changes

- Support opensearch_python_connector. I made a lot of adaptations, since the schema and API/methods differ between ES and OpenSearch in many ways (knn_search in particular has a significant gap): rag/utils/opensearch_coon.py
- Support static config adaptations by changing: conf/service_conf.yaml, api/settings.py, rag/settings.py
- Support some store and search schema changes between OpenSearch and ES: conf/os_mapping.json
- Support the OpenSearch Python SDK: pyproject.toml
- Support Docker config for OpenSearch2.19.1: docker/.env, docker/docker-compose-base.yml, docker/service_conf.yaml.template

### How to use

- ES remains the default doc/search engine; I didn't change that priority. OpenSearch is used only if we set DOC_ENGINE=${DOC_ENGINE:-opensearch} in docker/.env.

### Others

Our team tested a lot of docs in our environment using OpenSearch as the vector database, and it works very well. All of the OpenSearch config is necessary.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>

6 months ago
#
#  Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#
import os
import logging

from api.utils import get_base_config, decrypt_database_config
from api.utils.file_utils import get_project_base_directory

# Server
RAG_CONF_PATH = os.path.join(get_project_base_directory(), "conf")

# Get storage type and document engine from system environment variables
STORAGE_IMPL_TYPE = os.getenv('STORAGE_IMPL', 'MINIO')
DOC_ENGINE = os.getenv('DOC_ENGINE', 'elasticsearch')

# Every backend config starts empty; only the selected ones are filled in below.
ES = {}
INFINITY = {}
AZURE = {}
S3 = {}
MINIO = {}
OSS = {}
OS = {}

# Load only the configuration for the selected document engine and storage
# backend, so that missing configuration for unused backends does not cause
# initialization errors.
if DOC_ENGINE == 'elasticsearch':
    ES = get_base_config("es", {})
elif DOC_ENGINE == 'opensearch':
    OS = get_base_config("os", {})
elif DOC_ENGINE == 'infinity':
    INFINITY = get_base_config("infinity", {"uri": "infinity:23817"})

if STORAGE_IMPL_TYPE in ['AZURE_SPN', 'AZURE_SAS']:
    AZURE = get_base_config("azure", {})
elif STORAGE_IMPL_TYPE == 'AWS_S3':
    S3 = get_base_config("s3", {})
elif STORAGE_IMPL_TYPE == 'MINIO':
    MINIO = decrypt_database_config(name="minio")
elif STORAGE_IMPL_TYPE == 'OSS':
    OSS = get_base_config("oss", {})

# Fall back to an empty Redis configuration if it cannot be loaded.
try:
    REDIS = decrypt_database_config(name="redis")
except Exception:
    REDIS = {}

DOC_MAXIMUM_SIZE = int(os.environ.get("MAX_CONTENT_LENGTH", 128 * 1024 * 1024))

SVR_QUEUE_NAME = "rag_flow_svr_queue"
SVR_CONSUMER_GROUP_NAME = "rag_flow_svr_task_broker"
PAGERANK_FLD = "pagerank_fea"
TAG_FLD = "tag_feas"

PARALLEL_DEVICES = 0
try:
    import torch.cuda
    PARALLEL_DEVICES = torch.cuda.device_count()
    logging.info(f"found {PARALLEL_DEVICES} gpus")
except Exception:
    logging.info("can't import package 'torch'")


def print_rag_settings():
    logging.info(f"MAX_CONTENT_LENGTH: {DOC_MAXIMUM_SIZE}")
    logging.info(f"MAX_FILE_COUNT_PER_USER: {int(os.environ.get('MAX_FILE_NUM_PER_USER', 0))}")


def get_svr_queue_name(priority: int) -> str:
    if priority == 0:
        return SVR_QUEUE_NAME
    return f"{SVR_QUEUE_NAME}_{priority}"


def get_svr_queue_names():
    return [get_svr_queue_name(priority) for priority in [1, 0]]
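
The queue-name helpers at the end of this module can be exercised on their own. The sketch below copies `SVR_QUEUE_NAME`, `get_svr_queue_name`, and `get_svr_queue_names` out of the module so that it runs without the `api` package; the variable `names` is introduced only for illustration. Priority 0 maps to the bare queue name and any other priority gets a numeric suffix. Listing priority 1 before priority 0 presumably lets consumers check the higher-priority queue first, but that reading is an assumption, not something this file states.

```python
# Standalone copy of the queue-name helpers, for illustration only.
SVR_QUEUE_NAME = "rag_flow_svr_queue"


def get_svr_queue_name(priority: int) -> str:
    # Priority 0 uses the base queue name; other priorities get a suffix.
    if priority == 0:
        return SVR_QUEUE_NAME
    return f"{SVR_QUEUE_NAME}_{priority}"


def get_svr_queue_names():
    # Priority 1 is listed before priority 0.
    return [get_svr_queue_name(priority) for priority in [1, 0]]


names = get_svr_queue_names()
print(names)  # ['rag_flow_svr_queue_1', 'rag_flow_svr_queue']
```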