
Added doc for switching elasticsearch to infinity (#3370)

### What problem does this PR solve?

Added doc for switching elasticsearch to infinity

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
tags/v0.14.0
Zhichang Yu · 11 months ago · commit 9d395ab74e

**.github/workflows/tests.yml** (+20 −1)

```diff
       echo "RAGFLOW_IMAGE=infiniflow/ragflow:dev" >> docker/.env
       sudo docker compose -f docker/docker-compose.yml up -d

-    - name: Run tests
+    - name: Run tests against Elasticsearch
       run: |
         export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
         export HOST_ADDRESS=http://host.docker.internal:9380

       if: always()  # always run this step even if previous steps failed
       run: |
         sudo docker compose -f docker/docker-compose.yml down -v

+    - name: Start ragflow:dev
+      run: |
+        sudo DOC_ENGINE=infinity docker compose -f docker/docker-compose.yml up -d
+
+    - name: Run tests against Infinity
+      run: |
+        export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
+        export HOST_ADDRESS=http://host.docker.internal:9380
+        until sudo docker exec ragflow-server curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
+          echo "Waiting for service to be available..."
+          sleep 5
+        done
+        cd sdk/python && poetry install && source .venv/bin/activate && cd test && pytest --tb=short t_dataset.py t_chat.py t_session.py t_document.py t_chunk.py
+
+    - name: Stop ragflow:dev
+      if: always()  # always run this step even if previous steps failed
+      run: |
+        sudo DOC_ENGINE=infinity docker compose -f docker/docker-compose.yml down -v
```

**README.md** (+25 −4)

````diff
 $ docker compose -f docker-compose.yml up -d
 ```

-> - To download a RAGFlow slim Docker image of a specific version, update the `RAGFlow_IMAGE` variable in **docker/.env** to your desired version. For example, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.13.0-slim`. After making this change, rerun the command above to initiate the download.
+> - To download a RAGFlow slim Docker image of a specific version, update the `RAGFLOW_IMAGE` variable in **docker/.env** to your desired version. For example, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.13.0-slim`. After making this change, rerun the command above to initiate the download.
-> - To download the dev version of RAGFlow Docker image *including* embedding models and Python libraries, update the `RAGFlow_IMAGE` variable in **docker/.env** to `RAGFLOW_IMAGE=infiniflow/ragflow:dev`. After making this change, rerun the command above to initiate the download.
+> - To download the dev version of RAGFlow Docker image *including* embedding models and Python libraries, update the `RAGFLOW_IMAGE` variable in **docker/.env** to `RAGFLOW_IMAGE=infiniflow/ragflow:dev`. After making this change, rerun the command above to initiate the download.
-> - To download a specific version of RAGFlow Docker image *including* embedding models and Python libraries, update the `RAGFlow_IMAGE` variable in **docker/.env** to your desired version. For example, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.13.0`. After making this change, rerun the command above to initiate the download.
+> - To download a specific version of RAGFlow Docker image *including* embedding models and Python libraries, update the `RAGFLOW_IMAGE` variable in **docker/.env** to your desired version. For example, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.13.0`. After making this change, rerun the command above to initiate the download.

     * Running on http://x.x.x.x:9380
     INFO:werkzeug:Press CTRL+C to quit
     ```
-   > If you skip this confirmation step and directly log in to RAGFlow, your browser may prompt a `network abnormal` error because, at that moment, your RAGFlow may not be fully initialized.
+   > If you skip this confirmation step and directly log in to RAGFlow, your browser may prompt a `network anormal` error because, at that moment, your RAGFlow may not be fully initialized.

 5. In your web browser, enter the IP address of your server and log in to RAGFlow.
    > $ docker compose -f docker/docker-compose.yml up -d
    > ```

+### Switch doc engine from Elasticsearch to Infinity
+
+RAGFlow uses Elasticsearch by default for storing full text and vectors. To switch to [Infinity](https://github.com/infiniflow/infinity/), follow these steps:
+
+1. Stop all running containers:
+
+   ```bash
+   $ docker compose -f docker/docker-compose.yml down -v
+   ```
+
+2. Set `DOC_ENGINE` in **docker/.env** to `infinity`.
+
+3. Start the containers:
+
+   ```bash
+   $ docker compose -f docker/docker-compose.yml up -d
+   ```
+
+> [!WARNING]
+> Switching to Infinity on a Linux/arm64 machine is not yet officially supported.
+
 ## 🔧 Build a Docker image without embedding models

 This image is approximately 1 GB in size and relies on external LLM and embedding services.
````

**README_id.md** (+1 −1)

````diff
     * Running on http://x.x.x.x:9380
     INFO:werkzeug:Press CTRL+C to quit
     ```
-   > If you skip this step and log in to RAGFlow directly, your browser may show a `network abnormal` error because RAGFlow may not be fully ready.
+   > If you skip this step and log in to RAGFlow directly, your browser may show a `network anormal` error because RAGFlow may not be fully ready.

 5. Open your web browser, enter your server's IP address, and log in to RAGFlow.
````

**README_ko.md** (+1 −1)

````diff
     * Running on http://x.x.x.x:9380
     INFO:werkzeug:Press CTRL+C to quit
     ```
-   > If you skip the confirmation step and log in to RAGFlow right away, your browser may raise a `network abnormal` error because RAGFlow is not fully initialized.
+   > If you skip the confirmation step and log in to RAGFlow right away, your browser may raise a `network anormal` error because RAGFlow is not fully initialized.

 5. Enter your server's IP address in a web browser and log in to RAGFlow.
    > With the default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (no port number): the default HTTP service port `80` can be omitted under the default configuration.
````

**README_zh.md** (+1 −1)

````diff
     * Running on http://x.x.x.x:9380
     INFO:werkzeug:Press CTRL+C to quit
     ```
-   > If you skip this confirmation step and log in to RAGFlow directly, your browser may prompt `network abnormal` or `网络异常` because RAGFlow may not have fully started.
+   > If you skip this confirmation step and log in to RAGFlow directly, your browser may prompt `network anormal` or `网络异常` because RAGFlow may not have fully started.

 5. Enter your server's IP address in your browser and log in to RAGFlow.
    > In the example above, you only need to enter http://IP_OF_YOUR_MACHINE: with an unchanged configuration there is no need to enter a port (the default HTTP service port is 80).
````

**api/settings.py** (+6 −2)

```diff
 PRIVILEGE_COMMAND_WHITELIST = []
 CHECK_NODES_IDENTITY = False

-if 'hosts' in get_base_config("es", {}):
+DOC_ENGINE = os.environ.get('DOC_ENGINE', "elasticsearch")
+if DOC_ENGINE == "elasticsearch":
     docStoreConn = rag.utils.es_conn.ESConnection()
-else:
+elif DOC_ENGINE == "infinity":
     docStoreConn = rag.utils.infinity_conn.InfinityConnection()
+else:
+    raise Exception(f"Not supported doc engine: {DOC_ENGINE}")

 retrievaler = search.Dealer(docStoreConn)
 kg_retrievaler = kg_search.KGSearch(docStoreConn)
```
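The engine selection in this hunk moves from probing the `es` config section to reading the `DOC_ENGINE` environment variable. A minimal standalone sketch of that logic (hedged: the dotted names below stand in for the real connection classes, which require a running engine to instantiate):

```python
import os

def select_doc_engine(environ=None):
    """Pick the doc-store backend from the DOC_ENGINE env var.

    Returns the dotted name of the class that would be used; the real
    code instantiates ESConnection/InfinityConnection directly.
    """
    environ = environ if environ is not None else os.environ
    doc_engine = environ.get("DOC_ENGINE", "elasticsearch")
    if doc_engine == "elasticsearch":
        return "rag.utils.es_conn.ESConnection"
    elif doc_engine == "infinity":
        return "rag.utils.infinity_conn.InfinityConnection"
    # Any other value fails fast, mirroring the new `raise` branch.
    raise Exception(f"Not supported doc engine: {doc_engine}")
```

Unset or empty environments fall back to Elasticsearch, which keeps existing deployments working without any `.env` change.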



**conf/service_conf.yaml** (+2 −2)

```diff
   user: 'root'
   password: 'infini_rag_flow'
   host: 'mysql'
-  port: 3306
+  port: 5455
   max_connections: 100
   stale_timeout: 30
 minio:
   password: 'infini_rag_flow'
   host: 'minio:9000'
 es:
-  hosts: 'http://es01:9200'
+  hosts: 'http://es01:1200'
   username: 'elastic'
   password: 'infini_rag_flow'
 redis:
```

**docker/.env** (+11 −0)

```diff
+# The type of doc engine to use.
+# Supported values are `elasticsearch`, `infinity`.
+DOC_ENGINE=${DOC_ENGINE:-elasticsearch}
+
+# ------------------------------
+# docker env var for specifying vector db type at startup
+# (based on the vector db type, the corresponding docker
+# compose profile will be used)
+# ------------------------------
+COMPOSE_PROFILES=${DOC_ENGINE}
+
 # The version of Elasticsearch.
 STACK_VERSION=8.11.3
```
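The `${DOC_ENGINE:-elasticsearch}` line uses shell-style default expansion, which Docker Compose applies when reading **.env**: the variable keeps its value if set and non-empty, otherwise the default after `:-` is used. A small Python sketch of that resolution rule (`resolve_default` is a hypothetical illustration, not part of the codebase, and handles only the `${VAR:-default}` form, not plain `${VAR}`):

```python
import re

def resolve_default(expr, env):
    """Resolve a ${VAR:-default} expression against an env dict."""
    m = re.fullmatch(r"\$\{(\w+):-([^}]*)\}", expr)
    if not m:
        return expr  # plain literal such as 8.11.3
    name, default = m.groups()
    value = env.get(name)
    # `:-` treats an empty value the same as an unset one.
    return value if value else default
```

So with no environment override, `DOC_ENGINE` resolves to `elasticsearch`, and `COMPOSE_PROFILES` then enables only the matching compose profile.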



**docker/docker-compose-base.yml** (+29 −25)

```diff
 services:
   es01:
     container_name: ragflow-es-01
+    profiles:
+      - elasticsearch
     image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
     volumes:
       - esdata01:/usr/share/elasticsearch/data
       - ragflow
     restart: on-failure

-  # infinity:
-  #   container_name: ragflow-infinity
-  #   image: infiniflow/infinity:v0.5.0-dev2
-  #   volumes:
-  #     - infinity_data:/var/infinity
-  #   ports:
-  #     - ${INFINITY_THRIFT_PORT}:23817
-  #     - ${INFINITY_HTTP_PORT}:23820
-  #     - ${INFINITY_PSQL_PORT}:5432
-  #   env_file: .env
-  #   environment:
-  #     - TZ=${TIMEZONE}
-  #   mem_limit: ${MEM_LIMIT}
-  #   ulimits:
-  #     nofile:
-  #       soft: 500000
-  #       hard: 500000
-  #   networks:
-  #     - ragflow
-  #   healthcheck:
-  #     test: ["CMD", "curl", "http://localhost:23820/admin/node/current"]
-  #     interval: 10s
-  #     timeout: 10s
-  #     retries: 120
-  #   restart: on-failure
+  infinity:
+    container_name: ragflow-infinity
+    profiles:
+      - infinity
+    image: infiniflow/infinity:v0.5.0-dev2
+    volumes:
+      - infinity_data:/var/infinity
+    ports:
+      - ${INFINITY_THRIFT_PORT}:23817
+      - ${INFINITY_HTTP_PORT}:23820
+      - ${INFINITY_PSQL_PORT}:5432
+    env_file: .env
+    environment:
+      - TZ=${TIMEZONE}
+    mem_limit: ${MEM_LIMIT}
+    ulimits:
+      nofile:
+        soft: 500000
+        hard: 500000
+    networks:
+      - ragflow
+    healthcheck:
+      test: ["CMD", "curl", "http://localhost:23820/admin/node/current"]
+      interval: 10s
+      timeout: 10s
+      retries: 120
+    restart: on-failure

   mysql:
```

**docker/docker-compose.yml** (+0 −2)

```diff
     depends_on:
       mysql:
         condition: service_healthy
-      es01:
-        condition: service_healthy
     image: ${RAGFLOW_IMAGE}
     container_name: ragflow-server
     ports:
```

**docker/service_conf.yaml.template** (+1 −1)

```diff
 es:
   hosts: 'http://${ES_HOST:-es01}:9200'
   username: '${ES_USER:-elastic}'
-  password: '${ES_PASSWORD:-infini_rag_flow}'
+  password: '${ELASTIC_PASSWORD:-infini_rag_flow}'
 redis:
   db: 1
   password: '${REDIS_PASSWORD:-infini_rag_flow}'
```

**docs/guides/develop/build_docker_image.mdx** (+17 −0)

````diff
 ```

 </TabItem>
+<TabItem value="linux/arm64">
+
+## 🔧 Build a Docker image for linux arm64
+
+We are currently unable to regularly build multi-arch images with CI and have no plans to publish arm64 images in the near future.
+However, you can build an image yourself on a linux/arm64 host machine:
+
+```bash
+git clone https://github.com/infiniflow/ragflow.git
+cd ragflow/
+pip3 install huggingface-hub nltk
+python3 download_deps.py
+docker build --build-arg ARCH=arm64 -f Dockerfile.slim -t infiniflow/ragflow:dev-slim .
+docker build --build-arg ARCH=arm64 -f Dockerfile -t infiniflow/ragflow:dev .
+```
+</TabItem>

 </Tabs>
````

**rag/utils/es_conn.py** (+16 −12)

```diff
 import os
 from typing import List, Dict

+import elasticsearch
 import copy
 from elasticsearch import Elasticsearch
 from elasticsearch_dsl import UpdateByQuery, Q, Search, Index
 from rag.utils.doc_store_conn import DocStoreConnection, MatchExpr, OrderByExpr, MatchTextExpr, MatchDenseExpr, FusionExpr
 from rag.nlp import is_english, rag_tokenizer

+logger.info("Elasticsearch sdk version: " + str(elasticsearch.__version__))

 @singleton
 class ESConnection(DocStoreConnection):
     def __init__(self):
         self.info = {}
-        for _ in range(10):
+        logger.info(f"Use Elasticsearch {settings.ES['hosts']} as the doc engine.")
+        for _ in range(24):
             try:
                 self.es = Elasticsearch(
                     settings.ES["hosts"].split(","),
                 )
                 if self.es:
                     self.info = self.es.info()
-                    logger.info("Connect to es.")
                     break
-            except Exception:
-                logger.exception("Fail to connect to es")
-                time.sleep(1)
+            except Exception as e:
+                logger.warn(f"{str(e)}. Waiting Elasticsearch {settings.ES['hosts']} to be healthy.")
+                time.sleep(5)
         if not self.es.ping():
-            raise Exception("Can't connect to ES cluster")
-        v = self.info.get("version", {"number": "5.6"})
+            msg = f"Elasticsearch {settings.ES['hosts']} didn't become healthy in 120s."
+            logger.error(msg)
+            raise Exception(msg)
+        v = self.info.get("version", {"number": "8.11.3"})
         v = v["number"].split(".")[0]
         if int(v) < 8:
-            raise Exception(f"ES version must be greater than or equal to 8, current version: {v}")
+            msg = f"Elasticsearch version must be greater than or equal to 8, current version: {v}"
+            logger.error(msg)
+            raise Exception(msg)
         fp_mapping = os.path.join(get_project_base_directory(), "conf", "mapping.json")
         if not os.path.exists(fp_mapping):
-            raise Exception(f"Mapping file not found at {fp_mapping}")
+            msg = f"Elasticsearch mapping file not found at {fp_mapping}"
+            logger.error(msg)
+            raise Exception(msg)
         self.mapping = json.load(open(fp_mapping, "r"))
+        logger.info(f"Elasticsearch {settings.ES['hosts']} is healthy.")

 """
 Database operations
```
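Both connection classes now share the same startup pattern: 24 attempts with a 5-second sleep, roughly 120 seconds before giving up. A hedged standalone sketch of that pattern (`wait_until_healthy` is a hypothetical helper written for illustration, not part of the codebase):

```python
import time

def wait_until_healthy(connect, attempts=24, delay=5.0, sleep=time.sleep):
    """Call `connect` until it succeeds or the attempt budget runs out.

    `connect` is any callable that raises while the engine is still
    starting up and returns a value once it is healthy.
    """
    last_err = None
    for _ in range(attempts):
        try:
            return connect()
        except Exception as e:
            last_err = e
            sleep(delay)  # injectable for testing
    raise Exception(
        f"Doc engine didn't become healthy in {int(attempts * delay)}s."
    ) from last_err
```

The 24 × 5 s budget matches Infinity's own healthcheck cadence in the compose file, so the Python client and Docker give up on roughly the same timescale.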

**rag/utils/infinity_conn.py** (+25 −8)

```diff
 import os
 import re
 import json
+import time
 from typing import List, Dict
 import infinity
 from infinity.common import ConflictType, InfinityException
 from infinity.index import IndexInfo, IndexType
 from infinity.connection_pool import ConnectionPool
-from rag import settings
 from api.utils.log_utils import logger
+from rag import settings
 from rag.utils import singleton
 import polars as pl
 from polars.series.series import Series

         if ":" in infinity_uri:
             host, port = infinity_uri.split(":")
             infinity_uri = infinity.common.NetworkAddress(host, int(port))
-        self.connPool = ConnectionPool(infinity_uri)
-        logger.info(f"Connected to infinity {infinity_uri}.")
+        self.connPool = None
+        logger.info(f"Use Infinity {infinity_uri} as the doc engine.")
+        for _ in range(24):
+            try:
+                connPool = ConnectionPool(infinity_uri)
+                inf_conn = connPool.get_conn()
+                _ = inf_conn.show_current_node()
+                connPool.release_conn(inf_conn)
+                self.connPool = connPool
+                break
+            except Exception as e:
+                logger.warn(f"{str(e)}. Waiting Infinity {infinity_uri} to be healthy.")
+                time.sleep(5)
+        if self.connPool is None:
+            msg = f"Infinity {infinity_uri} didn't become healthy in 120s."
+            logger.error(msg)
+            raise Exception(msg)
+        logger.info(f"Infinity {infinity_uri} is healthy.")

 """
 Database operations

             _ = db_instance.get_table(table_name)
             self.connPool.release_conn(inf_conn)
             return True
-        except Exception:
-            logger.exception("INFINITY indexExist")
+        except Exception as e:
+            logger.warn(f"INFINITY indexExist {str(e)}")
         return False

         )
         if len(filter_cond) != 0:
             filter_fulltext = f"({filter_cond}) AND {filter_fulltext}"
-        # doc_store_logger.info(f"filter_fulltext: {filter_fulltext}")
+        # logger.info(f"filter_fulltext: {filter_fulltext}")
         minimum_should_match = "0%"
         if "minimum_should_match" in matchExpr.extra_options:
             minimum_should_match = (

         for k, v in d.items():
             if k.endswith("_kwd") and isinstance(v, list):
                 d[k] = " ".join(v)
-        ids = [f"{d['id']}" for d in documents]
+        ids = ["'{}'".format(d["id"]) for d in documents]
         str_ids = ", ".join(ids)
         str_filter = f"id IN ({str_ids})"
         table_instance.delete(str_filter)
         # logger.info(f"InfinityConnection.insert {json.dumps(documents)}")
         table_instance.insert(documents)
         self.connPool.release_conn(inf_conn)
-        doc_store_logger.info(f"inserted into {table_name} {str_ids}.")
+        logger.info(f"inserted into {table_name} {str_ids}.")
         return []

     def update(
```
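The `ids` change in the insert path exists because the delete filter is a SQL-like expression: unquoted string ids would make `id IN (...)` fail to parse as string comparisons. A hedged sketch of just that quoting step (`build_id_filter` is a hypothetical helper for illustration; it assumes ids contain no single quotes, as RAGFlow's generated ids do not):

```python
def build_id_filter(documents):
    """Build the `id IN (...)` filter used to replace existing rows."""
    ids = ["'{}'".format(d["id"]) for d in documents]  # quote each string id
    return f"id IN ({', '.join(ids)})"
```

For example, two documents with ids `abc` and `def` yield the filter `id IN ('abc', 'def')`, which Infinity can evaluate against its string `id` column.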

**sdk/python/test/t_chunk.py** (+2 −2)

```diff
     docs = ds.upload_documents(documents)
     doc = docs[0]
     chunk = doc.add_chunk(content="This is a chunk addition test")
-    # For ElasticSearch, the chunk is not searchable in a short time (~2s).
+    # For Elasticsearch, the chunk is not searchable in a short time (~2s).
     sleep(3)
     chunk.update({"content":"This is a updated content"})

     docs = ds.upload_documents(documents)
     doc = docs[0]
     chunk = doc.add_chunk(content="This is a chunk addition test")
-    # For ElasticSearch, the chunk is not searchable in a short time (~2s).
+    # For Elasticsearch, the chunk is not searchable in a short time (~2s).
     sleep(3)
     chunk.update({"available":0})
```


