### What problem does this PR solve?

Update docs for release 0.8.0.

### Type of change

- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>

tags/v0.8.0
| <a href="https://demo.ragflow.io" target="_blank"> | <a href="https://demo.ragflow.io" target="_blank"> | ||||
| <img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99"></a> | <img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99"></a> | ||||
| <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank"> | <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank"> | ||||
| <img src="https://img.shields.io/badge/docker_pull-ragflow:v0.7.0-brightgreen" alt="docker pull infiniflow/ragflow:v0.7.0"></a> | |||||
| <img src="https://img.shields.io/badge/docker_pull-ragflow:v0.8.0-brightgreen" alt="docker pull infiniflow/ragflow:v0.8.0"></a> | |||||
| <a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE"> | <a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE"> | ||||
| <img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license"> | <img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license"> | ||||
| </a> | </a> | ||||
| ## 📌 Latest Updates | ## 📌 Latest Updates | ||||
| - 2024-07-08 Supports [Graph](./graph/README.md). | |||||
| - 2024-06-27 Supports Markdown and Docx in the Q&A parsing method. Supports extracting images from Docx files. Supports extracting tables from Markdown files. | - 2024-06-27 Supports Markdown and Docx in the Q&A parsing method. Supports extracting images from Docx files. Supports extracting tables from Markdown files. | ||||
| - 2024-06-14 Supports PDF in the Q&A parsing method. | - 2024-06-14 Supports PDF in the Q&A parsing method. | ||||
| - 2024-06-06 Supports [Self-RAG](https://huggingface.co/papers/2310.11511), which is enabled by default in dialog settings. | - 2024-06-06 Supports [Self-RAG](https://huggingface.co/papers/2310.11511), which is enabled by default in dialog settings. | ||||
| - 2024-05-30 Integrates [BCE](https://github.com/netease-youdao/BCEmbedding) and [BGE](https://github.com/FlagOpen/FlagEmbedding) reranker models. | - 2024-05-30 Integrates [BCE](https://github.com/netease-youdao/BCEmbedding) and [BGE](https://github.com/FlagOpen/FlagEmbedding) reranker models. | ||||
| - 2024-05-28 Supports LLM Baichuan and VolcanoArk. | - 2024-05-28 Supports LLM Baichuan and VolcanoArk. | ||||
| - 2024-05-23 Supports [RAPTOR](https://arxiv.org/html/2401.18059v1) for better text retrieval. | - 2024-05-23 Supports [RAPTOR](https://arxiv.org/html/2401.18059v1) for better text retrieval. | ||||
| - 2024-05-21 Supports streaming output and text chunk retrieval API. | - 2024-05-21 Supports streaming output and text chunk retrieval API. | ||||
| - 2024-05-15 Integrates OpenAI GPT-4o. | - 2024-05-15 Integrates OpenAI GPT-4o. | ||||
| - 2024-05-08 Integrates LLM DeepSeek-V2. | |||||
| ## 🌟 Key Features | ## 🌟 Key Features | ||||
| 3. Start up the server using the pre-built Docker images: | 3. Start up the server using the pre-built Docker images: | ||||
| > Running the following commands automatically downloads the *dev* version of the RAGFlow Docker image. To download and run a specific version instead, set `RAGFLOW_VERSION` in **docker/.env** to that version, for example `RAGFLOW_VERSION=v0.7.0`, before running the following commands. | |||||
| > Running the following commands automatically downloads the *dev* version of the RAGFlow Docker image. To download and run a specific version instead, set `RAGFLOW_VERSION` in **docker/.env** to that version, for example `RAGFLOW_VERSION=v0.8.0`, before running the following commands. | |||||
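The note above asks you to edit `RAGFLOW_VERSION` in **docker/.env** by hand. If you script your deployments, the same edit can be sketched in Python. This is a minimal sketch, assuming **docker/.env** uses plain `KEY=value` lines; `pin_version_in_env_text` and `pin_version_in_env_file` are hypothetical helper names, not part of RAGFlow.

```python
import re
from pathlib import Path

# Matches an active RAGFLOW_VERSION assignment (leading whitespace allowed).
# Commented-out lines such as "# RAGFLOW_VERSION=..." do not match.
_VERSION_LINE = re.compile(r"^\s*RAGFLOW_VERSION\s*=")


def pin_version_in_env_text(text: str, version: str) -> str:
    """Return `text` with RAGFLOW_VERSION set to `version` (appended if absent)."""
    lines = text.splitlines()
    found = False
    for i, line in enumerate(lines):
        if _VERSION_LINE.match(line):
            lines[i] = f"RAGFLOW_VERSION={version}"
            found = True
    if not found:
        lines.append(f"RAGFLOW_VERSION={version}")
    return "\n".join(lines) + "\n"


def pin_version_in_env_file(env_path: Path, version: str) -> None:
    """Rewrite a docker/.env-style file in place with the pinned version."""
    env_path.write_text(pin_version_in_env_text(env_path.read_text(), version))
```

For example, `pin_version_in_env_file(Path("ragflow/docker/.env"), "v0.8.0")` pins the release before `docker compose up -d`. Keeping the text transformation separate from the file I/O makes the rewrite easy to check without touching a real checkout.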
| ```bash | ```bash | ||||
| $ cd ragflow/docker | $ cd ragflow/docker |
| <a href="https://demo.ragflow.io" target="_blank"> | <a href="https://demo.ragflow.io" target="_blank"> | ||||
| <img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99"></a> | <img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99"></a> | ||||
| <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank"> | <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank"> | ||||
| <img src="https://img.shields.io/badge/docker_pull-ragflow:v0.7.0-brightgreen" | |||||
| alt="docker pull infiniflow/ragflow:v0.7.0"></a> | |||||
| <img src="https://img.shields.io/badge/docker_pull-ragflow:v0.8.0-brightgreen" | |||||
| alt="docker pull infiniflow/ragflow:v0.8.0"></a> | |||||
| <a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE"> | <a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE"> | ||||
| <img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license"> | <img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license"> | ||||
| </a> | </a> | ||||
| ## 📌 Latest Updates | ## 📌 Latest Updates | ||||
| - 2024-07-08 Supports [Graph](./graph/README.md). | |||||
| - 2024-06-27 The Q&A parsing method supports Markdown and Docx files. Supports extracting images from Docx files. Supports extracting tables from Markdown files. | - 2024-06-27 The Q&A parsing method supports Markdown and Docx files. Supports extracting images from Docx files. Supports extracting tables from Markdown files. | ||||
| - 2024-06-14 The Q&A parsing method supports PDF files. | - 2024-06-14 The Q&A parsing method supports PDF files. | ||||
| - 2024-06-06 Supports [Self-RAG](https://huggingface.co/papers/2310.11511), which is checked by default in the conversation settings. | - 2024-06-06 Supports [Self-RAG](https://huggingface.co/papers/2310.11511), which is checked by default in the conversation settings. | ||||
| - 2024-05-30 Integrates the [BCE](https://github.com/netease-youdao/BCEmbedding) and [BGE](https://github.com/FlagOpen/FlagEmbedding) rerankers. | |||||
| - 2024-05-30 Integrates the [BCE](https://github.com/netease-youdao/BCEmbedding) and [BGE](https://github.com/FlagOpen/FlagEmbedding) rerankers. | |||||
| - 2024-05-28 Integrates LLM Baichuan and VolcanoArk. | - 2024-05-28 Integrates LLM Baichuan and VolcanoArk. | ||||
| - 2024-05-23 Supports [RAPTOR](https://arxiv.org/html/2401.18059v1) for better text retrieval. | |||||
| - 2024-05-23 Supports [RAPTOR](https://arxiv.org/html/2401.18059v1) for better text retrieval. | |||||
| - 2024-05-21 Supports streaming output and a text chunk retrieval API. | - 2024-05-21 Supports streaming output and a text chunk retrieval API. | ||||
| - 2024-05-15 Integrates OpenAI GPT-4o. | - 2024-05-15 Integrates OpenAI GPT-4o. | ||||
| - 2024-05-08 Integrates LLM DeepSeek-V2. | |||||
| ## 🌟 Key Features | ## 🌟 Key Features | ||||
| $ docker compose up -d | $ docker compose up -d | ||||
| ``` | ``` | ||||
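| > Running the above commands automatically downloads the development version of the RAGFlow Docker image. To download and run a specific version of the Docker image, find the RAGFLOW_VERSION variable in the docker/.env file and change it to the corresponding version, for example RAGFLOW_VERSION=v0.7.0, then run the above commands. | |||||
| > Running the above commands automatically downloads the development version of the RAGFlow Docker image. To download and run a specific version of the Docker image, find the RAGFLOW_VERSION variable in the docker/.env file and change it to the corresponding version, for example RAGFLOW_VERSION=v0.8.0, then run the above commands. | |||||
| > The core image is about 9 GB in size and may take a while to load. | > The core image is about 9 GB in size and may take a while to load. | ||||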
| ```bash | ```bash | ||||
| $ git clone https://github.com/infiniflow/ragflow.git | $ git clone https://github.com/infiniflow/ragflow.git | ||||
| $ cd ragflow/ | $ cd ragflow/ | ||||
| $ docker build -t infiniflow/ragflow:v0.7.0 . | |||||
| $ docker build -t infiniflow/ragflow:v0.8.0 . | |||||
| $ cd ragflow/docker | $ cd ragflow/docker | ||||
| $ chmod +x ./entrypoint.sh | $ chmod +x ./entrypoint.sh | ||||
| $ docker compose up -d | $ docker compose up -d |
| <a href="https://demo.ragflow.io" target="_blank"> | <a href="https://demo.ragflow.io" target="_blank"> | ||||
| <img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99"></a> | <img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99"></a> | ||||
| <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank"> | <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank"> | ||||
| <img src="https://img.shields.io/badge/docker_pull-ragflow:v0.7.0-brightgreen" alt="docker pull infiniflow/ragflow:v0.7.0"></a> | |||||
| <img src="https://img.shields.io/badge/docker_pull-ragflow:v0.8.0-brightgreen" alt="docker pull infiniflow/ragflow:v0.8.0"></a> | |||||
| <a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE"> | <a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE"> | ||||
| <img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license"> | <img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license"> | ||||
| </a> | </a> | ||||
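| ## 📌 Latest Updates | ## 📌 Latest Updates | ||||
| - 2024-07-08 Supports [Graph](./graph/README.md). | |||||
| - 2024-06-27 The Q&A parsing method supports Markdown and Docx files. Supports extracting images from Docx files. Supports extracting tables from Markdown files. | - 2024-06-27 The Q&A parsing method supports Markdown and Docx files. Supports extracting images from Docx files. Supports extracting tables from Markdown files. | ||||
| - 2024-06-14 The Q&A parsing method supports PDF files. | - 2024-06-14 The Q&A parsing method supports PDF files. | ||||
| - 2024-06-06 Supports [Self-RAG](https://huggingface.co/papers/2310.11511), which is checked by default in the dialog settings. | - 2024-06-06 Supports [Self-RAG](https://huggingface.co/papers/2310.11511), which is checked by default in the dialog settings. | ||||
| - 2024-05-23 Implements [RAPTOR](https://arxiv.org/html/2401.18059v1) for better text retrieval. | - 2024-05-23 Implements [RAPTOR](https://arxiv.org/html/2401.18059v1) for better text retrieval. | ||||
| - 2024-05-21 Supports streaming output and a text chunk retrieval API. | - 2024-05-21 Supports streaming output and a text chunk retrieval API. | ||||
| - 2024-05-15 Integrates the LLM OpenAI GPT-4o. | - 2024-05-15 Integrates the LLM OpenAI GPT-4o. | ||||
| - 2024-05-08 Integrates the LLM DeepSeek. | |||||
| ## 🌟 Key Features | ## 🌟 Key Features | ||||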
| $ docker compose -f docker-compose-CN.yml up -d | $ docker compose -f docker-compose-CN.yml up -d | ||||
| ``` | ``` | ||||
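| > Note that running the above commands automatically downloads the development version of the RAGFlow Docker image. To download and run a specific version of the Docker image, find the RAGFLOW_VERSION variable in the docker/.env file and change it to the corresponding version, for example RAGFLOW_VERSION=v0.7.0, then run the above commands. | |||||
| > Note that running the above commands automatically downloads the development version of the RAGFlow Docker image. To download and run a specific version of the Docker image, find the RAGFLOW_VERSION variable in the docker/.env file and change it to the corresponding version, for example RAGFLOW_VERSION=v0.8.0, then run the above commands. | |||||
| > The core image is about 9 GB and may take some time to pull. Please be patient. | > The core image is about 9 GB and may take some time to pull. Please be patient. | ||||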
| ```bash | ```bash | ||||
| $ git clone https://github.com/infiniflow/ragflow.git | $ git clone https://github.com/infiniflow/ragflow.git | ||||
| $ cd ragflow/ | $ cd ragflow/ | ||||
| $ docker build -t infiniflow/ragflow:v0.7.0 . | |||||
| $ docker build -t infiniflow/ragflow:v0.8.0 . | |||||
| $ cd ragflow/docker | $ cd ragflow/docker | ||||
| $ chmod +x ./entrypoint.sh | $ chmod +x ./entrypoint.sh | ||||
| $ docker compose up -d | $ docker compose up -d |
| ## Search for knowledge base | ## Search for knowledge base | ||||
| As of RAGFlow v0.7.0, the search feature is still in a rudimentary form, supporting only knowledge base search by name. | |||||
| As of RAGFlow v0.8.0, the search feature is still in a rudimentary form, supporting only knowledge base search by name. | |||||
| ## Move file to specified folder | ## Move file to specified folder | ||||
| As of RAGFlow v0.7.0, this feature is *not* available. | |||||
| As of RAGFlow v0.8.0, this feature is *not* available. | |||||
| ## Search files or folders | ## Search files or folders | ||||
| As of RAGFlow v0.7.0, the search feature is still in a rudimentary form, supporting only file and folder search in the current directory by name (files or folders in child directories are not retrieved). | |||||
| As of RAGFlow v0.8.0, the search feature is still in a rudimentary form, supporting only file and folder search in the current directory by name (files or folders in child directories are not retrieved). | |||||
| > As of RAGFlow v0.7.0, bulk download is not supported, nor can you download an entire folder. | |||||
| > As of RAGFlow v0.8.0, bulk download is not supported, nor can you download an entire folder. |
| `vm.max_map_count`. This value sets the maximum number of memory map areas a process may have. Its default value is 65530. While most applications require fewer than a thousand maps, reducing this value can result in abnormal behavior, and the system will throw out-of-memory errors when a process reaches the limit. | `vm.max_map_count`. This value sets the maximum number of memory map areas a process may have. Its default value is 65530. While most applications require fewer than a thousand maps, reducing this value can result in abnormal behavior, and the system will throw out-of-memory errors when a process reaches the limit. | ||||
| RAGFlow v0.7.0 uses Elasticsearch for multiple recall. Setting the value of `vm.max_map_count` correctly is crucial to the proper functioning of the Elasticsearch component. | |||||
| RAGFlow v0.8.0 uses Elasticsearch for multiple recall. Setting the value of `vm.max_map_count` correctly is crucial to the proper functioning of the Elasticsearch component. | |||||
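Because the Elasticsearch component depends on this setting, it can help to check the current value before starting RAGFlow. The sketch below reads `/proc/sys/vm/max_map_count` (Linux only) and compares it with 262144, the minimum Elasticsearch documents for this setting. The function and constant names are illustrative, not part of RAGFlow.

```python
from pathlib import Path
from typing import Optional

# Minimum value Elasticsearch documents for vm.max_map_count.
ES_MIN_MAX_MAP_COUNT = 262144
# On Linux, the current value is exposed through procfs.
MAX_MAP_COUNT_PATH = Path("/proc/sys/vm/max_map_count")


def read_max_map_count(path: Path = MAX_MAP_COUNT_PATH) -> Optional[int]:
    """Return the current vm.max_map_count, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Not Linux, no procfs, or unexpected file content.
        return None


def max_map_count_ok(value: int, minimum: int = ES_MIN_MAX_MAP_COUNT) -> bool:
    """Return True if `value` is high enough for Elasticsearch."""
    return value >= minimum
```

For example, on a host still at the default of 65530, `max_map_count_ok(65530)` is `False`; the value can then be raised for the current boot with `sudo sysctl -w vm.max_map_count=262144`.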
| <Tabs | <Tabs | ||||
| defaultValue="linux" | defaultValue="linux" | ||||
| 3. Start up the server using the pre-built Docker images: | 3. Start up the server using the pre-built Docker images: | ||||
| > Running the following commands automatically downloads the *dev* version of the RAGFlow Docker image. To download and run a specific version instead, set `RAGFLOW_VERSION` in **docker/.env** to that version, for example `RAGFLOW_VERSION=v0.7.0`, before running the following commands. | |||||
| > Running the following commands automatically downloads the *dev* version of the RAGFlow Docker image. To download and run a specific version instead, set `RAGFLOW_VERSION` in **docker/.env** to that version, for example `RAGFLOW_VERSION=v0.8.0`, before running the following commands. | |||||
| ```bash | ```bash | ||||
| $ cd ragflow/docker | $ cd ragflow/docker |
| cpn = self.get_component(cpn_id) | cpn = self.get_component(cpn_id) | ||||
| if not cpn["downstream"]: break | if not cpn["downstream"]: break | ||||
| if self._find_loop(): raise OverflowError("Too much loops!") | |||||
| loop = self._find_loop() | |||||
| if loop: raise OverflowError(f"Too many loops: {loop}") | |||||
| if cpn["obj"].component_name.lower() in ["switch", "categorize", "relevant"]: | if cpn["obj"].component_name.lower() in ["switch", "categorize", "relevant"]: | ||||
| switch_out = cpn["obj"].output()[1].iloc[0, 0] | switch_out = cpn["obj"].output()[1].iloc[0, 0] | ||||
| if len(path) < 2: return False | if len(path) < 2: return False | ||||
| for l in range(1, len(path) // 2): | |||||
| for l in range(2, len(path) // 2): | |||||
| pat = ",".join(path[0:l]) | pat = ",".join(path[0:l]) | ||||
| path_str = ",".join(path) | path_str = ",".join(path) | ||||
| if len(pat) >= len(path_str): return False | if len(pat) >= len(path_str): return False | ||||
| path_str = path_str[len(pat):] | |||||
| loop = max_loops | loop = max_loops | ||||
| while path_str.find(pat) >= 0 and loop >= 0: | |||||
| while path_str.find(pat) == 0 and loop >= 0: | |||||
| loop -= 1 | loop -= 1 | ||||
| path_str = path_str[len(pat):] | |||||
| if loop < 0: return True | |||||
| if len(pat)+1 >= len(path_str): | |||||
| return False | |||||
| path_str = path_str[len(pat)+1:] | |||||
| if loop < 0: | |||||
| pat = " => ".join([p.split(":")[0] for p in path[0:l]]) | |||||
| return pat + " => " + pat | |||||
| return False | return False |
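The added lines of `_find_loop` change how loops are detected: candidate patterns now start at two components, a repetition only counts if it occurs back-to-back from the start of the path (`find(pat) == 0` instead of `>= 0`), each slice also skips the comma separator, and a detected loop is returned as a readable `a => b => a => b` string, which the new `OverflowError` message includes. The standalone function below restates those added lines so they can be tried outside the canvas. It assumes `path` lists visited components most-recent-first as `"component_name:id"` strings (inferred from `p.split(":")[0]`); `find_loop` and the component ids in the example are illustrative, not RAGFlow APIs.

```python
from typing import List, Union


def find_loop(path: List[str], max_loops: int = 2) -> Union[bool, str]:
    """Return a readable description of a repeated pattern, or False."""
    if len(path) < 2:
        return False
    # Try patterns of `width` components, starting at two.
    for width in range(2, len(path) // 2):
        pat = ",".join(path[0:width])
        path_str = ",".join(path)
        if len(pat) >= len(path_str):
            return False
        loop = max_loops
        # Only count repetitions that occur back-to-back from the start.
        while path_str.find(pat) == 0 and loop >= 0:
            loop -= 1
            if len(pat) + 1 >= len(path_str):
                return False
            # Drop the matched pattern plus its trailing comma separator.
            path_str = path_str[len(pat) + 1:]
        if loop < 0:
            # Strip the ":id" suffix to build a human-readable loop description.
            names = " => ".join(p.split(":")[0] for p in path[0:width])
            return names + " => " + names
    return False
```

With the default `max_loops=2`, a pattern has to appear four times in a row: `find_loop(["retrieval:0", "generate:0"] * 4)` returns `"retrieval => generate => retrieval => generate"`, while three repetitions return `False`.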
| self.check_empty(self.category_description, "[Categorize] Category examples") | self.check_empty(self.category_description, "[Categorize] Category examples") | ||||
| for k, v in self.category_description.items(): | for k, v in self.category_description.items(): | ||||
| if not k: raise ValueError(f"[Categorize] Category name can not be empty!") | if not k: raise ValueError(f"[Categorize] Category name can not be empty!") | ||||
| if not v["to"]: raise ValueError(f"[Categorize] 'To' of category {k} can not be empty!") | |||||
| if not v.get("to"): raise ValueError(f"[Categorize] 'To' of category {k} can not be empty!") | |||||
| def get_prompt(self): | def get_prompt(self): | ||||
| cate_lines = [] | cate_lines = [] |
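Switching from `v["to"]` to `v.get("to")` matters when a category has no `"to"` key at all: the old line raised a bare `KeyError('to')`, while the new one reports the descriptive `ValueError`. Below is a small standalone sketch of that validation, with illustrative function names and plain dicts standing in for the Categorize component's parameters; the messages mirror the ones in the diff.

```python
from typing import Any, Dict, Optional


def find_category_error(category_description: Dict[str, Dict[str, Any]]) -> Optional[str]:
    """Return the first validation message, or None if every category is valid."""
    for name, spec in category_description.items():
        if not name:
            return "[Categorize] Category name can not be empty!"
        # .get() treats a missing "to" key like an empty value, so the user
        # sees this message instead of a bare KeyError('to').
        if not spec.get("to"):
            return f"[Categorize] 'To' of category {name} can not be empty!"
    return None


def check_categories(category_description: Dict[str, Dict[str, Any]]) -> None:
    """Raise ValueError on the first invalid category."""
    error = find_category_error(category_description)
    if error:
        raise ValueError(error)
```

For example, `find_category_error({"greeting": {"description": "hi"}})` returns the `'To' of category greeting` message rather than failing with a `KeyError`.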
| for para in self._param.parameters: | for para in self._param.parameters: | ||||
| cpn = self._canvas.get_component(para["component_id"])["obj"] | cpn = self._canvas.get_component(para["component_id"])["obj"] | ||||
| _, out = cpn.output(allow_partial=False) | _, out = cpn.output(allow_partial=False) | ||||
| kwargs[para["key"]] = "\n - ".join(out["content"]) | |||||
| if "content" not in out.columns: | |||||
| kwargs[para["key"]] = "Nothing" | |||||
| else: | |||||
| kwargs[para["key"]] = "\n - ".join(out["content"]) | |||||
| kwargs["input"] = input | kwargs["input"] = input | ||||
| for n, v in kwargs.items(): | for n, v in kwargs.items(): |