### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update

| Before | After |
| --- | --- |
| `{knowledge}` is the system's reserved variable, representing the chunks retrieved from the knowledge base(s) specified by **Knowledge bases** under the **Assistant settings** tab. If your chat assistant is associated with certain knowledge bases, you can keep it as is. | `{knowledge}` is the system's reserved variable, representing the chunks retrieved from the knowledge base(s) specified by **Knowledge bases** under the **Assistant settings** tab. If your chat assistant is associated with certain knowledge bases, you can keep it as is. |
| :::info NOTE | :::info NOTE |
| It does not currently make a difference whether you set `{knowledge}` to optional or mandatory, but note that this design will be updated at a later point. | |
| | It currently makes no difference whether `{knowledge}` is set as optional or mandatory, but please note this design will be updated in due course. |
| ::: | ::: |
| From v0.17.0 onward, you can start an AI chat without specifying knowledge bases. In this case, we recommend removing the `{knowledge}` variable to prevent unnecessary reference and keeping the **Empty response** field empty to avoid errors. | From v0.17.0 onward, you can start an AI chat without specifying knowledge bases. In this case, we recommend removing the `{knowledge}` variable to prevent unnecessary reference and keeping the **Empty response** field empty to avoid errors. |
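
To make the `{knowledge}` behavior above concrete, here is a minimal sketch of how a reserved placeholder in a system prompt could be filled with retrieved chunks. It assumes plain string replacement; the function `render_system_prompt` and the sample prompts are hypothetical and do not reflect RAGFlow's internal code.

```python
# Hypothetical sketch, NOT RAGFlow's implementation.
# Assumption: the reserved {knowledge} placeholder is filled by plain string replacement.
from typing import List

KNOWLEDGE_PLACEHOLDER = "{knowledge}"


def render_system_prompt(template: str, chunks: List[str]) -> str:
    """Replace {knowledge} with the retrieved chunks, if the template contains it."""
    if KNOWLEDGE_PLACEHOLDER not in template:
        # Placeholder removed (e.g. a chat without knowledge bases): nothing is injected.
        return template
    # Separate chunks with a blank line so the model can tell them apart.
    knowledge = "\n\n".join(chunks)
    return template.replace(KNOWLEDGE_PLACEHOLDER, knowledge)


if __name__ == "__main__":
    print(render_system_prompt("Answer using the knowledge below.\n{knowledge}", ["Chunk A", "Chunk B"]))
    print(render_system_prompt("You are a helpful assistant.", ["Chunk A"]))
```

With the placeholder removed, as recommended for chats without knowledge bases, the sketch returns the prompt unchanged and no retrieved text reaches the model.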

| Before | After |
| --- | --- |
| - On the configuration page of your knowledge base, switch off **Use RAPTOR to enhance retrieval**. | - On the configuration page of your knowledge base, switch off **Use RAPTOR to enhance retrieval**. |
| - Extracting knowledge graph (GraphRAG) is time-consuming. | - Extracting knowledge graph (GraphRAG) is time-consuming. |
| - Disable **Auto-keyword** and **Auto-question** on the configuration page of your knowledge base, as both depend on the LLM. | - Disable **Auto-keyword** and **Auto-question** on the configuration page of your knowledge base, as both depend on the LLM. |
| - **v0.17.0+:** If your document is plain text PDF and does not require GPU-intensive processes like OCR (Optical Character Recognition), TSR (Table Structure Recognition), or DLA (Document Layout Analysis), you can choose **Naive** over **DeepDoc** or other time-consuming large model options in the **Document parser** dropdown. This will substantially reduce document parsing time. | |
| | - **v0.17.0+:** If all PDFs in your knowledge base are plain text and do not require GPU-intensive processes like OCR (Optical Character Recognition), TSR (Table Structure Recognition), or DLA (Document Layout Analysis), you can choose **Naive** over **DeepDoc** or other time-consuming large model options in the **Document parser** dropdown. This will substantially reduce document parsing time. |

| Before | After |
| --- | --- |
| ### Prompt | ### Prompt |
| The following prompt will be applied recursively for cluster summarization, with `{cluster_content}` serving as an internal parameter. We recommend that you keep it as-is for now. The design will be updated at a later point. | |
| | The following prompt will be applied recursively for cluster summarization, with `{cluster_content}` serving as an internal parameter. We recommend that you keep it as-is for now. The design will be updated in due course. |
| ``` | ``` |
| Please summarize the following paragraphs... Paragraphs as following: | Please summarize the following paragraphs... Paragraphs as following: |
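
The RAPTOR note above says the summarization prompt is applied recursively to clusters, with `{cluster_content}` as an internal parameter. The simplified sketch below illustrates that idea: the nodes of each layer are grouped, each group's text is substituted into `{cluster_content}`, and the resulting summaries form the next layer until a single root summary remains. It is an illustration under stated assumptions, not RAGFlow's RAPTOR code: fixed-size grouping of neighbouring nodes stands in for real clustering, the template is an abbreviated stand-in for the full prompt, and `build_tree` and `summarize` are hypothetical names.

```python
# Simplified, hypothetical sketch of RAPTOR-style recursive cluster summarization.
# Assumptions: fixed-size grouping replaces real clustering, and `summarize` is any
# callable that sends a prompt to an LLM and returns the generated text.
from typing import Callable, List

# Hypothetical, abbreviated template; the real prompt is longer.
PROMPT_TEMPLATE = "Please summarize the following paragraphs.\n{cluster_content}"


def build_tree(chunks: List[str], summarize: Callable[[str], str],
               group_size: int = 2) -> List[List[str]]:
    """Return every layer of the summary tree, from the leaf chunks up to the root."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so that each layer shrinks")
    layers = [chunks]
    current = chunks
    while len(current) > 1:
        # Stand-in for clustering: group neighbouring nodes of the current layer.
        groups = [current[i:i + group_size] for i in range(0, len(current), group_size)]
        # Each cluster's concatenated text fills the {cluster_content} parameter.
        current = [summarize(PROMPT_TEMPLATE.format(cluster_content="\n".join(group)))
                   for group in groups]
        layers.append(current)
    return layers
```

With a stub `summarize` that wraps its input lines as `S(...)`, three chunks `a`, `b`, `c` produce the layers `[a, b, c]`, `[S(a,b), S(c)]`, and `[S(S(a,b),S(c))]`.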

| Before | After |
| --- | --- |
| --- | --- |
| sidebar_position: 0 | |
| | sidebar_position: 2 |
| slug: /select_pdf_parser | slug: /select_pdf_parser |
| --- | --- |
| - **Laws** | - **Laws** |
| - **Presentation** | - **Presentation** |
| - **One** | - **One** |
| - To use a third-party visual model for parsing PDFs, ensure you have set a default image2txt model under **Set default models** on the **Model providers** page. | |
| | - To use a third-party visual model for parsing PDFs, ensure you have set a default img2txt model under **Set default models** on the **Model providers** page. |
| ## Procedure | ## Procedure |
| 2. Select the option that works best with your scenario: | 2. Select the option that works best with your scenario: |
| - DeepDoc: (Default) The default visual model for OCR, TSR, and DLR tasks. | |
| - Naive: Skip OCR, TSR, and DLR tasks if *all* your PDFs are plain text. | |
| - A third-party visual model provided by a specific model provider. | |
| | - DeepDoc: (Default) The default visual model for OCR, TSR, and DLR tasks, which is time-consuming. |
| | - Naive: Skip OCR, TSR, and DLR tasks if *all* your PDFs are plain text. |
| | - A third-party visual model provided by a specific model provider. |
| :::caution WARNING | :::caution WARNING |
| Third-party visual models are marked **Experimental**, because we have not fully tested these models for the aforementioned data extraction tasks. | Third-party visual models are marked **Experimental**, because we have not fully tested these models for the aforementioned data extraction tasks. |
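
The selection rule in the procedure above (use **Naive** only when *all* PDFs are plain text; otherwise keep the default **DeepDoc** or try an experimental third-party model) can be captured in a small decision helper. This is a hypothetical sketch: `PdfInfo`, its `is_plain_text` flag, and `choose_parser` are illustrative names, and deciding whether a given PDF is plain text is left to you.

```python
# Hypothetical sketch of the parser-selection rule; names are illustrative only.
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PdfInfo:
    """Hypothetical description of one PDF in a knowledge base."""
    name: str
    # True if the PDF is plain text, i.e. it needs no OCR, TSR, or layout analysis.
    is_plain_text: bool


def choose_parser(pdfs: Iterable[PdfInfo]) -> str:
    """Suggest a Document parser option following the rule in the guide."""
    pdfs = list(pdfs)
    if pdfs and all(pdf.is_plain_text for pdf in pdfs):
        # Every PDF is plain text: the faster Naive option skips OCR, TSR, and DLR.
        return "Naive"
    # At least one PDF needs visual processing (or there are none yet): keep the default.
    return "DeepDoc"
```

Falling back to DeepDoc for an empty list is a design choice of this sketch: it keeps the default option until there is evidence that every PDF is plain text.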

| Before | After |
| --- | --- |
| --- | --- |
| sidebar_position: 2 | |
| | sidebar_position: 0 |
| slug: /set_metada | slug: /set_metada |
| --- | --- |
| Ensure that your metadata is in JSON format; otherwise, your updates will not be applied. | Ensure that your metadata is in JSON format; otherwise, your updates will not be applied. |
| ::: | ::: |
| | ## Frequently asked questions |
| | ### Can I set metadata for multiple documents at once? |
| | No, RAGFlow does not support batch metadata setting. If you still consider this feature essential, please [raise an issue](https://github.com/infiniflow/ragflow/issues) explaining your use case and its importance. |
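
Because metadata updates are only applied when the input is valid JSON, it can help to check the text before pasting it into the metadata dialog. The minimal sketch below uses Python's standard `json` module. It additionally assumes that metadata should be a JSON object of key-value pairs, which is our assumption rather than a documented requirement; `parse_metadata` and the sample keys are hypothetical.

```python
import json


def parse_metadata(text: str) -> dict:
    """Return the metadata as a dict, or raise ValueError if it is not a JSON object."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"metadata is not valid JSON: {exc}") from exc
    # Assumption: metadata is a set of key-value pairs, so require a JSON object.
    if not isinstance(value, dict):
        raise ValueError("metadata must be a JSON object")
    return value
```

For example, `{"author": "Alice", "year": 2024}` parses into a dict, while `author: Alice` (not JSON) and `["Alice"]` (not an object) both raise `ValueError`.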

| Before | After |
| --- | --- |
| 5. Click **OK** to confirm your changes. | 5. Click **OK** to confirm your changes. |
| :::note | :::note |
| To update an existing model API key at a later point: | |
| | To update an existing model API key: |
| ::: | ::: |

| Before | After |
| --- | --- |
| > Each RAGFlow account is able to use **text-embedding-v2** for free, an embedding model of Tongyi-Qianwen. This is why you can see Tongyi-Qianwen in the **Added models** list. And you may need to update your Tongyi-Qianwen API key at a later point. | |
| 2. Click on the desired LLM and update the API key accordingly (DeepSeek-V2 in this case): | 2. Click on the desired LLM and update the API key accordingly (DeepSeek-V2 in this case): |

| Before | After |
| --- | --- |
| - AI chat: Leverages Tavily-based web search to enhance contexts in agentic reasoning. To activate this, enter the correct Tavily API key under the **Assistant settings** tab of your chat assistant dialogue. | - AI chat: Leverages Tavily-based web search to enhance contexts in agentic reasoning. To activate this, enter the correct Tavily API key under the **Assistant settings** tab of your chat assistant dialogue. |
| - AI chat: Supports starting a chat without specifying knowledge bases. | - AI chat: Supports starting a chat without specifying knowledge bases. |
| - AI chat: HTML files can also be previewed and referenced, in addition to PDF files. | - AI chat: HTML files can also be previewed and referenced, in addition to PDF files. |
| - Dataset: Adds a **PDF parser**, aka **Document parser**, dropdown menu to dataset configurations. This includes a DeepDoc model option, which is time-consuming, a much faster **naive** option (plain text), which skips DLA (Document Layout Analysis), OCR (Optical Character Recognition), and TSR (Table Structure Recognition) tasks, and several currently *experimental* large model options. | |
| | - Dataset: Adds a **PDF parser**, aka **Document parser**, dropdown menu to dataset configurations. This includes a DeepDoc model option, which is time-consuming, a much faster **naive** option (plain text), which skips DLA (Document Layout Analysis), OCR (Optical Character Recognition), and TSR (Table Structure Recognition) tasks, and several currently *experimental* large model options. See [here](./guides/dataset/select_pdf_parser.md). |
| - Agent component: **(x)** or a forward slash `/` can be used to insert available keys (variables) in the system prompt field of the **Generate** or **Template** component. | - Agent component: **(x)** or a forward slash `/` can be used to insert available keys (variables) in the system prompt field of the **Generate** or **Template** component. |
| - Object storage: Supports using Aliyun OSS (Object Storage Service) as a file storage option. | - Object storage: Supports using Aliyun OSS (Object Storage Service) as a file storage option. |
| - Models: Updates the supported model list for Tongyi-Qianwen (Qwen), adding DeepSeek-specific models; adds ModelScope as a model provider. | - Models: Updates the supported model list for Tongyi-Qianwen (Qwen), adding DeepSeek-specific models; adds ModelScope as a model provider. |