
Docs: Added auto-keyword auto-question guide (#8113)

### What problem does this PR solve?

### Type of change


- [x] Documentation Update
tags/v0.19.1
writinwaters 4 months ago
parent commit 157cd8b1b0
No account linked to committer's email address
2 changed files with 68 additions and 1 deletion
1. docs/faq.mdx (+1, -1)
2. docs/guides/dataset/autokeyword_autoquestion.mdx (+67, -0)

docs/faq.mdx (+1, -1) View file

@@ -19,7 +19,7 @@ import TOCInline from '@theme/TOCInline';

### What sets RAGFlow apart from other RAG products?

- The "garbage in garbage out" status quo remains unchanged despite the fact that LLMs have advanced Natural Language Processing (NLP) significantly. In response, RAGFlow introduces two unique features compared to other Retrieval-Augmented Generation (RAG) products.
+ The "garbage in garbage out" status quo remains unchanged despite the fact that LLMs have advanced Natural Language Processing (NLP) significantly. In its response, RAGFlow introduces two unique features compared to other Retrieval-Augmented Generation (RAG) products.

- Fine-grained document parsing: Document parsing involves images and tables, with the flexibility for you to intervene as needed.
- Traceable answers with reduced hallucinations: You can trust RAGFlow's responses as you can view the citations and references supporting them.

docs/guides/dataset/autokeyword_autoquestion.mdx (+67, -0) View file

@@ -0,0 +1,67 @@
---
sidebar_position: 3
slug: /autokeyword_autoquestion
---

# Auto-keyword Auto-question
import APITable from '@site/src/components/APITable';

Use a chat model to generate keywords and questions from the original chunks.

---

When selecting a chunking method, you can also enable auto-keyword or auto-question generation to improve retrieval. This feature uses a chat model to generate a specified number of keywords and questions from each chunk, adding a layer of higher-level information on top of the original content.

:::tip NOTE
Enabling this feature increases document indexing time, because every chunk is sent to the chat model for keyword or question generation.
:::
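To make the generation step concrete, here is a minimal, self-contained sketch of enriching a chunk with LLM-generated keywords. The `chat_model` stub, the prompt wording, and the `enrich_chunk` helper are assumptions for illustration only, not RAGFlow's actual pipeline.

```python
# Illustrative sketch only: the chat_model stub and prompt wording are
# assumptions for this demo, not RAGFlow's actual implementation.

def chat_model(prompt: str) -> str:
    """Stand-in for a real chat model; returns a canned answer for the demo."""
    return "retrieval, chunking, indexing"

def enrich_chunk(chunk: str, num_keywords: int) -> dict:
    """Ask the chat model for keywords and attach them to the chunk."""
    if num_keywords == 0:  # 0 disables the feature
        return {"text": chunk, "keywords": []}
    prompt = f"Extract {num_keywords} keywords from this text:\n{chunk}"
    keywords = [k.strip() for k in chat_model(prompt).split(",")]
    return {"text": chunk, "keywords": keywords}

enriched = enrich_chunk("RAGFlow splits documents into chunks before indexing.", 3)
```

The extra keywords are then indexed alongside the original chunk text, which is why every chunk must make a round trip to the chat model during parsing.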

- **Auto-keyword**
  - **Definition:** The number of additional keywords the LLM generates for each chunk. By supplying synonyms for tokenization-unfriendly text or multilingual content, it improves recall in full-text or hybrid retrieval. It can also be used to correct bad cases. Disabling it (setting `0`) significantly accelerates parsing.
  - **Common Values:**
    - `0` = Disabled.
    - `3`–`5` = Recommended (a chunk with over a thousand characters may need more keywords).
    - `30` = Maximum. Note that the marginal benefit decreases as the number increases.

- **Auto-question**
  - **Definition:** The number of FAQ-style questions the LLM generates for each chunk, making retrieval matches better aligned with real user queries (Who/What/Why). It can also be used to correct bad cases.
  - **Common Values:**
    - `0` = Disabled.
    - `1`–`2` = Commonly used (a chunk with thousands of characters may need more).
    - `30` = Upper limit, to avoid generating too many questions at once.
  - **Typical Use Cases:** Scenarios requiring FAQ-style retrieval, such as product manuals and policy documents.
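The intuition behind auto-question can be shown with a toy example: a user's query usually shares more words with an FAQ-style question than with the raw chunk text. The word-overlap score below is a crude stand-in for a real retriever and is purely illustrative.

```python
# Toy word-overlap score, purely illustrative; a real retriever
# (full-text or hybrid search) is far more sophisticated.

def overlap(a: str, b: str) -> int:
    """Count lowercase words shared by two texts."""
    return len(set(a.lower().split()) & set(b.lower().split()))

chunk = "Refunds are processed within 14 business days of approval."
auto_question = "How long does a refund take to process?"
user_query = "How long does a refund take?"

score_chunk = overlap(user_query, chunk)        # little direct word overlap
score_question = overlap(user_query, auto_question)  # strong overlap
```

Because the generated question scores higher than the raw chunk, indexing it alongside the chunk makes the chunk retrievable for queries the original wording would have missed.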


## Configuration

On the **Configuration** page of your knowledge base, you will find the Auto-keyword and Auto-question sliders under **Page rank**.

:::tip NOTE
The Auto-keyword and Auto-question values must be integers. A non-integer value, say 1.7, is rounded down to the nearest integer (1 in this case).
:::
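The rounding rule in the note corresponds to simple truncation; the snippet below is only a sanity check of that rule, not RAGFlow code.

```python
# Non-integer slider values are rounded down, per the note above.
# For positive numbers, Python's int() truncation equals flooring.
stored = int(1.7)  # 1.7 -> 1
```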


## Best practices

If you are uncertain how to set auto-keyword or auto-question values, here are some best practices gathered from our community:

```mdx-code-block
<APITable>
```

| Use cases or typical scenarios | Document volume/length | Auto-keyword (0–30) | Auto-question (0–30) |
|---------------------------------------------------------------------|---------------------------------|----------------------------|----------------------------|
| 1. Internal Process Guidance for Employee Handbook | Small, under 10 pages | 0 | 0 |
| 2. Customer Service FAQ Hot Questions | Medium, 10–100 pages | 3–7 | 1–3 |
| 3. Technical Whitepapers: Development Standards, Protocol Explanations | Large, over 100 pages | 2–4 | 1–2 |
| 4. Contracts / Regulations / Legal Clause Retrieval | Large, over 50 pages | 2–5 | 0–1 |
| 5. Multi-repository Layered New Documents + Old Archive | Many | Adjust as appropriate | Adjust as appropriate |
| 6. Social Media Comment Pool: Multilingual & Mixed Spelling | Very large volume of short text | 8–12 | 0 |
| 7. Operational Logs for DevOps Troubleshooting | Very large volume of short text | 3–6 | 0 |
| 8. Marketing Asset Library: Multilingual Product Descriptions | Medium | 6–10 | 1–2 |
| 9. Training Courseware / eBooks | Large | 2–5 | 1–2 |
| 10. Maintenance Manual: Equipment Diagrams + Steps | Medium | 3–7 | 1–2 |

```mdx-code-block
</APITable>
```
