
Docs: Added auto-keyword auto-question guide (#8113)

### What problem does this PR solve?

### Type of change


- [x] Documentation Update
tags/v0.19.1
writinwaters 4 months ago
parent commit 157cd8b1b0
No account linked to committer's email address
2 changed files with 68 additions and 1 deletion
1. docs/faq.mdx (+1, -1)
2. docs/guides/dataset/autokeyword_autoquestion.mdx (+67, -0)

docs/faq.mdx (+1, -1) View file

@@ -19,7 +19,7 @@ import TOCInline from '@theme/TOCInline';

### What sets RAGFlow apart from other RAG products?

- The "garbage in garbage out" status quo remains unchanged despite the fact that LLMs have advanced Natural Language Processing (NLP) significantly. In response, RAGFlow introduces two unique features compared to other Retrieval-Augmented Generation (RAG) products.
+ The "garbage in garbage out" status quo remains unchanged despite the fact that LLMs have advanced Natural Language Processing (NLP) significantly. In its response, RAGFlow introduces two unique features compared to other Retrieval-Augmented Generation (RAG) products.

- Fine-grained document parsing: Document parsing involves images and tables, with the flexibility for you to intervene as needed.
- Traceable answers with reduced hallucinations: You can trust RAGFlow's responses as you can view the citations and references supporting them.

docs/guides/dataset/autokeyword_autoquestion.mdx (+67, -0) View file

@@ -0,0 +1,67 @@
---
sidebar_position: 3
slug: /autokeyword_autoquestion
---

# Auto-keyword Auto-question
import APITable from '@site/src/components/APITable';

Use a chat model to generate keywords and questions from the original chunks.

---

When selecting a chunking method, you can also enable auto-keyword or auto-question generation to improve retrieval. This feature uses a chat model to generate a specified number of keywords and questions from each chunk, adding a layer of higher-level information on top of the original content.

:::tip NOTE
Enabling this feature increases document indexing time, because every chunk is sent to the chat model for keyword or question generation.
:::
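To make the generation step concrete, here is a minimal, self-contained sketch of enriching a chunk with LLM-generated keywords. The `chat_model` stub, the prompt wording, and the `enrich_chunk` helper are assumptions for illustration only, not RAGFlow's actual pipeline.

```python
# Illustrative sketch only: the chat_model stub and prompt wording are
# assumptions for this demo, not RAGFlow's actual implementation.

def chat_model(prompt: str) -> str:
    """Stand-in for a real chat model; returns a canned answer for the demo."""
    return "retrieval, chunking, indexing"

def enrich_chunk(chunk: str, num_keywords: int) -> dict:
    """Ask the chat model for keywords and attach them to the chunk."""
    if num_keywords == 0:  # 0 disables the feature
        return {"text": chunk, "keywords": []}
    prompt = f"Extract {num_keywords} keywords from this text:\n{chunk}"
    keywords = [k.strip() for k in chat_model(prompt).split(",")]
    return {"text": chunk, "keywords": keywords}

enriched = enrich_chunk("RAGFlow splits documents into chunks before indexing.", 3)
```

The extra keywords are then indexed alongside the original chunk text, which is why every chunk must make a round trip to the chat model during parsing.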

- **Auto-keyword**
  - **Definition:** The number of additional keywords the LLM generates for each chunk. By supplying synonyms for tokenization-unfriendly text or multilingual content, it improves recall in full-text or hybrid retrieval. It can also be used to correct bad cases. Disabling it (setting `0`) significantly accelerates parsing.
  - **Common Values:**
    - `0` = Disabled.
    - `3`–`5` = Recommended (a chunk with over a thousand characters may need more keywords).
    - `30` = Maximum. Note that the marginal benefit decreases as the number increases.

- **Auto-question**
  - **Definition:** The number of FAQ-style questions the LLM generates for each chunk, making retrieval matches better aligned with real user queries (Who/What/Why). It can also be used to correct bad cases.
  - **Common Values:**
    - `0` = Disabled.
    - `1`–`2` = Commonly used (a chunk with thousands of characters may need more).
    - `30` = Upper limit, to avoid generating too many questions at once.
  - **Typical Use Cases:** Scenarios requiring FAQ-style retrieval, such as product manuals and policy documents.
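The intuition behind auto-question can be shown with a toy example: a user's query usually shares more words with an FAQ-style question than with the raw chunk text. The word-overlap score below is a crude stand-in for a real retriever and is purely illustrative.

```python
# Toy word-overlap score, purely illustrative; a real retriever
# (full-text or hybrid search) is far more sophisticated.

def overlap(a: str, b: str) -> int:
    """Count lowercase words shared by two texts."""
    return len(set(a.lower().split()) & set(b.lower().split()))

chunk = "Refunds are processed within 14 business days of approval."
auto_question = "How long does a refund take to process?"
user_query = "How long does a refund take?"

score_chunk = overlap(user_query, chunk)        # little direct word overlap
score_question = overlap(user_query, auto_question)  # strong overlap
```

Because the generated question scores higher than the raw chunk, indexing it alongside the chunk makes the chunk retrievable for queries the original wording would have missed.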


## Configuration

On the **Configuration** page of your knowledge base, you will find the Auto-keyword and Auto-question sliders under **Page rank**.

:::tip NOTE
The Auto-keyword and Auto-question values must be integers. A non-integer value, say 1.7, is rounded down to the nearest integer (1 in this case).
:::
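The rounding rule in the note corresponds to simple truncation; the snippet below is only a sanity check of that rule, not RAGFlow code.

```python
# Non-integer slider values are rounded down, per the note above.
# For positive numbers, Python's int() truncation equals flooring.
stored = int(1.7)  # 1.7 -> 1
```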


## Best practices

If you are uncertain how to set auto-keyword or auto-question values, here are some best practices gathered from our community:

```mdx-code-block
<APITable>
```

| Use cases or typical scenarios | Document volume/length | Auto-keyword (0–30) | Auto-question (0–30) |
|---------------------------------------------------------------------|---------------------------------|----------------------------|----------------------------|
| 1. Internal Process Guidance for Employee Handbook | Small, under 10 pages | 0 | 0 |
| 2. Customer Service FAQ Hot Questions | Medium, 10–100 pages | 3–7 | 1–3 |
| 3. Technical Whitepapers: Development Standards, Protocol Explanations | Large, over 100 pages | 2–4 | 1–2 |
| 4. Contracts / Regulations / Legal Clause Retrieval | Large, over 50 pages | 2–5 | 0–1 |
| 5. Multi-repository Layered New Documents + Old Archive | Many | Adjust as appropriate | Adjust as appropriate |
| 6. Social Media Comment Pool: Multilingual & Mixed Spelling | Very large volume of short text | 8–12 | 0 |
| 7. Operational Logs for DevOps Troubleshooting | Very large volume of short text | 3–6 | 0 |
| 8. Marketing Asset Library: Multilingual Product Descriptions | Medium | 6–10 | 1–2 |
| 9. Training Courseware / eBooks | Large | 2–5 | 1–2 |
| 10. Maintenance Manual: Equipment Diagrams + Steps | Medium | 3–7 | 1–2 |

```mdx-code-block
</APITable>
```
