|
|
|
@@ -0,0 +1,67 @@ |
|
|
|
---
sidebar_position: 3
slug: /autokeyword_autoquestion
---
|
|
|
|
|
|
|
# Auto-keyword Auto-question

import APITable from '@site/src/components/APITable';

Use a chat model to generate keywords and questions from the original chunks.
|
|
|
|
|
|
|
---

When you select a chunking method, you can also enable auto-keyword or auto-question generation to improve retrieval. The feature uses a chat model to generate a specified number of keywords and questions from each chunk, adding a layer of higher-level information on top of the original content.
|
|
|
|
|
|
|
:::tip NOTE
Enabling this feature increases document indexing time, because every chunk is sent to the chat model for keyword or question generation.
:::
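To get a rough feel for the added indexing cost, the sketch below counts extra chat-model requests. It assumes one request per chunk for each enabled feature, which is an illustrative assumption and not a documented detail of the implementation; `extra_chat_requests` is a hypothetical helper.

```python
def extra_chat_requests(num_chunks: int, auto_keyword: int, auto_question: int) -> int:
    """Estimate the additional chat-model requests needed during indexing.

    Assumption (not from the product documentation): each enabled feature
    costs one request per chunk, regardless of how many keywords or
    questions are requested.
    """
    enabled_features = int(auto_keyword > 0) + int(auto_question > 0)
    return num_chunks * enabled_features


if __name__ == "__main__":
    # 1,000 chunks with both features enabled
    print(extra_chat_requests(1000, auto_keyword=5, auto_question=2))
```

Under this assumption, setting both values to `0` adds no requests at all, which is why disabling the feature speeds up parsing.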
|
|
|
|
|
|
|
- **Auto-keyword**
  - **Definition:** The number of additional keywords the LLM generates for each chunk. The extra keywords act as synonyms for text that tokenizes poorly or for multilingual content, which improves recall in full-text or hybrid retrieval. They can also be used to correct bad cases. Disabling this feature can significantly speed up parsing.
  - **Common values:**
    - `0`: Disabled.
    - `3`–`5`: Recommended. A chunk with more than a thousand characters may need more keywords.
    - `30`: Maximum. The marginal benefit decreases as the number increases.
|
|
|
|
|
|
|
- **Auto-question**
  - **Definition:** The number of potential FAQ-style questions the LLM generates for each chunk, so that retrieval matches align more closely with real user queries (Who/What/Why). The generated questions can also be used to correct bad cases.
  - **Common values:**
    - `0`: Disabled.
    - `1`–`2`: Commonly used. A chunk with thousands of characters may need more questions.
    - `30`: Maximum, to avoid generating too many questions at once.
  - **Typical use cases:** Scenarios that require FAQ retrieval, such as product manuals and policy documents.
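The two settings above can be pictured with the following minimal sketch. It is not the actual implementation: the `Chunk` class, the `enrich_chunk` function, the prompt wording, and the `chat` callable are all hypothetical names introduced for illustration. The only behavior taken from this page is that a value of `0` disables a feature and that the value sets how many items are requested per chunk.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Chunk:
    """A hypothetical chunk record with room for generated metadata."""
    text: str
    keywords: List[str] = field(default_factory=list)
    questions: List[str] = field(default_factory=list)


def _first_lines(reply: str, limit: int) -> List[str]:
    """Keep at most `limit` non-empty lines from the model's reply."""
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    return lines[:limit]


def enrich_chunk(
    chunk: Chunk,
    chat: Callable[[str], str],  # hypothetical: takes a prompt, returns text
    auto_keyword: int = 0,
    auto_question: int = 0,
) -> Chunk:
    """Attach generated keywords and questions to a chunk.

    A value of 0 disables the corresponding feature, so no request is made.
    """
    if auto_keyword > 0:
        prompt = (
            f"List {auto_keyword} keywords for the text below, one per line.\n\n"
            f"{chunk.text}"
        )
        chunk.keywords = _first_lines(chat(prompt), auto_keyword)
    if auto_question > 0:
        prompt = (
            f"Write {auto_question} questions that the text below answers, "
            f"one per line.\n\n{chunk.text}"
        )
        chunk.questions = _first_lines(chat(prompt), auto_question)
    return chunk
```

In this sketch, the generated keywords and questions are stored next to the original text, so a retriever could index them as additional matching targets for the same chunk.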
|
|
|
|
|
|
|
|
|
|
|
## Configuration

On the **Configuration** page of your knowledge base, the **Auto-keyword** and **Auto-question** sliders are located under **Page rank**.
|
|
|
|
|
|
|
:::tip NOTE
Auto-keyword and Auto-question values must be integers. A non-integer value is rounded down to the nearest integer; for example, 1.7 becomes 1.
:::
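The rounding rule can be sketched as follows. `normalize_setting` is a hypothetical helper, and clamping to the 0–30 range is an assumption based on the limits listed on this page, not a confirmed detail of how the slider handles out-of-range input.

```python
import math

MAX_VALUE = 30  # upper limit stated on this page


def normalize_setting(value: float) -> int:
    """Round a slider value down to an integer.

    Assumption: values outside 0..MAX_VALUE are clamped to that range.
    """
    rounded = math.floor(value)
    return max(0, min(MAX_VALUE, rounded))


if __name__ == "__main__":
    print(normalize_setting(1.7))  # 1, matching the example in the note
```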
|
|
|
|
|
|
|
|
|
|
|
## Best practices

If you are unsure how to set the auto-keyword or auto-question values, the following best practices from our community are a good starting point:
|
|
|
|
|
|
|
```mdx-code-block
<APITable>
```
|
|
|
|
|
|
|
| Use cases or typical scenarios                                         | Document volume/length          | Auto-keyword (0–30)   | Auto-question (0–30)  |
|------------------------------------------------------------------------|---------------------------------|-----------------------|-----------------------|
| 1. Internal Process Guidance for Employee Handbook                     | Small, under 10 pages           | 0                     | 0                     |
| 2. Customer Service FAQ Hot Questions                                  | Medium, 10–100 pages            | 3–7                   | 1–3                   |
| 3. Technical Whitepapers: Development Standards, Protocol Explanations | Large, over 100 pages           | 2–4                   | 1–2                   |
| 4. Contracts / Regulations / Legal Clause Retrieval                    | Large, over 50 pages            | 2–5                   | 0–1                   |
| 5. Multi-repository Layered New Documents + Old Archive                | Many                            | Adjust as appropriate | Adjust as appropriate |
| 6. Social Media Comment Pool: Multilingual & Mixed Spelling            | Very large volume of short text | 8–12                  | 0                     |
| 7. Operational Logs for DevOps Troubleshooting                         | Very large volume of short text | 3–6                   | 0                     |
| 8. Marketing Asset Library: Multilingual Product Descriptions          | Medium                          | 6–10                  | 1–2                   |
| 9. Training Courseware / eBooks                                        | Large                           | 2–5                   | 1–2                   |
| 10. Maintenance Manual: Equipment Diagrams + Steps                     | Medium                          | 3–7                   | 1–2                   |
|
|
|
|
|
|
|
```mdx-code-block
</APITable>
```
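If you script your dataset setup, the recommendations in the table can be kept as a small lookup, as in the sketch below. The scenario keys, the `BEST_PRACTICES` mapping, and the `starting_values` helper are hypothetical names; only the numeric ranges come from the table. Starting at the low end of each range is an assumption made here to keep indexing time down, not a community recommendation.

```python
from typing import Dict, Optional, Tuple

# (low, high) range, or None where the table says "Adjust as appropriate"
Range = Optional[Tuple[int, int]]

# scenario -> (auto-keyword range, auto-question range), copied from the table
BEST_PRACTICES: Dict[str, Tuple[Range, Range]] = {
    "employee_handbook": ((0, 0), (0, 0)),
    "customer_service_faq": ((3, 7), (1, 3)),
    "technical_whitepapers": ((2, 4), (1, 2)),
    "contracts_regulations": ((2, 5), (0, 1)),
    "multi_repository": (None, None),
    "social_media_comments": ((8, 12), (0, 0)),
    "devops_logs": ((3, 6), (0, 0)),
    "marketing_assets": ((6, 10), (1, 2)),
    "training_courseware": ((2, 5), (1, 2)),
    "maintenance_manual": ((3, 7), (1, 2)),
}


def starting_values(scenario: str) -> Tuple[Optional[int], Optional[int]]:
    """Return the low end of each recommended range as a first value to try.

    Returns None for a setting the table leaves to your own judgment.
    """
    keyword_range, question_range = BEST_PRACTICES[scenario]
    keyword = None if keyword_range is None else keyword_range[0]
    question = None if question_range is None else question_range[0]
    return keyword, question


if __name__ == "__main__":
    print(starting_values("customer_service_faq"))
```

From these starting values, you can raise a setting step by step and check whether retrieval quality improves enough to justify the longer indexing time.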