  1. ---
  2. sidebar_position: 3
  3. slug: /retrieval_component
  4. ---
  5. # Retrieval component
  6. A component that retrieves information from specified datasets.
  7. ## Scenarios
  8. A **Retrieval** component is essential in most RAG scenarios, where information is extracted from designated knowledge bases before being sent to the LLM for content generation. As of v0.20.1, a **Retrieval** component can operate either as a workflow component or as a tool of an **Agent**, enabling the Agent to control its invocation and search queries.
  9. ## Configurations
  10. Click on a **Retrieval** component to open its configuration window.
  11. ### Query variables
  12. *Mandatory*
  13. Select the query source for retrieval.
  14. The **Retrieval** component relies on query variables to specify its data inputs (queries). All global variables defined before the **Retrieval** component are available in the dropdown list.
  15. ### Knowledge bases
  16. Select the knowledge base(s) to retrieve data from.
  17. - If no knowledge base is selected, conversations with the agent are not grounded in any knowledge base. In this case, leave the **Empty response** field blank to avoid an error.
  18. - If you select multiple knowledge bases, they must all use the same embedding model; otherwise, an error occurs.
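The embedding-model constraint above can be checked up front with a few lines of Python. This is a minimal sketch, not RAGFlow's own validation code: the `KnowledgeBase` record, its field names, and the `check_same_embedding_model` helper are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeBase:
    # Hypothetical record; the field names are assumptions for this example.
    name: str
    embedding_model: str


def check_same_embedding_model(knowledge_bases):
    """Raise ValueError if the selected knowledge bases mix embedding models."""
    models = {kb.embedding_model for kb in knowledge_bases}
    if len(models) > 1:
        raise ValueError(
            "Selected knowledge bases use different embedding models: "
            + ", ".join(sorted(models))
        )
    # Return the shared model name, or None when nothing is selected.
    return next(iter(models), None)
```

Running a check like this before retrieval surfaces the mismatch as one clear message instead of a failure in the middle of a search.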
  19. ### Similarity threshold
  20. RAGFlow employs a combination of weighted keyword similarity and weighted vector cosine similarity during retrieval. This parameter sets the threshold for similarities between the user query and chunks stored in the datasets. Any chunk with a similarity score below this threshold will be excluded from the results.
  21. Defaults to 0.2.
  22. ### Keyword similarity weight
  23. This parameter sets the weight of keyword similarity in the combined similarity score. The keyword similarity weight and the vector similarity weight must sum to 1.0. Defaults to 0.7, so the weight of vector similarity in the combined score is 1 - 0.7 = 0.3.
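The following Python sketch shows how the keyword similarity weight and the similarity threshold could interact. It is an illustration under stated assumptions, not RAGFlow's implementation: the function names `combined_score` and `filter_chunks` and the chunk dictionary layout (`keyword_sim`, `vector_sim`) are hypothetical. Only the default values 0.7 and 0.2 come from this page.

```python
def combined_score(keyword_sim: float, vector_sim: float,
                   keyword_weight: float = 0.7) -> float:
    """Weighted sum of keyword and vector similarity (hypothetical helper)."""
    if not 0.0 <= keyword_weight <= 1.0:
        raise ValueError("keyword_weight must be between 0.0 and 1.0")
    vector_weight = 1.0 - keyword_weight  # the two weights sum to 1.0
    return keyword_weight * keyword_sim + vector_weight * vector_sim


def filter_chunks(chunks, threshold: float = 0.2, keyword_weight: float = 0.7):
    """Keep only chunks whose combined score reaches the threshold."""
    kept = []
    for chunk in chunks:
        score = combined_score(
            chunk["keyword_sim"], chunk["vector_sim"], keyword_weight
        )
        if score >= threshold:
            # Copy the chunk and attach its score for later ranking.
            kept.append({**chunk, "score": score})
    return kept
```

With the defaults, a chunk with keyword similarity 0.1 and vector similarity 0.1 scores 0.1 and is dropped, while a chunk with keyword similarity 0.4 and vector similarity 0.2 scores about 0.34 and is kept.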
  24. ### Top N
  25. This parameter selects the "Top N" chunks from the retrieved ones and feeds them to the LLM.
  26. Defaults to 8.
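A minimal Python sketch of the Top N cut-off, assuming each retrieved chunk is a dictionary carrying a numeric `score` field. The `select_top_n` helper and the chunk layout are hypothetical and do not reflect RAGFlow's internal data structures.

```python
def select_top_n(scored_chunks, top_n: int = 8):
    """Return the top_n highest-scoring chunks, best first."""
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    # sorted() is stable, so chunks with equal scores keep their input order.
    ranked = sorted(scored_chunks, key=lambda c: c["score"], reverse=True)
    return ranked[:top_n]
```

If fewer than `top_n` chunks were retrieved, the slice returns all of them.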
  27. ### Rerank model
  28. *Optional*
  29. If a rerank model is selected, a combination of weighted keyword similarity and weighted reranking score will be used for retrieval.
  30. :::caution WARNING
  31. Using a rerank model will *significantly* increase the system's response time.
  32. :::
  33. ### Empty response
  34. - Set this as a response if no results are retrieved from the knowledge base(s) for your query, or
  35. - Leave this field blank to allow the chat model to improvise when nothing is found.
  36. :::caution WARNING
  37. If you do not specify a knowledge base, you must leave this field blank; otherwise, an error occurs.
  38. :::
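The rules for the **Empty response** field can be summarized as a small decision function. This is a sketch of the behavior described in this section, not RAGFlow's code: `resolve_retrieval_output`, its parameters, and the returned dictionary shape are assumptions made for illustration.

```python
def resolve_retrieval_output(chunks, empty_response: str = "",
                             has_knowledge_base: bool = True):
    """Decide what is passed downstream (illustrative logic only)."""
    if not has_knowledge_base and empty_response:
        # Mirrors the warning above: no knowledge base combined with a
        # non-blank Empty response is treated as a configuration error.
        raise ValueError(
            "Leave 'Empty response' blank when no knowledge base is selected"
        )
    if chunks:
        # Results were found, so the fixed reply is not used.
        return {"chunks": chunks, "fixed_reply": None}
    if empty_response:
        # Nothing retrieved: answer with the configured text.
        return {"chunks": [], "fixed_reply": empty_response}
    # Nothing retrieved and no fixed reply: the chat model improvises.
    return {"chunks": [], "fixed_reply": None}
```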
  39. ### Cross-language search
  40. Select one or more languages for cross-language search. If no language is selected, the system searches with the original query.
  41. ### Use knowledge graph
  42. Whether to use the knowledge graph(s) in the specified knowledge base(s) during retrieval for multi-hop question answering. Enabling this involves iterative searches across entity, relationship, and community report chunks, which greatly increases retrieval time.
  43. ### Output
  44. The global variable name for the output of the **Retrieval** component, which can be referenced by other components in the workflow.