### What problem does this PR solve?

#6721

### Type of change

- [x] Documentation Update
- **Model**: The chat model to use.
  - Ensure you set the chat model correctly on the **Model providers** page.
  - You can use different models for different components to increase flexibility or improve overall performance.
- **Freedom**: A shortcut to the **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. Each option corresponds to a unique combination of these four parameters (a hypothetical mapping is sketched after the note below).
  This parameter has three options:
  - **Improvise**: Produces more creative responses.
  - **Precise**: (Default) Produces more conservative responses.
  - **Balance**: A middle ground between **Improvise** and **Precise**.
- **Temperature**: The randomness level of the model's output.
  - Defaults to 0.1.
  - Lower values lead to more deterministic and predictable outputs; higher values lead to more creative and varied outputs.
- **Top P**: Nucleus sampling.
  - Reduces the likelihood of generating repetitive or unnatural text by setting a threshold *P* and restricting the sampling to tokens with a cumulative probability exceeding *P*.
  - Defaults to 0.3.
- **Presence penalty**: Encourages the model to include a more diverse range of tokens in the response.
  - A higher **presence penalty** value makes the model more likely to generate tokens that have not yet appeared in the generated text.
  - Defaults to 0.4.
- **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text.
  - A higher **frequency penalty** value makes the model more conservative in its use of repeated tokens.
  - Defaults to 0.7.

:::tip NOTE
- It is not necessary to stick with the same model for all components. If a specific model is not performing well for a particular task, consider using a different one.
:::
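The exact parameter values behind each preset are internal to RAGFlow and not spelled out in the docs. The sketch below is illustrative only: the **Precise** row reuses the documented defaults (0.1 / 0.3 / 0.4 / 0.7), while the **Improvise** and **Balance** rows are hypothetical placeholders.

```python
# Illustrative only: how a freedom preset can expand into the four
# sampling parameters. The "Precise" row matches the documented defaults;
# "Improvise" and "Balance" values are hypothetical placeholders.
PRESETS = {
    "Improvise": {"temperature": 0.9, "top_p": 0.9, "presence_penalty": 0.2, "frequency_penalty": 0.3},
    "Precise":   {"temperature": 0.1, "top_p": 0.3, "presence_penalty": 0.4, "frequency_penalty": 0.7},
    "Balance":   {"temperature": 0.5, "top_p": 0.6, "presence_penalty": 0.3, "frequency_penalty": 0.5},
}

def expand_preset(name: str) -> dict:
    """Return the sampling parameters a given preset stands for."""
    return dict(PRESETS[name])

print(expand_preset("Precise"))
```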
4. Update **Model Setting**:

   - **Model**: Select the chat model. Although you have set a default chat model in **System Model Settings**, RAGFlow allows you to choose an alternative chat model for your dialogue.
   - **Freedom**: A shortcut to the **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. Each option corresponds to a unique combination of these four parameters.
     This parameter has three options:
     - **Improvise**: Produces more creative responses.
     - **Precise**: (Default) Produces more conservative responses.
     - **Balance**: A middle ground between **Improvise** and **Precise**.
   - **Temperature**: The randomness level of the model's output.
     - Defaults to 0.1.
     - Lower values lead to more deterministic and predictable outputs.
     - Higher values lead to more creative and varied outputs.
     - A temperature of zero results in the same output for the same prompt.
   - **Top P**: Nucleus sampling; see [top-p sampling](https://en.wikipedia.org/wiki/Top-p_sampling) for more information.
     - Reduces the likelihood of generating repetitive or unnatural text by setting a threshold *P* and restricting the sampling to tokens with a cumulative probability exceeding *P*.
     - Defaults to 0.3.
   - **Presence penalty**: Encourages the model to include a more diverse range of tokens in the response.
     - A higher **presence penalty** value makes the model more likely to generate tokens that have not yet appeared in the generated text.
     - Defaults to 0.4.
   - **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text.
     - A higher **frequency penalty** value makes the model more conservative in its use of repeated tokens.
     - Defaults to 0.7.
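   These four parameters are the standard sampling controls exposed by OpenAI-style chat APIs, so you can experiment with them outside RAGFlow as well. A minimal sketch, assuming the `openai` Python client; the API key and model name are placeholders:

   ```python
   from openai import OpenAI

   client = OpenAI(api_key="YOUR_API_KEY")  # placeholder credentials

   # The same four parameters RAGFlow's Freedom presets control,
   # set here to the documented defaults.
   response = client.chat.completions.create(
       model="gpt-4o-mini",  # placeholder model name
       messages=[{"role": "user", "content": "Explain nucleus sampling in one sentence."}],
       temperature=0.1,
       top_p=0.3,
       presence_penalty=0.4,
       frequency_penalty=0.7,
   )
   print(response.choices[0].message.content)
   ```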
5. Now, let's start the show:
## Leave a joined team

![](https://github.com/user-attachments/assets/e5f3d466-1c8f-4e16-a33f-33cd9463c3b9)

![](https://github.com/user-attachments/assets/a8...b7b4...)
Released on March 13, 2025.

### Compatibility changes

- Removes the **Max_tokens** setting from **Chat configuration**.
- Removes the **Max_tokens** setting from the **Generate**, **Rewrite**, **Categorize**, and **Keyword** agent components.

From this release onwards, if you still see RAGFlow's responses being cut short or truncated, check the **Max_tokens** setting of your model provider.
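To check whether truncation is happening on the provider side, one option is to inspect the response's `finish_reason` when calling the provider directly. A sketch, assuming an OpenAI-compatible provider and the `openai` Python client; credentials, model name, and the deliberately low cap are placeholders:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # placeholder credentials

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Write three paragraphs about retrieval-augmented generation."}],
    max_tokens=64,  # deliberately low provider-side cap to force truncation
)

# "length" means the reply hit the max_tokens cap and was cut short;
# "stop" means the model finished on its own.
print(response.choices[0].finish_reason)
```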
### Improvements

- Adds OpenAI-compatible APIs.
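With an OpenAI-compatible endpoint, any existing OpenAI client should be able to talk to RAGFlow. A minimal sketch, assuming the `openai` Python client; the base URL shape, host, port, chat ID, and model name below are assumptions and placeholders, so verify them against the HTTP API reference for your RAGFlow version:

```python
from openai import OpenAI

# Assumed endpoint shape -- verify the exact base URL in your RAGFlow
# version's HTTP API reference. Host, port, chat ID, and key are placeholders.
client = OpenAI(
    api_key="YOUR_RAGFLOW_API_KEY",
    base_url="http://localhost:9380/api/v1/chats_openai/YOUR_CHAT_ID",
)

completion = client.chat.completions.create(
    model="model",  # placeholder; the chat assistant's configured model is used
    messages=[{"role": "user", "content": "Hello from the OpenAI-compatible API!"}],
)
print(completion.choices[0].message.content)
```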