### What problem does this PR solve?

#6721

### Type of change

- [x] Documentation Update
```diff
@@ -33,7 +33,7 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Model**: The chat model to use.
   - Ensure you set the chat model correctly on the **Model providers** page.
   - You can use different models for different components to increase flexibility or improve overall performance.
-- **Preset configurations**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
+- **Freedom**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
   This parameter has three options:
   - **Improvise**: Produces more creative responses.
   - **Precise**: (Default) Produces more conservative responses.
@@ -52,9 +52,6 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text.
   - A higher **frequency penalty** value results in the model being more conservative in its use of repeated tokens.
   - Defaults to 0.7.
-- **Max tokens**: Sets the maximum length of the model's output, measured in the number of tokens.
-  - Defaults to 512.
-  - If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses.
 :::tip NOTE
 - It is not necessary to stick with the same model for all components. If a specific model is not performing well for a particular task, consider using a different one.
```
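For readers wiring this up outside the UI: each **Freedom** preset is simply a named bundle of the four sampling parameters. Below is a minimal sketch of that mapping in Python. Only the **Precise** row reflects the defaults documented above (0.1 / 0.3 / 0.4 / 0.7); the **Improvise** and **Balance** values are illustrative assumptions, not values taken from the RAGFlow source.

```python
# Hypothetical mapping from a Freedom preset to concrete sampling parameters.
# Only "Precise" matches the documented defaults; the other two rows are
# illustrative assumptions.
FREEDOM_PRESETS = {
    "Improvise": {"temperature": 0.9, "top_p": 0.9, "presence_penalty": 0.2, "frequency_penalty": 0.3},  # assumed
    "Precise":   {"temperature": 0.1, "top_p": 0.3, "presence_penalty": 0.4, "frequency_penalty": 0.7},  # documented defaults
    "Balance":   {"temperature": 0.5, "top_p": 0.6, "presence_penalty": 0.3, "frequency_penalty": 0.5},  # assumed
}

def sampling_params(preset: str = "Precise") -> dict:
    """Expand a Freedom preset into the four underlying sampling parameters."""
    return dict(FREEDOM_PRESETS[preset])
```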
```diff
@@ -34,7 +34,7 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Model**: The chat model to use.
   - Ensure you set the chat model correctly on the **Model providers** page.
   - You can use different models for different components to increase flexibility or improve overall performance.
-- **Preset configurations**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
+- **Freedom**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
   This parameter has three options:
   - **Improvise**: Produces more creative responses.
   - **Precise**: (Default) Produces more conservative responses.
@@ -53,9 +53,6 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text.
   - A higher **frequency penalty** value results in the model being more conservative in its use of repeated tokens.
   - Defaults to 0.7.
-- **Max tokens**: Sets the maximum length of the model's output, measured in the number of tokens.
-  - Defaults to 512.
-  - If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses.
 :::tip NOTE
 - It is not necessary to stick with the same model for all components. If a specific model is not performing well for a particular task, consider using a different one.
```
```diff
@@ -32,7 +32,7 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Model**: The chat model to use.
   - Ensure you set the chat model correctly on the **Model providers** page.
   - You can use different models for different components to increase flexibility or improve overall performance.
-- **Preset configurations**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
+- **Freedom**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
   This parameter has three options:
   - **Improvise**: Produces more creative responses.
   - **Precise**: (Default) Produces more conservative responses.
@@ -51,9 +51,6 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text.
   - A higher **frequency penalty** value results in the model being more conservative in its use of repeated tokens.
   - Defaults to 0.7.
-- **Max tokens**: Sets the maximum length of the model's output, measured in the number of tokens.
-  - Defaults to 512.
-  - If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses.
 :::tip NOTE
 - It is not necessary to stick with the same model for all components. If a specific model is not performing well for a particular task, consider using a different one.
```
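For context on what these four knobs do on the wire: they map one-to-one onto the standard OpenAI-style chat-completion parameters. A minimal sketch using the documented defaults; the model name, API key, and endpoint are placeholders, not anything this PR prescribes.

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # placeholder credentials

# The four settings bundled by a Freedom preset, at their documented defaults.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Explain nucleus sampling in one sentence."}],
    temperature=0.1,        # randomness of sampling
    top_p=0.3,              # nucleus-sampling cumulative-probability threshold
    presence_penalty=0.4,   # pushes toward tokens not yet used
    frequency_penalty=0.7,  # pushes away from frequently repeated tokens
)
print(response.choices[0].message.content)
```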
```diff
@@ -48,10 +48,25 @@ You start an AI conversation by creating an assistant.
 4. Update **Model Setting**:
    - In **Model**: you select the chat model. Though you have selected the default chat model in **System Model Settings**, RAGFlow allows you to choose an alternative chat model for your dialogue.
-   - **Preset configurations** refers to the level that the LLM improvises. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
-   - **Temperature**: Level of the prediction randomness of the LLM. The higher the value, the more creative the LLM is.
-   - **Top P** is also known as "nucleus sampling". See [here](https://en.wikipedia.org/wiki/Top-p_sampling) for more information.
-   - **Max Tokens**: The maximum length of the LLM's responses. Note that the responses may be curtailed if this value is set too low.
+   - **Freedom**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
+     This parameter has three options:
+     - **Improvise**: Produces more creative responses.
+     - **Precise**: (Default) Produces more conservative responses.
+     - **Balance**: A middle ground between **Improvise** and **Precise**.
+   - **Temperature**: The randomness level of the model's output.
+     - Defaults to 0.1.
+     - Lower values lead to more deterministic and predictable outputs.
+     - Higher values lead to more creative and varied outputs.
+     - A temperature of zero results in the same output for the same prompt.
+   - **Top P**: Nucleus sampling.
+     - Reduces the likelihood of generating repetitive or unnatural text by setting a threshold *P* and restricting the sampling to tokens with a cumulative probability exceeding *P*.
+     - Defaults to 0.3.
+   - **Presence penalty**: Encourages the model to include a more diverse range of tokens in the response.
+     - A higher **presence penalty** value results in the model being more likely to generate tokens not yet included in the generated text.
+     - Defaults to 0.4.
+   - **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text.
+     - A higher **frequency penalty** value results in the model being more conservative in its use of repeated tokens.
+     - Defaults to 0.7.
 5. Now, let's start the show:
```
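One of the added lines is worth verifying empirically: a temperature of zero should yield the same output for the same prompt. A quick sketch follows, with placeholder model and credentials; note that some providers are only approximately deterministic even at zero.

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # placeholder credentials

def ask(temperature: float) -> str:
    """Send the same prompt at a given temperature and return the reply."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": "Name one color."}],
        temperature=temperature,
    )
    return reply.choices[0].message.content

print({ask(0.0) for _ in range(3)})  # expect a single repeated answer
print({ask(1.0) for _ in range(3)})  # expect more varied answers
```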
```diff
@@ -39,4 +39,4 @@ _After accepting the team invite, you should be able to view and update the team
 ## Leave a joined team
-![file-upload-permission](https://github.com/user-attachments/assets/6f8b9107-99ec-4e24-b509-cbd951056793)
+![create-new-team](https://github.com/user-attachments/assets/856a0497-71bd-4e0e-b4bf-6cf1b78929d2)
```
```diff
@@ -11,6 +11,13 @@ Key features, improvements and bug fixes in the latest releases.
 Released on March 13, 2025.
 ### Compatibility changes
+- Removes the **Max_tokens** setting from **Chat configuration**.
+- Removes the **Max_tokens** setting from the **Generate**, **Rewrite**, **Categorize**, and **Keyword** agent components.
+From this release onwards, if you still see RAGFlow's responses being cut short or truncated, check the **Max_tokens** setting of your model provider.
```
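If responses still come back truncated after this change, the cap now lives with the model provider rather than in RAGFlow. A hedged sketch of raising it explicitly on an OpenAI-compatible provider; the parameter name and allowed maximum vary by provider, and the model name and key are placeholders.

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # placeholder credentials

# max_tokens here is the provider-side output cap; raise it (within the
# provider's limit) if completions are being cut short.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Write a detailed answer."}],
    max_tokens=4096,
)
print(response.choices[0].message.content)
```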
```diff
 ### Improvements
+- Adds OpenAI-compatible APIs.
```
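Because the new APIs are OpenAI-compatible, an existing OpenAI SDK client can be pointed at a RAGFlow server by overriding the base URL. The sketch below is illustrative only: the base URL path, chat ID placeholder, and auth scheme are assumptions to be checked against the API reference, not values documented in this PR.

```python
from openai import OpenAI

# Assumed endpoint shape; substitute your server address, chat assistant ID,
# and RAGFlow API key.
client = OpenAI(
    api_key="YOUR_RAGFLOW_API_KEY",
    base_url="http://localhost:9380/api/v1/chats_openai/<chat_id>",
)

completion = client.chat.completions.create(
    model="model",  # RAGFlow answers with the assistant's configured chat model
    messages=[{"role": "user", "content": "What is RAGFlow?"}],
    stream=False,
)
print(completion.choices[0].message.content)
```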