Fix issue with `keep_alive=-1` for ollama chat model by allowing a user to set an additional configuration option (#9017)
### What problem does this PR solve?
fix issue with `keep_alive=-1` for ollama chat model by allowing a user
to set an additional configuration option. It is no-breaking change
because it still uses a previous default value such as: `keep_alive=-1`
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [X] Performance Improvement
- [X] Other (please describe):
- Additional configuration option has been added to control behavior of
RAGFlow while working with ollama LLM
### What problem does this PR solve?
Add model provider DeepInfra. This model list comes from our community.
NOTE: most endpoints haven't been tested, but they should work as OpenAI
does.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Fix: fix typo in OpenAI error logging message (#8865)
### What problem does this PR solve?
Correct the logging message from "OpenAI cat_with_tools" to "OpenAI
chat_with_tools" in the `_exceptions` method of the `Base` class to
accurately reflect the method name and improve error traceability.
### Type of change
- [x] Typo
### What problem does this PR solve?
Add xAI provider (experimental feature, requires user feedback).
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Based on https://github.com/infiniflow/ragflow/issues/8740
1. A better handle for 'NoneType' object is not subscriptable
2. Add some logs to get the internal message
### Type of change
- [x] Refactoring
fix: retry embedding with Qwen family models when limits temporarily reached. (#8690)
fix: retry embedding with Qwen family models when limits temporarily
reached.
APIs of Qwen family models are limited by calling rates. When reached,
the "output" attribute of the "resp" will be None, and in turn cause
TypeError when trying to retrieve "embeddings". Since these limits are
almost temporary, I have added a simple retry mechanism to avoid it.
Besides, if retry_max reached, the error can be early raised, instead of
hidden behind "TypeError".
### What problem does this PR solve?
Sometimes Qwen blocks calling due to rate limits, but it will cause the
whole parsing procedure stops when creating knowledge base. In this
situation, resp["output"] will be None, and resp["output"]["embeddings"]
will cause TypeError. Since the limits are temporary, I apply a simple
retry mechanism to solve it.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
The following error occurred during local testing, which should be fixed
by configuring 'exist_ok=True'.
```log
set_progress(7461edc253), progress: -1, progress_msg: 21:41:41 Page(1~100000001): [ERROR][Errno 17] File exists: '/ragflow/tmp'
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Add Google Cloud Vision API Integration (Image2Text) (#8608)
### What problem does this PR solve?
This PR introduces Google Cloud Vision API integration to enhance image
understanding capabilities in the application. It addresses the need for
advanced image description and chat functionalities by implementing a
new `GoogleCV` class to handle API interactions and updating relevant
configurations. This enables users to leverage Google Cloud Vision for
image-to-text tasks, improving the application's ability to process and
interpret visual data.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
fix: Correctly format message parts in GoogleChat (#8596)
### What problem does this PR solve?
This PR addresses an incompatibility issue with the Google Chat API by
correcting the message content format in the `GoogleChat` class.
Previously, the content was directly assigned to the "parts" field,
which did not align with the API's expected format. This change ensures
that messages are properly formatted with a "text" key within a
dictionary, as required by the API.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
fix the error 'Unknown field for GenerationConfig: max_tokens' when u… (#8473)
### What problem does this PR solve?
[https://github.com/infiniflow/ragflow/issues/8324](url)
docker image version: v0.19.1
The `_clean_conf` function was not implemented in the `_chat` and
`chat_streamly` methods of the `GeminiChat` class, causing the error
"Unknown field for GenerationConfig: max_tokens" when the default LLM
config includes the "max_tokens" parameter.
**Buggy Code(ragflow/rag/llm/chat_model.py)**
```python
class GeminiChat(Base):
def __init__(self, key, model_name, base_url=None, **kwargs):
super().__init__(key, model_name, base_url=base_url, **kwargs)
from google.generativeai import GenerativeModel, client
client.configure(api_key=key)
_client = client.get_default_generative_client()
self.model_name = "models/" + model_name
self.model = GenerativeModel(model_name=self.model_name)
self.model._client = _client
def _clean_conf(self, gen_conf):
for k in list(gen_conf.keys()):
if k not in ["temperature", "top_p"]:
del gen_conf[k]
return gen_conf
def _chat(self, history, gen_conf):
from google.generativeai.types import content_types
system = history[0]["content"] if history and history[0]["role"] == "system" else ""
hist = []
for item in history:
if item["role"] == "system":
continue
hist.append(deepcopy(item))
item = hist[-1]
if "role" in item and item["role"] == "assistant":
item["role"] = "model"
if "role" in item and item["role"] == "system":
item["role"] = "user"
if "content" in item:
item["parts"] = item.pop("content")
if system:
self.model._system_instruction = content_types.to_content(system)
response = self.model.generate_content(hist, generation_config=gen_conf)
ans = response.text
return ans, response.usage_metadata.total_token_count
def chat_streamly(self, system, history, gen_conf):
from google.generativeai.types import content_types
if system:
self.model._system_instruction = content_types.to_content(system)
#❌_clean_conf was not implemented
for k in list(gen_conf.keys()):
if k not in ["temperature", "top_p", "max_tokens"]:
del gen_conf[k]
for item in history:
if "role" in item and item["role"] == "assistant":
item["role"] = "model"
if "content" in item:
item["parts"] = item.pop("content")
ans = ""
try:
response = self.model.generate_content(history, generation_config=gen_conf, stream=True)
for resp in response:
ans = resp.text
yield ans
yield response._chunks[-1].usage_metadata.total_token_count
except Exception as e:
yield ans + "\n**ERROR**: " + str(e)
yield 0
```
**Implement the _clean_conf function**
```python
class GeminiChat(Base):
def __init__(self, key, model_name, base_url=None, **kwargs):
super().__init__(key, model_name, base_url=base_url, **kwargs)
from google.generativeai import GenerativeModel, client
client.configure(api_key=key)
_client = client.get_default_generative_client()
self.model_name = "models/" + model_name
self.model = GenerativeModel(model_name=self.model_name)
self.model._client = _client
def _clean_conf(self, gen_conf):
for k in list(gen_conf.keys()):
if k not in ["temperature", "top_p"]:
del gen_conf[k]
return gen_conf
def _chat(self, history, gen_conf):
from google.generativeai.types import content_types
#✅ implement _clean_conf to remove the wrong parameters
gen_conf = self._clean_conf(gen_conf)
system = history[0]["content"] if history and history[0]["role"] == "system" else ""
hist = []
for item in history:
if item["role"] == "system":
continue
hist.append(deepcopy(item))
item = hist[-1]
if "role" in item and item["role"] == "assistant":
item["role"] = "model"
if "role" in item and item["role"] == "system":
item["role"] = "user"
if "content" in item:
item["parts"] = item.pop("content")
if system:
self.model._system_instruction = content_types.to_content(system)
response = self.model.generate_content(hist, generation_config=gen_conf)
ans = response.text
return ans, response.usage_metadata.total_token_count
def chat_streamly(self, system, history, gen_conf):
from google.generativeai.types import content_types
#✅ implement _clean_conf to remove the wrong parameters
gen_conf = self._clean_conf(gen_conf)
if system:
self.model._system_instruction = content_types.to_content(system)
#✅Removed duplicate parameter filtering logic "for k in list(gen_conf.keys()):"
for item in history:
if "role" in item and item["role"] == "assistant":
item["role"] = "model"
if "content" in item:
item["parts"] = item.pop("content")
ans = ""
try:
response = self.model.generate_content(history, generation_config=gen_conf, stream=True)
for resp in response:
ans = resp.text
yield ans
yield response._chunks[-1].usage_metadata.total_token_count
except Exception as e:
yield ans + "\n**ERROR**: " + str(e)
yield 0
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
fix a bug when using huggingface embedding api (#8432)
### What problem does this PR solve?
image_version: v0.19.1
This PR fixes a bug in the HuggingFaceEmBedding API method that was
causing AssertionError: assert len(vects) == len(docs) during the
document embedding process.
#### Problem
The HuggingFaceEmbed.encode() method had an early return statement
inside the for loop, causing it to return after processing only the
first text input instead of processing all texts in the input list.
**Error Messenge**
```python
AssertionError: assert len(vects) == len(docs) # input chunks != embedded vectors from embedding api
File "/ragflow/rag/svr/task_executor.py", line 442, in embedding
```
**Buggy code(/ragflow/rag/llm/embedding_model.py)**
```python
class HuggingFaceEmbed(Base):
def __init__(self, key, model_name, base_url=None):
if not model_name:
raise ValueError("Model name cannot be None")
self.key = key
self.model_name = model_name.split("___")[0]
self.base_url = base_url or "http://127.0.0.1:8080"
def encode(self, texts: list):
embeddings = []
for text in texts:
response = requests.post(...)
if response.status_code == 200:
try:
embedding = response.json()
embeddings.append(embedding[0])
# ❌ Early return
return np.array(embeddings), sum([num_tokens_from_string(text) for text in texts])
except Exception as _e:
log_exception(_e, response)
else:
raise Exception(...)
```
**Fixed Code(I just Rollback this function to the v0.19.0 version)**
```python
Class HuggingFaceEmbed(Base):
def __init__(self, key, model_name, base_url=None):
if not model_name:
raise ValueError("Model name cannot be None")
self.key = key
self.model_name = model_name.split("___")[0]
self.base_url = base_url or "http://127.0.0.1:8080"
def encode(self, texts: list):
embeddings = []
for text in texts:
response = requests.post(...)
if response.status_code == 200:
embedding = response.json()
embeddings.append(embedding[0]) # ✅ Only append, no return
else:
raise Exception(...)
return np.array(embeddings), sum([num_tokens_from_string(text) for text in texts]) # ✅ Return after processing all
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
This is a cherry-pick from #7781 as requested.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
- Simplify AzureChat constructor by passing base_url directly
- Clean up spacing and formatting in chat_model.py
- Remove redundant parentheses and improve code consistency
- #8423
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fix: auto-keyword and auto-question fail with qwq model (#8190)
### What problem does this PR solve?
Fix auto-keyword and auto-question fail with qwq model. #8189
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Refa: Add error handling for JSON decode in embedding models (#8162)
### What problem does this PR solve?
Improve robustness of Jina, Nvidia, and SILICONFLOW embedding models by:
1. Adding try-catch blocks for JSON decode errors
2. Logging error details including response content
3. Raising exceptions with meaningful error messages
### Type of change
- [x] Refactoring
Feat: Support tool calling in Generate component (#7572)
### What problem does this PR solve?
Hello, our use case requires LLM agent to invoke some tools, so I made a
simple implementation here.
This PR does two things:
1. A simple plugin mechanism based on `pluginlib`:
This mechanism lives in the `plugin` directory. It will only load
plugins from `plugin/embedded_plugins` for now.
A sample plugin `bad_calculator.py` is placed in
`plugin/embedded_plugins/llm_tools`, it accepts two numbers `a` and `b`,
then give a wrong result `a + b + 100`.
In the future, it can load plugins from external location with little
code change.
Plugins are divided into different types. The only plugin type supported
in this PR is `llm_tools`, which must implement the `LLMToolPlugin`
class in the `plugin/llm_tool_plugin.py`.
More plugin types can be added in the future.
2. A tool selector in the `Generate` component:
Added a tool selector to select one or more tools for LLM:

And with the `bad_calculator` tool, it results this with the `qwen-max`
model:

### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
Fix: local variable referenced before assignment (#6909)
### What problem does this PR solve?
Fix: local variable referenced before assignment. #6803
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)