Feat: Optimize the table extraction logic in the Markdown parser (#5663)
Enhance the recognition of both borderless and bordered Markdown tables.
Add support for extracting HTML tables, including various scenarios with
nested HTML tags. Improve performance by using conditional checks to
reduce unnecessary regular expression matching.
### What problem does this PR solve?
Optimize the table extraction logic in the Markdown parser:
Enhance the recognition of both borderless and bordered Markdown tables.
Add support for extracting HTML tables, including various scenarios with
nested HTML tags.
Improve performance by using conditional checks to reduce unnecessary
regular expression matching.
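A minimal sketch of the "conditional checks before regex" idea, assuming a cheap substring test is used to decide whether a regex scan is worth running. The patterns and the `extract_tables` name are illustrative assumptions, not the ones used in this PR, and the sketch covers only bordered Markdown tables and non-nested HTML tables:

```python
import re

# Hypothetical pattern: header row, separator row, then any number of body rows,
# each wrapped in pipes (bordered Markdown table).
BORDERED = re.compile(
    r"^\|.*\|[ \t]*\n\|[ \t:|-]+\|[ \t]*\n(?:\|.*\|[ \t]*(?:\n|$))*",
    re.MULTILINE,
)
# Hypothetical pattern: an HTML table including any tags nested inside it.
# The non-greedy match does not handle a <table> nested in another <table>.
HTML_TABLE = re.compile(r"<table\b.*?</table>", re.IGNORECASE | re.DOTALL)


def extract_tables(text):
    """Return the tables found in text, skipping regex scans that cannot match."""
    tables = []
    # Cheap substring checks gate the more expensive regex scans.
    if "|" in text:
        tables.extend(m.group(0).strip() for m in BORDERED.finditer(text))
    if "<table" in text.lower():
        tables.extend(m.group(0) for m in HTML_TABLE.finditer(text))
    return tables


sample = (
    "intro\n"
    "| a | b |\n"
    "|---|---|\n"
    "| 1 | 2 |\n"
    "\n"
    "outer <table><tr><td><b>x</b></td></tr></table> end\n"
)
found = extract_tables(sample)
```

For text with no `|` and no `<table`, neither pattern is evaluated at all, which is where the saving would come from on table-free documents.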
### Type of change
- [x] Performance Improvement
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
Feat(api): Add dsl parameters to control whether dsl fields are included (#5769)
1. **Issue**: When calling `list_agent_session` via the HTTP API, users
may only need to display conversation messages and may not want to see
the associated DSL, which can be very large. This PR therefore adds
a control option that determines whether the DSL is returned, with
the default being to return it.
2. **Documentation Discrepancy**: In the HTTP API documentation, under
"List agent sessions," the "Response" section states that the "data"
field is a dictionary when "success" is returned. However, the actual
returned data is a list. This discrepancy has been corrected.
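A minimal sketch of how such a switch could behave, assuming each session is a plain dict with a `dsl` key. The function names, the `include_dsl` argument, and the accepted string values are hypothetical and are not taken from the actual API:

```python
def parse_bool_param(raw, default=True):
    """Interpret a query-string value; a missing value falls back to the default."""
    if raw is None:
        return default
    return raw.strip().lower() not in ("false", "0", "no")


def list_agent_sessions(sessions, include_dsl=True):
    """Return a list of session dicts, dropping 'dsl' when include_dsl is False."""
    if include_dsl:
        return [dict(s) for s in sessions]
    return [{k: v for k, v in s.items() if k != "dsl"} for s in sessions]


# Hypothetical stored sessions, for illustration only.
stored = [
    {"id": "s1", "messages": ["hi"], "dsl": {"graph": "large payload"}},
    {"id": "s2", "messages": [], "dsl": {"graph": "large payload"}},
]
with_dsl = list_agent_sessions(stored, include_dsl=parse_bool_param(None))
without_dsl = list_agent_sessions(stored, include_dsl=parse_bool_param("false"))
```

Defaulting to `True` keeps existing callers unaffected, and the result is a list in both cases, in line with the documentation correction above.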
Fix: Fixed the issue that files cannot be uploaded on the file management page. #5730 (#5763)
### What problem does this PR solve?
Fix: Fixed the issue that files cannot be uploaded on the file
management page. #5730
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fix: Resolve inconsistency in APIToken dialog_id field definition (#5749)
The `dialog_id` field was inconsistently defined:
- In the `migrate_db()` function, it was set to `null=True`.
- In the model class, it was defined as `null=False`.
This inconsistency caused an issue during the initial deployment where
the database table did not allow `dialog_id` to be null. As a result,
calling `APITokenService.save(**obj)` in `system_app.py` raised the
following error:
```
peewee.IntegrityError: null value in column "dialog_id" violates not-null constraint
```
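The same class of failure can be reproduced in isolation. The sketch below uses the standard-library `sqlite3` module rather than peewee and the project's real database, so the table names and error text are assumptions for illustration only. It shows a null `dialog_id` being rejected by a `NOT NULL` column and accepted once the column is nullable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column created as NOT NULL, mirroring a model that declares null=False.
conn.execute("CREATE TABLE api_token (token TEXT NOT NULL, dialog_id TEXT NOT NULL)")
try:
    conn.execute(
        "INSERT INTO api_token (token, dialog_id) VALUES (?, ?)", ("t1", None)
    )
    error = None
except sqlite3.IntegrityError as exc:
    # The insert is rejected because dialog_id is null.
    error = str(exc)

# Column created as nullable, mirroring a consistent null=True definition.
conn.execute("CREATE TABLE api_token_nullable (token TEXT NOT NULL, dialog_id TEXT)")
conn.execute(
    "INSERT INTO api_token_nullable (token, dialog_id) VALUES (?, ?)", ("t1", None)
)
row = conn.execute("SELECT dialog_id FROM api_token_nullable").fetchone()
conn.close()
```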
### What problem does this PR solve?
Error: peewee.IntegrityError: null value in column "dialog_id" violates
not-null constraint
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
close #5730
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Fix: Remove the document language parameter. #5640 (#5728)
### What problem does this PR solve?
Fix: Remove the document language parameter. #5686
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fix: Remove the max token parameter. #5640 #5646 (#5693)
### What problem does this PR solve?
Fix: Remove the max token parameter. #5640 #5646
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Feat: Add rerank option to huggingface's model type drop-down box. #5658 (#5689)
### What problem does this PR solve?
Feat: Add rerank option to huggingface's model type drop-down box. #5658
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Feat: Use react-hook-form to synchronize the data of the categorize form to the agent node. #3221 (#5665)
### What problem does this PR solve?
Feat: Use react-hook-form to synchronize the data of the categorize form
to the agent node. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Feat: Display the Document parser option when the parsing method is paper. #5467 (#5652)
### What problem does this PR solve?
Feat: Display the Document parser option when the parsing method is
paper. #5467
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Refactored DocumentService.update_progress
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
The `ocr.res` file is already included in the model directory
`rag/res/deepdoc`, but it doesn't seem to be utilized here.
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
close issue #5600
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Use to_df instead of to_pl when getting Infinity results (#5604)
### What problem does this PR solve?
### Type of change
- [x] Performance Improvement
---------
Co-authored-by: wangwei <dwxiayi@163.com>
Fix: Fix the bug where the wrong APIToken is retrieved. (#5597)
### What problem does this PR solve?
Fix the issue where getting the APIToken of a user who is part of
another user's team incorrectly returns the team owner's APIToken
instead of the user's own.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Feat: Render DynamicCategorize with shadcn-ui. #3221 (#5610)
### What problem does this PR solve?
Feat: Render DynamicCategorize with shadcn-ui. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Feat: Render MessageForm with shadcn-ui. #3221 (#5596)
### What problem does this PR solve?
Feat: Render MessageForm with shadcn-ui. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Fix: Improve the startup experience by exporting PYTHONPATH in the shell (#5593)
### What problem does this PR solve?
As the title says, export PYTHONPATH in the shell.
### Type of change
- [x] Refactoring
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
### What problem does this PR solve?
Introduced jemalloc.
Python uses pymalloc (which is a reimplementation of glibc malloc) to
manage RES. It has pools for small objects to avoid returning memory to
the OS aggressively. In my experience, replacing pymalloc with
[jemalloc](https://github.com/jemalloc/jemalloc) can reduce RES and
speed up task_executor.py.
### Type of change
- [x] Performance Improvement
Fix: Fix possible loss of part of the last stream chunk's information (#5584)
### What problem does this PR solve?
Fix a case where part of the last stream chunk's information may be lost.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)