### What problem does this PR solve?
Support extracting questions and answers from PDF files
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Add umap-learn, fasttext and volcengine in requirements_arm.txt (#945)
### What problem does this PR solve?
Complete the requirements for ARM
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
The fasttext library is missing, and it is used in the operators.py file. (#925)
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Optimize task broker and executor for reduce memory usage and deployment
complexity.
### Type of change
- [x] Performance Improvement
- [x] Refactoring
### Change Log
- Enhance redis utils for message queue(use stream)
- Modify task broker logic via message queue (1.get parse event from
message queue 2.use ThreadPoolExecutor async executor )
- Modify the table column name of document and task (process_duation ->
process_duration maybe just a spelling mistake)
- Reformat some code style(just what i see)
- Add requirement_dev.txt for developer
- Add redis container on docker compose
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
Fix missing 'ollama' package in requirements.txt (#621)
### What problem does this PR solve?
This commit resolves an issue where the 'ollama' package was
inadvertently omitted from the requirements.txt file. The package has
now been added to ensure all dependencies are correctly installed for
the project.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fix: missing 'redis' package in requirements.txt (#622)
### What problem does this PR solve?
This commit resolves an issue where the 'redis' package was
inadvertently omitted from the requirements.txt file. The package has
now been added to ensure all dependencies are correctly installed for
the project.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Add `.doc` file parser, using tika.
```
pip install tika
```
```
from tika import parser
from io import BytesIO
def extract_text_from_doc_bytes(doc_bytes):
file_like_object = BytesIO(doc_bytes)
parsed = parser.from_buffer(file_like_object)
return parsed["content"]
```
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: chrysanthemum-boy <fannc@qq.com>
### Description
Following up on https://github.com/infiniflow/ragflow/pull/275, this PR
adds support for FastEmbed model configurations.
The options are not exhaustive. You can find the full list
[here](https://qdrant.github.io/fastembed/examples/Supported_Models/).
P.S. I ran into OOM issues when building the image.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: KevinHuSh <kevinhu.sh@gmail.com>