Zhichang Yu
a2a5631da4
Rework logging (#3358)
Unified all log files into one.
### What problem does this PR solve?
Unified all log files into one.
### Type of change
- [x] Refactoring
11 months ago
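PR #3358 states only that all log files were unified into one. The sketch below is a rough illustration of that idea: it routes every module logger to a single file through Python's root logger. It assumes components get their loggers via `logging.getLogger(name)`. The helper `init_root_logger`, the file path, and the format string are hypothetical and are not taken from the project.

```python
import logging
import os


def init_root_logger(log_path: str, level: int = logging.INFO) -> logging.Logger:
    """Send records from every named logger to one file via the root logger (hypothetical helper)."""
    parent = os.path.dirname(log_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    root = logging.getLogger()
    root.setLevel(level)
    # Drop existing handlers so repeated calls do not write every record twice.
    for handler in list(root.handlers):
        root.removeHandler(handler)
        handler.close()
    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    root.addHandler(file_handler)
    return root
```

Named loggers created with `logging.getLogger("some.module")` propagate to the root by default, so they all end up in the same file without per-module handlers.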
yqkcn
570ad420a8
remove unused import (#2679)
### What problem does this PR solve?
### Type of change
- [x] Refactoring
1 year ago
Kevin Hu
fc867cb959
rename get_txt to get_text (#2649)
### What problem does this PR solve?
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
1 year ago
yqkcn
aea553c3a8
Add get_txt function (#2639)
### What problem does this PR solve?
Add get_txt function to reduce duplicate code
### Type of change
- [x] Refactoring
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
1 year ago
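The `get_txt` helper added in #2639, later renamed `get_text` in #2649, exists to deduplicate text-loading code. Below is a minimal sketch of such a helper. It assumes the helper accepts either a file name or bytes that are already loaded. The signature, the UTF-8 default, and the `errors="ignore"` policy are assumptions and do not reflect the project's actual implementation.

```python
from typing import Optional


def get_text(fnm: str, binary: Optional[bytes] = None, encoding: str = "utf-8") -> str:
    """Return a document's text from in-memory bytes if given, else read it from disk (hypothetical)."""
    if binary is not None:
        # Bytes were already loaded by the caller: decode them directly.
        return binary.decode(encoding, errors="ignore")
    # Otherwise fall back to reading the file by name.
    with open(fnm, "r", encoding=encoding, errors="ignore") as f:
        return f.read()
```

Every parser that needs plain text can then call one function instead of repeating the "bytes or path" branching.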
Jin Hai
6b3a40be5c
Convert file format from Windows/DOS to Unix (#1949)
### What problem does this PR solve?
The related source files were in Windows/DOS format; they have been converted to Unix format.
### Type of change
- [x] Refactoring
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
1 year ago
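PR #1949 converted source files from Windows/DOS (CRLF) line endings to Unix (LF) line endings. The PR does not say which tool was used, and `dos2unix` is a common choice. Below is a minimal Python equivalent. `dos_to_unix` is a hypothetical name. It operates on raw bytes, so the file's character encoding is left untouched, and it rewrites only `\r\n` pairs. Any lone `\r` is left as-is.

```python
from pathlib import Path


def dos_to_unix(path: Path) -> bool:
    """Rewrite CRLF line endings as LF in place; return True if the file changed (hypothetical helper)."""
    data = path.read_bytes()
    converted = data.replace(b"\r\n", b"\n")
    if converted == data:
        # Already Unix-style: leave the file (and its mtime) alone.
        return False
    path.write_bytes(converted)
    return True
```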
KevinHuSh
0171082cc5
fix create dialog bug (#982)
### What problem does this PR solve?
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
1 year ago
Zhedong Cen
8dd45459be
Add support for HTML file (#973)
### What problem does this PR solve?
Add support for HTML file
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
1 year ago
KevinHuSh
7013d7f620
refine text decode (#657)
### What problem does this PR solve?
#651
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
1 year ago
KevinHuSh
8c07992b6c
refine code (#595)
### What problem does this PR solve?
### Type of change
- [x] Refactoring
1 year ago
Jin Hai
f1c98aad6b
Update version info (#564)
### What problem does this PR solve?
### Type of change
- [x] Documentation Update
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
1 year ago
KevinHuSh
369400c483
fix bug of table in docx (#510)
### What problem does this PR solve?
#509
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
1 year ago
chrysanthemum-boy
72384b191d
Add `.doc` file parser. (#497)
### What problem does this PR solve?
Add `.doc` file parser, using tika.
```
pip install tika
```
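```
from io import BytesIO

from tika import parser  # tika-python; by default it starts a local Tika server, which needs Java


def extract_text_from_doc_bytes(doc_bytes: bytes) -> str:
    """Extract plain text from the raw bytes of a legacy .doc file using Apache Tika."""
    file_like_object = BytesIO(doc_bytes)
    parsed = parser.from_buffer(file_like_object)
    # "content" is None when Tika finds no text, so fall back to an empty string.
    return parsed.get("content") or ""
```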
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: chrysanthemum-boy <fannc@qq.com>
1 year ago
KevinHuSh
0dfc8ddc0f
enlarge docker memory usage (#501)
### What problem does this PR solve?
### Type of change
- [x] Refactoring
1 year ago
KevinHuSh
a38e163035
remove doc from supported processing types (#488)
### What problem does this PR solve?
#474
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
1 year ago
KevinHuSh
ed6081845a
Support many encodings for text files (#458)
### What problem does this PR solve?
#384
### Type of change
- [x] Performance Improvement
1 year ago
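Commits #657 and #458 deal with decoding text files whose encoding is not known in advance. One simple strategy is to try a list of candidate codecs in order and return the first decode that succeeds. The candidate list and the `decode_text` helper below are assumptions for illustration. The project may use a different list or a detection library instead.

```python
from typing import Sequence

# Hypothetical candidate list; order matters because decoding stops at the first success.
CANDIDATE_ENCODINGS = ("utf-8", "gb18030", "big5", "shift_jis", "latin-1")


def decode_text(data: bytes, encodings: Sequence[str] = CANDIDATE_ENCODINGS) -> str:
    """Return the first strict decode that succeeds among the candidate encodings."""
    for enc in encodings:
        try:
            return data.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Wrong codec for these bytes, or an unknown codec name: try the next one.
            continue
    # Only reached if every candidate failed, e.g. when latin-1 is not in the list.
    return data.decode("utf-8", errors="ignore")
```

Order matters here. `latin-1` maps every byte to a character and never fails, so it has to come last. Any codec listed after it would never be tried.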
KevinHuSh
f6c7204002
refine log format (#312)
### What problem does this PR solve?
Issue link:#264
### Type of change
- [x] Documentation Update
- [x] Refactoring
1 year ago
KevinHuSh
fd7fcb5baf
apply PEP 8 formatting (#155)
1 year ago
KevinHuSh
f6aee7f230
add use layout or not option (#145)
* add use layout or not option
* trivial
1 year ago
KevinHuSh
602038ac49
fix task canceling bug (#98)
1 year ago
KevinHuSh
8a57f2afd5
change callback strategy, add timezone to docker (#96)
1 year ago
KevinHuSh
7bfaf0df29
fix position extraction bug (#93)
* fix position extraction bug
* remove delimiter for naive parser
1 year ago
KevinHuSh
685b4d8a95
fix table desc bugs, add positions to chunks (#91)
1 year ago
KevinHuSh
8a726fb04b
solve task execution issues (#90)
1 year ago
KevinHuSh
7fd1eca582
init README of deepdoc, add picture processor. (#71)
* init README of deepdoc, add picture processor.
* add resume parsing
1 year ago
KevinHuSh
cacd36c5e1
use onnx models, new deepdoc (#68)
1 year ago
KevinHuSh
a8294f2168
Refine resume parts and fix bugs in retrieval using SQL (#66)
1 year ago
KevinHuSh
407b2523b6
remove unused code, separate layout detection out as a new API. Add new RAG method 'table' (#55)
1 year ago
KevinHuSh
51482f3e2a
Refine some document APIs. (#53)
Add naive chunking method to RAG
1 year ago
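PR #53 adds a naive chunking method to RAG but does not describe how it works. The sketch below shows one common form of naive chunking under stated assumptions. It splits text after sentence-ending delimiters, then greedily packs sentences into chunks of at most `max_chars` characters. The helper name, the delimiter set, and the character-based budget are all hypothetical. A production chunker would more likely count tokens than characters.

```python
import re
from typing import List


def naive_chunk(text: str, max_chars: int = 200) -> List[str]:
    """Split after sentence delimiters, then greedily pack sentences into chunks (hypothetical)."""
    # Zero-width split after ., !, ?, ; or newline keeps each delimiter with its sentence.
    sentences = [s for s in re.split(r"(?<=[.!?;\n])", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += sentence
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

A single sentence longer than `max_chars` is kept whole rather than cut mid-sentence. This trades strict size limits for chunks that stay readable.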
KevinHuSh
e6acaf6738
Add Q&A and Book, fix task running bugs (#50)
1 year ago