ragflow

Commit Graph

Author	SHA1	Message	Commit Date
Jay Xu	79e2edc835	Fix "File contains no valid workbook part" (#9360) ### What problem does this PR solve? fix "File contains no valid workbook part" stacktrace: ``` Traceback (most recent call last): File "/ragflow/deepdoc/parser/excel_parser.py", line 54, in _load_excel_to_workbook return RAGFlowExcelParser._dataframe_to_workbook(df) File "/ragflow/deepdoc/parser/excel_parser.py", line 69, in _dataframe_to_workbook ws.cell(row=row_num, column=col_num, value=value) File "/ragflow/.venv/lib/python3.10/site-packages/openpyxl/worksheet/worksheet.py", line 246, in cell cell.value = value File "/ragflow/.venv/lib/python3.10/site-packages/openpyxl/cell/cell.py", line 218, in value self._bind_value(value) File "/ragflow/.venv/lib/python3.10/site-packages/openpyxl/cell/cell.py", line 197, in _bind_value value = self.check_string(value) File "/ragflow/.venv/lib/python3.10/site-packages/openpyxl/cell/cell.py", line 165, in check_string raise IllegalCharacterError(f"{value} cannot be used in worksheets.") ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2 months ago
Jay Xu	569ab011c4	Add fallback to use 'calamine' parse engine in excel_parser.py (#9374) ### What problem does this PR solve? add fallback to `calamine` engine when parse error raised using the default `openpyxl` / `xlrd` engine. e.g. the following error can be fixed: ``` Traceback (most recent call last): File "/ragflow/deepdoc/parser/excel_parser.py", line 53, in _load_excel_to_workbook df = pd.read_excel(file_like_object) File "/ragflow/.venv/lib/python3.10/site-packages/pandas/io/excel/_base.py", line 495, in read_excel io = ExcelFile( File "/ragflow/.venv/lib/python3.10/site-packages/pandas/io/excel/_base.py", line 1567, in __init__ self._reader = self._engines[engine]( File "/ragflow/.venv/lib/python3.10/site-packages/pandas/io/excel/_xlrd.py", line 46, in __init__ super().__init__( File "/ragflow/.venv/lib/python3.10/site-packages/pandas/io/excel/_base.py", line 573, in __init__ self.book = self.load_workbook(self.handles.handle, engine_kwargs) File "/ragflow/.venv/lib/python3.10/site-packages/pandas/io/excel/_xlrd.py", line 63, in load_workbook return open_workbook(file_contents=data, **engine_kwargs) File "/ragflow/.venv/lib/python3.10/site-packages/xlrd/__init__.py", line 172, in open_workbook bk = open_workbook_xls( File "/ragflow/.venv/lib/python3.10/site-packages/xlrd/book.py", line 68, in open_workbook_xls bk.biff2_8_load( File "/ragflow/.venv/lib/python3.10/site-packages/xlrd/book.py", line 641, in biff2_8_load cd.locate_named_stream(UNICODE_LITERAL(qname)) File "/ragflow/.venv/lib/python3.10/site-packages/xlrd/compdoc.py", line 398, in locate_named_stream result = self._locate_stream( File "/ragflow/.venv/lib/python3.10/site-packages/xlrd/compdoc.py", line 429, in _locate_stream raise CompDocError("%s corruption: seen[%d] == %d" % (qname, s, self.seen[s])) xlrd.compdoc.CompDocError: Workbook corruption: seen[2] == 4 ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2 months ago
Jin Hai	03daf4618c	Refactor parser code (#9042) ### What problem does this PR solve? Refactor code ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	3 months ago
donblack01	0b48a2e0d1	Fix: When Excel is a formula, the parsed result is a formula, but cannot be correctly parsed as a value type (#6613) ### What problem does this PR solve? Fix: When Excel is a formula, the parsed result is a formula, but cannot be correctly parsed as a value type ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: tangyu <1@1.com>	7 months ago
Yongteng Lei	7cd37c37cd	Feat: add CSV file parsing support (#5989) ### What problem does this PR solve? Add CSV file parsing support #4552, #5849, #5870 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	7 months ago
hy89	b0c21b00d9	Refactor: Optimize error handling and support parsing of XLS(EXCEL97—2003) files. (#5633) Optimize error handling and support parsing of XLS(EXCEL97—2003) files.	8 months ago
SkyfireWXY	8fcca1b958	fix: big xls file error (#4859) ### What problem does this PR solve? if *.xls file is too large, .eg >50M, I get error. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	8 months ago
Jin Hai	3894de895b	Update comments (#4569) ### What problem does this PR solve? Add license statement. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	9 months ago
ly0303521	101b8ff813	fix chunk method "Table" losing content when the Excel file has multi… (#4123) …ple sheets ### What problem does this PR solve? discussed in https://github.com/infiniflow/ragflow/pull/4102 - In excel_parser.py, `total` means the total number of rows in Excel, but it return in the first iterate, that lead to the wrong `to_page` - In table.py, it when Excel file has multiple sheets, it will be divided into multiple parts, every part size is 3000, `data` may be empty, because it has recorded in the last iterate. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	10 months ago
Zhichang Yu	0d68a6cd1b	Fix errors detected by Ruff (#3918) ### What problem does this PR solve? Fix errors detected by Ruff ### Type of change - [x] Refactoring	11 months ago
Jin Hai	cdea1d0a85	Update readme and add license (#1018) ### What problem does this PR solve? - Update readme - Add license ### Type of change - [x] Documentation Update --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	1 year ago
KevinHuSh	a12fcf9156	fix minio helth bug (#850) ### What problem does this PR solve? #643 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	1 year ago
GYH	c27c02ea67	Split Excel file into different chunks (#847) ### What problem does this PR solve? Split Excel into different chunk ### Type of change - [x] New Feature (non-breaking change which adds functionality)	1 year ago
KevinHuSh	7013d7f620	refine text decode (#657) ### What problem does this PR solve? #651 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	1 year ago
KevinHuSh	9d60a84958	refactor code (#583) ### What problem does this PR solve? ### Type of change - [x] Refactoring	1 year ago
KevinHuSh	ed6081845a	Fit a lot of encodings for text file. (#458) ### What problem does this PR solve? #384 ### Type of change - [x] Performance Improvement	1 year ago
KevinHuSh	36f2d7b797	To avoid assertion while no rows in excel (#197) ### What problem does this PR solve? _Briefly describe what this PR aims to solve. Include background context that will help reviewers understand the purpose of the PR._ Issue link:#[[Link the issue here](https://github.com/infiniflow/ragflow/issues/196)] ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Breaking Change (fix or feature that could cause existing functionality not to work as expected) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Test cases - [ ] Python SDK impacted, Need to update PyPI - [ ] Other (please describe):	1 year ago
KevinHuSh	fd7fcb5baf	apply pep8 formalize (#155)	1 year ago
KevinHuSh	6999598101	refine for English corpus (#135)	1 year ago
KevinHuSh	675a9f8d9a	add dockerfile for cuda envirement. Refine table search strategy, (#123)	1 year ago
KevinHuSh	f1f09df901	add local llm implementation (#119)	1 year ago
KevinHuSh	cacd36c5e1	use onnx models, new deepdoc (#68)	1 year ago
KevinHuSh	30791976d5	build python version rag-flow (#21) * clean rust version project * clean rust version project * build python version rag-flow	1 year ago

22 commits (00919fd59990a20f9fc7625e5f5ac5984e341a6e)
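
Commit 569ab011c4 describes falling back to the `calamine` engine when the default `openpyxl` / `xlrd` engine raises a parse error. The sketch below illustrates that general try-in-order pattern only; it is not the code from `excel_parser.py`. The names `load_with_fallback` and `pandas_loaders` are hypothetical, and the pandas wiring assumes pandas 2.2 or later with the `python-calamine` package installed.

```python
import io
from typing import Any, Callable, List, Sequence, Tuple

# A loader takes the raw file bytes and returns some parsed object.
Loader = Callable[[bytes], Any]


def load_with_fallback(
    data: bytes, loaders: Sequence[Tuple[str, Loader]]
) -> Tuple[str, Any]:
    """Try each (name, loader) pair in order and return the first success.

    Hypothetical helper: returns the name of the engine that worked
    together with its result, or raises ValueError if every engine fails.
    """
    errors: List[str] = []
    for name, loader in loaders:
        try:
            return name, loader(data)
        except Exception as exc:
            # Broad on purpose: each engine raises its own error types
            # (for example xlrd's CompDocError in the traceback above).
            errors.append(f"{name}: {exc!r}")
    raise ValueError("all engines failed: " + "; ".join(errors))


def pandas_loaders() -> List[Tuple[str, Loader]]:
    """Assumed wiring for pandas; not taken from the repository."""
    import pandas as pd  # lazy import keeps load_with_fallback stdlib-only

    return [
        # Default engine selection (openpyxl / xlrd, chosen by pandas).
        ("default", lambda raw: pd.read_excel(io.BytesIO(raw))),
        # Fallback engine named in the commit message.
        ("calamine", lambda raw: pd.read_excel(io.BytesIO(raw), engine="calamine")),
    ]
```

With pandas available, a call such as `load_with_fallback(raw_bytes, pandas_loaders())` would return the name of the engine that succeeded alongside the resulting DataFrame. Collecting every engine's error message before raising keeps the original parse failure visible when the fallback also fails.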