fix chunk method "Table" losing content when the Excel file has multiple sheets (#4123)

### What problem does this PR solve?
Discussed in https://github.com/infiniflow/ragflow/pull/4102.
- In `excel_parser.py`, `total` is meant to be the total number of rows across all sheets of the Excel file, but it was returned during the first iteration of the sheet loop, which led to a wrong `to_page`.
- In `table.py`, when the Excel file has multiple sheets, the file is divided into multiple parts of 3000 rows each. For a given part, `data` may be empty for some sheets, because their rows were already recorded in a previous iteration.
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
tags/v0.15.1
ly0303521 committed 10 months ago
Commit: 101b8ff813
2 changed files with 3 additions and 1 deletion
1. `deepdoc/parser/excel_parser.py` (+1, -1)
2. `rag/app/table.py` (+2, -0)

deepdoc/parser/excel_parser.py (+1, -1)

```diff
@@ -90,7 +90,7 @@ class RAGFlowExcelParser:
             for sheetname in wb.sheetnames:
                 ws = wb[sheetname]
                 total += len(list(ws.rows))
-                return total
+            return total
 
         if fnm.split(".")[-1].lower() in ["csv", "txt"]:
             encoding = find_codec(binary)
```
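The effect of the misplaced `return` can be reproduced with a small sketch. It assumes a plain dict mapping sheet name to a list of rows as a stand-in for the openpyxl workbook, and a hypothetical `split_ranges` helper that cuts the row count into `[from_page, to_page)` windows of `size` rows (3000 per the PR description; a smaller size is used below so the effect is visible). Neither is RAGFlow's actual code.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for an openpyxl workbook: sheet name -> list of rows.
Workbook = Dict[str, List[list]]


def row_number_before_fix(wb: Workbook) -> int:
    """Pre-fix control flow: `return` sits inside the sheet loop."""
    total = 0
    for sheetname in wb:
        total += len(wb[sheetname])
        return total  # leaves after the first sheet
    return total


def row_number_after_fix(wb: Workbook) -> int:
    """Post-fix control flow: sum every sheet, then return once."""
    total = 0
    for sheetname in wb:
        total += len(wb[sheetname])
    return total


def split_ranges(total: int, size: int = 3000) -> List[Tuple[int, int]]:
    """Hypothetical helper: (from_page, to_page) windows of at most `size` rows."""
    return [(i, min(i + size, total)) for i in range(0, total, size)]


wb = {
    "Sheet1": [["a"] for _ in range(3)],
    "Sheet2": [["b"] for _ in range(5)],
}
print(row_number_before_fix(wb), split_ranges(row_number_before_fix(wb), size=4))
# 3 [(0, 3)]  -> Sheet2's rows fall outside every window
print(row_number_after_fix(wb), split_ranges(row_number_after_fix(wb), size=4))
# 8 [(0, 4), (4, 8)]
```

Because the last window's `to_page` is derived from the row count, an undercount silently drops every row past the first sheet rather than raising an error.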

rag/app/table.py (+2, -0)

```diff
@@ -66,6 +66,8 @@ class Excel(ExcelParser):
                     continue
                 data.append(row)
                 done += 1
+            if np.array(data).size == 0:
+                continue
             res.append(pd.DataFrame(np.array(data), columns=headers))
 
         callback(0.3, ("Extract records: {}~{}".format(from_page + 1, min(to_page, from_page + rn)) + (
```
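Why the new guard is needed can be shown with a simplified version of the per-sheet loop. The sketch assumes the same dict stand-in for the workbook (the first row of each sheet holds the headers) and a row counter `rn` shared across sheets, so a window that starts in a later sheet leaves `data` empty for the earlier ones. The function name `extract_tables` and its `skip_empty` flag are hypothetical and only exist to compare behavior with and without the guard.

```python
from typing import Dict, List

import numpy as np
import pandas as pd


def extract_tables(
    sheets: Dict[str, List[list]],
    from_page: int,
    to_page: int,
    skip_empty: bool = True,
) -> List[pd.DataFrame]:
    """Simplified sketch of the per-sheet loop, not RAGFlow's actual code.

    `sheets` maps a sheet name to its rows; the first row holds the headers.
    `rn` counts data rows across all sheets, so a single
    [from_page, to_page) window can start or end in any sheet.
    """
    res: List[pd.DataFrame] = []
    rn = 0
    for rows in sheets.values():
        if not rows:
            continue
        headers = rows[0]
        data = []
        for row in rows[1:]:
            rn += 1
            if rn - 1 < from_page:
                continue  # row was handled by an earlier window
            if rn - 1 >= to_page:
                break  # row belongs to a later window
            data.append(row)
        # Guard added by the fix: no row of this sheet is inside the window.
        if skip_empty and np.array(data).size == 0:
            continue
        res.append(pd.DataFrame(np.array(data), columns=headers))
    return res


sheets = {
    "Sheet1": [["name", "qty"], ["apple", 1], ["pear", 2]],
    "Sheet2": [["name", "qty"], ["plum", 3], ["fig", 4]],
}
frames = extract_tables(sheets, from_page=2, to_page=4)
print(len(frames))  # 1: Sheet1 is skipped, Sheet2 yields one DataFrame

try:
    extract_tables(sheets, from_page=2, to_page=4, skip_empty=False)
except ValueError as err:
    print("without the guard:", err)
```

Without the guard, pandas receives an empty 1-D array together with two column names and raises a shape-mismatch `ValueError`; with the guard, a sheet that contributes no rows to the current window is simply skipped.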
