
Fix: Improve First Chunk Size (#7806)

### What problem does this PR solve?

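Resolves https://github.com/infiniflow/ragflow/issues/7790.

In `naive_merge`, `naive_merge_with_images`, and `naive_merge_docx` (`rag/nlp/__init__.py`), the condition that starts a new chunk now also fires when the current last chunk is still empty (`cks[-1] == ""`). Previously it fired only when the last chunk's token count already exceeded `chunk_token_num`.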

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
tags/v0.19.x
Stephen Hu, 5 months ago
Commit: db4371c745
1 changed file with 3 additions and 3 deletions
rag/nlp/__init__.py

```diff
@@ -524,7 +524,7 @@ def naive_merge(sections, chunk_token_num=128, delimiter="\n。;!?"):
         if tnum < 8:
             pos = ""
         # Ensure that the length of the merged chunk does not exceed chunk_token_num
-        if tk_nums[-1] > chunk_token_num:
+        if cks[-1] == "" or tk_nums[-1] > chunk_token_num:
 
             if t.find(pos) < 0:
                 t += pos
@@ -560,7 +560,7 @@ def naive_merge_with_images(texts, images, chunk_token_num=128, delimiter="\n。
         if tnum < 8:
             pos = ""
         # Ensure that the length of the merged chunk does not exceed chunk_token_num
-        if tk_nums[-1] > chunk_token_num:
+        if cks[-1] == "" or tk_nums[-1] > chunk_token_num:
             if t.find(pos) < 0:
                 t += pos
             cks.append(t)
@@ -627,7 +627,7 @@ def naive_merge_docx(sections, chunk_token_num=128, delimiter="\n。;!?"):
         tnum = num_tokens_from_string(t)
         if tnum < 8:
             pos = ""
-        if tk_nums[-1] > chunk_token_num:
+        if cks[-1] == "" or tk_nums[-1] > chunk_token_num:
             if t.find(pos) < 0:
                 t += pos
             cks.append(t)
```
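The rule shared by all three hunks can be tried in isolation. The snippet below is a simplified, hypothetical re-implementation, not RAGFlow's code. `greedy_merge` and `count_tokens` are illustrative names: `count_tokens` stands in for `num_tokens_from_string` by counting whitespace-separated words, and the `pos` bookkeeping and image handling are omitted. Each section is appended to the last chunk until that chunk's token count exceeds `chunk_token_num`. A new chunk is opened when the last chunk is empty or already over budget, which is the post-fix condition.

```python
def count_tokens(text: str) -> int:
    # Hypothetical stand-in for num_tokens_from_string: counts
    # whitespace-separated words rather than tokenizer tokens.
    return len(text.split())


def greedy_merge(sections, chunk_token_num=128):
    """Greedily pack text sections into chunks (simplified sketch).

    Mirrors the post-fix condition from the diff:
        cks[-1] == "" or tk_nums[-1] > chunk_token_num
    """
    cks = [""]      # start with an empty placeholder chunk, as in the diff context
    tk_nums = [0]   # token count of each chunk in cks
    for t in sections:
        tnum = count_tokens(t)
        if cks[-1] == "" or tk_nums[-1] > chunk_token_num:
            # Last chunk is empty or over budget: start a new chunk.
            cks.append(t)
            tk_nums.append(tnum)
        else:
            # Otherwise keep growing the current chunk.
            cks[-1] += t
            tk_nums[-1] += tnum
    # Drop empty placeholder chunks before returning.
    return [c for c in cks if c.strip()]
```

Because the budget is checked before a section is added, a chunk can end up larger than `chunk_token_num`. For example, `greedy_merge(["a b c ", "d e ", "f g h i ", "j "], 4)` returns `["a b c d e ", "f g h i j "]`, which is two 5-word chunks.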
