瀏覽代碼

Fix: order chunks from docx by positions. (#7979)

### What problem does this PR solve?

#7934

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
tags/v0.19.1
Kevin Hu 5 月之前
父節點
當前提交
93f5df716f
No account linked to committer's email address
共有 1 個文件被更改,包括 2 次插入1 次删除
  1. 2
    1
      rag/nlp/__init__.py

+ 2
- 1
rag/nlp/__init__.py 查看文件

@@ -279,12 +279,13 @@ def tokenize_chunks(chunks, doc, eng, pdf_parser=None):
def tokenize_chunks_with_images(chunks, doc, eng, images):
res = []
# wrap up as es documents
for ck, image in zip(chunks, images):
for ii, (ck, image) in enumerate(zip(chunks, images)):
if len(ck.strip()) == 0:
continue
logging.debug("-- {}".format(ck))
d = copy.deepcopy(doc)
d["image"] = image
add_positions(d, [[ii]*5])
tokenize(d, ck, eng)
res.append(d)
return res

Loading…
取消
儲存