
Fix: Solve the OOM issue when parsing large PDF files with the QA chunking method (#8464)

### What problem does this PR solve?

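Using the QA chunking method on a large PDF (e.g., 300+ pages) can
cause an out-of-memory (OOM) error in the ragflow-worker module.

Before this change, `chunk()` in `rag/app/qa.py` had no `from_page` /
`to_page` parameters. Any page range passed by the caller was absorbed
by `**kwargs`, and the PDF parser was always called with the hard-coded
range `from_page=0, to_page=10000`, which in practice means the whole
document. This PR adds `from_page` (default `0`) and `to_page` (default
`100000`) to the signature of `chunk()` and forwards them to the PDF
parser.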
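The mechanism can be reproduced without RAGFlow. In the minimal sketch below, `fake_pdf_parser`, `chunk_before` and `chunk_after` are hypothetical stand-ins, not the project's code: they only mimic the signatures from the diff to show that, before the fix, a caller's page range ended up in `**kwargs` and never reached the parser.

```python
from typing import Callable, List, Optional, Tuple, Union

# Hypothetical stand-in for the PDF parser: instead of parsing, it records
# which page range it was asked for.
calls: List[Tuple[int, int]] = []


def fake_pdf_parser(source: Union[str, bytes], from_page: int = 0,
                    to_page: int = 100000,
                    callback: Optional[Callable] = None) -> List[str]:
    calls.append((from_page, to_page))
    return []


def chunk_before(filename: str, binary: Optional[bytes] = None,
                 lang: str = "Chinese", callback: Optional[Callable] = None,
                 **kwargs) -> List[str]:
    # from_page/to_page supplied by the caller land in **kwargs and are
    # ignored; the parser is always asked for pages 0..10000.
    return fake_pdf_parser(filename if not binary else binary,
                           from_page=0, to_page=10000, callback=callback)


def chunk_after(filename: str, binary: Optional[bytes] = None,
                from_page: int = 0, to_page: int = 100000,
                lang: str = "Chinese", callback: Optional[Callable] = None,
                **kwargs) -> List[str]:
    # The caller's page window is forwarded to the parser unchanged.
    return fake_pdf_parser(filename if not binary else binary,
                           from_page=from_page, to_page=to_page,
                           callback=callback)


chunk_before("big.pdf", from_page=24, to_page=36)
chunk_after("big.pdf", from_page=24, to_page=36)
print(calls)  # [(0, 10000), (24, 36)]
```

The same request for pages 24 to 36 makes the old signature parse from page 0 up to page 10000, while the new one parses only the requested window.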


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
tags/v0.20.0
liuzhenghua, 4 months ago
commit 5256980ffb
1 changed file with 3 additions and 3 deletions
rag/app/qa.py

```diff
     return (len(match.group(0)), s.lstrip('#').lstrip()) if match else (0, s)
 
 
-def chunk(filename, binary=None, lang="Chinese", callback=None, **kwargs):
+def chunk(filename, binary=None, from_page=0, to_page=100000, lang="Chinese", callback=None, **kwargs):
     """
     Excel and csv(txt) format files are supported.
     If the file is in excel format, there should be 2 column question and answer without header.
@@ ... @@
         callback(0.1, "Start to parse.")
         pdf_parser = Pdf()
         qai_list, tbls = pdf_parser(filename if not binary else binary,
-                                    from_page=0, to_page=10000, callback=callback)
+                                    from_page=from_page, to_page=to_page, callback=callback)
         for q, a, image, poss in qai_list:
             res.append(beAdocPdf(deepcopy(doc), q, a, eng, image, poss))
         return res
@@ ... @@
     def dummy(prog=None, msg=""):
         pass
-    chunk(sys.argv[1], from_page=0, to_page=10, callback=dummy)
+    chunk(sys.argv[1], from_page=0, to_page=10, callback=dummy)
```
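Once `chunk()` honours `from_page` and `to_page`, a caller can limit the work done per call by feeding a large PDF through in fixed-size page windows. The sketch below is an assumption about how such a caller might look, not RAGFlow's task executor. `page_windows`, `chunk_in_windows` and `fake_chunk` are hypothetical helpers, and `to_page` is treated as exclusive, which matches the `from_page=0, to_page=10` call above but is still an assumption here.

```python
from typing import Callable, Iterator, List, Tuple


def page_windows(total_pages: int, pages_per_window: int) -> Iterator[Tuple[int, int]]:
    """Yield (from_page, to_page) pairs covering [0, total_pages); to_page is exclusive."""
    if pages_per_window <= 0:
        raise ValueError("pages_per_window must be positive")
    for start in range(0, total_pages, pages_per_window):
        yield start, min(start + pages_per_window, total_pages)


def chunk_in_windows(chunk_fn: Callable[..., list], filename: str,
                     total_pages: int, pages_per_window: int,
                     **kwargs) -> list:
    """Call chunk_fn once per page window and concatenate the results.

    Assuming the parser only loads the pages it is asked for, each call
    holds at most one window's pages in memory rather than the whole file.
    """
    results: list = []
    for from_page, to_page in page_windows(total_pages, pages_per_window):
        results.extend(chunk_fn(filename, from_page=from_page,
                                to_page=to_page, **kwargs))
    return results


# Hypothetical chunk function with the new signature; it just reports its window.
def fake_chunk(filename: str, from_page: int = 0, to_page: int = 100000,
               **kwargs) -> List[str]:
    return [f"{filename}:{from_page}-{to_page}"]


print(chunk_in_windows(fake_chunk, "big.pdf", total_pages=30, pages_per_window=12))
# ['big.pdf:0-12', 'big.pdf:12-24', 'big.pdf:24-30']
```

The window size trades memory for the number of calls. Smaller windows lower peak memory per call, but each call pays the parser's setup cost again.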
