Feat: support position retaining for text files. (#6231)

### What problem does this PR solve?

#5832

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
tags/v0.18.0
Kevin Hu, 7 months ago
commit a087d13ccb
1 changed file with 3 additions and 1 deletion
rag/nlp/__init__.py

```diff
@@ -258,7 +258,7 @@ def tokenize(d, t, eng):
 def tokenize_chunks(chunks, doc, eng, pdf_parser=None):
     res = []
     # wrap up as es documents
-    for ck in chunks:
+    for ii, ck in enumerate(chunks):
         if len(ck.strip()) == 0:
             continue
         logging.debug("-- {}".format(ck))
@@ -270,6 +270,8 @@ def tokenize_chunks(chunks, doc, eng, pdf_parser=None):
                 ck = pdf_parser.remove_tag(ck)
             except NotImplementedError:
                 pass
+        else:
+            add_positions(d, [[ii]*5])
         tokenize(d, ck, eng)
         res.append(d)
     return res
```
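The sketch below is a minimal illustration of what the new `else:` branch does when there is no `pdf_parser` (e.g. plain text files). It assumes that `add_positions` takes a list of 5-element `(page, left, right, top, bottom)` entries, like the positions returned by `pdf_parser.crop`. Under that assumption, `[[ii]*5]` is `[[ii, ii, ii, ii, ii]]`, so the chunk's index stands in for every coordinate. `add_positions_sketch`, `tokenize_text_chunks`, and the `content` field are hypothetical stand-ins, and the real `add_positions` may store different fields.

```python
import copy


def add_positions_sketch(d, poss):
    # Hypothetical stand-in for add_positions: assumes each entry is a
    # 5-element (page, left, right, top, bottom) list, as PDF crops produce.
    d["position_int"] = [tuple(int(v) for v in p) for p in poss]
    d["page_num_int"] = [int(p[0]) for p in poss]
    d["top_int"] = [int(p[3]) for p in poss]


def tokenize_text_chunks(chunks, doc):
    res = []
    for ii, ck in enumerate(chunks):
        if len(ck.strip()) == 0:
            continue  # skipped chunks still consume an index from enumerate
        d = copy.deepcopy(doc)
        # No PDF parser: the chunk's ordinal fills every coordinate slot.
        add_positions_sketch(d, [[ii] * 5])
        d["content"] = ck  # hypothetical field; the real code calls tokenize()
        res.append(d)
    return res


docs = tokenize_text_chunks(["alpha", "   ", "gamma"], {"doc": "notes.txt"})
print([d["position_int"] for d in docs])  # [[(0, 0, 0, 0, 0)], [(2, 2, 2, 2, 2)]]
```

Because `enumerate` runs before empty chunks are skipped, the recorded positions are indices into the original `chunks` list. As a result, the emitted documents can have gaps (0, then 2 above) but still preserve the original chunk order.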
