瀏覽代碼

FIX: If chunk["content_with_weight"] contains one or more unpaired surrogate characters (such as incomplete emoji or other special characters), then calling .encode("utf-8") directly will raise a UnicodeEncodeError. (#9246)

FIX: If chunk["content_with_weight"] contains one or more unpaired
surrogate characters (such as incomplete emoji or other special
characters), then calling .encode("utf-8") directly will raise a
UnicodeEncodeError.

### What problem does this PR solve?
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
tags/v0.20.1
gooodboyAo 2 月之前
父節點
當前提交
a7eba61067
No account linked to committer's email address
共有 1 個文件被更改,包括 1 次插入1 次删除
  1. 1
    1
      rag/svr/task_executor.py

+ 1
- 1
rag/svr/task_executor.py 查看文件

@@ -284,7 +284,7 @@ async def build_chunks(task, progress_callback):
try:
d = copy.deepcopy(document)
d.update(chunk)
d["id"] = xxhash.xxh64((chunk["content_with_weight"] + str(d["doc_id"])).encode("utf-8")).hexdigest()
d["id"] = xxhash.xxh64((chunk["content_with_weight"] + str(d["doc_id"])).encode("utf-8", "surrogatepass")).hexdigest()
d["create_time"] = str(datetime.now()).replace("T", " ")[:19]
d["create_timestamp_flt"] = datetime.now().timestamp()
if not d.get("image"):

Loading…
取消
儲存