Parcourir la source

FIX: If chunk["content_with_weight"] contains one or more unpaired surrogate characters (such as incomplete emoji or other special characters), then calling .encode("utf-8") directly will raise a UnicodeEncodeError. (#9246)

FIX: If chunk["content_with_weight"] contains one or more unpaired
surrogate characters (such as incomplete emoji or other special
characters), then calling .encode("utf-8") directly will raise a
UnicodeEncodeError.

### What problem does this PR solve?
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
tags/v0.20.1
gooodboyAo il y a 2 mois
Parent
révision
a7eba61067
Aucun compte lié à l'adresse e-mail de l'auteur
1 fichiers modifiés avec 1 ajouts et 1 suppressions
  1. 1
    1
      rag/svr/task_executor.py

+ 1
- 1
rag/svr/task_executor.py Voir le fichier

@@ -284,7 +284,7 @@ async def build_chunks(task, progress_callback):
try:
d = copy.deepcopy(document)
d.update(chunk)
d["id"] = xxhash.xxh64((chunk["content_with_weight"] + str(d["doc_id"])).encode("utf-8")).hexdigest()
d["id"] = xxhash.xxh64((chunk["content_with_weight"] + str(d["doc_id"])).encode("utf-8", "surrogatepass")).hexdigest()
d["create_time"] = str(datetime.now()).replace("T", " ")[:19]
d["create_timestamp_flt"] = datetime.now().timestamp()
if not d.get("image"):

Chargement…
Annuler
Enregistrer