fix: retry embedding with Qwen family models when limits temporarily reached.

The APIs of Qwen family models are rate-limited. When the limit is reached, the "output" attribute of "resp" is None, which in turn causes a TypeError when retrieving "embeddings". Since these limits are almost always temporary, I have added a simple retry mechanism to work around them. In addition, once retry_max is exhausted, the error is raised early instead of being hidden behind a "TypeError".

### What problem does this PR solve?

Qwen sometimes rejects calls because of rate limits, which stops the whole parsing procedure while a knowledge base is being created. In this situation, resp["output"] is None, and resp["output"]["embeddings"] raises a TypeError. Since the limits are temporary, I apply a simple retry mechanism to solve it.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>

tags/v0.20.0
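The retry behaviour described above can be sketched in isolation as follows. This is a minimal illustration, not the code from this PR: `call_with_retry` and its `call`, `delay`, and `sleep` parameters are hypothetical names, and the response is assumed to be a plain dict whose "output" key is None while the rate limit is in effect.

```python
import time
from typing import Callable


def call_with_retry(
    call: Callable[[], dict],
    retry_max: int = 5,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Invoke `call` until the response has a non-None "output".

    Hypothetical helper: `call` stands in for the embedding request, and
    the response is assumed to be a dict-like object with an "output" key.
    """
    resp = call()
    retries_left = retry_max
    # Retry only while the response is empty and retries remain.
    while resp.get("output") is None and retries_left > 0:
        sleep(delay)
        resp = call()
        retries_left -= 1
    # Fail early with a descriptive error instead of letting a later
    # resp["output"]["embeddings"] lookup raise an opaque TypeError.
    if resp.get("output") is None:
        raise ValueError("Retry_max reached, calling embedding model failed")
    return resp
```

Passing `sleep` as a parameter is only a convenience so that the wait can be skipped in tests. In this sketch the ValueError is raised directly rather than through a logging helper.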
@@ -200,13 +200,22 @@ class QWenEmbed(Base):
    def encode(self, texts: list):
        import dashscope
        import time
        batch_size = 4
        res = []
        token_count = 0
        texts = [truncate(t, 2048) for t in texts]
        for i in range(0, len(texts), batch_size):
            retry_max = 5
            resp = dashscope.TextEmbedding.call(model=self.model_name, input=texts[i : i + batch_size], api_key=self.key, text_type="document")
            while resp["output"] is None and retry_max > 0:
                time.sleep(10)
                resp = dashscope.TextEmbedding.call(model=self.model_name, input=texts[i : i + batch_size], api_key=self.key, text_type="document")
                retry_max -= 1
            if retry_max == 0 and resp["output"] is None:
                log_exception(ValueError("Retry_max reached, calling embedding model failed"))
                raise
            try:
                embds = [[] for _ in range(len(resp["output"]["embeddings"]))]
                for e in resp["output"]["embeddings"]: