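```python
import os
import re
import tiktoken


def singleton(cls, *args, **kw):
    # Cache one instance per class and per process ID, so a forked worker
    # process creates its own instance rather than reusing the copy it
    # inherited from its parent.
    instances = {}

    def _singleton():
        key = str(cls) + str(os.getpid())
        if key not in instances:
            instances[key] = cls(*args, **kw)
        return instances[key]

    return _singleton
```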
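The decorator above replaces the class with a zero-argument factory that caches one instance per class and process. Below is a short usage sketch. The decorator is restated so the sketch runs on its own, and `ModelHolder` and the `init_calls` counter are hypothetical names used only for illustration:

```python
import os


def singleton(cls, *args, **kw):
    # Restated from the listing above so this snippet runs standalone.
    instances = {}

    def _singleton():
        key = str(cls) + str(os.getpid())
        if key not in instances:
            instances[key] = cls(*args, **kw)
        return instances[key]

    return _singleton


init_calls = 0  # hypothetical counter: how many times __init__ actually ran


@singleton
class ModelHolder:  # hypothetical class, standing in for something expensive to build
    def __init__(self):
        global init_calls
        init_calls += 1


first = ModelHolder()
second = ModelHolder()
print(first is second)  # True
print(init_calls)       # 1: __init__ ran only once in this process
```

Because `ModelHolder` is now bound to the `_singleton` closure and not to the class, `isinstance(obj, ModelHolder)` raises `TypeError`. Calling `ModelHolder(...)` with arguments also fails, because `_singleton` accepts none. After a `fork`, the child inherits a copy of `instances`, but its `os.getpid()` differs, so the child's first call builds a fresh instance.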
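```python
def rmSpace(txt):
    # Remove spaces next to any character other than ASCII letters, digits,
    # "." and "," (for example around CJK text). Spaces between English words
    # are kept, with runs of them collapsed to one.
    txt = re.sub(r"([^a-z0-9.,]) +([^ ])", r"\1\2", txt, flags=re.IGNORECASE)
    return re.sub(r"([^ ]) +([^a-z0-9.,])", r"\1\2", txt, flags=re.IGNORECASE)
```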
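To see which spaces `rmSpace` drops, the sketch below restates the function and runs it on a few hypothetical mixed English/Chinese strings. The results in the comments follow from the two regular expressions: the first drops spaces after a non-word character, and the second drops spaces before one.

```python
import re


def rmSpace(txt):
    # Restated from the listing above so this snippet runs standalone.
    txt = re.sub(r"([^a-z0-9.,]) +([^ ])", r"\1\2", txt, flags=re.IGNORECASE)
    return re.sub(r"([^ ]) +([^a-z0-9.,])", r"\1\2", txt, flags=re.IGNORECASE)


print(rmSpace("hello world"))  # hello world  (space between two word characters kept)
print(rmSpace("a  b"))         # a b          (run of spaces collapsed to one)
print(rmSpace("3.5 turbo"))    # 3.5 turbo    ("." counts as a word character)
print(rmSpace("你好 世界"))     # 你好世界      (space between CJK characters removed)
print(rmSpace("RAG 检索"))      # RAG检索       (space before a CJK character removed)
print(rmSpace("价格 10 元"))    # 价格10元      (spaces on both sides of the number removed)
```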
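```python
def findMaxDt(fnm):
    # Return the lexicographically largest line (timestamps in
    # "YYYY-MM-DD HH:MM:SS" form), skipping "nan". Any error, such as a
    # missing file, ends the scan and returns the maximum found so far
    # (the epoch default if none).
    m = "1970-01-01 00:00:00"
    try:
        with open(fnm, "r") as f:
            for line in f:
                line = line.strip("\n")
                if line == "nan":
                    continue
                if line > m:
                    m = line
    except Exception:
        pass
    return m
```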
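`findMaxDt` compares the lines as plain strings instead of parsing them. The minimal check below shows why that works and where it breaks. It assumes the timestamps use the zero-padded `%Y-%m-%d %H:%M:%S` layout of the `"1970-01-01 00:00:00"` default, and the sample values are made up:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed layout, matching the "1970-01-01 00:00:00" default
stamps = ["2024-03-01 08:00:00", "2024-11-15 23:59:59", "2023-12-31 12:00:00"]  # sample data

# Fixed-width, zero-padded fields make string order match chronological order.
by_string = max(stamps)
by_time = max(stamps, key=lambda s: datetime.strptime(s, FMT))
print(by_string == by_time)  # True

# Without zero padding the shortcut breaks: September sorts after October.
print("2024-9-01 00:00:00" > "2024-10-01 00:00:00")  # True

# "nan" outranks every digit-led string, which is why findMaxDt skips it.
print("nan" > by_string)  # True
```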
```python
def findMaxTm(fnm):
    # Return the largest integer line in the file, skipping "nan". A missing
    # file or a non-integer line ends the scan early and returns the maximum
    # found so far (0 if none).
    m = 0
    try:
        with open(fnm, "r") as f:
            for line in f:
                line = line.strip("\n")
                if line == "nan":
                    continue
                value = int(line)
                if value > m:
                    m = value
    except Exception:
        pass
    return m
```
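The single `try` around the whole loop in `findMaxTm` means one malformed line stops the scan, rather than just that line being skipped. The sketch below restates the function and feeds it a hypothetical temporary file whose last valid value comes after a bad line:

```python
import os
import tempfile


def findMaxTm(fnm):
    # Restated from the listing above so this snippet runs standalone.
    m = 0
    try:
        with open(fnm, "r") as f:
            for line in f:
                line = line.strip("\n")
                if line == "nan":
                    continue
                value = int(line)
                if value > m:
                    m = value
    except Exception:
        pass
    return m


# Hypothetical sample file: Unix timestamps with a "nan" and a malformed line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1700000000\nnan\n1710000000\nbad\n1720000000\n")
    path = tmp.name

try:
    result = findMaxTm(path)
finally:
    os.remove(path)

# int("bad") raises ValueError, the outer except swallows it, and the scan
# stops, so the larger value 1720000000 on the last line is never read.
print(result)  # 1710000000
```

A trailing blank line would have the same effect, since `int("")` also raises `ValueError`.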
```python
encoder = tiktoken.encoding_for_model("gpt-3.5-turbo")


def num_tokens_from_string(string: str) -> int:
    """Returns the number of tokens in a text string."""
    num_tokens = len(encoder.encode(string))
    return num_tokens


def truncate(string: str, max_len: int) -> str:
    """Returns the text cut to its first max_len tokens if it is longer than that."""
    return encoder.decode(encoder.encode(string)[:max_len])
```