  1. # DRAFT Python API Reference
  2. **THE API REFERENCES BELOW ARE STILL UNDER DEVELOPMENT.**
:::tip API GROUPING
Knowledge base APIs
:::
  6. ## Create knowledge base
```python
RAGFlow.create_dataset(
    name: str,
    avatar: str = "",
    description: str = "",
    language: str = "English",
    permission: str = "me",
    document_count: int = 0,
    chunk_count: int = 0,
    parse_method: str = "naive",
    parser_config: DataSet.ParserConfig = None
) -> DataSet
```
  20. Creates a knowledge base (dataset).
  21. ### Parameters
#### name: `str`, *Required*
The unique name of the dataset to create. It must adhere to the following requirements:
- Permitted characters include:
  - English letters (a-z, A-Z)
  - Digits (0-9)
  - "_" (underscore)
- Must begin with an English letter or underscore.
- Maximum 65,535 characters.
- Case-insensitive.
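The naming rules above can be checked locally before calling `create_dataset()`. The following is a minimal sketch, not part of the SDK: `is_valid_dataset_name` and `names_collide` are hypothetical helpers, and the server remains the authority on what it accepts.

```python
import re

# Letters, digits, and underscore only; the first character must be a
# letter or an underscore (hypothetical helper, not part of the SDK).
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
MAX_NAME_LENGTH = 65535


def is_valid_dataset_name(name: str) -> bool:
    """Return True if `name` satisfies the documented naming rules."""
    return len(name) <= MAX_NAME_LENGTH and _NAME_PATTERN.fullmatch(name) is not None


def names_collide(a: str, b: str) -> bool:
    """Names are case-insensitive, so compare them case-folded."""
    return a.casefold() == b.casefold()
```

Because names are case-insensitive, `"KB_1"` and `"kb_1"` should be treated as the same dataset name.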
#### avatar: `str`
Base64 encoding of the avatar. Defaults to `""`.
#### tenant_id: `str`
The ID of the tenant associated with the dataset, used to identify different users. Defaults to `None`.
- When creating a dataset, `tenant_id` must not be provided.
- When updating a dataset, `tenant_id` cannot be changed.
#### description: `str`
The description of the created dataset. Defaults to `""`.
#### language: `str`
The language setting of the created dataset. Defaults to `"English"`.
#### permission: `str`
Specifies who can operate on the dataset. Defaults to `"me"`.
#### document_count: `int`
The number of documents associated with the dataset. Defaults to `0`.
#### chunk_count: `int`
The number of data chunks generated or processed by the created dataset. Defaults to `0`.
#### parse_method: `str`
The method used by the dataset to parse and process data. Defaults to `"naive"`.
#### parser_config: `DataSet.ParserConfig`
The parser configuration of the dataset. A `ParserConfig` object contains the following attributes:
- `chunk_token_count`: Defaults to `128`.
- `layout_recognize`: Defaults to `True`.
- `delimiter`: Defaults to `'\n!?。;!?'`.
- `task_page_size`: Defaults to `12`.
  56. ### Returns
- Success: A `DataSet` object.
  58. - Failure: `Exception`
  59. ### Examples
  60. ```python
  61. from ragflow import RAGFlow
  62. rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
  63. ds = rag_object.create_dataset(name="kb_1")
  64. ```
  65. ---
  66. ## Delete knowledge bases
  67. ```python
  68. RAGFlow.delete_datasets(ids: list[str] = None)
  69. ```
Deletes knowledge bases by ID.
### Parameters
#### ids: `list[str]`
The IDs of the knowledge bases to delete.
  74. ### Returns
  75. - Success: No value is returned.
  76. - Failure: `Exception`
  77. ### Examples
```python
rag_object.delete_datasets(ids=["id_1", "id_2"])
```
  81. ---
  82. ## List knowledge bases
```python
RAGFlow.list_datasets(
    page: int = 1,
    page_size: int = 1024,
    orderby: str = "create_time",
    desc: bool = True,
    id: str = None,
    name: str = None
) -> list[DataSet]
```
  93. Retrieves a list of knowledge bases.
  94. ### Parameters
  95. #### page: `int`
  96. The current page number to retrieve from the paginated results. Defaults to `1`.
  97. #### page_size: `int`
  98. The number of records on each page. Defaults to `1024`.
#### orderby: `str`
The field by which the records are sorted. Defaults to `"create_time"`.
#### desc: `bool`
Whether to sort in descending order. Defaults to `True`.
#### id: `str`
The ID of the dataset to retrieve. Defaults to `None`.
#### name: `str`
The name of the dataset to retrieve. Defaults to `None`.
  107. ### Returns
  108. - Success: A list of `DataSet` objects representing the retrieved knowledge bases.
  109. - Failure: `Exception`.
  110. ### Examples
  111. #### List all knowledge bases
```python
for ds in rag_object.list_datasets():
    print(ds)
```
  116. #### Retrieve a knowledge base by ID
  117. ```python
  118. dataset = rag_object.list_datasets(id = "id_1")
  119. print(dataset[0])
  120. ```
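When a result set may exceed one page, the `page` and `page_size` parameters can be walked in a loop. The sketch below is a hypothetical helper, not part of the SDK. It assumes pages are numbered from `1` (matching the documented default) and that a page shorter than `page_size` marks the end of the results.

```python
from typing import Any, Callable, Iterator


def iter_all_pages(fetch_page: Callable[..., list], page_size: int = 100) -> Iterator[Any]:
    """Yield every record by requesting pages 1, 2, ... until a short page arrives.

    `fetch_page` is any callable that accepts `page` and `page_size`
    keyword arguments and returns a list (an assumption of this sketch).
    """
    page = 1
    while True:
        records = fetch_page(page=page, page_size=page_size)
        yield from records
        if len(records) < page_size:
            break  # a short (or empty) page means there is nothing left
        page += 1
```

Under those assumptions, `for ds in iter_all_pages(rag_object.list_datasets, page_size=100): print(ds)` would visit every knowledge base without loading them all in one request.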
  121. ---
  122. ## Update knowledge base
  123. ```python
  124. DataSet.update(update_message: dict)
  125. ```
  126. Updates the current knowledge base.
  127. ### Parameters
#### update_message: `dict[str, str|int]`, *Required*
- `"name"`: `str` The name of the knowledge base to update.
- `"tenant_id"`: `str` The `"tenant_id"` you get after calling `create_dataset()`.
- `"embedding_model"`: `str` The embedding model for generating vector embeddings.
  - Ensure that `"chunk_count"` is `0` before updating `"embedding_model"`.
- `"parser_method"`: `str` The method used by the dataset to parse and process data:
  - `"naive"`: General
  - `"manual"`: Manual
  - `"qa"`: Q&A
  - `"table"`: Table
  - `"paper"`: Paper
  - `"book"`: Book
  - `"laws"`: Laws
  - `"presentation"`: Presentation
  - `"picture"`: Picture
  - `"one"`: One
  - `"knowledge_graph"`: Knowledge Graph
  - `"email"`: Email
  146. ### Returns
  147. - Success: No value is returned.
  148. - Failure: `Exception`
  149. ### Examples
```python
from ragflow import RAGFlow
rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
# list_datasets() returns a list, so take the first match before updating.
ds = rag.list_datasets(name="kb_1")
ds = ds[0]
ds.update({"embedding_model":"BAAI/bge-zh-v1.5", "parse_method":"manual"})
```
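The parse method values listed under Parameters can be checked on the client before an update is sent, which catches misspellings early. This is a minimal sketch: `PARSE_METHODS` and `check_parse_method` are hypothetical names, and the table simply mirrors the values documented above.

```python
# Parse method identifiers and their display names, copied from the
# parameter list in this section (hypothetical helper, not part of the SDK).
PARSE_METHODS = {
    "naive": "General",
    "manual": "Manual",
    "qa": "Q&A",
    "table": "Table",
    "paper": "Paper",
    "book": "Book",
    "laws": "Laws",
    "presentation": "Presentation",
    "picture": "Picture",
    "one": "One",
    "knowledge_graph": "Knowledge Graph",
    "email": "Email",
}


def check_parse_method(method: str) -> str:
    """Return `method` unchanged if it is a documented value, else raise ValueError."""
    if method not in PARSE_METHODS:
        allowed = ", ".join(sorted(PARSE_METHODS))
        raise ValueError(f"Unknown parse method {method!r}; expected one of: {allowed}")
    return method
```

The checked value can then be placed in the `update_message` dictionary passed to `DataSet.update()`.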
  156. ---
  157. :::tip API GROUPING
  158. File management inside knowledge base
  159. :::
  160. ## Upload document
  161. ```python
  162. DataSet.upload_documents(document_list: list[dict])
  163. ```
  164. ### Parameters
#### document_list: `list[dict]`
A list of dictionaries, each containing a `name` and a `blob`.
### Returns
No value is returned.
  169. ### Examples
```python
from ragflow import RAGFlow
rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
ds = rag.create_dataset(name="kb_1")
# Each entry is a dict with "name" and "blob"; add more entries as needed.
ds.upload_documents([{"name": "1.txt", "blob": "123"}])
```
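In practice the `blob` usually comes from a file on disk. The sketch below builds a `document_list` from file paths. `build_document_list` is a hypothetical helper, not part of the SDK, and it assumes that `name` should be the file's base name and `blob` its raw bytes, as in the parsing example later in this reference.

```python
from pathlib import Path


def build_document_list(paths: list[str]) -> list[dict]:
    """Read each file and return dicts with the `name` and `blob` keys."""
    documents = []
    for raw_path in paths:
        # open() and Path do not expand "~" on their own, so do it explicitly.
        path = Path(raw_path).expanduser()
        documents.append({"name": path.name, "blob": path.read_bytes()})
    return documents
```

The result can be passed straight to `ds.upload_documents(...)`.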
  176. ---
  177. ## Update document
```python
Document.update(update_message: dict)
```
### Parameters
#### update_message: `dict`
Only `name`, `parser_config`, and `parser_method` can be changed.
### Returns
No value is returned.
  186. ### Examples
```python
from ragflow import RAGFlow
rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
ds = rag.list_datasets(id='id')
ds = ds[0]
doc = ds.list_documents(id="wdfxb5t547d")
doc = doc[0]
# update() takes a dict; add other changeable fields as needed.
doc.update({"parser_method": "manual"})
```
  196. ---
  197. ## Download document
  198. ```python
  199. Document.download() -> bytes
  200. ```
  201. ### Returns
The content of the document as `bytes`.
  203. ### Examples
```python
import os
from ragflow import RAGFlow
rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
ds = rag.list_datasets(id="id")
ds = ds[0]
doc = ds.list_documents(id="wdfxb5t547d")
doc = doc[0]
# open() does not expand "~", so resolve the home directory explicitly.
with open(os.path.expanduser("~/ragflow.txt"), "wb") as f:
    f.write(doc.download())
print(doc)
```
  214. ---
  215. ## List documents
```python
DataSet.list_documents(
    id: str = None,
    keywords: str = None,
    offset: int = 0,
    limit: int = 1024,
    order_by: str = "create_time",
    desc: bool = True
) -> list[Document]
```
  219. ### Parameters
#### id: `str`
The ID of the document to retrieve. Defaults to `None`.
#### keywords: `str`
Lists documents whose names contain the given keywords. Defaults to `None`.
#### offset: `int`
The index of the first record to return, used for paging. Defaults to `0`.
#### limit: `int`
The number of records to return; `-1` returns all of them. Defaults to `1024`.
#### order_by: `str`
The field by which the records are sorted. Defaults to `"create_time"`.
#### desc: `bool`
Whether to sort in descending order. Defaults to `True`.
  232. ### Returns
A list of `Document` objects. Each `Document` object contains the following attributes:
  235. #### id: `str`
  236. Id of the retrieved document. Defaults to `""`.
  237. #### thumbnail: `str`
  238. Thumbnail image of the retrieved document. Defaults to `""`.
  239. #### knowledgebase_id: `str`
  240. Knowledge base ID related to the document. Defaults to `""`.
  241. #### parser_method: `str`
  242. Method used to parse the document. Defaults to `""`.
  243. #### parser_config: `ParserConfig`
  244. Configuration object for the parser. Defaults to `None`.
  245. #### source_type: `str`
  246. Source type of the document. Defaults to `""`.
  247. #### type: `str`
  248. Type or category of the document. Defaults to `""`.
  249. #### created_by: `str`
  250. Creator of the document. Defaults to `""`.
  251. #### name: `str`
Name or title of the document. Defaults to `""`.
  255. #### size: `int`
  256. Size of the document in bytes or some other unit. Defaults to `0`.
  257. #### token_count: `int`
  258. Number of tokens in the document. Defaults to `""`.
  259. #### chunk_count: `int`
  260. Number of chunks the document is split into. Defaults to `0`.
  261. #### progress: `float`
  262. Current processing progress as a percentage. Defaults to `0.0`.
  263. #### progress_msg: `str`
  264. Message indicating current progress status. Defaults to `""`.
  265. #### process_begin_at: `datetime`
  266. Start time of the document processing. Defaults to `None`.
  267. #### process_duation: `float`
  268. Duration of the processing in seconds or minutes. Defaults to `0.0`.
  269. ### Examples
```python
import os
from ragflow import RAGFlow
rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
ds = rag.create_dataset(name="kb_1")
# open() does not expand "~", so resolve the home directory explicitly.
filename1 = os.path.expanduser("~/ragflow.txt")
with open(filename1, "rb") as f:
    blob = f.read()
list_files = [{"name": filename1, "blob": blob}]
ds.upload_documents(list_files)
for d in ds.list_documents(keywords="rag", offset=0, limit=12):
    print(d)
```
  281. ---
  282. ## Delete documents
  283. ```python
  284. DataSet.delete_documents(ids: list[str] = None)
  285. ```
  286. ### Returns
No value is returned.
  288. ### Examples
  289. ```python
  290. from ragflow import RAGFlow
  291. rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
  292. ds = rag.list_datasets(name="kb_1")
  293. ds = ds[0]
  294. ds.delete_documents(ids=["id_1","id_2"])
  295. ```
  296. ---
  297. ## Parse and stop parsing document
```python
DataSet.async_parse_documents(document_ids: list[str]) -> None
DataSet.async_cancel_parse_documents(document_ids: list[str]) -> None
```
### Parameters
#### document_ids: `list[str]`
The IDs of the documents to parse or to stop parsing.
### Returns
No value is returned.
  309. ### Examples
```python
from ragflow import RAGFlow
# Start parsing a batch of documents, then cancel it.
rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
ds = rag.create_dataset(name="God5")
documents = [
    {'name': 'test1.txt', 'blob': open('./test_data/test1.txt', "rb").read()},
    {'name': 'test2.txt', 'blob': open('./test_data/test2.txt', "rb").read()},
    {'name': 'test3.txt', 'blob': open('./test_data/test3.txt', "rb").read()}
]
ds.upload_documents(documents)
documents = ds.list_documents(keywords="test")
ids = []
for document in documents:
    ids.append(document.id)
ds.async_parse_documents(ids)
print("Async bulk parsing initiated")
ds.async_cancel_parse_documents(ids)
print("Async bulk parsing cancelled")
```
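Both methods return `None`, and the example above suggests they only start or stop the work, so a caller that needs to wait for parsing has to poll. The following generic polling loop is a sketch under that assumption. `wait_until` is a hypothetical helper, and the completion check is left to the caller because this reference does not state which `progress` value marks a finished document.

```python
import time
from typing import Callable


def wait_until(is_done: Callable[[], bool], timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Call `is_done` repeatedly until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller could pass a function that re-lists the documents with `ds.list_documents()` and inspects each document's `progress` attribute.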
  329. ## List chunks
  330. ```python
  331. Document.list_chunks(keywords: str = None, offset: int = 0, limit: int = -1, id : str = None) -> list[Chunk]
  332. ```
  333. ### Parameters
- `keywords`: `str`
  Lists chunks whose name has the given keywords.
  default: `None`
- `offset`: `int`
  The beginning number of records for paging.
  default: `1`
- `limit`: `int`
  The number of records to return.
  default: `30`
- `id`: `str`
  The ID of the chunk to be retrieved.
  default: `None`
  346. ### Returns
A list of `Chunk` objects.
  348. ### Examples
```python
from ragflow import RAGFlow
rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
ds = rag.list_datasets(id="123")
ds = ds[0]
ds.async_parse_documents(["wdfxb5t547d"])
# Fetch the document whose chunks are to be listed.
doc = ds.list_documents(id="wdfxb5t547d")
doc = doc[0]
for c in doc.list_chunks(keywords="rag", offset=0, limit=12):
    print(c)
```
  358. ## Add chunk
```python
Document.add_chunk(content: str) -> Chunk
```
### Parameters
#### content: `str`, *Required*
The main text or information of the chunk.
#### important_keywords: `list[str]`
The key terms or phrases that are central to the chunk's content.
### Returns
A `Chunk` object.
  369. ### Examples
  370. ```python
  371. from ragflow import RAGFlow
  372. rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
  373. ds = rag.list_datasets(id="123")
  374. ds = ds[0]
  375. doc = ds.list_documents(id="wdfxb5t547d")
  376. doc = doc[0]
  377. chunk = doc.add_chunk(content="xxxxxxx")
  378. ```
  379. ---
  380. ## Delete chunk
  381. ```python
  382. Document.delete_chunks(chunk_ids: list[str])
  383. ```
  384. ### Parameters
#### chunk_ids: `list[str]`
The IDs of the chunks to delete.
### Returns
No value is returned.
  389. ### Examples
  390. ```python
  391. from ragflow import RAGFlow
  392. rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
  393. ds = rag.list_datasets(id="123")
  394. ds = ds[0]
  395. doc = ds.list_documents(id="wdfxb5t547d")
  396. doc = doc[0]
  397. chunk = doc.add_chunk(content="xxxxxxx")
  398. doc.delete_chunks(["id_1","id_2"])
  399. ```
  400. ---
  401. ## Update chunk
  402. ```python
  403. Chunk.update(update_message: dict)
  404. ```
  405. ### Parameters
- `content`: `str`
  The main text or information of the chunk.
- `important_keywords`: `list[str]`
  The key terms or phrases that are central to the chunk's content.
- `available`: `int`
  The availability status: `0` means unavailable and `1` means available.
### Returns
No value is returned.
  414. ### Examples
```python
from ragflow import RAGFlow
rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
ds = rag.list_datasets(id="123")
ds = ds[0]
doc = ds.list_documents(id="wdfxb5t547d")
doc = doc[0]
chunk = doc.add_chunk(content="xxxxxxx")
chunk.update({"content": "sdfx..."})
```
  425. ---
  426. ## Retrieval
```python
RAGFlow.retrieve(
    question: str = "",
    datasets: list[str] = None,
    documents: list[str] = None,
    offset: int = 1,
    limit: int = 30,
    similarity_threshold: float = 0.2,
    vector_similarity_weight: float = 0.3,
    top_k: int = 1024,
    rerank_id: str = None,
    keyword: bool = False,
    highlight: bool = False
) -> list[Chunk]
```
  430. ### Parameters
#### question: `str`, *Required*
The user query or query keywords. Defaults to `""`.
#### datasets: `list[str]`, *Required*
The IDs of the datasets to search.
#### documents: `list[str]`
The IDs of the documents to search. `None` means no limitation. Defaults to `None`.
#### offset: `int`
The beginning point of retrieved records. Defaults to `1`.
#### limit: `int`
The maximum number of records to return. Defaults to `30`.
#### similarity_threshold: `float`
The minimum similarity score. Defaults to `0.2`.
#### vector_similarity_weight: `float`
The weight of vector cosine similarity. If this weight is x, the weight of term similarity is 1 - x. Defaults to `0.3`.
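To make the weighting concrete, the arithmetic can be written out. This sketch assumes the two similarities are combined as a plain weighted sum, which is what "1 - x is the term similarity weight" implies. The function names are hypothetical, and the server's actual scoring may differ.

```python
def hybrid_similarity(vector_sim: float, term_sim: float,
                      vector_similarity_weight: float = 0.3) -> float:
    """Weighted sum of vector cosine similarity and term similarity.

    Assumption: a linear combination where the term weight is 1 - vector weight.
    """
    w = vector_similarity_weight
    return w * vector_sim + (1.0 - w) * term_sim


def passes_threshold(score: float, similarity_threshold: float = 0.2) -> bool:
    """A chunk is kept only if its score reaches the minimum similarity."""
    return score >= similarity_threshold
```

With the default weight of `0.3`, a chunk with vector similarity `0.8` and term similarity `0.5` scores `0.3 * 0.8 + 0.7 * 0.5 = 0.59`, which clears the default threshold of `0.2`.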
#### top_k: `int`
The number of records engaged in vector cosine computation. Defaults to `1024`.
#### rerank_id: `str`
The ID of the rerank model. Defaults to `None`.
#### keyword: `bool`
Indicates whether keyword-based matching is enabled (`True`) or disabled (`False`). Defaults to `False`.
#### highlight: `bool`
Specifies whether to highlight matched terms in the results (`True`) or not (`False`). Defaults to `False`.
  453. ### Returns
A list of `Chunk` objects.
  455. ### Examples
```python
from ragflow import RAGFlow
rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
ds = rag.list_datasets(name="ragflow")
ds = ds[0]
name = 'ragflow_test.txt'
path = './test_data/ragflow_test.txt'
# Upload with the documented upload_documents() and look the file up by keyword.
with open(path, "rb") as f:
    ds.upload_documents([{"name": name, "blob": f.read()}])
doc = ds.list_documents(keywords=name)
doc = doc[0]
ds.async_parse_documents([doc.id])
for c in rag.retrieve(question="What's ragflow?",
                      datasets=[ds.id], documents=[doc.id],
                      offset=1, limit=30, similarity_threshold=0.2,
                      vector_similarity_weight=0.3,
                      top_k=1024
                      ):
    print(c)
```
  475. ---
  476. :::tip API GROUPING
  477. Chat APIs
  478. :::
  479. ## Create chat assistant
```python
RAGFlow.create_chat(
    name: str = "assistant",
    avatar: str = "path",
    knowledgebases: list[DataSet] = [],
    llm: Chat.LLM = None,
    prompt: Chat.Prompt = None
) -> Chat
```
  489. Creates a chat assistant.
### Returns
- Success: A `Chat` object representing the chat assistant.
- Failure: `Exception`
### Parameters
#### name: `str`
The name of the chat assistant. Defaults to `"assistant"`.
#### avatar: `str`
Base64 encoding of the avatar. Defaults to `""`.
#### knowledgebases: `list[DataSet]`
The associated knowledge bases. Defaults to `[]`.
#### llm: `Chat.LLM`
The LLM settings of the created chat assistant. Defaults to `None`. When the value is `None`, a dictionary with the following values is generated as the default.
- **model_name**, `str`
  The chat model name. If it is `None`, the user's default chat model is used.
- **temperature**, `float`
  Controls the randomness of the model's predictions. A lower temperature increases the model's confidence in its responses; a higher temperature increases creativity and diversity. Defaults to `0.1`.
- **top_p**, `float`
  Also known as "nucleus sampling", this parameter sets a threshold to select a smaller set of words to sample from. It focuses on the most likely words, cutting off the less probable ones. Defaults to `0.3`.
- **presence_penalty**, `float`
  Discourages the model from repeating the same information by penalizing words that have already appeared in the conversation. Defaults to `0.2`.
- **frequency_penalty**, `float`
  Similar to the presence penalty, this reduces the model's tendency to repeat the same words frequently. Defaults to `0.7`.
- **max_token**, `int`
  The maximum length of the model's output, measured in tokens (words or pieces of words). Defaults to `512`.
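The `top_p` description can be illustrated with a toy distribution. The sketch below shows the general nucleus-sampling idea: keep the most probable tokens until their cumulative probability reaches `top_p`, then renormalize. It is an illustration only, not the implementation used by any particular model provider, and `nucleus_filter` is a hypothetical name.

```python
def nucleus_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most probable tokens whose cumulative probability reaches
    `top_p`, then renormalize the kept probabilities so they sum to 1."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    # Visit tokens from most to least probable.
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}
```

With probabilities `{"a": 0.5, "b": 0.3, "c": 0.2}` and `top_p=0.3`, only `"a"` survives, so a low `top_p` makes the output more focused.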
#### prompt: `Chat.Prompt`
Instructions for the LLM to follow.
- `"similarity_threshold"`: `float` The minimum similarity between the query and a chunk, computed from weighted keyword similarity and vector cosine similarity. Chunks that score below this threshold are filtered out. Defaults to `0.2`.
- `"keywords_similarity_weight"`: `float` The weight (0~1) of keyword similarity when it is combined with vector cosine similarity or the rerank score. Defaults to `0.7`.
- `"top_n"`: `int` Not all chunks whose similarity score is above the similarity threshold are fed to the LLM; it sees only these top N chunks. Defaults to `8`.
- `"variables"`: `list[dict[]]` If you use dialog APIs, variables let you chat with your clients using different strategies. They fill in the 'System' part of the prompt to give the LLM a hint. `knowledge` is a special variable that is filled in with the retrieved chunks. All variables in 'System' must be enclosed in curly brackets. Defaults to `[{"key": "knowledge", "optional": True}]`.
- `"rerank_model"`: `str` If it is not specified, vector cosine similarity is used; otherwise, the reranking score is used. Defaults to `""`.
- `"empty_response"`: `str` The response used when nothing is retrieved from the knowledge base for the user's question. To allow the LLM to improvise when nothing is retrieved, leave this blank. Defaults to `None`.
- `"opener"`: `str` The opening greeting for the user. Defaults to `"Hi! I am your assistant, can I help you?"`.
- `"show_quote"`: `bool` Indicates whether the source of the text should be displayed. Defaults to `True`.
- `"prompt"`: `str` The prompt content. Defaults to `You are an intelligent assistant. Please summarize the content of the knowledge base to answer the question. Please list the data in the knowledge base and answer in detail. When all knowledge base content is irrelevant to the question, your answer must include the sentence "The answer you are looking for is not found in the knowledge base!" Answers need to consider chat history.
  Here is the knowledge base:
  {knowledge}
  The above is the knowledge base.`.
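The curly-bracketed variables can be pictured as ordinary template placeholders. The sketch below fills `{knowledge}` with retrieved chunks using Python's `str.format`. This is an assumption made for illustration: the reference does not state how the server performs the substitution, and `fill_system_prompt` is a hypothetical helper.

```python
# The tail of the default prompt documented above.
SYSTEM_TEMPLATE = (
    "Here is the knowledge base:\n"
    "{knowledge}\n"
    "The above is the knowledge base."
)


def fill_system_prompt(template: str, chunks: list[str], **variables: str) -> str:
    """Join the retrieved chunks and substitute them for {knowledge};
    any extra keyword arguments fill the other curly-bracketed variables."""
    knowledge = "\n".join(chunks)
    return template.format(knowledge=knowledge, **variables)
```

Note that `str.format` treats every `{` and `}` as placeholder syntax, so a template containing literal braces would need them doubled.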
  527. ### Examples
  528. ```python
  529. from ragflow import RAGFlow
  530. rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
  531. knowledge_base = rag.list_datasets(name="kb_1")
  532. assistant = rag.create_chat("Miss R", knowledgebases=knowledge_base)
  533. ```
  534. ---
  535. ## Update chat
  536. ```python
  537. Chat.update(update_message: dict)
  538. ```
  539. Updates the current chat assistant.
  540. ### Parameters
#### update_message: `dict[str, Any]`, *Required*
- `"name"`: `str` The name of the chat assistant to update.
- `"avatar"`: `str` Base64 encoding of the avatar. Defaults to `""`.
- `"knowledgebases"`: `list[str]` Knowledge bases to update.
- `"llm"`: `dict` The LLM settings:
  - `"model_name"`: `str` The chat model name.
  - `"temperature"`: `float` Controls the randomness of the model's predictions.
  - `"top_p"`: `float` Also known as "nucleus sampling", this parameter sets a threshold to select a smaller set of words to sample from.
  - `"presence_penalty"`: `float` Discourages the model from repeating the same information by penalizing words that have appeared in the conversation.
  - `"frequency_penalty"`: `float` Similar to the presence penalty, this reduces the model's tendency to repeat the same words.
  - `"max_token"`: `int` The maximum length of the model's output, measured in tokens (words or pieces of words).
- `"prompt"`: `dict` Instructions for the LLM to follow:
  - `"similarity_threshold"`: `float` The minimum similarity between the query and a chunk, computed from weighted keyword similarity and vector cosine similarity. Chunks that score below this threshold are filtered out. Defaults to `0.2`.
  - `"keywords_similarity_weight"`: `float` The weight (0~1) of keyword similarity when it is combined with vector cosine similarity or the rerank score. Defaults to `0.7`.
  - `"top_n"`: `int` Not all chunks whose similarity score is above the similarity threshold are fed to the LLM; it sees only these top N chunks. Defaults to `8`.
  - `"variables"`: `list[dict[]]` If you use dialog APIs, variables let you chat with your clients using different strategies. They fill in the 'System' part of the prompt to give the LLM a hint. `knowledge` is a special variable that is filled in with the retrieved chunks. All variables in 'System' must be enclosed in curly brackets. Defaults to `[{"key": "knowledge", "optional": True}]`.
  - `"rerank_model"`: `str` If it is not specified, vector cosine similarity is used; otherwise, the reranking score is used. Defaults to `""`.
  - `"empty_response"`: `str` The response used when nothing is retrieved from the knowledge base for the user's question. To allow the LLM to improvise when nothing is retrieved, leave this blank. Defaults to `None`.
  - `"opener"`: `str` The opening greeting for the user. Defaults to `"Hi! I am your assistant, can I help you?"`.
  - `"show_quote"`: `bool` Indicates whether the source of the text should be displayed. Defaults to `True`.
  - `"prompt"`: `str` The prompt content. Defaults to `You are an intelligent assistant. Please summarize the content of the knowledge base to answer the question. Please list the data in the knowledge base and answer in detail. When all knowledge base content is irrelevant to the question, your answer must include the sentence "The answer you are looking for is not found in the knowledge base!" Answers need to consider chat history.
    Here is the knowledge base:
    {knowledge}
    The above is the knowledge base.`.
  565. ### Returns
  566. - Success: No value is returned.
  567. - Failure: `Exception`
  568. ### Examples
  569. ```python
  570. from ragflow import RAGFlow
  571. rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
  572. knowledge_base = rag.list_datasets(name="kb_1")
  573. assistant = rag.create_chat("Miss R", knowledgebases=knowledge_base)
  574. assistant.update({"name": "Stefan", "llm": {"temperature": 0.8}, "prompt": {"top_n": 8}})
  575. ```
  576. ---
  577. ## Delete chats
  578. Deletes specified chat assistants.
  579. ```python
  580. RAGFlow.delete_chats(ids: list[str] = None)
  581. ```
  582. ### Parameters
#### ids: `list[str]`
  584. IDs of the chat assistants to delete. If not specified, all chat assistants will be deleted.
  585. ### Returns
  586. - Success: No value is returned.
  587. - Failure: `Exception`
  588. ### Examples
  589. ```python
  590. from ragflow import RAGFlow
  591. rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
  592. rag.delete_chats(ids=["id_1","id_2"])
  593. ```
  594. ---
  595. ## List chats
```python
RAGFlow.list_chats(
    page: int = 1,
    page_size: int = 1024,
    orderby: str = "create_time",
    desc: bool = True,
    id: str = None,
    name: str = None
) -> list[Chat]
```
  606. ### Parameters
#### page: `int`
Specifies the page on which the records will be displayed. Defaults to `1`.
#### page_size: `int`
The number of records on each page. Defaults to `1024`.
#### orderby: `str`
The attribute by which the results are sorted. Defaults to `"create_time"`.
#### desc: `bool`
Indicates whether to sort the results in descending order. Defaults to `True`.
#### id: `str`
The ID of the chat to retrieve. Defaults to `None`.
#### name: `str`
The name of the chat to retrieve. Defaults to `None`.
  619. ### Returns
  620. - Success: A list of `Chat` objects.
  621. - Failure: `Exception`.
  622. ### Examples
```python
from ragflow import RAGFlow
rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
for assistant in rag.list_chats():
    print(assistant)
```
  629. ---
  630. :::tip API GROUPING
  631. Chat-session APIs
  632. :::
## Create session

```python
Chat.create_session(name: str = "New session") -> Session
```

Creates a chat session for the current chat assistant.

### Parameters

#### name

The name of the chat session to create. Defaults to `"New session"`.

### Returns

- Success: A `Session` object containing the following attributes:
  - `id`: `str` The auto-generated unique identifier of the created session.
  - `name`: `str` The name of the created session.
  - `message`: `list[Message]` The messages of the created session. Default: `[{"role": "assistant", "content": "Hi! I am your assistant,can I help you?"}]`
  - `chat_id`: `str` The ID of the associated chat assistant.
- Failure: `Exception`

### Examples

```python
from ragflow import RAGFlow

rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
assistant = rag.list_chats(name="Miss R")
assistant = assistant[0]
session = assistant.create_session()
```
## Update session

```python
Session.update(update_message: dict)
```

Updates the current session.

### Parameters

#### update_message: `dict[str, Any]`, *Required*

- `"name"`: `str` The new name of the session.

### Returns

- Success: No value is returned.
- Failure: `Exception`

### Examples

```python
from ragflow import RAGFlow

rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
assistant = rag.list_chats(name="Miss R")
assistant = assistant[0]
session = assistant.create_session("session_name")
session.update({"name": "updated_name"})
```
---

## Chat

```python
Session.ask(question: str, stream: bool = False) -> Optional[Message, iter[Message]]
```

Asks a question to start an AI chat in the current session.

### Parameters

#### question *Required*

The question to start an AI chat.

#### stream

Indicates whether to output responses in a streaming way. Defaults to `False`.

### Returns

- A `Message` object, if `stream` is set to `False`.
- An iterator of `Message` objects (`iter[Message]`), if `stream` is set to `True`.
A `Message` object contains the following attributes:

#### id: `str`

The ID of the message. `id` is automatically generated.

#### content: `str`

The content of the message. Defaults to `"Hi! I am your assistant, can I help you?"`.

#### reference: `list[Chunk]`

The auto-generated reference of the message. Each `Chunk` object includes the following attributes:

- **id**: `str`
  The ID of the chunk.
- **content**: `str`
  The content of the chunk.
- **document_id**: `str`
  The ID of the referenced document.
- **document_name**: `str`
  The name of the referenced document.
- **knowledgebase_id**: `str`
  The ID of the knowledge base to which the referenced document belongs.
- **image_id**: `str`
  The ID of the image related to the chunk.
- **similarity**: `float`
  A general similarity score, usually a composite derived from several similarity measures. The value ranges from 0 to 1, where a value closer to 1 indicates higher similarity.
- **vector_similarity**: `float`
  A similarity score based on vector representations. It is obtained by converting texts into vectors and computing the cosine similarity, or another distance measure, between them. A higher value indicates greater similarity in the vector space.
- **term_similarity**: `float`
  A similarity score based on terms or keywords, calculated by comparing the key terms of the texts. A higher value indicates stronger similarity between terms.
- **position**: `list[str]`
  The positions of keywords or specific terms within the text, given as a list so that they can be located precisely.
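The similarity fields can be used to filter or order the chunks attached to a message. The sketch below uses a stand-in `Chunk` dataclass carrying only a few of the attributes listed above, so it runs without a server; with a real message, the input would be its `reference` list. `top_references`, the `0.5` threshold, and the sample values are illustrative choices, not SDK features.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Stand-in with a subset of the documented chunk attributes."""
    id: str
    document_name: str
    similarity: float


def top_references(chunks, min_similarity=0.5, limit=3):
    """Return up to `limit` chunks at or above the threshold, best first."""
    kept = [c for c in chunks if c.similarity >= min_similarity]
    return sorted(kept, key=lambda c: c.similarity, reverse=True)[:limit]


# Made-up sample data for illustration only.
chunks = [
    Chunk("c1", "guide.pdf", 0.42),
    Chunk("c2", "faq.md", 0.91),
    Chunk("c3", "guide.pdf", 0.77),
]
for c in top_references(chunks):
    print(f"{c.similarity:.2f}  {c.document_name}  ({c.id})")
```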
### Examples

```python
from ragflow import RAGFlow

rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
assistant = rag.list_chats(name="Miss R")
assistant = assistant[0]
sess = assistant.create_session()

print("\n==================== Miss R =====================\n")
print(assistant.get_prologue())

# Interactive loop: read a question, then stream the answer as it arrives.
while True:
    question = input("\n==================== User =====================\n> ")
    print("\n==================== Miss R =====================\n")

    cont = ""
    for ans in sess.ask(question, stream=True):
        # Print only the text added since the previous streamed message.
        print(ans.content[len(cont):], end='', flush=True)
        cont = ans.content
```
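The example above prints `ans.content[len(cont):]`, which implies that each streamed message carries the full answer generated so far rather than only the newest fragment. The sketch below isolates that slicing step. `iter_deltas` and `fake_stream` are hypothetical names, and `SimpleNamespace` objects stand in for `Message` so the code runs without a server.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def iter_deltas(messages: Iterable) -> Iterator[str]:
    """Yield only the text added since the previous streamed message."""
    seen = ""
    for msg in messages:
        yield msg.content[len(seen):]
        seen = msg.content


def fake_stream():
    """Stand-in for sess.ask(question, stream=True): cumulative content."""
    for text in ("Hel", "Hello, ", "Hello, world"):
        yield SimpleNamespace(content=text)


deltas = list(iter_deltas(fake_stream()))
print(deltas)  # ['Hel', 'lo, ', 'world']
```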
---

## List sessions

```python
Chat.list_sessions(
    page: int = 1,
    page_size: int = 1024,
    orderby: str = "create_time",
    desc: bool = True,
    id: str = None,
    name: str = None
) -> list[Session]
```

Lists sessions associated with the current chat assistant.

### Parameters

#### page

The page of results to return. Defaults to `1`.

#### page_size

The number of records on each page. Defaults to `1024`.

#### orderby

The field by which the records are sorted. Defaults to `"create_time"`.

#### desc

Indicates whether to sort the records in descending order. Defaults to `True`.

#### id

The ID of the chat session to retrieve. Defaults to `None`.

#### name

The name of the chat session to retrieve. Defaults to `None`.

### Returns

- Success: A list of `Session` objects associated with the current chat assistant.
- Failure: `Exception`

### Examples

```python
from ragflow import RAGFlow

rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
assistant = rag.list_chats(name="Miss R")
assistant = assistant[0]
for session in assistant.list_sessions():
    print(session)
```
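`list_sessions` and `create_session` can be combined to reuse a session by name instead of creating a duplicate. This is a sketch, not an SDK feature: `get_or_create_session` is a hypothetical helper, and `FakeChat` is a stand-in that mimics only the two documented calls. It assumes that filtering by `name` returns an empty list when nothing matches; the real call may raise an `Exception` instead, in which case the lookup would need a `try`/`except`.

```python
from types import SimpleNamespace


def get_or_create_session(assistant, name):
    """Return the first session called `name`, creating one if none exists."""
    matches = assistant.list_sessions(name=name)
    return matches[0] if matches else assistant.create_session(name)


class FakeChat:
    """In-memory stand-in for a Chat object (illustration only)."""

    def __init__(self):
        self._sessions = []

    def list_sessions(self, name=None, **_):
        return [s for s in self._sessions if name is None or s.name == name]

    def create_session(self, name="New session"):
        session = SimpleNamespace(id=str(len(self._sessions) + 1), name=name)
        self._sessions.append(session)
        return session


assistant = FakeChat()
first = get_or_create_session(assistant, "support")
second = get_or_create_session(assistant, "support")
print(first is second, len(assistant.list_sessions()))
```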
---

## Delete sessions

```python
Chat.delete_sessions(ids: list[str] = None)
```

Deletes specified sessions or all sessions associated with the current chat assistant.

### Parameters

#### ids

IDs of the sessions to delete. If not specified, all sessions associated with the current chat assistant will be deleted.

### Returns

- Success: No value is returned.
- Failure: `Exception`

### Examples

```python
from ragflow import RAGFlow

rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
assistant = rag.list_chats(name="Miss R")
assistant = assistant[0]
assistant.delete_sessions(ids=["id_1", "id_2"])
```