
DRAFT: Miscellaneous proofedits on Python APIs (#2903)

### What problem does this PR solve?



### Type of change


- [x] Documentation Update
tags/v0.13.0
writinwaters, 1 year ago
Parent commit: 1e6d44d6ef
1 changed file with 167 additions and 125 deletions: api/python_api_reference.md




**THE API REFERENCES BELOW ARE STILL UNDER DEVELOPMENT.**


---

:::tip NOTE
Dataset Management
:::


---

## Create dataset




#### permission


Specifies who can access the dataset to create. You can set it only to `"me"` for now.


#### chunk_method, `str`


The chunking method of the dataset to create. Available options:

- `"naive"`: General (default)
- `"manual"`: Manual
- `"qa"`: Q&A
- `"table"`: Table
- `"paper"`: Paper
- `"book"`: Book
- `"laws"`: Laws
- `"presentation"`: Presentation
- `"picture"`: Picture
- `"one"`: One
- `"knowledge_graph"`: Knowledge Graph
- `"email"`: Email
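Since `chunk_method` accepts only the values listed above, it can help to validate input before calling `create_dataset`. A minimal sketch; the constant and helper below are hypothetical, not part of the SDK:

```python
# Hypothetical constant listing the chunking methods documented above;
# not part of the ragflow SDK itself.
CHUNK_METHODS = {
    "naive", "manual", "qa", "table", "paper", "book", "laws",
    "presentation", "picture", "one", "knowledge_graph", "email",
}

def check_chunk_method(method: str) -> str:
    # Reject anything outside the documented option set early,
    # instead of waiting for a server-side error.
    if method not in CHUNK_METHODS:
        raise ValueError(f"unknown chunk_method: {method!r}")
    return method
```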


#### parser_config




- `chunk_token_count`: Defaults to `128`.
- `layout_recognize`: Defaults to `True`.
- `delimiter`: Defaults to `"\n!?。;!?"`.
- `task_page_size`: Defaults to `12`.
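A `parser_config` is an ordinary dict, so the defaults above can be collected once and selectively overridden. A sketch; `make_parser_config` is a hypothetical helper, and the keys are only those documented above:

```python
def make_parser_config(**overrides) -> dict:
    # Start from the documented defaults, then apply caller overrides.
    config = {
        "chunk_token_count": 128,
        "layout_recognize": True,
        "delimiter": "\n!?。;!?",
        "task_page_size": 12,
    }
    config.update(overrides)
    return config
```

For example, `make_parser_config(chunk_token_count=256)` keeps all the other defaults.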


### Returns

### Examples

```python
from ragflow import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="kb_1")
```


---

```python
RAGFlow.delete_datasets(ids: list[str] = None)
```

Deletes specified datasets or all datasets in the system.


### Parameters


#### ids: `list[str]`


The IDs of the datasets to delete. Defaults to `None`. If not specified, all datasets in the system will be deleted.
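Because `ids=None` deletes every dataset in the system, a caller may want to require explicit IDs. A hedged sketch; `safe_delete_datasets` is a hypothetical wrapper, not an SDK method:

```python
def safe_delete_datasets(rag_object, ids):
    # Refuse the "delete everything" behavior triggered by ids=None.
    if not ids:
        raise ValueError("refusing to delete all datasets; pass explicit dataset IDs")
    return rag_object.delete_datasets(ids=list(ids))
```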


### Returns


### Examples

```python
rag_object.delete_datasets(ids=["id_1","id_2"])
```


---


#### page: `int`


Specifies the page on which the datasets will be displayed. Defaults to `1`.


#### page_size: `int`


The number of datasets on each page. Defaults to `1024`.


#### orderby: `str`


The field by which datasets should be sorted. Available options:

- `"create_time"` (default)
- `"update_time"`


#### desc: `bool`




#### id: `str`


The ID of the dataset to retrieve. Defaults to `None`.


#### name: `str`


The name of the dataset to retrieve. Defaults to `None`.


### Returns


- Success: A list of `DataSet` objects.
- Failure: `Exception`.


### Examples

#### List all datasets

```python
for dataset in rag_object.list_datasets():
    print(dataset)
```
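When there are more datasets than one `page_size`, the `page` parameter can be advanced until a page comes back empty. A sketch under that assumption; `fetch_page` is a hypothetical callable such as `lambda page, page_size: rag_object.list_datasets(page=page, page_size=page_size)`:

```python
def iter_all_pages(fetch_page, page_size=1024):
    # Yield items page by page until an empty page signals the end.
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        if not batch:
            return
        yield from batch
        page += 1
```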


#### Retrieve a dataset by ID

---

```python
DataSet.update(update_message: dict)
```


Updates configurations for the current dataset.


### Parameters


#### update_message: `dict[str, str|int]`, *Required*


A dictionary representing the attributes to update, with the following keys:

- `"name"`: `str` The name of the dataset to update.
- `"embedding_model"`: `str` The embedding model name to update.
  - Ensure that `"chunk_count"` is `0` before updating `"embedding_model"`.
- `"chunk_method"`: `str` The chunking method for the dataset. Available options:
  - `"naive"`: General
  - `"manual"`: Manual
  - `"qa"`: Q&A
### Examples

```python
from ragflow import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(name="kb_name")
dataset = dataset[0]
dataset.update({"embedding_model":"BAAI/bge-zh-v1.5", "chunk_method":"manual"})
```




### Parameters


#### document_list: `list[dict]`, *Required*


A list of dictionaries representing the documents to upload, each containing the following keys:
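The upload examples later in this reference use entries shaped like `{"name": ..., "blob": <raw bytes>}`. A sketch that builds such a list from file paths; `build_document_list` is a hypothetical helper, not an SDK function:

```python
from pathlib import Path

def build_document_list(paths):
    # Pair each file's base name with its raw bytes, as the upload examples do.
    return [{"name": Path(p).name, "blob": Path(p).read_bytes()} for p in paths]
```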




#### update_message: `dict[str, str|dict[]]`, *Required*


A dictionary representing the attributes to update, with the following keys:

- `"name"`: `str` The name of the document to update.
- `"parser_config"`: `dict[str, Any]` The parsing configuration for the document:
  - `"chunk_token_count"`: Defaults to `128`.
### Examples

```python
from ragflow import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id='id')
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
doc.update({"parser_config": {"chunk_token_count": 256}, "chunk_method": "manual"})
```

---

```python
Document.download() -> bytes
```


Downloads the current document.


### Returns




### Parameters


#### id: `str`


The ID of the document to retrieve. Defaults to `None`.


#### keywords: `str`


The keywords to match document titles. Defaults to `None`.


#### offset: `int`


The starting index for the documents to retrieve. Typically used in conjunction with `limit`. Defaults to `0`.


#### limit: `int`


The maximum number of documents to retrieve. Defaults to `1024`. A value of `-1` indicates that all documents should be returned.


#### orderby: `str`


The field by which documents should be sorted. Available options:


- `"create_time"` (default)
- `"update_time"`


#### desc: `bool`


Indicates whether the retrieved documents should be sorted in descending order. Defaults to `True`.




A `Document` object contains the following attributes:


- `id`: The document ID. Defaults to `""`.
- `name`: The document name. Defaults to `""`.
- `thumbnail`: The thumbnail image of the document. Defaults to `None`.
- `knowledgebase_id`: The dataset ID associated with the document. Defaults to `None`.
- `chunk_method` The chunk method name. Defaults to `""`.
- `parser_config`: `ParserConfig` Configuration object for the parser. Defaults to `{"pages": [[1, 1000000]]}`.
- `source_type`: The source type of the document. Defaults to `"local"`.
- `type`: Type or category of the document. Defaults to `""`.
- `created_by`: `str` The creator of the document. Defaults to `""`.
- `size`: `int` The document size in bytes. Defaults to `0`.
- `token_count`: `int` The number of tokens in the document. Defaults to `0`.
- `chunk_count`: `int` The number of chunks that the document is split into. Defaults to `0`.
- `progress`: `float` The current processing progress as a percentage. Defaults to `0.0`.
- `progress_msg`: `str` A message indicating the current progress status. Defaults to `""`.
- `process_begin_at`: `datetime` The start time of document processing. Defaults to `None`.
- `process_duation`: `float` The duration of processing in seconds. Defaults to `0.0`.
- `run`: `str` Defaults to `"0"`.
- `status`: `str` Defaults to `"1"`.
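To illustrate reading these attributes, here is a sketch over a minimal stand-in; `DocumentInfo` and `finished_chunk_total` are hypothetical and mirror only a few of the documented fields:

```python
from dataclasses import dataclass

@dataclass
class DocumentInfo:
    # A tiny stand-in for a few documented Document attributes.
    name: str = ""
    token_count: int = 0
    chunk_count: int = 0
    progress: float = 0.0

def finished_chunk_total(docs):
    # Sum chunk_count over documents whose progress has reached 100%.
    return sum(d.chunk_count for d in docs if d.progress >= 1.0)
```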


### Examples


```python
dataset = rag_object.create_dataset(name="kb_1")

filename1 = "~/ragflow.txt"
blob = open(filename1, "rb").read()
dataset.upload_documents([{"name":filename1,"blob":blob}])
for doc in dataset.list_documents(keywords="rag", offset=0, limit=12):
    print(doc)
```


---

```python
DataSet.delete_documents(ids: list[str] = None)
```

Deletes specified documents or all documents from the current dataset.

### Parameters

#### ids: `list[str]`

The IDs of the documents to delete. Defaults to `None`. If not specified, all documents in the dataset will be deleted.


### Returns


### Examples

```python
from ragflow import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(name="kb_1")
dataset = dataset[0]
dataset.delete_documents(ids=["id_1","id_2"])
```


---


### Parameters


#### document_ids: `list[str]`, *Required*


The IDs of the documents to parse.


### Examples


```python
from ragflow import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="dataset_name")
documents = [
    {'name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
    {'name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
    {'name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()}
]
dataset.upload_documents(documents)
documents = dataset.list_documents(keywords="test")
ids = []
for document in documents:
    ids.append(document.id)
dataset.async_parse_documents(ids)
print("Async bulk parsing initiated.")
```


---


### Parameters


#### document_ids: `list[str]`, *Required*


The IDs of the documents for which parsing should be stopped.


### Returns


### Examples


```python
from ragflow import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="dataset_name")
documents = [
    {'name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
    {'name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
    {'name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()}
]
dataset.upload_documents(documents)
documents = dataset.list_documents(keywords="test")
ids = []
for document in documents:
    ids.append(document.id)
dataset.async_parse_documents(ids)
print("Async bulk parsing initiated.")
dataset.async_cancel_parse_documents(ids)
print("Async bulk parsing cancelled.")
```


---

```python
Document.list_chunks(keywords: str = None, offset: int = 0, limit: int = -1, id: str = None) -> list[Chunk]
```


Retrieves a list of document chunks.

### Parameters


#### keywords: `str`

List chunks whose name has the given keywords. Defaults to `None`.


#### offset: `int`


The starting index for the chunks to retrieve. Defaults to `1`.


#### limit: `int`


The maximum number of chunks to retrieve. Defaults to `30`.


#### id: `str`




### Returns


- Success: A list of `Chunk` objects.
- Failure: `Exception`.


### Examples


```python
from ragflow import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets("123")
dataset = dataset[0]
dataset.async_parse_documents(["wdfxb5t547d"])
doc = dataset.list_documents(id="wdfxb5t547d")[0]
for chunk in doc.list_chunks(keywords="rag", offset=0, limit=12):
    print(chunk)
```


## Add chunk


#### content: *Required*


The text content of the chunk.


#### important_keywords: `list[str]`


---

```python
Document.delete_chunks(chunk_ids: list[str])
```


Deletes chunks by ID.

### Parameters


#### chunk_ids: `list[str]`


The IDs of the chunks to delete. Defaults to `None`. If not specified, all chunks of the current document will be deleted.


### Returns


---

```python
Chunk.update(update_message: dict)
```


Updates content or configurations for the current chunk.


### Parameters


#### update_message: `dict[str, str|list[str]|int]`, *Required*


A dictionary representing the attributes to update, with the following keys:

- `"content"`: `str` Content of the chunk.
- `"important_keywords"`: `list[str]` A list of key terms to attach to the chunk.
- `"available"`: `int` The chunk's availability status in the dataset. Value options:
  - `0`: Unavailable
  - `1`: Available
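Since `update_message` accepts only the keys above, it can be assembled and checked before calling `Chunk.update`. A hedged sketch; `chunk_update_message` is a hypothetical helper, not an SDK function:

```python
def chunk_update_message(content=None, important_keywords=None, available=None):
    # Build the update_message dict, validating the documented value options.
    msg = {}
    if content is not None:
        msg["content"] = content
    if important_keywords is not None:
        msg["important_keywords"] = list(important_keywords)
    if available is not None:
        if available not in (0, 1):
            raise ValueError("available must be 0 (unavailable) or 1 (available)")
        msg["available"] = available
    return msg
```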




#### offset: `int`


The starting index for the chunks to retrieve. Defaults to `0`.


#### limit: `int`


The maximum number of chunks to retrieve. Defaults to `6`.


#### similarity_threshold: `float`


:::tip NOTE
Chat Assistant Management
:::


---

## Create chat assistant


```python
Chat.update(update_message: dict)
```


Updates configurations for the current chat assistant.


### Parameters


#### update_message: `dict[str, str|list[str]|dict[]]`, *Required*

A dictionary representing the attributes to update, with the following keys:


- `"name"`: `str` The name of the chat assistant to update.
- `"avatar"`: `str` Base64 encoding of the avatar. Defaults to `""`.
- `"knowledgebases"`: `list[str]` The datasets to update.
- `"llm"`: `dict` The LLM settings:
  - `"model_name"`, `str` The chat model name.
  - `"temperature"`, `float` Controls the randomness of the model's predictions.


## Delete chats


```python
RAGFlow.delete_chats(ids: list[str] = None)
```

Deletes specified chat assistants or all chat assistants in the system.

### Parameters


#### ids: `list[str]`


The IDs of the chat assistants to delete. Defaults to `None`. If not specified, all chat assistants in the system will be deleted.


### Returns




#### page


Specifies the page on which the chat assistants will be displayed. Defaults to `1`.


#### page_size


The number of chat assistants on each page. Defaults to `1024`.


#### order_by


### Examples

```python
from ragflow import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
for assistant in rag_object.list_chats():
    print(assistant)
```


:::tip NOTE
Chat-session APIs
:::


---

## Create session


```python
Session.update(update_message: dict)
```


Updates the current session name.


### Parameters


#### update_message: `dict[str, Any]`, *Required*


A dictionary representing the attributes to update, with only one key:

- `"name"`: `str` The name of the session to update.


### Returns


#### page


Specifies the page on which the sessions will be displayed. Defaults to `1`.


#### page_size


The number of sessions on each page. Defaults to `1024`.


#### orderby


The field by which sessions should be sorted. Available options:


- `"create_time"` (default)
- `"update_time"`


#### desc

### Examples

```python
from ragflow import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
assistant = rag_object.list_chats(name="Miss R")
assistant = assistant[0]
for session in assistant.list_sessions():
    print(session)
```

---

```python
Chat.delete_sessions(ids: list[str] = None)
```


Deletes specified sessions or all sessions associated with the current chat assistant.


### Parameters


#### ids: `list[str]`


The IDs of the sessions to delete. Defaults to `None`. If not specified, all sessions associated with the current chat assistant will be deleted.


### Returns


