
Updated parser_config description (#3104)

### What problem does this PR solve?



### Type of change


- [x] Documentation Update
tags/v0.13.0
writinwaters, 1 year ago
Parent
Current commit
86b546f657
2 files changed, 95 additions and 31 deletions
  1. 31
    16
      api/http_api_reference.md
  2. 64
    15
      api/python_api_reference.md

+ 31
- 16
api/http_api_reference.md

  - `"chunk_method"`: (*Body parameter*), `enum<string>`
  The chunking method of the dataset to create. Available options:
  - `"naive"`: General (default)
- - `"manual`: Manual
+ - `"manual"`: Manual
  - `"qa"`: Q&A
  - `"table"`: Table
  - `"paper"`: Paper
  - `"picture"`: Picture
  - `"one"`: One
  - `"knowledge_graph"`: Knowledge Graph
+ - `"email"`: Email


  - `"parser_config"`: (*Body parameter*), `object`
- The configuration settings for the dataset parser, a JSON object containing the following attributes:
- - `"chunk_token_count"`: Defaults to `128`.
- - `"layout_recognize"`: Defaults to `true`.
- - `"html4excel"`: Indicates whether to convert Excel documents into HTML format. Defaults to `false`.
- - `"delimiter"`: Defaults to `"\n!?。;!?"`.
- - `"task_page_size"`: Defaults to `12`. For PDF only.
- - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
+ The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`:
+ - If `"chunk_method"` is `"naive"`, the `"parser_config"` object contains the following attributes:
+ - `"chunk_token_count"`: Defaults to `128`.
+ - `"layout_recognize"`: Defaults to `true`.
+ - `"html4excel"`: Indicates whether to convert Excel documents into HTML format. Defaults to `false`.
+ - `"delimiter"`: Defaults to `"\n!?。;!?"`.
+ - `"task_page_size"`: Defaults to `12`. For PDF only.
+ - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
+ - If `"chunk_method"` is `"qa"`, `"manual"`, `"paper"`, `"book"`, `"laws"`, or `"presentation"`, the `"parser_config"` object contains the following attribute:
+ - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
+ - If `"chunk_method"` is `"table"` or `"one"`, `"parser_config"` is an empty JSON object.
+ - If `"chunk_method"` is `"knowledge_graph"`, the `"parser_config"` object contains the following attributes:
+ - `"chunk_token_count"`: Defaults to `128`.
+ - `"delimiter"`: Defaults to `"\n!?。;!?"`.
+ - `"entity_types"`: Defaults to `["organization","person","location","event","time"]`
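
The per-method defaults described above can be pictured as a lookup table. The following is only a minimal sketch, assuming a client wants to assemble the `"chunk_method"` and `"parser_config"` part of a request body itself. `DEFAULT_PARSER_CONFIG` and `build_body` are hypothetical names, not part of the API, and method names follow the `"chunk_method"` option list. Methods with no `"parser_config"` description here (for example `"picture"` or `"email"`) are deliberately rejected rather than guessed.

```python
import copy
import json

# Defaults transcribed from the description above.
# DEFAULT_PARSER_CONFIG and build_body are hypothetical helper names.
DEFAULT_PARSER_CONFIG = {
    "naive": {
        "chunk_token_count": 128,
        "layout_recognize": True,
        "html4excel": False,
        "delimiter": "\n!?。;!?",  # json.dumps renders this as the JSON text shown above
        "task_page_size": 12,  # PDF only
        "raptor": {"use_raptor": False},
    },
    "table": {},
    "one": {},
    "knowledge_graph": {
        "chunk_token_count": 128,
        "delimiter": "\n!?。;!?",
        "entity_types": ["organization", "person", "location", "event", "time"],
    },
}
for _method in ("qa", "manual", "paper", "book", "laws", "presentation"):
    DEFAULT_PARSER_CONFIG[_method] = {"raptor": {"use_raptor": False}}


def build_body(chunk_method="naive", **overrides):
    """Return only the chunk_method/parser_config part of a request body."""
    if chunk_method not in DEFAULT_PARSER_CONFIG:
        raise ValueError(f"no documented parser_config for {chunk_method!r}")
    # Copy so that overrides never leak back into the shared defaults.
    parser_config = copy.deepcopy(DEFAULT_PARSER_CONFIG[chunk_method])
    parser_config.update(overrides)
    return {"chunk_method": chunk_method, "parser_config": parser_config}


if __name__ == "__main__":
    body = build_body("naive", chunk_token_count=256)
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Any other fields a real request needs are outside the scope of this sketch.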


  ### Response


  - `"picture"`: Picture
  - `"one"`: One
  - `"knowledge_graph"`: Knowledge Graph
+ - `"email"`: Email


  ### Response


  - `"picture"`: Picture
  - `"one"`: One
  - `"knowledge_graph"`: Knowledge Graph
+ - `"email"`: Email
  - `"parser_config"`: (*Body parameter*), `object`
- The parsing configuration for the document:
- - `"chunk_token_count"`: Defaults to `128`.
- - `"layout_recognize"`: Defaults to `true`.
- - `"delimiter"`: Defaults to `"\n!?。;!?"`.
- - `"task_page_size"`: Defaults to `12`. For PDF only.
+ The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`:
+ - If `"chunk_method"` is `"naive"`, the `"parser_config"` object contains the following attributes:
+ - `"chunk_token_count"`: Defaults to `128`.
+ - `"layout_recognize"`: Defaults to `true`.
+ - `"html4excel"`: Indicates whether to convert Excel documents into HTML format. Defaults to `false`.
+ - `"delimiter"`: Defaults to `"\n!?。;!?"`.
+ - `"task_page_size"`: Defaults to `12`. For PDF only.
+ - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
+ - If `"chunk_method"` is `"qa"`, `"manual"`, `"paper"`, `"book"`, `"laws"`, or `"presentation"`, the `"parser_config"` object contains the following attribute:
+ - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
+ - If `"chunk_method"` is `"table"` or `"one"`, `"parser_config"` is an empty JSON object.
+ - If `"chunk_method"` is `"knowledge_graph"`, the `"parser_config"` object contains the following attributes:
+ - `"chunk_token_count"`: Defaults to `128`.
+ - `"delimiter"`: Defaults to `"\n!?。;!?"`.
+ - `"entity_types"`: Defaults to `["organization","person","location","event","time"]`
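
Because the attribute set depends on `"chunk_method"`, a caller may want to catch mismatched keys before sending an update. The sketch below is a hypothetical client-side check, not documented server behavior: it assumes that attributes outside the documented set for a method are at best ignored, which the text above does not actually state. `ALLOWED_KEYS` and `undocumented_keys` are illustrative names.

```python
# Attribute names per chunk method, transcribed from the description above.
# ALLOWED_KEYS and undocumented_keys are hypothetical helper names.
ALLOWED_KEYS = {
    "naive": {
        "chunk_token_count",
        "layout_recognize",
        "html4excel",
        "delimiter",
        "task_page_size",
        "raptor",
    },
    "table": set(),
    "one": set(),
    "knowledge_graph": {"chunk_token_count", "delimiter", "entity_types"},
}
for _method in ("qa", "manual", "paper", "book", "laws", "presentation"):
    ALLOWED_KEYS[_method] = {"raptor"}


def undocumented_keys(chunk_method, parser_config):
    """Return the sorted keys in parser_config that are not documented for chunk_method."""
    if chunk_method not in ALLOWED_KEYS:
        raise ValueError(f"no documented parser_config for {chunk_method!r}")
    return sorted(set(parser_config) - ALLOWED_KEYS[chunk_method])


if __name__ == "__main__":
    # "layout_recognize" is documented for "naive" but not for "qa".
    print(undocumented_keys("qa", {"raptor": {"use_raptor": True}, "layout_recognize": False}))
```

Returning the offending keys, rather than raising, leaves the decision to warn or abort with the caller.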


  ### Response



+ 64
- 15
api/python_api_reference.md

  - `"picture"`: Picture
  - `"one"`: One
  - `"knowledge_graph"`: Knowledge Graph
+ - `"email"`: Email


  #### parser_config


- The parser configuration of the dataset. A `ParserConfig` object contains the following attributes:

- - `chunk_token_count`: Defaults to `128`.
- - `layout_recognize`: Defaults to `True`.
- - `delimiter`: Defaults to `"\n!?。;!?"`.
- - `task_page_size`: Defaults to `12`.
+ The parser configuration of the dataset. A `ParserConfig` object's attributes vary based on the selected `"chunk_method"`:

+ - `"chunk_method"`=`"naive"`:
+ `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","html4excel":False,"layout_recognize":True,"raptor":{"user_raptor":False}}`.
+ - `chunk_method`=`"qa"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"manual"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"table"`:
+ `None`
+ - `chunk_method`=`"paper"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"book"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"laws"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"presentation"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"one"`:
+ `None`
+ - `chunk_method`=`"knowledge_graph"`:
+ `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}`
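
To illustrate how per-method defaults like these might combine with caller-supplied values, here is a minimal sketch of a nested merge. Whether the SDK merges overrides this way is an assumption, and `PARSER_DEFAULTS`, `deep_merge`, and `resolve_parser_config` are hypothetical names. Keys are spelled as printed in the list above, including `user_raptor` (the HTTP reference prints `use_raptor`), and method names follow the `chunk_method` option list.

```python
import copy

# Defaults transcribed from the list above, spelled as printed there.
# All names in this block are hypothetical; this is not the SDK's implementation.
_RAPTOR_ONLY = {"raptor": {"user_raptor": False}}
PARSER_DEFAULTS = {
    "naive": {
        "chunk_token_num": 128,
        "delimiter": "\\n!?;。;!?",
        "html4excel": False,
        "layout_recognize": True,
        "raptor": {"user_raptor": False},
    },
    "qa": _RAPTOR_ONLY,
    "manual": _RAPTOR_ONLY,
    "paper": _RAPTOR_ONLY,
    "book": _RAPTOR_ONLY,
    "laws": _RAPTOR_ONLY,
    "presentation": _RAPTOR_ONLY,
    "table": None,
    "one": None,
    "knowledge_graph": {
        "chunk_token_num": 128,
        "delimiter": "\\n!?;。;!?",
        "entity_types": ["organization", "person", "location", "event", "time"],
    },
}


def deep_merge(base, overrides):
    """Return a new dict: overrides applied on top of base, recursing into nested dicts."""
    result = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def resolve_parser_config(chunk_method, overrides=None):
    """Combine the listed default for chunk_method with optional overrides."""
    default = PARSER_DEFAULTS[chunk_method]  # KeyError for methods not listed above
    if not overrides:
        return copy.deepcopy(default)
    return deep_merge(default or {}, overrides)


if __name__ == "__main__":
    print(resolve_parser_config("naive", {"raptor": {"user_raptor": True}}))
```

The merge copies at every level, so changing a resolved configuration never alters the shared defaults.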


  ### Returns


  - `"picture"`: Picture
  - `"one"`: One
  - `"knowledge_graph"`: Knowledge Graph
+ - `"email"`: Email


  ### Returns


  A dictionary representing the attributes to update, with the following keys:


  - `"display_name"`: `str` The name of the document to update.
- - `"parser_config"`: `dict[str, Any]` The parsing configuration for the document:
- - `"chunk_token_count"`: Defaults to `128`.
- - `"layout_recognize"`: Defaults to `True`.
- - `"delimiter"`: Defaults to `'\n!?。;!?'`.
- - `"task_page_size"`: Defaults to `12`.
  - `"chunk_method"`: `str` The parsing method to apply to the document.
  - `"naive"`: General
  - `"manual"`: Manual
  - `"picture"`: Picture
  - `"one"`: One
  - `"knowledge_graph"`: Knowledge Graph
+ - `"email"`: Email
+ - `"parser_config"`: `dict[str, Any]` The parsing configuration for the document. Its attributes vary based on the selected `"chunk_method"`:
+ - `"chunk_method"`=`"naive"`:
+ `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","html4excel":False,"layout_recognize":True,"raptor":{"user_raptor":False}}`.
+ - `chunk_method`=`"qa"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"manual"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"table"`:
+ `None`
+ - `chunk_method`=`"paper"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"book"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"laws"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"presentation"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"one"`:
+ `None`
+ - `chunk_method`=`"knowledge_graph"`:
+ `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}`


  ### Returns


  - `thumbnail`: The thumbnail image of the document. Defaults to `None`.
  - `dataset_id`: The dataset ID associated with the document. Defaults to `None`.
  - `chunk_method`: The chunk method name. Defaults to `"naive"`.
- - `parser_config`: `ParserConfig` Configuration object for the parser. Defaults to `{"pages": [[1, 1000000]]}`.
  - `source_type`: The source type of the document. Defaults to `"local"`.
  - `type`: Type or category of the document. Defaults to `""`. Reserved for future use.
  - `created_by`: `str` The creator of the document. Defaults to `""`.
  - `"DONE"`
  - `"FAIL"`
  - `status`: `str` Reserved for future use.
+ - `parser_config`: `ParserConfig` Configuration object for the parser. Its attributes vary based on the selected `chunk_method`:
+ - `chunk_method`=`"naive"`:
+ `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","html4excel":False,"layout_recognize":True,"raptor":{"user_raptor":False}}`.
+ - `chunk_method`=`"qa"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"manual"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"table"`:
+ `None`
+ - `chunk_method`=`"paper"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"book"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"laws"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"presentation"`:
+ `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"one"`:
+ `None`
+ - `chunk_method`=`"knowledge_graph"`:
+ `{"chunk_token_num":128,"delimiter": "\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}`


  ### Examples


