{/**
 * @typedef Props
 * @property {string} apiBaseUrl
 */}

import { CodeGroup } from '@/app/components/develop/code.tsx'
import { Row, Col, Properties, Property, Heading, SubProperty, PropertyInstruction, Paragraph } from '@/app/components/develop/md.tsx'

# Knowledge API
high_quality High quality: Embeds content using an embedding model and builds a vector database index
- economy Economy: Builds a keyword-table inverted index
text_model Text documents are directly embedded; `economy` mode defaults to using this form
- hierarchical_model Parent-child mode
- qa_model Q&A Mode: Generates Q&A pairs for segmented documents and then embeds the questions
English, Chinese
mode (string) Cleaning and segmentation mode: automatic / custom / hierarchical
- rules (object) Custom rules (empty in automatic mode)
  - pre_processing_rules (array[object]) Preprocessing rules
    - id (string) Unique identifier for the preprocessing rule
      - Enumerated values:
        - remove_extra_spaces Replaces consecutive spaces, newlines, and tabs
        - remove_urls_emails Deletes URLs and email addresses
    - enabled (bool) Whether this rule is selected. If no document ID is passed in, it represents the default value.
  - segmentation (object) Segmentation rules
    - separator Custom segment delimiter. Currently only one delimiter can be set. Default is `\n`
    - max_tokens Maximum length in tokens. Defaults to 1000
  - parent_mode Retrieval mode of parent chunks: `full-doc` (full-text retrieval) / `paragraph` (paragraph retrieval)
  - subchunk_segmentation (object) Child chunk rules
    - separator Segmentation delimiter. Currently only one delimiter is allowed. Default is `***`
    - max_tokens Maximum length in tokens. Must be shorter than the maximum length of the parent chunk
    - chunk_overlap Overlap between adjacent chunks (optional)
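
The processing rules above can be assembled and checked on the client before a request is sent. The following is a minimal sketch, not an official client: the helper name `build_hierarchical_process_rule` and the default of 200 tokens for child chunks are assumptions made for illustration, while the field names and the `\n` / `***` / 1000 defaults come from the list above.

```python
from typing import Any


def build_hierarchical_process_rule(
    parent_max_tokens: int = 1000,
    child_max_tokens: int = 200,  # assumed default, not taken from the API docs
    parent_separator: str = "\n",
    child_separator: str = "***",
    parent_mode: str = "paragraph",
) -> dict[str, Any]:
    """Assemble a process_rule dict for hierarchical (parent-child) segmentation."""
    if parent_mode not in ("full-doc", "paragraph"):
        raise ValueError(f"unknown parent_mode: {parent_mode!r}")
    # The child chunk limit must be shorter than the parent chunk limit.
    if child_max_tokens >= parent_max_tokens:
        raise ValueError("child max_tokens must be smaller than parent max_tokens")
    return {
        "mode": "hierarchical",
        "rules": {
            "pre_processing_rules": [
                {"id": "remove_extra_spaces", "enabled": True},
                {"id": "remove_urls_emails", "enabled": False},
            ],
            "segmentation": {
                "separator": parent_separator,
                "max_tokens": parent_max_tokens,
            },
            "parent_mode": parent_mode,
            "subchunk_segmentation": {
                "separator": child_separator,
                "max_tokens": child_max_tokens,
            },
        },
    }
```

Calling `build_hierarchical_process_rule(parent_max_tokens=100, child_max_tokens=100)` raises `ValueError`, which surfaces the parent/child length constraint before the server has to reject the request.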
search_method (string) Search method
- hybrid_search Hybrid search
- semantic_search Semantic search
- full_text_search Full-text search
- reranking_enable (bool) Whether to enable reranking
- reranking_mode (object) Rerank model configuration
  - reranking_provider_name (string) Rerank model provider
  - reranking_model_name (string) Rerank model name
- top_k (int) Number of results to return
- score_threshold_enabled (bool) Whether to enable the score threshold
- score_threshold (float) Score threshold
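
As an illustration of how the retrieval fields above fit together, here is a small sketch that builds the object from a few arguments. The function name `build_retrieval_model`, its parameters, and the default `top_k` of 3 are assumptions for this example; the dictionary keys are the ones listed above.

```python
from typing import Any, Optional

SEARCH_METHODS = ("hybrid_search", "semantic_search", "full_text_search")


def build_retrieval_model(
    search_method: str,
    top_k: int = 3,  # assumed default for this sketch
    score_threshold: Optional[float] = None,
    rerank_provider: Optional[str] = None,
    rerank_model: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a retrieval configuration from the fields listed above."""
    if search_method not in SEARCH_METHODS:
        raise ValueError(f"unsupported search_method: {search_method!r}")
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")
    # In this sketch, reranking is enabled only when provider and model are both given.
    reranking = rerank_provider is not None and rerank_model is not None
    model: dict[str, Any] = {
        "search_method": search_method,
        "reranking_enable": reranking,
        "top_k": top_k,
        "score_threshold_enabled": score_threshold is not None,
    }
    if reranking:
        model["reranking_mode"] = {
            "reranking_provider_name": rerank_provider,
            "reranking_model_name": rerank_model,
        }
    if score_threshold is not None:
        model["score_threshold"] = score_threshold
    return model
```

For example, `build_retrieval_model("semantic_search", score_threshold=0.5)` yields a configuration with `score_threshold_enabled` set to `True` and no rerank block.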
original_document_id Source document ID (optional)
- Used to re-upload the document or to modify its cleaning and segmentation configuration. Any missing information is copied from the source document.
- The source document cannot be an archived document.
- When original_document_id is passed in, the document is updated. process_rule is optional; if omitted, the segmentation method of the source document is used by default.
- When original_document_id is not passed in, a new document is created, and process_rule is required.
- indexing_technique Index mode
  - high_quality High quality: Embeds content using an embedding model and builds a vector database index
  - economy Economy: Builds a keyword-table inverted index
- doc_form Format of indexed content
  - text_model Text documents are directly embedded; `economy` mode defaults to using this form
  - hierarchical_model Parent-child mode
  - qa_model Q&A mode: Generates Q&A pairs for segmented documents and then embeds the questions
- doc_language In Q&A mode, specifies the language of the document, for example: English, Chinese
- process_rule Processing rules
  - mode (string) Cleaning and segmentation mode: automatic / custom / hierarchical
  - rules (object) Custom rules (empty in automatic mode)
    - pre_processing_rules (array[object]) Preprocessing rules
      - id (string) Unique identifier for the preprocessing rule
        - Enumerated values:
          - remove_extra_spaces Replaces consecutive spaces, newlines, and tabs
          - remove_urls_emails Deletes URLs and email addresses
      - enabled (bool) Whether this rule is selected. If no document ID is passed in, it represents the default value.
    - segmentation (object) Segmentation rules
      - separator Custom segment delimiter. Currently only one delimiter can be set. Default is `\n`
      - max_tokens Maximum length in tokens. Defaults to 1000
    - parent_mode Retrieval mode of parent chunks: `full-doc` (full-text retrieval) / `paragraph` (paragraph retrieval)
    - subchunk_segmentation (object) Child chunk rules
      - separator Segmentation delimiter. Currently only one delimiter is allowed. Default is `***`
      - max_tokens Maximum length in tokens. Must be shorter than the maximum length of the parent chunk
      - chunk_overlap Overlap between adjacent chunks (optional)
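
The interaction between `original_document_id` and `process_rule` described above can be checked locally. This is a hedged sketch: `resolve_document_operation` is a hypothetical helper, and it covers only the create/update rule. Whether the source document is archived can be known only by the server.

```python
from typing import Any


def resolve_document_operation(data: dict[str, Any]) -> str:
    """Return "update" or "create" according to the original_document_id rule."""
    if data.get("original_document_id"):
        # Update: process_rule is optional and falls back to the source document's.
        return "update"
    if not data.get("process_rule"):
        # Create: process_rule must be supplied.
        raise ValueError("process_rule is required when original_document_id is absent")
    return "create"
```

A payload such as `{"indexing_technique": "high_quality", "process_rule": {"mode": "automatic"}}` resolves to `"create"`, while the same payload without `process_rule` raises `ValueError`.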
search_method (string) Search method
- hybrid_search Hybrid search
- semantic_search Semantic search
- full_text_search Full-text search
- reranking_enable (bool) Whether to enable reranking
- reranking_mode (object) Rerank model configuration
  - reranking_provider_name (string) Rerank model provider
  - reranking_model_name (string) Rerank model name
- top_k (int) Number of results to return
- score_threshold_enabled (bool) Whether to enable the score threshold
- score_threshold (float) Score threshold
high_quality High quality
- economy Economy
only_me Only me
- all_team_members All team members
- partial_members Partial members
vendor Vendor
- external External knowledge
search_method (string) Search method
- hybrid_search Hybrid search
- semantic_search Semantic search
- full_text_search Full-text search
- reranking_enable (bool) Whether to enable reranking
- reranking_model (object) Rerank model configuration
  - reranking_provider_name (string) Rerank model provider
  - reranking_model_name (string) Rerank model name
- top_k (int) Number of results to return
- score_threshold_enabled (bool) Whether to enable the score threshold
- score_threshold (float) Score threshold
high_quality High quality
- economy Economy
only_me Only me
- all_team_members All team members
- partial_members Partial members
search_method (text) Search method. One of the following four keywords is required:
- keyword_search Keyword search
- semantic_search Semantic search
- full_text_search Full-text search
- hybrid_search Hybrid search
- reranking_enable (bool) Whether to enable reranking. Optional in general, but required if the search method is semantic_search or hybrid_search
- reranking_mode (object) Rerank model configuration. Required if reranking is enabled
  - reranking_provider_name (string) Rerank model provider
  - reranking_model_name (string) Rerank model name
- weights (float) Weight of semantic search in hybrid search mode
- top_k (integer) Number of results to return (optional)
- score_threshold_enabled (bool) Whether to enable the score threshold
- score_threshold (float) Score threshold
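
The conditional requirements in this list (which fields become mandatory for which search method) are easy to get wrong, so a local pre-flight check can help. The sketch below is illustrative only. `check_retrieval_model` is a hypothetical helper, and the last rule (a `score_threshold` must accompany `score_threshold_enabled`) is an assumption rather than something the list states.

```python
from typing import Any

SEARCH_METHODS = {"keyword_search", "semantic_search", "full_text_search", "hybrid_search"}


def check_retrieval_model(model: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems: list[str] = []
    method = model.get("search_method")
    if method not in SEARCH_METHODS:
        problems.append(f"search_method must be one of {sorted(SEARCH_METHODS)}")
    # reranking_enable must be present for semantic and hybrid search.
    if method in ("semantic_search", "hybrid_search") and "reranking_enable" not in model:
        problems.append("reranking_enable is required for semantic_search and hybrid_search")
    # The rerank model configuration is required once reranking is enabled.
    if model.get("reranking_enable"):
        mode = model.get("reranking_mode") or {}
        for key in ("reranking_provider_name", "reranking_model_name"):
            if not mode.get(key):
                problems.append(f"reranking_mode.{key} is required when reranking is enabled")
    # Assumption: an enabled threshold needs an explicit value.
    if model.get("score_threshold_enabled") and "score_threshold" not in model:
        problems.append("score_threshold is required when score_threshold_enabled is true")
    return problems
```

Returning a list instead of raising on the first failure lets a caller report every problem in one pass.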
mode (string) Cleaning and segmentation mode: automatic / custom / hierarchical
- rules (object) Custom rules (empty in automatic mode)
  - pre_processing_rules (array[object]) Preprocessing rules
    - id (string) Unique identifier for the preprocessing rule
      - Enumerated values:
        - remove_extra_spaces Replaces consecutive spaces, newlines, and tabs
        - remove_urls_emails Deletes URLs and email addresses
    - enabled (bool) Whether this rule is selected. If no document ID is passed in, it represents the default value.
  - segmentation (object) Segmentation rules
    - separator Custom segment delimiter. Currently only one delimiter can be set. Default is `\n`
    - max_tokens Maximum length in tokens. Defaults to 1000
  - parent_mode Retrieval mode of parent chunks: `full-doc` (full-text retrieval) / `paragraph` (paragraph retrieval)
  - subchunk_segmentation (object) Child chunk rules
    - separator Segmentation delimiter. Currently only one delimiter is allowed. Default is `***`
    - max_tokens Maximum length in tokens. Must be shorter than the maximum length of the parent chunk
    - chunk_overlap Overlap between adjacent chunks (optional)
mode (string) Cleaning and segmentation mode: automatic / custom / hierarchical
- rules (object) Custom rules (empty in automatic mode)
  - pre_processing_rules (array[object]) Preprocessing rules
    - id (string) Unique identifier for the preprocessing rule
      - Enumerated values:
        - remove_extra_spaces Replaces consecutive spaces, newlines, and tabs
        - remove_urls_emails Deletes URLs and email addresses
    - enabled (bool) Whether this rule is selected. If no document ID is passed in, it represents the default value.
  - segmentation (object) Segmentation rules
    - separator Custom segment delimiter. Currently only one delimiter can be set. Default is `\n`
    - max_tokens Maximum length in tokens. Defaults to 1000
  - parent_mode Retrieval mode of parent chunks: `full-doc` (full-text retrieval) / `paragraph` (paragraph retrieval)
  - subchunk_segmentation (object) Child chunk rules
    - separator Segmentation delimiter. Currently only one delimiter is allowed. Default is `***`
    - max_tokens Maximum length in tokens. Must be shorter than the maximum length of the parent chunk
    - chunk_overlap Overlap between adjacent chunks (optional)
content (text) Text content / question content (required)
- answer (text) Answer content. Pass this value if the knowledge base is in Q&A mode (optional)
- keywords (list) Keywords (optional)
content (text) Text content / question content (required)
- answer (text) Answer content. Pass this value if the knowledge base is in Q&A mode (optional)
- keywords (list) Keywords (optional)
- enabled (bool) `false` / `true` (optional)
- regenerate_child_chunks (bool) Whether to regenerate child chunks (optional)
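
Because only `content` is required, a client will usually want to send just the optional fields it actually sets. The following sketch shows one way to do that. `build_segment_update` is a hypothetical helper, and how the resulting dictionary is wrapped in the request body is not shown here.

```python
from typing import Any, Optional, Sequence


def build_segment_update(
    content: str,
    answer: Optional[str] = None,
    keywords: Optional[Sequence[str]] = None,
    enabled: Optional[bool] = None,
    regenerate_child_chunks: Optional[bool] = None,
) -> dict[str, Any]:
    """Collect segment fields, leaving out optional ones that were not provided."""
    if not content:
        raise ValueError("content is required")
    optional: dict[str, Any] = {
        "answer": answer,  # only meaningful when the knowledge base is in Q&A mode
        "keywords": list(keywords) if keywords is not None else None,
        "enabled": enabled,
        "regenerate_child_chunks": regenerate_child_chunks,
    }
    fields: dict[str, Any] = {"content": content}
    # Keep False values: only None means "not provided".
    fields.update({key: value for key, value in optional.items() if value is not None})
    return fields
```

Filtering on `is not None` rather than truthiness matters here, since `enabled=False` is a legitimate value that must still be sent.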
search_method (text) Search method. One of the following four keywords is required:
- keyword_search Keyword search
- semantic_search Semantic search
- full_text_search Full-text search
- hybrid_search Hybrid search
- reranking_enable (bool) Whether to enable reranking. Optional in general, but required if the search method is semantic_search or hybrid_search
- reranking_mode (object) Rerank model configuration. Required if reranking is enabled
  - reranking_provider_name (string) Rerank model provider
  - reranking_model_name (string) Rerank model name
- weights (float) Weight of semantic search in hybrid search mode
- top_k (integer) Number of results to return (optional)
- score_threshold_enabled (bool) Whether to enable the score threshold
- score_threshold (float) Score threshold
- metadata_filtering_conditions (object) Metadata filtering conditions
  - logical_operator (string) Logical operator: `and` | `or`
  - conditions (array[object]) Conditions list
    - name (string) Metadata field name
    - comparison_operator (string) Comparison operator, allowed values:
      - String comparison:
        - `contains`: Contains
        - `not contains`: Does not contain
        - `start with`: Starts with
        - `end with`: Ends with
        - `is`: Equals
        - `is not`: Does not equal
        - `empty`: Is empty
        - `not empty`: Is not empty
      - Numeric comparison:
        - `=`: Equals
        - `≠`: Does not equal
        - `>`: Greater than
        - `<`: Less than
        - `≥`: Greater than or equal
        - `≤`: Less than or equal
      - Time comparison:
        - `before`: Before
        - `after`: After
    - value (string|number|null) Comparison value
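
The operator names above contain spaces and non-ASCII symbols, so building conditions through a small helper avoids typos. This is a sketch under stated assumptions: the helpers `condition` and `metadata_filter` are hypothetical, the field names `author`, `year`, and `summary` are made up, and sending `null` as the value for `empty` / `not empty` is an assumption based on `value` being nullable.

```python
from typing import Any, Union

STRING_OPERATORS = {
    "contains", "not contains", "start with", "end with",
    "is", "is not", "empty", "not empty",
}
NUMERIC_OPERATORS = {"=", "≠", ">", "<", "≥", "≤"}
TIME_OPERATORS = {"before", "after"}
ALLOWED_OPERATORS = STRING_OPERATORS | NUMERIC_OPERATORS | TIME_OPERATORS
VALUELESS_OPERATORS = {"empty", "not empty"}  # assumed to take a null value


def condition(name: str, operator: str, value: Union[str, int, float, None] = None) -> dict[str, Any]:
    """Build one entry for the conditions list."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported comparison_operator: {operator!r}")
    if operator in VALUELESS_OPERATORS:
        value = None
    elif value is None:
        raise ValueError(f"operator {operator!r} needs a comparison value")
    return {"name": name, "comparison_operator": operator, "value": value}


def metadata_filter(logical_operator: str, *conditions: dict[str, Any]) -> dict[str, Any]:
    """Combine conditions under "and" or "or"."""
    if logical_operator not in ("and", "or"):
        raise ValueError("logical_operator must be 'and' or 'or'")
    return {"logical_operator": logical_operator, "conditions": list(conditions)}
```

For example, `metadata_filter("and", condition("author", "contains", "Ada"), condition("year", "≥", 2020))` produces an object with two conditions that must both hold.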
type (string) Metadata type, required
- name (string) Metadata name, required
name (string) Metadata name, required
document_id (string) Document ID
- metadata_list (list) Metadata list
  - id (string) Metadata ID
  - value (string) Metadata value
  - name (string) Metadata name
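
When metadata values come from a flat source such as a spreadsheet, they need to be grouped per document into the `document_id` / `metadata_list` shape above. The sketch below shows one possible grouping. `group_metadata_by_document` and the row layout are assumptions for illustration, and the outer request wrapper is not shown.

```python
from typing import Any, Iterable


def group_metadata_by_document(
    rows: Iterable[tuple[str, str, str, str]],
) -> list[dict[str, Any]]:
    """Group (document_id, metadata_id, name, value) rows into per-document entries."""
    grouped: dict[str, list[dict[str, str]]] = {}
    for document_id, metadata_id, name, value in rows:
        grouped.setdefault(document_id, []).append(
            {"id": metadata_id, "name": name, "value": value}
        )
    # Dicts keep insertion order, so documents appear in first-seen order.
    return [
        {"document_id": document_id, "metadata_list": metadata_list}
        for document_id, metadata_list in grouped.items()
    ]
```

Each document then appears once, with all of its metadata entries collected in a single `metadata_list`.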
| code | status | message |
|---|---|---|
| no_file_uploaded | 400 | Please upload your file. |
| too_many_files | 400 | Only one file is allowed. |
| file_too_large | 413 | File size exceeded. |
| unsupported_file_type | 415 | File type not allowed. |
| high_quality_dataset_only | 400 | Current operation only supports 'high-quality' datasets. |
| dataset_not_initialized | 400 | The dataset is still being initialized or indexing. Please wait a moment. |
| archived_document_immutable | 403 | The archived document is not editable. |
| dataset_name_duplicate | 409 | The dataset name already exists. Please modify your dataset name. |
| invalid_action | 400 | Invalid action. |
| document_already_finished | 400 | The document has been processed. Please refresh the page or go to the document details. |
| document_indexing | 400 | The document is being processed and cannot be edited. |
| invalid_metadata | 400 | The metadata content is incorrect. Please check and verify. |
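
A client can use the `code` column of this table to decide how to react to a failure. The sketch below is one possible approach, not a prescribed one: it assumes the error response body is a JSON object with a `code` field matching the table, and the choice of which codes count as transient is a reading of their messages, not something the table states.

```python
from typing import Any

# Error codes and HTTP statuses copied from the table above.
ERROR_STATUS = {
    "no_file_uploaded": 400,
    "too_many_files": 400,
    "file_too_large": 413,
    "unsupported_file_type": 415,
    "high_quality_dataset_only": 400,
    "dataset_not_initialized": 400,
    "archived_document_immutable": 403,
    "dataset_name_duplicate": 409,
    "invalid_action": 400,
    "document_already_finished": 400,
    "document_indexing": 400,
    "invalid_metadata": 400,
}

# Assumption: these two describe work in progress, so waiting and retrying may help.
TRANSIENT_CODES = {"dataset_not_initialized", "document_indexing"}


def classify_error(body: dict[str, Any]) -> str:
    """Map an error body to "retry_later", "fix_request", or "unknown"."""
    code = body.get("code")
    if code not in ERROR_STATUS:
        return "unknown"
    if code in TRANSIENT_CODES:
        return "retry_later"
    return "fix_request"
```

Matching on `code` rather than on the HTTP status is useful here because most entries share status 400 and differ only in their code.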