
Updated deploy a local llm using IPEX-LLM (#1578)

### What problem does this PR solve?



### Type of change


- [x] Documentation Update
tags/v0.9.0
writinwaters, 1 year ago
parent
commit 43cd455b52
5 changed files with 82 additions and 51 deletions
1. README.md (+1 −1)
2. README_ja.md (+1 −1)
3. README_zh.md (+1 −1)
4. docs/guides/deploy_local_llm.mdx (+78 −47)
5. docs/guides/manage_files.md (+1 −1)

README.md (+1 −1)

</div>

## 🔥 Latest Updates

- 2024-07-08 Supports workflow based on [Graph](./graph/README.md).
- 2024-06-27 Supports Markdown and Docx in the Q&A parsing method.

README_ja.md (+1 −1)

</div>

## 🔥 最新情報

- 2024-07-08 [Graph](./graph/README.md) ベースのワークフローをサポート
- 2024-06-27 Q&A解析方式はMarkdownファイルとDocxファイルをサポートしています。
- 2024-06-27 Docxファイルからの画像の抽出をサポートします。

README_zh.md (+1 −1)

</div>

## 🔥 近期更新

- 2024-07-08 支持 Agentic RAG: 基于 [Graph](./graph/README.md) 的工作流。
- 2024-06-27 Q&A 解析方式支持 Markdown 文件和 Docx 文件。

docs/guides/deploy_local_llm.md → docs/guides/deploy_local_llm.mdx (+78 −47)

---


# Deploy a local LLM
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';


RAGFlow supports deploying models locally using Ollama or Xinference. If you have locally deployed models to leverage or wish to enable GPU or CUDA for inference acceleration, you can bind Ollama or Xinference into RAGFlow and use either of them as a local "server" for interacting with your local models.




## Deploy a local model using Xinference


Xorbits Inference ([Xinference](https://github.com/xorbitsai/inference)) enables you to unleash the full potential of cutting-edge AI models.


:::note
- For information about installing Xinference, see [here](https://inference.readthedocs.io/en/latest/getting_started/).


### 3. Launch your local model


Launch your local model (**Mistral**), ensuring that you replace `${quantization}` with your chosen quantization method:
```bash
$ xinference launch -u mistral --model-name mistral-v0.1 --size-in-billions 7 --model-format pytorch --quantization ${quantization}
```
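
Once the model is launched, you can optionally confirm that Xinference is serving it. A minimal check, assuming Xinference is running locally on its default endpoint:

```bash
# List the models currently running on the local Xinference server;
# the launched Mistral model should appear in the output.
xinference list
```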


## Deploy a local model using IPEX-LLM


[IPEX-LLM](https://github.com/intel-analytics/ipex-llm) is a PyTorch library for running LLMs on local Intel CPUs or GPUs (including iGPU or discrete GPUs like Arc, Flex, and Max) with low latency. It supports Ollama on Linux and Windows systems.


To deploy a local model, e.g., **Qwen2**, using IPEX-LLM-accelerated Ollama:


### 1. Check firewall settings


Ensure that your host machine's firewall allows inbound connections on port 11434:

```bash
sudo ufw allow 11434/tcp
```
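
If your distribution uses firewalld rather than ufw, a roughly equivalent rule (a sketch only, assuming the default Ollama port 11434) would be:

```bash
# Open TCP port 11434 permanently, then reload the firewall rules.
sudo firewall-cmd --permanent --add-port=11434/tcp
sudo firewall-cmd --reload
```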


### 2. Launch Ollama service using IPEX-LLM


#### 2.1 Install IPEX-LLM for Ollama


:::tip NOTE
IPEX-LLM supports Ollama on Linux and Windows systems.
:::


For detailed information about installing IPEX-LLM for Ollama, see [Run llama.cpp with IPEX-LLM on Intel GPU Guide](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md):
- [Prerequisites](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md#0-prerequisites)
- [Install IPEX-LLM cpp with Ollama binaries](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md#1-install-ipex-llm-for-llamacpp)
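
For reference, the linked quickstart boils down to roughly the following commands. This is a sketch only; the Python version and package extra (`ipex-llm[cpp]`) are taken from the IPEX-LLM quickstart and may change, so treat the guide above as authoritative:

```bash
# Create and activate a dedicated Conda environment for IPEX-LLM's Ollama backend.
conda create -n llm-cpp python=3.11
conda activate llm-cpp
# Install IPEX-LLM together with its llama.cpp/Ollama binaries.
pip install --pre --upgrade ipex-llm[cpp]
```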


*After the installation, you should have created a Conda environment, e.g., `llm-cpp`, for running Ollama commands with IPEX-LLM.*


#### 2.2 Initialize Ollama


1. Activate the `llm-cpp` Conda environment and initialize Ollama:


<Tabs
defaultValue="linux"
values={[
{label: 'Linux', value: 'linux'},
{label: 'Windows', value: 'windows'},
]}>
<TabItem value="linux">
```bash
conda activate llm-cpp
init-ollama
```
</TabItem>
<TabItem value="windows">


Run these commands with *administrator privileges in Miniforge Prompt*:


```cmd
conda activate llm-cpp
init-ollama.bat
```
</TabItem>
</Tabs>


2. If you have upgraded `ipex-llm[cpp]` and need to refresh the Ollama binary files, remove the old binary files first and reinitialize Ollama using `init-ollama` (Linux) or `init-ollama.bat` (Windows).

*A symbolic link to Ollama appears in your current directory, and you can use this executable file with standard Ollama commands.*
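
A minimal sketch of the re-initialization described in step 2, on Linux, assuming the symbolic link created earlier is the `ollama` file in your working directory:

```bash
# Remove the stale Ollama symbolic link created by a previous init-ollama run,
# then regenerate it against the newly installed ipex-llm[cpp] binaries.
rm ./ollama
init-ollama
```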


#### 2.3 Launch Ollama service


1. Set the environment variable `OLLAMA_NUM_GPU` to `999` to ensure that all layers of your model run on the Intel GPU; otherwise, some layers may default to CPU.
2. For optimal performance on Intel Arc™ A-Series Graphics with Linux OS (Kernel 6.2), set the following environment variable before launching the Ollama service:


```bash
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```

3. Launch the Ollama service:


<Tabs
defaultValue="linux"
values={[
{label: 'Linux', value: 'linux'},
{label: 'Windows', value: 'windows'},
]}>
<TabItem value="linux">


```bash
export OLLAMA_NUM_GPU=999
./ollama serve
```


</TabItem>
<TabItem value="windows">


Run the following command *in Miniforge Prompt*:


```cmd
set OLLAMA_NUM_GPU=999

ollama serve
```
</TabItem>
</Tabs>






:::tip NOTE
To enable the Ollama service to accept connections from all IP addresses, use `OLLAMA_HOST=0.0.0.0 ./ollama serve` rather than simply `./ollama serve`.
:::


*The console displays messages similar to the following:*


![](https://llm-assets.readthedocs.io/en/latest/_images/ollama_serve.png)
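
Once the service is up, you can optionally confirm from another terminal that it is listening. A minimal check, assuming the default port 11434 on the local machine:

```bash
# Ollama responds with "Ollama is running" when the server is reachable.
curl http://localhost:11434
```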


### 3. Pull and Run Ollama model

#### 3.1 Pull Ollama model


With the Ollama service running, open a new terminal and run `./ollama pull <model_name>` (Linux) or `ollama.exe pull <model_name>` (Windows) to pull the desired model, e.g., `qwen2:latest`:


![](https://llm-assets.readthedocs.io/en/latest/_images/ollama_pull.png)
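
After the pull completes, you can optionally verify that the model is available locally. The Linux form is shown; on Windows use `ollama.exe list`:

```bash
# The pulled model, e.g. qwen2:latest, should appear in this list.
./ollama list
```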


#### 3.2 Run Ollama model

<Tabs
defaultValue="linux"
values={[
{label: 'Linux', value: 'linux'},
{label: 'Windows', value: 'windows'},
]}>
<TabItem value="linux">


```bash
./ollama run qwen2:latest
```
</TabItem>
<TabItem value="windows">

```cmd
ollama run qwen2:latest
```
</TabItem>
</Tabs>

### 4. Configure RAGFlow

To enable IPEX-LLM-accelerated Ollama in RAGFlow, you must also complete the following configuration steps, which are identical to those outlined in the *Deploy a local model using Ollama* section:

1. [Add Ollama](#4-add-ollama)
2. [Complete basic Ollama settings](#5-complete-basic-ollama-settings)
3. [Update System Model Settings](#6-update-system-model-settings)
4. [Update Chat Configuration](#7-update-chat-configuration)
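
Before adding Ollama in the RAGFlow UI, it can help to confirm that the IPEX-LLM-accelerated Ollama service is reachable from the machine running RAGFlow. A minimal check, assuming Ollama listens on its default port 11434 and `<host_ip>` is the address of the machine running Ollama:

```bash
# List the models exposed by the Ollama service; the pulled model should
# appear in the JSON response if the service is reachable.
curl http://<host_ip>:11434/api/tags
```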

docs/guides/manage_files.md (+1 −1)



![link multiple kb](https://github.com/infiniflow/ragflow/assets/93570324/6c508803-fb1f-435d-b688-683066fd7fff)


## Move file to a specific folder


As of RAGFlow v0.8.0, this feature is *not* available.


