Select a visual model for parsing your PDFs.
RAGFlow isn’t one-size-fits-all. It is built for flexibility and supports deeper customization to accommodate more complex use cases. From v0.17.0 onwards, RAGFlow decouples DeepDoc-specific data extraction tasks from chunking methods for PDF files. This separation lets you choose the visual model that handles OCR (Optical Character Recognition), TSR (Table Structure Recognition), and DLR (Document Layout Recognition), so that you can balance speed and performance for your specific use case. If your PDFs contain only plain text, you can skip these tasks by selecting the Naive option, which reduces the overall parsing time.
The PDF parser dropdown menu appears.
Select the option that works best with your scenario:
:::caution WARNING
Third-party visual models are marked Experimental because we have not fully tested these models for the aforementioned data extraction tasks.
:::
Use a visual model to extract data if your PDFs contain formatted or image-based text rather than plain text. DeepDoc is the default visual model but can be time-consuming. You can also choose a lightweight or high-performance img2txt model depending on your needs and hardware capabilities.
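The guidance above can be read as a small decision rule. The sketch below is illustrative only: `choose_pdf_parser` and its parameters are hypothetical names, not part of RAGFlow's API, and the returned strings simply mirror the options named on this page.

```python
def choose_pdf_parser(plain_text_only: bool, use_img2txt: bool = False) -> str:
    """Hypothetical helper mirroring this page's guidance (not a RAGFlow API)."""
    if plain_text_only:
        # Plain-text PDFs need no OCR, TSR, or DLR, so skipping them saves time.
        return "Naive"
    if use_img2txt:
        # An img2txt model may suit your needs or hardware better than DeepDoc.
        return "img2txt model"
    # DeepDoc is the default visual model, though it can be time-consuming.
    return "DeepDoc"


print(choose_pdf_parser(plain_text_only=True))
print(choose_pdf_parser(plain_text_only=False))
```

In practice you make this choice in the PDF parser dropdown menu; the function only spells out the reasoning behind it.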
No, you cannot. This dropdown menu is for PDFs only. To use this feature, convert your DOCX files to PDF first.
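One possible way to script the conversion is LibreOffice's headless mode. The sketch below assumes LibreOffice is installed and its `soffice` binary is on your `PATH`; the helper names `build_convert_command` and `convert_docx_to_pdf` are illustrative and not part of RAGFlow.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(docx_path: str, out_dir: str) -> List[str]:
    """Build the LibreOffice command line that converts one DOCX file to PDF."""
    return [
        "soffice", "--headless",
        "--convert-to", "pdf",
        "--outdir", out_dir,
        docx_path,
    ]


def convert_docx_to_pdf(docx_path: str, out_dir: str = ".") -> Path:
    """Run the conversion and return the path where the PDF is expected."""
    # check=True raises CalledProcessError if soffice exits with an error.
    subprocess.run(build_convert_command(docx_path, out_dir), check=True)
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")
```

For example, `convert_docx_to_pdf("report.docx", "pdfs")` would be expected to produce `pdfs/report.pdf`, which you can then upload and parse with the option selected in the dropdown menu. Any other DOCX-to-PDF converter works equally well.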