ragflow

zhudongwork 10432a1be7 Refa: Optimize pptx shape extraction to reduce content loss (#6703)	6 months ago

### What problem does this PR solve?

When parsing pptx files, some shapes do not contain the `shape_type` attribute, which causes the original code to throw an exception during extraction, leading to failure in content extraction. This optimization introduces handling logic for such anomalous shapes, providing a safer and more robust processing mechanism.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):

resume	Fix:when start with source code not in docker env report 'UnicodeDec… (#5802)	7 months ago
__init__.py	Update comments (#4569)	9 months ago
docx_parser.py	Update comments (#4569)	9 months ago
excel_parser.py	Fix: When Excel is a formula, the parsed result is a formula, but cannot be correctly parsed as a value type (#6613)	7 months ago
figure_parser.py	Feat: add VLM-boosted DocX parser (#6307)	7 months ago
html_parser.py	Update comments (#4569)	9 months ago
json_parser.py	Update comments (#4569)	9 months ago
markdown_parser.py	Feat: Optimize the table extraction logic in the Markdown parser: (#5663)	7 months ago
pdf_parser.py	fix RAGFlowPdfParser AttributeError: 'PdfReader' object has no attribute 'close' err (#6859)	6 months ago
ppt_parser.py	Refa: Optimize pptx shape extraction to reduce content loss (#6703)	6 months ago
txt_parser.py	Fix: delimiter issue. (#5720)	8 months ago
utils.py	Update comments (#4569)	9 months ago
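
The commit message for #6703 describes shapes whose `shape_type` cannot be read and that previously aborted extraction. The following is a minimal sketch of that kind of defensive handling, not the code in `ppt_parser.py`. It uses stand-in objects instead of a real pptx library, and the names `safe_shape_type`, `extract_texts`, and `BrokenShape`, as well as the `"group"` marker, are hypothetical and chosen only for illustration.

```python
from types import SimpleNamespace


class BrokenShape:
    """Stand-in for a shape whose shape_type lookup fails (hypothetical)."""

    def __init__(self, text):
        self.text = text

    @property
    def shape_type(self):
        raise NotImplementedError("unrecognized shape type")


def safe_shape_type(shape):
    """Return shape.shape_type, or None when the shape cannot report one."""
    try:
        return shape.shape_type
    except (AttributeError, NotImplementedError):
        return None


def extract_texts(shapes):
    """Collect non-empty text from shapes, skipping nothing just because
    its type is unknown. Group shapes are walked recursively."""
    texts = []
    for shape in shapes:
        kind = safe_shape_type(shape)  # None for anomalous shapes, never raises
        if kind == "group":  # "group" is an illustrative marker, not a real enum
            texts.extend(extract_texts(getattr(shape, "shapes", [])))
            continue
        text = (getattr(shape, "text", "") or "").strip()
        if text:
            texts.append(text)
    return texts


if __name__ == "__main__":
    demo = [
        SimpleNamespace(shape_type="text", text="Title"),
        BrokenShape("still extracted"),
        SimpleNamespace(text="no shape_type attribute"),
    ]
    print(extract_texts(demo))
```

The design choice here is to treat an unreadable type as "unknown" rather than as an error, so one anomalous shape does not discard the text of the whole slide.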