AI Documents (PDF) Support
This module extends the AI Documents feature with PDF document support.
Features
- PDF Text Extraction: Extract text content from PDF documents
- Page-by-Page Processing: Text is extracted from each page of the PDF
Getting Started
- Enable the AI Documents (PDF) feature in the Orchard Core admin
- Upload PDF files in the Documents tab of your chat interactions
- The text content is extracted automatically and used for retrieval-augmented generation (RAG)
Technical Details
This module uses the PdfPig library for PDF text extraction. PdfPig is a fully open-source PDF library that:
- Extracts text content from PDF documents
- Does not require any external dependencies
- Works cross-platform
Limitations
- Scanned PDFs: Scanned documents contain images of text rather than actual text, so their text will not be extracted correctly. For best results, use PDFs that contain real (selectable) text.
- Complex Layouts: Text extracted from some complex PDF layouts may not preserve the original formatting exactly.
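If you want to find pages that are likely affected by the scanned-PDF limitation, one option is to look for pages that produce little or no text. The sketch below is a hypothetical helper and is not part of this module. It assumes you already have one extracted text string per page from any PDF extractor. The function name `find_pages_without_text` and the `min_chars` threshold are illustrative assumptions, and the result is only a heuristic signal.

```python
def find_pages_without_text(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text looks too short to be real text.

    Hypothetical heuristic: a page that yields almost no characters is often a
    scanned image with no text layer. The min_chars threshold is an assumption.
    """
    suspect = []
    for number, text in enumerate(page_texts, start=1):
        # Ignore whitespace so pages containing only line breaks count as empty.
        if len("".join(text.split())) < min_chars:
            suspect.append(number)
    return suspect


pages = [
    "Quarterly report: revenue grew in all regions.",
    "",         # e.g. a scanned page with no text layer
    "   \n  ",  # whitespace only
]
print(find_pages_without_text(pages))  # [2, 3]
```

Flagged pages are candidates for re-exporting from the original source document instead of uploading a scan.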
Supported File Types
| Extension | MIME Type |
|---|---|
| `.pdf` | `application/pdf` |
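To screen files before uploading them, you can compare each file against the table above. The sketch below is a hypothetical client-side check and does not describe how this module validates uploads. It uses Python's standard `mimetypes` module to map the file name to a MIME type. It also checks that the content starts with the `%PDF-` header that PDF files begin with. The name `looks_like_pdf` is illustrative.

```python
import mimetypes

PDF_MIME_TYPE = "application/pdf"
PDF_SIGNATURE = b"%PDF-"  # PDF files start with a header such as "%PDF-1.7"


def looks_like_pdf(filename: str, head: bytes) -> bool:
    """Hypothetical pre-upload check using the file name and the first bytes of the file."""
    # Map the extension (e.g. ".pdf") to a MIME type using the standard library table.
    guessed_type, _encoding = mimetypes.guess_type(filename)
    if guessed_type != PDF_MIME_TYPE:
        return False
    # A renamed non-PDF file would pass the extension check, so also verify the header.
    return head.startswith(PDF_SIGNATURE)


print(looks_like_pdf("report.pdf", b"%PDF-1.7\n"))  # True
print(looks_like_pdf("notes.txt", b"hello"))        # False
```

Checking the header as well as the extension catches files that were renamed to `.pdf` without actually being PDFs.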
Note: The AI Documents feature is provided on demand and is enabled only when another feature that requires it is enabled. To configure document indexing, you must enable either the AI Documents - Azure AI Search feature (`CrestApps.OrchardCore.AI.Documents.AzureAI`) or the AI Documents - Elasticsearch feature (`CrestApps.OrchardCore.AI.Documents.Elasticsearch`) in the Orchard Core admin.