AI Documents

Feature Name: AI Documents
Feature ID: CrestApps.OrchardCore.AI.Documents

Provides the foundation for document processing, text extraction, and Retrieval-Augmented Generation (RAG) capabilities.

Overview

This module is the foundation for all document-related functionality in the CrestApps AI suite. It provides document upload, text extraction, embedding, and RAG (Retrieval-Augmented Generation) capabilities shared by AI Chat Interactions, AI Profiles, and AI Chat Sessions / Widgets.

The base feature is enabled by dependency only — it activates automatically when either AI Documents for Chat Interactions or AI Documents for Profiles is enabled.

The base feature (CrestApps.OrchardCore.AI.Documents) provides the shared infrastructure used by both chat interaction and profile document features:

  • Unified Document Store: A single IAIDocumentStore for storing and querying documents across all reference types (chat interactions, profiles)
  • Text Extraction: Automatic text extraction from uploaded documents via registered IngestionDocumentReader implementations (from Microsoft.Extensions.DataIngestion)
  • Settings UI: Admin settings page for configuring the default document index and default document retrieval mode (Settings > Artificial Intelligence)
  • Document Processing Tools: AI tools for listing, reading, and searching documents
  • RAG Search Tool: Semantic vector search across uploaded documents
  • Strategy-Based Processing: Adds document-focused prompt-processing strategies
  • Index & Migrations: Shared AIDocumentIndex with ReferenceId and ReferenceType columns for multi-purpose document storage
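
To illustrate how one store can serve several owner types, the following minimal Python sketch models records keyed by ReferenceId and ReferenceType. Only those two column names come from the module; the class and method names (`DocumentRecord`, `InMemoryDocumentStore`, `add`, `for_reference`) and the sample values are hypothetical and do not mirror the actual IAIDocumentStore API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DocumentRecord:
    document_id: str
    reference_id: str    # mirrors the ReferenceId column
    reference_type: str  # mirrors the ReferenceType column, e.g. "profile"
    file_name: str


@dataclass
class InMemoryDocumentStore:
    """Hypothetical stand-in for a shared document store."""
    _records: list = field(default_factory=list)

    def add(self, record: DocumentRecord) -> None:
        self._records.append(record)

    def for_reference(self, reference_type: str, reference_id: str) -> list:
        # One query shape serves every owner type.
        return [
            r for r in self._records
            if r.reference_type == reference_type and r.reference_id == reference_id
        ]


store = InMemoryDocumentStore()
store.add(DocumentRecord("d1", "profile-1", "profile", "handbook.pdf"))
store.add(DocumentRecord("d2", "chat-9", "chat-interaction", "notes.md"))
store.add(DocumentRecord("d3", "profile-1", "profile", "faq.txt"))
```

Querying by both columns keeps profile documents and chat interaction documents separate even though they share one table.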

Sub-Features

| Feature | ID | Description |
| --- | --- | --- |
| AI Documents for Chat Interactions | CrestApps.OrchardCore.AI.Documents.ChatInteractions | Provides document upload and Retrieval-Augmented Generation (RAG) support for AI Chat Interactions. |
| AI Documents for Profiles | CrestApps.OrchardCore.AI.Documents.Profiles | Provides document upload and Retrieval-Augmented Generation (RAG) support for AI Profiles. |
| AI Documents for Chat Sessions | CrestApps.OrchardCore.AI.Documents.ChatSessions | Provides document upload and RAG support for AI Chat Sessions and AI Chat Widgets. |
| AI Documents - Azure Blob Storage | CrestApps.OrchardCore.AI.Documents.Azure | Stores uploaded AI documents in Azure Blob Storage instead of the tenant web root. See AI Documents - Azure Blob Storage. |

Document uploads and storage

All AI document uploads use the same shared document-storage pipeline. That means uploads from chat interactions, profile documents, chat sessions, and chat widgets are all stored through the same IDocumentFileStore.

By default, uploaded documents are stored on the local file system in a tenant-scoped folder under the web root:

wwwroot\<tenant-name>\AIDocuments

This tenant-specific path is enforced by the AI Documents module so local uploads stay isolated per tenant.
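
The tenant-scoped layout can be pictured with a small Python sketch. The function name `tenant_document_path`, the tenant name `Default`, and the segment validation are assumptions made for illustration; the module's actual enforcement logic is not shown in this document.

```python
from pathlib import PureWindowsPath


def tenant_document_path(web_root: str, tenant_name: str, file_name: str) -> PureWindowsPath:
    """Build <web_root>\\<tenant>\\AIDocuments\\<file>, rejecting segments that could leave the folder."""
    for part in (tenant_name, file_name):
        # Assumed guard: a segment may not be empty, a dot segment, or contain a separator.
        if not part or part in (".", "..") or any(sep in part for sep in ("/", "\\")):
            raise ValueError(f"unsafe path segment: {part!r}")
    return PureWindowsPath(web_root) / tenant_name / "AIDocuments" / file_name


print(tenant_document_path("wwwroot", "Default", "guide.pdf"))
```

Because the tenant name is always a single path segment between the web root and AIDocuments, files from different tenants can never share a folder in this sketch.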

If you want uploaded files stored in Azure Blob Storage instead of the local file system, enable the optional AI Documents - Azure Blob Storage feature and configure OrchardCore:CrestApps:AI:AzureDocuments.
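
In .NET configuration, a colon separates levels of the hierarchy, so OrchardCore:CrestApps:AI:AzureDocuments corresponds to nested objects in a JSON source such as appsettings.json. The Python sketch below only expands that path into an empty skeleton; the helper name `section_skeleton` is hypothetical, and the innermost object is left empty because the individual setting names are covered on the AI Documents - Azure Blob Storage page rather than here.

```python
import json


def section_skeleton(path: str) -> dict:
    """Expand a colon-delimited configuration path into nested, empty objects."""
    root: dict = {}
    node = root
    for key in path.split(":"):
        node[key] = {}
        node = node[key]
    return root


# Prints the nesting to reproduce in a JSON configuration file.
print(json.dumps(section_skeleton("OrchardCore:CrestApps:AI:AzureDocuments"), indent=2))
```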

AI Documents for Chat Interactions

Feature Name: AI Documents for Chat Interactions
Feature ID: CrestApps.OrchardCore.AI.Documents.ChatInteractions

Provides document upload and Retrieval-Augmented Generation (RAG) support for AI Chat Interactions.

When enabled, a Documents tab appears in the chat interaction UI, allowing users to upload documents and chat against their own data.

Documents uploaded to a chat interaction are scoped to that session.

Key Capabilities

  • Document Upload: Upload documents via drag-and-drop or file browser
  • Text Extraction: Content is automatically extracted from uploaded documents
  • Chunking & Embedding: Text is split into chunks and embedded for semantic vector search
  • RAG Integration: Relevant document chunks are retrieved and used as context for AI responses
  • Document Management: View, manage, and remove uploaded documents within a chat session
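
As a rough illustration of the chunk, embed, and retrieve flow listed above, the following Python sketch splits text into word-based chunks and ranks them against a question by cosine similarity. It is not the module's implementation: the function names are hypothetical, the word-count chunk size is an assumption, and the bag-of-words `embed` function is a toy substitute for a real embedding model.

```python
import math
from collections import Counter


def chunk_text(text: str, max_words: int = 8) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (a real system calls an embedding model)."""
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_chunks(question: str, chunks: list[str], top_n: int = 3) -> list[str]:
    """Return the top_n chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_n]
```

The chunks returned by `top_chunks` play the role of the context passed to the model alongside the user's question.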

Document Processing

When documents are attached to a chat interaction, the orchestrator manages document context automatically. It coordinates text extraction, chunking, embedding, and retrieval to provide relevant document content to the AI model.

The orchestrator supports various document-related operations:

  • Question Answering (RAG) — Uses vector search to find relevant document chunks for answering questions
  • Summarization — Provides full document content for summarization requests
  • Tabular Analysis — Parses structured data (CSV, Excel) for calculations and analysis
  • Data Extraction — Extracts structured information from documents
  • Document Comparison — Provides multi-document content for comparison
  • Content Transformation — Provides content for reformatting or conversion
  • General Reference — Provides context when asking general questions that reference documents
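
Tabular analysis differs from question answering in that the file is parsed as structured data rather than searched as text chunks. A minimal sketch of that idea, using Python's standard csv module, is shown below; the function name `column_total` and the sample data are invented for illustration and say nothing about how the orchestrator performs the calculation internally.

```python
import csv
import io


def column_total(csv_text: str, column: str) -> float:
    """Parse CSV text and sum one numeric column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row[column]) for row in reader)


sample = "region,sales\nNorth,120.5\nSouth,79.5\n"
print(column_total(sample, "sales"))  # prints 200.0
```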

Getting Started

  1. Set up an indexing provider: Enable Elasticsearch or Azure AI Search in the Orchard Core admin.
  2. Create an index: Navigate to Search > Indexing and create a new index (e.g., "AI Documents").
  3. Configure settings: Navigate to Settings > Artificial Intelligence and select your new index and default document retrieval mode. After the index is configured in production, avoid changing it to prevent losing access to documents in existing sessions.
  4. Enable the feature: Enable AI Documents for Chat Interactions in the admin dashboard.
  5. Start using the Documents tab in your chat interactions.

AI Documents for Profiles

Feature Name: AI Documents for Profiles
Feature ID: CrestApps.OrchardCore.AI.Documents.Profiles

Provides document upload and Retrieval-Augmented Generation (RAG) support for AI Profiles.

When enabled, a Documents tab appears on the AI Profile editor, allowing administrators to attach text-based documents that will be chunked, embedded, and used as context across all chat sessions using that profile.

Unlike chat interaction documents (which are scoped to a single session), profile documents persist across all sessions using the profile. Profile documents are treated as background knowledge. End users should not be told that the profile has attached documents unless they explicitly upload documents in the current session.

Key Capabilities

  • Document Upload: Upload text-based documents (PDF, Word, Markdown, etc.) directly to an AI Profile
  • Automatic Text Extraction: Content is extracted from uploaded documents using registered IngestionDocumentReader implementations
  • Chunking & Embedding: Extracted text is split into chunks and embedded for semantic vector search
  • RAG Integration: Relevant document chunks are automatically retrieved and used as context for AI responses
  • Top N Configuration: Control how many matching chunks are included as context (default: 3)
  • Retrieval Mode Override: Choose Chunk or Hierarchical retrieval per AI Profile or profile-source AI Template, or leave it blank to use the site default
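
The override rule for the retrieval mode amounts to a simple fallback, sketched here in Python. The enum and function names are hypothetical; only the two mode names and the blank-means-site-default behavior come from the description above.

```python
from enum import Enum
from typing import Optional


class RetrievalMode(Enum):
    CHUNK = "Chunk"
    HIERARCHICAL = "Hierarchical"


def effective_mode(profile_mode: Optional[RetrievalMode], site_default: RetrievalMode) -> RetrievalMode:
    """A blank (None) profile value defers to the site default."""
    return profile_mode if profile_mode is not None else site_default


print(effective_mode(None, RetrievalMode.CHUNK).value)
print(effective_mode(RetrievalMode.HIERARCHICAL, RetrievalMode.CHUNK).value)
```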

Supported File Types

Only embeddable file extensions are supported for AI Profile documents. The set of embeddable extensions is determined by the registered IngestionDocumentReader implementations. Typically, this includes:

| Format | Extension | Module Required |
| --- | --- | --- |
| Text | .txt | Built-in |
| Markdown | .md | Built-in |
| JSON | .json | Built-in |
| XML | .xml | Built-in |
| HTML | .html, .htm | Built-in |
| YAML | .yml, .yaml | Built-in |
| Log | .log | Built-in |
| PDF | .pdf | CrestApps.OrchardCore.AI.Documents.Pdf |
| Word | .docx | CrestApps.OrchardCore.AI.Documents.OpenXml |
| PowerPoint | .pptx | CrestApps.OrchardCore.AI.Documents.OpenXml |

Note: Tabular file types (.csv, .tsv, .xlsx, .xls) are registered as non-embeddable and are not available for AI Profile document upload, since they are intended for tabular data analysis rather than text-based retrieval-augmented generation (RAG).
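
The distinction between embeddable and non-embeddable extensions can be sketched as a lookup. In this Python illustration the hard-coded `EXTENSIONS` map and the `allowed_for_profile` helper are hypothetical; a real deployment derives the set from whichever readers are registered, so the map here assumes the PDF and OpenXml modules are enabled.

```python
from pathlib import PurePath

# Extension -> embeddable flag, following the list of typical formats and the note above.
EXTENSIONS = {
    ".txt": True, ".md": True, ".json": True, ".xml": True,
    ".html": True, ".htm": True, ".yml": True, ".yaml": True,
    ".log": True, ".pdf": True, ".docx": True, ".pptx": True,
    # Tabular formats are registered, but flagged as non-embeddable.
    ".csv": False, ".tsv": False, ".xlsx": False, ".xls": False,
}


def allowed_for_profile(file_name: str) -> bool:
    """Only extensions registered as embeddable qualify for profile upload."""
    return EXTENSIONS.get(PurePath(file_name).suffix.lower(), False)


print(allowed_for_profile("Handbook.PDF"), allowed_for_profile("sales.csv"))
```

Unregistered extensions fall through to `False`, the same outcome as a registered but non-embeddable one.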

How It Works

Documents are managed directly through the AI Profile editor form. When you save a profile:

  1. New files selected in the Documents tab are uploaded, text-extracted, chunked, embedded, and stored
  2. Documents marked for deletion are removed from the store
  3. All changes are applied atomically when the profile is saved

There are no separate API endpoints for profile document management — everything is handled through the standard profile editor workflow.
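
One way to picture the all-or-nothing save is to compute the new document set first and only then replace the old one. The Python sketch below is an assumption about the shape of that logic, not the module's code; `apply_profile_save` and its dictionary-based store are hypothetical.

```python
def apply_profile_save(stored: dict[str, str], uploads: dict[str, str], removals: set[str]) -> dict[str, str]:
    """Return the document set after a save; the input is left untouched if validation fails."""
    unknown = removals - set(stored)
    if unknown:
        raise KeyError(f"cannot remove unknown documents: {sorted(unknown)}")
    # Drop the documents marked for deletion.
    updated = {name: text for name, text in stored.items() if name not in removals}
    # Add the new uploads (in the real pipeline each one is extracted, chunked, and embedded first).
    updated.update(uploads)
    return updated


before = {"a.md": "A", "b.md": "B"}
after = apply_profile_save(before, {"c.md": "C"}, {"a.md"})
print(sorted(after))
```

Because the function builds a new dictionary instead of mutating `before`, a failure part-way through leaves the previously saved state intact.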

Legacy profile-document rows stored under older CrestApps.OrchardCore.AI.* or CrestApps.AI.* YesSql type names are normalized to the current CrestApps.Core.AI.* document types when the AI Documents feature runs its migrations. This keeps previously uploaded profile documents removable and queryable after upgrading to the shared CrestApps.Core packages.
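
The normalization can be thought of as a prefix rewrite. The following Python sketch assumes the part of the type name after the prefix is unchanged, which this document does not confirm; the function name and the example type name `Models.AIDocument` are hypothetical.

```python
LEGACY_PREFIXES = ("CrestApps.OrchardCore.AI.", "CrestApps.AI.")
CURRENT_PREFIX = "CrestApps.Core.AI."


def normalize_type_name(type_name: str) -> str:
    """Rewrite a legacy type-name prefix to the current one; leave other names unchanged."""
    for prefix in LEGACY_PREFIXES:
        if type_name.startswith(prefix):
            return CURRENT_PREFIX + type_name[len(prefix):]
    return type_name


print(normalize_type_name("CrestApps.OrchardCore.AI.Models.AIDocument"))
```

Names that already use the current prefix match neither legacy prefix, so running the rewrite twice has no further effect.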

Getting Started

  1. Enable the AI Documents for Profiles feature in the Orchard Core admin dashboard.
  2. Navigate to Artificial Intelligence > AI Profiles and edit a profile.
  3. Use the Documents tab to upload text-based documents.
  4. Configure Document Top N and, when needed, Document retrieval mode to control how much document context is injected and whether the response uses chunk-level or hierarchical retrieval.

AI Documents for Chat Sessions

Feature Name: AI Documents for Chat Sessions
Feature ID: CrestApps.OrchardCore.AI.Documents.ChatSessions

Provides document upload and Retrieval-Augmented Generation (RAG) support directly within AI Chat Sessions and AI Chat Widgets (both admin and frontend).

When enabled, users can attach documents to any chat session via drag-and-drop or file browser. Documents are indexed using the same shared infrastructure (text extraction, chunking, embedding, and vector search) used by Chat Interactions and Profiles.

Unlike profile documents (which persist across all sessions), chat session documents are scoped to the individual session — similar to chat interaction documents.

Key Capabilities

  • Document Upload: Drag-and-drop or browse to attach files directly in the chat input area
  • Visual Attach Button: A persistent "Attach files" button appears above the chat input when enabled
  • Document Pills: Attached documents are shown as compact pill badges with remove (X) buttons
  • Drag-and-Drop Highlight: The input area highlights when files are dragged over it
  • Text Extraction & Embedding: Uploaded documents are automatically extracted, chunked, and embedded for vector search
  • RAG Integration: Relevant chunks are retrieved and used as context for AI responses
  • Per-Profile Opt-In: Each AI Profile has separate Allow session document uploads and Allow session image uploads options to control which file types are available in chat sessions

Per-Profile Opt-In

Because document processing is resource-intensive, document upload is not enabled by default even when the feature is active. Administrators must explicitly opt in for each AI Profile:

  1. Navigate to Artificial Intelligence > AI Profiles and edit a profile.
  2. In the Documents section, enable Allow session document uploads, Allow session image uploads, or both, depending on the experience you want to allow.
  3. Save the profile.

For AI Chat Widget content items, the same options appear on the widget editor under the AI profile part settings.

Supported UIs

| UI | Where | Notes |
| --- | --- | --- |
| AI Chat Session | Admin > Artificial Intelligence > AI Chat | Full session page |
| AI Chat Admin Widget | Floating admin widget | Compact chat widget on admin pages |
| AI Chat Widget | Frontend content widget | Public-facing chat widget |

Getting Started

  1. Set up an indexing provider: Enable Elasticsearch or Azure AI Search in the Orchard Core admin.
  2. Create an index: Navigate to Search > Indexing and create a new index (e.g., "AI Documents").
  3. Configure settings: Navigate to Settings > Artificial Intelligence and select your new index and default document retrieval mode. Use Allow document uploads and Allow image uploads to control which file types chat interactions can accept. After the index is configured in production, avoid changing it to prevent losing access to documents in existing sessions.
  4. Enable the feature: Enable AI Documents for Chat Sessions in the admin dashboard.
  5. Opt in per profile: Edit the desired AI Profile and enable Allow session document uploads, Allow session image uploads, or both.
  6. Open a chat session. The attach button and drag-and-drop zone are available when at least one upload type is enabled for the profile. Image uploads also require a configured vision deployment.
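
The conditions in the last two steps can be summarized as a small decision function. This Python sketch is one reading of those rules, not the module's code: the names are hypothetical, and it assumes the attach button stays hidden when images are the only enabled upload type and no vision deployment is configured.

```python
from dataclasses import dataclass


@dataclass
class ProfileUploadSettings:
    allow_session_documents: bool = False  # off by default, per the opt-in rule
    allow_session_images: bool = False


def upload_ui(settings: ProfileUploadSettings, feature_enabled: bool, vision_configured: bool) -> dict[str, bool]:
    """Decide which upload types a chat session offers for a given profile."""
    documents = feature_enabled and settings.allow_session_documents
    # Image uploads additionally need a configured vision deployment.
    images = feature_enabled and settings.allow_session_images and vision_configured
    return {"documents": documents, "images": images, "attach_button": documents or images}


print(upload_ui(ProfileUploadSettings(allow_session_documents=True), True, False))
```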

Supported Document Formats

| Format | Extension | Notes |
| --- | --- | --- |
| PDF | .pdf | Requires CrestApps.OrchardCore.AI.Documents.Pdf feature |
| Word | .docx | Requires CrestApps.OrchardCore.AI.Documents.OpenXml feature |
| Excel | .xlsx | Requires CrestApps.OrchardCore.AI.Documents.OpenXml feature |
| PowerPoint | .pptx | Requires CrestApps.OrchardCore.AI.Documents.OpenXml feature |
| Text | .txt | Built-in support |
| CSV | .csv | Built-in support |
| Markdown | .md | Built-in support |
| JSON | .json | Built-in support |
| XML | .xml | Built-in support |
| HTML | .html, .htm | Built-in support |
| YAML | .yml, .yaml | Built-in support |

Note: Binary Office formats (.doc, .xls, .ppt) are not supported. Convert them to .docx, .xlsx, or .pptx before upload.

Configuration

Documents Tab Settings

| Setting | Description | Default |
| --- | --- | --- |
| Top N Results | Number of top matching document chunks to include as context | 3 |
| Allow document uploads | Enables document uploads for chat interactions | True |
| Allow image uploads | Enables image uploads for chat interactions when a vision deployment is configured | False |

File storage providers

| Provider | Default | Notes |
| --- | --- | --- |
| Local file system | Yes | Stores uploads under wwwroot\<tenant-name>\AIDocuments. |
| Azure Blob Storage | Optional | Enable CrestApps.OrchardCore.AI.Documents.Azure to replace the default store. |

For Azure Blob Storage setup and configuration details, see AI Documents - Azure Blob Storage.

Document Lifecycle & Cleanup

When a chat interaction, chat session, or AI profile is deleted, all associated documents are automatically cleaned up:

| Scope | What happens on deletion |
| --- | --- |
| Chat Interaction | Document chunks are removed from all AI document indexes. AIDocument records are deleted from the document store. |
| Chat Session | All session documents are deleted from the document store. Document chunks are removed from all AI document indexes via a deferred task. |
| AI Profile | Documents are managed via the profile editor; removing a document triggers index chunk cleanup and store deletion on save. |

This ensures the AI document indexes stay free of orphaned entries when their parent resources are removed.
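
The cleanup described above has two parts: deleting the owner's records from the store and removing the matching chunks from every index. The Python sketch below shows that shape with plain lists and dictionaries; the function name and data layout are hypothetical, and the deferred task used for chat sessions is not modeled.

```python
def delete_owner(reference_type: str, reference_id: str, store: list[dict], indexes: dict[str, list[dict]]) -> set[str]:
    """Remove an owner's documents from the store and their chunks from every index."""
    doc_ids = {
        d["id"] for d in store
        if d["reference_type"] == reference_type and d["reference_id"] == reference_id
    }
    # Delete the document records in place.
    store[:] = [d for d in store if d["id"] not in doc_ids]
    # Remove the matching chunks from each index so no orphans remain.
    for chunks in indexes.values():
        chunks[:] = [c for c in chunks if c["document_id"] not in doc_ids]
    return doc_ids


store = [
    {"id": "d1", "reference_type": "chat-session", "reference_id": "s1"},
    {"id": "d2", "reference_type": "profile", "reference_id": "p1"},
]
indexes = {"ai-docs": [{"document_id": "d1", "text": "x"}, {"document_id": "d2", "text": "y"}]}
removed = delete_owner("chat-session", "s1", store, indexes)
```

After the call, only the profile's document and its chunk remain, which is the orphan-free state the table describes.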

Troubleshooting

"Index Not Configured" Warning

If you see this warning, navigate to Settings > Artificial Intelligence and select an index profile. If no index profiles are available, go to Search > Indexing, add an AI Documents index, and enable one of the AI Documents indexing features if the AI Documents index type is not listed.

"Embedding Search Service Not Available" Warning

This means the configured index profile doesn't have a registered embedding/search service. Supported providers include Elasticsearch and Azure AI Search. Make sure:

  1. The corresponding feature is enabled (Elasticsearch or Azure AI Search)
  2. Your index is configured to use a supported provider