
End‑to‑end multimodal chat with document parsing, media uploads, audio recording, and streaming markdown rendering#1316

Open
SignalRT wants to merge 7 commits into SciSharp:master from SignalRT:WebReview

Conversation

@SignalRT
Collaborator

@SignalRT SignalRT commented Jan 19, 2026

Summary:
This PR delivers a full multimodal chat pipeline in LLama.Web: PDF and Word document ingestion with text extraction, image and audio uploads, native in‑browser audio recording (preview/attach/discard), plus streaming response
rendering with Markdown support.

Key Features:

  • Streaming chat responses rendered incrementally.
  • Markdown rendering in the UI (including code blocks, lists, etc.).
  • Multimodal inference pipeline with MTMD support wired into session execution.
  • PDF ingestion with text extraction and truncation safeguards.
  • Word (DOCX) ingestion with text extraction from document XML.
  • Image uploads supported end‑to‑end (validation, storage, rendering in chat).
  • Audio uploads supported end‑to‑end (validation, storage, playback in chat).
  • In‑browser audio recording (MediaRecorder) with preview + attach/discard workflow.
  • Capability‑aware UI (shows whether text/vision/audio are supported per model).
  • Automatic model downloads with progress reporting.

Implementation Highlights

  • Attachment service handles file validation, storage, and extraction (PDF/DOCX).
  • Model session builds prompts with attached media and enforces capability checks.
  • Chat UI renders images/audio and guides users on supported inputs.
  • Captures audio and converts it to a browser file for existing upload flow.
  • Streaming tokens update the UI while Markdown is rendered on the fly.

Capability to upload images and ask about the images

(screenshot)

Model auto-download + Capability to upload files and ask about the files
(screenshot)

- Reworked MTMD prompt handling to preserve text/media ordering and evaluate multimodal input incrementally.
- Disabled unsupported multimodal features such as session persistence and context shifting.
- Added standalone MTMD media loading and synchronized MTMD weight operations.
- Updated MTMD example and tests to cover prompt ordering, guards, and opt-in NoCI execution.
- Fixed web model/session defaults for multimodal models, including template-derived stop markers and unspecified pooling.
- Improved LLama.Web audio attachment/recording flow, Qwen audio prompt handling, and chat composer UX.
- Removed the broken browser script include and added a safe markdown fallback.
- Some cleanup and documentation changes; only the MTMD doc was updated. I think we should regenerate all the docs, but I'm not sure.
- Stop and load the model on change.
- Solved an issue with the ENTER key.
@SignalRT SignalRT marked this pull request as ready for review March 20, 2026 23:18
@martindevans
Member

martindevans commented Mar 20, 2026

One thing that I'm not sure about is the media queue in the SafeMtmdModelHandle. Why is it an implicit queue instead of an explicit parameter passed into the tokenize call?

Alternatively, if it is necessary for some reason, could it be moved up one layer into the MtmdModel, instead of SafeMtmdModelHandle? That way the SafeHandle remains a minimal wrapper around llama.cpp, with additional behaviour added for convenience at the higher level wrapper.

@martindevans
Member

Other than that one comment, looks good to me!

Contributor

Copilot AI left a comment


Pull request overview

This PR modernizes LLamaSharp’s multimodal support by migrating from LLava to MTMD, and substantially upgrades LLama.Web to support end-to-end multimodal chat (attachments, uploads, streaming markdown rendering) plus automatic model downloads with progress reporting.

Changes:

  • Replace LLava types/APIs/docs with MTMD equivalents (MtmdWeights, SafeMtmd* handles, executor multimodal plumbing).
  • Add LLama.Web pipeline: attachment upload + extraction (PDF/DOCX), media embeddings (image/audio), streaming UI rendering with markdown/mermaid, and capability-aware behavior.
  • Add model auto-download service with SignalR progress updates and corresponding UI/status wiring.

Reviewed changes

Copilot reviewed 77 out of 78 changed files in this pull request and generated 7 comments.

File Description
mkdocs.yml Updates documentation navigation to MTMD docs (removes LLava entries).
docs/xmldocs/llama.statelessexecutor.md Docs update for MTMD properties (ClipModel, Embeds).
docs/xmldocs/llama.native.safemtmdmodelhandle.md New generated docs for MTMD safe handle API.
docs/xmldocs/llama.native.safemtmdinputchunks.md New generated docs for MTMD input chunks wrapper.
docs/xmldocs/llama.native.safemtmdinputchunk.md New generated docs for MTMD input chunk wrapper.
docs/xmldocs/llama.native.safemtmdembed.md New generated docs for MTMD embed wrapper.
docs/xmldocs/llama.native.nativelibraryconfigcontainer.md Docs: rename LLava params to MTMD, fix AVX wording, update DryRun signature docs.
docs/xmldocs/llama.native.mtmdcontextparams.md New generated docs for MTMD context params.
docs/xmldocs/llama.mtmdweights.md New generated docs for MtmdWeights.
docs/xmldocs/llama.interactiveexecutor.md Docs update: MTMD fields, cancellation tokens, antiprompt processor, state limitations, embeds.
docs/xmldocs/llama.instructexecutor.md Docs update mirroring interactive executor changes for MTMD + cancellation tokens.
docs/xmldocs/llama.batched.conversation.md Docs update: add MTMD prompt overloads and remove LLava image embed overload.
docs/xmldocs/llama.batched.batchedexecutor.md Docs update: add MTMD clip model support.
docs/xmldocs/llama.abstractions.illamaexecutor.md Docs update: ClipModel/Embeds now MTMD types.
docs/xmldocs/index.md Docs index updated for MTMD types and removes LLava references.
docs/Tutorials/NativeLibraryConfig.md Tutorial updated for MTMD library configuration.
docs/Tutorials/Executors.md Tutorial updated for MTMD fields + state persistence limitations for multimodal executors.
docs/QuickStart.md QuickStart updated with MTMD example and embed loading flow.
docs/Examples/MtmdInteractiveModeExecute.md Example docs updated from SafeMtmdWeights/single-brace paths to MtmdWeights/double-brace paths.
LLama/Native/SafeMtmdModelHandle.cs Adds standalone embed creation APIs and refactors load methods to use them.
LLama/Native/Load/NativeLibraryConfig.cs Fixes DryRun out params initialization/behavior and documents outputs.
LLama/MtmdWeights.cs Adds locking and new standalone media load APIs; wraps tokenize/eval calls for thread safety.
LLama/LLamaInteractExecutor.cs MTMD execution changes, state persistence rejection for multimodal, pending prompt logic changes.
LLama/LLamaInstructExecutor.cs MTMD execution changes, state persistence rejection for multimodal, pending prompt logic changes.
LLama/ChatSession.cs Blocks session persistence APIs for multimodal sessions, refactors stateful executor access.
LLama/AntipromptProcessor.cs Uses StringComparison.Ordinal for antiprompt matching.
LLama.Web/wwwroot/js/sessionConnectionChat.js Adds attachment uploads, download status UI, and streaming markdown rendering.
LLama.Web/libman.json Adds offline web libs for markdown rendering (markdown-it plugins, katex, mermaid).
LLama.Web/appsettings.json Updates model list to downloadable models and adds mmproj paths/URLs + new defaults.
LLama.Web/_Imports.razor New shared imports for Blazor components/services.
LLama.Web/Shared/MainLayout.razor Adds Blazor main layout wrapper.
LLama.Web/Services/ModelSessionService.cs Adds attachment-aware prompt preparation + embeds, capabilities API, history handling.
LLama.Web/Services/ModelService.cs Integrates model download readiness checks and normalizes UBatchSize/BatchSize.
LLama.Web/Services/ModelLoaderService.cs Starts model downloads at startup and loads models after downloads complete.
LLama.Web/Services/ModelDownloadService.cs New background download service with SignalR progress + local storage management.
LLama.Web/Services/IModelSessionService.cs Updates Infer API to PromptRequest and adds capabilities method.
LLama.Web/Services/IModelService.cs Documentation/wording cleanups.
LLama.Web/Services/IModelDownloadService.cs New interface for model download management.
LLama.Web/Services/IAttachmentService.cs New interface for attachment storage/extraction lifecycle.
LLama.Web/Services/AttachmentService.cs New attachment pipeline: validation, storage, PDF/DOCX extraction, cleanup.
LLama.Web/README.md Documents local asset storage, LibMan restore, and attachment/model download locations.
LLama.Web/Program.cs Adds Blazor + controllers, registers new services, maps endpoints, logs storage paths.
LLama.Web/Pages/_Host.cshtml Adds Blazor server host page.
LLama.Web/Pages/Shared/_Parameters.cshtml Updates parameter binding to sampling pipeline fields.
LLama.Web/Pages/Shared/_Layout.cshtml Updates layout to load offline markdown/diagram libs and Blazor runtime.
LLama.Web/Pages/Shared/_ChatTemplates.cshtml Templates updated for markdown styling + attachment display.
LLama.Web/Pages/Index.cshtml.cs Removed legacy Razor Pages index model.
LLama.Web/Pages/Index.cshtml Removed legacy Razor Pages chat UI.
LLama.Web/Models/StorageInfo.cs New model for storage path UI info.
LLama.Web/Models/PromptRequest.cs New prompt request model including attachment IDs.
LLama.Web/Models/ModelSession.cs Major session refactor: template-based prompts, history, multimodal capability exposure, logging.
LLama.Web/Models/ModelDownloadStatus.cs New download snapshot/progress models and enums.
LLama.Web/Models/ModelCapabilities.cs New model capability DTO.
LLama.Web/Models/MemoryBrowserFile.cs In-memory IBrowserFile implementation.
LLama.Web/Models/LLamaModel.cs Loads MTMD mmproj weights when configured and disposes them.
LLama.Web/Models/AttachmentInfo.cs New attachment metadata + upload result models.
LLama.Web/LLama.Web.csproj Adds LibMan build integration and PdfPig dependency.
LLama.Web/Hubs/SessionConnectionHub.cs Adds download snapshot + storage info broadcasts; prompt now accepts PromptRequest; cleans up attachments on disconnect.
LLama.Web/Hubs/ISessionClient.cs Adds SignalR client methods for download progress/snapshots and storage info.
LLama.Web/Extensions.cs Comment/formatting cleanups for CSV/list helpers.
LLama.Web/Controllers/AttachmentController.cs New attachments API endpoints for upload + download.
LLama.Web/Common/ModelOptions.cs Adds model/mmproj download URL fields and default pooling type.
LLama.Web/Common/ModelLoadType.cs Comment cleanup.
LLama.Web/Async/AsyncLock.cs Comment cleanup.
LLama.Web/Async/AsyncGuard.cs Comment cleanup.
LLama.Web/App.razor New Blazor router app shell.
LLama.Unittest/NativeLibraryConfigContainerTests.cs Adds unit test to ensure DryRun preserves loaded library outputs.
LLama.Unittest/MtmdWeightsTests.cs Refactors MTMD tests to use fixture/collection and context-per-test.
LLama.Unittest/MtmdNoCiCollection.cs Adds shared MTMD fixture and disables parallelization for these tests.
LLama.Unittest/MtmdExecutorTests.cs Refactors and adds MTMD executor behavior tests (prompt ordering, chunk handling).
LLama.Unittest/MtmdContextGuardTests.cs Adds MTMD context guard + “no state/session persistence” behavior tests.
LLama.Examples/Examples/MtmdInteractiveModeExecute.cs Updates sample for MTMD standalone embed loads and template marker antiprompt handling.
.gitignore Ignores LLama.Web offline libs and downloaded models directory.


}

[HttpPost]
[RequestSizeLimit(256_000_000)]

Copilot AI Mar 20, 2026


[RequestSizeLimit(256_000_000)] caps uploads to ~256MB, but AttachmentService uses a MaxUploadSize of 512MB for browser uploads. This mismatch can lead to confusing failures (client thinks 512MB is allowed, server rejects at 256MB). Align these limits (and document the effective max).

Suggested change
[RequestSizeLimit(256_000_000)]
// Align request size limit with AttachmentService.MaxUploadSize (512 MB).
[RequestSizeLimit(512_000_000)]

Comment on lines +20 to +29
<script src="~/lib/katex/dist/katex.min.js"></script>
<script src="~/lib/markdown-it/dist/markdown-it.min.js"></script>
<script src="~/lib/markdown-it-task-lists/dist/markdown-it-task-lists.min.js"></script>
<script src="~/lib/markdown-it-footnote/dist/markdown-it-footnote.min.js"></script>
<script src="~/lib/markdown-it-deflist/dist/markdown-it-deflist.min.js"></script>
<script src="~/lib/markdown-it-sub/dist/markdown-it-sub.min.js"></script>
<script src="~/lib/markdown-it-sup/dist/markdown-it-sup.min.js"></script>
<script src="~/lib/markdown-it-mark/dist/markdown-it-mark.min.js"></script>
<script src="~/lib/markdown-it-emoji/dist/markdown-it-emoji.min.js"></script>
<script src="~/lib/mermaid/dist/mermaid.min.js"></script>

Copilot AI Mar 20, 2026


libman.json adds markdown-it-katex, and the JS renderer conditionally calls md.use(window.markdownitKatex), but _Layout.cshtml doesn’t load the markdown-it-katex script. As a result, KaTeX/LaTeX rendering will never activate. Either include the plugin script here or remove the unused dependency/conditional logic.

Comment on lines +39 to +42
var result = new AttachmentUploadResult();
var storage = _attachments.GetOrAdd(connectionId, _ => new ConcurrentDictionary<string, AttachmentInfo>());
var root = Path.Combine(_uploadsRoot, connectionId);
Directory.CreateDirectory(root);

Copilot AI Mar 20, 2026


connectionId is used directly to build filesystem paths (e.g., Path.Combine(_uploadsRoot, connectionId)), but the controller accepts connectionId from the client. This allows path traversal and writing outside the uploads root if a malicious value contains path separators or `..` segments. Sanitize/validate connectionId (e.g., restrict it to a safe character set and reject path separators) and/or resolve the combined path and verify it stays under _uploadsRoot before creating directories or writing files.
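A minimal sketch of that validation, as a hedged example; the names (ConnectionIdGuard, TryResolve) are illustrative and not part of the PR:

```csharp
// Hypothetical sketch: validate a client-supplied connectionId and confine
// the resolved path to the uploads root before touching the filesystem.
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class ConnectionIdGuard
{
    // SignalR connection ids are URL-safe; restrict to a conservative character set.
    private static readonly Regex SafeId = new(@"^[A-Za-z0-9_-]{1,128}$", RegexOptions.Compiled);

    public static bool TryResolve(string uploadsRoot, string connectionId, out string resolved)
    {
        resolved = string.Empty;
        if (string.IsNullOrEmpty(connectionId) || !SafeId.IsMatch(connectionId))
            return false;

        var root = Path.GetFullPath(uploadsRoot);
        var candidate = Path.GetFullPath(Path.Combine(root, connectionId));

        // Defense in depth: even after the regex, reject anything that escapes the root.
        if (!candidate.StartsWith(root + Path.DirectorySeparatorChar, StringComparison.Ordinal))
            return false;

        resolved = candidate;
        return true;
    }
}
```

The same helper would naturally be reused by the cleanup path, so the regex and the prefix check live in one place.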

Comment on lines +185 to +190
if (_attachments.TryRemove(connectionId, out _))
{
var root = Path.Combine(_uploadsRoot, connectionId);
if (Directory.Exists(root))
Directory.Delete(root, recursive: true);
}

Copilot AI Mar 20, 2026


CleanupAsync deletes Path.Combine(_uploadsRoot, connectionId) recursively. Since connectionId is client-controlled, this can be abused to delete arbitrary directories if path traversal is possible. After validating connectionId (as noted earlier), also ensure the computed directory is under _uploadsRoot before deleting.
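A hedged sketch of the guarded delete, assuming the same uploads-root layout; UploadCleanup and its method name are hypothetical:

```csharp
// Hypothetical sketch: delete a per-connection upload directory only after
// confirming the resolved path is strictly inside the uploads root.
using System;
using System.IO;

public static class UploadCleanup
{
    public static bool TryDeleteConnectionDir(string uploadsRoot, string connectionId)
    {
        var root = Path.GetFullPath(uploadsRoot);
        var target = Path.GetFullPath(Path.Combine(root, connectionId));

        // Never delete the root itself or anything outside it.
        if (!target.StartsWith(root + Path.DirectorySeparatorChar, StringComparison.Ordinal))
            return false;

        if (Directory.Exists(target))
            Directory.Delete(target, recursive: true);
        return true;
    }
}
```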

Comment on lines +263 to +275
private static void ValidateUploads(IEnumerable<IFormFile> files)
{
var invalid = files
.Where(file => file != null)
.Where(file => !IsAllowedUpload(file.ContentType?.ToLowerInvariant() ?? string.Empty, Path.GetExtension(file.FileName).ToLowerInvariant()))
.Select(file => file.FileName)
.ToList();

if (invalid.Count == 0)
return;

throw new InvalidOperationException($"Unsupported files: {string.Join(", ", invalid)}. Use PDF, DOCX, or images.");
}

Copilot AI Mar 20, 2026


The upload validation error message says "Use PDF, DOCX, or images." but audio files are also allowed by IsAllowedUpload (audio/* and common audio extensions). Update the message so it matches the actual accepted file types (and consider listing audio explicitly).

Comment on lines +44 to +56
foreach (var file in files)
{
if (file == null || file.Length == 0)
continue;

var id = Guid.NewGuid().ToString("N");
var safeName = Path.GetFileName(file.FileName);
var filePath = Path.Combine(root, $"{id}-{safeName}");

await using (var stream = new FileStream(filePath, FileMode.Create, FileAccess.Write, FileShare.None, 81920, useAsync: true))
{
await file.CopyToAsync(stream, cancellationToken);
}

Copilot AI Mar 20, 2026


MaxUploadSize is enforced for IBrowserFile uploads via OpenReadStream(maxAllowedSize: MaxUploadSize), but IFormFile uploads are not size-limited (beyond whatever server limits apply). To avoid unexpected large uploads/DoS, enforce file.Length <= MaxUploadSize for IFormFile as well (either in ValidateUploads or inside the foreach).
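One possible shape for that size check, shown on plain (name, length) pairs so it runs standalone rather than against IFormFile; the names and the 512 MB constant mirror the comment but are illustrative:

```csharp
// Hypothetical sketch: apply the same size cap to server-side uploads.
// In the real service this would iterate IFormFile instances and use file.Length.
using System;
using System.Collections.Generic;
using System.Linq;

public static class UploadSizeGuard
{
    public const long MaxUploadSize = 512L * 1024 * 1024; // keep in sync with the browser-side limit

    public static void ValidateSizes(IEnumerable<(string Name, long Length)> files)
    {
        var tooLarge = files.Where(f => f.Length > MaxUploadSize)
                            .Select(f => f.Name)
                            .ToList();
        if (tooLarge.Count > 0)
            throw new InvalidOperationException(
                $"Files exceed the {MaxUploadSize / (1024 * 1024)} MB limit: {string.Join(", ", tooLarge)}");
    }
}
```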

Comment on lines +18 to +31
[HttpPost]
[RequestSizeLimit(256_000_000)]
public async Task<ActionResult<AttachmentUploadResult>> Upload([FromForm] string connectionId, [FromForm] List<IFormFile> files, CancellationToken cancellationToken)
{
if (string.IsNullOrWhiteSpace(connectionId))
return BadRequest("Missing connectionId.");

if (files is null || files.Count == 0)
return BadRequest("No files provided.");

try
{
var result = await _attachmentService.SaveAsync(connectionId, files, cancellationToken);
return Ok(result);

Copilot AI Mar 20, 2026


The attachments API trusts a client-supplied connectionId to decide where files are stored. Without validating that this connectionId actually belongs to the caller (or is even well-formed), a client can upload into another session’s namespace. Consider deriving the session identifier server-side (e.g., from auth/session state) or issuing a per-connection upload token, and at minimum validate/sanitize the connectionId value before passing it to the attachment service.

@SignalRT
Collaborator Author

> One thing that I'm not sure about is the media queue in the SafeMtmdModelHandle. Why is it an implicit queue instead of an explicit parameter passed into the tokenize call?
>
> Alternatively, if it is necessary for some reason, could it be moved up one layer into the MtmdModel, instead of SafeMtmdModelHandle? That way the SafeHandle remains a minimal wrapper around llama.cpp, with additional behaviour added for convenience at the higher level wrapper.

That's a convenience API.

So my preference would be:

  1. Keep explicit media passing as the primary API.
  2. Treat the implicit queue as optional convenience only.
  3. Move that convenience up out of SafeMtmdModelHandle if we keep it at all.
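A sketch of what the explicit-parameter shape could look like (illustrative only; these interfaces are not the actual LLamaSharp API):

```csharp
// Illustrative sketch of the design being discussed: media passed explicitly
// to tokenize, with the implicit queue kept, if at all, as a thin convenience
// wrapper one layer above the safe handle.
using System.Collections.Generic;

public interface IMtmdEmbed { }

public interface IMtmdTokenizer
{
    // Primary API: the caller supplies the media for this prompt explicitly.
    IReadOnlyList<int> Tokenize(string prompt, IReadOnlyList<IMtmdEmbed> media);
}

// Optional convenience at the higher-level wrapper (e.g. an MtmdWeights-style
// class), keeping the low-level safe handle a minimal llama.cpp wrapper.
public sealed class QueuedTokenizer
{
    private readonly IMtmdTokenizer _inner;
    private readonly List<IMtmdEmbed> _pending = new();

    public QueuedTokenizer(IMtmdTokenizer inner) => _inner = inner;

    public void Enqueue(IMtmdEmbed embed) => _pending.Add(embed);

    public IReadOnlyList<int> Tokenize(string prompt)
    {
        var media = _pending.ToArray();
        _pending.Clear(); // queue is consumed per call, mirroring the implicit behaviour
        return _inner.Tokenize(prompt, media);
    }
}
```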
