
End‑to‑end multimodal chat with document parsing, media uploads, audio recording, and streaming markdown rendering#1316

Open
SignalRT wants to merge 7 commits into SciSharp:master from SignalRT:WebReview

Conversation

@SignalRT
Collaborator

@SignalRT SignalRT commented Jan 19, 2026

Summary:
This PR delivers a full multimodal chat pipeline in LLama.Web: PDF and Word document ingestion with text extraction, image and audio uploads, native in‑browser audio recording (preview/attach/discard), plus streaming response
rendering with Markdown support.

Key Features:

  • Streaming chat responses rendered incrementally.
  • Markdown rendering in the UI (including code blocks, lists, etc.).
  • Multimodal inference pipeline with MTMD support wired into session execution.
  • PDF ingestion with text extraction and truncation safeguards.
  • Word (DOCX) ingestion with text extraction from document XML.
  • Image uploads supported end‑to‑end (validation, storage, rendering in chat).
  • Audio uploads supported end‑to‑end (validation, storage, playback in chat).
  • In‑browser audio recording (MediaRecorder) with preview + attach/discard workflow.
  • Capability‑aware UI (shows whether text/vision/audio are supported per model).
  • Automatic model downloads with progress reporting.

Implementation Highlights

  • Attachment service handles file validation, storage, and extraction (PDF/DOCX).
  • Model session builds prompts with attached media and enforces capability checks.
  • Chat UI renders images/audio and guides users on supported inputs.
  • Captures audio and converts it to a browser file for existing upload flow.
  • Streaming tokens update the UI while Markdown is rendered on the fly.

Capability to upload images and ask about the images

(screenshot)

Model auto-download + Capability to upload files and ask about the files
(screenshot)

- Reworked MTMD prompt handling to preserve text/media ordering and evaluate multimodal input incrementally.
- Disabled unsupported multimodal features such as session persistence and context shifting.
- Added standalone MTMD media loading and synchronized MTMD weight operations.
- Updated MTMD example and tests to cover prompt ordering, guards, and opt-in NoCI execution.
- Fixed web model/session defaults for multimodal models, including template-derived stop markers and unspecified pooling.
- Improved LLama.Web audio attachment/recording flow, Qwen audio prompt handling, and chat composer UX.
- Removed the broken browser script include and added a safe markdown fallback.
- Some cleanup and documentation changes; only the MTMD doc was updated. I think we should regenerate all the docs, but I'm not sure.
- Stop and load the model on change.
- Solved an issue with the ENTER key.
@SignalRT SignalRT marked this pull request as ready for review March 20, 2026 23:18
@martindevans
Member

martindevans commented Mar 20, 2026

One thing that I'm not sure about is the media queue in the SafeMtmdModelHandle. Why is it an implicit queue instead of an explicit parameter passed into the tokenize call?

Alternatively, if it is necessary for some reason, could it be moved up one layer into the MtmdModel, instead of SafeMtmdModelHandle? That way the SafeHandle remains a minimal wrapper around llama.cpp, with additional behaviour added for convenience at the higher level wrapper.

@martindevans
Member

Other than that one comment, looks good to me!

Contributor

Copilot AI left a comment


Pull request overview

This PR modernizes LLamaSharp’s multimodal support by migrating from LLava to MTMD, and substantially upgrades LLama.Web to support end-to-end multimodal chat (attachments, uploads, streaming markdown rendering) plus automatic model downloads with progress reporting.

Changes:

  • Replace LLava types/APIs/docs with MTMD equivalents (MtmdWeights, SafeMtmd* handles, executor multimodal plumbing).
  • Add LLama.Web pipeline: attachment upload + extraction (PDF/DOCX), media embeddings (image/audio), streaming UI rendering with markdown/mermaid, and capability-aware behavior.
  • Add model auto-download service with SignalR progress updates and corresponding UI/status wiring.

Reviewed changes

Copilot reviewed 77 out of 78 changed files in this pull request and generated 7 comments.

File Description
mkdocs.yml Updates documentation navigation to MTMD docs (removes LLava entries).
docs/xmldocs/llama.statelessexecutor.md Docs update for MTMD properties (ClipModel, Embeds).
docs/xmldocs/llama.native.safemtmdmodelhandle.md New generated docs for MTMD safe handle API.
docs/xmldocs/llama.native.safemtmdinputchunks.md New generated docs for MTMD input chunks wrapper.
docs/xmldocs/llama.native.safemtmdinputchunk.md New generated docs for MTMD input chunk wrapper.
docs/xmldocs/llama.native.safemtmdembed.md New generated docs for MTMD embed wrapper.
docs/xmldocs/llama.native.nativelibraryconfigcontainer.md Docs: rename LLava params to MTMD, fix AVX wording, update DryRun signature docs.
docs/xmldocs/llama.native.mtmdcontextparams.md New generated docs for MTMD context params.
docs/xmldocs/llama.mtmdweights.md New generated docs for MtmdWeights.
docs/xmldocs/llama.interactiveexecutor.md Docs update: MTMD fields, cancellation tokens, antiprompt processor, state limitations, embeds.
docs/xmldocs/llama.instructexecutor.md Docs update mirroring interactive executor changes for MTMD + cancellation tokens.
docs/xmldocs/llama.batched.conversation.md Docs update: add MTMD prompt overloads and remove LLava image embed overload.
docs/xmldocs/llama.batched.batchedexecutor.md Docs update: add MTMD clip model support.
docs/xmldocs/llama.abstractions.illamaexecutor.md Docs update: ClipModel/Embeds now MTMD types.
docs/xmldocs/index.md Docs index updated for MTMD types and removes LLava references.
docs/Tutorials/NativeLibraryConfig.md Tutorial updated for MTMD library configuration.
docs/Tutorials/Executors.md Tutorial updated for MTMD fields + state persistence limitations for multimodal executors.
docs/QuickStart.md QuickStart updated with MTMD example and embed loading flow.
docs/Examples/MtmdInteractiveModeExecute.md Example docs updated from SafeMtmdWeights/single-brace paths to MtmdWeights/double-brace paths.
LLama/Native/SafeMtmdModelHandle.cs Adds standalone embed creation APIs and refactors load methods to use them.
LLama/Native/Load/NativeLibraryConfig.cs Fixes DryRun out params initialization/behavior and documents outputs.
LLama/MtmdWeights.cs Adds locking and new standalone media load APIs; wraps tokenize/eval calls for thread safety.
LLama/LLamaInteractExecutor.cs MTMD execution changes, state persistence rejection for multimodal, pending prompt logic changes.
LLama/LLamaInstructExecutor.cs MTMD execution changes, state persistence rejection for multimodal, pending prompt logic changes.
LLama/ChatSession.cs Blocks session persistence APIs for multimodal sessions, refactors stateful executor access.
LLama/AntipromptProcessor.cs Uses StringComparison.Ordinal for antiprompt matching.
LLama.Web/wwwroot/js/sessionConnectionChat.js Adds attachment uploads, download status UI, and streaming markdown rendering.
LLama.Web/libman.json Adds offline web libs for markdown rendering (markdown-it plugins, katex, mermaid).
LLama.Web/appsettings.json Updates model list to downloadable models and adds mmproj paths/URLs + new defaults.
LLama.Web/_Imports.razor New shared imports for Blazor components/services.
LLama.Web/Shared/MainLayout.razor Adds Blazor main layout wrapper.
LLama.Web/Services/ModelSessionService.cs Adds attachment-aware prompt preparation + embeds, capabilities API, history handling.
LLama.Web/Services/ModelService.cs Integrates model download readiness checks and normalizes UBatchSize/BatchSize.
LLama.Web/Services/ModelLoaderService.cs Starts model downloads at startup and loads models after downloads complete.
LLama.Web/Services/ModelDownloadService.cs New background download service with SignalR progress + local storage management.
LLama.Web/Services/IModelSessionService.cs Updates Infer API to PromptRequest and adds capabilities method.
LLama.Web/Services/IModelService.cs Documentation/wording cleanups.
LLama.Web/Services/IModelDownloadService.cs New interface for model download management.
LLama.Web/Services/IAttachmentService.cs New interface for attachment storage/extraction lifecycle.
LLama.Web/Services/AttachmentService.cs New attachment pipeline: validation, storage, PDF/DOCX extraction, cleanup.
LLama.Web/README.md Documents local asset storage, LibMan restore, and attachment/model download locations.
LLama.Web/Program.cs Adds Blazor + controllers, registers new services, maps endpoints, logs storage paths.
LLama.Web/Pages/_Host.cshtml Adds Blazor server host page.
LLama.Web/Pages/Shared/_Parameters.cshtml Updates parameter binding to sampling pipeline fields.
LLama.Web/Pages/Shared/_Layout.cshtml Updates layout to load offline markdown/diagram libs and Blazor runtime.
LLama.Web/Pages/Shared/_ChatTemplates.cshtml Templates updated for markdown styling + attachment display.
LLama.Web/Pages/Index.cshtml.cs Removed legacy Razor Pages index model.
LLama.Web/Pages/Index.cshtml Removed legacy Razor Pages chat UI.
LLama.Web/Models/StorageInfo.cs New model for storage path UI info.
LLama.Web/Models/PromptRequest.cs New prompt request model including attachment IDs.
LLama.Web/Models/ModelSession.cs Major session refactor: template-based prompts, history, multimodal capability exposure, logging.
LLama.Web/Models/ModelDownloadStatus.cs New download snapshot/progress models and enums.
LLama.Web/Models/ModelCapabilities.cs New model capability DTO.
LLama.Web/Models/MemoryBrowserFile.cs In-memory IBrowserFile implementation.
LLama.Web/Models/LLamaModel.cs Loads MTMD mmproj weights when configured and disposes them.
LLama.Web/Models/AttachmentInfo.cs New attachment metadata + upload result models.
LLama.Web/LLama.Web.csproj Adds LibMan build integration and PdfPig dependency.
LLama.Web/Hubs/SessionConnectionHub.cs Adds download snapshot + storage info broadcasts; prompt now accepts PromptRequest; cleans up attachments on disconnect.
LLama.Web/Hubs/ISessionClient.cs Adds SignalR client methods for download progress/snapshots and storage info.
LLama.Web/Extensions.cs Comment/formatting cleanups for CSV/list helpers.
LLama.Web/Controllers/AttachmentController.cs New attachments API endpoints for upload + download.
LLama.Web/Common/ModelOptions.cs Adds model/mmproj download URL fields and default pooling type.
LLama.Web/Common/ModelLoadType.cs Comment cleanup.
LLama.Web/Async/AsyncLock.cs Comment cleanup.
LLama.Web/Async/AsyncGuard.cs Comment cleanup.
LLama.Web/App.razor New Blazor router app shell.
LLama.Unittest/NativeLibraryConfigContainerTests.cs Adds unit test to ensure DryRun preserves loaded library outputs.
LLama.Unittest/MtmdWeightsTests.cs Refactors MTMD tests to use fixture/collection and context-per-test.
LLama.Unittest/MtmdNoCiCollection.cs Adds shared MTMD fixture and disables parallelization for these tests.
LLama.Unittest/MtmdExecutorTests.cs Refactors and adds MTMD executor behavior tests (prompt ordering, chunk handling).
LLama.Unittest/MtmdContextGuardTests.cs Adds MTMD context guard + “no state/session persistence” behavior tests.
LLama.Examples/Examples/MtmdInteractiveModeExecute.cs Updates sample for MTMD standalone embed loads and template marker antiprompt handling.
.gitignore Ignores LLama.Web offline libs and downloaded models directory.


}

[HttpPost]
[RequestSizeLimit(256_000_000)]

Copilot AI Mar 20, 2026


[RequestSizeLimit(256_000_000)] caps uploads to ~256MB, but AttachmentService uses a MaxUploadSize of 512MB for browser uploads. This mismatch can lead to confusing failures (client thinks 512MB is allowed, server rejects at 256MB). Align these limits (and document the effective max).

Suggested change
[RequestSizeLimit(256_000_000)]
// Align request size limit with AttachmentService.MaxUploadSize (512 MB).
[RequestSizeLimit(512_000_000)]

Comment on lines +20 to +29
<script src="~/lib/katex/dist/katex.min.js"></script>
<script src="~/lib/markdown-it/dist/markdown-it.min.js"></script>
<script src="~/lib/markdown-it-task-lists/dist/markdown-it-task-lists.min.js"></script>
<script src="~/lib/markdown-it-footnote/dist/markdown-it-footnote.min.js"></script>
<script src="~/lib/markdown-it-deflist/dist/markdown-it-deflist.min.js"></script>
<script src="~/lib/markdown-it-sub/dist/markdown-it-sub.min.js"></script>
<script src="~/lib/markdown-it-sup/dist/markdown-it-sup.min.js"></script>
<script src="~/lib/markdown-it-mark/dist/markdown-it-mark.min.js"></script>
<script src="~/lib/markdown-it-emoji/dist/markdown-it-emoji.min.js"></script>
<script src="~/lib/mermaid/dist/mermaid.min.js"></script>

Copilot AI Mar 20, 2026


libman.json adds markdown-it-katex, and the JS renderer conditionally calls md.use(window.markdownitKatex), but _Layout.cshtml doesn’t load the markdown-it-katex script. As a result, KaTeX/LaTeX rendering will never activate. Either include the plugin script here or remove the unused dependency/conditional logic.

Comment on lines +39 to +42
var result = new AttachmentUploadResult();
var storage = _attachments.GetOrAdd(connectionId, _ => new ConcurrentDictionary<string, AttachmentInfo>());
var root = Path.Combine(_uploadsRoot, connectionId);
Directory.CreateDirectory(root);

Copilot AI Mar 20, 2026


connectionId is used directly to build filesystem paths (e.g., Path.Combine(_uploadsRoot, connectionId)), but the controller accepts connectionId from the client. This allows path traversal and writing outside the uploads root if a malicious value contains path separators or `..` segments. Sanitize/validate connectionId (e.g., restrict it to a safe character set and reject path separators) and/or resolve the combined path and verify it stays under _uploadsRoot before creating directories or writing files.
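A minimal sketch of that validation, as a hedged example; the names (ConnectionIdGuard, TryResolve) are illustrative and not part of the PR:

```csharp
// Hypothetical sketch: validate a client-supplied connectionId and confine
// the resolved path to the uploads root before touching the filesystem.
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class ConnectionIdGuard
{
    // SignalR connection ids are URL-safe; restrict to a conservative character set.
    private static readonly Regex SafeId = new(@"^[A-Za-z0-9_-]{1,128}$", RegexOptions.Compiled);

    public static bool TryResolve(string uploadsRoot, string connectionId, out string resolved)
    {
        resolved = string.Empty;
        if (string.IsNullOrEmpty(connectionId) || !SafeId.IsMatch(connectionId))
            return false;

        var root = Path.GetFullPath(uploadsRoot);
        var candidate = Path.GetFullPath(Path.Combine(root, connectionId));

        // Defense in depth: even after the regex, reject anything that escapes the root.
        if (!candidate.StartsWith(root + Path.DirectorySeparatorChar, StringComparison.Ordinal))
            return false;

        resolved = candidate;
        return true;
    }
}
```

The same helper would naturally be reused by the cleanup path, so the regex and the prefix check live in one place.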

Comment on lines +185 to +190
if (_attachments.TryRemove(connectionId, out _))
{
var root = Path.Combine(_uploadsRoot, connectionId);
if (Directory.Exists(root))
Directory.Delete(root, recursive: true);
}

Copilot AI Mar 20, 2026


CleanupAsync deletes Path.Combine(_uploadsRoot, connectionId) recursively. Since connectionId is client-controlled, this can be abused to delete arbitrary directories if path traversal is possible. After validating connectionId (as noted earlier), also ensure the computed directory is under _uploadsRoot before deleting.
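A hedged sketch of the guarded delete, assuming the same uploads-root layout; UploadCleanup and its method name are hypothetical:

```csharp
// Hypothetical sketch: delete a per-connection upload directory only after
// confirming the resolved path is strictly inside the uploads root.
using System;
using System.IO;

public static class UploadCleanup
{
    public static bool TryDeleteConnectionDir(string uploadsRoot, string connectionId)
    {
        var root = Path.GetFullPath(uploadsRoot);
        var target = Path.GetFullPath(Path.Combine(root, connectionId));

        // Never delete the root itself or anything outside it.
        if (!target.StartsWith(root + Path.DirectorySeparatorChar, StringComparison.Ordinal))
            return false;

        if (Directory.Exists(target))
            Directory.Delete(target, recursive: true);
        return true;
    }
}
```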

Comment on lines +263 to +275
private static void ValidateUploads(IEnumerable<IFormFile> files)
{
var invalid = files
.Where(file => file != null)
.Where(file => !IsAllowedUpload(file.ContentType?.ToLowerInvariant() ?? string.Empty, Path.GetExtension(file.FileName).ToLowerInvariant()))
.Select(file => file.FileName)
.ToList();

if (invalid.Count == 0)
return;

throw new InvalidOperationException($"Unsupported files: {string.Join(", ", invalid)}. Use PDF, DOCX, or images.");
}

Copilot AI Mar 20, 2026


The upload validation error message says "Use PDF, DOCX, or images." but audio files are also allowed by IsAllowedUpload (audio/* and common audio extensions). Update the message so it matches the actual accepted file types (and consider listing audio explicitly).

Comment on lines +44 to +56
foreach (var file in files)
{
if (file == null || file.Length == 0)
continue;

var id = Guid.NewGuid().ToString("N");
var safeName = Path.GetFileName(file.FileName);
var filePath = Path.Combine(root, $"{id}-{safeName}");

await using (var stream = new FileStream(filePath, FileMode.Create, FileAccess.Write, FileShare.None, 81920, useAsync: true))
{
await file.CopyToAsync(stream, cancellationToken);
}

Copilot AI Mar 20, 2026


MaxUploadSize is enforced for IBrowserFile uploads via OpenReadStream(maxAllowedSize: MaxUploadSize), but IFormFile uploads are not size-limited (beyond whatever server limits apply). To avoid unexpected large uploads/DoS, enforce file.Length <= MaxUploadSize for IFormFile as well (either in ValidateUploads or inside the foreach).
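One possible shape for that size check, shown on plain (name, length) pairs so it runs standalone rather than against IFormFile; the names and the 512 MB constant mirror the comment but are illustrative:

```csharp
// Hypothetical sketch: apply the same size cap to server-side uploads.
// In the real service this would iterate IFormFile instances and use file.Length.
using System;
using System.Collections.Generic;
using System.Linq;

public static class UploadSizeGuard
{
    public const long MaxUploadSize = 512L * 1024 * 1024; // keep in sync with the browser-side limit

    public static void ValidateSizes(IEnumerable<(string Name, long Length)> files)
    {
        var tooLarge = files.Where(f => f.Length > MaxUploadSize)
                            .Select(f => f.Name)
                            .ToList();
        if (tooLarge.Count > 0)
            throw new InvalidOperationException(
                $"Files exceed the {MaxUploadSize / (1024 * 1024)} MB limit: {string.Join(", ", tooLarge)}");
    }
}
```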

Comment on lines +18 to +31
[HttpPost]
[RequestSizeLimit(256_000_000)]
public async Task<ActionResult<AttachmentUploadResult>> Upload([FromForm] string connectionId, [FromForm] List<IFormFile> files, CancellationToken cancellationToken)
{
if (string.IsNullOrWhiteSpace(connectionId))
return BadRequest("Missing connectionId.");

if (files is null || files.Count == 0)
return BadRequest("No files provided.");

try
{
var result = await _attachmentService.SaveAsync(connectionId, files, cancellationToken);
return Ok(result);

Copilot AI Mar 20, 2026


The attachments API trusts a client-supplied connectionId to decide where files are stored. Without validating that this connectionId actually belongs to the caller (or is even well-formed), a client can upload into another session’s namespace. Consider deriving the session identifier server-side (e.g., from auth/session state) or issuing a per-connection upload token, and at minimum validate/sanitize the connectionId value before passing it to the attachment service.

@SignalRT
Collaborator Author

> One thing that I'm not sure about is the media queue in the SafeMtmdModelHandle. Why is it an implicit queue instead of an explicit parameter passed into the tokenize call?
>
> Alternatively, if it is necessary for some reason, could it be moved up one layer into the MtmdModel, instead of SafeMtmdModelHandle? That way the SafeHandle remains a minimal wrapper around llama.cpp, with additional behaviour added for convenience at the higher level wrapper.

That's a convenience API.

So my preference would be:

  1. Keep explicit media passing as the primary API.
  2. Treat the implicit queue as optional convenience only.
  3. Move that convenience up out of SafeMtmdModelHandle if we keep it at all.
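A sketch of what the explicit-parameter shape could look like (illustrative only; these interfaces are not the actual LLamaSharp API):

```csharp
// Illustrative sketch of the design being discussed: media passed explicitly
// to tokenize, with the implicit queue kept, if at all, as a thin convenience
// wrapper one layer above the safe handle.
using System.Collections.Generic;

public interface IMtmdEmbed { }

public interface IMtmdTokenizer
{
    // Primary API: the caller supplies the media for this prompt explicitly.
    IReadOnlyList<int> Tokenize(string prompt, IReadOnlyList<IMtmdEmbed> media);
}

// Optional convenience at the higher-level wrapper (e.g. an MtmdWeights-style
// class), keeping the low-level safe handle a minimal llama.cpp wrapper.
public sealed class QueuedTokenizer
{
    private readonly IMtmdTokenizer _inner;
    private readonly List<IMtmdEmbed> _pending = new();

    public QueuedTokenizer(IMtmdTokenizer inner) => _inner = inner;

    public void Enqueue(IMtmdEmbed embed) => _pending.Add(embed);

    public IReadOnlyList<int> Tokenize(string prompt)
    {
        var media = _pending.ToArray();
        _pending.Clear(); // queue is consumed per call, mirroring the implicit behaviour
        return _inner.Tokenize(prompt, media);
    }
}
```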
