End-to-end multimodal chat with document parsing, media uploads, audio recording, and streaming markdown rendering #1316
SignalRT wants to merge 7 commits into SciSharp:master
Initial version
- Reworked MTMD prompt handling to preserve text/media ordering and evaluate multimodal input incrementally.
- Disabled unsupported multimodal features such as session persistence and context shifting.
- Added standalone MTMD media loading and synchronized MTMD weight operations.
- Updated MTMD example and tests to cover prompt ordering, guards, and opt-in NoCI execution.
- Fixed web model/session defaults for multimodal models, including template-derived stop markers and unspecified pooling.
- Improved LLama.Web audio attachment/recording flow, Qwen audio prompt handling, and chat composer UX.
- Removed the broken browser script include and added a safe markdown fallback.
Some cleanup and documentation changes. Only the MTMD docs are updated. I think we should regenerate all the docs, but I'm not sure.
Stop and load the model on change
Solve issue with the ENTER key
One thing that I'm not sure about is the media queue in the Alternatively, if it is necessary for some reason, could it be moved up one layer into the MtmdModel, instead of SafeModelModelHandle? That way the SafeHandle remains a minimal wrapper around llama.cpp, with additional behaviour added for convenience at the higher level wrapper.

Other than that one comment, looks good to me!
Pull request overview
This PR modernizes LLamaSharp’s multimodal support by migrating from LLava to MTMD, and substantially upgrades LLama.Web to support end-to-end multimodal chat (attachments, uploads, streaming markdown rendering) plus automatic model downloads with progress reporting.
Changes:
- Replace LLava types/APIs/docs with MTMD equivalents (MtmdWeights, SafeMtmd* handles, executor multimodal plumbing).
- Add LLama.Web pipeline: attachment upload + extraction (PDF/DOCX), media embeddings (image/audio), streaming UI rendering with markdown/mermaid, and capability-aware behavior.
- Add model auto-download service with SignalR progress updates and corresponding UI/status wiring.
Reviewed changes
Copilot reviewed 77 out of 78 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| mkdocs.yml | Updates documentation navigation to MTMD docs (removes LLava entries). |
| docs/xmldocs/llama.statelessexecutor.md | Docs update for MTMD properties (ClipModel, Embeds). |
| docs/xmldocs/llama.native.safemtmdmodelhandle.md | New generated docs for MTMD safe handle API. |
| docs/xmldocs/llama.native.safemtmdinputchunks.md | New generated docs for MTMD input chunks wrapper. |
| docs/xmldocs/llama.native.safemtmdinputchunk.md | New generated docs for MTMD input chunk wrapper. |
| docs/xmldocs/llama.native.safemtmdembed.md | New generated docs for MTMD embed wrapper. |
| docs/xmldocs/llama.native.nativelibraryconfigcontainer.md | Docs: rename LLava params to MTMD, fix AVX wording, update DryRun signature docs. |
| docs/xmldocs/llama.native.mtmdcontextparams.md | New generated docs for MTMD context params. |
| docs/xmldocs/llama.mtmdweights.md | New generated docs for MtmdWeights. |
| docs/xmldocs/llama.interactiveexecutor.md | Docs update: MTMD fields, cancellation tokens, antiprompt processor, state limitations, embeds. |
| docs/xmldocs/llama.instructexecutor.md | Docs update mirroring interactive executor changes for MTMD + cancellation tokens. |
| docs/xmldocs/llama.batched.conversation.md | Docs update: add MTMD prompt overloads and remove LLava image embed overload. |
| docs/xmldocs/llama.batched.batchedexecutor.md | Docs update: add MTMD clip model support. |
| docs/xmldocs/llama.abstractions.illamaexecutor.md | Docs update: ClipModel/Embeds now MTMD types. |
| docs/xmldocs/index.md | Docs index updated for MTMD types and removes LLava references. |
| docs/Tutorials/NativeLibraryConfig.md | Tutorial updated for MTMD library configuration. |
| docs/Tutorials/Executors.md | Tutorial updated for MTMD fields + state persistence limitations for multimodal executors. |
| docs/QuickStart.md | QuickStart updated with MTMD example and embed loading flow. |
| docs/Examples/MtmdInteractiveModeExecute.md | Example docs updated from SafeMtmdWeights/single-brace paths to MtmdWeights/double-brace paths. |
| LLama/Native/SafeMtmdModelHandle.cs | Adds standalone embed creation APIs and refactors load methods to use them. |
| LLama/Native/Load/NativeLibraryConfig.cs | Fixes DryRun out params initialization/behavior and documents outputs. |
| LLama/MtmdWeights.cs | Adds locking and new standalone media load APIs; wraps tokenize/eval calls for thread safety. |
| LLama/LLamaInteractExecutor.cs | MTMD execution changes, state persistence rejection for multimodal, pending prompt logic changes. |
| LLama/LLamaInstructExecutor.cs | MTMD execution changes, state persistence rejection for multimodal, pending prompt logic changes. |
| LLama/ChatSession.cs | Blocks session persistence APIs for multimodal sessions, refactors stateful executor access. |
| LLama/AntipromptProcessor.cs | Uses StringComparison.Ordinal for antiprompt matching. |
| LLama.Web/wwwroot/js/sessionConnectionChat.js | Adds attachment uploads, download status UI, and streaming markdown rendering. |
| LLama.Web/libman.json | Adds offline web libs for markdown rendering (markdown-it plugins, katex, mermaid). |
| LLama.Web/appsettings.json | Updates model list to downloadable models and adds mmproj paths/URLs + new defaults. |
| LLama.Web/_Imports.razor | New shared imports for Blazor components/services. |
| LLama.Web/Shared/MainLayout.razor | Adds Blazor main layout wrapper. |
| LLama.Web/Services/ModelSessionService.cs | Adds attachment-aware prompt preparation + embeds, capabilities API, history handling. |
| LLama.Web/Services/ModelService.cs | Integrates model download readiness checks and normalizes UBatchSize/BatchSize. |
| LLama.Web/Services/ModelLoaderService.cs | Starts model downloads at startup and loads models after downloads complete. |
| LLama.Web/Services/ModelDownloadService.cs | New background download service with SignalR progress + local storage management. |
| LLama.Web/Services/IModelSessionService.cs | Updates Infer API to PromptRequest and adds capabilities method. |
| LLama.Web/Services/IModelService.cs | Documentation/wording cleanups. |
| LLama.Web/Services/IModelDownloadService.cs | New interface for model download management. |
| LLama.Web/Services/IAttachmentService.cs | New interface for attachment storage/extraction lifecycle. |
| LLama.Web/Services/AttachmentService.cs | New attachment pipeline: validation, storage, PDF/DOCX extraction, cleanup. |
| LLama.Web/README.md | Documents local asset storage, LibMan restore, and attachment/model download locations. |
| LLama.Web/Program.cs | Adds Blazor + controllers, registers new services, maps endpoints, logs storage paths. |
| LLama.Web/Pages/_Host.cshtml | Adds Blazor server host page. |
| LLama.Web/Pages/Shared/_Parameters.cshtml | Updates parameter binding to sampling pipeline fields. |
| LLama.Web/Pages/Shared/_Layout.cshtml | Updates layout to load offline markdown/diagram libs and Blazor runtime. |
| LLama.Web/Pages/Shared/_ChatTemplates.cshtml | Templates updated for markdown styling + attachment display. |
| LLama.Web/Pages/Index.cshtml.cs | Removed legacy Razor Pages index model. |
| LLama.Web/Pages/Index.cshtml | Removed legacy Razor Pages chat UI. |
| LLama.Web/Models/StorageInfo.cs | New model for storage path UI info. |
| LLama.Web/Models/PromptRequest.cs | New prompt request model including attachment IDs. |
| LLama.Web/Models/ModelSession.cs | Major session refactor: template-based prompts, history, multimodal capability exposure, logging. |
| LLama.Web/Models/ModelDownloadStatus.cs | New download snapshot/progress models and enums. |
| LLama.Web/Models/ModelCapabilities.cs | New model capability DTO. |
| LLama.Web/Models/MemoryBrowserFile.cs | In-memory IBrowserFile implementation. |
| LLama.Web/Models/LLamaModel.cs | Loads MTMD mmproj weights when configured and disposes them. |
| LLama.Web/Models/AttachmentInfo.cs | New attachment metadata + upload result models. |
| LLama.Web/LLama.Web.csproj | Adds LibMan build integration and PdfPig dependency. |
| LLama.Web/Hubs/SessionConnectionHub.cs | Adds download snapshot + storage info broadcasts; prompt now accepts PromptRequest; cleans up attachments on disconnect. |
| LLama.Web/Hubs/ISessionClient.cs | Adds SignalR client methods for download progress/snapshots and storage info. |
| LLama.Web/Extensions.cs | Comment/formatting cleanups for CSV/list helpers. |
| LLama.Web/Controllers/AttachmentController.cs | New attachments API endpoints for upload + download. |
| LLama.Web/Common/ModelOptions.cs | Adds model/mmproj download URL fields and default pooling type. |
| LLama.Web/Common/ModelLoadType.cs | Comment cleanup. |
| LLama.Web/Async/AsyncLock.cs | Comment cleanup. |
| LLama.Web/Async/AsyncGuard.cs | Comment cleanup. |
| LLama.Web/App.razor | New Blazor router app shell. |
| LLama.Unittest/NativeLibraryConfigContainerTests.cs | Adds unit test to ensure DryRun preserves loaded library outputs. |
| LLama.Unittest/MtmdWeightsTests.cs | Refactors MTMD tests to use fixture/collection and context-per-test. |
| LLama.Unittest/MtmdNoCiCollection.cs | Adds shared MTMD fixture and disables parallelization for these tests. |
| LLama.Unittest/MtmdExecutorTests.cs | Refactors and adds MTMD executor behavior tests (prompt ordering, chunk handling). |
| LLama.Unittest/MtmdContextGuardTests.cs | Adds MTMD context guard + “no state/session persistence” behavior tests. |
| LLama.Examples/Examples/MtmdInteractiveModeExecute.cs | Updates sample for MTMD standalone embed loads and template marker antiprompt handling. |
| .gitignore | Ignores LLama.Web offline libs and downloaded models directory. |
```csharp
}

[HttpPost]
[RequestSizeLimit(256_000_000)]
```
[RequestSizeLimit(256_000_000)] caps uploads to ~256MB, but AttachmentService uses a MaxUploadSize of 512MB for browser uploads. This mismatch can lead to confusing failures (client thinks 512MB is allowed, server rejects at 256MB). Align these limits (and document the effective max).
```diff
- [RequestSizeLimit(256_000_000)]
+ // Align request size limit with AttachmentService.MaxUploadSize (512 MB).
+ [RequestSizeLimit(512_000_000)]
```
```html
<script src="~/lib/katex/dist/katex.min.js"></script>
<script src="~/lib/markdown-it/dist/markdown-it.min.js"></script>
<script src="~/lib/markdown-it-task-lists/dist/markdown-it-task-lists.min.js"></script>
<script src="~/lib/markdown-it-footnote/dist/markdown-it-footnote.min.js"></script>
<script src="~/lib/markdown-it-deflist/dist/markdown-it-deflist.min.js"></script>
<script src="~/lib/markdown-it-sub/dist/markdown-it-sub.min.js"></script>
<script src="~/lib/markdown-it-sup/dist/markdown-it-sup.min.js"></script>
<script src="~/lib/markdown-it-mark/dist/markdown-it-mark.min.js"></script>
<script src="~/lib/markdown-it-emoji/dist/markdown-it-emoji.min.js"></script>
<script src="~/lib/mermaid/dist/mermaid.min.js"></script>
```
libman.json adds markdown-it-katex, and the JS renderer conditionally calls md.use(window.markdownitKatex), but _Layout.cshtml doesn’t load the markdown-it-katex script. As a result, KaTeX/LaTeX rendering will never activate. Either include the plugin script here or remove the unused dependency/conditional logic.
```csharp
var result = new AttachmentUploadResult();
var storage = _attachments.GetOrAdd(connectionId, _ => new ConcurrentDictionary<string, AttachmentInfo>());
var root = Path.Combine(_uploadsRoot, connectionId);
Directory.CreateDirectory(root);
```
connectionId is used directly to build filesystem paths (e.g., Path.Combine(_uploadsRoot, connectionId)), but the controller accepts connectionId from the client. This allows path traversal / writing outside the uploads root if a malicious value contains path separators or "..". Sanitize/validate connectionId (e.g., restrict to a safe character set and reject path separators) and/or resolve the combined path and verify it stays under _uploadsRoot before creating directories or writing files.
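The validation suggested in this comment could look roughly like the sketch below. `UploadPathGuard` and `TryResolveConnectionRoot` are hypothetical names that do not appear in the PR, and the allow-list (ASCII letters, digits, `-`, `_`, at most 64 characters) is an assumption about what a connection identifier looks like, not something the source states.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the PR: validates a client-supplied
// connectionId and resolves its upload directory under a fixed root.
public static class UploadPathGuard
{
    // Assumed allow-list: ASCII letters, digits, '-' and '_'.
    private static bool IsSafeChar(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9') || c == '-' || c == '_';

    public static bool TryResolveConnectionRoot(string uploadsRoot, string connectionId, out string resolved)
    {
        resolved = string.Empty;

        // Step 1: reject empty, overly long, or non-allow-listed identifiers.
        // This already excludes path separators and "..".
        if (string.IsNullOrEmpty(connectionId) || connectionId.Length > 64)
            return false;
        if (!connectionId.All(IsSafeChar))
            return false;

        // Step 2: defense in depth. Resolve the combined path and confirm
        // it is still located under the uploads root.
        var rootFull = Path.GetFullPath(uploadsRoot);
        var prefix = rootFull.EndsWith(Path.DirectorySeparatorChar)
            ? rootFull
            : rootFull + Path.DirectorySeparatorChar;
        var candidate = Path.GetFullPath(Path.Combine(rootFull, connectionId));
        if (!candidate.StartsWith(prefix, StringComparison.Ordinal))
            return false;

        resolved = candidate;
        return true;
    }
}
```

Under these assumptions, the same check would run before both `Directory.CreateDirectory` in the save path and `Directory.Delete` in the cleanup path, so neither operation ever sees an unvalidated identifier.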
```csharp
if (_attachments.TryRemove(connectionId, out _))
{
    var root = Path.Combine(_uploadsRoot, connectionId);
    if (Directory.Exists(root))
        Directory.Delete(root, recursive: true);
}
```
CleanupAsync deletes Path.Combine(_uploadsRoot, connectionId) recursively. Since connectionId is client-controlled, this can be abused to delete arbitrary directories if path traversal is possible. After validating connectionId (as noted earlier), also ensure the computed directory is under _uploadsRoot before deleting.
```csharp
private static void ValidateUploads(IEnumerable<IFormFile> files)
{
    var invalid = files
        .Where(file => file != null)
        .Where(file => !IsAllowedUpload(file.ContentType?.ToLowerInvariant() ?? string.Empty, Path.GetExtension(file.FileName).ToLowerInvariant()))
        .Select(file => file.FileName)
        .ToList();

    if (invalid.Count == 0)
        return;

    throw new InvalidOperationException($"Unsupported files: {string.Join(", ", invalid)}. Use PDF, DOCX, or images.");
}
```
The upload validation error message says "Use PDF, DOCX, or images." but audio files are also allowed by IsAllowedUpload (audio/* and common audio extensions). Update the message so it matches the actual accepted file types (and consider listing audio explicitly).
```csharp
foreach (var file in files)
{
    if (file == null || file.Length == 0)
        continue;

    var id = Guid.NewGuid().ToString("N");
    var safeName = Path.GetFileName(file.FileName);
    var filePath = Path.Combine(root, $"{id}-{safeName}");

    await using (var stream = new FileStream(filePath, FileMode.Create, FileAccess.Write, FileShare.None, 81920, useAsync: true))
    {
        await file.CopyToAsync(stream, cancellationToken);
    }
```
MaxUploadSize is enforced for IBrowserFile uploads via OpenReadStream(maxAllowedSize: MaxUploadSize), but IFormFile uploads are not size-limited (beyond whatever server limits apply). To avoid unexpected large uploads/DoS, enforce file.Length <= MaxUploadSize for IFormFile as well (either in ValidateUploads or inside the foreach).
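A minimal sketch of the size check described here, assuming a 512 MB limit as stated in the review comments (the actual constant in AttachmentService may be defined differently). `UploadSizeGuard` and `FindOversized` are hypothetical names; the method takes plain (name, length) pairs so it runs without ASP.NET Core, where the values would come from `IFormFile.FileName` and `IFormFile.Length`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the PR.
public static class UploadSizeGuard
{
    // Assumed limit: 512 MB, mirroring the MaxUploadSize mentioned in the review.
    public const long MaxUploadSize = 512L * 1024 * 1024;

    // Returns the names of all files whose declared length exceeds the limit.
    // In the controller path, each pair would be (file.FileName, file.Length).
    public static IReadOnlyList<string> FindOversized(
        IEnumerable<(string FileName, long Length)> files,
        long maxBytes = MaxUploadSize)
    {
        return files
            .Where(f => f.Length > maxBytes)
            .Select(f => f.FileName)
            .ToList();
    }
}
```

Running this inside ValidateUploads, next to the existing type check, would reject oversized IFormFile uploads before any bytes are written to disk, which is one of the two placements the comment proposes.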
```csharp
[HttpPost]
[RequestSizeLimit(256_000_000)]
public async Task<ActionResult<AttachmentUploadResult>> Upload([FromForm] string connectionId, [FromForm] List<IFormFile> files, CancellationToken cancellationToken)
{
    if (string.IsNullOrWhiteSpace(connectionId))
        return BadRequest("Missing connectionId.");

    if (files is null || files.Count == 0)
        return BadRequest("No files provided.");

    try
    {
        var result = await _attachmentService.SaveAsync(connectionId, files, cancellationToken);
        return Ok(result);
```
The attachments API trusts a client-supplied connectionId to decide where files are stored. Without validating that this connectionId actually belongs to the caller (or is even well-formed), a client can upload into another session’s namespace. Consider deriving the session identifier server-side (e.g., from auth/session state) or issuing a per-connection upload token, and at minimum validate/sanitize the connectionId value before passing it to the attachment service.
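One way to realize the "per-connection upload token" idea is an HMAC over the connection identifier, issued by the server when the connection is established and checked on every upload. The sketch below is an assumption about how that could be built, not something the PR implements; `UploadTokenIssuer`, `Issue`, and `Validate` are hypothetical names, and key management is left out.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of the PR: binds an upload token to a
// connectionId using HMAC-SHA256 with a server-side secret key.
public sealed class UploadTokenIssuer
{
    private readonly byte[] _key;

    public UploadTokenIssuer(byte[] key)
    {
        _key = key ?? throw new ArgumentNullException(nameof(key));
    }

    // Called server-side (for example when the hub connection opens) to
    // produce a token the client must send back with each upload.
    public string Issue(string connectionId)
    {
        using var hmac = new HMACSHA256(_key);
        return Convert.ToBase64String(hmac.ComputeHash(Encoding.UTF8.GetBytes(connectionId)));
    }

    // Returns true only if the token was issued for this exact connectionId.
    public bool Validate(string connectionId, string token)
    {
        if (string.IsNullOrEmpty(connectionId) || string.IsNullOrEmpty(token))
            return false;

        byte[] supplied;
        try { supplied = Convert.FromBase64String(token); }
        catch (FormatException) { return false; }

        using var hmac = new HMACSHA256(_key);
        var expected = hmac.ComputeHash(Encoding.UTF8.GetBytes(connectionId));

        // Constant-time comparison avoids leaking how many bytes matched.
        return CryptographicOperations.FixedTimeEquals(expected, supplied);
    }
}
```

With a token like this, a client that guesses or learns another session's connectionId still cannot upload into that session's namespace, because it cannot produce the matching token without the server key.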
That's a convenience API. So my preference would be:
Summary:
This PR delivers a full multimodal chat pipeline in LLama.Web: PDF and Word document ingestion with text extraction, image and audio uploads, native in-browser audio recording (preview/attach/discard), plus streaming response rendering with Markdown support.
Key Features:
Implementation Highlights
Capability to upload images and ask about the images
Model auto-download + Capability to upload files and ask about the files
