Added ability to talk to vision-capable models #146

HaarisIqubal · 2025-04-23T17:15:12Z

🚀 Feature: Vision Model Integration & Image Handling Enhancements

✨ Summary

This pull request introduces major enhancements to support interaction with vision-capable models. The following key features have been added:

✅ What's New

Vision Model Support:
Added support for chatting with models that can process images (e.g., llava, gemma:4b).
Image Uploading:
Users can now upload one or multiple images to send as part of their message payload.
Image Deletion:
Added the ability to remove selected images before sending a message to the model.
Persistent Storage:
Images are now stored persistently so that they're available across app sessions.
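
As a rough illustration of the persistent storage item, images could be written to a directory on disk with FileManager. This is a minimal sketch under stated assumptions, not the code in this PR: `ImageStore`, its method names, and the UUID-based `.jpg` file naming are all hypothetical.

```swift
import Foundation

/// Hypothetical on-disk image store (names and layout are assumptions,
/// not taken from this PR).
struct ImageStore {
    let directory: URL

    /// Creates the backing directory if it does not exist yet.
    init(directory: URL) throws {
        self.directory = directory
        try FileManager.default.createDirectory(
            at: directory,
            withIntermediateDirectories: true
        )
    }

    /// Writes image data under a fresh UUID-based file name and returns that name.
    func save(_ data: Data) throws -> String {
        let name = UUID().uuidString + ".jpg"
        try data.write(to: directory.appendingPathComponent(name), options: .atomic)
        return name
    }

    /// Reads back the bytes stored under `name`.
    func load(_ name: String) throws -> Data {
        try Data(contentsOf: directory.appendingPathComponent(name))
    }

    /// Removes the file stored under `name`.
    func delete(_ name: String) throws {
        try FileManager.default.removeItem(at: directory.appendingPathComponent(name))
    }
}
```

Storing only the returned file name alongside each message would let the app reload images in a later session without keeping the raw bytes in the chat record.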

📦 Implementation Details

Base64 encoding is used to serialize image data before sending it to the model.
Uploaded images are resized and converted to JPEG format to reduce payload size.
State management for image selection and deletion is handled using @State/@StateObject (if applicable).
Persistent storage is implemented using FileManager or similar API.
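
The base64 step described above could look roughly like the following. Ollama's chat endpoint accepts an `images` array of base64 strings on each message; the `ChatMessage` and `ChatRequest` types and the `makeChatRequestBody` helper are hypothetical names for illustration, and resizing and JPEG conversion are assumed to have happened before this point.

```swift
import Foundation

// Hypothetical request types for illustration; the PR's actual types are not shown.
struct ChatMessage: Codable {
    let role: String
    let content: String
    let images: [String]?   // base64-encoded image bytes, without a data-URI prefix
}

struct ChatRequest: Codable {
    let model: String
    let messages: [ChatMessage]
    let stream: Bool
}

/// Builds the JSON body for a single user message carrying zero or more images.
/// `jpegImages` is assumed to hold already resized, JPEG-encoded data.
func makeChatRequestBody(model: String, prompt: String, jpegImages: [Data]) throws -> Data {
    let encoded = jpegImages.map { $0.base64EncodedString() }
    let message = ChatMessage(
        role: "user",
        content: prompt,
        // A nil optional is omitted from the JSON by the synthesized Codable conformance.
        images: encoded.isEmpty ? nil : encoded
    )
    let request = ChatRequest(model: model, messages: [message], stream: false)
    return try JSONEncoder().encode(request)
}
```

The resulting data could be set as the `httpBody` of a POST request to the local Ollama server's `/api/chat` endpoint.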

📝 Notes

Currently supports up to X images. A limit can be added if needed; how much RAM is required as more photos are uploaded still needs to be tested.
Future enhancement: support drag & drop or camera capture (if desired).

I hope you will consider integrating this into the main codebase. Let me know if any changes are needed or if you'd like additional enhancements! Below are some screenshots captured from the application; I hope they give you some idea of how it works.

1. Added ModelsView: a way to check models downloaded through Ollama and to download models from within the application.

1. Multimodal language models can now work natively. Added the ability to remove an image from the image upload section.

Haaris added 3 commits April 13, 2025 13:03

Here are the changes made:

fbdad15

1. Added ModelsView: a way to check models downloaded through Ollama and to download models from within the application.

Added ability to chat with images

6c66903

Here are the changes made:

bbb36dc

1. Multimodal language models can now work natively. Added the ability to remove an image from the image upload section.

HaarisIqubal mentioned this pull request Apr 23, 2025

Need Features： Image upload #144

Open
