Added ability to talk to vision-capable models #146

Open
HaarisIqubal wants to merge 3 commits into main


@HaarisIqubal commented Apr 23, 2025

🚀 Feature: Vision Model Integration & Image Handling Enhancements

✨ Summary

This pull request introduces major enhancements to support interaction with vision-capable models. The following key features have been added:

✅ What's New

  • Vision Model Support:
    Added support for chatting with models that can process images (e.g., llava, gemma:4b).

  • Image Uploading:
    Users can now upload one or more images to send as part of their message payload.

  • Image Deletion:
    Added the ability to remove selected images before sending a message to the model.

  • Persistent Storage:
    Images are now stored persistently so that they're available across app sessions.
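
For reviewers, here is a minimal sketch of how the persistent image storage could work. It is not the code in this PR: `ImageStore`, its method names, and the `.jpg` naming scheme are hypothetical, and it assumes images are kept as individual files managed through `FileManager`.

```swift
import Foundation

// Hypothetical helper, not the code in this PR: stores image bytes under a
// caller-supplied directory so they survive app restarts.
struct ImageStore {
    let directory: URL
    let fileManager: FileManager

    init(directory: URL, fileManager: FileManager = .default) throws {
        self.directory = directory
        self.fileManager = fileManager
        // Create the folder (and any missing parents) on first use.
        try fileManager.createDirectory(at: directory, withIntermediateDirectories: true)
    }

    /// Writes the bytes to "<id>.jpg" and returns the id used to find them later.
    func save(_ data: Data, id: String = UUID().uuidString) throws -> String {
        try data.write(to: fileURL(for: id), options: .atomic)
        return id
    }

    /// Returns the stored bytes, or nil if nothing was saved under this id.
    func load(id: String) -> Data? {
        try? Data(contentsOf: fileURL(for: id))
    }

    /// Removes the file; deleting an id that does not exist is a no-op.
    func delete(id: String) throws {
        let url = fileURL(for: id)
        if fileManager.fileExists(atPath: url.path) {
            try fileManager.removeItem(at: url)
        }
    }

    private func fileURL(for id: String) -> URL {
        directory.appendingPathComponent(id).appendingPathExtension("jpg")
    }
}
```

In an app, the directory would typically be a subfolder of Application Support, for example `FileManager.default.urls(for: .applicationSupportDirectory, in: .userDomainMask)[0].appendingPathComponent("Images")`. The folder name here is again only an assumption.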


📦 Implementation Details

  • Base64 encoding is used to serialize image data before sending it to the model.
  • Uploaded images are resized and converted to JPEG format to reduce payload size.
  • State management for image selection and deletion is handled using @State/@StateObject (if applicable).
  • Persistent storage is implemented using FileManager or similar API.
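
As a rough illustration of the Base64 step, the sketch below builds a chat request body with attached images. It assumes the app talks to Ollama's `/api/chat` endpoint, whose messages accept an `images` array of Base64 strings. The type and function names (`ChatMessage`, `ChatRequest`, `makeChatBody`) are hypothetical and not taken from this PR, and resizing and JPEG conversion are assumed to have happened before this point.

```swift
import Foundation

// Hypothetical types for illustration; field names follow Ollama's /api/chat
// request body, where each message may carry Base64-encoded images.
struct ChatMessage: Codable {
    let role: String
    let content: String
    let images: [String]?
}

struct ChatRequest: Codable {
    let model: String
    let messages: [ChatMessage]
    let stream: Bool
}

/// Builds the JSON body for one user turn with zero or more attached images.
/// `jpegImages` is assumed to hold already resized, JPEG-encoded bytes.
func makeChatBody(model: String, prompt: String, jpegImages: [Data]) throws -> Data {
    // Base64 turns raw bytes into text that is safe to embed in JSON.
    let encoded = jpegImages.map { $0.base64EncodedString() }
    let message = ChatMessage(role: "user",
                              content: prompt,
                              // Omit the key entirely for text-only messages.
                              images: encoded.isEmpty ? nil : encoded)
    let request = ChatRequest(model: model, messages: [message], stream: false)
    return try JSONEncoder().encode(request)
}
```

Base64 inflates the data by roughly a third, which is one reason shrinking the images before encoding helps keep the payload small.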

📝 Notes

  • Currently supports up to X images (the limit is not yet decided; one can be added if needed, once we have tested how much RAM is required as more photos are uploaded).
  • Future enhancement: support drag & drop or camera capture (if desired).

I hope you will consider integrating this into the main codebase. Let me know if any changes are needed or if you'd like additional enhancements! Below are some screenshots captured from the application; I hope they give you an idea of how it works.

[Screenshot 2025-04-23 at 18 59 48] [Screenshot 2025-04-23 at 18 59 59]

Haaris added 3 commits April 13, 2025 13:03
1. Added ModelsView:
   - Added a way to check downloaded models through Ollama and to download models from the application.
1. Multimodal language models can now work natively
   - Added the ability to remove an image from the image upload section