[Feature]: Support Using non Image/PDF files with Gemini models #9416
Comments
Hey @ishaan-jaff, I'd like to be assigned to this issue and work on it. I will update the code and add test cases and documentation. Let me know if this sounds good!
Yes @NiharP31, please post the proposed litellm interface / request on this issue before building out the feature.
Based on my understanding, here is the solution: 1. File Type Classification and HandlingIn
2. Transformation Logic UpdatesIn
3. Implementation Benefits
If this approach aligns with your expectations, I'd be happy to proceed with implementation.
Hi @NiharP31, can you define clear success criteria for this issue? Ideally 3-4 test cases you're hoping to pass.
Test cases to validate the implementation & Success Criteria: Test Case 1: Markdown File Processing
Test Case 2: JSON Data File
Test Case 3: Mixed Content Input
Test Case 4: Large Text File Handling
All of the above test cases would verify that MIME type classification and conditional base64 encoding are integrated correctly, ensuring Gemini can access the full range of file types it supports. Further cases could cover CSV files (structured format), reading data from Google Cloud Storage, etc.
OK, you can go ahead with implementing this, @NiharP31. @johann-petrak, what do you think of the test cases?
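For reference, here is a minimal sketch of what "MIME type classification and conditional base64 encoding" could look like. It is not LiteLLM's actual code: `TEXT_LIKE_MIME_TYPES`, `is_text_mime`, and `to_data_url` are hypothetical names, and the set of text-like types is an assumption chosen for illustration.

```python
import base64
from urllib.parse import quote

# Assumption: non-"text/*" types that are still plain text and need no base64.
TEXT_LIKE_MIME_TYPES = {"application/json", "application/xml"}


def is_text_mime(mime_type: str) -> bool:
    """Classify a MIME type as text-like (hypothetical helper)."""
    return mime_type.startswith("text/") or mime_type in TEXT_LIKE_MIME_TYPES


def to_data_url(data: bytes, mime_type: str) -> str:
    """Build a data URL, base64-encoding only binary payloads."""
    if is_text_mime(mime_type):
        # Text payloads are percent-encoded and carry no ";base64" marker.
        return f"data:{mime_type},{quote(data)}"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

With this shape, a Markdown or JSON file goes out as a plain data URL, while images and PDFs keep the existing base64 form.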
Looks like a really good plan to me, thank you for proposing this! One thought I have is that given the LiteLLM package provides a very consistent API for using a wide range of models, Sorry if all this has been clear anyways ... For me the most important aspect right now is support for Gemini, but I think especially with models very likely supporting much larger contexts and also with models that support multimodal prompting, sending files as part of a prompt will become much more widely used soon.
@johann-petrak got your point. I'm currently exploring the codebase. The current PR should provide a foundation for a more consistent file handling approach across different platforms. Once I'm done with the existing PR, I'd love to work on further integration if that would be helpful for the project.
Hey @johann-petrak, OpenAI just added support for a new files message content type. It maps quite closely to Vertex's FileData part type, which would make sending gs:// and similar URLs much easier. I'm using it in our Gemini audio file input implementation as well.
The Feature
The docs describe how to upload images or PDFs to Gemini models:
https://docs.litellm.ai/docs/providers/vertex#gemini-15-pro-and-vision
But Gemini can use many more MIME types in prompts, see https://ai.google.dev/gemini-api/docs/document-processing?lang=python#technical-details
However, it does not seem to be possible to send any other file types. One issue is that some MIME types need no base64 encoding (e.g. text/md), but omitting ";base64" from the MIME type results in an exception, because the code expects ";base64" to always be present.
Trying to load the Markdown file as bytes, base64 encoding that and using this resulted in a weird error where the model was complaining about the input token limit being exceeded (it reported more than 5M tokens) even though the Markdown file is only a few thousand words.
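To illustrate the behaviour I would expect, here is a minimal sketch of a data URL parser that treats ";base64" as optional. This is not LiteLLM's current code; `parse_data_url` is a hypothetical helper, and the fallback to "text/plain" for an empty MIME type is an assumption.

```python
import base64
from urllib.parse import unquote_to_bytes


def parse_data_url(url: str) -> tuple[str, bytes]:
    """Split a data URL into (mime_type, raw_bytes); ';base64' is optional."""
    if not url.startswith("data:"):
        raise ValueError("not a data URL")
    header, sep, payload = url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URL has no ',' separator")
    params = header.split(";")
    mime_type = params[0] or "text/plain"  # assumed default when omitted
    if "base64" in params[1:]:
        return mime_type, base64.b64decode(payload)
    # No ";base64" marker: the payload is percent-encoded text.
    return mime_type, unquote_to_bytes(payload)
```

Both `data:text/markdown,...` and `data:application/pdf;base64,...` would then be accepted instead of the first form raising an exception.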
Motivation, pitch
Using large files with Gemini is one of the specific use cases for the Google model, and all of the other supported MIME types are extremely useful for analysis purposes.
Are you a ML Ops Team?
No
Twitter / LinkedIn details
No response