
[Feature]: Support Using non Image/PDF files with Gemini models #9416

Open
johann-petrak opened this issue Mar 20, 2025 · 9 comments · May be fixed by #9590
Labels
enhancement New feature or request

Comments

@johann-petrak

The Feature

The docs describe how to upload images or PDFs to Gemini models:
https://docs.litellm.ai/docs/providers/vertex#gemini-15-pro-and-vision

But Gemini can accept many more MIME types in prompts; see https://ai.google.dev/gemini-api/docs/document-processing?lang=python#technical-details

However, it does not seem to be possible to send any other file types. One issue is that some MIME types need no base64 encoding (e.g. text/markdown), but not appending ";base64" to the MIME type results in an exception because the code expects ";base64" to always be present.

Trying to load the Markdown file as bytes, base64-encoding it, and sending that resulted in a strange error: the model complained that the input token limit was exceeded (it reported more than 5M tokens) even though the Markdown file is only a few thousand words.
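For context, the workaround described above can be sketched as follows. The message shape mirrors the documented image/PDF pattern, and whether Gemini handles text/markdown sent this way is exactly what this issue is about; the file content and model name are illustrative, not taken from a real run.

```python
import base64

# Made-up Markdown content standing in for the real file.
markdown_bytes = b"# Report\n\nA few thousand words of analysis would go here..."
encoded = base64.b64encode(markdown_bytes).decode("ascii")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Summarize this document."},
        {
            # Reusing the image_url pattern from the docs for a text file:
            "type": "image_url",
            "image_url": {"url": f"data:text/markdown;base64,{encoded}"},
        },
    ],
}
# litellm.completion(model="vertex_ai/gemini-1.5-pro", messages=[message])
```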

Motivation, pitch

Using large files with Gemini is one of the specific use cases for Google's models, and all of the other supported MIME types are extremely useful for analysis purposes.

Are you a ML Ops Team?

No

Twitter / LinkedIn details

No response

@johann-petrak johann-petrak added the enhancement New feature or request label Mar 20, 2025
@NiharP31

Hey @ishaan-jaff, I'd like to be assigned to this issue and work on it. I will update the code, add test cases, and write documentation. Let me know if this sounds good!

@ishaan-jaff
Contributor

Yes @NiharP31, please send the proposed LiteLLM interface/request on this issue before building out the feature.

@NiharP31

Based on my understanding, here is the solution:

1. File Type Classification and Handling

In litellm/types/files.py, I'll add comprehensive support for all Gemini-supported MIME types by:

  • Creating a classification system to distinguish between:
    • Binary file types (images, PDFs, videos, etc.) - requiring base64 encoding
    • Text file types (text/markdown, application/json, text/csv, etc.) - no base64 encoding needed
  • Expanding the is_gemini_1_5_accepted_file_type() function to include all MIME types supported by Gemini's API
  • Adding a helper function requires_base64_encoding() to determine proper handling
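The classification step above could look something like this minimal sketch. The helper name requires_base64_encoding() comes from the proposal; the MIME-type list is illustrative (drawn from Google's documented Gemini-supported types) and is not LiteLLM's actual code.

```python
# Text-based MIME types Gemini accepts as plain text, so no base64
# wrapping is needed; everything else is treated as binary.
# (Illustrative subset of Google's documented list.)
TEXT_MIME_TYPES = {
    "text/plain",
    "text/markdown",
    "text/csv",
    "text/html",
    "application/json",
    "application/x-javascript",
    "text/x-python",
}

def requires_base64_encoding(mime_type: str) -> bool:
    """Return True for binary types (images, PDFs, video, audio, ...)."""
    # Normalize case and drop any parameters like "; charset=utf-8".
    base_type = mime_type.lower().split(";")[0].strip()
    return base_type not in TEXT_MIME_TYPES
```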

2. Transformation Logic Updates

In litellm/llms/vertex_ai/gemini/transformation.py, I'll:

  • Rename _process_gemini_image() to _process_gemini_file() to reflect its broader purpose
  • Modify the file processing logic to conditionally apply base64 encoding based on MIME type
  • Fix token counting for text-based files by properly handling them without unnecessary encoding

3. Implementation Benefits

  • No UI changes required - users will use the existing interface
  • Fixes the token miscalculation issue by preventing unnecessary base64 encoding of text files
  • Enables support for all Gemini-supported MIME types as documented in Google's API

If this approach aligns with your expectations, I'd be happy to proceed with implementation.

@ishaan-jaff
Contributor

Hi @NiharP31, can you define clear success criteria for this issue? Ideally 3-4 test cases you're hoping to pass.

@NiharP31

Test cases to validate the implementation & Success Criteria:

Test Case 1: Markdown File Processing

  • Input: A Markdown file (say, 2-3 KB in size)
  • Expected Output:
    • File processed without ";base64" appended to MIME type
    • Token count accurately reflects actual content size (roughly 500-1,000 tokens, not millions)
    • Gemini model successfully processes and responds to the Markdown content

Test Case 2: JSON Data File

  • Input: A JSON file (structured data is assumed here)
  • Expected Output:
    • File properly processed with "application/json" MIME type without base64 encoding
    • Model can correctly reference and analyze the JSON structure in its response

Test Case 3: Mixed Content Input

  • Input: A request containing both text (Markdown) and binary (image) files
  • Expected Output:
    • Text files processed without base64 encoding
    • Binary files properly encoded with base64
    • Model successfully references both content types in its response

Test Case 4: Large Text File Handling

  • Input: A larger text file (say, ~50 KB or more)
  • Expected Output:
    • File processed without token count errors
    • Response acknowledges content without hitting artificial token limits caused by encoding issues

All of the above test cases would verify that MIME type classification and conditional base64 encoding are correctly integrated, ensuring Gemini can properly access the full range of file types it supports. Further cases could cover CSV files (structured format), loading data from Google Cloud Storage, etc.
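Test Case 1's success criterion could be sketched as a runnable check like the one below. The ~4-characters-per-token heuristic is a rough rule of thumb for English text, not how Gemini actually counts tokens, and the file content is made up.

```python
import base64

def rough_token_estimate(text: str) -> int:
    # ~4 characters per token is a common rule of thumb for English text;
    # this is an approximation, not Gemini's real tokenizer.
    return len(text) // 4

def test_markdown_not_inflated():
    md = "## Section\n" + "word " * 500          # roughly 2.5 KB of Markdown
    plain_tokens = rough_token_estimate(md)
    b64 = base64.b64encode(md.encode("utf-8")).decode("ascii")
    # A small file should land in the hundreds of tokens, not millions.
    assert plain_tokens < 5_000
    # base64 output is ~4/3 the input size, so it always inflates the count.
    assert rough_token_estimate(b64) > plain_tokens

test_markdown_not_inflated()
```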

@ishaan-jaff
Contributor

ok, you can go ahead on implementing this @NiharP31

@johann-petrak what do you think of the test cases ?

@johann-petrak
Author

Looks like a really good plan to me, thank you for proposing this!

One thought: given that the LiteLLM package provides a very consistent API for a wide range of models,
is there a way to make file handling somewhat consistent across models as well?
This is complicated by the fact that there are basically two approaches: 1) upload the file in a separate request, get an ID, then send the prompt and reference the file ID in a special kind of prompt message, and 2) send the file directly as part of the prompt message.
As far as I can see, the details of these approaches (and the kinds of files supported) differ between providers; for example, here is how this works with OpenAI and PDFs: https://platform.openai.com/docs/guides/pdf-files?api-mode=chat
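The two approaches might be sketched as OpenAI-style content parts. The field shapes below follow OpenAI's PDF-input guide but should be treated as illustrative here; the file ID and the base64 payload are placeholders.

```python
# Approach 1: upload the file in a separate request, then reference
# the returned file ID in the prompt message.
by_reference = {
    "type": "file",
    "file": {"file_id": "file-abc123"},   # placeholder ID from a prior upload
}

# Approach 2: inline the file directly in the prompt message as a data URI.
inline = {
    "type": "file",
    "file": {
        "filename": "draft.pdf",
        "file_data": "data:application/pdf;base64,JVBERi0xLjc=",  # placeholder
    },
}
```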

Sorry if all this has been clear anyway ...

For me the most important aspect right now is support for Gemini, but with models very likely supporting much larger contexts, and with models that support multimodal prompting, sending files as part of a prompt will soon become much more widely used.

@NiharP31 NiharP31 linked a pull request Mar 27, 2025 that will close this issue
@NiharP31

@johann-petrak got your point. I'm currently exploring the codebase. The current PR should provide a foundation for a more consistent file handling approach across different providers. Once I'm done with the existing PR, I'd love to work on further integration if that would be helpful for the project.
@ishaan-jaff let me know your take on this.

@krrishdholakia
Contributor

> if there is a way to make using file somewhat consistent across the models?

Hey @johann-petrak, OpenAI just added support for a new file message content type; it maps quite similarly to Vertex's FileData part type, which would make sending gs:// URLs etc. much easier. I'm using it in our Gemini audio file input implementation as well.
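One plausible shape for this, sketched below: a file content part carrying a gs:// URI plus MIME type, which downstream translates to a Vertex FileData part so the provider fetches the file itself. The field names and bucket are assumptions based on the comment above, not a confirmed LiteLLM interface.

```python
# Hypothetical request-side message using the new "file" content type.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Summarize this recording."},
        {
            "type": "file",
            "file": {
                "file_id": "gs://my-bucket/meeting.mp3",  # hypothetical bucket
                "format": "audio/mpeg",
            },
        },
    ],
}

# Downstream, this could translate to a Vertex FileData part roughly like:
file_data_part = {
    "file_data": {
        "mime_type": "audio/mpeg",
        "file_uri": "gs://my-bucket/meeting.mp3",
    },
}
```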

