feat(tools): allow readFile to read images for supported models #539

@wsxiaoys

Description

Is your feature request related to a problem? Please describe.
Currently, the readFile tool only supports reading text files. Extending it to read image files would be useful for models that accept image input (e.g., Gemini, Anthropic).

Describe the solution you'd like
I propose updating the readFile tool to detect the file type based on its extension or MIME type. If the file is an image, the tool should read it as a base64-encoded string and return it in a format that can be consumed by multimedia-capable models.
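
A minimal sketch of the detection and encoding step described above, assuming a Node.js runtime. The names `IMAGE_MIME_TYPES`, `detectImageMimeType`, and `readImageAsBase64` are hypothetical and not taken from the Pochi codebase. The extension table is illustrative only, and which formats a given model accepts would need to be checked per provider.

```typescript
import { readFile } from "node:fs/promises";
import { extname } from "node:path";

// Hypothetical extension-to-MIME table (an assumption, not from the Pochi codebase).
const IMAGE_MIME_TYPES: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

// Returns the image MIME type for a path, or undefined for non-image extensions.
function detectImageMimeType(filePath: string): string | undefined {
  return IMAGE_MIME_TYPES[extname(filePath).toLowerCase()];
}

// Reads an image file as base64; returns undefined when the path is not
// recognized as an image, so the caller can fall back to the text path.
async function readImageAsBase64(
  filePath: string,
): Promise<{ mimeType: string; data: string } | undefined> {
  const mimeType = detectImageMimeType(filePath);
  if (mimeType === undefined) return undefined;
  const bytes = await readFile(filePath);
  return { mimeType, data: bytes.toString("base64") };
}
```

Detection here relies on the extension alone. Sniffing the file's leading bytes would be more robust against misnamed files, at the cost of extra code.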

The implementation could be similar to how image outputs are handled in packages/livekit/src/chat/mcp-utils.ts.

Specifically, the outputSchema of the readFile tool in packages/tools/src/read-file.ts could be updated to support a content union type, similar to the ContentOutput in mcp-utils.ts:

const ContentOutput = z.union([
  z.object({
    ...
  }),
]);
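
As a rough illustration of what a content union for readFile could look like, the sketch below uses plain TypeScript types and a hand-written guard in place of a zod schema. The field names (`type`, `text`, `mimeType`, `data`) and the helpers `isReadFileOutput` and `describeOutput` are assumptions made for this example. They are not the actual ContentOutput definition from mcp-utils.ts, whose fields are elided above.

```typescript
// Hypothetical output shape; field names are assumptions, not the real
// ContentOutput definition from mcp-utils.ts.
type TextOutput = { type: "text"; text: string };
type ImageOutput = { type: "image"; mimeType: string; data: string };
type ReadFileOutput = TextOutput | ImageOutput;

// Runtime guard mirroring what a schema union would validate.
function isReadFileOutput(value: unknown): value is ReadFileOutput {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (v.type === "text") return typeof v.text === "string";
  if (v.type === "image") {
    return typeof v.mimeType === "string" && typeof v.data === "string";
  }
  return false;
}

// Consumers narrow on the `type` discriminant to handle each variant.
function describeOutput(output: ReadFileOutput): string {
  return output.type === "text"
    ? `text (${output.text.length} chars)`
    : `image (${output.mimeType}, ${output.data.length} base64 chars)`;
}
```

A discriminant field such as `type` lets callers that only understand text skip or reject image results explicitly instead of misreading base64 data as file contents.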

Describe alternatives you've considered
An alternative would be to create a new tool specifically for reading images, but extending the existing readFile tool seems more intuitive and efficient.

Additional context
This feature would enhance Pochi's ability to work with multimodal models and enable more complex interactions involving images.

Relevant files:

🤖 Generated with Pochi

Labels: enhancement (New feature or request)
