[Feature]: Add CometAPI as Model Provider for Enhanced Multimodal Capabilities #1692

@TensorNull

Which desktop app does this feature request relate to?

  • Select a project 👇
  • Agent-TARS (cli, server, agent, tool etc.)
  • UI-TARS Desktop

What problem does this feature solve?

UI-TARS Desktop currently supports various AI model providers for its multimodal GUI agent capabilities. Adding CometAPI as a model provider would give users access to 500+ different AI models and enhanced multimodal generation capabilities (text, images, video, audio), expanding the automation possibilities for both desktop and browser operations.

What does the proposed feature look like?

Integrate CometAPI as a configurable model provider option in both Agent-TARS CLI and UI-TARS Desktop applications. This would allow users to:

  • Configure CometAPI endpoints (base URL: https://api.cometapi.com/v1/)
  • Access 500+ AI models through OpenAI-compatible APIs
  • Leverage multimodal capabilities for GUI automation tasks
  • Use CometAPI models for vision-language tasks in computer use scenarios
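
To illustrate the configuration described above, here is a minimal sketch of how a provider entry and a vision-language request could look. It assumes CometAPI accepts the standard OpenAI chat-completions request shape (as "OpenAI-compatible" suggests). The `ProviderConfig` interface, the `buildVisionRequest` helper, and the API key and model ID values are hypothetical placeholders for illustration, not part of UI-TARS or Agent-TARS today. The sketch only builds the request and does not send it.

```typescript
// Hypothetical provider settings; the field names are illustrative and
// are not the actual UI-TARS / Agent-TARS configuration schema.
interface ProviderConfig {
  baseURL: string;
  apiKey: string;
  model: string;
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build an OpenAI-style chat-completions request that carries a
// screenshot plus a natural-language instruction.
function buildVisionRequest(
  config: ProviderConfig,
  instruction: string,
  screenshotBase64: string,
): ChatRequest {
  // Strip trailing slashes so ".../v1/" and ".../v1" yield the same URL.
  const base = config.baseURL.replace(/\/+$/, "");
  return {
    url: `${base}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${config.apiKey}`,
    },
    body: JSON.stringify({
      model: config.model,
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: instruction },
            {
              type: "image_url",
              image_url: { url: `data:image/png;base64,${screenshotBase64}` },
            },
          ],
        },
      ],
    }),
  };
}

const cometConfig: ProviderConfig = {
  baseURL: "https://api.cometapi.com/v1/", // base URL from this proposal
  apiKey: "YOUR_COMETAPI_KEY",             // placeholder, not a real key
  model: "your-vision-model",              // placeholder model ID
};

const visionRequest = buildVisionRequest(
  cometConfig,
  "Click the Submit button",
  "<base64-encoded PNG>",
);
console.log(visionRequest.url);
```

Because the request shape is the generic OpenAI one, an integration along these lines would mostly be a matter of exposing the base URL, API key, and model ID as user-configurable settings.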

CometAPI Resources

Implementation Offer
We'd be glad to help integrate CometAPI into your project. If you're interested, we can submit a pull request that aligns with your project's coding standards and guidelines.
