
Send Token Usage to Frontend for Each Request #656

@tanweersalah

Description

We would like to add support for sending token usage details to the frontend as part of the response for each API request. This will enable the frontend to inform users when a request has consumed a significant number of tokens.

Motivation

Currently, users have no visibility into how many tokens each of their queries consumes. By exposing this information, the frontend can:

  • Display token usage per message.
  • Warn users when a query uses a large number of tokens.
  • Help users optimize their prompts and usage patterns.
  • Promote transparency and help users avoid unexpectedly hitting usage limits or slowdowns.

Proposed Solution

  • Include token usage information (prompt tokens, completion tokens, total tokens) in the API response metadata.

  • Example:

    {
      "response": "Your assistant reply...",
      "usage": {
        "prompt_tokens": 120,
        "completion_tokens": 450,
        "total_tokens": 570
      }
    }
  • Frontend can use this data to display warnings or usage insights in the chat UI.
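
The response shape above could be produced on the backend and checked on the client roughly as follows. This is a minimal Python sketch, not the project's implementation. The issue does not say which backend stack is used, and `TokenUsage`, `build_chat_response`, `should_warn` and the 500-token `WARN_THRESHOLD_TOKENS` value are all hypothetical. `total_tokens` is derived from the other two fields, so it always matches the example (120 + 450 = 570).

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TokenUsage:
    """Token counts for one request (field names mirror the JSON example)."""
    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        # Derived rather than stored, so it can never disagree with its parts.
        return self.prompt_tokens + self.completion_tokens

    def to_dict(self) -> dict[str, int]:
        return {
            "prompt_tokens": self.prompt_tokens,
            "completion_tokens": self.completion_tokens,
            "total_tokens": self.total_tokens,
        }


def build_chat_response(reply: str, usage: TokenUsage) -> dict[str, Any]:
    """Hypothetical helper: attach usage metadata to the assistant reply."""
    return {"response": reply, "usage": usage.to_dict()}


# Hypothetical threshold; the issue does not define a "significant" token count.
WARN_THRESHOLD_TOKENS = 500


def should_warn(payload: dict[str, Any], threshold: int = WARN_THRESHOLD_TOKENS) -> bool:
    """Client-side check: warn when total_tokens exceeds the threshold."""
    usage = payload.get("usage")
    if not usage:
        return False  # no usage data, so there is nothing to warn about
    return usage.get("total_tokens", 0) > threshold


payload = build_chat_response("Your assistant reply...", TokenUsage(120, 450))
print(payload["usage"]["total_tokens"], should_warn(payload))
```

With the example counts the payload matches the JSON above, and `should_warn` returns `True` because 570 exceeds the assumed 500-token threshold.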

Considerations

  • Could be placed behind a feature flag or config option if needed.
  • Should gracefully handle cases where usage information is unavailable (for example, when the upstream model provider does not return it).
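
Both considerations could be handled together roughly as sketched below. This assumes a Python backend and an upstream provider that returns a usage dict shaped like the example above. The `INCLUDE_TOKEN_USAGE` environment variable and the helper names `usage_enabled`, `extract_usage` and `attach_usage` are hypothetical.

```python
import os
from collections.abc import Mapping
from typing import Any, Optional


def usage_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read the hypothetical INCLUDE_TOKEN_USAGE flag (off by default)."""
    env = os.environ if env is None else env
    return env.get("INCLUDE_TOKEN_USAGE", "false").strip().lower() in {"1", "true", "yes"}


def extract_usage(raw: Any) -> Optional[dict[str, int]]:
    """Normalize provider usage data, or return None if it is missing or malformed."""
    if not isinstance(raw, dict):
        return None  # provider sent no usage object at all
    try:
        prompt = int(raw["prompt_tokens"])
        completion = int(raw["completion_tokens"])
    except (KeyError, TypeError, ValueError):
        return None  # missing or non-numeric counts are treated as unavailable
    # Recompute the total so it is always consistent with its parts.
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }


def attach_usage(payload: dict[str, Any], raw_usage: Any, enabled: bool) -> dict[str, Any]:
    """Add "usage" only when the flag is on; use None when usage is unavailable."""
    if not enabled:
        return payload  # flag off: the response shape is unchanged
    return {**payload, "usage": extract_usage(raw_usage)}


print(attach_usage({"response": "hi"}, None, enabled=True))
```

With the flag off, the response is unchanged. With the flag on but no usage data, the response carries `"usage": null`, so the frontend can tell "feature disabled" apart from "no data for this request".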

Additional Context

This is especially useful for services with token-based quotas or billing, where more visibility helps users manage their usage more effectively.

Metadata

Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
