Description
We would like to add support for sending token usage details to the frontend as part of the response for each API request. This will enable the frontend to inform users when a request has consumed a significant number of tokens.
Motivation
Currently, users have no visibility into how many tokens each of their queries consumes. By exposing this information, the frontend can:
- Display token usage per message.
- Warn users when a query uses a large number of tokens.
- Help users optimize their prompts and usage patterns.
- Promote transparency and prevent unexpected limits or slowdowns.
Proposed Solution
- Include token usage information (prompt tokens, completion tokens, total tokens) in the API response metadata.
- Example:
  { "response": "Your assistant reply...", "usage": { "prompt_tokens": 120, "completion_tokens": 450, "total_tokens": 570 } }
- Frontend can use this data to display warnings or usage insights in the chat UI.
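The proposed response shape could be produced along the following lines. This is only a sketch: `TokenUsage`, `build_response`, and `exceeds_threshold` are hypothetical names, the threshold of 500 tokens is an arbitrary illustrative value, and the real backend may obtain token counts differently.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    """Token counts for a single request (hypothetical helper type)."""

    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        # Total is derived so it can never disagree with its parts.
        return self.prompt_tokens + self.completion_tokens

    def to_dict(self) -> dict:
        return {
            "prompt_tokens": self.prompt_tokens,
            "completion_tokens": self.completion_tokens,
            "total_tokens": self.total_tokens,
        }


def build_response(reply: str, usage: TokenUsage) -> dict:
    """Wrap the assistant reply together with its usage metadata."""
    return {"response": reply, "usage": usage.to_dict()}


def exceeds_threshold(payload: dict, threshold: int = 500) -> bool:
    """Consumer-side check: did this request use more tokens than the threshold?

    The default of 500 is an assumption made for illustration only.
    """
    return payload["usage"]["total_tokens"] > threshold


payload = build_response("Your assistant reply...", TokenUsage(120, 450))
print(json.dumps(payload, indent=2))
```

With the numbers from the example above, `payload["usage"]["total_tokens"]` is 570, so `exceeds_threshold(payload)` returns `True` with the illustrative default and `False` with a threshold of 1000.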
Considerations
- Token usage reporting should be behind a feature flag or config option if needed.
- The response should handle error cases where usage information is not available.
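One possible way to cover both considerations is sketched below. The environment variable `TOKEN_USAGE_ENABLED` and the functions `usage_reporting_enabled` and `attach_usage` are hypothetical names chosen for illustration. Omitting the `usage` field when data is missing is one option among several, not a decided behavior.

```python
import os
from typing import Optional


def usage_reporting_enabled() -> bool:
    """Read the feature flag from the environment.

    TOKEN_USAGE_ENABLED is a hypothetical variable name; the flag
    defaults to off when the variable is unset.
    """
    value = os.environ.get("TOKEN_USAGE_ENABLED", "false")
    return value.strip().lower() in {"1", "true", "yes"}


def attach_usage(payload: dict, usage: Optional[dict], enabled: bool) -> dict:
    """Return a copy of the payload, adding "usage" only when allowed and available.

    If the flag is off, or the upstream call returned no usage data,
    the field is omitted so the client never sees misleading zeros.
    """
    result = dict(payload)
    if enabled and usage is not None:
        result["usage"] = usage
    return result


base = {"response": "Your assistant reply..."}
usage = {"prompt_tokens": 120, "completion_tokens": 450, "total_tokens": 570}

with_usage = attach_usage(base, usage, enabled=True)
flag_off = attach_usage(base, usage, enabled=False)
usage_missing = attach_usage(base, None, enabled=True)
```

Under this approach the frontend would need to treat `usage` as optional and skip any warning when the field is absent.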
Additional Context
This is especially useful for services with token-based quotas or billing. Giving users more visibility helps manage usage more effectively.