Enable Streaming LLM Output #80

Open

@bibryam

Description

Problem Statement:
Currently, Dapr Agents waits for the full LLM response before returning results. This creates latency and degrades user experience, especially for long-form completions. LLMs inherently generate output token-by-token, and many use cases benefit from streaming responses as they’re produced.

Objective:
Implement streaming support for LLM output in Dapr Agents to improve perceived latency and responsiveness. Ensure consistent behavior across all agent types, and specifically in DaprChatClient (and its internal dependency, the Dapr Conversation API).

Acceptance Criteria:

  • Token-by-token streaming supported for all agent types.
  • DaprChatClient exposes a streaming API or callback mechanism (a possible shape is sketched after this list).
  • Dapr Conversation API supports HTTP and gRPC streaming (as applicable).
  • Examples and docs updated to demonstrate streaming usage.
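
For illustration, one possible shape for the streaming API is a generator that yields incremental chunks. The sketch below is only an assumption about what such an API could look like: `StreamingChatClient`, `ChatChunk`, and `generate_stream` are hypothetical names standing in for whatever DaprChatClient ends up exposing, and the canned token list stands in for chunks arriving from the Dapr Conversation API over an HTTP or gRPC stream.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class ChatChunk:
    """One incremental piece of a streamed completion (hypothetical type)."""
    content: str
    done: bool = False


class StreamingChatClient:
    """Illustrative stand-in for DaprChatClient; the real class and
    method names may differ."""

    def generate_stream(self, prompt: str) -> Iterator[ChatChunk]:
        # A real implementation would forward the request to the Dapr
        # Conversation API and yield chunks as they arrive over an
        # HTTP or gRPC stream; here canned tokens stand in for that.
        for token in ["Hello", ", ", "world", "!"]:
            yield ChatChunk(content=token)
        yield ChatChunk(content="", done=True)


if __name__ == "__main__":
    client = StreamingChatClient()
    for chunk in client.generate_stream("Say hello"):
        if not chunk.done:
            # Print tokens as they arrive instead of waiting for the
            # full response, which is the point of this issue.
            print(chunk.content, end="", flush=True)
    print()
```

A callback-based variant could wrap the same generator, invoking a user-supplied per-chunk function for each yielded item, so both mechanisms named in the acceptance criteria could share one underlying implementation.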
