
feat: @budget decorator for Flows — cost, token & request limits with HITL approval#4837

Open
alex-clawd wants to merge 7 commits into crewAIInc:main from alex-clawd:feat/flow-cost-governor

@alex-clawd alex-clawd commented Mar 13, 2026

Summary

This PR introduces a native @budget decorator for CrewAI Flows. It enforces cost, token, and request limits, and can request human-in-the-loop (HITL) approval when a limit is exceeded.

Key Features

  • @budget decorator for Flow methods with configurable limits:

    • max_cost: Maximum cost in USD
    • max_tokens: Maximum total tokens
    • max_requests: Maximum LLM request count (NEW)
    • on_exceed: Action when limits are exceeded ('pause', 'stop', 'warn')
  • Custom pricing support (NEW):

    • cost_per_prompt_token / cost_per_completion_token: Flat per-token pricing
    • cost_map: Per-model pricing overrides
    • Priority: flat pricing > cost_map > DEFAULT_MODEL_COSTS
  • Comprehensive token tracking:

    • Registers event listener on LLMCallStartedEvent to count LLM requests
    • Extracts usage from CrewOutput, LiteAgentOutput, and any object with token_usage or usage_metrics attributes
    • Works with direct LLM calls, agent kickoffs, and crew kickoffs
  • Three enforcement modes:

    • 'pause' (default): Uses existing HITL infrastructure to request human approval to continue
    • 'stop': Raises BudgetExceededError immediately
    • 'warn': Logs warning and continues execution
  • Better HITL approval UX:

    • Shows current spend breakdown (cost, tokens, requests)
    • Indicates which limit(s) were hit
    • Supports approving specific additional amounts
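To approve a specific additional amount, the free-text feedback has to be turned into numbers. A minimal sketch of one way to do that, assuming the rule stated in this PR's later commit notes (dollar amounts need a `$` prefix, token amounts need a `tokens` or `k`/`K` suffix). The function name and both regexes are illustrative, not the PR's actual code:

```python
import re

# Dollar amounts must carry a "$" prefix, e.g. "$5" or "$2.50".
COST_RE = re.compile(r"\$\s*(\d+(?:\.\d+)?)")
# Token amounts need a k/K suffix or the word "token(s)", e.g. "50k" or "2000 tokens".
# The lookbehind stops a "$"-prefixed number from also being read as tokens.
TOKEN_RE = re.compile(r"(?<![\$\d.])(\d+(?:\.\d+)?)\s*(k\b|tokens?\b)", re.IGNORECASE)


def parse_approved_amounts(feedback):
    """Return (extra_cost_usd, extra_tokens); either may be None. Hypothetical helper."""
    cost = None
    tokens = None
    cost_match = COST_RE.search(feedback)
    if cost_match:
        cost = float(cost_match.group(1))
    token_match = TOKEN_RE.search(feedback)
    if token_match:
        value = float(token_match.group(1))
        if token_match.group(2).lower() == "k":
            value *= 1000
        tokens = int(value)
    return cost, tokens
```

A bare number such as "approve 10" yields `(None, None)` in this sketch, so the same digits are never read as both dollars and tokens.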

API

@budget(
    max_cost: float | None = None,           # USD cap
    max_tokens: int | None = None,           # total token cap
    max_requests: int | None = None,         # LLM request count cap (NEW)
    on_exceed: Literal['pause', 'stop', 'warn'] = 'pause',
    cost_per_prompt_token: float | None = None,      # custom flat pricing (NEW)
    cost_per_completion_token: float | None = None,  # custom flat pricing (NEW)
    cost_map: dict | None = None,            # per-model pricing override
)
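The pricing parameters follow the stated priority: flat pricing > cost_map > DEFAULT_MODEL_COSTS. The sketch below shows how that lookup could work. It assumes that `cost_map` maps a model name to a `(prompt_rate, completion_rate)` tuple and that flat pricing only applies when both flat rates are set; neither detail is confirmed by the PR text, and the default table here holds placeholder values:

```python
# Placeholder table for illustration only; not the PR's DEFAULT_MODEL_COSTS.
DEFAULT_MODEL_COSTS = {"example-model": (0.000001, 0.000002)}


def resolve_rates(model, cost_per_prompt_token=None,
                  cost_per_completion_token=None, cost_map=None):
    """Return (prompt_rate, completion_rate) in USD per token, or None if unknown."""
    # 1. Flat per-token pricing wins when both rates are given (assumption).
    if cost_per_prompt_token is not None and cost_per_completion_token is not None:
        return (cost_per_prompt_token, cost_per_completion_token)
    # 2. Per-model override from cost_map.
    if cost_map and model in cost_map:
        return cost_map[model]
    # 3. Built-in defaults.
    return DEFAULT_MODEL_COSTS.get(model)
```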

Usage Example

from crewai.flow import Flow, start, listen, budget

class BudgetedFlow(Flow):
    @start()
    @budget(max_cost=5.00, max_requests=10, on_exceed='pause')
    def run_expensive_task(self):
        crew = MyCrew()
        return crew.kickoff()

    @listen(run_expensive_task)
    def process_results(self, result):
        # Access budget summary
        print(f"Total cost: ${self.budget_summary['estimated_cost']:.2f}")
        print(f"LLM requests: {self.budget_summary['total_requests']}")
        return result

# Custom pricing for negotiated rates:
@budget(
    max_cost=10.00,
    cost_per_prompt_token=0.000003,   # $3/1M tokens
    cost_per_completion_token=0.000015,  # $15/1M tokens
)
def custom_priced_task(self):
    ...
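As a worked check of the custom rates above, here is the arithmetic for a hypothetical run. The token counts and the helper name are made up for illustration; only the two rates come from the example:

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  cost_per_prompt_token, cost_per_completion_token):
    """Estimated USD cost under flat per-token pricing."""
    return (prompt_tokens * cost_per_prompt_token
            + completion_tokens * cost_per_completion_token)


# 1,000,000 prompt tokens at $3/1M plus 200,000 completion tokens at $15/1M
cost = estimate_cost(1_000_000, 200_000, 0.000003, 0.000015)
print(f"${cost:.2f}")  # roughly $3.00 + $3.00
```

At these rates the run stays under the `max_cost=10.00` cap from the example.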

Breaking Changes

  • Renamed @cost_governor → @budget
  • Renamed CostGovernorConfig → BudgetConfig
  • Renamed CostTracker → BudgetTracker
  • Renamed cost_summary → budget_summary
  • Renamed parameters: budget_limit → max_cost, token_limit → max_tokens

Backwards-compatible aliases are provided: cost_governor, CostGovernorConfig, CostTracker

New Exports from crewai.flow

  • budget: The decorator
  • BudgetConfig: Configuration dataclass
  • BudgetTracker: Internal tracking class (for advanced use)
  • BudgetExceededError: Exception raised when limits exceeded and denied/stopped

Test Plan

  • Test max_cost triggers pause/stop/warn
  • Test max_tokens triggers pause/stop/warn
  • Test max_requests triggers pause/stop/warn (NEW)
  • Test request counting via event listener (NEW)
  • Test custom flat per-token pricing (NEW)
  • Test cost_map per-model overrides (NEW)
  • Test flat pricing overrides cost_map (NEW)
  • Test combined limits (cost + tokens + requests — first one hit triggers)
  • Test budget_summary includes request data (NEW)
  • Test cost accumulation across multiple method calls
  • Test approved continuation increases limits
  • Test denied continuation stops flow
  • Test with async flow methods
  • Test decorator preserves flow attributes (@start, @listen, etc.)
  • 63 comprehensive tests pass
  • 67 existing flow tests pass

🤖 Generated with Claude Code


Note

Medium Risk
Adds new opt-in runtime governance around LLM usage by hooking into the global event bus and Flow method wrapping, which could affect execution timing and accounting (especially with concurrent flows) if misconfigured.

Overview
Introduces a new @budget decorator for Flow methods that tracks token usage, estimated cost, and LLM request count, then enforces configurable limits via warn, stop (BudgetExceededError), or pause (HITL approval) behavior.
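The three behaviors can be pictured as a small dispatch on `on_exceed`. This is a sketch under stated assumptions, not the PR's code: `enforce` and the `ask_human` callback are hypothetical names, the exception class is a local stand-in for the exported `BudgetExceededError`, and the `>=` comparison is inferred from the approval message quoted in the review comments:

```python
import logging

logger = logging.getLogger("budget_sketch")


class BudgetExceededError(RuntimeError):
    """Local stand-in for the exception exported by the PR."""


def enforce(spent, limit, on_exceed="pause", ask_human=None):
    """Return True if execution may continue; raise if it must stop."""
    if limit is None or spent < limit:
        return True  # no limit configured, or still under it
    if on_exceed == "warn":
        logger.warning("Budget exceeded: %s >= %s", spent, limit)
        return True
    if on_exceed == "stop":
        raise BudgetExceededError(f"limit {limit} reached (spent {spent})")
    if on_exceed == "pause":
        # ask_human stands in for the HITL approval step.
        if ask_human is not None and ask_human(spent, limit):
            return True
        raise BudgetExceededError("continuation denied")
    raise ValueError(f"unknown on_exceed mode: {on_exceed!r}")
```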

Budget tracking is integrated into Flow via a per-instance tracker and a new budget_summary property, and Flow method wrappers now preserve __budget_config__ metadata. The implementation adds event-bus listeners (LLMCallStartedEvent/LLMCallCompletedEvent) to count requests and capture per-call token deltas, with support for custom pricing (cost_map or per-token rates) and comprehensive new tests covering sync/async paths and approval/denial flows.

Written by Cursor Bugbot for commit 3ac17a4.

Add a @cost_governor decorator for Flow methods that enables budget and
token limit enforcement. Key features:

- Budget limits in USD with per-model pricing (GPT-4o, Claude, Gemini, etc.)
- Token limits for hard caps on total token usage
- Three on_exceed modes:
  - 'pause': Uses existing HITL infrastructure to ask human approval
  - 'stop': Raises BudgetExceededError immediately
  - 'warn': Logs warning and continues
- Cumulative cost tracking across flow execution via flow.cost_summary
- Custom cost_map support for non-standard model pricing
- Works with both sync and async flow methods

New exports from crewai.flow:
- cost_governor: The decorator
- CostGovernorConfig: Configuration dataclass
- CostTracker: Internal tracking class
- BudgetExceededError: Exception for stop/denied scenarios

Example usage:
    @start()
    @cost_governor(budget_limit=5.00, on_exceed='pause')
    def expensive_task(self):
        return crew.kickoff()

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Fix prefix matching to prefer longest match (e.g., gpt-4o-mini over gpt-4o)
- Use word-boundary matching for denial detection to avoid false positives
- Add approved_tokens tracking for token limit continuation
- Add effective_token_limit property to track total allowed tokens
- Update cost_summary to include new token tracking fields
- Add tests for prefix matching and token limit approval

Co-Authored-By: Claude Opus 4.5 <[email protected]>
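The longest-prefix fix in this commit can be illustrated with a few lines. The helper name and the model list are hypothetical; the point is only that `gpt-4o-mini` must beat `gpt-4o` when both are prefixes of the model string:

```python
def match_model_prefix(model, known_models):
    """Return the longest known model name that prefixes `model`, or None."""
    candidates = [name for name in known_models if model.startswith(name)]
    return max(candidates, key=len) if candidates else None
```

Picking the first matching prefix instead of the longest would price a `gpt-4o-mini-...` model at `gpt-4o` rates whenever `gpt-4o` appears first in the table.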
…nsive token tracking

BREAKING CHANGE: Renamed @cost_governor to @budget decorator

Renamed:
- cost_governor.py → budget.py
- @cost_governor() → @budget()
- CostGovernorConfig → BudgetConfig
- CostTracker → BudgetTracker
- cost_summary → budget_summary
- _cost_tracker → _budget_tracker
- budget_limit → max_cost
- token_limit → max_tokens

New features:
- max_requests: Limit total LLM requests (tracked via event bus)
- cost_per_prompt_token / cost_per_completion_token: Custom flat per-token pricing
- cost_map now supports per-model override pricing
- Priority: flat pricing > cost_map > DEFAULT_MODEL_COSTS
- Request counting via LLMCallStartedEvent listener
- Enhanced HITL approval message shows which limit was hit
- Extracts usage from LiteAgentOutput.usage_metrics

New BudgetTracker fields:
- total_requests: LLM request count
- approved_requests: Additional approved requests
- is_request_limit_exceeded property
- effective_request_limit property

Tests:
- 63 comprehensive tests covering all new functionality
- Tests for request limits, custom pricing, combined limits
- Async flow method support verified

Co-Authored-By: Claude Opus 4.5 <[email protected]>
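The new request fields suggest a relationship like the one sketched below. The field and property names are taken from the commit message; the class name, the additive formula, and the `>=` comparison are assumptions modeled on how the cost limit is described elsewhere in this PR:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RequestBudgetSketch:
    """Illustrative stand-in, not the PR's BudgetTracker."""
    max_requests: int | None = None
    total_requests: int = 0
    approved_requests: int = 0

    @property
    def effective_request_limit(self) -> int | None:
        # Original cap plus whatever a human has approved so far.
        if self.max_requests is None:
            return None
        return self.max_requests + self.approved_requests

    @property
    def is_request_limit_exceeded(self) -> bool:
        limit = self.effective_request_limit
        return limit is not None and self.total_requests >= limit
```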
@alex-clawd alex-clawd changed the title from "feat: native cost governor for Flows" to "feat: @budget decorator for Flows — cost, token & request limits with HITL approval" on Mar 13, 2026
…pshots

- Listen to LLMCallCompletedEvent to capture tokens from ALL LLM calls
  (raw LLM.call(), Agent.kickoff(), Crew.kickoff())
- Use pre/post snapshot of BaseLLM._token_usage for per-call deltas
- Avoid double-counting: skip result extraction when events captured tokens
- Wait for async event handler completion before checking limits
- Verified with real OpenAI API calls across all three scenarios

63 unit tests + 67 flow tests passing.
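The pre/post snapshot idea amounts to copying cumulative counters before a call and subtracting afterwards. A minimal sketch with plain dicts; the key names are hypothetical and the real `BaseLLM._token_usage` structure may differ:

```python
def take_snapshot(usage):
    """Copy cumulative counters so later mutation does not affect the snapshot."""
    return dict(usage)


def usage_delta(before, after):
    """Per-call usage: cumulative counters after the call minus those before it."""
    return {key: after.get(key, 0) - before.get(key, 0) for key in after}


# Cumulative counters on a hypothetical LLM object.
cumulative = {"prompt_tokens": 100, "completion_tokens": 40}
before = take_snapshot(cumulative)
cumulative["prompt_tokens"] += 25      # the call under measurement
cumulative["completion_tokens"] += 10
delta = usage_delta(before, cumulative)
```

Here `delta` contains only the 25 prompt and 10 completion tokens of the measured call, which is what keeps earlier usage on a shared LLM object out of the budget.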
…ion snapshots, sleep removal

- Issue 3: Guard format strings for effective_budget/token_limit/request_limit
  that could be None with "N/A" fallback
- Issue 4: Only increase approved_tokens/approved_requests when their respective
  limits are exceeded (was always increasing approved_budget)
- Issue 5: Check approval patterns (yes, approve, continue, go ahead, proceed,
  ok, okay) BEFORE denial patterns to avoid false positives like "no problem"
  Also check for structured HITL emit responses first (approved/denied)
- Issue 6: Make budget regex require $ prefix, token regex require explicit
  "tokens" suffix or k/K suffix to avoid cross-parsing same numbers
- Issue 7: Add docstring note about concurrent flow limitation with global
  event bus
- Issue 8: Replace hardcoded 100ms sleep with polling loop (50ms max, 5ms
  intervals) that checks if tokens arrived via events, skipping immediately
  if no LLM requests were made
- Issue 9: Move _llm_snapshots dict inside wrapper functions so each method
  invocation gets its own dict, avoiding cross-instance interference

Co-Authored-By: Claude Opus 4.5 <[email protected]>
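Issue 8's polling loop could look roughly like this. The 50 ms cap and 5 ms interval come from the commit message; the function name and the `get_event_tokens` callback are hypothetical:

```python
import time


def wait_for_event_tokens(get_event_tokens, requests_made,
                          timeout=0.05, interval=0.005):
    """Poll until event handlers have reported tokens, or the timeout passes.

    Returns True if tokens arrived via events, False otherwise.
    """
    if requests_made == 0:
        return False  # no LLM requests were made, so there is nothing to wait for
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_event_tokens() > 0:
            return True
        time.sleep(interval)
    return get_event_tokens() > 0
```

Compared with a fixed 100 ms sleep, this returns as soon as tokens show up and costs nothing when the method made no LLM calls.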
limits_str = ", ".join(limits_hit)

# Build message for human
budget_str = f"${tracker.max_cost:.2f}" if tracker.max_cost else "unlimited"

HITL approval message shows wrong budget limit

Medium Severity

The HITL approval message displays tracker.max_cost (the original budget) instead of tracker.effective_budget (which includes previously approved amounts). After a first approval (e.g., max_cost=$5, approved_budget=$5, effective_budget=$10), if the budget is exceeded again at $12, the message says "cost ($12.00 >= $5.00)" and "$5.00 budget" — implying the limit was $5, when the actual enforced limit was $10. This gives the human reviewer incorrect information to base their approval decision on.
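One possible shape of the fix is to build the message from the effective limit instead of the original one. The helper names are hypothetical; the sum `max_cost + approved_budget` follows the example in this comment:

```python
def format_limit(value):
    """Render a USD limit, falling back to "N/A" when no limit is set."""
    return f"${value:.2f}" if value is not None else "N/A"


def cost_limit_message(total_cost, max_cost, approved_budget=0.0):
    """Describe the cost limit that was actually enforced."""
    effective_budget = None if max_cost is None else max_cost + approved_budget
    return f"cost (${total_cost:.2f} >= {format_limit(effective_budget)})"
```

With `max_cost=5`, `approved_budget=5` and a spend of $12, this reports the $10.00 limit the reviewer actually needs to see. Testing `is not None` also avoids treating a configured limit of 0 as unlimited.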

if re.search(approval_pattern, feedback_lower):
    is_approved = True
elif re.search(denial_pattern, feedback_lower):
    is_denied = True

Negated approval phrases falsely match as approved

Medium Severity

The feedback text parsing checks approval patterns before denial patterns, so negated phrases like "not ok", "not okay", or "no, ok fine" incorrectly match the approval regex (\bok\b / \bokay\b) and are treated as approval. Since the approval pattern is checked first and short-circuits, the denial keyword is never evaluated. This could cause unintended budget continuation with real cost implications.
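A sketch of one way to avoid both failure modes: check negated approvals first, and treat text that contains both an approval and a denial as unclear rather than approved. The approval words come from the commit notes; the denial words, the negation pattern, and the "unclear" outcome are assumptions made for illustration:

```python
import re

APPROVAL = re.compile(r"\b(yes|approve|continue|go ahead|proceed|ok|okay)\b")
DENIAL = re.compile(r"\b(no|deny|denied|stop|reject|cancel)\b")  # assumed word list
NEGATED_APPROVAL = re.compile(
    r"\b(not|don't|do not|never)\s+(?:\w+\s+)?"
    r"(approve|continue|go ahead|proceed|ok|okay)\b"
)
HARMLESS_NO = re.compile(r"\bno (problem|worries)\b")


def classify_feedback(feedback):
    """Return "approved", "denied", or "unclear" for free-text feedback."""
    text = feedback.lower()
    if NEGATED_APPROVAL.search(text):
        return "denied"  # e.g. "not ok", "do not proceed"
    text = HARMLESS_NO.sub("", text)  # "no problem" is not a denial
    approved = bool(APPROVAL.search(text))
    denied = bool(DENIAL.search(text))
    if approved and not denied:
        return "approved"
    if denied and not approved:
        return "denied"
    return "unclear"  # mixed or empty signal: safer to ask again
```

Returning "unclear" for input such as "no, ok fine" errs on the side of asking the human again instead of spending more money on an ambiguous answer.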

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

def _create_completion_tracker(
    tracker: BudgetTracker,
    cfg: BudgetConfig,
    llm_snapshots: dict[int, dict[str, int]],

Unused cfg parameter in _create_completion_tracker

Low Severity

The cfg: BudgetConfig parameter of _create_completion_tracker is never referenced in the function body or its inner handler closure. It's passed at both call sites (lines 883 and 905) but serves no purpose, adding unnecessary noise to the interface.
