Bug Report: Simple Cache Incorrectly Converted to Semantic Cache Causing Platform Outage #1237

@leofmarciano

What Happened?

Severity: Critical
Impact: Platform-wide outage affecting all customers
Environment: Production

Description

A simple cache was inadvertently converted to semantic caching, which pushed the cache hit rate to nearly 100%. The platform failed completely while all customers were actively using it.

Expected Behavior

  • Simple cache should maintain exact key matching
  • Cache hit rates should reflect normal usage patterns
  • Platform should remain operational under standard load

Actual Behavior

  • Cache began matching semantically similar requests instead of exact keys
  • Cache hit rate increased to nearly 100%
  • Platform became unresponsive and stopped functioning
  • All active customer operations were disrupted
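
To illustrate the difference between the expected and actual behavior, here is a minimal sketch, not Portkey's implementation. `SimpleCache`, `ToySemanticCache`, the word-overlap similarity, and the 0.5 threshold are all hypothetical names and values chosen for illustration; a real semantic cache would typically compare embeddings rather than word sets.

```python
import hashlib
import json


def exact_key(request: dict) -> str:
    """Hash the canonical JSON form of a request; any change yields a new key."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SimpleCache:
    """Hypothetical exact-match cache: hits only on byte-identical requests."""

    def __init__(self):
        self._store = {}

    def get(self, request: dict):
        return self._store.get(exact_key(request))

    def set(self, request: dict, response: str) -> None:
        self._store[exact_key(request)] = response


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercased words (a stand-in for embeddings)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


class ToySemanticCache:
    """Hypothetical similarity cache: hits on any prompt that is close enough."""

    def __init__(self, threshold: float = 0.5):  # threshold is an assumption
        self.threshold = threshold
        self._entries = []  # list of (prompt, response) pairs

    def get(self, prompt: str):
        best_response, best_score = None, 0.0
        for cached_prompt, response in self._entries:
            score = word_overlap(prompt, cached_prompt)
            if score >= self.threshold and score > best_score:
                best_response, best_score = response, score
        return best_response

    def set(self, prompt: str, response: str) -> None:
        self._entries.append((prompt, response))
```

With these definitions, a request that differs by a single word misses the simple cache but hits the similarity cache, which is how one customer's request can end up served with the response to a different request.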

Impact

  • Complete platform outage
  • All customers unable to access services
  • Business operations halted

Steps to Reproduce

  1. System was operating normally with simple cache configuration
  2. Cache behavior changed to semantic matching (root cause to be investigated)
  3. Cache hit rate rapidly increased to ~100%
  4. Platform performance degraded until complete failure
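
The hit-rate jump in step 3 is the kind of signal a sliding-window check could surface. The following is only a sketch under assumptions: `HitRateWindow`, the window size, and the 0.95 alert threshold are hypothetical and do not come from the report or from Portkey.

```python
from collections import deque


class HitRateWindow:
    """Tracks the hit rate over the most recent `size` cache lookups."""

    def __init__(self, size: int = 100, alert_threshold: float = 0.95):
        self.events = deque(maxlen=size)  # True for a hit, False for a miss
        self.alert_threshold = alert_threshold

    def record(self, hit: bool) -> None:
        self.events.append(hit)

    def hit_rate(self) -> float:
        if not self.events:
            return 0.0
        return sum(self.events) / len(self.events)

    def is_suspicious(self) -> bool:
        # Only judge a full window, so a few early hits do not trigger an alert.
        window_full = len(self.events) == self.events.maxlen
        return window_full and self.hit_rate() >= self.alert_threshold
```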

Priority

This requires immediate attention as it resulted in a production outage affecting all customers.

We are on Portkey Cloud. The problem started with this PR: #1236

Labels: bug, triage