@gladjohn gladjohn commented Oct 8, 2025

Added detailed caching strategy and resilience plan for Managed Identity v2, including problem identification, proposed solutions, call sequence, cache renewal matrix, invalidation rules, and security considerations.

@gladjohn gladjohn requested a review from a team as a code owner October 8, 2025 15:43
## Solution (What’s Changing)
1. **Probe once** (link-local) to detect **MSI v2** → cache result **in-proc**.
2. Treat the **binding certificate** (from IMDS `/issuecredential`) as the **primary anchor** (~7-day validity); use it to get ATs.
3. **Proactive renewal at half-life (+ small jitter)** to rotate well before expiry.
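One way to read step 3 is sketched below in Python. It is a minimal illustration, not the PR's implementation: the function name `renewal_time`, the 5-minute jitter bound, and the choice of a uniform, per-process random delay are all assumptions, since the proposal itself only says "half-life (+ small jitter)".

```python
import random
from datetime import datetime, timedelta, timezone


def renewal_time(not_before, not_after, max_jitter=timedelta(minutes=5), rng=None):
    """Return the instant at which to start proactive renewal.

    The result is the midpoint of the validity window plus a random,
    per-process delay in [0, max_jitter], capped at not_after.
    The 5-minute default is an assumed value, not taken from the PR.
    """
    if not_after <= not_before:
        raise ValueError("not_after must be later than not_before")
    rng = rng or random.Random()
    half_life = not_before + (not_after - not_before) / 2
    # Jitter is only ever added, so renewal never starts before half-life.
    jitter = timedelta(seconds=rng.uniform(0, max_jitter.total_seconds()))
    return min(half_life + jitter, not_after)


# Example: a binding certificate with the ~7-day validity from step 2.
issued = datetime(2025, 10, 8, 15, 43, tzinfo=timezone.utc)
expires = issued + timedelta(days=7)
renew_at = renewal_time(issued, expires)
```

Computing the renewal instant once, when the certificate is cached, keeps the decision stable across calls. With a ~7-day certificate the half-life falls about 3.5 days after issuance, so in this sketch a jitter of a few minutes delays renewal by a negligible fraction of the remaining validity.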
Member

Please be precise. Specify:

  • the jitter (e.g. 5 min)
  • whether renewal should happen on a front-end or a back-end thread. I think front-end.

Contributor

How is jitter calculated? Is it randomized per host/process or globally coordinated? Could jitter introduce any unintended renewal delays?

Member

@bgavrilMS bgavrilMS left a comment

Not enough details.

Contributor

@Robbie-Microsoft Robbie-Microsoft left a comment

  • For each cache and renewal step, document what happens if the cache is missing, invalid, or corrupted.
  • Outline (even briefly) the implementation details of the single-writer system.

# Managed Identity v2 (Attested TB) — Resilience & Caching Plan

## TL;DR
We reduce cold-start latency and dependency risk for MSI v2 by caching safe, long-lived artifacts, coordinating renewal across processes, and keeping the hot path in memory. **MAA is used only to (re)issue the binding certificate**; bound AT acquisition relies on that cert. Result: fewer failures, less churn, smoother CX.
Contributor

What’s the fallback if the binding cert is lost or corrupted? Is there any emergency recovery path?

Updated the caching strategy for MSI v2 to enhance resilience and reduce cold-start latency. Key changes include improved certificate renewal processes and better caching mechanisms.