Description
Microsoft.Identity.Web Library
Microsoft.Identity.Web.TokenAcquisition
Microsoft.Identity.Web version
3.12.0 (main branch, post-PR #3430)
Web app
Not Applicable
Web API
Not Applicable
Token cache serialization
Not Applicable
Description
Summary:
After PR #3430 ("Reload certificates for all client credential based issues"), the token acquisition retry handler now attempts a certificate reload and retry for any InvalidClient error during client credentials token acquisition, not just specific certificate problems. This change introduces the risk of infinite retry loops when using .WithAgentIdentities() and a misconfigured client credential (e.g., wrong ClientID, expired ClientSecret, etc).
Root cause analysis:
- The PR broadened the retry condition in IsInvalidClientCertificateOrSignedAssertionError to all InvalidClient errors.
- The Agent Identities flow (see WithAgentIdentities/WithAgentUserIdentity/AgentUserIdentityMsalAddIn) chains token requests: it first performs a client credential token request for the agent application, then another for the agent identity, and finally one for the target resource. Each layer uses signed assertions and can propagate errors upward.
- If the first request fails with InvalidClient (wrong ClientID/Secret or expired keys), the retry machinery resets the certificates and recurses, but the same error recurs on every attempt because the underlying problem is a configuration error (e.g., client not found, secret invalid), not a certificate rotation. This leads to an infinite loop and the call never returns.
Confirmed symptoms:
- Observed with .WithAgentIdentities() token acquisitions.
- Easy to repro by setting a bad client id/secret/key for the agent application.
- Application hangs indefinitely (no exception escapes, call never returns).
- High CPU may be observed due to excessive retries, along with throttling from ESTS.
See full code analysis and stack below for more details.
Reproduction steps
Minimum reproduction:
- Configure downstream web API to use .WithAgentIdentities() and valid FIC trust, but set the agent application's ClientID or ClientSecret to an incorrect or expired value.
- Attempt to acquire a token via IAuthorizationHeaderProvider.CreateAuthorizationHeaderForAppAsync() or ...ForUserAsync() with the options from .WithAgentIdentities().
- The call will never return (infinite loop), and no exception escapes.
Example code:
var options = new AuthorizationHeaderProviderOptions().WithAgentIdentity("bad-client-id");
await authorizationHeaderProvider.CreateAuthorizationHeaderForAppAsync("https://resource/.default", options);
See also [README code](https://github.com/AzureAD/microsoft-identity-web/blob/master/src/Microsoft.Identity.Web.AgentIdentities/README.AgentIdentities.md) for usage scenarios.
Error message
No exception escapes; the application hangs on token acquisition. When debugging, repeated calls to GetAuthenticationResultForAppAsync are observed (infinite recursion/loop). If logging is enabled, the same error with an identical stack is repeated, which helps with understanding and debugging the loop.
Errors that will trigger retry (and thus hang):
- Wrong ClientID (AADSTS700016: Application Not Found)
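- Expired ClientSecret (a wrong secret that returns AADSTS7000215 is already excluded from retry by the check shown under "Affected retry logic")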
- Expired certificate (legitimate rotation should retry)
See below for affected code and proposed detection improvements.
Id Web logs
Repeated attempts to acquire token for agent application identity. Logs show repeated MSAL error with error code InvalidClient and no progress.
Set MSAL log level to Verbose to capture repeated error and trace for innermost call stack. Attach log output if available.
Relevant code snippets
var options = new AuthorizationHeaderProviderOptions().WithAgentIdentity("bad-client-id");
await authorizationHeaderProvider.CreateAuthorizationHeaderForAppAsync("https://resource/.default", options);
And/or for agent user identity:
var options = new AuthorizationHeaderProviderOptions().WithAgentUserIdentity("bad-client-id", "[email protected]");
await authorizationHeaderProvider.CreateAuthorizationHeaderForUserAsync(["https://resource/.default"], options, userPrincipal);
Regression
3.11.0 (prior to PR #3430)
Expected behavior
- Token acquisition fails with an explicit error or exception ("InvalidClient" or AADSTS700016) and does not retry indefinitely.
- Application startup or downstream API call returns a handled exception, not a hang.
- The retry logic only applies to transient certificate errors, not misconfiguration.
Investigation
Affected retry logic
The retry logic in TokenAcquisition is overly broad after PR #3430 and can result in infinite loops when configuration errors (e.g., wrong ClientID/secret) occur, especially with .WithAgentIdentities():
// src/Microsoft.Identity.Web.TokenAcquisition/TokenAcquisition.cs#L889-L898
private bool IsInvalidClientCertificateOrSignedAssertionError(MsalServiceException exMsal)
{
    return !_retryClientCertificate &&
        string.Equals(exMsal.ErrorCode, Constants.InvalidClient, StringComparison.OrdinalIgnoreCase) &&
        !exMsal.ResponseBody.Contains("AADSTS7000215" // No retry when wrong client secret.
#if NET6_0_OR_GREATER
        , StringComparison.OrdinalIgnoreCase
#endif
        );
}
This triggers in all client credential error cases except for AADSTS7000215 (wrong client secret). For wrong/expired client IDs, and many other errors, this recursively retries inside:
// src/Microsoft.Identity.Web.TokenAcquisition/TokenAcquisition.cs#L685-L710
catch (MsalServiceException exMsal) when (IsInvalidClientCertificateOrSignedAssertionError(exMsal))
{
    string applicationKey = GetApplicationKey(mergedOptions);
    NotifyCertificateSelection(mergedOptions, application, CerticateObserverAction.Deselected, exMsal);
    DefaultCertificateLoader.ResetCertificates(mergedOptions.ClientCredentials);
    _applicationsByAuthorityClientId[applicationKey] = null;
    // Retry
    _retryClientCertificate = true;
    return await GetAuthenticationResultForAppAsync(
        scope,
        authenticationScheme: authenticationScheme,
        tenant: tenant,
        tokenAcquisitionOptions: tokenAcquisitionOptions);
}
catch (MsalException ex)
{
    Logger.TokenAcquisitionError(_logger, ex.Message, ex);
    throw;
}
finally
{
    _retryClientCertificate = false;
}
How .WithAgentIdentities() exacerbates the problem
The agent identity flow (WithAgentIdentity/WithAgentUserIdentity) chains multiple token acquisitions. Here's how it's configured:
// src/Microsoft.Identity.Web.AgentIdentities/AgentIdentitiesExtension.cs#L39-L55
public static AuthorizationHeaderProviderOptions WithAgentIdentity(
    this AuthorizationHeaderProviderOptions options,
    string agentApplicationId)
{
    // It's possible to start with no options, so we initialize it if it's null.
    if (options == null)
        options = new AuthorizationHeaderProviderOptions();
    // AcquireTokenOptions holds the information needed to acquire a token for the Agent Identity
    options.AcquireTokenOptions ??= new AcquireTokenOptions();
    options.AcquireTokenOptions.ForAgentIdentity(agentApplicationId);
    return options;
}
During the request, the AgentUserIdentityMsalAddIn sets up an OnBeforeTokenRequestHandler which calls GetFicTokenAsync for both the agent application and agent identity:
// src/Microsoft.Identity.Web.AgentIdentities/AgentUserIdentityMsalAddIn.cs#L35-L60
OnBeforeTokenRequestHandler = async (request) =>
{
    // Get the services from the service provider.
    ITokenAcquirerFactory tokenAcquirerFactory = serviceProvider.GetRequiredService<ITokenAcquirerFactory>();
    Abstractions.IAuthenticationSchemeInformationProvider authenticationSchemeInformationProvider =
        serviceProvider.GetRequiredService<Abstractions.IAuthenticationSchemeInformationProvider>();
    IOptionsMonitor<MicrosoftIdentityApplicationOptions> optionsMonitor =
        serviceProvider.GetRequiredService<IOptionsMonitor<MicrosoftIdentityApplicationOptions>>();
    // Get the FIC token for the agent application.
    string authenticationScheme = authenticationSchemeInformationProvider.GetEffectiveAuthenticationScheme(options.AuthenticationOptionsName);
    ITokenAcquirer agentApplicationTokenAcquirer = tokenAcquirerFactory.GetTokenAcquirer(authenticationScheme);
    // ⚠️ THIS CALL USES REGULAR CLIENT CREDENTIALS AND TRIGGERS THE RETRY LOOP
    AcquireTokenResult aaFic = await agentApplicationTokenAcquirer.GetFicTokenAsync(
        new() { Tenant = options.Tenant, FmiPath = agentIdentity }
    );
    string? clientAssertion = aaFic.AccessToken;
    // Get the FIC token for the agent identity.
    MicrosoftIdentityApplicationOptions microsoftIdentityApplicationOptions = optionsMonitor.Get(authenticationScheme);
    ITokenAcquirer agentIdentityTokenAcquirer = tokenAcquirerFactory.GetTokenAcquirer(new MicrosoftIdentityApplicationOptions
    {
        ClientId = agentIdentity,
        Instance = microsoftIdentityApplicationOptions.Instance,
        Authority = microsoftIdentityApplicationOptions.Authority,
        TenantId = options.Tenant ?? microsoftIdentityApplicationOptions.TenantId
    });
    AcquireTokenResult aidFic = await agentIdentityTokenAcquirer.GetFicTokenAsync(
        options: new() { Tenant = options.Tenant },
        clientAssertion: clientAssertion
    );
    ...
};
The infinite loop occurs because:
- User calls .WithAgentIdentities() with misconfigured agent application credentials
- AgentUserIdentityMsalAddIn.OnBeforeTokenRequestHandler executes
- First GetFicTokenAsync call (line 47) fails with InvalidClient due to bad config
- Exception caught by IsInvalidClientCertificateOrSignedAssertionError → returns true
- Code resets certificates, sets _retryClientCertificate = true, and recursively calls itself
- On retry, same error occurs, but _retryClientCertificate is now true, so check returns false
- However, the outer calling layer still has _retryClientCertificate = false, triggering another retry
- Infinite loop - the configuration error never resolves
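One possible reading of the last steps is that the nested call's finally block clears the flag before the outer call evaluates its own exception filter, so the outer call always believes it has not retried yet. The following is a simplified model of that interaction, not the library code. It assumes the outer and nested acquisitions share one instance-level flag; InvalidClientException, RetryFlagModel, and the attempt cap are hypothetical names that exist only so the demonstration terminates.

```csharp
using System;
using System.Threading.Tasks;

// Stand-in for an MSAL "invalid_client" failure (hypothetical type for this model).
public sealed class InvalidClientException : Exception
{
    public InvalidClientException() : base("invalid_client") { }
}

// Simplified model, not the library code: one instance-level flag is shared by
// the outer acquisition and the nested one started from its request handler.
public sealed class RetryFlagModel
{
    private readonly int _attemptCap;      // demo-only guard so the model terminates
    private bool _retryClientCertificate;  // shared across outer and nested calls

    public RetryFlagModel(int attemptCap) => _attemptCap = attemptCap;

    public int Attempts { get; private set; }

    public Task AcquireOuterAsync() => AcquireAsync(nested: false);

    private async Task AcquireAsync(bool nested)
    {
        try
        {
            Attempts++;
            if (Attempts >= _attemptCap)
            {
                throw new InvalidOperationException("Attempt cap reached.");
            }

            if (!nested)
            {
                // Models OnBeforeTokenRequestHandler calling GetFicTokenAsync.
                await AcquireAsync(nested: true);
            }

            // The misconfiguration never heals, so every attempt fails.
            throw new InvalidClientException();
        }
        catch (InvalidClientException) when (!_retryClientCertificate)
        {
            _retryClientCertificate = true;
            await AcquireAsync(nested);
        }
        finally
        {
            // Runs in the nested call too, clearing the flag the outer call set.
            _retryClientCertificate = false;
        }
    }
}
```

In this model the only thing that stops the recursion is the demo cap: whatever cap is chosen, the attempt count climbs until it is reached, which matches the hang described in this issue.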
Proposed Fixes
Option 1: Tighten error filtering ✅ (Recommended)
Expand IsInvalidClientCertificateOrSignedAssertionError to avoid retrying in known cases of permanent config errors, not just transient certificate issues:
private bool IsInvalidClientCertificateOrSignedAssertionError(MsalServiceException exMsal)
{
    // IndexOf is used because Contains(string, StringComparison) is not available on all
    // target frameworks (the existing code guards it with #if NET6_0_OR_GREATER).
    return !_retryClientCertificate
        && string.Equals(exMsal.ErrorCode, Constants.InvalidClient, StringComparison.OrdinalIgnoreCase)
        && exMsal.ResponseBody.IndexOf("AADSTS7000215", StringComparison.OrdinalIgnoreCase) < 0 // Wrong client secret
        && exMsal.ResponseBody.IndexOf("AADSTS700016", StringComparison.OrdinalIgnoreCase) < 0; // Application not found (bad clientId)
}
Pros:
- Surgical fix - only affects the specific error detection logic
- Preserves legitimate certificate rotation retry behavior
- Minimal code change
Cons:
- Requires maintaining list of error codes as new cases are discovered
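To keep that maintenance cost low, the known codes could live in a single list rather than in a growing chain of conditions. The sketch below is one way to do that. InvalidClientRetryFilter and IsPermanentConfigurationError are hypothetical names, and the list contains only the two codes named in this issue.

```csharp
using System;

public static class InvalidClientRetryFilter
{
    // Codes treated as permanent misconfiguration. Only these two come from the
    // issue text; extend the list as further cases are confirmed.
    private static readonly string[] s_nonRetryableCodes =
    {
        "AADSTS7000215", // wrong client secret
        "AADSTS700016",  // application not found (bad client ID)
    };

    // Returns true when the response body mentions a code that a certificate
    // reload cannot fix, so the caller should not retry.
    public static bool IsPermanentConfigurationError(string responseBody)
    {
        if (string.IsNullOrEmpty(responseBody))
        {
            return false;
        }

        foreach (string code in s_nonRetryableCodes)
        {
            // IndexOf with a comparison type works on every target framework.
            if (responseBody.IndexOf(code, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
        }

        return false;
    }
}
```

The existing filter could then add a single `&& !InvalidClientRetryFilter.IsPermanentConfigurationError(exMsal.ResponseBody)` clause, and new codes would only touch the array.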
Option 2: Add retry counter (safety net) 🛡️
Introduce a per-call retry counter to prevent infinite recursion under all error scenarios:
private int _retryCount = 0;
private const int MaxRetries = 1;
// In the catch block (line 685):
catch (MsalServiceException exMsal) when (IsInvalidClientCertificateOrSignedAssertionError(exMsal))
{
    if (_retryCount >= MaxRetries)
    {
        Logger.TokenAcquisitionError(_logger, "Max certificate retry attempts reached", exMsal);
        throw; // Don't retry again
    }
    _retryCount++;
    string applicationKey = GetApplicationKey(mergedOptions);
    NotifyCertificateSelection(mergedOptions, application, CerticateObserverAction.Deselected, exMsal);
    DefaultCertificateLoader.ResetCertificates(mergedOptions.ClientCredentials);
    _applicationsByAuthorityClientId[applicationKey] = null;
    // Retry
    _retryClientCertificate = true;
    return await GetAuthenticationResultForAppAsync(
        scope,
        authenticationScheme: authenticationScheme,
        tenant: tenant,
        tokenAcquisitionOptions: tokenAcquisitionOptions);
}
finally
{
    _retryClientCertificate = false;
    _retryCount = 0; // Reset for next call
}
Pros:
- Fail-safe mechanism prevents infinite loops regardless of error type
- Works for both known and unknown error scenarios
- Easy to adjust MaxRetries based on real-world needs
Cons:
- Adds state management complexity (_retryCount needs thread safety consideration)
- Could mask legitimate multi-retry certificate rotation scenarios
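One way to sidestep the shared-state concern is to keep the counter local to each call instead of in an instance field. A field that is reset in a finally block can also be cleared by a nested or concurrent call, whereas a local cannot. The sketch below shows the idea in isolation. BoundedRetry, ExecuteAsync, and the delegate parameters are hypothetical and not part of Microsoft.Identity.Web; in the real code, beforeRetry would correspond to resetting certificates and dropping the cached application.

```csharp
using System;
using System.Threading.Tasks;

public static class BoundedRetry
{
    // Runs the operation and retries at most maxRetries times when isRetryable
    // accepts the failure. The attempt counter is a local, so nested or
    // concurrent calls cannot reset or share it.
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isRetryable,
        Action beforeRetry,
        int maxRetries = 1)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return await operation().ConfigureAwait(false);
            }
            catch (Exception ex) when (attempt < maxRetries && isRetryable(ex))
            {
                // For example: reset certificates and drop the cached application.
                beforeRetry();
            }
        }
    }
}
```

Once the retry budget is used up, the exception filter no longer matches and the original exception propagates to the caller unchanged.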
Option 3: Combined approach (Option 1 + Option 2) 🎯 (Most Robust)
Use both error filtering (Option 1) AND a retry counter (Option 2):
- Error filtering prevents known config errors from retrying
- Retry counter acts as a safety net for unknown/future error cases
This provides defense in depth.
Option 4: Suppress retries in nested token acquisition contexts
Add a flag/context parameter indicating we're in a nested token acquisition (e.g., agent identities FIC flow), and skip certificate retry logic in those cases:
// In TokenAcquisitionOptions
public bool SuppressCertificateRetry { get; set; }
// In GetAuthenticationResultForAppAsync catch block:
catch (MsalServiceException exMsal) when (
    IsInvalidClientCertificateOrSignedAssertionError(exMsal)
    && !(tokenAcquisitionOptions?.SuppressCertificateRetry ?? false))
{
    // ... retry logic
}
// In AgentUserIdentityMsalAddIn when calling GetFicTokenAsync:
AcquireTokenResult aaFic = await agentApplicationTokenAcquirer.GetFicTokenAsync(
    new() {
        Tenant = options.Tenant,
        FmiPath = agentIdentity,
        SuppressCertificateRetry = true // Skip retry in nested context
    }
);
Pros:
- Context-aware - doesn't affect normal certificate rotation scenarios
- Clean separation between top-level and nested calls
Cons:
- More invasive change to API surface
- Requires updates in multiple places
Recommendation
I recommend Option 3 (Combined Option 1 + 2) for maximum robustness:
- ✅ Add AADSTS700016 to error filtering (catches 90% of misconfig cases immediately)
- ✅ Add retry counter with MaxRetries = 1 (safety net for unknown errors and nested calls)
This provides immediate relief for the reported issue while protecting against future similar scenarios.
Testing Strategy
Unit Tests
- Extend WithClientCredentialsTests.cs to verify retry behavior with various InvalidClient error codes
- Add test in AgentIdentitiesExtensionTests.cs for agent identity flow with bad client config
Integration Tests
- Add E2E test simulating misconfigured agent application in TokenAcquirerTests
- Verify exception is thrown (not infinite loop) within reasonable timeout
Example Test
[Fact]
public async Task AgentIdentity_WithInvalidClientId_ThrowsAfterOneRetry()
{
    // Arrange: Configure with non-existent client ID
    var options = new AuthorizationHeaderProviderOptions()
        .WithAgentIdentity("00000000-0000-0000-0000-000000000000");
    // Act & Assert: Should throw, not hang
    var exception = await Assert.ThrowsAsync<MsalServiceException>(async () =>
    {
        await authorizationHeaderProvider.CreateAuthorizationHeaderForAppAsync(
            "https://graph.microsoft.com/.default",
            options
        );
    });
    Assert.Contains("AADSTS700016", exception.Message); // Application not found
}
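Because the regression shows up as a hang rather than a failure, a test like the one above would block the test run instead of failing if the bug returned. A small timeout wrapper can turn the hang into a fast failure. This is a sketch: HangGuard and WithTimeoutAsync are hypothetical helper names, not existing test utilities in the repository.

```csharp
using System;
using System.Threading.Tasks;

public static class HangGuard
{
    // Awaits the task, but throws TimeoutException if it has not completed
    // within the given timeout. The underlying task is not cancelled.
    public static async Task<T> WithTimeoutAsync<T>(Task<T> task, TimeSpan timeout)
    {
        Task finished = await Task.WhenAny(task, Task.Delay(timeout)).ConfigureAwait(false);
        if (finished != task)
        {
            throw new TimeoutException($"Operation did not complete within {timeout}.");
        }

        // The task is already complete; awaiting it returns the result or
        // rethrows its original exception.
        return await task.ConfigureAwait(false);
    }
}
```

In the test, the CreateAuthorizationHeaderForAppAsync call would be wrapped in HangGuard.WithTimeoutAsync with a timeout of a few seconds, so a regression surfaces as a TimeoutException. On .NET 6 and later, Task.WaitAsync(TimeSpan) offers similar behavior without a helper.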