Infinite retry loop in Agent Identities flow when client credential config is invalid (regression from PR #3430) #3654

@jmprieur

Microsoft.Identity.Web Library

Microsoft.Identity.Web.TokenAcquisition

Microsoft.Identity.Web version

3.12.0 (main branch, post-PR #3430)

Web app

Not Applicable

Web API

Not Applicable

Token cache serialization

Not Applicable

Description

Summary:

After PR #3430 ("Reload certificates for all client credential based issues"), the token acquisition retry handler attempts a certificate reload and retry for any InvalidClient error during client credentials token acquisition, not just for specific certificate problems. This change introduces the risk of infinite retry loops when .WithAgentIdentities() is used with a misconfigured client credential (e.g., wrong ClientID, expired ClientSecret).

Root cause analysis:

  • The PR broadened the retry condition in IsInvalidClientCertificateOrSignedAssertionError to all InvalidClient errors.
  • The Agent Identities flow (see WithAgentIdentities/WithAgentUserIdentity/AgentUserIdentityMsalAddIn) chains token requests: it first performs a client credential token request for the agent application, then another for the agent identity, and finally for the target resource. Each layer uses signed assertions and can propagate errors upward.
  • If the first request fails with InvalidClient (wrong ClientID/Secret or expired keys), the retry machinery resets the certificates and recurses, but the same error recurs on every iteration because the underlying problem is a configuration error (e.g., client not found, invalid secret), not a certificate rotation. This leads to an infinite loop, and the call never returns.

Confirmed symptoms:

  • Observed with .WithAgentIdentities() token acquisitions.
  • Easy to repro by setting a bad client id/secret/key for the agent application.
  • Application hangs indefinitely (no exception escapes, call never returns).
  • High CPU usage and throttling from ESTS may be observed because of the excessive retries.

See full code analysis and stack below for more details.

Reproduction steps

Minimum reproduction:

  1. Configure a downstream web API to use .WithAgentIdentities() with a valid FIC trust, but set the agent application's ClientID or ClientSecret to an incorrect or expired value.
  2. Attempt to acquire a token via IAuthorizationHeaderProvider.CreateAuthorizationHeaderForAppAsync() or ...ForUserAsync() with the options from .WithAgentIdentities().
  3. The call will never return (infinite loop), and no exception escapes.

Example code:

var options = new AuthorizationHeaderProviderOptions().WithAgentIdentity("bad-client-id");
await authorizationHeaderProvider.CreateAuthorizationHeaderForAppAsync("https://resource/.default", options);

See also [README code](https://github.com/AzureAD/microsoft-identity-web/blob/master/src/Microsoft.Identity.Web.AgentIdentities/README.AgentIdentities.md) for usage scenarios.

Error message

No exception escapes; the application hangs on token acquisition. When debugging, repeated calls to GetAuthenticationResultForAppAsync are observed (infinite recursion/loop). If logging is enabled, the same error is logged repeatedly with an identical stack trace, which helps in diagnosing the problem.

Errors that will trigger retry (and thus hang):

  • Wrong ClientID (AADSTS700016: Application Not Found)
  • Expired ClientSecret (a wrong secret returns AADSTS7000215, which the current filter already excludes from retry)
  • Expired certificate (legitimate rotation should retry)

See below for affected code and proposed detection improvements.

Id Web logs

Repeated attempts to acquire token for agent application identity. Logs show repeated MSAL error with error code InvalidClient and no progress.

Set MSAL log level to Verbose to capture repeated error and trace for innermost call stack. Attach log output if available.

Relevant code snippets

var options = new AuthorizationHeaderProviderOptions().WithAgentIdentity("bad-client-id");
await authorizationHeaderProvider.CreateAuthorizationHeaderForAppAsync("https://resource/.default", options);

And/or for agent user identity:

var options = new AuthorizationHeaderProviderOptions().WithAgentUserIdentity("bad-client-id", "[email protected]");
await authorizationHeaderProvider.CreateAuthorizationHeaderForUserAsync(["https://resource/.default"], options, userPrincipal);

Regression

3.11.0 (prior to PR #3430)

Expected behavior

  • Token acquisition fails with an explicit error or exception ("InvalidClient" or AADSTS700016) and does not retry indefinitely.
  • Application startup or downstream API call returns a handled exception, not a hang.
  • The retry logic only applies to transient certificate errors, not misconfiguration.

Investigation

Affected retry logic

The retry logic in TokenAcquisition is overly broad after PR #3430 and can result in infinite loops when configuration errors (e.g., wrong ClientID/secret) occur, especially with .WithAgentIdentities():

// src/Microsoft.Identity.Web.TokenAcquisition/TokenAcquisition.cs#L889-L898
private bool IsInvalidClientCertificateOrSignedAssertionError(MsalServiceException exMsal)
{
    return !_retryClientCertificate &&
        string.Equals(exMsal.ErrorCode, Constants.InvalidClient, StringComparison.OrdinalIgnoreCase) &&
        !exMsal.ResponseBody.Contains("AADSTS7000215" // No retry when wrong client secret.
#if NET6_0_OR_GREATER
        , StringComparison.OrdinalIgnoreCase
#endif
        );
}

This triggers in all client credential error cases except for AADSTS7000215 (wrong client secret). For wrong/expired client IDs, and many other errors, this recursively retries inside:

// src/Microsoft.Identity.Web.TokenAcquisition/TokenAcquisition.cs#L685-L710
catch (MsalServiceException exMsal) when (IsInvalidClientCertificateOrSignedAssertionError(exMsal))
{
    string applicationKey = GetApplicationKey(mergedOptions);
    NotifyCertificateSelection(mergedOptions, application, CerticateObserverAction.Deselected, exMsal);
    DefaultCertificateLoader.ResetCertificates(mergedOptions.ClientCredentials);
    _applicationsByAuthorityClientId[applicationKey] = null;

    // Retry
    _retryClientCertificate = true;
    return await GetAuthenticationResultForAppAsync(
        scope,
        authenticationScheme: authenticationScheme,
        tenant: tenant,
        tokenAcquisitionOptions: tokenAcquisitionOptions);
}
catch (MsalException ex)
{
    Logger.TokenAcquisitionError(_logger, ex.Message, ex);
    throw;
}
finally
{
    _retryClientCertificate = false;
}

How .WithAgentIdentities() exacerbates the problem

The agent identity flow (WithAgentIdentity/WithAgentUserIdentity) chains multiple token acquisitions. Here's how it's configured:

// src/Microsoft.Identity.Web.AgentIdentities/AgentIdentitiesExtension.cs#L39-L55
public static AuthorizationHeaderProviderOptions WithAgentIdentity(
    this AuthorizationHeaderProviderOptions options,
    string agentApplicationId)
{
    // It's possible to start with no options, so we initialize it if it's null.
    if (options == null)
        options = new AuthorizationHeaderProviderOptions();

    // AcquireTokenOptions holds the information needed to acquire a token for the Agent Identity
    options.AcquireTokenOptions ??= new AcquireTokenOptions();
    options.AcquireTokenOptions.ForAgentIdentity(agentApplicationId);

    return options;
}

During the request, the AgentUserIdentityMsalAddIn sets up an OnBeforeTokenRequestHandler which calls GetFicTokenAsync for both the agent application and agent identity:

// src/Microsoft.Identity.Web.AgentIdentities/AgentUserIdentityMsalAddIn.cs#L35-L60
OnBeforeTokenRequestHandler = async (request) =>
{
    // Get the services from the service provider. 
    ITokenAcquirerFactory tokenAcquirerFactory = serviceProvider.GetRequiredService<ITokenAcquirerFactory>();
    Abstractions.IAuthenticationSchemeInformationProvider authenticationSchemeInformationProvider =
        serviceProvider.GetRequiredService<Abstractions.IAuthenticationSchemeInformationProvider>();
    IOptionsMonitor<MicrosoftIdentityApplicationOptions> optionsMonitor =
        serviceProvider.GetRequiredService<IOptionsMonitor<MicrosoftIdentityApplicationOptions>>();

    // Get the FIC token for the agent application.
    string authenticationScheme = authenticationSchemeInformationProvider.GetEffectiveAuthenticationScheme(options.AuthenticationOptionsName);
    ITokenAcquirer agentApplicationTokenAcquirer = tokenAcquirerFactory.GetTokenAcquirer(authenticationScheme);

    // ⚠️ THIS CALL USES REGULAR CLIENT CREDENTIALS AND TRIGGERS THE RETRY LOOP
    AcquireTokenResult aaFic = await agentApplicationTokenAcquirer.GetFicTokenAsync(
        new() { Tenant = options.Tenant, FmiPath = agentIdentity }
    );
    string? clientAssertion = aaFic.AccessToken;

    // Get the FIC token for the agent identity.
    MicrosoftIdentityApplicationOptions microsoftIdentityApplicationOptions = optionsMonitor.Get(authenticationScheme);
    ITokenAcquirer agentIdentityTokenAcquirer = tokenAcquirerFactory.GetTokenAcquirer(new MicrosoftIdentityApplicationOptions
    {
        ClientId = agentIdentity,
        Instance = microsoftIdentityApplicationOptions.Instance,
        Authority = microsoftIdentityApplicationOptions.Authority,
        TenantId = options.Tenant ?? microsoftIdentityApplicationOptions.TenantId
    });
    AcquireTokenResult aidFic = await agentIdentityTokenAcquirer.GetFicTokenAsync(
        options: new() { Tenant = options.Tenant },
        clientAssertion: clientAssertion
    );
    ...
};

The infinite loop occurs because:

  1. The user calls .WithAgentIdentities() with misconfigured agent application credentials.
  2. AgentUserIdentityMsalAddIn.OnBeforeTokenRequestHandler executes.
  3. The first GetFicTokenAsync call (line 47) fails with InvalidClient because of the bad configuration.
  4. The exception filter IsInvalidClientCertificateOrSignedAssertionError evaluates the exception and returns true.
  5. The code resets the certificates, sets _retryClientCertificate = true, and recursively calls itself.
  6. On retry, the same error occurs, but _retryClientCertificate is now true, so the check returns false.
  7. However, the outer calling layer still has _retryClientCertificate = false, which triggers another retry.
  8. The loop never ends, because retrying cannot resolve a configuration error.

Proposed Fixes

Option 1: Tighten error filtering ✅ (Recommended)

Expand IsInvalidClientCertificateOrSignedAssertionError to avoid retrying in known cases of permanent config errors, not just transient certificate issues:

private bool IsInvalidClientCertificateOrSignedAssertionError(MsalServiceException exMsal)
{
    // IndexOf is used because Contains(string, StringComparison) is not available on
    // .NET Framework / netstandard2.0 targets (the current code guards it with #if NET6_0_OR_GREATER).
    return !_retryClientCertificate
        && string.Equals(exMsal.ErrorCode, Constants.InvalidClient, StringComparison.OrdinalIgnoreCase)
        && exMsal.ResponseBody.IndexOf("AADSTS7000215", StringComparison.OrdinalIgnoreCase) < 0 // Wrong client secret
        && exMsal.ResponseBody.IndexOf("AADSTS700016", StringComparison.OrdinalIgnoreCase) < 0; // Application not found (bad clientId)
}

Pros:

  • Surgical fix - only affects the specific error detection logic
  • Preserves legitimate certificate rotation retry behavior
  • Minimal code change

Cons:

  • Requires maintaining list of error codes as new cases are discovered
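
To keep that list maintainable, the excluded codes could live in one table rather than in a growing chain of conditions. The following is only a sketch: `InvalidClientErrorClassifier` is a hypothetical helper that does not exist in Microsoft.Identity.Web, and the `"invalid_client"` literal is assumed to match `Constants.InvalidClient`. It takes plain strings so that it can be tested without constructing an `MsalServiceException`.

```csharp
#nullable enable
using System;

// Hypothetical helper for illustration; not part of Microsoft.Identity.Web.
public static class InvalidClientErrorClassifier
{
    // AADSTS sub-codes named in this issue as permanent configuration errors.
    private static readonly string[] s_nonRetryableCodes =
    {
        "AADSTS7000215", // Wrong client secret
        "AADSTS700016",  // Application not found (bad client ID)
    };

    // Returns true when reloading the certificate and retrying might help.
    public static bool IsRetryable(string? errorCode, string? responseBody)
    {
        // Assumption: "invalid_client" is the value behind Constants.InvalidClient.
        if (!string.Equals(errorCode, "invalid_client", StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        if (string.IsNullOrEmpty(responseBody))
        {
            // No sub-code to inspect: keep the existing behavior and allow the retry.
            return true;
        }

        foreach (string code in s_nonRetryableCodes)
        {
            // IndexOf works on every target framework, unlike Contains(string, StringComparison).
            if (responseBody!.IndexOf(code, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return false;
            }
        }

        return true;
    }
}
```

With such a helper, IsInvalidClientCertificateOrSignedAssertionError would reduce to `!_retryClientCertificate && InvalidClientErrorClassifier.IsRetryable(exMsal.ErrorCode, exMsal.ResponseBody)`, and adding a newly discovered code becomes a one-line change.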

Option 2: Add retry counter (safety net) 🛡️

Introduce a per-call retry counter to prevent infinite recursion under all error scenarios:

private int _retryCount = 0;
private const int MaxRetries = 1;

// In the catch block (line 685):
catch (MsalServiceException exMsal) when (IsInvalidClientCertificateOrSignedAssertionError(exMsal))
{
    if (_retryCount >= MaxRetries)
    {
        Logger.TokenAcquisitionError(_logger, "Max certificate retry attempts reached", exMsal);
        throw; // Don't retry again
    }
    
    _retryCount++;
    string applicationKey = GetApplicationKey(mergedOptions);
    NotifyCertificateSelection(mergedOptions, application, CerticateObserverAction.Deselected, exMsal);
    DefaultCertificateLoader.ResetCertificates(mergedOptions.ClientCredentials);
    _applicationsByAuthorityClientId[applicationKey] = null;

    // Retry
    _retryClientCertificate = true;
    return await GetAuthenticationResultForAppAsync(
        scope,
        authenticationScheme: authenticationScheme,
        tenant: tenant,
        tokenAcquisitionOptions: tokenAcquisitionOptions);
}
finally
{
    _retryClientCertificate = false;
    _retryCount = 0; // Reset for next call
}

Pros:

  • Fail-safe mechanism prevents infinite loops regardless of error type
  • Works for both known and unknown error scenarios
  • Easy to adjust MaxRetries based on real-world needs

Cons:

  • Adds state management complexity (_retryCount needs thread safety consideration)
  • Could mask legitimate multi-retry certificate rotation scenarios
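
One way to sidestep the thread-safety concern is to keep the attempt counter local to the call instead of storing it in an instance field. The sketch below is an assumption about how that could look, not the library's implementation: `BoundedRetry`, `ExecuteAsync`, and its parameters are hypothetical names, and the real retry would still need to reset certificates and rebuild the confidential client application in `onBeforeRetry`.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of Microsoft.Identity.Web.
public static class BoundedRetry
{
    // Runs acquireToken, retrying at most maxRetries times when isRetryable(ex) is true.
    // The attempt counter is a local variable, so concurrent calls cannot interfere.
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> acquireToken,
        Func<Exception, bool> isRetryable,
        Action onBeforeRetry,
        int maxRetries = 1)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return await acquireToken().ConfigureAwait(false);
            }
            catch (Exception ex) when (attempt < maxRetries && isRetryable(ex))
            {
                // For example: reset certificates and drop the cached application.
                onBeforeRetry();
            }
            // When the filter is false the exception propagates to the caller unchanged.
        }
    }
}
```

With `maxRetries = 1`, `acquireToken` runs at most twice, and a persistent failure surfaces as the original exception on the second attempt rather than looping.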

Option 3: Combined approach (Option 1 + Option 2) 🎯 (Most Robust)

Use both error filtering (Option 1) AND a retry counter (Option 2):

  • Error filtering prevents known config errors from retrying
  • Retry counter acts as a safety net for unknown/future error cases

This provides defense in depth.


Option 4: Suppress retries in nested token acquisition contexts

Add a flag/context parameter indicating we're in a nested token acquisition (e.g., agent identities FIC flow), and skip certificate retry logic in those cases:

// In TokenAcquisitionOptions
public bool SuppressCertificateRetry { get; set; }

// In GetAuthenticationResultForAppAsync catch block:
catch (MsalServiceException exMsal) when (
    IsInvalidClientCertificateOrSignedAssertionError(exMsal)
    && !(tokenAcquisitionOptions?.SuppressCertificateRetry ?? false))
{
    // ... retry logic
}

// In AgentUserIdentityMsalAddIn when calling GetFicTokenAsync:
AcquireTokenResult aaFic = await agentApplicationTokenAcquirer.GetFicTokenAsync(
    new() {
        Tenant = options.Tenant,
        FmiPath = agentIdentity,
        SuppressCertificateRetry = true  // Skip retry in nested context
    }
);

Pros:

  • Context-aware - doesn't affect normal certificate rotation scenarios
  • Clean separation between top-level and nested calls

Cons:

  • More invasive change to API surface
  • Requires updates in multiple places

Recommendation

I recommend Option 3 (Combined Option 1 + 2) for maximum robustness:

  1. ✅ Add AADSTS700016 to error filtering (catches the bad client ID misconfiguration reported here immediately)
  2. ✅ Add retry counter with MaxRetries = 1 (safety net for unknown errors and nested calls)

This provides immediate relief for the reported issue while protecting against future similar scenarios.


Testing Strategy

Unit Tests

  • Extend WithClientCredentialsTests.cs to verify retry behavior with various InvalidClient error codes
  • Add test in AgentIdentitiesExtensionTests.cs for agent identity flow with bad client config

Integration Tests

  • Add E2E test simulating misconfigured agent application in TokenAcquirerTests
  • Verify exception is thrown (not infinite loop) within reasonable timeout
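
Because the failure mode is a hang, a test that simply awaits the call would itself never finish. A small guard can turn a hang into a test failure. The sketch below assumes nothing about the library: `HangGuard` and `RunWithTimeoutAsync` are hypothetical names, and the guard only stops waiting; it does not cancel the runaway work.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical test helper for illustration; not part of Microsoft.Identity.Web.
public static class HangGuard
{
    // Completes like the action if it finishes in time (including rethrowing its exception),
    // and throws TimeoutException if the action is still running after the timeout.
    public static async Task RunWithTimeoutAsync(Func<Task> action, TimeSpan timeout)
    {
        // Task.Run keeps a synchronous spin inside the action from blocking the test thread.
        Task work = Task.Run(action);
        Task first = await Task.WhenAny(work, Task.Delay(timeout)).ConfigureAwait(false);

        if (first != work)
        {
            throw new TimeoutException($"Operation did not complete within {timeout}.");
        }

        // Propagate the outcome of the action, including any exception it threw.
        await work.ConfigureAwait(false);
    }
}
```

In an xUnit test, the token acquisition call could be wrapped in `HangGuard.RunWithTimeoutAsync(..., TimeSpan.FromSeconds(30))` inside `Assert.ThrowsAsync<MsalServiceException>`, so that a regression shows up as a TimeoutException instead of a stuck test run. The 30-second value is only an example.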

Example Test

[Fact]
public async Task AgentIdentity_WithInvalidClientId_ThrowsAfterOneRetry()
{
    // Arrange: Configure with non-existent client ID
    var options = new AuthorizationHeaderProviderOptions()
        .WithAgentIdentity("00000000-0000-0000-0000-000000000000");

    // Act & Assert: Should throw, not hang
    var exception = await Assert.ThrowsAsync<MsalServiceException>(async () =>
    {
        await authorizationHeaderProvider.CreateAuthorizationHeaderForAppAsync(
            "https://graph.microsoft.com/.default",
            options
        );
    });
    
    Assert.Contains("AADSTS700016", exception.Message); // Application not found
}

Labels

bug, regression, token acquisition
