
[BUG/Question] Race Condition in OpenID Configuration Refresh #233

Open
davidhuser opened this issue Feb 24, 2025 · 1 comment
Labels
question Further information is requested

Comments

@davidhuser
Contributor

Race Condition in OpenID Configuration Refresh

Describe the bug

There's a potential race condition in the OpenID configuration refresh mechanism. When multiple concurrent requests try to refresh the configuration simultaneously, all requests trigger separate calls to the configuration and keys endpoints, even though only one refresh is necessary.

To Reproduce

  1. Create a scenario with expired or uninitialized OpenID configuration
  2. Make multiple concurrent requests that would trigger a configuration refresh
  3. Observe multiple identical calls to the configuration and keys endpoints

Example test case:

import asyncio

import httpx
import respx


async def test_concurrent_refresh_requests():
    """Test that concurrent refreshes are handled correctly"""
    with respx.mock(assert_all_called=True) as mock:

        async def slow_config_response(*args, **kwargs):
            # Delay the response so that all five tasks overlap in flight
            await asyncio.sleep(0.2)
            return httpx.Response(200, json=openid_configuration())

        async def slow_keys_response(*args, **kwargs):
            await asyncio.sleep(0.2)
            return httpx.Response(200, json=build_openid_keys())

        config_route = mock.get(openid_config_url()).mock(side_effect=slow_config_response)
        keys_route = mock.get(keys_url()).mock(side_effect=slow_keys_response)

        # Force a refresh by marking the cached config as uninitialized
        azure_scheme.openid_config._config_timestamp = None

        # Five concurrent callers; only one refresh should be performed
        tasks = [azure_scheme.openid_config.load_config() for _ in range(5)]
        await asyncio.gather(*tasks)

        assert len(config_route.calls) == 1, "Config endpoint called multiple times"
        assert len(keys_route.calls) == 1, "Keys endpoint called multiple times"
        assert len(azure_scheme.openid_config.signing_keys) == 2

Leading to:

FAILED tests/test_provider_config.py::test_concurrent_refresh_requests - AssertionError: Config endpoint called multiple times

Your configuration

n/a

Root cause analysis

The current implementation doesn't protect against concurrent refresh requests. When the configuration is expired or uninitialized and multiple requests arrive simultaneously, each request independently determines a refresh is needed and initiates its own HTTP calls.
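To see this failure mode in isolation, here is a minimal, self-contained sketch of the race. The names (`UnguardedConfig`, `_fetch`) are invented for illustration and are not the library's actual internals:

```python
# Without any coordination, every concurrent caller sees a stale timestamp
# before any of them has finished refreshing, so each one fetches.
import asyncio
import time


class UnguardedConfig:
    def __init__(self) -> None:
        self._config_timestamp: float | None = None
        self.fetch_count = 0  # instrumentation for the demo only

    async def _fetch(self) -> None:
        # Stands in for the HTTP calls to the config and keys endpoints.
        self.fetch_count += 1
        await asyncio.sleep(0.05)  # while one task awaits here, the others start too

    async def load_config(self) -> None:
        if self._config_timestamp is None:  # every task passes this check
            await self._fetch()
            self._config_timestamp = time.monotonic()


async def main() -> int:
    config = UnguardedConfig()
    await asyncio.gather(*(config.load_config() for _ in range(5)))
    return config.fetch_count


print(asyncio.run(main()))  # → 5: each task triggered its own fetch
```

All five tasks evaluate the staleness check before the first fetch completes, so all five decide a refresh is needed.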

Suggested fix

Use an asyncio.Lock() to ensure only one refresh operation happens at a time:

import asyncio

class OpenIdConfig:
    def __init__(
        self,
        ...
    ) -> None:
       ...
        self._lock = asyncio.Lock() # add this to the global config object

    async def load_config(self) -> None:
        """
        Loads config from the Intility openid-config endpoint if it's over 24 hours old (or doesn't exist)
        """
        async with self._lock:  # acquire lock and refresh
            # do the refresh

With this change, the test above passes.
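One detail worth noting: the lock alone only serializes the refreshes; re-checking freshness *inside* the lock is what lets waiting tasks skip the redundant HTTP calls. A runnable sketch under assumed names (`FakeOpenIdConfig` and `_fetch` are illustrative, not the library's internals):

```python
# An asyncio.Lock plus a re-check of the timestamp inside the lock, so tasks
# that were waiting on the lock skip the refresh once the first task finishes.
import asyncio
import time

REFRESH_INTERVAL = 24 * 60 * 60  # seconds; 24 hours, matching the docstring


class FakeOpenIdConfig:
    def __init__(self) -> None:
        self._config_timestamp: float | None = None
        self._lock = asyncio.Lock()
        self.fetch_count = 0  # instrumentation for the demo only

    def _is_fresh(self) -> bool:
        return (
            self._config_timestamp is not None
            and time.monotonic() - self._config_timestamp < REFRESH_INTERVAL
        )

    async def _fetch(self) -> None:
        # Stands in for the HTTP calls to the config and keys endpoints.
        self.fetch_count += 1
        await asyncio.sleep(0.05)

    async def load_config(self) -> None:
        if self._is_fresh():
            return  # fast path: no lock needed when the config is current
        async with self._lock:
            # Double-check: another task may have refreshed while we waited.
            if self._is_fresh():
                return
            await self._fetch()
            self._config_timestamp = time.monotonic()


async def main() -> int:
    config = FakeOpenIdConfig()
    await asyncio.gather(*(config.load_config() for _ in range(5)))
    return config.fetch_count


print(asyncio.run(main()))  # → 1: only one fetch despite five concurrent callers
```

Without the second check inside the lock, the five tasks would still perform five fetches, just one after another.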

Now I'm wondering whether such a race condition is actually a problem in practice, or just theoretical? I must admit I'm no asyncio expert, but I stumbled upon this while testing.

@davidhuser davidhuser added the question Further information is requested label Feb 24, 2025
@JonasKs
Member

JonasKs commented Feb 24, 2025

Good catch! If no requests have been made to the endpoint in 24 hours, we refresh. That refresh can run five times concurrently, each call overwriting the previous result, as your test proves. Since the configuration very, very rarely changes, this has no security impact.

With that said, I am very much in favor of adding a lock. It makes the code easier to reason about, and any external call we can avoid is a win. If 5000 requests hit this endpoint at the same time, the current implementation would struggle.
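For the many-concurrent-callers case, a related pattern (not proposed in the issue; names are invented for illustration) is "single-flight": instead of queueing every caller on a lock, all concurrent callers await one shared refresh task:

```python
# "Single-flight" refresh: the first caller starts the task, everyone else
# awaits the same task, so the expensive fetch runs exactly once.
import asyncio


class SingleFlightConfig:
    def __init__(self) -> None:
        self._refresh_task: asyncio.Task | None = None
        self.fetch_count = 0  # instrumentation for the demo only

    async def _fetch(self) -> None:
        # Stands in for the HTTP calls to the config and keys endpoints.
        self.fetch_count += 1
        await asyncio.sleep(0.05)

    async def load_config(self) -> None:
        if self._refresh_task is None:
            self._refresh_task = asyncio.create_task(self._fetch())
        # Awaiting an already-finished task returns immediately, so late
        # callers are cheap. A real implementation would clear
        # self._refresh_task once the config expires, so that a later
        # refresh can start a new task.
        await self._refresh_task


async def main() -> int:
    config = SingleFlightConfig()
    await asyncio.gather(*(config.load_config() for _ in range(5000)))
    return config.fetch_count


print(asyncio.run(main()))  # → 1: one fetch for 5000 concurrent callers
```

Compared with a plain lock, waiters never contend for lock acquisition one by one; they all resume together when the single in-flight fetch completes.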
