Skip to content

Factor out ORCID-specific code #9751

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Jul 29, 2025
Merged

Factor out ORCID-specific code #9751

merged 31 commits into from
Jul 29, 2025

Conversation

seanh
Copy link
Member

@seanh seanh commented Jul 22, 2025

Depends on https://github.com/hypothesis/devdata/pull/121.

Refactor various unnecessarily ORCID-specific parts of the code to remove ORCID-specific naming and to parametrize hardcoded ORCID-specific values. This paves the way for adding Google and Facebook OIDC support to the same code in future.

Testing

See #9763.

Summary

This is quite a large diff. Summary of the main changes:

Renamed settings and added new settings

I've renamed some of the existing ORCID_* envvars to follow a naming style consistent with how we've been naming other OIDC things (e.g. URLs, route names). Also added some missing settings to replace previously hardcoded ORCID-specific values. The full set of ORCID-related settings is now:

Environment Variable Description
ORCID_HOST This is not an OIDC thing. It's just an ORCID thing. It's only used to generate links to users' public ORCID profile pages, e.g. https://orcid.org/0000-0002-6373-1308. In dev and on staging this will use sandbox.orcid.org.
OIDC_CLIENTID_ORCID Our ORCID client ID. For dev this comes from the devdata repo.
OIDC_CLIENTSECRET_ORCID Our ORCID client secret. For dev this comes from the devdata repo.
OIDC_AUTHORIZATIONURL_ORCID The ORCID URL that we redirect the browser to in order to kick off the authorization flow. There was previously no setting for this URL, it was constructed by combining ORCID_HOST with a hard-coded and ORCID-specific path.
OIDC_TOKENURL_ORCID The URL that we make server-to-server requests to in order to get ID tokens. Again there was previously no setting for this URL, it was constructed by combining ORCID_HOST with a hard-coded and ORCID-specific path.
OIDC_KEYSETURL_ORCID The URL that we retrieve ORCID's JWT signing public keys from. Again there was previously no setting for this URL, it was constructed by combining ORCID_HOST with a hard-coded and ORCID-specific path.

In future we'll need OIDC_*_GOOGLE and OIDC_*_FACEBOOK versions of all of these.

Services refactoring

On main there are two services ORCIDClientService and OpenIDClientService. ORCIDClientService doesn't actually contain any ORCID-specific code, just ORCID-specific naming and some hard-coded ORCID-specific values that're easily parametrizable. Meanwhile OpenIDClientService contains only one method get_id_token() that's called only once, from ORCIDClientService. This method can easily be inlined into ORCIDClientService and doing so makes the code much clearer.

These two services have been replaced with a new services/oidc.py::OIDCService. The naming is consistent with other parts of the code such as schemas/oidc.py and views/oidc.py. The new OIDCService's interface is as follows:

@dataclass
class OIDCServiceSettings:
    client_id: str
    client_secret: str
    redirect_uri: str
    token_url: str
    keyset_url: str

class OIDCService:
    def __init__(..., settings: dict[IdentityProvider, OIDCServiceSettings], ...):
        ...

    def get_provider_unique_id(self, provider, authorization_code):
        # Makes the server-to-server request to exchange the authorization_code for an
        # ID token, decodes the ID token, and returns the user's ORCID iD
        # (called provider_unique_id to avoid being ORCID-specific).

    def add_identity(self, user, provider, provider_unique_id):
       # Connects the given user's account to the given ORCID iD (provider_unique_id)
       # by adding a UserIdentity to the DB.
       
    def get_identity(self, user, provider):
        # Returns the existing UserIdentity from the DB for user and provider, or None.

def factory(context, request):
    return OIDCService(
        ...,
        settings={
            IdentityProvider.ORCID: OIDCServiceSettings(
                client_id=settings["oidc_clientid_orcid"],
                client_secret=settings["oidc_clientsecret_orcid"],
                redirect_uri=request.route_url("oidc.redirect.orcid"),
                token_url=settings["oidc_tokenurl_orcid"],
                keyset_url=settings["oidc_keyseturl_orcid"],
            ),
            ...
        },
        ...
    )

The settings object passed to OIDCService.__init__() maps providers to their provider-specific settings objects. Each of the three methods OIDCService.get_provider_unique_id(), add_identity(), and get_identity() then takes a provider argument which it uses to look up the provider's settings in self._settings.

Remove route_name predicates from @view_defaults

The code currently groups multiple views and exception views into classes and uses Pyramid's @view_defaults to avoid having to repeat view predicates including route_name:

@view_defaults(..., route_name="foo")
class MyViews:
    @view_config(...) # No need to repeat route_name="foo" here.
    def view1(self):
        ...
    @view_config(...) # No need to repeat route_name="foo" here.
    def view2(self):
        ...

In preparation for adding more OIDC/social login routes to the same views (e.g. "signup.google" and "signup.facebook" in addition to "signup.orcid") remove the route_name predicates from the @view_defaults because it's not possible to have multiple route names in @view_defaults. This doesn't work as you'd hope:

@view_defaults(..., route_name="foo")
@view_defaults(..., route_name="bar")
class MyViews:
    @view_config(...)
    def my_view(self):
        ...

You can't stack multiple @view_defaults decorators on a class. Instead you have to do this:

@view_defaults(...)  # No route_name in the view defaults.
class MyViews:
    @view_config(..., route_name="foo")
    @view_config(..., route_name="bar")
    def my_view(self):
        ...

This does unfortunately lead to some duplication in the @view_config's which could be error prone. A way to avoid the duplication would be to use the add_view() method to configure the views imperatively: https://docs.pylonsproject.org/projects/pyramid/en/latest/narr/viewconfig.html#adding-view-configuration-using-add-view

But I don't think this would be worth the loss of the readability and locality afforded by the @view_defaults and @view_config decorators.

An ideal solution (possibly for a future PR) might be to add our own custom view predicate that allows something like this:

@view_defaults(..., route_names=["foo", "bar"])

https://docs.pylonsproject.org/projects/pyramid/en/latest/narr/hooks.html#adding-a-custom-view-route-or-subscriber-predicate

Views refactoring

views/account_signup.py::ORCIDSignupViews is renamed to SocialSignupViews. (This view class is not an OIDC thing: it's our "Signup with <PROVIDER>" interstitial signup page, which is a custom page that we redirect to after the OIDC flow has finished.)

views/oidc.py::ORCIDConnectAndLoginViews is renamed to OIDCConnectAndLoginViews and views/oidc.py::ORCIDRedirectViews is renamed to OIDCRedirectViews.

Each of these three view classes will handle multiple URLs/routes, one for each provider:

Route URL View
oidc.connect.orcid and oidc.login.orcid /oidc/connect/orcid and /oidc/login/orcid OIDCConnectAndLoginViews.connect_or_login()
oidc.redirect.orcid /oidc/redirect/orcid OIDCRedirectViews.redirect()
signup.orcid /signup/orcid SocialSignupViews.get() and post()

In future new routes will be added for new providers, e.g. oidc.connect.google (/oidc/connect/google), oidc.login.google (/oidc/login/google), oidc.redirect.google (/oidc/redirect/google) and signup.google (/signup/google). These *.google and *.facebook routes will be handled by the same views that handle the *.orcid routes but the views will vary the settings they use depending on which route they're handling.

It's important to have a complete set of separate routes for each provider: it's a simple way to ensure that we never get any crossed-wires between providers, and is how the views know which provider they're handling. But separate routes can be handled by the same views.

This has been implemented by having a settings object for each view that contains all provider-specific values needed by that view. Different instances of these settings objects are constructed for different providers depending on the route, and the settings objects are then used by the view code:

@dataclass
class OIDCRedirectViewsSettings:
    """Per-route settings for OIDCRedirectViews."""

    provider_name: str
    ...

@view_defaults(request_method="GET", route_name="oidc.redirect.orcid")
class OIDCRedirectViews:
    def __init__(self, context, request: Request):
        self.context = context
        self.request = request
        ...

    @property
    def settings(self) -> OIDCRedirectViewsSettings:
        """Return the settings for the current route."""
        match self.request.matched_route.name:
            case "oidc.redirect.orcid":
                return OIDCRedirectViewsSettings(
                    provider_name="ORCID",
                    ...
                )
            case _:
                raise UnexpectedRouteName(route_name)

    @view_config()
    def redirect(self):
        # The view uses `self.settings` whenever it needs a route/provider-specific setting.
        state = self.request.session.pop(self.settings.state_session_key)
        ...

@seanh seanh force-pushed the factor-out-orcid-specific-code branch 2 times, most recently from 4f6d2ca to 81e8853 Compare July 23, 2025 15:31
@seanh seanh requested review from robertknight and Copilot July 23, 2025 16:16
Copilot

This comment was marked as outdated.

@hypothesis hypothesis deleted a comment from Copilot AI Jul 23, 2025
@seanh seanh force-pushed the factor-out-orcid-specific-code branch 2 times, most recently from 90c6292 to 39325d1 Compare July 24, 2025 11:08
@seanh seanh changed the base branch from main to add-instanceof-matcher July 24, 2025 11:08
@seanh seanh force-pushed the add-instanceof-matcher branch from 35b4792 to 4e7c932 Compare July 24, 2025 11:11
@seanh seanh force-pushed the factor-out-orcid-specific-code branch from 39325d1 to 60a36e3 Compare July 24, 2025 11:13
@seanh seanh force-pushed the factor-out-orcid-specific-code branch from dafcc0b to 1c0a21e Compare July 24, 2025 11:44
Base automatically changed from add-instanceof-matcher to main July 25, 2025 12:21
seanh added 12 commits July 25, 2025 13:22
There's nothing ORCID-specific in this schema.
And similarly rename the various CSS classes within. There's nothing
ORCID-specific about this CSS.
Replace ORCIDClientService and OpenIDClientService with OIDCService, a
redesigned service that contains all the functionality of the previous
two services.
There's nothing ORCID-specific about these views other than a few
values: IdentityProvider.ORCID,
JWTIssuers.SIGNUP_VALIDATION_FAILURE_ORCID,
audience=JWTAudiences.SIGNUP_ORCID. Gather these into a settings object
whose values vary by route (currently "signup.orcid" is the only route
supported).

This commit does also change the format of the view's js-config object
from:

    {"identity": {"orcid": {"id": orcid_id}}}

to:

    {"identity": {"provider_unique_id": provider_unique_id}}
Replace ORCIDConnectAndLoginViews and ORCIDRedirectViews with
SSOConnectAndLoginViews and SSORedirectViews and move all ORCID-specific
values into settings objects that vary by route.
This constant is only used in one file, move it into that file.
@seanh seanh marked this pull request as ready for review July 25, 2025 14:50
@seanh seanh requested a review from Copilot July 25, 2025 14:51
Copilot

This comment was marked as resolved.

@@ -1,11 +1,11 @@

.orcid-wrapper {
.sso-wrapper {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the CSS for the ORCID and Google buttons on the /account/settings page (which use the old-style frontend stuff). The CSS isn't ORCID-specific so I just renamed it.

TODO: Replace "sso" with "sociallogin" here, for consistency with names used in the Python code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Replace "sso" with "sociallogin" here, for consistency with names used in the Python code.

Done: a8609db

@hypothesis hypothesis deleted a comment from Copilot AI Jul 25, 2025
@hypothesis hypothesis deleted a comment from Copilot AI Jul 25, 2025
@robertknight
Copy link
Member

I verified that functionally the whole flow is working by implementing the changes in #9764 on top of this branch.

Copy link
Member

@robertknight robertknight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found an issue with the orcid_url generation due to how ORCID_HOST is set in setenv compared to the tests. Otherwise everything worked as expected and I was able to test with the changes in #9764.

client_secret: str
redirect_uri: str
token_url: str
keyset_url: str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A docstring would be useful to indicate the meaning of these. This could be a class-level docstring that just references the RFC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: bf533db

"""Per-route settings for SocialLoginSignupViews."""

provider: IdentityProvider
issuer: JWTIssuers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IdentityProvider is singular and JWTIssuers is plural, but both are enums. To me it would feel more conventional to always use the singular for an enum (as in "{enum_value} is an {enum_name}")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: a1d406f

if orcid_id:
# The URL to the user's public ORCID profile page
# (for example: https://orcid.org/0000-0002-6373-1308).
orcid_url = urlunparse(urlparse(orcid_host)._replace(path=orcid_id))
Copy link
Member

@robertknight robertknight Jul 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought for the future: If we have a public URL for the identity, it might be helpful to include that in the configuration for the /signup/orcid view. Then the ID reference in Connected: {ID} could link to it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h/views/oidc.py Outdated
client_id: str
authorization_url: str
redirect_uri: str
action: str
Copy link
Member

@robertknight robertknight Jul 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the possible values of action here? Just login and connect?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and actually there's already a literal type for this used elsewhere in the same file, fixed: e6ff9fe

class OIDCConnectAndLoginViewsSettings:
"""Per-route settings for OIDCConnectAndLoginViews."""

state_sessionkey: str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A docstring would be useful for this field, as the meaning is less obvious than the others.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: a5a5014

h/views/oidc.py Outdated
idinfo_jwtaudience: JWTAudiences
provider: IdentityProvider
success_message: str
signup_route: str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is presumably a route name rather than a route URL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Renamed the variable to clarify: 2dd77b1

@@ -29,13 +29,13 @@
</div>

{% if orcid %}
<a href="{{ orcid_url }}" class="orcid-connect" target="_blank" rel="nofollow noopener">
<a href="{{ orcid_url }}" class="sso-connect" target="_blank" rel="nofollow noopener">
Copy link
Member

@robertknight robertknight Jul 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This link took me to http://hypothesis.local:5000/account/0000-0002-1961-4147 in testing. This is related to ORCID_HOST being set to a hostname in the dev env, but the tests set it to a URL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 327fb1f

@@ -11,6 +11,10 @@ dev: H_API_AUTH_COOKIE_SECRET_KEY = {env:H_API_AUTH_COOKIE_SECRET_KEY:"dev_h_api
dev: H_API_AUTH_COOKIE_SALT = {env:H_API_AUTH_COOKIE_SALT:"dev_h_api_auth_cookie_salt"}
dev: REPLICA_DATABASE_URL = {env:DATABASE_URL:postgresql://postgres@localhost/postgres}
dev: MAILCHIMP_USER_ACTIONS_SUBACCOUNT = {env:MAILCHIMP_USER_ACTIONS_SUBACCOUNT:devdata}
dev: ORCID_HOST = sandbox.orcid.org
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sets ORCID_HOST to a hostname, but the test_get_returns_orcid_data test sets the value to a URL with a scheme. As a result the orcid_url is generated correctly in tests but doesn't work in the app.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 327fb1f

seanh added 7 commits July 29, 2025 10:37
The singular naming style is consistent with other enums (e.g.
IdentityProvider) and reads better.
For consistency with the nomenclature used in the Python code.
@seanh seanh requested a review from robertknight July 29, 2025 10:09
robertknight
robertknight previously approved these changes Jul 29, 2025
client_secret: str
"""Our OAuth/OIDC client_id for this provider."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Our OAuth/OIDC client_id for this provider."""
"""Our OAuth/OIDC client_secret for this provider."""

Co-authored-by: Robert Knight <[email protected]>
@seanh seanh merged commit f25d9ae into main Jul 29, 2025
11 checks passed
@seanh seanh deleted the factor-out-orcid-specific-code branch July 29, 2025 11:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants