-
Notifications
You must be signed in to change notification settings - Fork 7
allow custom cors headers for stac #599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3755/Result ❌ FAILUREBIRDHOUSE_DEPLOY_BRANCH : stac-cors-flexible DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-216.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/491/NOTEBOOK TEST RESULTS |
fmigneault
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for testing it. Looks about right with minor tweaks.
| include /etc/nginx/conf.d/cors.include; | ||
| proxy_hide_header 'Access-Control-Allow-Origin'; | ||
| add_header Access-Control-Allow-Origin $stac_origin_allowed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the cors.include should still be there since it adds other definitions (allowed headers, allowed methods, etc.) which are also required (and extra ones might be needed as well since it was actually very limited, while STAC API allows PUT, DELETE, etc.).
I think proxy_hide_header 'Access-Control-Allow-Origin'; will remove it.
Does add_header automatically reapply? I was under the impression explicit proxy_pass_header was needed to undo the hide.
On top of this header, an add_header Vary Origin; would be required to advertise (as required by the standard) that it is Origin-dependant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the cors.include should still be there since it adds other definitions
I think proxy_hide_header 'Access-Control-Allow-Origin'; will remove it.
I agree that we should include the rest of cors.include but proxy_hide_header wasn't working for me. I tried a few combinations of things but maybe I did it wrong. I gave up in the end but I can try again with fresh eyes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the issue is that proxy_hide_header will hide a header that is set from the upstream server (ie. the stac container) but it won't unset the header that is explicitly set by nginx.
To get around this, I modified cors.include slightly to allow for different behaviour depending on whether a variable was set.
| # This must match the "stac_version" value in the current STAC catalog. | ||
| export PYSTAC_STAC_VERSION_OVERRIDE=1.0.0 | ||
|
|
||
| export _STAC_ADDITIONAL_CORS_ORIGINS='$(for origin in $STAC_ADDITIONAL_CORS_ORIGINS; do echo "$origin \$http_origin;"; done;)' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% sure, but I think the first part ($origin) is some kind of regex to match a given origin. If using https://geojson.io, is it sufficient to match the full request that has other path/query/anchors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can either be a regex (if it starts with ~) or just a string to match. Theoretically we could use either one.
| # - https://github.com/radiantearth/stac-browser/blob/main/docs/docker.md | ||
| export STAC_BROWSER_PATH_PREFIX='/stac-browser/' | ||
|
|
||
| export STAC_ADDITIONAL_CORS_ORIGINS='https://geojson.io' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume space delimited based on the for loop?
Would need to be indicated since ,/; are common variants when dealing with headers.
I think it might also need the ${BIRDHOUSE_FQDN_PUBLIC} since stac-browser self-pings the stac API.
Should this parameter be in stac/default.env, or this one should update the variable additively?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'll need to update comments and documentation explaining the format of everything.
I think it might also need the ${BIRDHOUSE_FQDN_PUBLIC} since stac-browser self-pings the stac API.
${BIRDHOUSE_FQDN_PUBLIC} is always allowed because it is the default value for the nginx map. So by default, requests from all domains (including ${BIRDHOUSE_FQDN_PUBLIC}) will allow it.
Should this parameter be in stac/default.env, or this one should update the variable additively?
It doesn't need to be. The default (withouth stac-browser) is that it is empty/unset.
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3758/Result ❌ FAILUREBIRDHOUSE_DEPLOY_BRANCH : stac-cors-flexible DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-91.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/493/NOTEBOOK TEST RESULTS |
fmigneault
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly editing stuff to review. Feature wise looks good.
| Note that this breaks backwards compatibility slightly since previously all origins were allowed for `/stac` by | ||
| default. To keep the backwards compatible behaviour you can set: | ||
|
|
||
| ``` | ||
| export STAC_CORS_ORIGINS='~.*' | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't export STAC_CORS_ORIGINS='~.*' be set in components/stac/default.env to make it work like the other components?
I think the https://geojson.io case could simply be documented in comment around the default value to indicate that, if * is not desired to be more strict, that specific Origin might need to be considered for stac-browser. It is not required given that stac-browser is not combined with stac anymore.
The backward incompatibility note might be slightly misleading due to the subtlety of CORS. Before, Access-Control-Allow-Origin was omitted entirely, rather than Access-Control-Allow-Origin: *, so CORS was not applied at all. If * was the default, it would behave the same for clients that don't consider it, but it impacts where CORS are actually considered, which is relevant for some Origins like https://geojson.io.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I can update that to keep the default. I've also updated the STAC and STAC Browser documentation since it was out of date.
The backward incompatibility note might be slightly misleading due to the subtlety of CORS
I don't think so. Previously, the cors.include file was enabled so the Access-Control-Allow-Origin was already set to * (it was not omitted)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if it was included and set with *, wasn't it omitted because of ˋproxy_hide_header`?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe that was the intention originally but it wasn't actually hidden. proxy_hide_header only hides header values set by the upstream server, not by nginx (#599 (comment)).
Would you rather that we remain backwards compatible with the intended behaviour (no Access-Control-Allow-Origin) or with the actual behaviour (Access-Control-Allow-Origin=*)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is fine with Access-Control-Allow-Origin=*.
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3767/Result ❌ FAILUREBIRDHOUSE_DEPLOY_BRANCH : stac-cors-flexible DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-216.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/499/NOTEBOOK TEST RESULTS |
birdhouse/components/README.rst
Outdated
| read-only access to STAC API resources while members of the `stac-admin` group can create and modify | ||
| resources if the ``optional-components/stac-public-access`` component is enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wording makes it seem like the optional component is needed for stac-admin to create/modify resources, but they always have access to do it. The component mention should be placed after the "Unauthenticated users will have read-only access" part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is needed. The magpie permissions set by components/stac give write permissions to the administrators group. The stac-admin group is created by the optional-components/stac-public-access component.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh true! It should be moved to components/stac. The stac-admin group could be relevant even when used for fully-protected API, while restricting the member's permission scope only to editing the STAC API resources.
As the comment mentions, administrators is used just because a user/group must be provided (for the config to be valid and work), but the intent is just to make sure the resource exists. The admins do not need the explicit permission to access, they always can anyway.
birdhouse-deploy/birdhouse/components/stac/config/magpie/config.yml.template
Lines 20 to 28 in 4a897ce
| permissions: | |
| # create a default 'stac' resource under 'stac' service | |
| # because of the '/stac/stac' path prefix required to resolve the API links properly, | |
| # all permissions must be nested under this 'stac' resource for requests and permissions to be resolved accordingly | |
| - service: stac | |
| resource: /stac | |
| permission: read | |
| group: administrator # they already have access, just using admins to create the resource by default | |
| action: create |
tlvu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not very familiar with all this CORS headers stuff although I had to deal with it in the past, but I forgot the details.
So I trust you guys with all this. Just please ensure the behavior is backward-compatible with other components already using proxy/conf.d/cors.include.
As for how this CORS header should behave for stac, I'll follow your lead.
I have a few very minor change requested. The rest looks good.
| @@ -0,0 +1,7 @@ | |||
| map $http_origin $stac_origin_allowed { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remind me what the Nginx map directive do again?
This block means if the http_origin variable exists, create stac_origin_allowed variable with value ${BIRDHOUSE_PROXY_SCHEME}://${BIRDHOUSE_FQDN_PUBLIC}; ${STAC_ADDITIONAL_CORS_ORIGINS} ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's mostly right...
This block creates a new variable stac_origin_allowed whose value is based on the value of the http_origin variable.
If the value of http_origin matches any of the patterns defined in ${STAC_ADDITIONAL_CORS_ORIGINS} then the stac_origin_allowed value will be set to http_origin.
Otherwise stac_origin_allowed will be set to ${BIRDHOUSE_PROXY_SCHEME}://${BIRDHOUSE_FQDN_PUBLIC} (the default).
| proxy_buffering off; | ||
| set $access_control_allow_origin $stac_origin_allowed; | ||
| include /etc/nginx/conf.d/cors.include; | ||
| add_header Vary Origin; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the Vary header do again? Why we need to add it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It tells the client that the Origin header influences the content of the response (see https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Vary and https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Access-Control-Allow-Origin#cors_and_caching)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting so as soon as we do not return Access-Control-Allow-Origin with value * but with a specific list of domain, then this Vary header is needed ! So should this Vary be moved to proxy/conf.d/cors.include? I fear we will forget to add this manually for other components.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing though is that Vary should be added only if the Origin (or any other header) actually affects the response (assuming no other header already distinguishes the response like the Content-Type). Adding it systematically (eg: when Access-Control-Allow-Origin: *) could result in duplicate cache entries for each domain accessing the server although nothing changes between them. It is used purposely to generate a "diff" in the cached response according to its domain-specific value, so that each domain is managed separately. I think we don't have a choice to apply it on case-by-case basis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding it systematically (eg: when
Access-Control-Allow-Origin: *)
I was suggesting to add Vary only when Access-Control-Allow-Origin is not *. Hopefully the if-statement in Nginx allows this. Otherwise we should put a big reminder somewhere so other users notice it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fmigneault @tlvu What about 942cc07 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about 942cc07 ?
Looks good too. So I assume when $vary_origin is "" (empty), we return add_header Vary $vary_origin; ==> add_header Vary ""; the empty value won't cause any side-effect and is a legit value allowed by HTTP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add_header Vary "";the empty value won't cause any side-effect and is a legit value allowed by HTTP?
I just re-read Francis response earlier (#599 (comment)) ? "nginx omits header if empty". Could you add this comment to the change so it is clear for other readers too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure! Added in 2aa0cae
| \$STAC_MIGRATION_DOCKER | ||
| \$STAC_MIGRATION_IMAGE | ||
| \$STAC_ADDITIONAL_CORS_ORIGINS | ||
| \$STAC_CORS_ORIGIN_ALLOW_ALL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
STAC_CORS_ORIGIN_ALLOW_ALL not used anywhere in this PR, ancien relic of some renaming in this PR that you forgot to remove?
| Calls to the STAC API pass through Twitcher in order to validate authorization. Unauthenticated users will have | ||
| read-only access by default to STAC API resources while members of the `stac-admin` group can create and modify | ||
| resources. STAC Browser is not protected by any authorization mechanism. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"STAC Browser is not protected by any authorization mechanism", this part is not true anymore, that's why it is removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's technically true but it's misleading. Access to the stac-browser endpoint is not checked by twitcher but if the user doesn't have permissions to read the stac data (because the stac endpoint is protected by twitcher) then the stac-browser page shows a 403 error anyway.
So from the user's perspective it is protected by an authorization mechanism, it's the same one that protects the stac endpoint.
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3776/Result ❌ FAILUREBIRDHOUSE_DEPLOY_BRANCH : stac-cors-flexible DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-91.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/504/NOTEBOOK TEST RESULTS |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3793/Result ❌ FAILUREBIRDHOUSE_DEPLOY_BRANCH : stac-cors-flexible DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-216.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/518/NOTEBOOK TEST RESULTS |
fmigneault
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. There is one comment remaining about the stac-admin magpie group, but that can be done in another PR if desired.
|
Thanks @fmigneault I missed that last comment I guess. Do you mind checking out the change in 2aa0cae before I merge this. |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3798/Result ❌ FAILUREBIRDHOUSE_DEPLOY_BRANCH : stac-cors-flexible DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-216.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/524/NOTEBOOK TEST RESULTS |
All good. |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3800/Result 🆘 ABORTEDBIRDHOUSE_DEPLOY_BRANCH : stac-cors-flexible DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-216.rdext.crim.ca
|
Overview
Allow each service to specify values for
Access-Control-Allow-OriginPreviously, if a
locationblock in thenginxconfiguration for a given service included the cors helperconfiguration (with
include /etc/nginx/conf.d/cors.include;) then all origins were allowed by default.This was done by setting the header
Access-Control-Allow-Origin: *which works well but is a bit too permissivesince it allowed all origins.
This change introduces a mechanism to specify specific additional allowed origins by setting the
$access_control_allow_originnginx variable in thelocationblock before including thecors.includefile.For example:
will set the value of the
Access-Control-Allow-Originresponse header tohttp://example.com.By default, the header value will be
*if$access_control_allow_originis not set (to maintain backwardscompatibility).
To specify multiple allowed origins, use a
mapdirective (see the implementation forcomponents/stacfor anexample).
Set allowed CORS origins for
stacthrough an environment variableThis change implements this flexibility for the
components/staccomponent. By setting theSTAC_CORS_ORIGINSvariable a user can specify allowed origins for responses from the
components/staccomponent.For example, setting the following:
then requests from https://example.com and http://other.example.com will get a response with the
Access-Control-Allow-Origin headerset to their origin, but http://example.ca will not.Note that this breaks backwards compatibility slightly since previously all origins were allowed for
/stacbydefault. To keep the backwards compatible behaviour you can set:
to match all origins.
Changes
Non-breaking changes
Breaking changes
/stacno longer setAccess-Control-Allow-Origin: *by defaultRelated Issue / Discussion
Additional Information
CI Operations
birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false