@tlvu
Collaborator

tlvu commented Apr 26, 2024

Overview

This PR is meant to start a discussion rather than to be merged as-is. That's why there is no CHANGES.md update.

I needed to add CORS allow headers for Thredds so that our partners' JavaScript web apps, running on domains other than pavics.ouranos.ca, can query our Thredds. In effect, we act as the backend for their frontend.

  1. Is adding the CORS header to Twitcher okay with you guys? Because the new headers will affect all other services behind Twitcher.

  2. By adding this CORS header, I lost the X-Robots-Tag: noindex, nofollow header (optional-components/x-robots-tag-header)! Is that expected? Or is the way I add headers to Twitcher wrong? I was just doing the same thing as all the existing services. The X-Robots-Tag header is important to avoid being hit by crawlers.

birdhouse/config/magpie/config/proxy/conf.extra-service.d/magpie.conf.template
5:        include /etc/nginx/conf.d/cors.include;

birdhouse/deprecated-components/ncwms2/config/proxy/conf.extra-service.d/ncwms2.conf.template
5:    #    include /etc/nginx/conf.d/cors.include;

birdhouse/components/cowbird/config/proxy/conf.extra-service.d/cowbird.conf.template
8:        include /etc/nginx/conf.d/cors.include;

birdhouse/components/weaver/config/proxy/conf.extra-service.d/weaver.conf.template
16:        include /etc/nginx/conf.d/cors.include;

birdhouse/components/stac/config/proxy/conf.extra-service.d/stac.conf.template
12:        include /etc/nginx/conf.d/cors.include;
  3. Are our current CORS headers too permissive?

This is what we currently return (https://github.com/bird-house/birdhouse-deploy/blob/97ee8da24821391aeef52b13ea9adda28f919085/birdhouse/components/proxy/conf.d/cors.include):

Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Headers: DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Content-Range,Range
Access-Control-Expose-Headers: DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Content-Range,Range

This would already be enough:

Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept, Authorization
Access-Control-Allow-Methods: POST, GET, OPTIONS

I think perhaps we should not use allow-origin * but a list of known partner domains? And trim down the allow-headers list?

I am not a security expert, so I want to hear from you guys.

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false

@huard
Collaborator

huard commented Apr 26, 2024

Proposal to whitelist https://raven.uwaterloo.ca/

@mishaschwartz
Collaborator

> Is adding the CORS header to Twitcher okay with you guys? Because the new headers will affect all other services behind Twitcher.

Unfortunately I don't think it will even do that.

If we want to apply the CORS headers everywhere, why not apply them at the proxy level instead of on Twitcher? Not all requests necessarily go through the Twitcher proxy.

Some components (geoserver, jupyterhub, all the monitoring components, secure-data-auth) only use Twitcher's verify endpoint to check whether a user has access. In those cases, the CORS headers wouldn't be included.

Jupyterhub and the monitoring components probably won't matter for cross-site scripts but the others will matter I'm sure.

> By adding this CORS header, I lost the X-Robots-Tag: noindex, nofollow header

Is this when making a cross-origin request or everywhere?

You might have to add X-Robots-Tag to the headers listed in Access-Control-Allow-Headers and Access-Control-Expose-Headers, otherwise it won't be sent (but we'll have to test it to be sure).

> Are our current CORS headers too permissive?
>
> I think perhaps we should not allow-origin * but a list of known partners domain?

This would have to be configurable as not every deployment would want the same domains.
We should also think about whether we would need to set Access-Control-Allow-Credentials as well.

> And trim down the allow-headers list?

This one will require some thought. Do you know if there are recommended best-practices for setting these?

@tlvu
Collaborator Author

tlvu commented Apr 26, 2024

> > Is adding the CORS header to Twitcher okay with you guys? Because the new headers will affect all other services behind Twitcher.
>
> Unfortunately I don't think it will even do that.

Sorry I meant adding CORS headers to the proxy location /twitcher/ for Twitcher. So the change is actually in Nginx and not in Twitcher but for Twitcher and all services behind it. See my initial code change.

> > By adding this CORS header, I lost the X-Robots-Tag: noindex, nofollow header
>
> Is this when making a cross-origin request or everywhere?

Everywhere. But I think I understand how this works now. X-Robots-Tag: noindex, nofollow was added at the root location. Child location blocks that add their own headers do not inherit the parent's added headers; they override them. So, to keep any headers added by the parent, the child location has to repeat them.

So each time that cors.include file was included, we probably lost the X-Robots-Tag header! We've been losing that header for magpie, weaver, cowbird and stac without even knowing!
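A minimal sketch of that inheritance rule. The `backend` upstream and the two location paths are made up for illustration and are not taken from the deployment. Per the nginx documentation, `add_header` directives are inherited from the previous configuration level only if the current level defines none of its own:

```nginx
# Hypothetical upstream, for illustration only.
upstream backend {
    server 127.0.0.1:8000;
}

server {
    listen 80;

    # Declared once at the outer level.
    add_header X-Robots-Tag "noindex, nofollow";

    location /plain/ {
        # No add_header at this level, so X-Robots-Tag is inherited.
        proxy_pass http://backend;
    }

    location /cors/ {
        # Any add_header at this level replaces the inherited set,
        # so X-Robots-Tag has to be repeated here to be kept.
        add_header Access-Control-Allow-Origin "*";
        add_header X-Robots-Tag "noindex, nofollow";
        proxy_pass http://backend;
    }
}
```

Under this rule, any file that is included inside a location and contains its own `add_header` lines would drop the outer headers for that location.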

> > I think perhaps we should not allow-origin * but a list of known partners domain?
>
> This would have to be configurable as not every deployment would want the same domains. We should also think about whether we would need to set Access-Control-Allow-Credentials as well.

Agreed it has to be configurable.

I was reading up on how to make it configurable, and it's not so simple. I didn't know that "if is evil" in Nginx config, and our current cors.include file uses if!

Some interesting reads I found:

https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/

http://agentzh.blogspot.com/2011/03/how-nginx-location-if-works.html

https://www.juannicolas.eu/how-to-set-up-nginx-cors-multiple-origins/
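For context, the "if is evil" page lists `return` and `rewrite ... last` as the only directives that are fully safe inside an `if` within a location. A hedged sketch of a preflight handler that stays within that limit follows. The `/example/` path and the upstream address are placeholders. The headers are declared at the location level, and the `OPTIONS` reply is expected to pick them up through normal inheritance, which is worth verifying on a real deployment:

```nginx
server {
    listen 80;

    location /example/ {
        # Headers are declared at the location level, outside any if block.
        add_header Access-Control-Allow-Origin "*";
        add_header Access-Control-Allow-Methods "GET, POST, OPTIONS";

        # Only a "return" inside the if block: answer preflight requests
        # directly without contacting the upstream.
        if ($request_method = OPTIONS) {
            return 204;
        }

        proxy_pass http://127.0.0.1:8000;   # placeholder upstream
    }
}
```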

> > And trim down the allow-headers list?
>
> This one will require some thought. Do you know if there are recommended best-practices for setting these?

Have not had time but yes we should follow some newer best practices. That cors.include file was there even before I started at Ouranos.

@fmigneault
Member

fmigneault commented Apr 26, 2024

> Unfortunately I don't think it will even do that.
>
> If we want to apply the cors headers everywhere, why not apply it at the proxy level instead of on twitcher. Not all requests will go through the twitcher proxy necessarily.

That is my current understanding of what is already applied by the following, which includes cors.include:

include /etc/nginx/conf.d/*.conf;

Setting it under /twitcher/ "should" only redefine it the same way (it WON'T though, see next).

> 1. Is adding the CORS header to Twitcher okay with you guys? Because the new headers will affect all other services behind Twitcher.
>
> 2. By adding this CORS header, I lost the X-Robots-Tag: noindex, nofollow header (optional-components/x-robots-tag-header)! Is that expected? Or the way I add headers to Twitcher is wrong? I was just doing the same thing as all the existing services. The X-Robots-Tag header is important to avoid being hit by crawlers.

if blocks inside location definitions are a big no-no: https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/
It's not a matter of "overriding headers or not"; the behavior is simply undefined.
if blocks are perfectly valid at the server level (as they are currently applied).

> Are our current CORS headers too permissive?

Maybe?
But good consideration must be given about which set of Origins are defined if set explicitly.
If only the partner's Origin is set, any following partner or browser JavaScript using the server as a backend will suddenly be refused.
Since the purpose of the platform is to provide access to services, I'm not sure adding strict origin control will change much other than make our maintenance harder.
If configurable, I don't mind, as long as the default remains as the current value.

For Access-Control-Allow-Headers, I think the current ones are good. Some of them are for caching or prechecks, which can help reduce request/response time and content size if supported by a service. They shouldn't hurt. The only one I don't know is X-CustomHeader.

@mishaschwartz
Collaborator

> Sorry I meant adding CORS headers to the proxy location /twitcher/ for Twitcher. So the change is actually in Nginx and not in Twitcher but for Twitcher and all services behind it. See my initial code change.

I understand... what I was saying is that there are components that are not behind twitcher that will not be affected by this change.

@fmigneault
Member

fmigneault commented Oct 16, 2025

@tlvu @mishaschwartz
I am coming back to this issue after a while, having run into a need for it as well.
In my case, I would like /stac/ responses to include Access-Control-Allow-Origin: *.
The main reason is that STAC Browser offers a nice feature:

image

But CORS blocks the response from our server:

image

The solution I found is actually better than the proposed change.
I simply add proxy_pass_header 'Access-Control-Allow-Origin'; (https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_pass_header) to the location:

location /stac {
    proxy_pass_header 'Access-Control-Allow-Origin';
    # We need the first `/stac` for service resolution.
    # We need the second `/stac` for API redirect in STAC (see `root-path` and `ROUTER_PREFIX`).
    # See https://github.com/stac-utils/stac-fastapi/issues/427
    # See https://github.com/crim-ca/stac-app/blob/main/stac_app.py#L60
    proxy_pass ${BIRDHOUSE_PROXY_SCHEME}://${BIRDHOUSE_FQDN_PUBLIC}${TWITCHER_PROTECTED_PATH}/stac/stac/;
    proxy_set_header Host $host;
    [...]
}

This effectively undoes the server-wide rule:

proxy_hide_header 'Access-Control-Allow-Origin';

And my responses contain the desired headers:
image
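The interplay of the two directives can be pictured as follows. This is only a sketch: the upstream address and the `/other/` location are placeholders and not part of the actual configuration.

```nginx
server {
    listen 80;

    # Server-wide rule: drop this header when it comes from a proxied service.
    proxy_hide_header 'Access-Control-Allow-Origin';

    location /stac {
        # Let the upstream's own header through, for this location only.
        proxy_pass_header 'Access-Control-Allow-Origin';
        proxy_pass http://127.0.0.1:8000;   # placeholder upstream
    }

    location /other/ {
        # No proxy_pass_header here, so the server-wide rule still applies.
        proxy_pass http://127.0.0.1:8000;   # placeholder upstream
    }
}
```

With this arrangement nginx adds nothing itself. The header reaches the client only if the upstream service sends it, and with whatever value the upstream chose.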

This solution is "safer" since it is only an API endpoint, and there is little (if any) data that could be extracted via XSS because no JS is involved. There could also be a specific variable to enable only the BIRDHOUSE_FQDN_PUBLIC and geojson.io by default rather than *, specifically for STAC API (or any other service).

What do you think of this approach?

@mishaschwartz
Collaborator

mishaschwartz commented Oct 20, 2025

I agree that it's a nice feature and I like the flexibility.

I would prefer to add an option to just allow same-origin and geojson.io for the stac endpoint (instead of turning off cors protections for stac).

My suggestion would be to create a variable that allows geojson.io from stac and only enable that variable if the stac-populator component is enabled as well.

@fmigneault
Member

I have tried setting

location /stac {  
  proxy_pass_header 'Access-Control-Allow-Origin';
  add_header Access-Control-Allow-Origin "${BIRDHOUSE_FQDN},geojson.io";
  [...]
}

That does not work. The value is inserted as-is, as one comma-separated string rather than as multiple entries, and the clients accessing the server cannot parse the header.

Instead, one of the solutions listed in https://serverfault.com/questions/958965/nginx-enabling-cors-for-multiple-subdomains, which use map, must be used. That makes it somewhat trickier to configure from an env variable.
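`Access-Control-Allow-Origin` carries a single origin (or `*`), which is why a comma-separated value is rejected. The usual `map` approach echoes the request's `Origin` back only when it matches an allowed entry. A minimal sketch, assuming a made-up variable name `$cors_allow_origin`, placeholder origins, and a placeholder upstream:

```nginx
# In the http context. Variable name and origins are illustrative.
map $http_origin $cors_allow_origin {
    default                                   "";
    "https://geojson.io"                      $http_origin;
    "~^https://(www\.)?other\.example\.com$"  $http_origin;
}

server {
    listen 80;

    location /stac {
        # When the value is empty nginx emits no header at all,
        # so unmatched origins get no Access-Control-Allow-Origin.
        add_header Access-Control-Allow-Origin $cors_allow_origin;
        # Tell caches that the response depends on the Origin request header.
        add_header Vary Origin;
        proxy_pass http://127.0.0.1:8000;   # placeholder upstream
    }
}
```

Generating the `map` entries from an environment variable would then be a templating step done before nginx starts, since `map` itself cannot read the environment.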

IMO, it should not be specific to geojson.io. The API could be used by others whose access I wouldn't mind, so it needs to be a configurable list.

@mishaschwartz
Collaborator

@fmigneault what about something like #599 ?

@fmigneault
Member

> @fmigneault what about something like #599 ?

Looks promising. I'll comment directly on it.

mishaschwartz added a commit that referenced this pull request Nov 12, 2025
## Overview

- Allow each service to specify values for `Access-Control-Allow-Origin`

Previously, if a `location` block in the `nginx` configuration for a
given service included the cors helper
configuration (with `include /etc/nginx/conf.d/cors.include;`) then all
origins were allowed by default.

This was done by setting the header `Access-Control-Allow-Origin: *`
which works well but is a bit too permissive
since it allowed __all__ origins.

This change introduces a mechanism to specify specific additional
allowed origins by setting the
`$access_control_allow_origin` nginx variable in the `location` block
before including the `cors.include` file.

  For example:

  ```
  set $access_control_allow_origin http://example.com;
  include /etc/nginx/conf.d/cors.include;
  ```

will set the value of the `Access-Control-Allow-Origin` response header
to `http://example.com`.

By default, the header value will be `*` if
`$access_control_allow_origin` is not set (to maintain backwards
  compatibility).

To specify multiple allowed origins, use a `map` directive (see the
implementation for `components/stac` for an
  example).

- Set allowed CORS origins for `stac` through an environment variable

This change implements this flexibility for the `components/stac`
component. By setting the `STAC_CORS_ORIGINS`
variable a user can specify allowed origins for responses from the
`components/stac` component.

  For example, setting the following:
  
  ```
export STAC_CORS_ORIGINS='https://example.com
~^https?://(www\.)?other\.example\.com$'
  ```

then requests from https://example.com and http://other.example.com will
get a response with the
`Access-Control-Allow-Origin` header set to their origin, but
requests from http://example.ca will not.

Note that this breaks backwards compatibility slightly since previously
all origins were allowed for `/stac` by
  default. To keep the backwards compatible behaviour you can set:

  ```
  export STAC_CORS_ORIGINS='~.*' 
  ```

  to match all origins.

## Changes

**Non-breaking changes**
- Adds mechanism to allow services to have more control over CORS
headers

**Breaking changes**
- responses from `/stac` no longer set `Access-Control-Allow-Origin: *`
by default

## Related Issue / Discussion

- As discussed in #450

## Additional Information

## CI Operations

<!--
The test suite can be run using a different DACCS config with
``birdhouse_daccs_configs_branch: branch_name`` in the PR description.
To globally skip the test suite regardless of the commit message use
``birdhouse_skip_ci`` set to ``true`` in the PR description.

Using ``[<cmd>]`` (with the brackets) where ``<cmd> = skip ci`` in the
commit message will override ``birdhouse_skip_ci`` from the PR
description.
Such commit command can be used to override the PR description behavior
for a specific commit update.
However, a commit message cannot 'force run' a PR which the description
turns off the CI.
To run the CI, the PR should instead be updated with a ``true`` value,
and a running message can be posted in following PR comments to trigger
tests once again.
-->

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false