Add OGC API DGGS service + associated utility components #583
E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3636/
Result: ❌ FAILURE
BIRDHOUSE_DEPLOY_BRANCH: ogc-api-dggs
DACCS_IAC_BRANCH: master
DACCS_CONFIGS_BRANCH: master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH: master
PAVICS_SDI_BRANCH: master
DESTROY_INFRA_ON_EXIT: true
PAVICS_HOST: https://host-140-118.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/428/
NOTEBOOK TEST RESULTS

E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3639/
Result: ❌ FAILURE
BIRDHOUSE_DEPLOY_BRANCH: ogc-api-dggs
DACCS_IAC_BRANCH: master
DACCS_CONFIGS_BRANCH: master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH: master
PAVICS_SDI_BRANCH: master
DESTROY_INFRA_ON_EXIT: true
PAVICS_HOST: https://host-140-118.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/429/
NOTEBOOK TEST RESULTS
tlvu
left a comment
I think that if secure-data-proxy is meant to be used by multiple services, we should drop in more config files instead of appending to SECURE_DATA_PROXY_LOCATIONS in env.local.
The rest LGTM, I guess.
    proxy_set_header X-Forwarded-Host $host:$server_port;
}

${SECURE_DATA_PROXY_LOCATIONS}
I am a bit lost: why hook this new location into the existing secure-data-proxy component? I might have forgotten what secure-data-proxy was initially meant for. Why not create a new dggs-data-proxy?
Or, to keep secure-data-proxy, instead of using a template expansion of the ${SECURE_DATA_PROXY_LOCATIONS} variable, which means complex escaping of special characters, how about replacing ${SECURE_DATA_PROXY_LOCATIONS} with include /etc/nginx/conf.extra-service.d/secure-data-proxy/conf.d-*/*.conf and mounting the new file under /etc/nginx/conf.extra-service.d/secure-data-proxy/conf.d-dggs/dggs-data-proxy.conf?
Subsequent services would just drop in more files. There would be no endless appending to ${SECURE_DATA_PROXY_LOCATIONS} via env.local, which would then require handling duplicate appends since env.local is read multiple times by read_configs.
Right now there is only one usage of ${SECURE_DATA_PROXY_LOCATIONS}, so nothing appends to it in env.local and the duplicate read/append problem does not show up yet.
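The duplicate-append concern can be illustrated with a small shell sketch. It assumes a temporary file standing in for env.local and sources it twice to mimic read_configs reading it more than once; the file, the `/example/extra.conf` path, and the counting logic are illustrative only and are not taken from birdhouse-deploy.

```shell
#!/bin/sh
# Hypothetical stand-in for env.local (not the real file).
env_file="$(mktemp)"

# Variant 1: append to the variable, keeping whatever a previous read left in it.
cat > "$env_file" <<'EOF'
SECURE_DATA_PROXY_LOCATIONS="${SECURE_DATA_PROXY_LOCATIONS}
include /example/extra.conf;"
EOF

SECURE_DATA_PROXY_LOCATIONS=""
# Mimic the file being read more than once.
. "$env_file"
. "$env_file"
# Count how many times the appended line ended up in the variable.
append_count=$(printf '%s\n' "$SECURE_DATA_PROXY_LOCATIONS" | grep -c 'include /example/extra.conf;')
echo "lines after two reads (append): $append_count"

# Variant 2: plain assignment, which gives the same value however often it is read.
cat > "$env_file" <<'EOF'
SECURE_DATA_PROXY_LOCATIONS="include /example/extra.conf;"
EOF

SECURE_DATA_PROXY_LOCATIONS=""
. "$env_file"
. "$env_file"
assign_count=$(printf '%s\n' "$SECURE_DATA_PROXY_LOCATIONS" | grep -c 'include /example/extra.conf;')
echo "lines after two reads (assign): $assign_count"

rm -f "$env_file"
```

With the append variant the directive is present once per read, whereas the plain assignment stays at a single copy, which is why a variable that is only ever assigned in env.local avoids the problem.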
"Why not create a new dggs-data-proxy?"
We could do that as well. Would that be preferred? The result is the same, just more explicit/specialized to a single service rather than using the generic approach (see details below).
The way secure-data-proxy is defined, anything can be mounted in the proxy, under any directory path, and served on the web interface from any location. The only thing to include is the ${SECURE_DATA_PROXY_AUTH_INCLUDE} as shown below. The rest is flexible as desired.
Lines 1 to 5 in 6143ac3
location ${STAC_DATA_PROXY_URL_PATH}/ {
    ${SECURE_DATA_PROXY_AUTH_INCLUDE}
    alias /stac-data-proxy/;
}
When optional-components/secure-data-proxy is enabled, the SECURE_DATA_PROXY_AUTH_INCLUDE variable gets defined, and that introduces the auth_request /secure-data-auth; definition. That auth-request refers to the following config, which performs a pre-check with Magpie/Twitcher for authorization of the nested resource (matching the request path) defined under secure-data-proxy, before granting/refusing access to the file.
Lines 2 to 7 in 6143ac3
location = /secure-data-auth {
    internal;
    # note: using 'TWITCHER_VERIFY_PATH' path to avoid performing the request via 'proxy' endpoint
    # This ensures that the data access is validated for the user, but does not trigger its access/download twice.
    # Also, avoids getting an error as 'secure-data-proxy' private URL in Magpie doesn't resolve to a valid path.
    proxy_pass ${BIRDHOUSE_PROXY_SCHEME}://${BIRDHOUSE_FQDN_PUBLIC}${TWITCHER_VERIFY_PATH}/secure-data-proxy$request_uri;
When optional-components/secure-data-proxy is NOT enabled, the SECURE_DATA_PROXY_AUTH_INCLUDE variable is undefined and is therefore omitted. As a result, the data location mounted in the proxy becomes publicly available, since no pre-auth-request takes effect.
In other words, activating the component allows quickly toggling between protected and open access. The component is purposely designed to be available to many other components, so that all static data access can be managed from a central Magpie service. Currently, the stac-data-proxy and wps-output_volume components use it, but more could do so as well (like dggs here).
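As a rough sketch of that toggle, the following shell snippet renders the same location block twice, once with the variable set and once with it unset. The `render_location` helper, the `/my-data/` paths, and the exact value given to SECURE_DATA_PROXY_AUTH_INCLUDE are assumptions made for illustration; the real component may define the variable differently (for example through an include of another file).

```shell
#!/bin/sh
# Hypothetical helper that expands the template the way a variable substitution would.
render_location() {
cat <<EOF
location /my-data/ {
    ${SECURE_DATA_PROXY_AUTH_INCLUDE}
    alias /my-data-dir/;
}
EOF
}

# Component enabled: the variable carries the auth_request directive (assumed value).
SECURE_DATA_PROXY_AUTH_INCLUDE='auth_request /secure-data-auth;'
protected="$(render_location)"

# Component disabled: the variable is unset, so that line expands to nothing.
unset SECURE_DATA_PROXY_AUTH_INCLUDE
public="$(render_location)"

printf 'protected:\n%s\n\npublic:\n%s\n' "$protected" "$public"
```

The only difference between the two rendered blocks is the presence of the auth_request line, which is what makes the location either checked against Magpie/Twitcher or served openly.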
The customization is flexible, for example, I could define the following.
location /my-custom-location/some-path {
    ${SECURE_DATA_PROXY_AUTH_INCLUDE}
    alias /some-random-dir;
}

services:
  proxy:
    volumes:
      - /wherever-i-want/data:/some-random-dir:ro

Now, for this PR change specifically.
I introduce SECURE_DATA_PROXY_LOCATIONS, which can embed the above Nginx config so that I do not need to introduce a new file like stac-proxy-data.conf.template for each extra directory to be protected. I just need to manage this variable and the custom proxy volume mount on my end, and I can extend it with any number of custom directories.
This is how I provide the following sample file without introducing another "optional-components/samples-data-proxy".
birdhouse-deploy/birdhouse/components/dggs/config/dggs/pydggsapi-config.json.template
Line 49 in 7d93fec
"filepath": "https://hirondelle.crim.ca/data/samples/dggs/H3/L7/2025-08-31/rl/RCM2_OK3556292_PK3777138_2_SC30MCPD_20250831_123917_RL.parquet",
"how about replace ${SECURE_DATA_PROXY_LOCATIONS} with include /etc/nginx/conf.extra-service.d/secure-data-proxy/conf.d-*/*.conf and you mount the new file under /etc/nginx/conf.extra-service.d/secure-data-proxy/conf.d-dggs/dggs-data-proxy.conf"
That is certainly an option. You are allowed to have explicit location blocks or embedded include directives within SECURE_DATA_PROXY_LOCATIONS however desired, as long as the relevant include files are mounted in proxy as well.
I like the idea of adding the mechanism for auto-include of the files. Will add that.
I will still leave SECURE_DATA_PROXY_LOCATIONS though. The idea would be that, if extending specific services, their /etc/nginx/conf.extra-service.d/secure-data-proxy/conf.d-*/*.conf files should be used. The SECURE_DATA_PROXY_LOCATIONS would be there to extend even more custom data locations, which are not managed by any specific service. So, components are not expected to append to SECURE_DATA_PROXY_LOCATIONS. It is a one-off override by the server maintainer to plug in extra data sources as needed.
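To give a feel for which files a "drop a file" layout would pick up, here is a minimal sketch that uses a shell glob of the same shape as the proposed include pattern. It only approximates how Nginx resolves the mask; the temporary directory stands in for /etc/nginx/conf.extra-service.d/secure-data-proxy, and every file other than conf.d-dggs/dggs-data-proxy.conf is hypothetical.

```shell
#!/bin/sh
# Temporary tree standing in for /etc/nginx/conf.extra-service.d/secure-data-proxy.
root="$(mktemp -d)"
mkdir -p "$root/conf.d-dggs" "$root/conf.d-other" "$root/unrelated"
: > "$root/conf.d-dggs/dggs-data-proxy.conf"
: > "$root/conf.d-other/other-data-proxy.conf"   # hypothetical second service
: > "$root/conf.d-other/notes.txt"               # wrong extension, not matched
: > "$root/unrelated/ignored.conf"               # wrong directory, not matched

# Shell glob with the same shape as the suggested include pattern conf.d-*/*.conf.
matched=""
for f in "$root"/conf.d-*/*.conf; do
    [ -e "$f" ] || continue          # skip the literal pattern if nothing matched
    matched="$matched ${f#"$root"/}"
done
echo "matched:$matched"

rm -rf "$root"
```

Each service contributes its own conf.d-<service> directory, so adding a service means adding a file rather than editing a shared variable.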
I just thought of something.
Using the /etc/nginx/conf.extra-service.d/secure-data-proxy/... approach with a file defined in ./components/<service>/config/secure-data-proxy/conf.extra-service.d/ works only if secure-data-proxy is enabled.
That means that data served from that location will not be available publicly when disabling secure-data-proxy. We should instead have a generic optional-components/data-proxy which employs SECURE_DATA_PROXY_AUTH_INCLUDE (like stac-data-proxy does) to allow secure/public toggle when combining the two or not.
If an optional-components/data-proxy gets defined, then there is not really any advantage in having per-service nginx configs. The server can simply structure the data however it wants under the data directory and it would be served as is, and would be protected with the corresponding structure in Magpie if using optional-components/secure-data-proxy as well.
The fact that /data/dggs would be used for example is only semantics. I could have a random /data/somewhere-else configured within the components/dggs/config/data-proxy/... and it would still require both components to be active, without any actual indication that somewhere-else maps to dggs service. I could also just as well reference STAC data from that /data/somewhere-else location, although it would be defined by dggs, and it would work.
I'll let you sort out the public/secure toggle; it seems you understand it better than I do. Does that mean the SECURE_DATA_PROXY_AUTH_INCLUDE variable will be used by both secure-data-proxy and data-proxy? Which component is supposed to set it, then?
About SECURE_DATA_PROXY_LOCATIONS for ad-hoc usage in env.local: it is fine to keep, but please add a comment next to that variable saying that components should never use it and should use the "drop a file" mechanism instead.
Also explain in the same comment that, as long as components do not use that variable, we do not need to append to it in env.local, so we will not get duplicates even though read_configs reads env.local multiple times. This is the key point to avoid problems with env.local later.
How it would look if nested under each service (only dggs done for now)
E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3641/
Result: ❌ FAILURE
BIRDHOUSE_DEPLOY_BRANCH: ogc-api-dggs
DACCS_IAC_BRANCH: master
DACCS_CONFIGS_BRANCH: master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH: master
PAVICS_SDI_BRANCH: master
DESTROY_INFRA_ON_EXIT: true
PAVICS_HOST: https://host-140-118.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/431/
NOTEBOOK TEST RESULTS

E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3645/
Result: ❌ FAILURE
BIRDHOUSE_DEPLOY_BRANCH: ogc-api-dggs
DACCS_IAC_BRANCH: master
DACCS_CONFIGS_BRANCH: master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH: master
PAVICS_SDI_BRANCH: master
DESTROY_INFRA_ON_EXIT: true
PAVICS_HOST: https://host-140-118.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/433/
NOTEBOOK TEST RESULTS
@mishaschwartz For unittest, I need DACCS-Climate/Marble-node-registry#39 and the addition of a new I have not removed the |
E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3732/
Result: ❌ FAILURE
BIRDHOUSE_DEPLOY_BRANCH: ogc-api-dggs
DACCS_IAC_BRANCH: master
DACCS_CONFIGS_BRANCH: master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH: master
PAVICS_SDI_BRANCH: master
DESTROY_INFRA_ON_EXIT: true
PAVICS_HOST: https://host-140-216.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/479/
NOTEBOOK TEST RESULTS

E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3733/
Result: ❌ FAILURE
BIRDHOUSE_DEPLOY_BRANCH: ogc-api-dggs
DACCS_IAC_BRANCH: master
DACCS_CONFIGS_BRANCH: master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH: master
PAVICS_SDI_BRANCH: master
DESTROY_INFRA_ON_EXIT: true
PAVICS_HOST: https://host-140-91.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/480/
NOTEBOOK TEST RESULTS
@fmigneault There's a PR in place to resolve that now (DACCS-Climate/Marble-node-registry#43). I've added you as a reviewer, so please have a look when you can.
## Overview
- updated schema to version 1.3.0
- remove enclosing list (it can just be a dict if there's only one)
## Changes
**Non-breaking changes**
- schema update
**Breaking changes**
## Related Issue / Discussion
- fixes tests for #583
## Additional Information
## CI Operations
<!--
The test suite can be run using a different DACCS config with ``birdhouse_daccs_configs_branch: branch_name`` in the PR description.
To globally skip the test suite regardless of the commit message use ``birdhouse_skip_ci`` set to ``true`` in the PR description.
Using ``[<cmd>]`` (with the brackets) where ``<cmd> = skip ci`` in the commit message will override ``birdhouse_skip_ci`` from the PR description.
Such commit command can be used to override the PR description behavior for a specific commit update.
However, a commit message cannot 'force run' a PR which the description turns off the CI.
To run the CI, the PR should instead be updated with a ``true`` value, and a running message can be posted in following PR comments to trigger tests once again.
-->
birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false
E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3796/
Result: ❌ FAILURE
BIRDHOUSE_DEPLOY_BRANCH: ogc-api-dggs
DACCS_IAC_BRANCH: master
DACCS_CONFIGS_BRANCH: master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH: master
PAVICS_SDI_BRANCH: master
DESTROY_INFRA_ON_EXIT: true
PAVICS_HOST: https://host-140-216.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/521/
NOTEBOOK TEST RESULTS

E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3824/
Result: ❌ FAILURE
BIRDHOUSE_DEPLOY_BRANCH: ogc-api-dggs
DACCS_IAC_BRANCH: master
DACCS_CONFIGS_BRANCH: master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH: master
PAVICS_SDI_BRANCH: master
DESTROY_INFRA_ON_EXIT: true
PAVICS_HOST: https://host-140-216.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/542/
NOTEBOOK TEST RESULTS

E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3827/
Result: ❌ FAILURE
BIRDHOUSE_DEPLOY_BRANCH: ogc-api-dggs
DACCS_IAC_BRANCH: master
DACCS_CONFIGS_BRANCH: master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH: master
PAVICS_SDI_BRANCH: master
DESTROY_INFRA_ON_EXIT: true
PAVICS_HOST: https://host-140-216.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/543/
NOTEBOOK TEST RESULTS
tlvu
left a comment
While doing my round of PRs, I found this old one. I am not sure whether anything in this PR is waiting on me.
After a quick re-read, I found only one comment that has not been addressed:
"Can you document in the appropriate README and CHANGES that components should not use SECURE_DATA_PROXY_LOCATIONS var directly and that var is reserved exclusively for use in env.local?"
Please request a review again once done so that I "see" it in my list.
It was not blocked by you.
Is it sufficient to have it as a quick "note" comment in the
Will do.
Overview
DGGS (Discrete Global Grid Systems) allow returning data in formats other than the typical (lat,lon), by following a predefined discrete grid to contain the data.
This PR adds support for an OGC API - DGGS open source implementation that was augmented with Dockerized packaging. In the long run, further integration with other services is planned (notably Weaver and STAC), but this PR prepares only the minimal setup to have the service "running".
For reference, it is running here: https://hirondelle.crim.ca/dggs-api/
API Docs: https://hirondelle.crim.ca/dggs-api/docs
Changes
Non-breaking changes
- DGGS: Add the new components/dggs providing an OGC API for Discrete Global Grid Systems.
  - [...] /dggs-api path (default, configurable via DGGS_API_PATH).
  - [...] /ogcapi/dggs/... and /ogcapi/collections/.../dggs/... [...]
  - [...] feature of optional-components/secure-data-proxy on CRIM's Hirondelle server.
- Data: Allow optional-components/secure-data-proxy to define generic and flexible locations.
  - [...] SECURE_DATA_PROXY_ROOT can be defined as mount directory inside the proxy service.
  - [...] SECURE_DATA_PROXY_LOCATIONS can be defined with any amount of custom locations.
  - [...] secure-data-proxy service for access control.
  - [...] (wps_output-volume, stac-data-proxy) that can optionally use this security middleware via SECURE_DATA_PROXY_AUTH_INCLUDE can still do so. Their mount points are handled separately.
- Weaver: Modified /ogcapi/... redirections strategy via WEAVER_ALT_PREFIX_PROXY_LOCATION.
Breaking changes
Related Issue / Discussion
Additional To Do
CI Operations
birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false