@fmigneault
Member

Overview

Concept idea discussed in #583 (comment)

Adds a generic optional-components/data-proxy that combines with optional-components/secure-data-proxy to offer similar behavior to optional-components/stac-data-proxy, but in a generic fashion.

Uses components/dggs as an example. The same could be done for other services.

Changes

Non-breaking changes

  • Adds a generic optional-components/data-proxy.

Breaking changes

  • n/a

Related Issue / Discussion

CI Operations

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: true

@fmigneault fmigneault self-assigned this Sep 12, 2025
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 12, 2025
@@ -0,0 +1,3 @@
# Ensure the component is detected when added to 'BIRDHOUSE_EXTRA_CONF_DIRS'
Collaborator

Not sure, but I don't think this will make data-proxy "detected". I think that for a component to be detected, it needs to have either a default.env or a docker-compose-extra.yml file.

But if you want a DATA_PROXY_LOCATIONS var that can be used ad-hoc in env.local, then you have your default.env and docker-compose-extra.yml for this component :D

Member Author

For sure it needs more testing. I've only pushed what I had drafted up to realizing #583 (comment).

- ``<SERVICE>_DATA_PROXY_DIR_PATH``: host machine directory to the data

By default, all services will employ ``/data/data-proxy/<service>`` as the host directory and ``/data/<service>``
as web serving location. They can be configured globally or per service using relevant configuration variables.
Collaborator

I do not see what is enforcing or forcing this default "all services will employ /data/data-proxy/<service> as the host directory and /data/<service> as web serving location"

Member Author

Nothing is actually enforced. If we move forward with this concept, I think we would just make it an agreed-upon default when writing the corresponding vars in the default.env of each service.
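As a rough illustration of what such a convention could look like (a sketch only, nothing in this PR enforces it), a service's default.env could derive both variables from shared roots. `DATA_PROXY_DIR_ROOT` and `DATA_PROXY_URL_ROOT` are hypothetical names introduced for this example; only the `DGGS_DATA_PROXY_*` names follow the pattern described in the README.

```shell
# Hypothetical global roots (assumed names, not existing birdhouse variables).
DATA_PROXY_DIR_ROOT="${DATA_PROXY_DIR_ROOT:-/data/data-proxy}"
DATA_PROXY_URL_ROOT="${DATA_PROXY_URL_ROOT:-/data}"

# Per-service values following the <SERVICE>_DATA_PROXY_* pattern.
# Each one keeps any value already set (e.g. from env.local) and
# otherwise falls back to the agreed-upon default.
DGGS_DATA_PROXY_DIR_PATH="${DGGS_DATA_PROXY_DIR_PATH:-${DATA_PROXY_DIR_ROOT}/dggs}"
DGGS_DATA_PROXY_URL_PATH="${DGGS_DATA_PROXY_URL_PATH:-${DATA_PROXY_URL_ROOT}/dggs}"

echo "dir=${DGGS_DATA_PROXY_DIR_PATH} url=${DGGS_DATA_PROXY_URL_PATH}"
```

With nothing overridden, the two values resolve to the defaults quoted from the README; overriding either root or either per-service variable beforehand changes the result accordingly.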


Enabling ``components/<SERVICE>`` with ``optional-components/data-proxy`` will make the following variables available:
- ``<SERVICE>_DATA_PROXY_URL_PATH``: web access location to the data
- ``<SERVICE>_DATA_PROXY_DIR_PATH``: host machine directory to the data
Collaborator

Again, what is enforcing or forcing the pattern <SERVICE>_DATA_PROXY_URL_PATH and <SERVICE>_DATA_PROXY_DIR_PATH?

Member Author

@tlvu
Collaborator

tlvu commented Sep 12, 2025

Wait a minute, I think we might over-engineer this one.

data-proxy is basically secure-data-proxy without SECURE_DATA_PROXY_AUTH_INCLUDE !

So if we want to globally toggle public/secure, why can't we just empty the var SECURE_DATA_PROXY_AUTH_INCLUDE in env.local?

If we want to toggle per component that uses it, we could define DGGS_SECURE_DATA_PROXY_AUTH_INCLUDE='$SECURE_DATA_PROXY_AUTH_INCLUDE' (and add it to DELAY_EVAL in components/dggs), and use DGGS_SECURE_DATA_PROXY_AUTH_INCLUDE in the .conf file. Then, to toggle public/secure for DGGS only, we empty only DGGS_SECURE_DATA_PROXY_AUTH_INCLUDE in env.local.
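A minimal sketch of that per-component idea, in plain shell. The `eval` loop is only a stand-in for the platform's delayed-evaluation mechanism, and the directive content is a placeholder rather than the actual value used by secure-data-proxy.

```shell
# Global value (placeholder content, not the real directive).
SECURE_DATA_PROXY_AUTH_INCLUDE='include /placeholder/secure-data-auth.include;'

# Per-service variable stored as a literal reference (single quotes prevent
# expansion here), so that it can be resolved after all overrides are read.
DGGS_SECURE_DATA_PROXY_AUTH_INCLUDE='$SECURE_DATA_PROXY_AUTH_INCLUDE'

# env.local would be read at this point. Emptying only the DGGS variable
# would make DGGS public while leaving other services secured:
# DGGS_SECURE_DATA_PROXY_AUTH_INCLUDE=''

# Stand-in for delayed evaluation: expand each listed variable once.
DELAY_EVAL="DGGS_SECURE_DATA_PROXY_AUTH_INCLUDE"
for name in $DELAY_EVAL; do
    eval "value=\"\${$name}\""   # read the stored, still unexpanded text
    eval "$name=\"$value\""      # expand the references it contains
done

echo "DGGS auth include: ${DGGS_SECURE_DATA_PROXY_AUTH_INCLUDE}"
```

Because the reference is only resolved at the end, a later change to the global variable still propagates to DGGS unless the DGGS variable was explicitly overridden.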

Basically we do not need to create data-proxy that is 95% similar to secure-data-proxy.

We should wait for @mishaschwartz's opinion on this one.

@fmigneault
Member Author

> Wait a minute, I think we might over-engineer this one.

I agree. Goes back to #583 (comment).
I really don't think it is necessary to define it for each service, since not every service actually needs hosted data. If data is hosted, it can use any endpoint, and the services using those URIs can refer to them regardless of whether they carry a corresponding "service" prefix.

> data-proxy is basically secure-data-proxy without SECURE_DATA_PROXY_AUTH_INCLUDE

Exactly.

However, it is important to have both of them defined separately. If proxy aliases were defined only in the secure-data-proxy component, not including it (to disable security) would remove access to the data altogether (not "only" make it public).

One alternative would be to have only data-proxy with a variable like DATA_PROXY_SECURE=true|false that would set the value currently defined by SECURE_DATA_PROXY_AUTH_INCLUDE. That being said, since secure-data-proxy is already defined and used by some platforms (at least Hirondelle does), I prefer to leave it as is and avoid maintaining 2 separate methods to achieve the same goal.
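For illustration only, such a switch could be mapped onto the existing variable along these lines. `DATA_PROXY_SECURE` exists only as a proposal in this discussion, and `resolve_auth_include` and `AUTH_DIRECTIVE` are hypothetical names with placeholder content.

```shell
# Placeholder for whatever directive secure-data-proxy currently injects.
AUTH_DIRECTIVE='include /placeholder/secure-data-auth.include;'

# Hypothetical helper: print the auth directive for a true|false switch.
resolve_auth_include() {
    case "$1" in
        true)  printf '%s' "${AUTH_DIRECTIVE}" ;;
        false) printf '' ;;
        *)     echo "DATA_PROXY_SECURE must be 'true' or 'false'" >&2
               return 1 ;;
    esac
}

# Defaults to secured access when the switch is not set.
SECURE_DATA_PROXY_AUTH_INCLUDE="$(resolve_auth_include "${DATA_PROXY_SECURE:-true}")"

echo "auth include: '${SECURE_DATA_PROXY_AUTH_INCLUDE}'"
```

In this sketch the data locations themselves would stay defined either way; only the authorization directive is switched on or off.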

> So if we want to globally toggle public/secure, why can't we just empty the var SECURE_DATA_PROXY_AUTH_INCLUDE in env.local?

This is another option, but it is prone to break if some tweak or more advanced definition needs to be applied to SECURE_DATA_PROXY_AUTH_INCLUDE or to any other logic involved within these components. I used the "just enable optional-components/secure-data-proxy" approach to make any underlying tweaks as seamless as possible. Sadly, I didn't think about the DATA_PROXY_SECURE=true|false method before...

> DGGS_SECURE_DATA_PROXY_AUTH_INCLUDE='$SECURE_DATA_PROXY_AUTH_INCLUDE'

That actually adds another level of control: each service could have its own protected or unprotected locations.

The data-proxy/secure-data-proxy strategy (assuming it would be applied to any relevant component) would allow per-service custom config of the data/web-host location. However, all services would still be either fully open or fully protected, depending on whether secure-data-proxy is included. The per-service variable, by contrast, would allow some of them to be protected while others are not.

Maybe this is a personal preference on my end, but I think that if you want any form of secured data access by enabling secure-data-proxy, you can easily open access to specific files/locations using Magpie permissions instead. Going via Magpie is faster and offers finer-grained control than these per-service data-proxy options (i.e., open/protected access would still only apply to the entire service's data), while Magpie can do any level of recursive/match include/exclude combination.

Labels

documentation Improvements or additions to documentation

3 participants