Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add method to fetch required DuckDB extensions #668

Closed
wants to merge 9 commits into from

Conversation

ceholden
Copy link
Collaborator

@ceholden ceholden commented Mar 3, 2025

Closes

Description

This PR attempts to address some user experience goals related to DuckDB extensions by adding a way for the Client to install whatever DuckDB extensions are required for use. This extension list can be categorized into,

  • "core" dependencies that stac-rs needs ("icu" and "spatial")
  • "optional" dependencies that stac-rs might need depending on the location of the STAC GeoParquet assets. For example,
    • If on local disk we don't need AWS or Azure or HTTPFS
    • If on a remote server or GCP we need HTTPFS but not AWS or Azure
    • If on AWS and we want to use the credential provider, we need AWS extension AND HTTPFS
    • If on Azure we need Azure but not HTTPFS or AWS

Once installed DuckDB will "autoload" (see "autoloadable" column) the extensions for AWS/Azure/HTTPFS as needed, so we don't need to invoke that in our Client::new. By contrast "spatial" is NOT autoloadable, so that still has to live in Client::new

The formatting at least for docs looks good to me (feel free to suggest wording / example cases),
image

Checklist

Delete any checklist items that do not apply (e.g. if your change is minor, it may not require documentation updates).

  • Documentation, including doctests
  • Git history is linear
  • Commit messages are descriptive
  • Code is formatted (cargo fmt)
  • cargo test
  • Changes are added to the CHANGELOG

Comment on lines 599 to 600
Client::fetch_extensions(true, true, true, None).unwrap();
Client::new().unwrap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we want to require this every time someone makes a Client ... should we make a config value to optionally fetch or not fetch, and default to fetch?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I hadn't thought about putting this in the Client::new, but it seems like a good default to opt-in to fetching as part of new() or with_config() for the "batteries included" experience

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 if we do this we'll need to mix some of the extension configuration choices. Maybe something like,

struct ExtensionConfig {
    pub support_https: bool,
    pub support_aws: bool,
    pub support_azure: bool,
    pub custom_extension_repository: Option<&str>,
}
impl Default for ExtensionConfig {
    fn default() -> ExtensionConfig {
        support_https: true,
        support_aws: true,
        support_azure: true,
        custom_extension_repository: None,
    }
}

struct Config {
    ...,
    /// If None, do not fetch extensions when creating a Client. Otherwise
    /// fetch extensions as defined in the provided ExtensionConfig
    fetch_extension_config: Option<ExtensionConfig>
}
impl Default for Config {
    ...,
    fetch_extension_config: ExtensionConfig::default(),
}

Happy to get suggestions, it feels a little tricky to have a nice way of including setup and runtime configuration together

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels fine to me, though maybe (since we already have use_s3_credential_chain) we should akchewly go to vendor specific sub-config structures, e.g.:

struct Config {
    aws: AwsConfig,
    azure: AzureConfig,
    ...
}

struct AwsConfig {
    install_extension: bool,
    use_s3_credential_chain: bool,
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that'll work. At first I was wondering how much that would start mixing "runtime" and "install time" configurations, which would make it a little awkward to just install extensions during a build process, because I could see the AwsConfig growing. Since we're using the CREDENTIAL_CHAIN approach for the auth however I'm pretty sure any runtime settings (region / profile / etc) can be driven via environment variables, so the config won't grow in that sense

I do wish DuckDB could be configured via environment variables more than it can be currently though, e.g., this discussion, duckdb/duckdb#9881

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the whole "use env vars to authz yourself" pattern is both nice and heinous from a code path perspective. But here we are.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'm not quite sure where the choice for the HTTPFS extension would live in this AwsConfig or AzureConfig idea. I'll add two commits, one for each idea, and mull over what seems like a nicer usage

I realized I never actually use the #[once] fixture to install the extensions during tests, so fixed that. It was one of those non-repeatable local cases

@codecov-commenter
Copy link

codecov-commenter commented Mar 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 41.10%. Comparing base (b9051b9) to head (b5cc02d).
Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #668      +/-   ##
==========================================
- Coverage   41.12%   41.10%   -0.03%     
==========================================
  Files          48       48              
  Lines        2779     2788       +9     
==========================================
+ Hits         1143     1146       +3     
- Misses       1636     1642       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gadomski
Copy link
Member

OBE #681

@gadomski gadomski closed this Mar 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants