-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add method to fetch required DuckDB extensions #668
Conversation
crates/duckdb/src/lib.rs
Outdated
Client::fetch_extensions(true, true, true, None).unwrap(); | ||
Client::new().unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we want to require this every time someone makes a Client
... should we make a config value to optionally fetch or not fetch, and default to fetch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I hadn't thought about putting this in the Client::new
, but it seems like a good default to opt-in to fetching as part of new()
or with_config()
for the "batteries included" experience
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 if we do this we'll need to mix some of the extension configuration choices. Maybe something like,
struct ExtensionConfig {
pub support_https: bool,
pub support_aws: bool,
pub support_azure: bool,
pub custom_extension_repository: Option<&str>,
}
impl Default for ExtensionConfig {
fn default() -> ExtensionConfig {
support_https: true,
support_aws: true,
support_azure: true,
custom_extension_repository: None,
}
}
struct Config {
...,
/// If None, do not fetch extensions when creating a Client. Otherwise
/// fetch extensions as defined in the provided ExtensionConfig
fetch_extension_config: Option<ExtensionConfig>
}
impl Default for Config {
...,
fetch_extension_config: ExtensionConfig::default(),
}
Happy to get suggestions, it feels a little tricky to have a nice way of including setup and runtime configuration together
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels fine to me, though maybe (since we already have use_s3_credential_chain
) we should akchewly go to vendor specific sub-config structures, e.g.:
struct Config {
aws: AwsConfig,
azure: AzureConfig,
...
}
struct AwsConfig {
install_extension: bool,
use_s3_credential_chain: bool,
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that'll work. At first I was wondering how much that would start mixing "runtime" and "install time" configurations, which would make it a little awkward to just install extensions during a build process, because I could see the AwsConfig
growing. Since we're using the CREDENTIAL_CHAIN
approach for the auth however I'm pretty sure any runtime settings (region / profile / etc) can be driven via environment variables, so the config won't grow in that sense
I do wish DuckDB could be configured via environment variables more than it can be currently though, e.g., this discussion, duckdb/duckdb#9881
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the whole "use env vars to authz yourself" pattern is both nice and heinous from a code path perspective. But here we are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I'm not quite sure where the choice for the HTTPFS extension would live in this AwsConfig
or AzureConfig
idea. I'll add two commits, one for each idea, and mull over what seems like a nicer usage
I realized I never actually use the #[once]
fixture to install the extensions during tests, so fixed that. It was one of those non-repeatable local cases
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #668 +/- ##
==========================================
- Coverage 41.12% 41.10% -0.03%
==========================================
Files 48 48
Lines 2779 2788 +9
==========================================
+ Hits 1143 1146 +3
- Misses 1636 1642 +6 ☔ View full report in Codecov by Sentry. |
OBE #681 |
Closes
Description
This PR attempts to address some user experience goals related to DuckDB extensions by adding a way for the
Client
to install whatever DuckDB extensions are required for use. This extension list can be categorized into,stac-rs
needs ("icu" and "spatial")stac-rs
might need depending on the location of the STAC GeoParquet assets. For example,Once installed DuckDB will "autoload" (see "autoloadable" column) the extensions for AWS/Azure/HTTPFS as needed, so we don't need to invoke that in our
Client::new
. By contrast "spatial" is NOT autoloadable, so that still has to live inClient::new
The formatting at least for docs looks good to me (feel free to suggest wording / example cases),

Checklist
Delete any checklist items that do not apply (e.g. if your change is minor, it may not require documentation updates).
cargo fmt
)cargo test