-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: robustify pre-signed URL checks #760
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #760 +/- ##
==========================================
+ Coverage 84.32% 84.34% +0.01%
==========================================
Files 81 81
Lines 19200 19227 +27
Branches 19200 19227 +27
==========================================
+ Hits 16191 16217 +26
- Misses 2204 2207 +3
+ Partials 805 803 -2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a couple nits and we should do a quick unit test for this to assert (and document!) behavior
kernel/src/engine/default/mod.rs
Outdated
|
||
impl UrlExt for Url { | ||
fn is_presigned(&self) -> bool { | ||
matches!(self.scheme(), "http" | "https") && self.query().is_some() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so only material change is checking that the query is Some? maybe link docs to the guarantee?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the checks to explicitly consider some well known cases. Initially i thought covering all cases would not be feasible, so just make the heuristic a bit more concrete. Then again, hopefully people will leet us know in case there are servers out there using different signing schemes.
kernel/src/engine/default/parquet.rs
Outdated
use crate::arrow::array::builder::{MapBuilder, MapFieldNames, StringBuilder}; | ||
use crate::arrow::array::{BooleanArray, Int64Array, RecordBatch, StringArray}; | ||
use crate::engine::arrow_data::ArrowEngineData; | ||
use crate::engine::arrow_utils::{fixup_parquet_read, generate_mask, get_requested_indices}; | ||
use crate::engine::default::executor::TaskExecutor; | ||
use crate::engine::parquet_row_group_skipping::ParquetRowGroupSkipping; | ||
use crate::parquet::arrow::arrow_reader::{ | ||
ArrowReaderMetadata, ArrowReaderOptions, ParquetRecordBatchReaderBuilder, | ||
}; | ||
use crate::parquet::arrow::arrow_writer::ArrowWriter; | ||
use crate::parquet::arrow::async_reader::{ParquetObjectReader, ParquetRecordBatchStreamBuilder}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i actually hesitate to move these crate::{arrow/parquet} items down here since they aren't really our crate (just re-exports). I generally group my imports into one of: std, external, and crate. I would advocate for having a new grouping just for arrow/parquet (or alternatively just leave in external group)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes a lot of sense. reverted for now to leave that for a future decision ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, thanks!
@@ -173,4 +202,22 @@ mod tests { | |||
let engine = DefaultEngine::new(store, Arc::new(TokioBackgroundExecutor::new())); | |||
test_arrow_engine(&engine, &url); | |||
} | |||
|
|||
#[test] | |||
fn test_pre_signed_url() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: unit test for case insensitive comparison
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added :)
kernel/src/engine/default/mod.rs
Outdated
// note signed permission (sp) must always be present | ||
self | ||
.query_pairs().any(|(k, _)| k.eq_ignore_ascii_case("sp")) || | ||
// <https://cloud.google.com/storage/docs/authentication/signatures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: stray angled bracket <
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with some minor nits!
What changes are proposed in this pull request?
Current checks for a pre-signed url in our readers may mis-classify object store URLs that do not use store specific schemes (
s3://
maz://
...) as pre-signed urls. We extend the current checks, to also consider the presence of a query string and define a helper trait to ensure we can maintain the check in one place.How was this change tested?
current unit tests.