Skip to content

Commit

Permalink
Merge branch 'main' into expresions_visitor2
Browse files Browse the repository at this point in the history
  • Loading branch information
OussamaSaoudi-db committed Oct 24, 2024
2 parents 8530b8d + f5d0a42 commit 306c982
Show file tree
Hide file tree
Showing 27 changed files with 663 additions and 181 deletions.
27 changes: 27 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
name: Bug report
description: Create a report to help us improve
labels: bug
body:
- type: textarea
attributes:
label: Describe the bug
placeholder: >
A clear and concise description of what the bug is.
validations:
required: true
- type: textarea
attributes:
label: To Reproduce
placeholder: >
Steps to reproduce the behavior:
- type: textarea
attributes:
label: Expected behavior
placeholder: >
What you expected to happen.
- type: textarea
attributes:
label: Additional context
placeholder: >
Add any other context about the problem here.
20 changes: 20 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
name: Feature request
description: Suggest an idea for this project
labels: enhancement
body:
- type: textarea
attributes:
label: Please describe why this is necessary.
placeholder: >
A description of what the problem is, or what motivates the feature you are requesting
- type: textarea
attributes:
label: Describe the functionality you are proposing.
placeholder: >
A clear and concise description of what you want to happen.
- type: textarea
attributes:
label: Additional context
placeholder: >
Add any other context or screenshots about the feature request here.
33 changes: 33 additions & 0 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
<!--
Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines: https://github.com/delta-incubator/delta-kernel-rs/blob/main/CONTRIBUTING.md
2. Run `cargo t --all-features --all-targets` to get started testing, and run `cargo fmt`.
3. Ensure you have added or run the appropriate tests for your PR.
4. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'.
5. Be sure to keep the PR description updated to reflect all changes.
-->

## What changes are proposed in this pull request?
<!--
Please clarify what changes you are proposing and why the changes are needed.
The purpose of this section is to outline the changes, why they are needed, and how this PR fixes the issue.
If the reason for the change is already explained clearly in an issue, then it does not need to be restated here.
1. If you propose a new API or feature, clarify the use case for a new API or feature.
2. If you fix a bug, you can clarify why it is a bug.
-->

<!--
Uncomment this section if there are any changes affecting public APIs:
### This PR affects the following public APIs
If there are breaking changes, please ensure the `breaking-changes` label gets added by CI, and describe why the changes are needed.
Note that _new_ public APIs are not considered breaking.
-->


## How was this change tested?
<!--
Please make sure to add test cases that check the changes thoroughly including negative and positive cases if possible.
If it was tested in a way different from regular unit tests, please clarify how you tested, ideally via a reproducible test documented in the PR description.
-->
88 changes: 88 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,93 @@
# Changelog

## [v0.4.0](https://github.com/delta-incubator/delta-kernel-rs/tree/v0.4.0/) (2024-10-23)

[Full Changelog](https://github.com/delta-incubator/delta-kernel-rs/compare/v0.3.1...v0.4.0)

**API Changes**

*Breaking*

1. `pub ScanResult.mask` field made private and only accessible as `ScanResult.raw_mask()` method [\#374]
2. new `ReaderFeatures` enum variant: `TypeWidening` and `TypeWideningPreview` [\#335]
3. new `WriterFeatures` enum variant: `TypeWidening` and `TypeWideningPreview` [\#335]
4. new `Error` enum variant: `InvalidLogPath` when kernel is unable to parse the name of a log path [\#347]
5. Module moved: `mod delta_kernel::transaction` -> `mod delta_kernel::actions::set_transaction` [\#386]
6. change `default-feature` to be none (removed `sync-engine` by default. If downstream users relied on this, turn on `sync-engine` feature or specific arrow-related feature flags to pull in the pieces needed) [\#339]
7. `Scan`'s `execute(..)` method now returns a lazy iterator instead of materializing a `Vec<ScanResult>`. You can trivially migrate to the new API (and force eager materialization by using `.collect()` or the like on the returned iterator) [\#340]
8. schema and expression FFI moved to their own `mod delta_kernel_ffi::schema` and `mod delta_kernel_ffi::expressions` [\#360]
9. Parquet and JSON readers in `Engine` trait now take `Arc<Expression>` (aliased to `ExpressionRef`) instead of `Expression` [\#364]
10. `StructType::new(..)` now takes an `impl IntoIterator<Item = StructField>` instead of `Vec<StructField>` [\#385]
11. `DataType::struct_type(..)` now takes an `impl IntoIterator<Item = StructField>` instead of `Vec<StructField>` [\#385]
12. removed `DataType::array_type(..)` API: there is already an `impl From<ArrayType> for DataType` [\#385]
13. `Expression::struct_expr(..)` renamed to `Expression::struct_from(..)` [\#399]
14. lots of expressions take `impl Into<Self>` or `impl Into<Expression>` instead of just `Self`/`Expression` now [\#399]
15. remove `log_replay_iter` and `process_batch` APIs in `scan::log_replay` [\#402]

*Additions*

1. remove feature flag requirement for `impl GetData` on `()` [\#334]
2. new `full_mask()` method on `ScanResult` [\#374]
3. `StructType::try_new(fields: impl IntoIterator<Item = StructField>)` [\#385]
4. `DataType::try_struct_type(fields: impl IntoIterator<Item = StructField>)` [\#385]
5. `StructField.metadata_with_string_values(&self) -> HashMap<String, String>` to materialize and return our metadata into a hashmap [\#331]

**Implemented enhancements:**

- support reading tables with type widening in default engine [\#335]
- add predicate to protocol and metadata log replay for pushdown [\#336] and [\#343]
- support annotation (macro) for nullable values in a container (for `#[derive(Schema)]`) [\#342]
- new `ParsedLogPath` type for better log path parsing [\#347]
- implemented row group skipping for default engine parquet readers and new utility trait for stats-based skipping logic [\#357], [\#362], [\#381]
- depend on wider arrow versions and add arrow integration testing [\#366] and [\#413]
- added semver testing to CI [\#369], [\#383], [\#384]
- new `SchemaTransform` trait and usage in column mapping and data skipping [\#395] and [\#398]
- arrow expression evaluation improvements [\#401]
- replace panics with `to_compiler_error` in macros [\#409]

**Fixed bugs:**

- output of arrow expression evaluation now applies/validates output schema in default arrow expression handler [\#331]
- add `arrow-buffer` to `arrow-expression` feature [\#332]
- fix bug with out-of-date last checkpoint [\#354]
- fixed broken sync engine json parsing and harmonized sync/async json parsing [\#373]
- filesystem client now always returns a sorted list [\#344]

[\#331]: https://github.com/delta-incubator/delta-kernel-rs/pull/331
[\#332]: https://github.com/delta-incubator/delta-kernel-rs/pull/332
[\#334]: https://github.com/delta-incubator/delta-kernel-rs/pull/334
[\#335]: https://github.com/delta-incubator/delta-kernel-rs/pull/335
[\#336]: https://github.com/delta-incubator/delta-kernel-rs/pull/336
[\#337]: https://github.com/delta-incubator/delta-kernel-rs/pull/337
[\#339]: https://github.com/delta-incubator/delta-kernel-rs/pull/339
[\#340]: https://github.com/delta-incubator/delta-kernel-rs/pull/340
[\#342]: https://github.com/delta-incubator/delta-kernel-rs/pull/342
[\#343]: https://github.com/delta-incubator/delta-kernel-rs/pull/343
[\#344]: https://github.com/delta-incubator/delta-kernel-rs/pull/344
[\#347]: https://github.com/delta-incubator/delta-kernel-rs/pull/347
[\#354]: https://github.com/delta-incubator/delta-kernel-rs/pull/354
[\#357]: https://github.com/delta-incubator/delta-kernel-rs/pull/357
[\#360]: https://github.com/delta-incubator/delta-kernel-rs/pull/360
[\#362]: https://github.com/delta-incubator/delta-kernel-rs/pull/362
[\#364]: https://github.com/delta-incubator/delta-kernel-rs/pull/364
[\#366]: https://github.com/delta-incubator/delta-kernel-rs/pull/366
[\#369]: https://github.com/delta-incubator/delta-kernel-rs/pull/369
[\#373]: https://github.com/delta-incubator/delta-kernel-rs/pull/373
[\#374]: https://github.com/delta-incubator/delta-kernel-rs/pull/374
[\#381]: https://github.com/delta-incubator/delta-kernel-rs/pull/381
[\#383]: https://github.com/delta-incubator/delta-kernel-rs/pull/383
[\#384]: https://github.com/delta-incubator/delta-kernel-rs/pull/384
[\#385]: https://github.com/delta-incubator/delta-kernel-rs/pull/385
[\#386]: https://github.com/delta-incubator/delta-kernel-rs/pull/386
[\#395]: https://github.com/delta-incubator/delta-kernel-rs/pull/395
[\#398]: https://github.com/delta-incubator/delta-kernel-rs/pull/398
[\#399]: https://github.com/delta-incubator/delta-kernel-rs/pull/399
[\#401]: https://github.com/delta-incubator/delta-kernel-rs/pull/401
[\#402]: https://github.com/delta-incubator/delta-kernel-rs/pull/402
[\#409]: https://github.com/delta-incubator/delta-kernel-rs/pull/409
[\#413]: https://github.com/delta-incubator/delta-kernel-rs/pull/413


## [v0.3.1](https://github.com/delta-incubator/delta-kernel-rs/tree/v0.3.1/) (2024-09-10)

[Full Changelog](https://github.com/delta-incubator/delta-kernel-rs/compare/v0.3.0...v0.3.1)
Expand Down
2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ keywords = ["deltalake", "delta", "datalake"]
license = "Apache-2.0"
repository = "https://github.com/delta-incubator/delta-kernel-rs"
readme = "README.md"
version = "0.3.1"
version = "0.4.0"

[workspace.dependencies]
arrow = { version = ">=53, <54" }
Expand Down
6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,10 +41,10 @@ asynchronous `Engine` implementation built with [Arrow] and [Tokio].
```toml
# fewer dependencies, requires consumer to implement Engine trait.
# allows consumers to implement their own in-memory format
delta_kernel = "0.3"
delta_kernel = "0.4"

# or turn on the default engine, based on arrow
delta_kernel = { version = "0.3", features = ["default-engine"] }
delta_kernel = { version = "0.4", features = ["default-engine"] }
```

### Feature flags
Expand Down Expand Up @@ -183,4 +183,4 @@ Some design principles which should be considered:
[cargo-llvm-cov]: https://github.com/taiki-e/cargo-llvm-cov
[FFI]: ffi/
[Arrow]: https://arrow.apache.org/rust/arrow/index.html
[Tokio]: https://tokio.rs/
[Tokio]: https://tokio.rs/
45 changes: 37 additions & 8 deletions derive-macros/src/lib.rs
Original file line number Diff line number Diff line change
@@ -1,7 +1,29 @@
use proc_macro2::{Ident, TokenStream};
use proc_macro2::{Ident, Span, TokenStream};
use quote::{quote, quote_spanned};
use syn::spanned::Spanned;
use syn::{parse_macro_input, Data, DataStruct, DeriveInput, Fields, Meta, PathArguments, Type};
use syn::{
parse_macro_input, Data, DataStruct, DeriveInput, Error, Fields, Meta, PathArguments, Type,
};

/// Parses a dot-delimited column name into an array of field names. See
/// [`delta_kernel::expressions::column_name::column_name`] macro for details.
#[proc_macro]
pub fn parse_column_name(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
let is_valid = |c: char| c.is_ascii_alphanumeric() || c == '_' || c == '.';
let err = match syn::parse(input) {
Ok(syn::Lit::Str(name)) => match name.value().chars().find(|c| !is_valid(*c)) {
Some(bad_char) => Error::new(name.span(), format!("Invalid character: {bad_char:?}")),
_ => {
let path = name.value();
let path = path.split('.').map(proc_macro2::Literal::string);
return quote_spanned! { name.span() => [#(#path),*] }.into();
}
},
Ok(lit) => Error::new(lit.span(), "Expected a string literal"),
Err(err) => err,
};
err.into_compile_error().into()
}

/// Derive a `delta_kernel::schemas::ToDataType` implementation for the annotated struct. The actual
/// field names in the schema (and therefore of the struct members) are all mandated by the Delta
Expand Down Expand Up @@ -63,7 +85,13 @@ fn gen_schema_fields(data: &Data) -> TokenStream {
fields: Fields::Named(fields),
..
}) => &fields.named,
_ => panic!("this derive macro only works on structs with named fields"),
_ => {
return Error::new(
Span::call_site(),
"this derive macro only works on structs with named fields",
)
.to_compile_error()
}
};

let schema_fields = fields.iter().map(|field| {
Expand All @@ -84,23 +112,24 @@ fn gen_schema_fields(data: &Data) -> TokenStream {
match &segment.arguments {
PathArguments::None => quote! { #segment_ident :: },
PathArguments::AngleBracketed(angle_args) => quote! { #segment_ident::#angle_args :: },
_ => panic!("Can only handle <> type path args"),
_ => Error::new(segment.arguments.span(), "Can only handle <> type path args").to_compile_error()
}
});
if have_schema_null {
if let Some(first_ident) = type_path.path.segments.first().map(|seg| &seg.ident) {
if first_ident != "HashMap" {
panic!("Can only use drop_null_container_values on HashMap fields, not {first_ident:?}");
return Error::new(
first_ident.span(),
format!("Can only use drop_null_container_values on HashMap fields, not {first_ident}")
).to_compile_error()
}
}
quote_spanned! { field.span() => #(#type_path_quoted),* get_nullable_container_struct_field(stringify!(#name))}
} else {
quote_spanned! { field.span() => #(#type_path_quoted),* get_struct_field(stringify!(#name))}
}
}
_ => {
panic!("Can't handle type: {:?}", field.ty);
}
_ => Error::new(field.span(), format!("Can't handle type: {:?}", field.ty)).to_compile_error()
}
});
quote! { #(#schema_fields),* }
Expand Down
2 changes: 1 addition & 1 deletion ffi/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ url = "2"
delta_kernel = { path = "../kernel", default-features = false, features = [
"developer-visibility",
] }
delta_kernel_ffi_macros = { path = "../ffi-proc-macros", version = "0.3.1" }
delta_kernel_ffi_macros = { path = "../ffi-proc-macros", version = "0.4.0" }

# used if we use the default engine to be able to move arrow data into the c-ffi format
arrow-schema = { version = "53.0", default-features = false, features = [
Expand Down
6 changes: 4 additions & 2 deletions ffi/src/expressions/engine.rs
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ use crate::{
ReferenceSet, TryFromStringSlice,
};
use delta_kernel::{
expressions::{BinaryOperator, Expression, UnaryOperator},
expressions::{BinaryOperator, ColumnName, Expression, UnaryOperator},
DeltaResult,
};

Expand Down Expand Up @@ -148,7 +148,9 @@ fn visit_expression_column_impl(
state: &mut KernelExpressionVisitorState,
name: DeltaResult<String>,
) -> DeltaResult<usize> {
Ok(wrap_expression(state, Expression::Column(name?)))
// TODO: FIXME: This is incorrect if any field name in the column path contains a period.
let name = ColumnName::new(name?.split('.')).into();
Ok(wrap_expression(state, name))
}

#[no_mangle]
Expand Down
2 changes: 1 addition & 1 deletion kernel/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ uuid = "1.10.0"
z85 = "3.0.5"

# bring in our derive macros
delta_kernel_derive = { path = "../derive-macros", version = "0.3.1" }
delta_kernel_derive = { path = "../derive-macros", version = "0.4.0" }

# used for developer-visibility
visibility = "0.1.1"
Expand Down
4 changes: 2 additions & 2 deletions kernel/examples/inspect-table/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -238,15 +238,15 @@ fn try_main() -> DeltaResult<()> {
}
}
Commands::Actions { forward } => {
let log_schema = Arc::new(get_log_schema().clone());
let log_schema = get_log_schema();
let actions = snapshot._log_segment().replay(
&engine,
log_schema.clone(),
log_schema.clone(),
None,
)?;

let mut visitor = LogVisitor::new(&log_schema);
let mut visitor = LogVisitor::new(log_schema);
for action in actions {
action?.0.extract(log_schema.clone(), &mut visitor)?;
}
Expand Down
18 changes: 14 additions & 4 deletions kernel/src/actions/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,8 @@ use visitors::{AddVisitor, MetadataVisitor, ProtocolVisitor};
use self::deletion_vector::DeletionVectorDescriptor;
use crate::actions::schemas::GetStructField;
use crate::features::{ReaderFeatures, WriterFeatures};
use crate::{schema::StructType, DeltaResult, EngineData};
use crate::schema::{SchemaRef, StructType};
use crate::{DeltaResult, EngineData};

pub mod deletion_vector;
pub mod set_transaction;
Expand All @@ -28,7 +29,9 @@ pub(crate) const PROTOCOL_NAME: &str = "protocol";
pub(crate) const SET_TRANSACTION_NAME: &str = "txn";
pub(crate) const COMMIT_INFO_NAME: &str = "commitInfo";

static LOG_SCHEMA: LazyLock<StructType> = LazyLock::new(|| {
static LOG_ADD_SCHEMA: LazyLock<SchemaRef> =
LazyLock::new(|| StructType::new([Option::<Add>::get_struct_field(ADD_NAME)]).into());
static LOG_SCHEMA: LazyLock<SchemaRef> = LazyLock::new(|| {
StructType::new([
Option::<Add>::get_struct_field(ADD_NAME),
Option::<Remove>::get_struct_field(REMOVE_NAME),
Expand All @@ -40,14 +43,21 @@ static LOG_SCHEMA: LazyLock<StructType> = LazyLock::new(|| {
//Option::<Cdc>::get_struct_field(CDC_NAME),
//Option::<DomainMetadata>::get_struct_field(DOMAIN_METADATA_NAME),
])
.into()
});

#[cfg_attr(feature = "developer-visibility", visibility::make(pub))]
#[cfg_attr(not(feature = "developer-visibility"), visibility::make(pub(crate)))]
fn get_log_schema() -> &'static StructType {
fn get_log_schema() -> &'static SchemaRef {
&LOG_SCHEMA
}

#[cfg_attr(feature = "developer-visibility", visibility::make(pub))]
#[cfg_attr(not(feature = "developer-visibility"), visibility::make(pub(crate)))]
fn get_log_add_schema() -> &'static SchemaRef {
&LOG_ADD_SCHEMA
}

#[derive(Debug, Clone, PartialEq, Eq, Schema)]
pub struct Format {
/// Name of the encoding for files in this table
Expand Down Expand Up @@ -194,7 +204,7 @@ impl Add {
/// Since we always want to parse multiple adds from data, we return a `Vec<Add>`
pub fn parse_from_data(data: &dyn EngineData) -> DeltaResult<Vec<Add>> {
let mut visitor = AddVisitor::default();
data.extract(get_log_schema().project(&[ADD_NAME])?, &mut visitor)?;
data.extract(get_log_add_schema().clone(), &mut visitor)?;
Ok(visitor.adds)
}

Expand Down
Loading

0 comments on commit 306c982

Please sign in to comment.