Commit

Merge branch 'main' into read-table-docs
OussamaSaoudi-db authored Jan 14, 2025
2 parents 4ca2f36 + 12020d8 commit 86f70c8
Showing 46 changed files with 416 additions and 80 deletions.
13 changes: 13 additions & 0 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,19 @@ Thanks for sending a pull request! Here are some tips for you:
5. Be sure to keep the PR description updated to reflect all changes.
-->

<!--
PR title formatting:
This project uses conventional commits: https://www.conventionalcommits.org/
Each PR corresponds to a commit on the `main` branch, with the title of the PR (typically) being
used for the commit message on main. In order to ensure proper formatting in the CHANGELOG please
ensure your PR title adheres to the conventional commit specification.
Examples:
- new feature PR: "feat: new API for snapshot.update()"
- bugfix PR: "fix: correctly apply DV in read-table example"
-->

## What changes are proposed in this pull request?
<!--
Please clarify what changes you are proposing and why the changes are needed.
Expand Down
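The title convention described in the template comment can be checked mechanically. The following is a minimal sketch, assuming the common `type(scope)!: description` shape from the conventional commits site linked above; `Title` and `parse_title` are hypothetical names for illustration and are not part of this repository or its CI.

```rust
/// Parsed pieces of a conventional-commit style title (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct Title<'a> {
    kind: &'a str,
    scope: Option<&'a str>,
    breaking: bool,
    description: &'a str,
}

/// Parse `type(scope)!: description`. Returns `None` if the title does not fit that shape.
fn parse_title(title: &str) -> Option<Title<'_>> {
    // The header and the description are separated by ": ".
    let (header, description) = title.split_once(": ")?;
    if description.trim().is_empty() {
        return None;
    }
    // A trailing `!` on the header marks a breaking change.
    let (header, breaking) = match header.strip_suffix('!') {
        Some(rest) => (rest, true),
        None => (header, false),
    };
    // An optional scope sits in parentheses after the type.
    let (kind, scope) = match header.split_once('(') {
        Some((kind, rest)) => (kind, Some(rest.strip_suffix(')')?)),
        None => (header, None),
    };
    // The type must be a non-empty word of ASCII letters.
    if kind.is_empty() || !kind.chars().all(|c| c.is_ascii_alphabetic()) {
        return None;
    }
    Some(Title { kind, scope, breaking, description })
}

fn main() {
    // The two example titles come from the PR template above.
    for title in [
        "feat: new API for snapshot.update()",
        "fix: correctly apply DV in read-table example",
        "Merge branch 'main' into read-table-docs",
    ] {
        println!("{title:?} -> {:?}", parse_title(title));
    }
}
```

Under this sketch, a merge-commit title such as the one on this commit does not parse, which is why such commits end up in a catch-all changelog group rather than under features or fixes.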
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
.idea/
.vscode/
.vim
.zed

# Rust
.cargo/
Expand Down
34 changes: 34 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,39 @@
# Changelog

## [v0.6.1](https://github.com/delta-io/delta-kernel-rs/tree/v0.6.1/) (2025-01-10)

[Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.6.0...v0.6.1)


### 🚀 Features / new APIs

1. New feature flag `default-engine-rustls` ([#572])

### 🐛 Bug Fixes

1. Allow partition value timestamp to be ISO8601 formatted string ([#622])
2. Fix stderr output for handle tests ([#630])

### ⚙️ Chores/CI

1. Expand the arrow version range to allow arrow v54 ([#616])
2. Update to CodeCov @v5 ([#608])

### Other

1. Fix msrv check by pinning `home` dependency ([#605])
2. Add release script ([#636])


[#605]: https://github.com/delta-io/delta-kernel-rs/pull/605
[#608]: https://github.com/delta-io/delta-kernel-rs/pull/608
[#622]: https://github.com/delta-io/delta-kernel-rs/pull/622
[#630]: https://github.com/delta-io/delta-kernel-rs/pull/630
[#572]: https://github.com/delta-io/delta-kernel-rs/pull/572
[#616]: https://github.com/delta-io/delta-kernel-rs/pull/616
[#636]: https://github.com/delta-io/delta-kernel-rs/pull/636


## [v0.6.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.6.0/) (2024-12-17)

[Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.5.0...v0.6.0)
Expand Down
28 changes: 16 additions & 12 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -20,20 +20,24 @@ license = "Apache-2.0"
repository = "https://github.com/delta-io/delta-kernel-rs"
readme = "README.md"
rust-version = "1.80"
version = "0.6.0"
version = "0.6.1"

[workspace.dependencies]
arrow = { version = ">=53, <54" }
arrow-arith = { version = ">=53, <54" }
arrow-array = { version = ">=53, <54" }
arrow-buffer = { version = ">=53, <54" }
arrow-cast = { version = ">=53, <54" }
arrow-data = { version = ">=53, <54" }
arrow-ord = { version = ">=53, <54" }
arrow-json = { version = ">=53, <54" }
arrow-select = { version = ">=53, <54" }
arrow-schema = { version = ">=53, <54" }
parquet = { version = ">=53, <54", features = ["object_store"] }
# When changing the arrow version range, also modify ffi/Cargo.toml which has
# its own arrow version ranges with modified features. Failure to do so will
# result in compilation errors as two different sets of arrow dependencies may
# be sourced
arrow = { version = ">=53, <55" }
arrow-arith = { version = ">=53, <55" }
arrow-array = { version = ">=53, <55" }
arrow-buffer = { version = ">=53, <55" }
arrow-cast = { version = ">=53, <55" }
arrow-data = { version = ">=53, <55" }
arrow-ord = { version = ">=53, <55" }
arrow-json = { version = ">=53, <55" }
arrow-select = { version = ">=53, <55" }
arrow-schema = { version = ">=53, <55" }
parquet = { version = ">=53, <55", features = ["object_store"] }
object_store = { version = ">=0.11, <0.12" }
hdfs-native-object-store = "0.12.0"
hdfs-native = "0.10.0"
Expand Down
14 changes: 7 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ Delta-kernel-rs is split into a few different crates:
- kernel: The actual core kernel crate
- acceptance: Acceptance tests that validate correctness via the [Delta Acceptance Tests][dat]
- derive-macros: A crate for our [derive-macros] to live in
- ffi: Functionallity that enables delta-kernel-rs to be used from `C` or `C++` See the [ffi](ffi)
- ffi: Functionality that enables delta-kernel-rs to be used from `C` or `C++`. See the [ffi](ffi)
directory for more information.

## Building
Expand Down Expand Up @@ -43,10 +43,10 @@ consumer's own `Engine` trait, the kernel has a feature flag to enable a default
```toml
# fewer dependencies, requires consumer to implement Engine trait.
# allows consumers to implement their own in-memory format
delta_kernel = "0.6"
delta_kernel = "0.6.1"

# or turn on the default engine, based on arrow
delta_kernel = { version = "0.6", features = ["default-engine"] }
delta_kernel = { version = "0.6.1", features = ["default-engine"] }
```

### Feature flags
Expand All @@ -66,12 +66,12 @@ are still unstable. We therefore may break APIs within minor releases (that is,
we will not break APIs in patch releases (`0.1.0` -> `0.1.1`).

## Arrow versioning
If you enable the `default-engine` or `sync-engine` features, you get an implemenation of the
If you enable the `default-engine` or `sync-engine` features, you get an implementation of the
`Engine` trait that uses [Arrow] as its data format.

The [`arrow crate`](https://docs.rs/arrow/latest/arrow/) tends to release new major versions rather
quickly. To enable engines that already integrate arrow to also integrate kernel and not force them
to track a specific version of arrow that kernel depends on, we take as broad dependecy on arrow
to track a specific version of arrow that kernel depends on, we take as broad a dependency on arrow
versions as we can.

This means you can force kernel to rely on the specific arrow version that your engine already uses,
Expand All @@ -96,7 +96,7 @@ arrow-schema = "53.0"
parquet = "53.0"
```

Note that unfortunatly patching in `cargo` requires that _exactly one_ version matches your
Note that unfortunately patching in `cargo` requires that _exactly one_ version matches your
specification. If only arrow "53.0.0" had been released the above will work, but if "53.0.1" were
to be released, the specification will break and you will need to provide a more restrictive
specification like `"=53.0.0"`.
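
To make the "exactly one version" constraint concrete, here is a small sketch that counts how many released versions satisfy a requirement. It models only two requirement forms and is an assumption-laden simplification (real Cargo applies full semver rules, and the caret rule here only holds for a nonzero major version); `Req`, `satisfies`, and `count_matches` are hypothetical names.

```rust
/// A released version as (major, minor, patch).
type Version = (u64, u64, u64);

/// A deliberately simplified model of two Cargo requirement forms.
/// This is an illustration only, not how Cargo is implemented.
enum Req {
    /// `"53.0"`: a caret requirement, read here as >=53.0.0, <54.0.0.
    Caret { major: u64, minor: u64 },
    /// `"=53.0.0"`: exactly this version.
    Exact(Version),
}

/// Does version `v` satisfy requirement `req` under the simplified model?
fn satisfies(req: &Req, v: Version) -> bool {
    match req {
        Req::Caret { major, minor } => v.0 == *major && v.1 >= *minor,
        Req::Exact(want) => v == *want,
    }
}

/// How many of the released versions satisfy the requirement.
fn count_matches(req: &Req, released: &[Version]) -> usize {
    released.iter().filter(|v| satisfies(req, **v)).count()
}

fn main() {
    let caret = Req::Caret { major: 53, minor: 0 };
    let exact = Req::Exact((53, 0, 0));

    // Only 53.0.0 exists: "53.0" picks out a single version.
    let before = [(53, 0, 0)];
    println!("before: caret matches {}", count_matches(&caret, &before));

    // Once 53.0.1 is released, "53.0" becomes ambiguous but "=53.0.0" does not.
    let after = [(53, 0, 0), (53, 0, 1)];
    println!("after: caret matches {}", count_matches(&caret, &after));
    println!("after: exact matches {}", count_matches(&exact, &after));
}
```

The second case is the situation the note warns about: the patch specification needs to be tightened as soon as more than one release satisfies it.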
Expand All @@ -111,7 +111,7 @@ and then checking what version of `object_store` it depends on.
## Documentation

- [API Docs](https://docs.rs/delta_kernel/latest/delta_kernel/)
- [arcitecture.md](doc/architecture.md) document describing the kernel architecture (currently wip)
- [architecture.md](doc/architecture.md) document describing the kernel architecture (currently wip)

## Examples

Expand Down
3 changes: 3 additions & 0 deletions acceptance/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,9 @@ readme.workspace = true
version.workspace = true
rust-version.workspace = true

[package.metadata.release]
release = false

[dependencies]
arrow-array = { workspace = true }
arrow-cast = { workspace = true }
Expand Down
6 changes: 1 addition & 5 deletions acceptance/src/data.rs
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@ pub fn sort_record_batch(batch: RecordBatch) -> DeltaResult<RecordBatch> {
Ok(RecordBatch::try_new(batch.schema(), columns)?)
}

// Ensure that two schema have the same field names, and dict_id/ordering.
// Ensure that two schemas have the same field names, and dict_is_ordered
// We ignore:
// - data type: This is checked already in `assert_columns_match`
// - nullability: parquet marks many things as nullable that we don't in our schema
Expand All @@ -72,10 +72,6 @@ fn assert_schema_fields_match(schema: &Schema, golden: &Schema) {
schema_field.name() == golden_field.name(),
"Field names don't match"
);
assert!(
schema_field.dict_id() == golden_field.dict_id(),
"Field dict_id doesn't match"
);
assert!(
schema_field.dict_is_ordered() == golden_field.dict_is_ordered(),
"Field dict_is_ordered doesn't match"
Expand Down
70 changes: 70 additions & 0 deletions cliff.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
# git-cliff configuration file. see https://git-cliff.org/docs/configuration

[changelog]
header = """
# Changelog\n
"""
# Tera template
body = """
## [v{{ version }}](https://github.com/delta-io/delta-kernel-rs/tree/v{{ version }}/) ({{ timestamp | date(format="%Y-%m-%d") }})
[Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/{{ previous.version }}...v{{ version }})
{% for group, commits in commits | group_by(attribute="group") %}
### {{ group | striptags | trim | upper_first }}
{% for commit in commits %}
{{ loop.index }}. {% if commit.scope %}*({{ commit.scope }})* {% endif %}\
{{ commit.message | split(pat="\n") | first | upper_first | replace(from="(#", to="([#")\
| replace(from="0)", to="0])")\
| replace(from="1)", to="1])")\
| replace(from="2)", to="2])")\
| replace(from="3)", to="3])")\
| replace(from="4)", to="4])")\
| replace(from="5)", to="5])")\
| replace(from="6)", to="6])")\
| replace(from="7)", to="7])")\
| replace(from="8)", to="8])")\
| replace(from="9)", to="9])") }}\
{% endfor %}
{% endfor %}
{% for commit in commits %}
{% set message = commit.message | split(pat="\n") | first %}\
{% set pr = message | split(pat="(#") | last | split(pat=")") | first %}\
[#{{ pr }}]: https://github.com/delta-io/delta-kernel-rs/pull/{{ pr }}\
{% endfor %}\n\n\n
"""
footer = """
"""
# remove the leading and trailing whitespace
trim = true
postprocessors = []

[git]
# parse the commits based on https://www.conventionalcommits.org
conventional_commits = true
# filter out the commits that are not conventional
filter_unconventional = false
# process each line of a commit as an individual commit
split_commits = false
# regex for preprocessing the commit messages
commit_preprocessors = []
# regex for parsing and grouping commits. note that e.g. both doc and docs are matched since we have
# trim = true above.
commit_parsers = [
{ field = "github.pr_labels", pattern = "breaking-change", group = "<!-- 0 --> 🏗️ Breaking changes" },
{ message = "^feat", group = "<!-- 1 -->🚀 Features / new APIs" },
{ message = "^fix", group = "<!-- 2 -->🐛 Bug Fixes" },
{ message = "^doc", group = "<!-- 3 -->📚 Documentation" },
{ message = "^perf", group = "<!-- 4 -->⚡ Performance" },
{ message = "^refactor", group = "<!-- 5 -->🚜 Refactor" },
{ message = "^test", group = "<!-- 6 -->🧪 Testing" },
{ message = "^chore|^ci", group = "<!-- 7 -->⚙️ Chores/CI" },
{ message = "^revert", group = "<!-- 8 -->◀️ Revert" },
{ message = ".*", group = "<!-- 9 -->Other" },
]
# filter out the commits that are not matched by commit parsers
filter_commits = false
# sort the tags topologically
topo_order = false
# sort the commits inside sections by oldest/newest order
sort_commits = "oldest"
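
The chained `replace` filters in the `body` template above are easier to follow when written out as ordinary string operations. The sketch below mirrors them in plain Rust under the assumption that Tera's `replace` substitutes every occurrence and `upper_first` uppercases only the first character; `linkify` and `pr_number` are hypothetical names, and git-cliff itself renders the template rather than running code like this.

```rust
/// Uppercase the first character, like Tera's `upper_first` filter.
fn upper_first(s: &str) -> String {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) => c.to_uppercase().collect::<String>() + chars.as_str(),
        None => String::new(),
    }
}

/// Turn "(#622)" into "([#622])" the same way the template does:
/// first "(#" becomes "([#", then every "<digit>)" becomes "<digit>])".
fn linkify(message: &str) -> String {
    let first_line = message.split('\n').next().unwrap_or("");
    let mut out = upper_first(first_line).replace("(#", "([#");
    for digit in 0..=9 {
        out = out.replace(&format!("{digit})"), &format!("{digit}])"));
    }
    out
}

/// Extract the PR number for the link reference: the text after the last "(#"
/// and before the next ")".
fn pr_number(message: &str) -> &str {
    let first_line = message.split('\n').next().unwrap_or("");
    let after = first_line.split("(#").last().unwrap_or("");
    after.split(')').next().unwrap_or("")
}

fn main() {
    let msg = "fix stderr output for handle tests (#630)\n\nlonger body text";
    println!("{}", linkify(msg));
    println!(
        "[#{0}]: https://github.com/delta-io/delta-kernel-rs/pull/{0}",
        pr_number(msg)
    );
}
```

One consequence of the digit-based replacement is that any other digit directly followed by `)` is rewritten as well, so a title containing `(v2)` would come out as `(v2])`. Titles that end in the PR reference, like the entries in the v0.6.1 changelog, are unaffected.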
3 changes: 3 additions & 0 deletions feature-tests/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@ repository.workspace = true
readme.workspace = true
version.workspace = true

[package.metadata.release]
release = false

[dependencies]
delta_kernel = { path = "../kernel" }

Expand Down
3 changes: 3 additions & 0 deletions ffi-proc-macros/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,9 @@ readme.workspace = true
rust-version.workspace = true
version.workspace = true

[package.metadata.release]
release = false

[lib]
proc-macro = true

Expand Down
11 changes: 7 additions & 4 deletions ffi/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,9 @@ version.workspace = true
rust-version.workspace = true
build = "build.rs"

[package.metadata.release]
release = false

[lib]
crate-type = ["lib", "cdylib", "staticlib"]

Expand All @@ -21,16 +24,16 @@ url = "2"
delta_kernel = { path = "../kernel", default-features = false, features = [
"developer-visibility",
] }
delta_kernel_ffi_macros = { path = "../ffi-proc-macros", version = "0.6.0" }
delta_kernel_ffi_macros = { path = "../ffi-proc-macros", version = "0.6.1" }

# used if we use the default engine to be able to move arrow data into the c-ffi format
arrow-schema = { version = "53.0", default-features = false, features = [
arrow-schema = { version = ">=53, <55", default-features = false, features = [
"ffi",
], optional = true }
arrow-data = { version = "53.0", default-features = false, features = [
arrow-data = { version = ">=53, <55", default-features = false, features = [
"ffi",
], optional = true }
arrow-array = { version = "53.0", default-features = false, optional = true }
arrow-array = { version = ">=53, <55", default-features = false, optional = true }

[build-dependencies]
cbindgen = "0.27.0"
Expand Down
4 changes: 2 additions & 2 deletions ffi/examples/read-table/arrow.c
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,7 @@ static GArrowRecordBatch* add_partition_columns(
}

GArrowArray* partition_col = garrow_array_builder_finish((GArrowArrayBuilder*)builder, &error);
if (report_g_error("Can't build string array for parition column", error)) {
if (report_g_error("Can't build string array for partition column", error)) {
printf("Giving up on column %s\n", col);
g_error_free(error);
g_object_unref(builder);
Expand Down Expand Up @@ -144,7 +144,7 @@ static void add_batch_to_context(
}
record_batch = add_partition_columns(record_batch, partition_cols, partition_values);
if (record_batch == NULL) {
printf("Failed to add parition columns, not adding batch\n");
printf("Failed to add partition columns, not adding batch\n");
return;
}
context->batches = g_list_append(context->batches, record_batch);
Expand Down
2 changes: 1 addition & 1 deletion ffi/examples/read-table/read_table.c
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ void print_partition_info(struct EngineContext* context, const CStringMap* parti
}

// Kernel will call this function for each file that should be scanned. The arguments include enough
// context to constuct the correct logical data from the physically read parquet
// context to construct the correct logical data from the physically read parquet
void scan_row_callback(
void* engine_context,
KernelStringSlice path,
Expand Down
2 changes: 1 addition & 1 deletion ffi/src/engine_funcs.rs
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ impl Drop for FileReadResultIterator {
}
}

/// Call the engine back with the next `EngingeData` batch read by Parquet/Json handler. The
/// Call the engine back with the next `EngineData` batch read by Parquet/Json handler. The
/// _engine_ "owns" the data that is passed into the `engine_visitor`, since it is allocated by the
/// `Engine` being used for log-replay. If the engine wants the kernel to free this data, it _must_
/// call [`free_engine_data`] on it.
Expand Down
2 changes: 1 addition & 1 deletion ffi/src/expressions/kernel.rs
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,7 @@ pub struct EngineExpressionVisitor {
/// Visit a 64bit timestamp belonging to the list identified by `sibling_list_id`.
/// The timestamp is microsecond precision with no timezone.
pub visit_literal_timestamp_ntz: VisitLiteralFn<i64>,
/// Visit a 32bit intger `date` representing days since UNIX epoch 1970-01-01. The `date` belongs
/// Visit a 32bit integer `date` representing days since UNIX epoch 1970-01-01. The `date` belongs
/// to the list identified by `sibling_list_id`.
pub visit_literal_date: VisitLiteralFn<i32>,
/// Visit binary data at the `buffer` with length `len` belonging to the list identified by
Expand Down
4 changes: 2 additions & 2 deletions ffi/src/handle.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@
//! boundary.
//!
//! Creating a [`Handle<T>`] always implies some kind of ownership transfer. A mutable handle takes
//! ownership of the object itself (analagous to [`Box<T>`]), while a non-mutable (shared) handle
//! takes ownership of a shared reference to the object (analagous to [`std::sync::Arc<T>`]). Thus, a created
//! ownership of the object itself (analogous to [`Box<T>`]), while a non-mutable (shared) handle
//! takes ownership of a shared reference to the object (analogous to [`std::sync::Arc<T>`]). Thus, a created
//! handle remains [valid][Handle#Validity], and its underlying object remains accessible, until the
//! handle is explicitly dropped or consumed. Dropping a mutable handle always drops the underlying
//! object as well; dropping a shared handle only drops the underlying object if the handle was the
Expand Down
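The `Box<T>` and `Arc<T>` analogy in the module docs above can be illustrated with the standard library alone. This is a minimal sketch of the drop behaviour being described, not the `Handle<T>` implementation; `Tracked` and `drop_counts` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// A value that increments a counter each time it is dropped.
struct Tracked<'a>(&'a AtomicUsize);

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Returns the drop count observed after each of three steps:
/// dropping a `Box`, dropping one of two `Arc`s, dropping the last `Arc`.
fn drop_counts() -> (usize, usize, usize) {
    let drops = AtomicUsize::new(0);

    // Exclusive ownership (like a mutable handle): dropping the owner drops the object.
    let exclusive = Box::new(Tracked(&drops));
    drop(exclusive);
    let after_box = drops.load(Ordering::SeqCst);

    // Shared ownership (like a shared handle): the object survives until the
    // last reference goes away.
    let first = Arc::new(Tracked(&drops));
    let second = Arc::clone(&first);
    drop(first);
    let after_first_arc = drops.load(Ordering::SeqCst);
    drop(second);
    let after_last_arc = drops.load(Ordering::SeqCst);

    (after_box, after_first_arc, after_last_arc)
}

fn main() {
    println!("{:?}", drop_counts());
}
```

The middle count stays unchanged because dropping `first` only releases one of two references; the underlying value is dropped when `second` goes away.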
2 changes: 1 addition & 1 deletion ffi/src/scan.rs
Original file line number Diff line number Diff line change
Expand Up @@ -383,7 +383,7 @@ struct ContextWrapper {
/// data which provides the data handle and selection vector as each element in the iterator.
///
/// # Safety
/// engine is responsbile for passing a valid [`ExclusiveEngineData`] and selection vector.
/// engine is responsible for passing a valid [`ExclusiveEngineData`] and selection vector.
#[no_mangle]
pub unsafe extern "C" fn visit_scan_data(
data: Handle<ExclusiveEngineData>,
Expand Down
2 changes: 1 addition & 1 deletion ffi/src/test_ffi.rs
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ use delta_kernel::{
/// output expression can be found in `ffi/tests/test_expression_visitor/expected.txt`.
///
/// # Safety
/// The caller is responsible for freeing the retured memory, either by calling
/// The caller is responsible for freeing the returned memory, either by calling
/// [`free_kernel_predicate`], or [`Handle::drop_handle`]
#[no_mangle]
pub unsafe extern "C" fn get_testing_kernel_expression() -> Handle<SharedExpression> {
Expand Down
4 changes: 2 additions & 2 deletions integration-tests/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,8 @@ fn create_kernel_schema() -> delta_kernel::schema::Schema {
fn main() {
let arrow_schema = create_arrow_schema();
let kernel_schema = create_kernel_schema();
let convereted: delta_kernel::schema::Schema =
let converted: delta_kernel::schema::Schema =
delta_kernel::schema::Schema::try_from(&arrow_schema).expect("couldn't convert");
assert!(kernel_schema == convereted);
assert!(kernel_schema == converted);
println!("Okay, made it");
}