
[tracking] Kernelize! #3298

Open
2 of 12 issues completed
Epic
@roeap


Important

This is a living issue that we'll update with new issues / comments as we get more clarity on the concrete implementations.

Description

This is a tracking issue to align on and coordinate the integration of delta-kernel-rs into delta-rs.

Motivation

While the community has made tremendous strides in supporting more and more Delta features in delta-rs, we are still lagging behind, and more features that users will want to leverage are on the way. This is exactly the use case the kernel libraries aim to address: a correct and complete implementation of the Delta protocol.

Kernel explicitly takes no opinion on the I/O and execution aspects needed to actually consume and work with Delta tables. This is what delta-rs provides, which leaves the current high-level user-facing APIs conceptually as they are.

Execution

In simplified terms, adopting kernel means carving out the functionality that currently resides in

  • core/src/kernel (named so in preparation for being replaced by kernel)
  • core/src/protocol (mainly our snapshot code, that I wanted to update for quite a while now)
  • core/src/schema (only partition pruning remained in this module after previous updates)

At the heart of the migration is creating a new snapshot implementation (RFC in #3137) which provides all required machinery (the engine) to kernel and exposes methods tailored to the needs of delta-rs.

One potential avenue forward is to get the RFC merge-ready and merge it without it being "hooked up" to the rest of the crate. This PR also exposes a Snapshot trait (we already have something similar, but I think it does not quite fit) that we can hopefully leverage to refactor all the operations that require access to the snapshot, i.e. implement that trait for current snapshots ... This should hopefully surface any missing APIs in kernel that we may yet require for full adoption.
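To make the idea of refactoring operations against a shared Snapshot trait more concrete, here is a minimal sketch. It is not the trait from the RFC: the trait methods, `Action`, `EagerSnapshot`, `ReplaySnapshot`, and `summarize` are all hypothetical names chosen for illustration, and the replay logic is a toy stand-in rather than anything kernel actually does.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified log actions (not the real delta-rs types).
#[derive(Debug, Clone)]
pub enum Action {
    Add(String),
    Remove(String),
    Txn { app_id: String, version: u64 },
}

/// Hypothetical stand-in for the `Snapshot` trait; method names are assumptions.
pub trait Snapshot {
    /// Table version this snapshot represents.
    fn version(&self) -> u64;
    /// Paths of the data files that are live at this version.
    fn file_paths(&self) -> Vec<String>;
    /// Latest transaction version recorded for an application id, if any.
    fn app_txn_version(&self, app_id: &str) -> Option<u64>;
}

/// Stand-in for a current-style snapshot that holds fully materialized state.
pub struct EagerSnapshot {
    version: u64,
    files: Vec<String>,
    txns: HashMap<String, u64>,
}

impl EagerSnapshot {
    pub fn new(version: u64, files: Vec<String>, txns: HashMap<String, u64>) -> Self {
        Self { version, files, txns }
    }
}

impl Snapshot for EagerSnapshot {
    fn version(&self) -> u64 {
        self.version
    }
    fn file_paths(&self) -> Vec<String> {
        self.files.clone()
    }
    fn app_txn_version(&self, app_id: &str) -> Option<u64> {
        self.txns.get(app_id).copied()
    }
}

/// Stand-in for a new implementation that derives state by replaying actions on demand.
pub struct ReplaySnapshot {
    version: u64,
    log: Vec<Action>,
}

impl ReplaySnapshot {
    pub fn new(version: u64, log: Vec<Action>) -> Self {
        Self { version, log }
    }
}

impl Snapshot for ReplaySnapshot {
    fn version(&self) -> u64 {
        self.version
    }
    fn file_paths(&self) -> Vec<String> {
        // Replay in commit order: adds make a file live, removes retire it.
        let mut live: Vec<String> = Vec::new();
        for action in &self.log {
            match action {
                Action::Add(path) => {
                    if !live.contains(path) {
                        live.push(path.clone());
                    }
                }
                Action::Remove(path) => live.retain(|p| p != path),
                Action::Txn { .. } => {}
            }
        }
        live
    }
    fn app_txn_version(&self, app_id: &str) -> Option<u64> {
        // Scan backwards so the most recent matching Txn action wins.
        self.log.iter().rev().find_map(|action| match action {
            Action::Txn { app_id: id, version } if id == app_id => Some(*version),
            _ => None,
        })
    }
}

/// An "operation" written only against the trait, so it works with either snapshot.
pub fn summarize(snapshot: &dyn Snapshot) -> String {
    format!(
        "version {} with {} live files",
        snapshot.version(),
        snapshot.file_paths().len()
    )
}
```

Because `summarize` depends only on the trait, the same operation can run against the existing snapshot and a new implementation side by side. Any method an operation needs that the new implementation cannot provide then shows up as a compile error rather than a runtime surprise.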

Challenges

  • kernel currently has only very limited write path support, so we'll have to keep maintaining our own write paths for now. However, we can help shape the kernel API designs based on our needs.
  • In terms of feature support, there is no full overlap as of now. For example, kernel supports deletion vectors, which delta-rs does not, while delta-rs supports generated columns, which are not yet part of kernel and still require some design work (e.g. how to handle arbitrary SQL that an engine will need to parse).
  • While kernel offers great opportunities for performance enhancements, several areas might take an initial hit until we can implement performance optimizations that work well with kernel. These mainly relate to less frequently requested actions such as Txn, CommitInfo ...

Any feedback or concerns about proceeding with this are highly appreciated.

Related Work

PRs cannot be tracked as sub-issues


Labels

enhancement (New feature or request)
