feat: re-export name mapping #1116
base: main
Thanks @jdockerty for this pr, generally LGTM! Left some comments to improve.
@@ -32,9 +36,12 @@ pub struct NameMapping {
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
#[serde(rename_all = "kebab-case")]
pub struct MappedField {
    /// Iceberg field ID when a field's name is present within `names`.
I'm thinking that we should add a MappedFields like we did in Java. MappedFields is a list of fields with an index.
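As a rough illustration of "a list of fields with an index", here is a minimal, std-only sketch. It assumes a simplified MappedField with only field_id and names (no nested fields, no serde), and the accessor names id and field are hypothetical rather than the crate's API. Following the MappedFields doc comment in this PR, it indexes names to IDs and IDs to fields; it stores positions instead of cloned fields so each MappedField is kept only once.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified mapping entry (no nested fields, no serde).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MappedField {
    field_id: Option<i32>,
    names: Vec<String>,
}

/// A list of fields plus lookup indexes for that single level.
#[derive(Debug, Clone)]
pub struct MappedFields {
    fields: Vec<MappedField>,
    // Any alias of a field -> that field's ID.
    name_to_id: HashMap<String, i32>,
    // Field ID -> position of the field in `fields`.
    id_to_pos: HashMap<i32, usize>,
}

impl MappedFields {
    pub fn new(fields: Vec<MappedField>) -> Self {
        let mut name_to_id = HashMap::new();
        let mut id_to_pos = HashMap::new();
        for (pos, field) in fields.iter().enumerate() {
            // Fields without an ID cannot be resolved, so they are not indexed.
            if let Some(id) = field.field_id {
                id_to_pos.insert(id, pos);
                for name in &field.names {
                    name_to_id.insert(name.clone(), id);
                }
            }
        }
        Self { fields, name_to_id, id_to_pos }
    }

    /// Look up the field ID for one of its names.
    pub fn id(&self, name: &str) -> Option<i32> {
        self.name_to_id.get(name).copied()
    }

    /// Look up the full field entry by ID.
    pub fn field(&self, id: i32) -> Option<&MappedField> {
        self.id_to_pos.get(&id).map(|&pos| &self.fields[pos])
    }
}
```

Note that this index only covers one level, which is the limitation questioned later in this thread.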
I've added this into mapped_fields.rs 🙇
Edit: it looks like my latest commits aren't showing up and are still being processed by GitHub after approximately 10 minutes. If we're running into a GitHub outage, the current diff is viewable here. This resolved itself after about an hour; please ignore.
a9ff2b9 to 85b024e
/// Utility mapping which contains field names to IDs and
/// field IDs to the underlying [`MappedField`].
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
pub struct MappedFields {
Given we are going to add a lot of visitors for NameMapping, how about we create a NameMapping module and put everything related there?
I'm not quite sure what you mean; there is already a name_mapping.rs as a separate module. Or do you mean putting everything in this file instead and using 👇 ?
mod name_mapping {
// contents of name_mapping.rs here
}
Yes, mapped fields should also be included in name_mapping.rs. You can move everything from mapped_fields to name_mapping.
I still don't understand why we need the MappedFields field. The pyiceberg implementation doesn't use it (the initial implementation/review is in apache/iceberg-python#212). There isn't a use case (unless there is one) where we only index the first layer with MappedFields without later having to build another index from a full traversal. cc @liurenjie1024 @Fokko
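For comparison, the alternative described here (a single full traversal that indexes every nested path, roughly what a pyiceberg-style visitor produces) might look like the hypothetical sketch below. The function names, the simplified MappedField, and the dotted-name key format are assumptions for illustration; every alias of a parent field yields its own path to the child.

```rust
use std::collections::HashMap;

/// Hypothetical nested mapping node (simplified; no serde).
#[derive(Debug, Clone)]
pub struct MappedField {
    pub field_id: Option<i32>,
    pub names: Vec<String>,
    pub fields: Vec<MappedField>,
}

/// Walk the whole tree once and index every full dotted name ("a.b.c") to its field ID.
pub fn index_by_full_name(root: &[MappedField]) -> HashMap<String, i32> {
    let mut index = HashMap::new();
    visit(root, "", &mut index);
    index
}

fn visit(fields: &[MappedField], prefix: &str, index: &mut HashMap<String, i32>) {
    for field in fields {
        for name in &field.names {
            // Build the full path for this particular alias of the field.
            let full_name = if prefix.is_empty() {
                name.clone()
            } else {
                format!("{prefix}.{name}")
            };
            if let Some(id) = field.field_id {
                index.insert(full_name.clone(), id);
            }
            // Recurse so nested fields are indexed under every parent alias.
            visit(&field.fields, &full_name, index);
        }
    }
}
```

With this shape, a per-level index is not needed for lookups by full name, which is the point being raised.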
impl MappedFields {
    /// Create a new [`MappedFields`].
    pub fn new(fields: Vec<MappedField>) -> Self {
The return value should be a Result, since the value the user passes in may be wrong.
for field in &fields {
    if let Some(id) = field.field_id() {
        id_to_field.insert(id, field.clone());
We need to check for duplicate ids and names here.
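A minimal sketch of a fallible constructor that rejects duplicate IDs and names within a single level. The MappingError type and the build_id_index function are hypothetical; the real crate would presumably return its own error type. The checks rely on HashSet::insert returning false, and HashMap::insert returning Some(previous), when the key is already present.

```rust
use std::collections::{HashMap, HashSet};
use std::fmt;

/// Hypothetical, simplified mapping entry.
#[derive(Debug, Clone)]
pub struct MappedField {
    pub field_id: Option<i32>,
    pub names: Vec<String>,
}

/// Hypothetical error type for invalid user-supplied mappings.
#[derive(Debug, PartialEq, Eq)]
pub enum MappingError {
    DuplicateId(i32),
    DuplicateName(String),
}

impl fmt::Display for MappingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MappingError::DuplicateId(id) => write!(f, "duplicate field id {id}"),
            MappingError::DuplicateName(name) => write!(f, "duplicate field name {name}"),
        }
    }
}

impl std::error::Error for MappingError {}

/// Build an ID -> position index, failing on the first duplicate ID or name.
pub fn build_id_index(fields: &[MappedField]) -> Result<HashMap<i32, usize>, MappingError> {
    let mut id_to_pos = HashMap::new();
    let mut seen_names = HashSet::new();
    for (pos, field) in fields.iter().enumerate() {
        for name in &field.names {
            // `insert` returns false if this name was already seen at this level.
            if !seen_names.insert(name.as_str()) {
                return Err(MappingError::DuplicateName(name.clone()));
            }
        }
        if let Some(id) = field.field_id {
            // `insert` returns the previous value if this ID was already present.
            if id_to_pos.insert(id, pos).is_some() {
                return Err(MappingError::DuplicateId(id));
            }
        }
    }
    Ok(id_to_pos)
}
```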
Makes sense to me 👍
Does it matter that this differs from the Java impl here?
I modelled this on the Java impl and it doesn't look like it has duplicate checks there; perhaps I'm missing something very obvious from not doing much Java 😆
Edit: I've implemented this in dea509b for now; it is easy to change if there's something wrong with it 👍
/// Iceberg fallback field name to ID mapping.
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
#[serde(transparent)]
pub struct NameMapping {
    pub root: Vec<MappedField>,
    root: Vec<MappedField>,
This should contain a MappedFields.
    #[serde(default)]
    #[serde(skip_serializing_if = "Vec::is_empty")]
    #[serde_as(deserialize_as = "DefaultOnNull")]
    pub fields: Vec<MappedField>,
    fields: Vec<MappedField>,
This should be a MappedFields.
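A hypothetical sketch of the shape being suggested: NameMapping holds a MappedFields at the root, and each MappedField holds a MappedFields for its children, so every level carries its own name index. Serde attributes are omitted to keep it std-only, and find is an illustrative method name, not the crate's API.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub struct MappedField {
    field_id: Option<i32>,
    names: Vec<String>,
    // Nested fields are themselves an indexed MappedFields.
    fields: MappedFields,
}

#[derive(Debug, Clone, Default)]
pub struct MappedFields {
    fields: Vec<MappedField>,
    // Alias -> position in `fields`, for this level only.
    name_to_pos: HashMap<String, usize>,
}

impl MappedFields {
    pub fn new(fields: Vec<MappedField>) -> Self {
        let mut name_to_pos = HashMap::new();
        for (pos, field) in fields.iter().enumerate() {
            for name in &field.names {
                name_to_pos.insert(name.clone(), pos);
            }
        }
        Self { fields, name_to_pos }
    }

    fn get(&self, name: &str) -> Option<&MappedField> {
        self.name_to_pos.get(name).map(|&pos| &self.fields[pos])
    }
}

/// Hypothetical shape: the root is a MappedFields rather than a bare Vec.
#[derive(Debug, Clone)]
pub struct NameMapping {
    root: MappedFields,
}

impl NameMapping {
    /// Resolve a path such as ["location", "lat"] one level at a time.
    pub fn find(&self, path: &[&str]) -> Option<i32> {
        let (first, rest) = path.split_first()?;
        let mut field = self.root.get(first)?;
        for name in rest {
            field = field.fields.get(name)?;
        }
        field.field_id
    }
}
```

One consequence of this shape is that deriving Serialize on MappedFields would also serialize the index map, so a custom or transparent representation would be needed to keep the JSON unchanged.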
Changing this one to MappedFields alters all of the expected output JSON too.
I assume that is expected for now, and I'll update the other tests 👍
@jdockerty @liurenjie1024 I believe #1072 contains a lot of the functionality in this PR; it was split up, with #1082 being the first part of it.
Hi @jonathanc-n, I think #1072 added extra functionality, like a visitor, indexing into
Which issue does this PR close?
Likely helps towards #919 and this was also discussed in Slack.
What changes are included in this PR?
This publicly re-exports the name_mapping module to iceberg::spec. Prior to this, it was private and inaccessible outside of this crate.
This also includes a MappedFields structure, which borrows heavily from the Java implementation.
Are these changes tested?
The main changes here are not functional changes, except for visibility.
The new MappedFields structure has basic test coverage.
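The visibility change can be sketched with Rust's private-module-plus-pub-use pattern. This is a self-contained stand-in, not the crate's actual files: the module bodies are simplified, and the private root field with a root() accessor is an assumption based on the pub root / root lines in the NameMapping diff.

```rust
// Hypothetical stand-in for the crate layout; the real crate splits
// these modules across files (e.g. spec/name_mapping.rs).
pub mod spec {
    mod name_mapping {
        /// Simplified stand-in for the real NameMapping.
        #[derive(Debug)]
        pub struct NameMapping {
            // Private field; callers go through the accessor below.
            root: Vec<String>,
        }

        impl NameMapping {
            pub fn new(root: Vec<String>) -> Self {
                Self { root }
            }

            pub fn root(&self) -> &[String] {
                &self.root
            }
        }
    }

    // Re-export the public items so downstream code can write
    // `spec::NameMapping` even though the module itself stays private.
    pub use name_mapping::*;
}
```

With a glob re-export like this, callers name the type as spec::NameMapping rather than spec::name_mapping::NameMapping; whether the PR uses this form or a pub mod is not stated in the description.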