-
Notifications
You must be signed in to change notification settings - Fork 1k
[Variant] Impl Extend
for VariantArrayBuilder
#8611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Variant] Impl Extend
for VariantArrayBuilder
#8611
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
At first I worried about variant objects and field ids, but after looking at the code I believe they should Just Work: VariantValueBuilder::append_object
recursively rewrites the whole object in order to reassign field ids.
match v { | ||
Some(v) => self.append_variant(v), | ||
None => self.append_null(), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this better or worse than the current code?
match v { | |
Some(v) => self.append_variant(v), | |
None => self.append_null(), | |
} | |
v.map_or_else(|| self.append_null(), |v| self.append_variant(v)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having a match
statement is more readable IMO
One question: Are there any existing places in the code that would benefit from this new capability? |
Hi @scovich, long time no talk! I hope you've been well and thank you for your help with the shredding 🔥 . I'm not too sure, I added an example of Fwiw, I plan on using this quite heavily in Btw, I'd be curious to get your thoughts on the crate in general! See: datafusion-contrib/datafusion-variant#2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me too -- thank you @friendlymatthew and @scovich
A quick code search turns up two candidates:
BTW, do you foresee enough new+extend+build triples that we should also consider adding implementations of
Howdy! 👋 |
impl<'m, 'v> Extend<Option<Variant<'m, 'v>>> for VariantArrayBuilder { | ||
fn extend<T: IntoIterator<Item = Option<Variant<'m, 'v>>>>(&mut self, iter: T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized... we probably want two impl Extend
:
impl<'m, 'v, V: Into<Variant<'m, 'v>> Extend<V> for VariantArrayBuilder {
fn extend<T: IntoIterator<Item = V>>(&mut self, iter: T) {
for v in iter {
self.append_variant(v.into())
}
}
}
and
impl<'m, 'v, V: Into<Variant<'m, 'v>> Extend<Option<V>> for VariantArrayBuilder {
fn extend<T: IntoIterator<Item = Option<V>>>(&mut self, iter: T) {
for v in iter {
match v {
Some(v) => self.append_variant(v.into()),
None => self.append_null(),
}
}
}
}
I think that's typical for the other variant arrays, to capture both nullable and non-nullable data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I wonder if it's fine to just support the Option<T>
case, as it follows the other builder Extend
impls.
For example:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But fwiw, the goal of this PR was to prepare #8606 for closing.
Now, here we would want to support both Vec<Variant>
and Vec<Option<Variant>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it's fine to just support the
Option<T>
case, as it follows the other builderExtend
impls.
I mean, any support is better than none. But we have existing use cases in the code for both Option<T>
and T
.
There seems to be an odd split here:
- We have both
impl From<Vec<Option<bool>> for BooleanArray
andimpl From<Vec<bool>> for BooleanArray
- But we only have
impl FromIterator<Option<bool>> for BooleanArray
andimpl Extend<Option<bool> for BooleanBuilder
So maybe you're right, that to mirror existing conventions we should not Extend<Variant>
. And should potentially consider adding the two From<Vec<...>>
?
But then we basically end up implementing Extend<Variant>
inside From<Vec<Variant>>
itself... 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of why the (primitive) arrays have impl From<Vec<...>>
is so they can re-use the Vec allocations
I don't think that is relevant for Variants as the in memory representation of a Vec<Variant>
is not the same as a VariantArray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that makes sense for e.g. From<Vec<bool>>
. They don't need an Extend
(not even a hidden one) because they just take over the input and are done.
Meanwhile, I guess From<Vec<Option<bool>>>
will use Extend<Option<bool>>
under the hood, because that Vec can't be directly used?
As everyone seems to agree this is a useful feature, I'll merge it in I didn't completely follow the conversation in #8611 (comment) -- did we come to a conclusion if we should add other From impls? If so I can file a ticket to track it |
Thank you @friendlymatthew and @scovich |
Rationale for this change
This PR adds support for bulk insertion of
Option<Variant>
values into aVariantArray
Similar to other builders such as
GenericByteViewBuilder
, it now implements theExtend
traitFor example: