
friendlymatthew
Contributor

Rationale for this change

This PR adds support for bulk insertion of Option<Variant> values into a VariantArray via VariantArrayBuilder.

Like other builders such as GenericByteViewBuilder, VariantArrayBuilder now implements the Extend trait.

For example:

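use parquet_variant::Variant;
use parquet_variant_compute::VariantArrayBuilder;

let mut b = VariantArrayBuilder::new(2);
// Row 0 is null; row 1 is a non-null row holding Variant::Null.
b.extend([None, Some(Variant::Null)]);
let v = b.build();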
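For readers without the arrow-rs workspace at hand, here is a minimal, self-contained sketch of the append-per-item pattern behind this Extend impl. ToyVariant and ToyVariantArrayBuilder are hypothetical stand-ins, not the real parquet-variant types; a real builder encodes each value into Arrow buffers instead of storing Option<ToyVariant> rows.

```rust
/// Hypothetical stand-in for parquet-variant's `Variant` (only two cases).
#[derive(Debug, Clone, PartialEq)]
pub enum ToyVariant {
    Null,
    Int64(i64),
}

/// Hypothetical stand-in for `VariantArrayBuilder`. Each row is stored as
/// `Option<ToyVariant>`: `None` is a null row, while `Some(ToyVariant::Null)`
/// is a non-null row that holds a variant null.
#[derive(Debug, Default)]
pub struct ToyVariantArrayBuilder {
    rows: Vec<Option<ToyVariant>>,
}

impl ToyVariantArrayBuilder {
    pub fn new(capacity: usize) -> Self {
        Self { rows: Vec::with_capacity(capacity) }
    }

    pub fn append_variant(&mut self, v: ToyVariant) {
        self.rows.push(Some(v));
    }

    pub fn append_null(&mut self) {
        self.rows.push(None);
    }

    /// Consumes the builder; a real builder would return an Arrow array.
    pub fn build(self) -> Vec<Option<ToyVariant>> {
        self.rows
    }
}

/// Bulk insertion: every `Some` is appended as a value, every `None` as a null.
impl Extend<Option<ToyVariant>> for ToyVariantArrayBuilder {
    fn extend<T: IntoIterator<Item = Option<ToyVariant>>>(&mut self, iter: T) {
        for v in iter {
            match v {
                Some(v) => self.append_variant(v),
                None => self.append_null(),
            }
        }
    }
}
```

As in the PR description's example, None becomes a null row, while Some(Variant::Null) becomes a non-null row whose value is a variant null.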

@friendlymatthew
Contributor Author

cc @alamb @scovich

Contributor

@scovich scovich left a comment


Nice!

At first I worried about variant objects and field ids, but after looking at the code I believe they should Just Work: VariantValueBuilder::append_object recursively rewrites the whole object in order to reassign field ids.

Comment on lines +168 to +171
match v {
Some(v) => self.append_variant(v),
None => self.append_null(),
}
Contributor


Is this better or worse than the current code?

Suggested change
match v {
Some(v) => self.append_variant(v),
None => self.append_null(),
}
v.map_or_else(|| self.append_null(), |v| self.append_variant(v))

Contributor Author


I think a match statement is more readable.
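A sketch of a second reason to keep the match, assuming the builder's append methods take &mut self (as the calls in the Extend impl under review suggest). As far as I can tell, the suggested map_or_else form creates both closures before the call, and each one needs a mutable borrow of *self at the same time, so the borrow checker rejects it. ToyBuilder and its append_option helper are hypothetical.

```rust
/// Hypothetical builder with `&mut self` append methods.
#[derive(Debug, Default)]
pub struct ToyBuilder {
    pub rows: Vec<Option<i64>>,
}

impl ToyBuilder {
    pub fn append_value(&mut self, v: i64) {
        self.rows.push(Some(v));
    }

    pub fn append_null(&mut self) {
        self.rows.push(None);
    }

    /// Hypothetical helper that appends one nullable item. Each `match` arm
    /// borrows `self` mutably only while it runs. By contrast,
    /// `v.map_or_else(|| self.append_null(), |v| self.append_value(v))`
    /// creates both closures up front, each holding a mutable borrow of
    /// `*self`, which the borrow checker rejects.
    pub fn append_option(&mut self, v: Option<i64>) {
        match v {
            Some(v) => self.append_value(v),
            None => self.append_null(),
        }
    }
}

impl Extend<Option<i64>> for ToyBuilder {
    fn extend<T: IntoIterator<Item = Option<i64>>>(&mut self, iter: T) {
        for v in iter {
            self.append_option(v);
        }
    }
}
```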

@scovich
Contributor

scovich commented Oct 14, 2025

One question: Are there any existing places in the code that would benefit from this new capability?
If so, we might want to consider updating them so we get code coverage right away.

@friendlymatthew
Contributor Author

> One question: Are there any existing places in the code that would benefit from this new capability? If so, we might want to consider updating them so we get code coverage right away.

Hi @scovich, long time no talk! I hope you've been well, and thank you for your help with the shredding 🔥. I'm not too sure. I added an example of extend that should run as a doc test, plus a couple more unit tests.

FWIW, I plan on using this quite heavily in datafusion-variant.

BTW, I'd be curious to hear your thoughts on the crate in general! See datafusion-contrib/datafusion-variant#2.

Contributor

@alamb alamb left a comment


Looks good to me too -- thank you @friendlymatthew and @scovich

@scovich
Contributor

scovich commented Oct 14, 2025

> One question: Are there any existing places in the code that would benefit from this new capability? If so, we might want to consider updating them so we get code coverage right away.

> I'm not too sure, I added an example of extend that should come up in a doc test, plus a couple more unit tests

A quick code search turns up two candidates:

  • benches/variant_kernels.rs, create_primitive_variant_array - it currently does something very inefficient (fortunately outside the actual benchmarking loop) and could be simplified to just:
    variant_builder.extend(std::iter::repeat_with(|| rng.random::<i64>()).take(size))
    (This assumes you update the impl to take V: Into<Variant> instead of Variant; see my other comment.)
  • src/shred_variant.rs, the loop in create_test_variant_array can be replaced with an extend call
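A minimal sketch of the first suggestion, assuming the V: Into<Variant> form of Extend. It uses hypothetical ToyVariant and ToyBuilder types, and a small linear congruential generator in place of the benchmark's rand-based rng so that it needs no external crates; create_primitive_rows is an illustrative name, not the benchmark's function.

```rust
/// Hypothetical stand-in for `Variant`, reduced to one case.
#[derive(Debug, Clone, PartialEq)]
pub enum ToyVariant {
    Int64(i64),
}

impl From<i64> for ToyVariant {
    fn from(v: i64) -> Self {
        ToyVariant::Int64(v)
    }
}

#[derive(Debug, Default)]
pub struct ToyBuilder {
    pub rows: Vec<Option<ToyVariant>>,
}

impl ToyBuilder {
    pub fn append_variant(&mut self, v: ToyVariant) {
        self.rows.push(Some(v));
    }
}

/// Non-nullable bulk insertion for anything convertible into a variant,
/// mirroring the `V: Into<Variant>` signature suggested in review.
impl<V: Into<ToyVariant>> Extend<V> for ToyBuilder {
    fn extend<T: IntoIterator<Item = V>>(&mut self, iter: T) {
        for v in iter {
            self.append_variant(v.into());
        }
    }
}

/// Builds `size` pseudo-random rows with a single `extend` call.
/// The LCG below is a deterministic stand-in for `rng.random::<i64>()`.
pub fn create_primitive_rows(size: usize, seed: u64) -> ToyBuilder {
    let mut state = seed;
    let mut builder = ToyBuilder::default();
    builder.extend(
        std::iter::repeat_with(|| {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            // Shift right so the value always fits in a non-negative i64.
            (state >> 1) as i64
        })
        .take(size),
    );
    builder
}
```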

BTW, do you foresee enough new+extend+build triples that we should also consider adding implementations of From and/or FromIterator for VariantArray?
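If the new+extend+build triple does become common, a FromIterator impl could wrap it. Here is a hedged sketch on hypothetical ToyVariantArray and ToyBuilder types (not the real VariantArray API):

```rust
use std::iter::FromIterator;

#[derive(Debug, Clone, PartialEq)]
pub enum ToyVariant {
    Null,
    Int64(i64),
}

/// Hypothetical finished array: one `Option<ToyVariant>` per row.
#[derive(Debug, PartialEq)]
pub struct ToyVariantArray {
    pub rows: Vec<Option<ToyVariant>>,
}

/// Hypothetical builder with the usual new / extend / build shape.
pub struct ToyBuilder {
    rows: Vec<Option<ToyVariant>>,
}

impl ToyBuilder {
    pub fn new(capacity: usize) -> Self {
        Self { rows: Vec::with_capacity(capacity) }
    }

    pub fn build(self) -> ToyVariantArray {
        ToyVariantArray { rows: self.rows }
    }
}

impl Extend<Option<ToyVariant>> for ToyBuilder {
    fn extend<T: IntoIterator<Item = Option<ToyVariant>>>(&mut self, iter: T) {
        // A real builder would encode each value; the toy just stores it.
        self.rows.extend(iter);
    }
}

/// `collect()` support: the new + extend + build triple behind one trait.
impl FromIterator<Option<ToyVariant>> for ToyVariantArray {
    fn from_iter<I: IntoIterator<Item = Option<ToyVariant>>>(iter: I) -> Self {
        let iter = iter.into_iter();
        // The iterator's lower size bound is only a capacity hint.
        let mut builder = ToyBuilder::new(iter.size_hint().0);
        builder.extend(iter);
        builder.build()
    }
}
```

With something like this in place, callers could write iter.collect::<ToyVariantArray>() instead of spelling out the three steps.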

> Hi @scovich, long time no talk!

Howdy! 👋

Comment on lines +176 to +177
impl<'m, 'v> Extend<Option<Variant<'m, 'v>>> for VariantArrayBuilder {
fn extend<T: IntoIterator<Item = Option<Variant<'m, 'v>>>>(&mut self, iter: T) {
Contributor


I just realized... we probably want two impl Extend:

impl<'m, 'v, V: Into<Variant<'m, 'v>>> Extend<V> for VariantArrayBuilder {
    fn extend<T: IntoIterator<Item = V>>(&mut self, iter: T) {
        for v in iter {
            self.append_variant(v.into())
        }
    }
}

and

impl<'m, 'v, V: Into<Variant<'m, 'v>>> Extend<Option<V>> for VariantArrayBuilder {
    fn extend<T: IntoIterator<Item = Option<V>>>(&mut self, iter: T) {
        for v in iter {
            match v {
                Some(v) => self.append_variant(v.into()),
                None => self.append_null(),
            }
        }
    }
}

I think that's typical for the other variant arrays, to capture both nullable and non-nullable data?

Contributor Author


Hm, I wonder if it's fine to just support the Option<T> case, as it follows the other builder Extend impls.

For example:

Contributor Author

friendlymatthew commented Oct 14, 2025


But FWIW, the goal of this PR was to prepare #8606 for closing.

Now, here we would want to support both Vec<Variant> and Vec<Option<Variant>>.
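With only an Extend<Option<...>> impl, non-nullable input can still go through it by wrapping each value in Some. A sketch on hypothetical stand-in types (ToyVariant, ToyBuilder, and build_both are illustrative, not part of parquet-variant-compute):

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ToyVariant {
    Null,
    Int64(i64),
}

/// Hypothetical builder exposing only the nullable `Extend` impl.
#[derive(Debug, Default)]
pub struct ToyBuilder {
    pub rows: Vec<Option<ToyVariant>>,
}

impl Extend<Option<ToyVariant>> for ToyBuilder {
    fn extend<T: IntoIterator<Item = Option<ToyVariant>>>(&mut self, iter: T) {
        // A real builder would encode each value; the toy just stores it.
        self.rows.extend(iter);
    }
}

/// Hypothetical helper: appends a non-nullable batch, then a nullable one.
pub fn build_both(non_nullable: Vec<ToyVariant>, nullable: Vec<Option<ToyVariant>>) -> ToyBuilder {
    let mut b = ToyBuilder::default();
    // Wrapping each value in `Some` marks every one of these rows as non-null.
    b.extend(non_nullable.into_iter().map(Some));
    // Nullable input already has the item type the impl expects.
    b.extend(nullable);
    b
}
```

The cost of this design is one .map(Some) at each non-nullable call site, in exchange for a single Extend impl on the builder.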

Contributor

scovich commented Oct 14, 2025


> I wonder if it's fine to just support the Option<T> case, as it follows the other builder Extend impls.

I mean, any support is better than none. But we have existing use cases in the code for both Option<T> and T.

There seems to be an odd split here:

So maybe you're right that, to mirror existing conventions, we should not implement Extend<Variant>, and should instead consider adding the two From<Vec<...>> impls?

But then we basically end up implementing Extend<Variant> inside From<Vec<Variant>> itself... 🤷
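To illustrate that point, here is a hypothetical sketch in which the nullable From delegates to Extend<Option<...>>, while the non-nullable From carries an inline append loop that is an Extend<Variant> in all but name. All types are stand-ins, not the real VariantArray or VariantArrayBuilder.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ToyVariant {
    Null,
    Int64(i64),
}

/// Hypothetical finished array: one `Option<ToyVariant>` per row.
#[derive(Debug, PartialEq)]
pub struct ToyVariantArray {
    pub rows: Vec<Option<ToyVariant>>,
}

/// Hypothetical builder that only implements `Extend<Option<ToyVariant>>`.
pub struct ToyBuilder {
    rows: Vec<Option<ToyVariant>>,
}

impl ToyBuilder {
    pub fn with_capacity(capacity: usize) -> Self {
        Self { rows: Vec::with_capacity(capacity) }
    }
    pub fn append_variant(&mut self, v: ToyVariant) {
        self.rows.push(Some(v));
    }
    pub fn append_null(&mut self) {
        self.rows.push(None);
    }
    pub fn build(self) -> ToyVariantArray {
        ToyVariantArray { rows: self.rows }
    }
}

impl Extend<Option<ToyVariant>> for ToyBuilder {
    fn extend<T: IntoIterator<Item = Option<ToyVariant>>>(&mut self, iter: T) {
        for v in iter {
            match v {
                Some(v) => self.append_variant(v),
                None => self.append_null(),
            }
        }
    }
}

/// Nullable input: delegates straight to the `Extend` impl.
impl From<Vec<Option<ToyVariant>>> for ToyVariantArray {
    fn from(values: Vec<Option<ToyVariant>>) -> Self {
        let mut b = ToyBuilder::with_capacity(values.len());
        b.extend(values);
        b.build()
    }
}

/// Non-nullable input: with no `Extend<ToyVariant>` impl, the same loop
/// ends up written out here instead.
impl From<Vec<ToyVariant>> for ToyVariantArray {
    fn from(values: Vec<ToyVariant>) -> Self {
        let mut b = ToyBuilder::with_capacity(values.len());
        for v in values {
            b.append_variant(v);
        }
        b.build()
    }
}
```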

Contributor


Part of why the (primitive) arrays have impl From<Vec<...>> is so they can reuse the Vec allocations.

I don't think that is relevant for Variants, as the in-memory representation of a Vec<Variant> is not the same as that of a VariantArray.
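A std-only sketch of the allocation-reuse argument, using hypothetical ToyInt64Array and ToyEncodedArray types rather than the arrow-rs arrays. The primitive case can adopt the caller's Vec as its values buffer, while anything that must re-encode its input has to copy into new buffers. The little-endian encoding below is only a stand-in and does not reproduce the Variant binary encoding.

```rust
/// Hypothetical primitive array whose values buffer is the caller's Vec.
pub struct ToyInt64Array {
    values: Vec<i64>,
}

impl From<Vec<i64>> for ToyInt64Array {
    fn from(values: Vec<i64>) -> Self {
        // Takes ownership: no copy, the heap allocation is reused as-is.
        Self { values }
    }
}

impl ToyInt64Array {
    pub fn values(&self) -> &[i64] {
        &self.values
    }
}

/// Hypothetical encoded array: values are serialized into one byte buffer
/// plus offsets, so the input Vec's allocation cannot be adopted.
pub struct ToyEncodedArray {
    pub offsets: Vec<usize>,
    pub bytes: Vec<u8>,
}

impl From<Vec<i64>> for ToyEncodedArray {
    fn from(values: Vec<i64>) -> Self {
        let mut offsets = Vec::with_capacity(values.len() + 1);
        let mut bytes = Vec::with_capacity(values.len() * 8);
        offsets.push(0);
        for v in values {
            // Every value is copied into the new buffer.
            bytes.extend_from_slice(&v.to_le_bytes());
            offsets.push(bytes.len());
        }
        Self { offsets, bytes }
    }
}
```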

Contributor

scovich commented Oct 14, 2025


Ah, that makes sense for e.g. From<Vec<bool>>. They don't need an Extend (not even a hidden one) because they just take over the input and are done.

Meanwhile, I guess From<Vec<Option<bool>>> will use Extend<Option<bool>> under the hood, because that Vec can't be directly used?
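One way a From<Vec<Option<bool>>> could route through an Extend<Option<bool>> impl, sketched on hypothetical ToyBoolArray and ToyBoolBuilder types. This does not claim to describe how arrow-rs's BooleanArray actually implements the conversion, and a real implementation would use packed bitmaps rather than Vec<bool>.

```rust
/// Hypothetical builder tracking values and validity separately.
#[derive(Debug, Default)]
pub struct ToyBoolBuilder {
    values: Vec<bool>,
    validity: Vec<bool>,
}

impl Extend<Option<bool>> for ToyBoolBuilder {
    fn extend<T: IntoIterator<Item = Option<bool>>>(&mut self, iter: T) {
        for v in iter {
            // A null slot still occupies a value position; its value is
            // arbitrary, and this sketch fills it with `false`.
            self.values.push(v.unwrap_or(false));
            self.validity.push(v.is_some());
        }
    }
}

#[derive(Debug, PartialEq)]
pub struct ToyBoolArray {
    pub values: Vec<bool>,
    pub validity: Vec<bool>,
}

/// The input Vec<Option<bool>> cannot be adopted directly, so every element
/// is split into a value and a validity flag via the `Extend` impl.
impl From<Vec<Option<bool>>> for ToyBoolArray {
    fn from(data: Vec<Option<bool>>) -> Self {
        let mut b = ToyBoolBuilder::default();
        b.extend(data);
        ToyBoolArray { values: b.values, validity: b.validity }
    }
}
```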

@alamb
Contributor

alamb commented Oct 15, 2025

As everyone seems to agree this is a useful feature, I'll merge it in

I didn't completely follow the conversation in #8611 (comment) -- did we come to a conclusion if we should add other From impls? If so I can file a ticket to track it

@alamb alamb merged commit 1b17001 into apache:main Oct 15, 2025
17 checks passed
@alamb
Contributor

alamb commented Oct 15, 2025

Thank you @friendlymatthew and @scovich


Labels

parquet-variant parquet-variant* crates
