Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Nov 5, 2025

Which issue does this PR close?

Rationale for this change

@etseidl suggested:

A further optimization would be to change ColumnOrder::get_sort_order to take an Option<&LogicalType>, but that is a breaking API change.

What changes are included in this PR?

  1. Deprecate ColumnOrder::get_sort_order
  2. Add ColumnOrder::sort_order_for_type` that returns the same value

Are these changes tested?

Yes by CI

Note I was not able to observe any changes in benchmark performance with this, so I am not sure it it worth doing

Are there any user-facing changes?

There is a new API but I think there not many people use this (low level) API so the disruption should be minimal

@alamb alamb marked this pull request as draft November 5, 2025 15:56
@github-actions github-actions bot added the parquet Changes to the parquet crate label Nov 5, 2025
@alamb alamb changed the title Alamb/more refs Avoid copying LogicalType in ColumnOrder::get_sort_order Nov 5, 2025
@etseidl
Copy link
Contributor

etseidl commented Nov 5, 2025

If we deprecate the logical_type getters in this PR, then that's justification enough for the new function.

I agree this isn't an earth shaking performance enhancer 😄 But it might have a small effect on the metadata decoding.

@alamb
Copy link
Contributor Author

alamb commented Nov 5, 2025

If we deprecate the logical_type getters in this PR, then that's justification enough for the new function.

I can certainly do that

@alamb alamb changed the title Avoid copying LogicalType in ColumnOrder::get_sort_order Avoid copying LogicalType in ColumnOrder::get_sort_order, deprecate get_logical_type Nov 5, 2025

impl ColumnOrder {
/// Returns sort order for a physical/logical type.
#[deprecated(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is one deprecation

///
/// Note that this function will clone the `LogicalType`. If performance is a concern,
/// use [`Self::logical_type_ref`] instead.
#[deprecated(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just straight up deprecated the non reference version

@alamb alamb changed the title Avoid copying LogicalType in ColumnOrder::get_sort_order, deprecate get_logical_type [Parquet] Avoid copying LogicalType in ColumnOrder::get_sort_order, deprecate get_logical_type Nov 5, 2025
@alamb
Copy link
Contributor Author

alamb commented Nov 5, 2025

If we deprecate the logical_type getters in this PR, then that's justification enough for the new function.

I deprecated it

@alamb alamb marked this pull request as ready for review November 5, 2025 23:02
Copy link
Contributor

@etseidl etseidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb!

@alamb
Copy link
Contributor Author

alamb commented Nov 6, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/more_refs (2b6b876) to 40300ca diff
BENCH_NAME=metadata
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench metadata
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_more_refs
Results will be posted here when complete

@alamb
Copy link
Contributor Author

alamb commented Nov 6, 2025

🤖: Benchmark completed

Details

group                             alamb_more_refs                        main
-----                             ---------------                        ----
decode parquet metadata           1.01      9.5±0.03µs        ? ?/sec    1.00      9.4±0.05µs        ? ?/sec
decode parquet metadata (wide)    1.00     43.0±0.21ms        ? ?/sec    1.03     44.1±0.18ms        ? ?/sec
open(default)                     1.00     10.0±0.12µs        ? ?/sec    1.01     10.0±0.11µs        ? ?/sec
open(page index)                  1.01    169.4±0.81µs        ? ?/sec    1.00    168.2±0.47µs        ? ?/sec

@alamb
Copy link
Contributor Author

alamb commented Nov 6, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/more_refs (2b6b876) to 40300ca diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_more_refs
Results will be posted here when complete

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants