Skip to content

GH-45332: [C++] Improve build time through forward Declarations #46692

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Ziy1-Tan
Copy link
Contributor

@Ziy1-Tan Ziy1-Tan commented Jun 3, 2025

Rationale for this change

When reviewing the output of ClangBuildAnalyzer, it appears that the arrow/array/data.h header is considered to be the most expensive include:

**** Expensive headers:
157903 ms: /home/simple/code/cpp/arrow/cpp/src/arrow/array/data.h (included 388 times, avg 406 ms), included via:
  78x: array.h array_base.h 
  21x: array_base.h 
  21x: exec_plan.h exec.h 
  15x: test_util.h extension_type.h array_base.h 
  10x: api_scalar.h datum.h 
  10x: api_aggregate.h datum.h 
  ...

122962 ms: /home/simple/code/cpp/arrow/cpp/build/_deps/googletest-src/googletest/include/gtest/gtest.h (included 235 times, avg 523 ms), included via:
  124x: <direct include>
  45x: gmock.h gmock-actions.h gmock-internal-utils.h 
  29x: gmock-matchers.h gmock-internal-utils.h 
  10x: gtest_util.h 
  6x: test_util_internal.h gmock.h gmock-actions.h gmock-internal-utils.h 
  3x: test_common.h gtest_util.h 
  ...

113797 ms: /home/simple/code/cpp/arrow/cpp/src/arrow/array/statistics.h (included 390 times, avg 291 ms), included via:
  78x: array.h array_base.h data.h 
  21x: array_base.h data.h 
  21x: exec_plan.h exec.h data.h 
  15x: test_util.h extension_type.h array_base.h data.h 
  10x: api_aggregate.h datum.h data.h 
  10x: api_scalar.h datum.h data.h 
  ...

What changes are included in this PR?

  • Replace "data.h" header with ArrayData foward declarations
  • Replace "statistics.h" header with ArrayStatisticsfoward declarations

Are these changes tested?

Manually build pass.

Are there any user-facing changes?

No, only an API improvement.

@pitrou
Copy link
Member

pitrou commented Jun 3, 2025

Thanks for trying this out @Ziy1-Tan . I agree with the general goal of reducing compile times.

However, the problem is that this un-inlining some methods such as IsValid where it can be critical to have good performance. So I'm not sure these changes are a good idea.

Maybe other developers can give an opinion on this. cc @zanmato1984 @wgtmac

@wgtmac
Copy link
Member

wgtmac commented Jun 4, 2025

I agree with @pitrou that these functions might be repeatedly called so their performance is critical. If we have good benchmark result to demonstrate an acceptable or negligible performance penalty.

Specific to the IsValid function, if the downstream really cares about its performance, it should directly query the bitmap in a batched fashion.

@Ziy1-Tan
Copy link
Contributor Author

Ziy1-Tan commented Jun 4, 2025

@pitrou @wgtmac Thank your guideline. It is indeed necessary to benchmark these critical functions.

@pitrou
Copy link
Member

pitrou commented Jun 4, 2025

Specific to the IsValid function, if the downstream really cares about its performance, it should directly query the bitmap in a batched fashion.

Ideally yes, but IsValid is the quick go-to solution with acceptable performance and an intuitive programming model (handling one value at a time). Processing the bitmap in a batched fashion is more involved and less "natural".

@zanmato1984
Copy link
Contributor

Thanks for the attempt to improve compile time. Before jumping into the trade-off between compile time and runtime efficiency (inlining), I'm wondering how much is the compile time reduction by this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants