-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add a crate for HeapSize trait #9138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This PR introduces two new crates: - `arrow-memory-size`: Contains the `HeapSize` trait for estimating heap memory usage, with implementations for std types, arrow-buffer types, and all arrow-array types. - `arrow-memory-size-derive`: Provides `#[derive(HeapSize)]` proc macro for automatically implementing HeapSize on structs and enums. The HeapSize infrastructure was previously in the parquet crate. Moving it to a standalone crate allows users who don't use parquet to depend on memory size estimation utilities. **Breaking Change**: The `#[derive(HeapSize)]` macro has been removed from `parquet_derive`. Users should switch to `arrow-memory-size-derive`. The parquet crate re-exports `HeapSize` from `arrow_memory_size` for backward compatibility. Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Move HeapSize implementations from arrow-memory-size to their respective crates (arrow-buffer, arrow-array) - arrow-memory-size now contains only the trait and std impls - Re-export HeapSize and HeapSizeDerive from arrow::util - Add derive macro attributes: #[heap_size(ignore)], #[heap_size(size = N)], #[heap_size(size_fn = path)] - Add type coverage: tuples (up to 12), arrays [T; N], Mutex, RwLock - Add README files with comparison to deepsize and get-size2 crates This allows users who don't use arrow to depend on just arrow-memory-size for the trait, while arrow crates implement HeapSize for their own types. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Remove prescriptive recommendations about which crates to use - Explain motivations: minimal API, customizable Arc/Rc semantics, Arrow integration - Simplify comparison section to just explain our approach Co-Authored-By: Claude Opus 4.5 <[email protected]>
Tests for the HeapSize derive macro are not parquet-specific, so move them from parquet_derive_test to arrow-memory-size-derive/tests/. Co-Authored-By: Claude Opus 4.5 <[email protected]>
arrow-array/src/heap_size.rs
Outdated
| impl HeapSize for StringArray { | ||
| fn heap_size(&self) -> usize { | ||
| self.get_buffer_memory_size() | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could probably be a blanket implementation:
impl<T> HeapSize for T
where
T: ArrayProc-macro crates cannot resolve intra-doc links to external crates, causing rustdoc errors for the arrow_memory_size::HeapSize references. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace individual HeapSize implementations with a macro to reduce code duplication. A blanket impl<T: Array> is not possible due to Rust's orphan rules (E0210). Co-Authored-By: Claude Opus 4.5 <[email protected]>
|
I am not sure we need a new crate just for the HeapSize trait in a downstream crate (DataFUsion) What about making a trait like this in DataFusion common: pub trait DFHeapSize {
...
}And then providing a blanket implementation for impl <T: HeapSize> for DFHeapSize {
...
}That way we can use the arrow implementations, but we don't have to make a whole new crate etc |
| arrow-data = { version = "57.2.0", path = "./arrow-data" } | ||
| arrow-ipc = { version = "57.2.0", path = "./arrow-ipc" } | ||
| arrow-json = { version = "57.2.0", path = "./arrow-json" } | ||
| arrow-memory-size = { version = "57.2.0", path = "./arrow-memory-size" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to do this I would suggest we put it in arrow-buffer or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to just move it there instead of a new crate, certainly easier
|
I agree we can move this into |
|
What do we think about just adding the new crate in DataFusion? Or is there more to this PR that I am missing? |
|
I haven't had a chance to look through it (2k lines is a lot!) |
The rationale is that we want to use it in DataFusion but don't want to force a dependency on Parquet to use this trait.