-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make features configurable #59
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR makes the feature set of the string offsets library configurable via a compile‐time configuration type. It introduces a new config module with configuration types (AllConfig and OnlyLines), adapts the StringOffsets API to be generic over a configuration parameter, and updates benchmark and bitrank implementations accordingly.
- Introduces a generic configuration parameter for StringOffsets.
- Adds a new configuration module (config.rs) to control feature enablement.
- Updates the BitRank builder and benchmark files to reflect these changes.
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
crates/string-offsets/src/lib.rs | General API update for StringOffsets to support configurable features |
crates/string-offsets/src/config.rs | New module defining configuration types (AllConfig, OnlyLines) |
crates/string-offsets/src/bitrank.rs | Updates to bit setting mechanisms and capacity handling in BitRankBuilder |
crates/string-offsets/benchmarks/performance.rs | Benchmarks updated to use new construction APIs |
crates/string-offsets/Cargo.toml | Added benchmark dependency and configuration |
crates/bpe/benchmarks/performance.rs | Benchmarks updated to use the new rand API |
Files not reviewed (1)
- crates/string-offsets/benchmarks/Cargo.toml.bak: Language not supported
Comments suppressed due to low confidence (2)
crates/string-offsets/src/lib.rs:295
- Changing the calculation in utf8_to_char by adding 1 to byte_number may cause off-by-one errors. Please verify that this adjustment correctly preserves the intended offset conversion behavior for all edge cases.
self.utf8_to_char.rank(byte_number + 1) - 1
crates/string-offsets/src/bitrank.rs:47
- Switching from a runtime assert to a debug_assert may allow out-of-bound indices to go unchecked in production. Consider whether this check is critical and if it should remain a runtime assert.
debug_assert!(index < BITS_PER_BLOCK);
Tip: Copilot only keeps its highest confidence comments to reduce noise and keep you focused. Learn more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't review the bitrank changes, but looked at the rest. The type level config trick is nice to support per-instance variations! The lib changes look good. We could simplify the wasm module more, now that it's separate.
Edit: This seems to include changes that are supposed to be merged as part of #58 already? At least the bitrank changes in that PR look the same....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry -- today was a little crazy. I'll get this reviewed tomorrow. Just a few early comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great.
To revisit my earlier comments about whether type-level config is necessary: I see that it causes Rust to emit custom builds of new_converter
.
I wonder what happens if the config is a plain old struct with boolean fields, and new_converter(config: Config, content: &[u8])
and StringOffsets::new_with_config(config: Config, content: &str)
are both marked #[inline(always)]
. When the caller calls it with a constant config, seems like all the branches should be optimized away, without type-level booleans.
Admittedly, it's impossible to be sure without trying it. :-\
No description provided.