Make vendor data more static #32

Merged 9 commits on Jun 22, 2025

quininer
Contributor

This PR is not yet completed. Its goal is to make browserslist-rs data completely static to reduce resident memory and binary size.

This is a big refactor, so I opened a draft to get early feedback on whether this is a good direction.

Currently implemented:

  1. Use static slices instead of lazy + cell
  2. Use binary search more widely for .get()
  3. Introduce PooledStr to reduce data size (shrinks the relocation section and compresses the type from u64*2 to u32*2)
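
As a rough illustration of points 1 and 2, a lookup over a static slice that is sorted by key might look like the sketch below. The table name, keys, and values are placeholders invented for this example, not data or identifiers from browserslist-rs.

```rust
// Hypothetical table: entries must stay sorted by key for binary search to work.
// The numbers are placeholders, not real data.
static TABLE: &[(&str, u32)] = &[
    ("chrome", 10),
    ("edge", 20),
    ("firefox", 30),
    ("safari", 40),
];

/// Looks up `key` with a binary search instead of a lazily built HashMap.
fn get(key: &str) -> Option<u32> {
    TABLE
        .binary_search_by(|(k, _)| k.cmp(&key))
        .ok()
        .map(|i| TABLE[i].1)
}
```

Because the slice lives in read-only static data, nothing is allocated or initialized at runtime; the cost is O(log n) comparisons per lookup.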

The refactor of features and region is not yet complete. Both data sets are quite large, and implementing them as code would cause a serious compile-time regression.
I expect to implement them using include_bytes! and the Aligned trick (like https://jack.wrenn.fyi/blog/include-transmute/), which would look a bit ugly but should have good compile-time and runtime performance.
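
A minimal sketch of the alignment idea, assuming the general shape described in the linked post rather than the code that ended up in this PR: a zero-length array field forces the alignment of the byte payload, so the bytes can be viewed as `u32` values. The names `AlignedAs`, `DATA`, and `as_u32_slice` are illustrative, and an inline byte string stands in for `include_bytes!` so the example needs no external file.

```rust
use std::mem::{align_of, size_of};

/// Wrapper whose zero-length `_align` field forces the alignment of `Align`
/// onto the byte payload stored after it.
#[repr(C)]
struct AlignedAs<Align, Bytes: ?Sized> {
    _align: [Align; 0],
    bytes: Bytes,
}

// With real data the payload would come from `*include_bytes!(...)` instead
// of this inline byte string (stand-in data, chosen for the example).
static DATA: &AlignedAs<u32, [u8]> = &AlignedAs {
    _align: [],
    bytes: *b"\x01\x00\x00\x00\x02\x00\x00\x00",
};

/// Views the aligned bytes as native-endian u32 values.
fn as_u32_slice(data: &AlignedAs<u32, [u8]>) -> &[u32] {
    let bytes = &data.bytes;
    // Defensive checks; the wrapper should already guarantee the alignment.
    assert_eq!(bytes.as_ptr() as usize % align_of::<u32>(), 0);
    assert_eq!(bytes.len() % size_of::<u32>(), 0);
    // SAFETY: the pointer is aligned for u32, the length is a multiple of 4,
    // and every bit pattern is a valid u32.
    unsafe {
        std::slice::from_raw_parts(
            bytes.as_ptr().cast::<u32>(),
            bytes.len() / size_of::<u32>(),
        )
    }
}
```

The values read this way depend on the target's endianness, so a generator would have to emit bytes in the byte order of the target.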

Member

@g-plane g-plane left a comment

I just took a quick look at them for early feedback.

@quininer
Contributor Author

Also, since the data was converted from JSON to a u32seq binary format, our binary size regresses at this point, because JSON is actually more compact for small numbers. JSON uses only 3 bytes of extra space per entry (","), whereas PooledStr requires 8 bytes per entry for the index.

We could use Elias-Fano encoding for the index to make it smaller than JSON, but I'm not sure the complexity is worth it. I hope the disadvantage can be offset by more string dedup once the region mod refactor is implemented.
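
To make the size trade-off concrete, here is a back-of-the-envelope sketch that compares a compact JSON string array against a deduplicated string pool with an 8-byte index entry per item. Both functions are simplified models written for this comment (no escaping, no alignment padding), not the actual generator code.

```rust
use std::collections::BTreeSet;

/// Index cost per entry for a (u32 offset, u32 length) pair.
const INDEX_BYTES: usize = 8;

/// Size of a compact JSON array such as ["1","2"], assuming no escaping.
fn json_size(entries: &[&str]) -> usize {
    let quoted: usize = entries.iter().map(|s| s.len() + 2).sum();
    let commas = entries.len().saturating_sub(1);
    2 + quoted + commas
}

/// Size of a pool that stores each distinct string once, plus the index.
fn pooled_size(entries: &[&str]) -> usize {
    let distinct: BTreeSet<&str> = entries.iter().copied().collect();
    let pool: usize = distinct.iter().map(|s| s.len()).sum();
    pool + entries.len() * INDEX_BYTES
}
```

For short strings such as version numbers the index dominates, so the pooled form only wins when strings are long or heavily repeated.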

@quininer
Contributor Author

By packing PooledStr into 4 bytes, we are now smaller than JSON, but this means strings longer than 255 bytes are not allowed. Our strings are mainly browser version numbers, so we are safe as long as no browser publishes version numbers with pi or e.
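
One way such a 4-byte handle could be laid out is a 24-bit pool offset plus an 8-bit length, which is where a 255-byte limit would come from. The sketch below assumes that layout; the type name `PackedStr`, the `POOL` contents, and the bit split are guesses for illustration and may differ from the real PooledStr.

```rust
/// Hypothetical string pool; real data would be generated at build time.
static POOL: &str = "chromefirefox137.0";

/// Assumed layout: upper 24 bits are the pool offset, lower 8 bits the length.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PackedStr(u32);

impl PackedStr {
    /// `len` is a u8, so strings longer than 255 bytes cannot be represented.
    const fn new(offset: u32, len: u8) -> Self {
        assert!(offset < (1 << 24), "offset must fit in 24 bits");
        PackedStr((offset << 8) | len as u32)
    }

    /// Resolves the handle back to a slice of the pool.
    fn as_str(self) -> &'static str {
        let offset = (self.0 >> 8) as usize;
        let len = (self.0 & 0xff) as usize;
        &POOL[offset..offset + len]
    }
}
```

Under this layout the pool itself is also capped at 16 MiB, since the offset has only 24 bits.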

For reference (without optimizations beyond --release), our .wasm is 1M smaller than before.

commit: ba07c7c
3.5M ../target/wasm32-unknown-unknown/release/browserslist.wasm

commit: 19945b9
4.5M ../target/wasm32-unknown-unknown/release/browserslist.wasm

@quininer quininer force-pushed the static-map branch 2 times, most recently from d8500fa to bed014b Compare June 18, 2025 07:44

- pub static CANIUSE_GLOBAL_USAGE: &[(&'static str, &'static str, f32)] =
+ pub static CANIUSE_GLOBAL_USAGE: &[(PooledStr, PooledStr, f32)] =
  include!("../generated/caniuse-global-usage.rs");

  pub static BROWSER_VERSION_ALIASES: LazyLock<
Contributor Author

Well, we are not completely static: we still have two legacy LazyLock values. But their data is not large, so the memory usage is acceptable.

@quininer quininer marked this pull request as ready for review June 18, 2025 08:00
@quininer
Contributor Author

quininer commented Jun 18, 2025

I ran some simple benchmarks (quininer@e95425a), and I don't think we have an order-of-magnitude performance regression. I checked that most of our data lookups operate on between 10 and 200 entries, and binary search at this scale is not significantly slower than a hashmap.

"> 0.5%", "last 2 versions", "Firefox ESR", "not dead"
before:
simple query time: [10.756 µs 10.834 µs 10.979 µs]
after:
simple query time: [10.771 µs 10.781 µs 10.791 µs]

"> 0.5%", "last 2 versions", "Firefox ESR", "not dead", "supports objectrtc"
before:
simple query time: [20.729 µs 20.816 µs 20.931 µs]
after:
simple query time: [39.584 µs 39.662 µs 39.755 µs]

quininer added 3 commits June 19, 2025 15:08
Currently generate-data uses HashMap data,
which results in a different string order each time.
Switching to BTreeMap will help stabilize this.
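
A small sketch of why the switch helps, assuming a generator that concatenates strings into a pool: a BTreeMap iterates in sorted key order, so the output does not depend on insertion order or on a per-process hash seed. The function `build_pool` is hypothetical and not taken from generate-data.

```rust
use std::collections::BTreeMap;

/// Builds a string pool and an index of (offset, length) pairs.
/// BTreeMap iteration is sorted by key, so the pool layout is reproducible.
fn build_pool(names: &[&str]) -> (String, BTreeMap<String, (u32, u32)>) {
    let mut index: BTreeMap<String, (u32, u32)> = BTreeMap::new();
    for name in names {
        index.insert(name.to_string(), (0, 0));
    }

    let mut pool = String::new();
    for (name, slot) in index.iter_mut() {
        *slot = (pool.len() as u32, name.len() as u32);
        pool.push_str(name);
    }
    (pool, index)
}
```

With a HashMap in place of the BTreeMap, two runs over the same input could emit the strings in different orders, which makes the generated files noisy to diff.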
Member

@g-plane g-plane left a comment

Thanks!

@g-plane g-plane merged commit 526599c into browserslist:main Jun 22, 2025
2 checks passed
@quininer quininer deleted the static-map branch June 22, 2025 09:38
@quininer quininer changed the title Make vendor data completely static Make vendor data more static Jul 3, 2025