
@marcelm
Collaborator

@marcelm marcelm commented Dec 4, 2024

Profiling suggested that the Index::is_filtered() call is a bit slow. It checks whether a randstrobe occurs more often than filter_cutoff by accessing randstrobes[i] and randstrobes[i + filter_cutoff] and comparing the hashes.
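For readers unfamiliar with the check, here is a minimal sketch of what it amounts to. It is not the actual strobealign code: the `RefRandstrobe` struct is reduced to its hash, and the sketch assumes the vector is sorted by hash and that `position` is the first entry of its run of equal hashes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for the real RefRandstrobe; only the hash matters here.
struct RefRandstrobe {
    std::uint64_t hash;
};

// Returns true if the hash at `position` occurs more than `filter_cutoff` times.
// Assumptions: `randstrobes` is sorted by hash (equal hashes are adjacent),
// `position` is a valid index and points at the first entry of its run.
bool is_filtered(const std::vector<RefRandstrobe>& randstrobes,
                 std::size_t position, std::size_t filter_cutoff) {
    // Guard against reading past the end of the vector.
    if (filter_cutoff >= randstrobes.size() - position) {
        return false;
    }
    // Two reads that are `filter_cutoff` entries apart, so potentially
    // two separate cache misses.
    return randstrobes[position].hash == randstrobes[position + filter_cutoff].hash;
}
```

If the entry `filter_cutoff` positions further along still has the same hash, the run contains at least `filter_cutoff + 1` entries, which is what "occurs more often than filter_cutoff" means here.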

The slowness could come from two cache misses, because the two memory locations being read are quite far apart. To get rid of the second access, the idea is to use one bit within RefRandstrobe to store whether the item is filtered.

Somewhat unexpectedly, this does not improve speed. It does reduce cache misses according to perf stat -d, but this does not translate to a shorter runtime.
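The single-bit idea can be illustrated roughly as follows. This is a sketch under assumptions, not the code in this PR: the field name `hash_and_flag`, the choice of the lowest bit, and the helper `set_filter_bits` are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: the lowest bit is the filter flag, the remaining
// 63 bits hold the hash (so one hash bit is given up).
constexpr std::uint64_t FILTER_BIT = 1;

struct RefRandstrobe {
    std::uint64_t hash_and_flag;

    std::uint64_t hash() const { return hash_and_flag & ~FILTER_BIT; }
    bool is_filtered() const { return (hash_and_flag & FILTER_BIT) != 0; }
};

// Build an entry with the flag cleared (hypothetical helper).
inline RefRandstrobe make_entry(std::uint64_t hash) {
    return RefRandstrobe{hash & ~FILTER_BIT};
}

// One pass over the hash-sorted vector: flag every entry whose hash
// occurs more than `filter_cutoff` times.
void set_filter_bits(std::vector<RefRandstrobe>& randstrobes, std::size_t filter_cutoff) {
    std::size_t start = 0;
    while (start < randstrobes.size()) {
        // Find the end of the run of entries sharing the same hash.
        std::size_t end = start + 1;
        while (end < randstrobes.size() &&
               randstrobes[end].hash() == randstrobes[start].hash()) {
            ++end;
        }
        if (end - start > filter_cutoff) {
            for (std::size_t i = start; i < end; ++i) {
                randstrobes[i].hash_and_flag |= FILTER_BIT;
            }
        }
        start = end;
    }
}
```

With this layout, a lookup only needs to read the one entry it already found; the second, distant read is replaced by a bit test on the same word.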

@marcelm
Collaborator Author

marcelm commented Dec 6, 2024

After a couple of measurements on a different (10 years younger) machine, I can measure a difference: this PR makes mapping-only mode about 2% faster. (This comes at the expense of one fewer bit being available for the hash, but this has very little impact.)

Base automatically changed from auxlen to main December 10, 2024 08:27
@ksahlin
Owner

ksahlin commented Dec 11, 2024

Great! Don't we have B top bits available anyway to store other things, because of our prefix vector? This of course depends on the bit being added after the sorted vector has been produced.

@marcelm
Collaborator Author

marcelm commented Dec 17, 2024

> Great! Don't we have B top bits available anyway to store other things, because of our prefix vector? This of course depends on the bit being added after the sorted vector has been produced.

Right, good point! I have the impression the filter bit would better fit in those upper bits anyway. Let me update the PR later.
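A rough sketch of how the upper-bits variant might look, assuming the prefix vector is indexed by the top B bits of the hash, so that those bits are implied by the bucket and need not be stored. The value of `B` and all function names here are hypothetical, chosen only for illustration.

```cpp
#include <cstdint>

// Hypothetical number of prefix bits; assumed to satisfy 1 <= B <= 63.
constexpr unsigned B = 8;
// The filter flag reuses the topmost bit, which the bucket index already encodes.
constexpr std::uint64_t FLAG_MASK = std::uint64_t{1} << 63;

// Bucket index into the prefix vector: the top B bits of the hash.
inline std::uint64_t bucket_of(std::uint64_t hash) {
    return hash >> (64 - B);
}

// Store the flag in the top bit. This must happen after sorting,
// because it changes the numeric value of the stored word.
inline std::uint64_t pack(std::uint64_t hash, bool filtered) {
    return (hash & ~FLAG_MASK) | (filtered ? FLAG_MASK : std::uint64_t{0});
}

inline bool is_filtered(std::uint64_t stored) {
    return (stored & FLAG_MASK) != 0;
}

// Recover the full hash from the stored word and the bucket it was found in.
inline std::uint64_t full_hash(std::uint64_t stored, std::uint64_t bucket) {
    const std::uint64_t low_mask = ~std::uint64_t{0} >> B;
    return (bucket << (64 - B)) | (stored & low_mask);
}
```

In this variant no hash bit is lost, since the overwritten bit can be reconstructed from the bucket index. The cost is that stored words can no longer be compared directly as full hashes; comparisons within a bucket would use the low `64 - B` bits only.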

