
feat(frecency): use blake3 hash as db key for accesses #1908


Merged
merged 1 commit into from
Jun 15, 2025

Conversation

otakenz
Contributor

@otakenz otakenz commented Jun 14, 2025

  • Replace raw CompletionItemKey as database key with a blake3 hash of its bincode serialization in FrecencyTracker.
  • Update get_accesses and access methods to use the hashed key.
  • Add key_to_hash_bytes helper for consistent key hashing.
  • Add blake3 and bincode as dependencies in Cargo.toml.
  • Update Cargo.lock to reflect new dependencies.

We replace the raw struct key with a hash to ensure database keys are fixed-size and uniform, which improves storage efficiency and lookup performance. This also avoids issues with variable-length or complex struct keys, and makes the database more robust against future changes to the key's serialization format.
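
A minimal sketch of the idea, using only the standard library so that it runs without extra crates. It is not the PR's code: the `ItemKey` struct and its fields, `serialize_key`, and `key_to_fixed_bytes` are hypothetical names, a hand-written length-prefixed encoding stands in for bincode, and 64-bit FNV-1a stands in for blake3 (which produces a 32-byte digest and, unlike FNV-1a, is collision resistant). The only point illustrated is that the derived key has the same size no matter how long the completion text is.

```rust
/// Hypothetical stand-in for the PR's CompletionItemKey; the real fields may differ.
pub struct ItemKey {
    pub label: String,
    pub kind: u32,
    pub source_id: String,
}

/// Length-prefixed encoding of the key. This stands in for the bincode
/// serialization that the PR hashes; it is an assumption, not bincode's format.
pub fn serialize_key(key: &ItemKey) -> Vec<u8> {
    let mut out = Vec::new();
    for field in [key.label.as_bytes(), key.source_id.as_bytes()] {
        // Prefix each string with its length so field boundaries are unambiguous.
        out.extend_from_slice(&(field.len() as u64).to_le_bytes());
        out.extend_from_slice(field);
    }
    out.extend_from_slice(&key.kind.to_le_bytes());
    out
}

/// 64-bit FNV-1a. A stand-in for blake3, chosen only because it needs no crate.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Derive a fixed-size database key from a variable-length item key.
pub fn key_to_fixed_bytes(key: &ItemKey) -> [u8; 8] {
    fnv1a_64(&serialize_key(key)).to_le_bytes()
}
```

With this shape, a key whose label holds a multi-kilobyte completion and a key with a three-character label both map to an array of the same length, so the size of the stored key no longer depends on the completion text.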

Closes #1905
Closes #1895
Closes #1832
Closes #1727

Steps

  1. I temporarily made this call synchronous, on the main thread, so that I could observe the error it throws. The change is in
    fuzzy/init.lua
local rust = require('blink.cmp.fuzzy.rust')
rust.access(trimmed_item)  -- main thread

vim.uv
  .new_work(
    function(itm, cpath)
      package.cpath = cpath
      -- do fuzzy matching only, no mlua-bound calls here
    end,
    function() end
  )
  :queue(require('string.buffer').encode(trimmed_item), package.cpath)
  2. From this I concluded that the issue comes from serializing the "CompletionItemKey" directly as the LMDB key.
    LMDB limits key size to 511 bytes by default and returns an error for anything larger, which is what happens when a CompletionItemKey is built from the long completions produced by AI code assistants such as GitHub Copilot.
runtime error: Failed to write to frecency database: MDB_BAD_VALSIZE: Unsupported size of key/DB name/data, or wrong DUPFIXED size
stack traceback:
	[C]: in function 'access'
	...l/share/nvim/lazy/blink.cmp/lua/blink/cmp/fuzzy/init.lua:57: in function 'access'
	.../lazy/blink.cmp/lua/blink/cmp/completion/accept/init.lua:76: in function 'apply_item'
	.../lazy/blink.cmp/lua/blink/cmp/completion/accept/init.lua:109: in function 'default_implementation'
	...zy/blink.cmp/lua/blink/cmp/sources/lib/provider/init.lua:161: in function 'execute'
	...e/nvim/lazy/blink.cmp/lua/blink/cmp/sources/lib/init.lua:237: in function <...e/nvim/lazy/blink.cmp/lua/blink/cmp/sources/lib/init.lua:224>
	[C]: in function 'pcall'
	...al/share/nvim/lazy/blink.cmp/lua/blink/cmp/lib/async.lua:107: in function 'cb'
	...al/share/nvim/lazy/blink.cmp/lua/blink/cmp/lib/async.lua:189: in function 'on_completion'
	...al/share/nvim/lazy/blink.cmp/lua/blink/cmp/lib/async.lua:106: in function <...al/share/nvim/lazy/blink.cmp/lua/blink/cmp/lib/async.lua:105>
	[C]: in function 'pcall'
	...al/share/nvim/lazy/blink.cmp/lua/blink/cmp/lib/async.lua:72: in function 'new'
	...al/share/nvim/lazy/blink.cmp/lua/blink/cmp/lib/async.lua:105: in function 'map'
	.../lazy/blink.cmp/lua/blink/cmp/completion/accept/init.lua:101: in function <.../lazy/blink.cmp/lua/blink/cmp/completion/accept/init.lua:88>
	...re/nvim/lazy/blink.cmp/lua/blink/cmp/completion/list.lua:271: in function 'accept'
	.../.local/share/nvim/lazy/blink.cmp/lua/blink/cmp/init.lua:134: in function <.../.local/share/nvim/lazy/blink.cmp/lua/blink/cmp/init.lua:134>
  3. Hashing the key therefore fixes the problem, and this PR implements that.
  4. To verify, I opened "fuzzy.rs" and typed "pub fn" at the end of the file. In my case GitHub Copilot generates a very large function there, which used to crash the app; after this fix, it no longer crashes.
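
The size condition behind the error in the steps above can be sketched as a small check. This is an illustration under stated assumptions, not part of LMDB or of any Rust binding: `check_key_size` and the constant name are hypothetical, and 511 is LMDB's default compile-time limit, which a custom build may change.

```rust
/// LMDB's default maximum key size in bytes (a compile-time default).
pub const LMDB_DEFAULT_MAX_KEY_SIZE: usize = 511;

/// Hypothetical pre-flight check mirroring the condition under which LMDB
/// rejects a key with MDB_BAD_VALSIZE: the key is empty or exceeds the limit.
pub fn check_key_size(key: &[u8]) -> Result<(), String> {
    if key.is_empty() || key.len() > LMDB_DEFAULT_MAX_KEY_SIZE {
        Err(format!(
            "key of {} bytes is outside 1..={} bytes",
            key.len(),
            LMDB_DEFAULT_MAX_KEY_SIZE
        ))
    } else {
        Ok(())
    }
}
```

A 32-byte digest passes this check regardless of input, while a serialized key that embeds a multi-kilobyte completion does not, which matches the crash seen before the fix.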

@otakenz otakenz force-pushed the feat/frecency_hash_key branch from c255ff5 to 81a73ec Compare June 14, 2025 19:50
@otakenz otakenz force-pushed the feat/frecency_hash_key branch from 81a73ec to 730912b Compare June 14, 2025 19:53
@Saghen Saghen merged commit 9c007ae into Saghen:main Jun 15, 2025
3 checks passed
@Saghen
Owner

Saghen commented Jun 15, 2025

Brilliant work, thank you!

2 participants