borg2: implement new chunker? #8841

@ThomasWaldmann

#8803 opens the door for a unique opportunity: re-chunking while doing a borg2 transfer (which will be required anyway for transferring archives from borg1 repos to borg2 repos).

So, if borg2 gets a new chunker before it is released, we could use it there and convert relatively painlessly.

Usually one cannot easily switch to a new chunker within an existing repo:

  • new-chunked chunks of identical files do not deduplicate with old-chunked ones
  • thus, space usage doubles for as long as old-chunked archives are present (i.e. for a very long time in usual pruning scenarios)
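
A minimal sketch of why this happens, under stated assumptions: two fixed-size chunkers with different block sizes stand in for the "old" and "new" chunker, and plain SHA-256 stands in for borg's real chunk ID function. The function names and block sizes are made up for illustration and are not borg APIs.

```python
import hashlib
import os

def fixed_chunks(data: bytes, size: int) -> list:
    """Cut data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def chunk_ids(chunks) -> set:
    """Identify each chunk by a hash of its content (stand-in for a chunk ID)."""
    return {hashlib.sha256(c).digest() for c in chunks}

data = os.urandom(64 * 1024)  # one unchanged "file", backed up twice

old_chunks = fixed_chunks(data, 4096)  # hypothetical "old" chunker
new_chunks = fixed_chunks(data, 6000)  # hypothetical "new" chunker

# Chunks cut at different boundaries never have the same content,
# so none of the old chunks can be reused by the new archive.
shared = chunk_ids(old_chunks) & chunk_ids(new_chunks)

# The repo has to keep both sets of chunks for the same file.
stored = sum(len(c) for c in old_chunks) + sum(len(c) for c in new_chunks)

print("shared chunk ids:", len(shared))
print("file size:", len(data), "bytes stored:", stored)
```

The file is identical in both backups, yet the two chunk sets do not overlap, so the stored size is twice the file size until the old-chunked archive is pruned.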

Requirements for new chunker:

  • little to no C code; Cython or Python instead. (*)
  • better security properties than buzhash, see https://github.com/borgbackup/borg/wiki/CDC-issues-reported-2025
  • not too slow, preferably as fast as or faster than buzhash
  • easier-to-maintain code (buzhash is too much C)
  • could be a separate project (like borghash, borgstore, now borgchunk(er)?)

(*) This applies to the borg codebase. Nothing against a well-maintained external chunker library that contains more low-level code.

Existing chunkers in borg:

  • fixed (fixed block size, relatively simple, fast, Python/Cython, can support sparse files efficiently)
  • buzhash (variable block size, CDC, complex, hard to maintain C code, no sparse file support)
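
To illustrate the difference between the two kinds of chunker, here is a toy comparison in pure Python. It is a sketch only: `cdc_chunks` uses a simple gear-style rolling hash, which is not borg's buzhash, and the table, mask and minimum size are arbitrary assumptions. It makes no attempt at the security properties listed in the requirements above.

```python
import hashlib
import random

def fixed_chunks(data: bytes, size: int = 1024) -> list:
    """Fixed chunker: cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

# Toy lookup table for the rolling hash (hypothetical, NOT borg's buzhash table).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]

def cdc_chunks(data: bytes, mask: int = 0x3FF, min_size: int = 64) -> list:
    """Toy CDC: cut where the low bits of a gear-style rolling hash are zero."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older bytes are shifted out, so h depends only on recent content.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if i + 1 - start >= min_size and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks

def ids(chunks) -> set:
    """Content hashes, used here as stand-ins for chunk IDs."""
    return {hashlib.sha256(c).digest() for c in chunks}

data = bytes(_rng.getrandbits(8) for _ in range(64 * 1024))
shifted = b"X" + data  # same content with one byte inserted at the front

fixed_shared = ids(fixed_chunks(data)) & ids(fixed_chunks(shifted))
cdc_shared = ids(cdc_chunks(data)) & ids(cdc_chunks(shifted))

print("fixed chunker, reusable chunks:", len(fixed_shared))
print("CDC chunker, reusable chunks:  ", len(cdc_shared))
```

With the fixed chunker, the inserted byte shifts every block boundary, so no chunk is reused. With content-defined cut points, the boundaries follow the data and the chunker typically resynchronizes after the first chunk. That deduplication benefit is what any replacement for buzhash would need to keep.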

Chunker tickets:
