
Chunk size distribution problem. #36

@SaveTheRbtz

I tried using restic's chunker instead of FastCDC and noticed that compression ratios dropped substantially. Looking deeper into the issue, I found that most of the chunks produced were right at the MinSize lower bound (which is set pretty low for my use case: 16384).

I've narrowed down the issue to

if (digest&c.splitmask) == 0 || add >= maxSize {

Changing the code to compare against a different constant (e.g. == 1) fixes the problem, so the distribution of the digest values is likely to blame here. The math in the chunker is beyond my ability to review, but I would assume that for the chunker to work reasonably well, the digest should be uniformly distributed.
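
For anyone who wants to check a chunk size distribution the same way, here is a minimal sketch of the measurement. It is not restic's chunker: the rolling hash is a Gear-style stand-in rather than the Rabin fingerprint, and `chunkLengths`, `summarize`, the gear table, the split mask, and the maximum size are all hypothetical names and values chosen for illustration. Only the shape of the cut condition and the 16384 minimum size come from the report above.

```go
package main

import "fmt"

// chunkLengths splits data with a toy content-defined chunker and returns the
// length of each chunk. The hash is a Gear-style stand-in, NOT restic's Rabin
// fingerprint; the gear table below is a hypothetical one built for this sketch.
func chunkLengths(data []byte, minSize, maxSize int, splitmask uint64) []int {
	var gear [256]uint64
	x := uint64(0x9E3779B97F4A7C15) // fixed seed so runs are repeatable
	for i := range gear {
		x ^= x << 13
		x ^= x >> 7
		x ^= x << 17
		gear[i] = x
	}

	var lengths []int
	var digest uint64
	size := 0
	for _, b := range data {
		digest = (digest << 1) + gear[b]
		size++
		if size < minSize {
			continue // never cut before the minimum size
		}
		// Same shape as the cut condition quoted in the issue.
		if (digest&splitmask) == 0 || size >= maxSize {
			lengths = append(lengths, size)
			digest, size = 0, 0
		}
	}
	if size > 0 {
		lengths = append(lengths, size) // trailing partial chunk
	}
	return lengths
}

// sizeStats summarizes how many chunks sit at the size bounds.
type sizeStats struct {
	Count, AtMin, AtMax int
	Mean                float64
}

// summarize counts chunks that are exactly minSize or at least maxSize.
func summarize(lengths []int, minSize, maxSize int) sizeStats {
	s := sizeStats{Count: len(lengths)}
	total := 0
	for _, n := range lengths {
		total += n
		if n == minSize {
			s.AtMin++
		}
		if n >= maxSize {
			s.AtMax++
		}
	}
	if s.Count > 0 {
		s.Mean = float64(total) / float64(s.Count)
	}
	return s
}

func main() {
	// minSize is from the issue; maxSize and splitmask are assumed values.
	const minSize, maxSize = 16384, 1 << 20
	const splitmask = (1 << 16) - 1

	// Pseudo-random input from a fixed-seed xorshift generator.
	data := make([]byte, 8<<20)
	x := uint64(88172645463325252)
	for i := range data {
		x ^= x << 13
		x ^= x >> 7
		x ^= x << 17
		data[i] = byte(x >> 32)
	}

	s := summarize(chunkLengths(data, minSize, maxSize, splitmask), minSize, maxSize)
	fmt.Printf("chunks=%d atMin=%d atMax=%d mean=%.0f\n", s.Count, s.AtMin, s.AtMax, s.Mean)
}
```

If the digest behaves roughly uniformly, only a small fraction of chunks should land exactly on the minimum size, so a large `AtMin` share on real input would point at the digest distribution. Feeding the chunk lengths from the actual chunker into `summarize` gives the same check without the toy hash.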
