
Improve MEDIUM_LINE_BYTES guessing with heuristic #26

@nil0x42


MEDIUM_LINE_BYTES is currently hardcoded in const.h, with a value of 8.
The hashmap and chunks are then sized in such a way that, if the real average line length is MEDIUM_LINE_BYTES, the hashmap will be filled up to the factor defined by HMAP_LOAD_FACTOR (currently set to 0.5, i.e. 50% hmap filling).

Therefore, we could read a few sample pages from the file (e.g. at the start, middle, and end), and derive a better guess of MEDIUM_LINE_BYTES from them.
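
A minimal sketch of such a heuristic, assuming the file is already available as one contiguous in-memory buffer (for instance via mmap). `guess_line_bytes`, `count_newlines` and `DEFAULT_LINE_BYTES` are hypothetical names for illustration, not existing duplicut code. It counts newlines in three non-overlapping windows and divides the sampled bytes by the number of lines seen, so the estimate includes the newline byte:

```c
#include <stddef.h>
#include <string.h>

/* Fallback when no estimate is possible (the currently hardcoded value). */
#define DEFAULT_LINE_BYTES 8

/* Count '\n' bytes in buf[0..len). */
static size_t count_newlines(const char *buf, size_t len)
{
    size_t count = 0;
    const char *p = buf;
    const char *end = buf + len;

    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        ++count;
        ++p; /* resume the search just past the newline found */
    }
    return count;
}

/* Hypothetical helper: estimate the average bytes per line (newline
 * included) from three windows of `window` bytes taken at the start,
 * middle and end of the buffer. Small files are scanned entirely. */
size_t guess_line_bytes(const char *file, size_t size, size_t window)
{
    size_t offsets[3];
    size_t sampled = 0;
    size_t lines = 0;
    size_t i;

    if (size == 0 || window == 0)
        return DEFAULT_LINE_BYTES;

    if (size / 3 <= window) {
        /* Windows would overlap: scan the whole buffer instead. */
        sampled = size;
        lines = count_newlines(file, size);
    } else {
        offsets[0] = 0;
        offsets[1] = (size - window) / 2;
        offsets[2] = size - window;
        for (i = 0; i < 3; ++i) {
            lines += count_newlines(file + offsets[i], window);
            sampled += window;
        }
    }

    if (lines == 0)
        return DEFAULT_LINE_BYTES; /* no newline sampled: keep the default */

    return (sampled + lines / 2) / lines; /* rounded to nearest */
}
```

With a page-sized window (e.g. 4096 bytes), only three pages are touched regardless of file size. Three fixed windows can still be misled by a file whose line lengths vary strongly between regions, so the result is a guess, not a guarantee.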

It would greatly improve performance on wordlists with a lot of very long lines (for example, a list of MD5 hashes).
If lines are 32 bytes long, the hmap is only 12.5% full (50% / 4, since lines are 4 times longer than the assumed 8 bytes), and many more chunks are needed.
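
The arithmetic can be written as a tiny model. This assumes the hashmap slot count is derived as (bytes / MEDIUM_LINE_BYTES) / HMAP_LOAD_FACTOR, which is inferred from the description above rather than taken from the actual code. `hmap_fill_ratio` is a hypothetical name:

```c
/* Hypothetical model, not duplicut's actual code: if slots are sized for
 * lines of `assumed_line_bytes`, the real fill ratio shrinks in proportion
 * to how much longer the real lines are. */
double hmap_fill_ratio(double load_factor,
                       double assumed_line_bytes,
                       double real_line_bytes)
{
    return load_factor * assumed_line_bytes / real_line_bytes;
}
```

For example, `hmap_fill_ratio(0.5, 8.0, 32.0)` gives 0.125, the 12.5% mentioned above.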


Labels: enhancement, important, perf impacted (this issue impacts the performance of duplicut, either positively or negatively)
