Optimize duplicut for SSDs #22

@nil0x42

HDD vs SSD

On HDDs, sequential access is relatively fast, while random access is terribly slow. That's why duplicut, written back in 2014, was optimized with HDDs in mind.
At the time, it made no sense to have multiple threads concurrently reading a massive wordlist's content, so sequential access with a single thread performed better when all lines could fit in the hashmap at once.

Now that we have entered the SSD era, concurrency could bring a real performance gain, as random access is much faster.

@solardiz suggested OpenMP, which would probably increase perf a lot.
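
To make the idea concrete, here is a minimal sketch of what concurrent chunked reads could look like with OpenMP. It is not duplicut's code: `count_lines_chunked` is a hypothetical helper, and counting newlines only stands in for the real per-chunk work (hashing and deduplicating lines). It assumes a POSIX system with `pread()`, so each thread can read its own offset without sharing a file position.

```c
#define _XOPEN_SOURCE 700   /* for pread() under strict -std=c11 builds */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of duplicut): counts '\n' bytes in `path`
 * by splitting the file into fixed-size chunks, each read independently
 * with pread(). Returns the count, or -1 on error. */
long long count_lines_chunked(const char *path, size_t chunk_size)
{
    assert(chunk_size > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    long long file_size = (long long)st.st_size;
    long long csize = (long long)chunk_size;
    long long nchunks = (file_size + csize - 1) / csize;
    long long total = 0;
    int failed = 0;

    /* Each thread gets its own buffer; chunks are handed out dynamically.
     * Without -fopenmp the pragmas are ignored and this runs serially. */
    #pragma omp parallel reduction(+:total) reduction(|:failed)
    {
        char *buf = malloc(chunk_size);
        if (buf == NULL)
            failed |= 1;

        #pragma omp for schedule(dynamic)
        for (long long i = 0; i < nchunks; i++) {
            if (buf == NULL)
                continue;

            long long offset = i * csize;
            size_t want = chunk_size;
            if (offset + csize > file_size)
                want = (size_t)(file_size - offset);

            /* pread() may return short reads, so loop until the chunk is full. */
            size_t got = 0;
            while (got < want) {
                ssize_t n = pread(fd, buf + got, want - got,
                                  (off_t)(offset + (long long)got));
                if (n <= 0) {
                    failed |= 1;
                    break;
                }
                got += (size_t)n;
            }

            for (size_t j = 0; j < got; j++)
                if (buf[j] == '\n')
                    total++;
        }

        free(buf);
    }

    close(fd);
    return failed ? -1 : total;
}
```

Built with `-fopenmp` (GCC/Clang), a call such as `count_lines_chunked("wordlist.txt", 1 << 20)` would issue reads from several threads at once, which is the access pattern that hurts on an HDD and should help on an SSD. Whether it actually beats the current single-threaded path would still need to be measured.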

TODO

  • compare duplicut/unique/rling on HDD to verify my assumption
  • compare duplicut/unique/rling on massive wordlist (>30GB)

@solardiz I'd love your suggestions & opinion about duplicut & ways to optimize it 😄
