Publish a design rationale / justification #88

@cessen

Large-output hashes like MeowHash can't feasibly be shown to be high quality through empirical testing, because their state space is enormous. @NoHatCoder discussed this as well in an old issue, and it's fantastic to see that MeowHash isn't designed against SMHasher like so many other hash functions.
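A rough back-of-the-envelope sketch of why empirical testing falls short here, assuming an idealized hash whose outputs are uniformly random (the function names are hypothetical, chosen only for this illustration). It uses the standard birthday estimate: among k hashes there are about k²/2 pairs, each colliding with probability 2⁻ⁿ for an n-bit output.

```python
import math

def expected_colliding_pairs(num_hashes, output_bits):
    """Expected number of colliding pairs among num_hashes outputs,
    assuming an ideal hash with uniformly random output_bits-bit results."""
    pairs = num_hashes * (num_hashes - 1) // 2
    return pairs / 2 ** output_bits

def hashes_for_one_expected_collision(output_bits):
    """Approximate number of hashes at which one colliding pair is expected:
    k^2 / 2 = 2^n, so k = sqrt(2) * 2^(n/2)."""
    return math.sqrt(2.0) * 2.0 ** (output_bits / 2)

if __name__ == "__main__":
    trillion = 10 ** 12
    for bits in (32, 64, 128):
        print(bits,
              expected_colliding_pairs(trillion, bits),
              hashes_for_one_expected_collision(bits))
```

Under that assumption, even a trillion hashes of a 128-bit function are expected to produce essentially zero collisions, so a test suite that only counts collisions can't distinguish an ideal 128-bit hash from one that is worse by many orders of magnitude.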

However, as the authors of a large-output hash, it's important not only to avoid relying on empirical test suites in the design process, but also to not rely on them when making quality claims. Instead, alongside the hash itself, you should provide an analysis that justifies the hash design with respect to quality. It doesn't need to be fancy, but it should be reasonably complete.

This is (very unfortunately) not yet standard practice for non-cryptographic hashes, but for large-output hashes it absolutely should be.

Since MeowHash is not yet declared final, I don't think this is urgent. But I wanted to poke you guys to see if providing a design justification is at least on the roadmap as a 1.0 target. It would be great if MeowHash could help set an example for how this ought to be done.


As an aside:

I've also developed a hash (TentHash) that's trying to set an example here. However, its design focuses more on simplicity and portability than performance, so people looking for extreme performance are unlikely to reach for it.

Part of my motivation for filing this issue is that I would really like to be able to recommend MeowHash to people who need or want higher performance than TentHash offers, but at the moment I can't do that. Right now people often reach for xxHash3's 128-bit variant in such cases, which is really unfortunate because it has some questionable design decisions and really shouldn't be used. Having MeowHash as a properly justified alternative would be a huge boon.

Aside-to-the-aside:

My own very cursory analysis of MeowHash 0.5 suggests that it's not conservative about quality, lacking sufficient mixing between the incorporation of input blocks. I may have missed something elsewhere in the design that accounts for that. If so, then that's an example of something that ought to be in a design justification document. If not, then that should be fixed. There's not much point in a large-output hash that's not conservative about quality.
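To make concrete the general kind of weakness meant by "insufficient mixing between input blocks", here is a deliberately weak toy construction. This is not MeowHash's actual absorption step, and every name and constant below is an assumption made up for the illustration. Each block is XORed into a 64-bit state followed only by a rotation, so a one-bit difference in one block can be cancelled exactly by a matching one-bit difference in the next block, no matter how strong the finalizer is.

```python
MASK = (1 << 64) - 1

def rotl(x, r):
    """Rotate a 64-bit value left by r bits."""
    return ((x << r) | (x >> (64 - r))) & MASK

def finalize(x):
    """A splitmix64-style finalizer: thorough mixing, but applied only at the end."""
    x ^= x >> 30
    x = (x * 0xBF58476D1CE4E5B9) & MASK
    x ^= x >> 27
    x = (x * 0x94D049BB133111EB) & MASK
    x ^= x >> 31
    return x

def toy_hash(blocks):
    """Toy hash (hypothetical): XOR each 64-bit block into the state,
    then rotate by 8. The rotation is the only 'mixing' between blocks."""
    state = 0
    for block in blocks:
        state = rotl(state ^ block, 8)
    return finalize(state)

# Flipping bit 3 of the first block flips bit 11 of the state after the
# rotation; flipping bit 11 of the second block cancels that difference.
msg_a = [0x0123456789ABCDEF, 0xFEDCBA9876543210]
msg_b = [msg_a[0] ^ (1 << 3), msg_a[1] ^ (1 << 11)]

if __name__ == "__main__":
    print(msg_a != msg_b, toy_hash(msg_a) == toy_hash(msg_b))
```

The two messages differ yet hash identically, because the difference never diffuses before the next block is absorbed. A design justification would be the place to argue why a real design cannot suffer an analogous cancellation.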

I don't know if the 0.6 candidate hash functions (#82) address that or not, as I haven't taken a look yet.
