Discussion: Versions should use hashes #54

Open
martinheidegger opened this issue Dec 16, 2018 · 4 comments

@martinheidegger
Contributor

martinheidegger commented Dec 16, 2018

Currently, hyperdrive, hypercore, Beaker Browser (and probably a few other tools) specify a version as the length of the append-only log (a number). However, that is not a safe way to identify a version.

Problem case: a researcher wants to specify exactly which version of a dat was used and writes it as dat://ab...ef+234. The researcher then notices that the data set doesn't fit the published output, reverts to version 1, and creates a new history with exactly 234 versions that does fit the output. The link stays the same but now points at different data, so the researcher has managed to back up false claims.
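The problem can be reproduced with a toy append-only log. The sketch below is not hypercore's actual data structure (hypercore uses a Merkle tree with signed roots); it assumes a plain SHA-256 hash chain and a hypothetical helper `log_digest`, only to show that two histories of equal length cannot be told apart by length but can be told apart by hash.

```python
import hashlib


def log_digest(entries):
    """Fold every entry into a running SHA-256 chain (illustrative only)."""
    digest = b"\x00" * 32  # arbitrary starting value for the empty log
    for entry in entries:
        digest = hashlib.sha256(digest + entry).digest()
    return digest.hex()


# Two different histories that both end up with exactly 234 entries.
original = [b"measurement %d" % i for i in range(234)]
rewritten = [original[0]] + [b"adjusted %d" % i for i in range(1, 234)]

# A "+234" style version matches both histories ...
same_length = len(original) == len(rewritten)
# ... while a hash over the history distinguishes them.
same_digest = log_digest(original) == log_digest(rewritten)
print(same_length, same_digest)  # True False
```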

How can we make sure this never happens? Each version of a hypercore has a hash, which makes one version of a hyperdrive a combination of several hypercore version hashes.

Specifying a dat version by its hashes would look like this, though:

dat://<channel:64-hex-chars>+<metadata:64-hex-chars>+<content:64-hex-chars>

... and that is only for a single-writer dat. It would become even more of a hassle with a multi-writer dat (1 key for the channel + 2 hashes per writer). Note: I know that it could be okay to use only the first 8 characters of each hash as version identification, but that would probably not be good enough for a researcher.
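For illustration, a link in that form could be taken apart as follows. The function name `parse_hashed_version` and the regular expression are assumptions made for this sketch; this link format is only a thought experiment in this thread, not something an existing tool is known to accept.

```python
import re

HEX64 = r"[0-9a-f]{64}"
# Hypothetical pattern for dat://<channel>+<metadata>+<content>
LINK = re.compile(rf"^dat://({HEX64})\+({HEX64})\+({HEX64})/?$")


def parse_hashed_version(url):
    """Split a hash-versioned link into its three 64-hex-char parts."""
    match = LINK.match(url)
    if match is None:
        raise ValueError("not a hash-versioned dat link")
    channel, metadata, content = match.groups()
    return {"channel": channel, "metadata": metadata, "content": content}


# Placeholder values; real keys and hashes would not look like this.
example = "dat://" + "+".join(["ab" * 32, "cd" * 32, "ef" * 32])
parts = parse_hashed_version(example)
print(len(example))  # 200
```

Even the single-writer form is 200 characters long, which illustrates why the full-hash link is unwieldy.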

After thinking about this for a while, I came up with the following solution, which might be a good candidate for a new DEP:

(Single-writer for the sake of simplicity)

We could add one more hypercore to a hyperdrive, a "version" hypercore, that keeps an index of the versions and their hashes:

message Version { // message name is illustrative
  string hash = 1; // Hash of the version (calculated by hashing all other hashes in this message); protobuf field numbers start at 1
  repeated string tags = 2; // Names to find this version by
  int32 metadataLength = 3; // Length of the metadata-core
  string metadataHash = 4; // Hash for the version on the metadata-core
  int32 contentLength = 5; // Length of the content-core
  string contentHash = 6; // Hash for the version of the content-core
}

This way a version checkout could download all entries of the version hypercore, build a lookup table from them, and select the requested version from that table.
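The checkout flow described above could be sketched like this. Everything here is an assumption for illustration: the class `VersionEntry` and the function `build_lookup` are hypothetical names, the entries list stands in for the downloaded version hypercore, and the version hash is taken to be SHA-256 over the two core hashes concatenated, which the proposal does not pin down.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class VersionEntry:
    # Mirrors the fields of the proposed index entry.
    metadata_length: int
    metadata_hash: str
    content_length: int
    content_hash: str
    tags: list = field(default_factory=list)

    @property
    def version_hash(self):
        # Assumption: "hashing all hashes" means SHA-256 over the
        # metadata hash followed by the content hash.
        joined = (self.metadata_hash + self.content_hash).encode("ascii")
        return hashlib.sha256(joined).hexdigest()


def build_lookup(entries):
    """Index every entry by its version hash and by each of its tags."""
    table = {}
    for entry in entries:
        table[entry.version_hash] = entry
        for tag in entry.tags:
            table[tag] = entry  # a later entry wins if a tag is reused
    return table


# Stand-in for the downloaded contents of the version hypercore.
entries = [
    VersionEntry(3, "aa" * 32, 10, "bb" * 32, tags=["draft"]),
    VersionEntry(5, "cc" * 32, 17, "dd" * 32, tags=["published"]),
]
lookup = build_lookup(entries)
checkout = lookup["published"]
print(checkout.metadata_length, checkout.content_length)  # 5 17
```

The selected entry gives the lengths at which to check out the metadata and content cores, plus the hashes to verify them against.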

My questions now are:

  • Is this a reasonable approach? Do you know a better way to get that done?
  • What could a multi-writer version look like?
  • Should this be turned into a DEP?
@pfrazee
Contributor

pfrazee commented Dec 17, 2018

Good ideas. Maf and I have been discussing this. I'll let maf comment on what you're suggesting but I'll dump what I know has been done in this area:

  1. Strongly-versioned links aka Strong Links. Links which include a content hash to verify their content. We have all the internal code needed for this IIRC, and it's been up to me to implement them. Our idea was to add another + to the version, so that it looked like this: dat://{pubkey}+{n}+{hash}/.
  2. Version tags. String identifiers that can be used to identify specific points in the history. We haven't sorted out yet how this would be done, because we've been waiting for multiwriter to land so that we fully understand those requirements.
  3. Multiwriter versioning. IIRC maf came up with a way to do this without it being a nightmare, but I can't recall what it was. @mafintosh do you recall what your versioning scheme is going to be? IIRC you had a versioning solution that wasn't a vector.
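A strong link in the dat://{pubkey}+{n}+{hash}/ form from item 1 could be checked roughly as follows. This is a sketch under stated assumptions: `parse_strong_link`, `digest_at`, and `verify` are hypothetical names, and `digest_at` uses a plain SHA-256 chain as a stand-in for whatever hash the real implementation derives for a log at length n (hypercore itself uses a Merkle tree).

```python
import hashlib
import re

# Hypothetical pattern for dat://{pubkey}+{n}+{hash}/
STRONG_LINK = re.compile(r"^dat://([0-9a-f]{64})\+(\d+)\+([0-9a-f]{64})/?$")


def parse_strong_link(url):
    """Return (pubkey, n, hash) from a strong link, or raise ValueError."""
    match = STRONG_LINK.match(url)
    if match is None:
        raise ValueError("not a strong link")
    pubkey, n, digest = match.groups()
    return pubkey, int(n), digest


def digest_at(entries, n):
    """Stand-in hash of the first n entries (a simple SHA-256 chain)."""
    digest = b"\x00" * 32
    for entry in entries[:n]:
        digest = hashlib.sha256(digest + entry).digest()
    return digest.hex()


def verify(url, entries):
    """Check that the log, truncated to version n, matches the linked hash."""
    _pubkey, n, expected = parse_strong_link(url)
    return len(entries) >= n and digest_at(entries, n) == expected


log = [b"a", b"b", b"c"]
link = "dat://%s+%d+%s/" % ("ab" * 32, 2, digest_at(log, 2))
ok = verify(link, log)
tampered = verify(link, [b"a", b"x", b"c"])
print(ok, tampered)  # True False
```

Keeping {n} in the link lets a client check out the right length first and then use {hash} only as a verification step.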

@jwerle
Member

jwerle commented May 10, 2019

👋 just seeing this and the last working group notes. I put a little experiment together that tries to define a deep link based on hypercore strong links: https://github.com/jwerle/dat-deep-link
Happy to collaborate on any of this

[Edit] also happy to make the module conform to whatever ends up being the spec

@RangerMauve
Contributor

Ping @mafintosh :)

@martinheidegger
Contributor Author

martinheidegger commented Jun 5, 2019

Reference to the meeting notes: https://github.com/datprotocol/working-group/blob/master/meeting-notes/24-08May2019.md#meeting-notes

Take-aways:

  1. URL compatible
  2. Needs to work for a single-core
