feat: integrate AuthDecode #634

Open · wants to merge 118 commits into base: authdecode_2024
Conversation

@themighty1 (Member) commented Oct 10, 2024:

This PR integrates AuthDecode into the notarization protocol and makes accompanying changes to how plaintext hashes are handled in tlsn-core.
AuthDecode is marked as experimental and is feature-gated.

Depends on #479 being merged into dev first.

(I had trouble basing this PR on PR #479, so I'm basing it on dev.)

sinui0 and others added 30 commits February 8, 2024 13:52
* refactor: selective disclosure api

* remove incomplete substring proof API

* remove unnecessary type annotation

* simplify tests

* switch from unit structs to empty structs

* skip committing empty strings

* fix notary server test

* rename RecordKind to MessageKind

* update json commit error doc

* commits -> commits to

* update commit_array doc

* function argument doc styling

* Update tlsn/tlsn-core/src/proof/substrings.rs

Co-authored-by: dan <[email protected]>

---------

Co-authored-by: dan <[email protected]>
* Use env var for logging filter, remove otel.

* Fix directives.

* Revert to using config for logging filter.

* Modify default logging strategy and make filter optional.

* Revert formatting of other crates.

* Update README.

* Update notary-server/README.md

Co-authored-by: Hendrik Eeckhaut <[email protected]>

---------

Co-authored-by: Hendrik Eeckhaut <[email protected]>
* remove dead argument doc

* remove another dead argument doc

* feat(tlsn-formats): default commit to entire http request/response

* refactor(tlsn-formats): avoid duplicate HTTP commitments, add test fixtures
* add notary function to examples lib

* use hyper 1.1 version in examples

* update twitter example to use HTTP prover

* use deferred decryption in twitter example
* chore: bump tlsn-utils version

* chore: bump mpz version

* bump mpz
* bump version to v0.1.0-alpha.4

* set package version for tlsn-formats
feat: basic html info response for notary server's root endpoint

Co-authored-by: Christopher Chong <[email protected]>
* feat(tls-mpc): separate transcript size limits

* feat: separate transcript limits

* feat(tlsn-server-fixture): configurable length byte payload

* refactor(tls-mpc): use defaults in ghash setup

* fix OT estimates

* feat(notary-server): separate transcript limits

* remove dep patch

* fix notary server test
* Update repo readme.

* Doc: Added minor improvements + a link to other repos

#353

---------

Co-authored-by: Hendrik Eeckhaut <[email protected]>
* feat: automated network benches

* Update tlsn/benches/src/metrics.rs

Co-authored-by: dan <[email protected]>

* remove explicit drops

* remove unnecessary sudo

---------

Co-authored-by: dan <[email protected]>
* feat: implement record layer preprocessing

* fix ke test

* fix pa tests

* fix aead tests

* fix integration test

* Apply suggestions from code review

Co-authored-by: dan <[email protected]>

* add mode sanity check

---------

Co-authored-by: dan <[email protected]>
* Remove cargo/bin from PATH

* Modify script to run only in nightly env

* Modify script to stop the oldest version in stable env

* Modify script to support dir preparation for the 3 latest stable versions

* Modify script to start service for the 3 latest stable versions

* Modify service validation script

* Create proxy modification script

* Add step in workflow to enable ssm execution against proxy + aux script

* Add running state filter when fetching InstanceID

* Enhancement of validation script

* Modify bash behavior

* Point tags/deployment to new AWS resources

* Change GH owner to production one

* Point tags to new EC2 v1

* Move all cd scripts to a new folder

* Add comment

* Add comment

* Add comment

* Add comment

* Modify scripts to support exit on error

* Check if all stable ports are in use and terminate
* Add branches info in readme.

* Correct branch links.
* Add hot reload of api key, remove prover ip, move html static text.

* Add documentation.

* Toggle back config, add comments.

* Edit comment and html info.

* Edit comment.

* Change to sync mutex.
ci: Update rust cache in GitHub action and do not skip draft PRs
* docs: fix style in components (except tls)

* Update components/cipher/stream-cipher/src/lib.rs

Co-authored-by: Hendrik Eeckhaut <[email protected]>

* Update components/universal-hash/src/ghash/ghash_core/mod.rs

Co-authored-by: Hendrik Eeckhaut <[email protected]>

---------

Co-authored-by: Hendrik Eeckhaut <[email protected]>
heeckhau and others added 6 commits October 3, 2024 07:18
* Add tests for signing, index.

* Add error scenarios.

* Add cert tests, modify previous tests.

* Improve cert tests.

* Add tests for request.

* Fix clippy

* Fix clippy.

* Change requests test style.

* Add attestation unit tests.

* Formatting.

* Clippy.

* make data fixtures optional

---------

Co-authored-by: yuroitaki <>
Co-authored-by: sinu <[email protected]>
… url path (#614)

* (fix: client) Fixed client issue of being able to implement the path for the url

* (feat: client) Improved the code to adjust for feedback received as well as extend the path calculation to avoid adding a `/` when already starts with a `/`

* (fix: client) Fixed client issue of being able to implement the path for the url

* (feat: client) Improved the code to adjust for feedback received as well as extend the path calculation to avoid adding a `/` when already starts with a `/`

* Update crates/notary/client/src/client.rs

Co-authored-by: yuroitaki <[email protected]>

* (fix: client) Renamed `path` to `path_prefix`

* (fix: client) Remove condition on the URL

* (chore: client) Fix formatting

---------

Co-authored-by: yuroitaki <[email protected]>
@sinui0 (Member) left a comment:

Good work!

I think we can make some simplifications and avoid leaking details into core, which should not be aware of the authdecode protocol at all; core should only interact with plaintext data and hashes.

    .map(|bytes| {
        let mut bytes = bytes.to_vec();
        // Reverse to little-endian.
        bytes.reverse();
Member:

Why is this necessary? The bytes do not have an endianness; they do not encode an integer. Can this implementation detail be moved behind the authdecode API?

Member (Author):

This is necessary for F::from_bytes below, which expects LE.

My thinking was that, for the devs who will use the hash in their circom circuits, it will be conceptually simpler to think of the 31-byte chunk of plaintext as a BE/MSB0 bitstring representing the field element.
That's why I purposely stick to BE.
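
A minimal sketch of the conversion under discussion, assuming a field type with a little-endian from_bytes constructor; the trait below is a hypothetical stand-in, not the PR's actual API:

    /// Hypothetical stand-in for the field type used here; any type with
    /// a little-endian `from_bytes` constructor fits this sketch.
    trait FromLeBytes: Sized {
        fn from_bytes(bytes: &[u8]) -> Self;
    }

    /// Parses a 31-byte plaintext chunk, viewed as a BE/MSB0 bitstring,
    /// into a field element by first reversing it into little-endian.
    fn chunk_to_field_element<F: FromLeBytes>(chunk: &[u8]) -> F {
        let mut bytes = chunk.to_vec();
        bytes.reverse(); // BE -> LE, since `from_bytes` expects LE
        F::from_bytes(&bytes)
    }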

Member:

This detail should be handled behind the authdecode API. From the perspective of the core lib, it gives a slice of the plaintext and gets a hash back. It shouldn't be aware of this detail.

Member (Author):

done

@@ -236,6 +244,11 @@ pub trait HashAlgorithm {

         /// Computes the hash of the provided data with a prefix.
         fn hash_prefixed(&self, prefix: &[u8], data: &[u8]) -> Hash;
    +
    +    /// Computes the hash of the provided blinded data.
    +    fn hash_blinded(&self, data: &Blinded<Vec<u8>>) -> Hash {
Member:

This is redundant. The hash method should be sufficient

Member (Author):

This was needed because currently, when hashing the blinded plaintext, it gets serialized first, e.g. here:

    if commitment.hash.value != alg.hash_canonical(&self.data) {

This means that some serialization data will be present in the pre-image, which means that my Poseidon hasher cannot reliably know that the last 16 bytes of the pre-image are the blinder.

Recall that the Poseidon hasher places the blinder in a separate field element, so it must be able to separate it from the plaintext. (The need to put the blinder into a separate field element is a limitation of the halo2 circuit. This limitation can potentially be lifted but requires redesigning the circuit.)
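
A rough sketch of that layout, reusing the hypothetical FromLeBytes stub from the sketch above (names are illustrative, not the actual halo2 hasher):

    /// Packs the plaintext into 31-byte chunks (each fitting one ~254-bit
    /// field element) and appends the 16-byte blinder as its own, final
    /// field element, as the current halo2 circuit requires.
    fn poseidon_preimage<F: FromLeBytes>(plaintext: &[u8], blinder: &[u8; 16]) -> Vec<F> {
        let mut elems: Vec<F> = plaintext
            .chunks(31)
            .map(|chunk| {
                let mut bytes = chunk.to_vec();
                bytes.reverse(); // BE -> LE
                F::from_bytes(&bytes)
            })
            .collect();
        // The blinder must stay separable from the plaintext, so it goes
        // into its own field element rather than being mixed into the data.
        elems.push(F::from_bytes(blinder));
        elems
    }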

Member:

Makes sense that we can't use hash_canonical, but instead of adding a hash_blinded method we can just use hash and construct the preimage as needed, i.e. data | blinder.
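
A minimal sketch of that suggestion, assuming the trait's hash method has the signature fn hash(&self, data: &[u8]) -> Hash (an assumption, not the crate's confirmed API):

    /// Builds the conventional preimage `data | blinder` and hashes it
    /// with the trait's plain `hash` method, so no dedicated
    /// `hash_blinded` method is needed.
    fn hash_blinded_preimage(alg: &dyn HashAlgorithm, data: &[u8], blinder: &[u8]) -> Hash {
        let mut preimage = Vec::with_capacity(data.len() + blinder.len());
        preimage.extend_from_slice(data);    // plaintext first...
        preimage.extend_from_slice(blinder); // ...then the blinder
        alg.hash(&preimage)
    }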

Member:

This trait is a user-facing extension point, and adding this method would force implementers to be aware of how to serialize Blinded<Vec<u8>>, but we decide that. As a matter of convention, we can say all preimages are data | blinder.

Member:

it would probably be easier to switch to blinder | data at some point

Member (Author):

To summarize, are you suggesting that in this place

    if commitment.hash.value != alg.hash_canonical(&self.data) {

we should check whether the hash alg requires AuthDecode and, if so, use hash(data | blinder)? For all other algs we keep using hash_canonical(&self.data), correct?

    #[cfg(feature = "authdecode_unsafe_common")]
    /// Maximum number of bytes which can be committed to using a zk-friendly hash.
    #[builder(default = "0")]
    max_zk_friendly_hash_data: usize,
Member:

Naming is kind of clunky, perhaps max_authdecode?

Member (Author):

done

Member:

It is overkill to add single-range to the crate name. That can just be communicated via the API and changed in future versions as needed.

Member (Author):

Are you suggesting calling it e.g. authdecode-transcript and to only allow it to be instantiated with a single range?

Member:

I would just call it authdecode

Member (Author):

done, I called it transcript

    }

    /// An encoder of a TLS transcript.
    pub struct TranscriptEncoder {
Member:

I don't think we need this. The encodings can be provided directly via the API

Member (Author):

Do you have an approach in mind?
Currently in crates/core/src/transcript/encoding/encoder.rs encode_subsequence returns active encodings and is pub(crate), but I need full encodings.

crates/core/src/transcript/encoding/encoder.rs (resolved, outdated)
@@ -62,7 +59,7 @@ impl PlaintextHashProof {
         ) -> Result<(Direction, Subsequence), PlaintextHashProofError> {
             let alg = provider.get(&commitment.hash.alg)?;

    -        if commitment.hash.value != alg.hash_canonical(&self.data) {
    +        if commitment.hash.value != alg.hash_blinded(&self.data) {
Member:

It is already blinded, hash_canonical is correct. Is this an issue with how the authdecode circuit structures the preimage?

crates/core/src/transcript/proof.rs (resolved)
@@ -151,7 +170,7 @@ impl<'a> TranscriptProofBuilder<'a> {
         pub(crate) fn new(
             transcript: &'a Transcript,
             encoding_tree: Option<&'a EncodingTree>,
    -        plaintext_hashes: &'a Index<PlaintextHashSecret>,
    +        plaintext_hashes: &'a Option<Index<PlaintextHashSecret>>,
Member:

Why not just accept an empty index?

Member (Author):

Seemed more straightforward to reason: "I don't want to send you anything, so I'm sending you None."

crates/core/src/transcript/proof.rs (resolved, outdated)
@themighty1 themighty1 changed the base branch from dev to authdecode_2024 October 24, 2024 09:21
@themighty1 (Member, Author):

@sinui0, I'd be happy not to leak authdecode details into core, but the prover crate needs access to the fields of the request and the secrets in order to build authdecode inputs:

    let mut authdecode_prover = authdecode_prover(
        &request,
        &secrets,
        &*encoding_provider,
        &transcript,
        max,
    )?;

Those fields are marked as pub(crate), so the only solution was to leak authdecode into core.

I couldn't think of a simple solution to this, lmk if you have anything in mind.

@sinui0 (Member) commented Oct 24, 2024:

> Those fields are marked as pub(crate), so the only solution was to leak authdecode into core.
>
> I couldn't think of a simple solution to this, lmk if you have anything in mind.

Decouple authdecode from the attestation. The prover can instead send, as a separate message, the ranges they want to commit using authdecode. They run the protocol beforehand; afterwards, the Notary holds the hashes and inserts them into the attestation. Remember that we will also want to use this for the P2P case.
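
For illustration, the decoupling could take the shape of a standalone message type like the sketch below; the names and wire format are hypothetical, not the PR's actual types:

    use std::ops::Range;

    /// Illustrative Prover-to-Verifier messages for decoupled AuthDecode.
    enum ProverMessage {
        /// The Prover announces, ahead of time, which transcript ranges
        /// it will commit to via AuthDecode.
        AuthDecodeRanges(Vec<Range<usize>>),
        /// Sent only after AuthDecode completes; the Notary then inserts
        /// the resulting hashes when building the attestation.
        AttestationRequest(Vec<u8>),
    }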

@themighty1 (Member, Author):

@sinui0, do you have any suggestions on which approach is best here:

    vm.finalize().await?;
    let attestation: Attestation = io.expect_next().await?;

We need the prover to get the seed:

    let seed = vm.finalize().await?;

and then the seed should be passed to the caller, who will generate and send AuthDecode proofs in an external context while the prover is awaiting the attestation on line 87.

Also, we need a similar change in Verifier<Notarize>::finalize(): the verifier should not send the attestation until some external context signals that it is safe to do so.

@sinui0 (Member) commented Oct 28, 2024:

We can add a new message which the Prover sends to the Verifier to signal they want hash commitments (based on the TranscriptCommitConfig).

They start the commit phase of authdecode, finalize the VM, then finalize authdecode. Afterwards the Prover sends the attestation request.
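
A hedged sketch of that ordering on the Prover side; the traits below are stubs standing in for the actual VM and AuthDecode types, which the PR defines elsewhere:

    use std::ops::Range;

    type Seed = [u8; 32];

    /// Stub for the VM; finalizing yields the encoder seed.
    trait Vm {
        fn finalize(&mut self) -> Seed;
    }

    /// Stub for the AuthDecode prover.
    trait AuthDecode {
        fn commit(&mut self, ranges: &[Range<usize>]);
        fn finalize(&mut self, seed: Seed);
    }

    fn notarize_with_authdecode<V: Vm, A: AuthDecode>(
        vm: &mut V,
        authdecode: &mut A,
        ranges: &[Range<usize>],
    ) {
        authdecode.commit(ranges);  // 1. AuthDecode commit phase
        let seed = vm.finalize();   // 2. finalize the VM, obtaining the seed
        authdecode.finalize(seed);  // 3. finalize AuthDecode with the seed
        // 4. only now does the Prover send the attestation request
    }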

@themighty1 (Member, Author):

@sinui0, but the attestation request contains commitments required before finalization; the attestation request must be sent before finalizing the VM.
We can work around it by having the prover first commit to the attestation request, and then the rest of the steps will be as you describe above. wdyt?

@sinui0 (Member) commented Oct 29, 2024:

> We can work around it by having the prover first commit to the attestation request, and then the rest of the steps will be as you describe above. wdyt?

Let's remove the encoding commitments from the request and send them earlier. The Notary can add them when building the attestation.

@themighty1 (Member, Author) commented Oct 31, 2024:

@sinui0, what's the best approach for allocating io for the AuthDecode protocol?
I was hoping to add this to Prover<Notarize>:

    /// Allocates a new io.
    pub async fn allocate_io(&self, io_name: String) -> Io {
        self.state.mux_ctrl.open_framed(&io_name).await.unwrap()
    }

The problem is that Prover<Notarize>::finalize closes yamux, which will make it impossible for AuthDecode to finalize using the allocated io.
Any good solutions to this?

@sinui0 (Member) commented Oct 31, 2024:

I'm not sure I understand. Authdecode can be communicated over the existing io channel, and it should be completed before yamux is shut down

@themighty1 (Member, Author):

Ready for review. The biggest changes are:

  • decouple authdecode from the core crate
  • field id is sent with the plaintext hash in the attestation request
  • moved builder logic into crates/core/src/transcript/commit/builder.rs
