Feat/mr trait #24

lopeetall · 2024-01-09T17:39:26Z

Trait for MapReduce computations

Each Map needs an input type and output type, each Reduce needs an input type (which is the same as its output type).

The input and output types are DataItems: an atomic piece of data that can be represented as a vector of field elements.

To define a Map computation you must provide a function that converts between the input type and the output type and a function adding constraints to a CircuitBuilder which verify the transformation. Defining a Reduce is similar except that the functions take a pair of inputs and produce an output of the same type.

Then a MapReduce computation is an ordered list of Maps and Reduces.

nikkolasg

Good starting point ! I think we need some iteration over this PR to make sure we are both aligned on the vision and the usage of its API.
Left some long comments :D

nikkolasg · 2024-01-10T10:15:59Z

src/mapreduce/data_types.rs

+        &self,
+        builder: &mut CircuitBuilder<F, D>,
+    ) -> Vec<Target> {
+        let cons = builder.constants(&self.encode());


Why should this be a constant ? Isn't it a variable if it's in a public input ? (otherwise what's the point of exposing a constant as public input, since both verifier and prover would know it already)

Yep thanks for catching. This should be a non-constant target

nikkolasg · 2024-01-10T10:16:58Z

src/mapreduce/data_types.rs

+    }
+}
+
+struct PublicByteString(Vec<u8>);


Can you implement the trait directly on Vec<u8> rather a wrapper ? I'm not sure I see the utility of having the wrapper here, maybe I'm missing something.

I think there are opportunities to simplify here but I want to think it through. I realized yesterday that every DataItem is going to correspond to a list of field elements and so maybe a trait isn't necessary and a struct will do. Some kind of wrapper holding metadata or doing out-of-circuit bookkeeping might still be necessary. I'm thinking this over

nikkolasg · 2024-01-10T10:19:09Z

src/mapreduce/data_types.rs

+        &self,
+        builder: &mut CircuitBuilder<F, D>,
+    ) -> Vec<Target> {
+        let cons = builder.constants(&self.encode());


nikkolasg · 2024-01-10T10:20:22Z

src/mapreduce/mod.rs

+
+    /// An instance of DataItem must provide a function that encodes
+    /// the data it contains as a vector of field elements
+    fn encode<F: RichField + Extendable<D>, const D: usize>(&self) -> Vec<F>;


Given F and D are all present in each function, I'd suggest making it part of the trait constant, wdyt ?

sounds good to me!

nikkolasg · 2024-01-10T10:30:46Z

src/mapreduce/mod.rs

+    type Output: DataItem;
+
+    // The map computation to be performed on a single item.
+    // The input and output can have different types but each


Mmmhh... that is not sure.

I come from the cyclic circuit place and here, you must have the same input output pair for all the circuit, it's basically an IVC, one circuit you recurse over.

I'm ok to keep it like this here but can you note that in the case of cyclic, there must be only ONE input/output set ?

BTW in this case, we would need to have a MapReduce trait that define the union of the public inputs of the Map and Reduce step, don't we ? Can we do that ?

I don't understand the issue here. The Input and Output types can be made the same.

An initial Map in an IVC chain could be written as a "different" map which doesn't take in a proof, only initial data. Then there could be a different cyclic map following whose input and outputs are the same.

I(x) = (y, Proof)
C(y, Proof) = (z, Proof)

output = C(C(C(...(I(x))))

Or, a dummy proof could be supplied for the initial map allowing the initial map to be identical to the cyclic map. Both approaches are doable with this setup.

output = C(C(C(...C(x, DummyProof))))

What would need added to the MapReduce trait is a way to compose n Maps of the same kind by supplying n instead of manually writing them out. This would expand out to one of the two approaches above.

As for your last paragraph I'm not sure what you mean. Can you give an example where we need this? (I'm sure we can do it but a concrete example would be helpful.)

BTW since a Union is a binary operation on sets it can be formulated as a Reduce in this framework

nikkolasg · 2024-01-10T10:33:42Z

src/mapreduce/mod.rs

+        input: &Self::Input,
+        output: &Self::Output,


I don't believe this is gonna work when we apply the separation of building vs proving.
During building time, we have no inputs at all. That would be up to the trait to decide if it must instantiate one, two etc Target as u64 etc but we can not give directly the inputs during building time.
It would be great if you can re-work the traits so it fits the build and prove steps (similar to the ones I made in the #21 PR)

As I imagine things, "Proving" is a kind of Map, and proofs can be part of DataItems

The way I have thought about it is that each Map or Reduce takes in the input data and a mutable builder, adds its own constraints, and produces its output data with its eval method. When we reach a point where we want to produce a proof we can apply a special Map whose eval method builds the circuit that has accumulated and produces a proof for it, which becomes part of its output data. This gives us the flexibility choose when proofs are produced in the chain of computations instead of being forced to prove after each computational step.

nikkolasg · 2024-01-10T10:53:18Z

src/mapreduce/mod.rs

+        Self { map, reduce }
+    }
+
+    fn apply<F: RichField + Extendable<D>, const D: usize>(


I am not seeing in which situation do we want to apply map AND reduce in the same circuit.
Could we just separate the map and reduce step in two functions (and therefore have a build_map / build_reduce and prove_map and prove_reduce?

The reason I'm saying this is because I have the following two different mental model:

1. Different Circuit
Where we have defined a circuit for the map, and a circuit for the reduce step. Ideally that's what we would like to do but recursion in plonky2 and our settings (MPT) make it a bit hard (but maybe we can do it).

struct MapLagrange { m: MapReduce } impl MapLagrange { fn build_map( .... ) { let wires = self.m.map.build(); } fn prove_map( ... ) { self.m.map.prove(wires); } fn build_reduce( ... ) { let wires = self.m.reduce.build(); } fn prove_reduce ( ... ) { self.m.reduce.prove(wires); } }

2. Unified Circuit
This is in the case where we are stuck with a single monolithic circuit that we can't divide, because of the verification keys problem. So in this case, we have to apply both map and reduce and "filter" the output by having
a bool or some indication that we are in a map or reduce case.

struct LagrangeCircuit{ m: MapReduce } impl LagrangeCircuit { fn build( ... ) { let is_map = builder.new_virtual_target(); // the wires that need assignement during proving let map_wires = self.m.build_map(); // the wires that need assignement during proving let reduce_wires = self.m.build_reduce(); let map_output = self.m.map_outputs(map_wires); let reduce_output = self.m.reduce_outputs(reduce_wire); let outputs = vec![Target(0); MAX_OUTPUT]; for i in 0..MAX_OUTPUT { outputs[i] = builder.conditional(is_map, map_output[i], reduce_output[i]); } b.register_public_outputs(outputs); // save wires to be assigned to save them on disk etc } fn prove(map_wires, reduce_wires, map_or_reduce: bool, ... ) { self.m.prove_map(map_wires); self.m.prove_reduce(reduce_wires); pw.assign_target(is_map, map_or_reduce); } }

This is the first time I wrote it down like this so no worries if it's new for you :D
Does that make sense to you ? How have you seen the integration in your head while writing the traits ? It feels different than mine.

I think we are on the same page. The MR trait is designed to be flexible. A Map m:A->A followed by a Reduce r: A×A-> A can be formulated as a single Reduce r(m(x), m(y)) in this framework if we wish. Or they can be separated with a proving step in-between. Whichever we need to optimize the circuit according to the VK issue.

Users might find it convenient to separate complex computations into smaller steps for ease of writing and reasoning about the computation, but I don't want the proving points to be dictated by their choices.

nikkolasg · 2024-01-10T10:54:09Z

src/mapreduce/mod.rs

+    // A function adding constraints to a circuit builder which should
+    // be satisfied by the reduce computation in `eval`. That is,
+    // if eval(x, y) = z, then circuit(x, y, z) should be true.
+    fn add_constraints<F: RichField + Extendable<D>, const D: usize>(


Same here, we can't have real values at building time, it's only about creating wires.

I don't understand this at all. We can have "real values" but not do anything with them, right? Can you shed some light?

nikkolasg · 2024-01-10T10:55:05Z

src/mapreduce/mod.rs

+        let map_outs = self.map.apply_map(inputs, builder);
+        self.reduce.apply_reduce(&map_outs, builder)
+    }


wrt my previous comment, I don't see in which situation we apply map AND reduce in circuit at the same time. Can you share some lights ? And if it's a valid scenario, then we should probably let the user choose if he wants to apply one or the other or both no ?

…a Target

* Implement a simple version of MPT digest with ARITY 16. * Add `src/benches/digest_tree.rs` to test recursive digest circuit. * Update digest circuit recursive test. * Finish 16-arity digest circuit. * Fix to use init proof for the leaves (only prove once). * Rename digest tree to merkle tree. * Fix failed merkle tree test. * Add comments for child proof arrangement. * Add comments to the poseidon process in circuit. * Simply the Poseidon squeeze code. * Add more comments to circuit and test. * Extract standard Poseidon hash logic to `build_standard_poseidon`, and add an unit test. * fix on CyclicCircuit assign. to empty proof wires * removed comments * Fix the leaf value to `[u8; 32]`, and handle differently with branch. * Fix to use `core::array` to build inputs argument. --------- Co-authored-by: nikkolasg <[email protected]>

* Add a trait for digest circuits, both the arity circuit and multiset hashing circuit implement this trait, and could be reused in the same benchmark and testing functions. * Initialize multiset hashing circuit. * Extract Merkle tree structures. * Add test case for multiset hashing circuit. * Extract field extension and hash-to-field functions. * Add crate `plonky2_ecgfp5` to it's `Point`, `CurveTarget` and associated functions. * Rename `benches/digest/arity.rs` to `canonical_hashing.rs`. * Update dep `ahash`. * Implement `map_to_curve_point` for field arithmetic. * Implement `map_to_curve_target` for circuit. * Fix wrong io number in circuit. * Add `map_to_curve` unit tests. * Fix test case. * Delete `child_num`, and add `is_leaf` flag. * Add `debug_assertions` to `set_curve_target` code. * Add more comments for Merkle tree functions, and `debug_assertions`. * Fix to use `flat_map` when constructs inputs. * Update `plonky2_ecgfp5`. * Refactor to `group_hashing`. * Add `CircuitBuilderGroupHashing` trait for circuit builder. * Update comments. * Update comments. * Add `num_gates` benchmark. * Rename public functions in group-hashing. * Fix to use constants of inputs and outputs in `test_simple_swu_for_curve_point`. * Fix to check the output with public input of proof in tests.

nikkolasg reviewed Jan 10, 2024

View reviewed changes

lopeetall added 24 commits January 23, 2024 16:01

initial sketch of traits needed for mapreduce

0c80231

made MapReduce a struct and fixed typing

5ce5006

move neutral method to Data

b9934c7

begin implementing Sum as a reduce

0c9653b

moved neutral back to reduce trait, made circuit constructors output …

0b9373b

…a Target

name changes and finish Sum impl as Reduce

6ba8cf1

finished Sum as MapReduce and wrote test

f671dbf

fmt

3ce6f6e

restructure mods

3c866e5

generalize Data and error fixing

0bc08de

add prover for map computation

acf8c8e

reworked Data as DataItem

c2fa6b7

reworked mapreduce trait and struct for DataItem

80126fa

add struct for byte strings and little tweaks

1802ced

move generic type params around

12e4c00

remove unnecessary methods from DataItem

749266a

moved generics back; rewrote add_constraints to produce closure

6fdb4b8

removed unused imports

5133f2e

removed unnecessary generics

58b955e

removed phantomdata

a50932b

update for nikkolas

0f5f0c9

reformulate data types with Wire trait and Primitive enum

c4b7354

implemented data tree as iterator

8bbc7a0

finished Sum reduce circuit and test

cf15f70

lopeetall force-pushed the feat/mr-trait branch from 10d2bba to cf15f70 Compare January 23, 2024 21:05

lopeetall and others added 4 commits January 23, 2024 16:45

moved boilerplate to reduce wrapper struct

f4450ed

clippy fixes

4e79ed9

merkle first implementation hashing

acd8c8c

silathdiir and others added 5 commits February 22, 2024 16:12

Pin to Rust nightly-2024-02-01. (#44)

73def3a

Rename wrong toolchain file. (#45)

38053cd

restructure mods

4d82456

minor renames

aee1b7d

nikkolasg closed this Jun 18, 2024

delehef deleted the feat/mr-trait branch July 18, 2024 11:33

Feat/mr trait #24

Feat/mr trait #24

Uh oh!

Conversation

lopeetall commented Jan 9, 2024

Uh oh!

nikkolasg left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants