Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EOFCREATE: Don't hash the init-container #162

Open
Tracked by #165
chfast opened this issue Sep 6, 2024 · 24 comments
Open
Tracked by #165

EOFCREATE: Don't hash the init-container #162

chfast opened this issue Sep 6, 2024 · 24 comments
Assignees

Comments

@chfast
Copy link
Member

chfast commented Sep 6, 2024

The create address derivation for EOFCREATE is based on CREATE2.

keccack256(sender_address + salt + keccak256(init-container))

where the sender_address is the logical address of the contract invoking EOFCREATE.

We identified that the keccak256(init-container) goes against the "code non-observability" because it locks in the contents of the init-container e.g. preventing re-writing it in some future upgrade.

It also seems unnecessary expensive: EOFCREATE can only pick up one of the deploy-time sub-containers.

Solution 1: Use sub-container index

The create address is already bound to the "sender address", code is immutable (no SELFDESTRUCT) so replacing the hash of the sub-container with just its index may be enough.

Solution 2: Use code's address + sub-container index

The CREATE2 scheme uses the "sender address" with may not be the address of the code (see DELEGATECALL). I'm not sure if this is desired property for CREATE2. But for EOFCREATE this looks to be a problem. A contract may deploy different contract using DELEGATECALL proxy: for EOFCREATE inside a DELEGATECALL the same sub-container index will point to different sub-container. To fix this we can replace/combine the physical code address:

  • keccak256(code_address + salt + sub-container-index)
  • keccak256(sender_address + code_address + salt + sub-container-index)
@pdobacz pdobacz mentioned this issue Sep 9, 2024
4 tasks
@axic
Copy link
Member

axic commented Oct 3, 2024

A relevant code is CREATE3: https://github.com/Vectorized/solady/blob/main/src/utils/CREATE3.sol

Should ask for feedback from library authors.

@pdobacz
Copy link
Member

pdobacz commented Oct 7, 2024

If we decide to take away initcontainer hashing (or change it somehow), we need to revisit and ask for an update to the EOF considerations for Verkle EIPs. A link a tentative PR which handles this (and a thread about initcontainer hashing) here.

  • check Verkle EIP(s) if update is necessary

@frangio
Copy link

frangio commented Oct 7, 2024

We identified that the keccak256(init-container) goes against the "code non-observability" because it locks in the contents of the init-container e.g. preventing re-writing it in some future upgrade.

A different perspective on this: the hash of the init container locks in the semantics, not the exact code contents of the contract. This doesn't prevent future rewrites of the code because they will be semantics-preserving or inherently breaking regardless of code observability.

Code observability should be removed to the extent that CODECOPY/EXTCODECOPY can cause semantics-preserving rewrites to become breaking changes indirectly. In my opinion, it should be totally okay to rewrite code even when the address is a witness of the original code on the account. In fact, it's a good thing that there's a way to recover CODECOPY via this witness in a way that doesn't risk being broken by rewrites!

I also think this ability to provide on-chain proof of the code/semantics of an account is an important primitive that we shouldn't get rid of.

@pdobacz
Copy link
Member

pdobacz commented Oct 8, 2024

Thank you for this perspective. We're gathering inputs on this one, so this feedback is very useful.

I also think this ability to provide on-chain proof of the code/semantics of an account is an important primitive that we shouldn't get rid of.

Do the Solution 1 & 2 above still qualify as getting rid of? Note that EOFCREATE in current form doesn't allow deploying arbitrary off-chain code, only that listed as one of its subcontainers. So it can be proven on-chain that a given account has code deployed from a known address' subcontainer

@frangio
Copy link

frangio commented Oct 8, 2024

Because of DELEGATECALL I don't think Solution 1 gives any guarantees about the code/semantics of an account.

Solution 2 would work if there was a way to trace back to a "root" deployer whose address was computed with codehash. If EOFCREATE doesn't do that (and CREATE2 is "removed" via EOF), I don't think it would be possible to get a root deployer like that because creation transactions don't use the codehash.

I think the current state where the code hash is directly included is better though, because it takes a single hash to compute the address rather than a tracing process involving multiple hashes. Additionally, you may only care about proving the code of an account ignoring the code of its deployer, and if you have to trace it back to the deployer you are not able to do that. Overall I think including code hash into the address directly is significantly better.

@frangio
Copy link

frangio commented Oct 8, 2024

A note about CREATE3.sol and similar patterns: this is often used to deploy contracts via CREATE2 at a deterministic address that doesn't depend on the creation parameters (or only some of them). For example Uniswap v3 does this here:

This kind of use case is actually natively addressed by EOFCREATE! The workaround will no longer be needed because the input is not included in the address formula.

The other side to this is that users of CREATE2 that do care about the creation parameters will need a new way to validate them. Either a trusted factory mixes them into the salt, or the contract exposes getters.

There is another potential use case for CREATE3.sol, which is to deploy at a deterministic address that is fully independent of the creation code. This other use case is not only not addressed by EOFCREATE, it becomes impossible to strictly implement under EOF, although it is easy to work around by deploying a proxy instead. I don't know how common this use case is honestly.

More context here: https://github.com/moodysalem/EIPs/blob/46350bb/EIPS/eip-3171.md#motivation

@chfast
Copy link
Member Author

chfast commented Oct 15, 2024

Analysis of input data for create address

It is preferable the inputs be less than 136 bytes (the keccak256 block size).

CREATE

Input length: 23–31
Prefix byte: 0xd60xde

The input is RLP encoded sender's address and sender's nonce. The encoding is variable-length with fixed encoding for the address and variable-length encoding of the nonce.

CREATE2

Input length: 85
Prefix byte: 0xff

The input is fixed length concatenation of prefix byte 0xff, sender's address (20 bytes), user-provided salt (32 bytes) and initcode hash (32 bytes). The prefix byte has been added to avoid collisions with the CREATE scheme but this is unnecessary because both schemes don't have inputs of the same lengths.

EOFCREATE solution 2

Input length: 63 or 97
Prefix byte: none

Concatenation of the addresses the sender and the code (2x 20 bytes), user-provided salt (32 bytes) and subcontainer index (1 byte). Alternatively, we can allocate 32 bytes per address for compatibility with Address Space Expansion.
None of these total lengths match any existing schemes so the prefix byte is not necessary.

@chfast
Copy link
Member Author

chfast commented Oct 15, 2024

This kind of use case is actually natively addressed by EOFCREATE! The workaround will no longer be needed because the input is not included in the address formula.

Interesting. I was thinking about extending solution 2 to hash also inputs provided to EOFCREATE. But looks there are some use cases where this is undesirable.

The other side to this is that users of CREATE2 that do care about the creation parameters will need a new way to validate them. Either a trusted factory mixes them into the salt, or the contract exposes getters.

I think the pattern may be for user to hash inputs and combine them with the salt.

@shemnon
Copy link
Contributor

shemnon commented Oct 15, 2024

What about chained EOF creates going deep? In this case would the "code address" of the second level create be the code address of the parent? If the code address is the address if the topmost container... not so much. Because then the index could be re-used at different depths to cause different contracts to be deployed at the same address based on call data (although returncontract can do that more cleanly).

Or is it the "sender address" that gets updated in nested EOFCREATES? Either way we need tests for this scenario.

@chfast
Copy link
Member Author

chfast commented Oct 15, 2024

O: CALL A
A: EOFCREATE[1](X)
C: eofcreate_addr(sender=A, code=A, salt=X, idx=1)

O: CALL A
A: DELEGATECALL B
B: EOFCREATE[2](X)
C: eofcreate_addr(sender=A, code=B, salt=X, idx=2)

O: CALL A
A: EOFCREATE[1](X)
C: eofcreate_addr(sender=A, code=A, salt=X, idx=1)
C: EOFCREATE[2](Y) (initcode execution)
D: eofcreate_addr(sender=C, code=C?, salt=Y, idx=2)
C: deployed

O: CALL C (deployed above)
C: EOFCREATE[2](Y)
E: eofcreate_addr(sender=C, code=C?, salt=Y, idx=2)
this will generate the same address and collide with D.

@pdobacz
Copy link
Member

pdobacz commented Oct 15, 2024

this will generate the same address and collide with D.

Is this a problem though? seems OK to me. We have the a deterministic address to deploy D / E at, but the code itself (contents) is not in the witness

@frangio
Copy link

frangio commented Nov 4, 2024

Wouldn't this scheme work?

keccack256(sender_address || code_address || salt || during_init || init_subcontainer_idx)

during_init would be a boolean that is true iff EOFCREATE executes during sender init.

The combination (code_address, during_init, init_subcontainer_idx) should uniquely identify a container. If during_init == true, take the init container that created code and look at subcontainer number init_subcontainer_idx. If during_init == false, take the runtime container of code and look at subcontainer number init_subcontainer_idx.

Amending @chfast's examples above:

O: CALL A
A: EOFCREATE[1](X)
C: eofcreate_addr(sender=A, code=A, salt=X, during_init=0, idx=1)

O: CALL A
A: DELEGATECALL B
B: EOFCREATE[2](X)
C: eofcreate_addr(sender=A, code=B, salt=X, during_init=0, idx=2)

O: CALL A
A: EOFCREATE[1](X)
C: eofcreate_addr(sender=A, code=A, salt=X, during_init=0, idx=1)
C: EOFCREATE[2](Y) (initcode execution)
D: eofcreate_addr(sender=C, code=C, salt=Y, during_init=1, idx=2)

O: CALL C (deployed above)
C: EOFCREATE[2](Y)
E: eofcreate_addr(sender=C, code=C, salt=Y, during_init=0, idx=2)

E no longer equal to D

Any further nested EOFCREATE would have different sender.

@pdobacz
Copy link
Member

pdobacz commented Nov 5, 2024

EDIT: this post is likely some kind of misunderstanding on my part. We mentioned originally that code address is the address of the outer-most EOFCREATE, but actually, a scheme where code_address changes during each EOFCREATE seems to avoid address conflicts better...

Having revisited @chfast 's example after a while, I think by code_address we mean the address where the outer-most EOFCREATE in a nested chain of EOFCREATEs resides (there is no other address with code in that chain yet!), so:

O: CALL A
A: EOFCREATE[1](X)
C: eofcreate_addr(sender=A, code=A, salt=X, idx=1)
C: EOFCREATE[2](Y) (initcode execution)
D: eofcreate_addr(sender=C, code=C A, salt=Y, idx=2) <----- code not C but A here
C: deployed

This already makes D != E or putting it differently:

  1. When CALL is used to change context in a chain of calls, both sender and code will change
  2. DELEGATECALL - code will change, sender won't
  3. EOFCREATE - sender will change, code won't

However, this only changes the way one runs into the D==E conflict:

O: CALL C (deployed above)
C: DELEGATECALL A
A: EOFCREATE[2](Y)
E: eofcreate_addr(sender=C, code=A, salt=Y, idx=2) == D

but initcodes used for D and E are different (at different nesting depth in A)

@pdobacz
Copy link
Member

pdobacz commented Nov 5, 2024

...or wait, maybe we actually do want the code to change on EOFCREATE too, even though there is no code at that address yet. Combined with @frangio's during_init addition seems to work to avoid conflicts.

@gumb0
Copy link
Contributor

gumb0 commented Nov 7, 2024

...or wait, maybe we actually do want the code to change on EOFCREATE too, even though there is no code at that address yet. Combined with @frangio's during_init addition seems to work to avoid conflicts.

It doesn't sound right for nested EOFCREATE to have code_address equal to the address that outer EOFCREATE will deploy... Or maybe it should be called differently then, not code_address, but executing_address

@pdobacz
Copy link
Member

pdobacz commented Nov 7, 2024

...or wait, maybe we actually do want the code to change on EOFCREATE too, even though there is no code at that address yet. Combined with @frangio's during_init addition seems to work to avoid conflicts.

It doesn't sound right for nested EOFCREATE to have code_address equal to the address that outer EOFCREATE will deploy... Or maybe it should be called differently then, not code_address, but executing_address

The problem stems from the fact that we're swapping the code executing at the context of C - during init it is the initcode, after it is the initcode's subcontainer (RETURNCONTRACTed). This makes the two instances of EOFCREATE[2](Y) mean different things and is solved by Frangio's proposal. From that PoV it kinda makes sense - it's 2 different codes, but both live at C, so they have a common code_address, so to speak.

code_address name is just a name related to DELEGATECALL (which we are addressing here). Having this in mind, with executing_address it isn't clear to me if it's the code_address or msg.recipient address (the context which "executes" some code)...

@gumb0
Copy link
Contributor

gumb0 commented Nov 7, 2024

...or wait, maybe we actually do want the code to change on EOFCREATE too, even though there is no code at that address yet. Combined with @frangio's during_init addition seems to work to avoid conflicts.

It doesn't sound right for nested EOFCREATE to have code_address equal to the address that outer EOFCREATE will deploy... Or maybe it should be called differently then, not code_address, but executing_address

The problem stems from the fact that we're swapping the code executing at the context of C - during init it is the initcode, after it is the initcode's subcontainer (RETURNCONTRACTed). This makes the two instances of EOFCREATE[2](Y) mean different things and is solved by Frangio's proposal. From that PoV it kinda makes sense - it's 2 different codes, but both live at C, so they have a common code_address, so to speak.

From my perspective in case of EOFCREATE nested in outer EOFCREATE initcode, the inner EOFCREATE's initcode doesn't "live at C" at all, it has almost nothing to do with C. C is an address that will be deployed (or not) when outer EOFCREATE finishes.
C happens to be msg.recipient when inner EOFCREATE is being executed (this is what I call "executing address")

But this is a bit of bikeshedding. I agree during_init flag seems to solve it.

@pdobacz
Copy link
Member

pdobacz commented Nov 7, 2024

inner EOFCREATE's initcode doesn't "live at C" at all, it has almost nothing to do with C

Yeah, I see your point here. But actually the bikeshedding is useful. I revisited the option with "code_address doesn't change on EOFCREATE + during_init" with this new perspective and now it seems to me it works too, I must've made a mistake somewhere yesterday, PTAL:

O: CALL A
A: EOFCREATE1
C: eofcreate_addr(sender=A, code=A, salt=X, during_init=0, idx=1)
C: EOFCREATE2 (initcode execution)
D: eofcreate_addr(sender=C, code=A, salt=Y, during_init=1, idx=2)
C: deployed

O: CALL C (deployed above)
C: EOFCREATE2
E: eofcreate_addr(sender=C, code=C, salt=Y, during_init=0, idx=2) != D

O: CALL C (deployed above)
C: DELEGATECALL A
A: EOFCREATE2
F: eofcreate_addr(sender=C, code=A, salt=Y, during_init=0, idx=2) != D and != E

In this version code_address matches the expectations - it is where the code lives

@gumb0
Copy link
Contributor

gumb0 commented Nov 7, 2024

In this version code_address matches the expectations - it is where the code lives

Yes, I like this version more. Seems to work and not conflict on deeper nesting levels, too.

O: CALL A
A: EOFCREATE[1]
C: eofcreate_addr(sender=A, code=A, salt=X, during_init=0, idx=1)
C: EOFCREATE[2] (initcode execution)
D: eofcreate_addr(sender=C, code=A, salt=Y, during_init=1, idx=2)
D: EOFCREATE[2] (initcode execution)
G: eofcreate_addr(sender=D, code=A, salt=Y, during_init=1, idx=2)

@frangio
Copy link

frangio commented Nov 7, 2024

This seems to work. The approach seems equivalent to a list of indices pointing at a deeply nested subcontainer of code.

I find it hard to reason about though.

I think if you see G = eofcreate_addr(sender=D, code=A, salt=Y, during_init=1, idx=2), during_init=1 means that the runtime container at G was deployed by an init container located somewhere in A, in particular subcontainer index 2 of the init container that deployed D. Recursively you arrive at C, where during_init=0 means that the runtime container at C was deployed by the subcontainer A[1].

So the init containers for each of these contracts are:

  • C: A[1]
  • D: A[1][2]
  • G: A[1][2][2]

There seems to be some redundancy here:

the runtime container at G was deployed by an init container located somewhere in A, in particular subcontainer index 2 of the init container that deployed D

A is not really used in the procedure.

So an alternative could be to remove during_init and replace during_init=1 with code_address=0, since the actual code_address is implicit in sender_address in this case.

keccack256(sender_address || code_address || salt || init_subcontainer_idx)

Note that EOFCREATE in a DELEGATECALL context always results in code_address set to the target of DELEGATECALL, because that's where the init container is located, regardless of whether the sender is being deployed.

@frangio
Copy link

frangio commented Nov 7, 2024

Since it looks like we may have solved this issue I'll resurface my previous comment. With this change we would be losing the ability to make on-chain proofs about the behavior of an account without a trusted factory (although it would be recoverable with a zk-coprocessor). I do think we need to consider whether it's okay to remove that, or if it's a primitive that applications are relying on.

I'm currently weakly leaning towards probably okay to remove.

@pdobacz
Copy link
Member

pdobacz commented Nov 7, 2024

losing the ability to make on-chain proofs about the behavior of an account without a trusted factory

Can you clarify what kind of a behavior proof? Just that codehash(address) == particular hash? Could this be substituted by address_and_subcontainer_idx_of_a_particular_factory(address) == particular address and idx? That is instead of proving code hash of an address is X, we prove where exactly the code is coming from.

@frangio
Copy link

frangio commented Nov 7, 2024

Yeah that works if you have a trusted/known factory. This is probably enough.

@gumb0
Copy link
Contributor

gumb0 commented Nov 8, 2024

So an alternative could be to remove during_init and replace during_init=1 with code_address=0, since the actual code_address is implicit in sender_address in this case.

I like this variant, too. I would reframe it as we'd have 2 different schemes depending on whether EOFCREATE is called inside initcode:

  1. Non-nested EOFCREATE: keccak256(sender_address + code_address + salt + sub-container-index)
  2. Nested EOFCREATE inside initcode: keccak256(sender_address + salt + sub-container-index)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants