Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incompleteness caused by dynamic jumps #174

Open
nolanyc opened this issue Jan 9, 2025 · 4 comments
Open

Incompleteness caused by dynamic jumps #174

nolanyc opened this issue Jan 9, 2025 · 4 comments

Comments

@nolanyc
Copy link
Contributor

nolanyc commented Jan 9, 2025

Description:
Dynamic control flow introduced by dynamic jumps is broken. A dynamic jump's target address may come from memory or storage stored before. And gigahorse doesn't handle these indirect jumps, which leads to broken control flow. Reachability is the precondition of parsed code blocks being included in gigahorse IR. Broken control flow could make some code blocks unreachable and then be ignored in gigahorse IR generation.

POC:

  • source code
    contract IncompletenessPOC {
        function callee1() internal view returns (uint) {
            return 0x999;
        }
    
        function callee2() internal view returns (uint) {
            return 0x998;
        }
    
        function test() public view returns (uint) {
            function() internal view returns (uint)[2] memory funcs;
            funcs[0] = callee1;
            funcs[1] = callee2;
    
            function() internal view returns (uint) func = funcs[0];
            return func();
        }
    }
  • bin-runtime hex
    608060405234801561000f575f80fd5b5060043610610029575f3560e01c8063f8a8fd6d1461002d575b5f80fd5b61003561004b565b604051610042919061014b565b60405180910390f35b5f610054610101565b6100ef815f6002811061006a57610069610164565b5b602002019067ffffffffffffffff16908167ffffffffffffffff16815250506100f8816001600281106100a05761009f610164565b5b602002019067ffffffffffffffff16908167ffffffffffffffff1681525050610129815f600281106100d5576100d4610164565b5b602002015190506100e88163ffffffff16565b9250505090565b5f610999905090565b5f610998905090565b60405180604001604052806002905b6101298152602001906001900390816101105790505090565b610131610191565b565b5f819050919050565b61014581610133565b82525050565b5f60208201905061015e5f83018461013c565b92915050565b7f4e487b71000000000000000000000000000000000000000000000000000000005f52603260045260245ffd5b7f4e487b71000000000000000000000000000000000000000000000000000000005f52605160045260245ffdfea2646970667358221220f85da9acbfe2b470870c2989a497f2a258ebdd9424a6b153e0dd999a090f4a8d64736f6c63430008140033

Steps to Reproduce:

  1. run the tool under the latest version (for now it's the commit 3e6f86d)
    $ ./gigahorse.py --debug -C clients/visualizeout.py POC.bin-runtime.hex

Expected Behavior:

  1. TAC_Variable_Value(_, 0x999) or TAC_Variable_Value(_, 0x998)

Actual Behavior:

  1. !TAC_Variable_Value(_, 0x999) and !TAC_Variable_Value(_, 0x998)

Environment:

@sifislag
Copy link
Collaborator

sifislag commented Jan 9, 2025

Hi, thanks for the issue, this is a known limitation.
We plan to tackle this at some point but its priority is low because fixing it is non trivial and, as far as we know, these patterns are not widely adopted.

@nolanyc
Copy link
Contributor Author

nolanyc commented Jan 10, 2025

I didn't see anywhere mentioned it before in text. Anyway, I agree that these patterns are truly uncommon for now but some defi projects use them. By the way, dynamic jump targets may be passed through storage which stored in constructor. To be complete, we may need to handle the constructor code in addition to the runtime code.

@sifislag
Copy link
Collaborator

sifislag commented Jan 10, 2025

You're correct it wasn't mentioned.
However I think analyzing the initcode is not the way to go because even dynamic information inside the contract's runtime code can throw us off guard. As an example, the value of a function pointer can come from the calldata (directly, or via more complex paths).
In these cases the code will not contain the called method addresses as values, and we'll need to figure out their identities using other clues (if possible). As a result, in many cases, we'll end up having to grossly overestimate the possible called methods.
Finding a balance between generality and precision will take careful consideration.

@nolanyc
Copy link
Contributor Author

nolanyc commented Jan 11, 2025

Agree. To be totally complete, we may need to keep all JUMPDESTs or find an easy way to fallback to this bottom line.
AFAIK, at the source code level, passing an internal function pointer value directly as an argument of a public function is forbidden by the Solidity compiler. But other types can still be coerced to internal function pointer type in an assembly block, even though it's impractical, maybe only good for code obfuscation.
Considering JUMPDESTs only passed through memory and storage is mostly complete. It should be a good starting point because JUMPDESTs used as internal function pointer values are knowable from the code, namely, this JUMPDEST subset is enumerable trivially

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants