WASM Support MVP (#1495)

The WASM branch is ready to be included in the rest of Manticore. It doesn't include several advanced features, including a symbolic memory model, symbolic floating point support, or a standard library implementation (WASI/emscripten), but it does have full symbolic semantics for all of the integer instructions. ### Commit log: * Empty WASM Module Will run on a .wasm file and do nothing forever * Read and parse sections /shrug * Can partially allocate modules Just pushing for the weekend * Blacken * Can allocate with exports now * Static initialization Doesn't evaluate any of the starting expressions, but that's okay * Make type hints a little more pycharm-friendly * First part of invocation * Blacken... * Barebones execution poc Can execute the following snippet: ``` main(){ return 42; } ``` * blacken * Add some loads and stores * Add global operations * Move float methods further down * Support internal function calls ``` 18:20:42 (mc) ehennenfent@nessie:~/wasm ➜ cat loop_check.c #include <stdio.h> #include <stdbool.h> bool check(int arg){ if (((arg << 2) ^ 16) == 36) { return true; } else{ return false; } } int main(){ printf("Hello world!\n"); int g = getchar(); printf("Got: %d\n", g); int y = 0; for (int i = 0; i < 20; i++){ y++; if (y > 18){ return check(g) ? 0 : -1; } } return -1; } 18:20:54 (mc) ehennenfent@nessie:~/wasm ➜ manticore loop_check.wasm 65: i32.const (16) 11: end 65: i32.const (32) 11: end Initialization Complete 65: i32.const (0) 65: i32.const (0) 40: i32.load (Offset 4) 65: i32.const (16) 107: i32.sub 34: tee_local (Local 1) 54: i32.store (Offset 4) 65: i32.const (32) 16: call (Func Idx 2) Called stub function with args: (32,) HostFunc returned [13] 26: drop 32: get_local (Local 1) 16: call (Func Idx 0) Called stub function with args: () HostFunc returned [13] 34: tee_local (Local 0) 54: i32.store (Offset 0) 65: i32.const (16) 32: get_local (Local 1) 16: call (Func Idx 1) Called stub function with args: (16, 16) HostFunc returned [13] 26: drop 65: i32.const (0) 32: get_local (Local 1) 65: i32.const (16) 106: i32.add 54: i32.store (Offset 4) 65: i32.const (-1) 65: i32.const (0) 32: get_local (Local 0) 65: i32.const (1073741823) 113: i32.and 65: i32.const (13) 71: i32.ne 27: select 11: end [0] 2019-07-25 18:20:58,138: [13842] m.c.manticore:INFO: Generated testcase No. 0 - test 2019-07-25 18:20:58,143: [13842] m.c.manticore:INFO: Results in /home/ehennenfent/wasm/mcore_1evptpip ``` * Improve pickle-friendliness * Convert wasm module instructions into internal type s * Fix class naming conflict Thanks Pickle * Support bitvecs as an integer type * Refactor execution to handle single instructions at a time * Add Atomic Stacks It's a very inefficient implementation, but we'll make it better * Handle Concretize Exceptions Needs a lot of fine tuning but it's a start * Propagate symbols all the way to return * Add test generation infrastructure Still very basic but it's a start * Attempt to make Travis happy * Fix missing generation step * Ignore stack exhaustsion and invalid modules * Add basic shift instructions * Add lots of arithmetic instructions * Fix global initialization * Add optimization for fixed size left shift * Make Travis tests less verbose * Blacken * Properly handle modulo in shifting * Add several control flow instructions * Blacken * Partially Fix return instruction * Fix missing return value from look_forward * Update look_forward to handle nested blocks * Fix type sigs and executor copy/paste errors, deduplicate test names * Fix an instruction exit issue * Fix local ordering * Blacken * test_address now passes for wasm Fixed issue with json2mc.py issue with testname uniqueness Started some work on floating point * i hate git * i hate git * Adding linenumber to testcase function name minor float progress * f32_cmp passed for wasm * f64_cmp passes for wasm * f32_bitwise passes except for the problem with assertEqual not working with nan * f64_bitwise passes except for the problem with assertEqual not working with nan * Fixed the tests to handle [nan] comparisons * better handling of nans and infinites * added integer backing value for f64 * Blacken * Blacken * f64.nearest * Fix argument ordering and implement N-bit memory stores * Use subtests * Switch print to logger * Modify control flow instructions to match spec "Continuation" != the thing that goes inside the Label * Teporarily disable call instructions to prevent infinite loop Still need to figure out control flow for nested function calls * Blacken * Fixed infinite loop in _call via block depth tracking The behavior of `end` needs to depend on how many labels and activations are on the stack. Every time we push or pop a label, we increment or decrement the counter for the current call frame. * Blacken * Initial `br` implementation Probably still bugged * Support if blocks without an else instruction * Add call_indirect * Add grow_memory * i32_wrap_i64 * Add several arithmetic operators * Copy i32.ctz impl from x86 TZCNT * Fix off-by-one in br I think this is the correct fix (it makes all the _br tests pass) but I'm leaving the TODO there just in case. * Blacken... * Fix lingering issue with if block splitting * Fix bad args to sextend in i64_extend_s_i32 * Fix indirect call null check and Float binop return types * Improve br_table indexing See comment * Remove vestigial argument popping in invoke * Fix looking forward in _return and _block * Blacken * Generate trap and action tests * Fix extraneous call to exec_for_test * Fix missing traps in load instructions * Fix c1=0 case in ctz/clz Hopefully this also fixes the fact that the original CTZ implementations seems to have been intended for little-endian architectures * Handle traps in-stride during test execution * Treat zero division as trap * Use ctypes to convert integers * Blacken * Apparently I was wrong about the CTZ endianness thing * Better range checks for offsets This really hammers home why checking whether ea + size is _greater than_ len(mem.data) is strange if mem.data is 0-indexed, but idk, that's what the spec says. * Make stub functions return the correct number of results * ctypes can't handle floats * Add option to run the start function * Correctly handle functions with unacceptable names * Blacken * Treat NotImplemented as a Skip Also counts individual subtests directly instead of screwing around with trying to grep for them in the log. * Fix unhandled negative indices * Fixed some integer instruction bugs i64/32: div_s/u shr_s/u shl clz * Fix global and memory imports * CC * Make default loglevel debug * Rework conversion from int * Make float_store work * _actually_ fix float_store * Restore old float creation method * Blacken * Add range checks to float load/store * Force cast to unsigned in _u instructions * Fix yet another I32 -> I64 copy paste error How many times do we have to teach you this lesson, ~old~ young man? * Skip problematic float tests for now * Handle floating point inaccuracies when dividing large ints * Skip conversion and endianness tests * Rotl, rotl, popcnt * Blacken * Add some conversion stubs * Maybe fix conversions * Improve reinterpret and float demotion * Re-enable endianness tests * Include WASM logging in verbosity controls * Propagate constraints to executor * Blacken * Add prototype symbolic tests * Add api for invocation to Manticore object * Temporarily disable native/evm tests There's no reason to tie up Travis for 40 minutes every time I push a commit. * Fix wasmworld import * Concretize br_if * Pass arg generator to invoke * Download updated WABT It's fixed now! * Generate Symbolic tests (partial) * Follow m.run format * Blacken * Reinitialize Manticore every 50 tests This should cut down on the performance issues we've been seeing * Revert "Temporarily disable native/evm tests" This reverts commit 7abd6c1. * I32 and I64 pass Symbolic * Create new MC on reinit * Blacken * Fix return handling in test generation * Concretize operand in `if_` * Concretize memory addresses before access In the long run, we should have a symbolic memory model * Better memory access concretization (and br_table) * Concretize float converions and indirect call operands * Run start method * Handle size errors in wrap+i64 and select * Add timeout (and skip extra-problematic tests) * Blacken * Aggressively concretize floats * Recast ints that should have been floats * Better trap handling * Unstash float concretization * Fix log-lived 'trap' attr * Blacken * Concretize F32 when promoting to F64 _shouldn't_ strictly be necesssary, but the lazy conversion means that one of the tests fails. Until we get symbolic floating point, this will have to do. * Docstrings for platforms/wasm.py * Docstrings for ManticoreWASM * Add docs for the module structure * Add WASM to RTD * missing docstring for Module itself * Fix sphinx errors * Bump RTD python version I'm sorry and I swear I'll fix it later * Reformat docstrings for sphinx * Pacify RTD by adding newlines before params * Missed a few * Document the types * Document executor * A few more type docstrings * Partial runtime_structure docs * Finish documenting runtime structure * Fix trailing whitespace * Add collatz conjecture example * Include undocumented members in RTD * Add a bunch of events * Appropriately connect event publishers * Generate useful testcases * Blacken * Attempt at 3.6 backporting * Optimize AtomicStack * Sort solve_n output in test_solve_n Very unclear why this is happening, but it's breaking a travis build https://travis-ci.com/trailofbits/manticore/jobs/246451527 * Roll back to trusty? Unsure if this is what's causing the EVM issues * Create custom trap types * Lazily evaluate overflows May need to revert this if it slows down the Travis tests. It's 0.7% slower in the example I'm looking at, but that may not carry over. * Blacken * Address Dominik's feedback, remove stale TODOs * Expand & Test Examples * Fix relative path * Fix relative path? * capitalize Path * Fix relative path for real * Blacken... * Test finalize * Add supplemental tests for coverage * Use dedicated helper for store access * Handle quoted strings in imported module names * Allow WASMWorld to hold multiple modules Necessary for getting imports/exports to work, but still needs some changes before it passes the tests. * Actually use export maps * Fix most of the control flow issues with import tests * Use available cores for test generation * Escape module name * Don't reinit modules that have already been initialized * Use addresses for imports instead of copying * Fix removed timeout argument * Interleave registration and aliases * Fix broken "None" check for address * Blacken * Make manual exports work with address syntax * Import all registered modules Only necessary for elem, but it's tricky to make changes just for one test, so *shrug emoji* * Remove stringified annotations Turns out those only work in the same file... * Fix easy type issues in all but runtime structure * Fix easy type issues in runtime_structure * Fix* everything but imports * Blacken * Blacken 2 * Typo * Merge module structure and runtime structure * Fix instance vs type confusion * Fix mypy and concrete tests Still an issue with the symbolic tests * Still generate symbolic tests though I keep forgetting not to commit that file... * Attempt to fix env for symbolic tests * Blacken/mypy * Fix missing supplemental environment variables arg * Fix docstrings for new import style * CC * Bump timeout duration to 70 minutes I don't like it, but adding more jobs to Travis isn't going to make our builds any faster * Fix pycharm type, import, and docstring errors * Explain type: ignore * Missed two type:ignore's * Fix CLI so it doesn't throw up on arbitrary modules
trailofbits · Nov 8, 2019 · 1208759 · 1208759
1 parent 561c3ee
commit 1208759
Show file tree

Hide file tree

Showing 33 changed files with 5,632 additions and 12 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -15,13 +15,14 @@ stages:
 env:
   global:
     - CC_TEST_REPORTER_ID=db72f1ed59628c16eb0c00cbcd629c4c71f68aa1892ef42d18c7c2b8326f460a
-    - JOB_COUNT=4 # Four jobs generate test coverage: ethereum, ethereum_vm, native, and other
+    - JOB_COUNT=5 # Five jobs generate test coverage: ethereum, ethereum_vm, native, wasm, and other
     - PYTHONWARNINGS="default::ResourceWarning" # Enable ResourceWarnings
   matrix:
     - TEST_TYPE=examples
     - TEST_TYPE=ethereum
     - TEST_TYPE=ethereum_vm
     - TEST_TYPE=native
+    - TEST_TYPE=wasm
     - TEST_TYPE=other
 
 branches:
@@ -61,7 +62,7 @@ install:
   - scripts/travis_install.sh $TEST_TYPE
 
 script:
-  - travis_wait 60 scripts/travis_test.sh $TEST_TYPE
+  - travis_wait 70 scripts/travis_test.sh $TEST_TYPE
 
 after_success:
   - ./cc-test-reporter format-coverage -t coverage.py -o "coverage/codeclimate.$TEST_TYPE.json"

diff --git a/docs/index.rst b/docs/index.rst
@@ -14,6 +14,7 @@ Manticore is a symbolic execution tool for analysis of binaries and smart contra
    states
    evm
    native
+   wasm
    plugins
    gotchas
 

diff --git a/docs/wasm.rst b/docs/wasm.rst
@@ -0,0 +1,42 @@
+Web Assembly
+------------
+
+
+ManticoreWASM
+^^^^^^^^^^^^^
+
+.. automodule:: manticore.wasm.manticore
+    :members:
+    :undoc-members:
+
+
+WASM World
+^^^^^^^^^^
+
+.. automodule:: manticore.platforms.wasm
+    :members:
+    :undoc-members:
+
+
+Executor
+^^^^^^^^
+
+.. automodule:: manticore.wasm.executor
+    :members:
+    :undoc-members:
+
+
+Module Structure
+^^^^^^^^^^^^^^^^^
+
+.. automodule:: manticore.wasm.structure
+    :members:
+    :undoc-members:
+
+
+Types
+^^^^^
+
+.. automodule:: manticore.wasm.types
+    :members:
+    :undoc-members:
diff --git a/examples/wasm/collatz/collatz.c b/examples/wasm/collatz/collatz.c
@@ -0,0 +1,15 @@
+int collatz(int x){
+  if (x <= 1){
+    return 0;
+  }
+  if (x % 2 == 0){
+   return collatz(x / 2) + 1;
+  }
+  else{
+   return collatz(3 * x + 1) + 1;
+  }
+}
+
+int main() {
+  return collatz(getchar(">"));
+}
diff --git a/examples/wasm/collatz/collatz.py b/examples/wasm/collatz/collatz.py
@@ -0,0 +1,145 @@
+"""
+Three different ways of evaluating the Collatz conjecture.
+
+Example 1: Pass a symbolic argument to the `collatz` function
+Example 2: Pass a symbolic version of `getchar` as an imported function
+Example 3: Concretely evaluate `collatz(1337)` and count the instructions executed
+
+See: examples/collatz.c
+"""
+
+from manticore.wasm import ManticoreWASM
+from manticore.wasm.types import I32
+from manticore.core.plugin import Plugin
+from manticore.utils.log import set_verbosity
+
+
+print(
+    """
+
+============ Example 1 ============
+"""
+)
+
+m = ManticoreWASM("collatz.wasm")
+set_verbosity(2)
+
+
+def arg_gen(state):
+    # Generate a symbolic argument to pass to the collatz function.
+    # Possible values: 4, 6, 8
+    arg = state.new_symbolic_value(32, "collatz_arg")
+    state.constrain(arg > 3)
+    state.constrain(arg < 9)
+    state.constrain(arg % 2 == 0)
+    return [arg]
+
+
+# Set up Manticore to run the collatz function with the given argument generator.
+# We use an argument generator function instead of a list of arguments because Manticore
+# might have multiple states waiting to begin execution, and we can conveniently map a
+# generator function over all the ready states and get access to their respective
+# constraint sets.
+m.invoke("collatz", arg_gen)
+
+# Run the collatz function
+m.run()
+
+# Manually collect return values
+for idx, val_list in enumerate(m.collect_returns()):
+    print("State", idx, "::", val_list[0])
+
+# Finalize the run (dump testcases)
+m.finalize()
+
+
+print(
+    """
+
+============ Example 2 ============
+"""
+)
+
+
+def getchar(constraints, _addr):
+    """
+    Stub implementation of the getchar function. All WASM cares about is that it accepts the right
+    number of arguments and returns the correct type. All _we_ care about is that it returns a symbolic
+    value, for which Manticore will produce all possible outputs.
+
+    :param constraints: The current constraint set
+    :param _addr: Memory index of the string that gets printed by getchar
+    :return: A symbolic value of the interval [1, 7]
+    """
+    res = constraints.new_bitvec(32, "getchar_res")
+    constraints.add(res > 0)
+    constraints.add(res < 8)
+    return [res]
+
+
+# Pass our symbolic implementation of the `getchar` function into the WASM environment
+# as an import.
+m = ManticoreWASM("collatz.wasm", env={"getchar": getchar})
+
+# Invoke the main function, which will call getchar
+m.invoke("main")
+
+# Run the example
+m.run()
+
+# Manually collect return values
+for idx, val_list in enumerate(m.collect_returns()):
+    print("State", idx, "::", val_list[0])
+
+# Finalize the run (dump testcases)
+m.finalize()
+
+
+print(
+    """
+
+============ Example 3 ============
+"""
+)
+
+
+class CounterPlugin(Plugin):
+    """
+    A plugin that counts the number of times each instruction occurs
+    """
+
+    def did_execute_instruction_callback(self, state, instruction):
+        with self.locked_context("counter", dict) as ctx:
+            val = ctx.setdefault(instruction.mnemonic, 0)
+            ctx[instruction.mnemonic] = val + 1
+
+    def did_terminate_state_callback(self, state, *args):
+        insn_sum = 0
+        with self.locked_context("counter") as ctx:
+            for mnemonic, count in sorted(ctx.items(), key=lambda x: x[1], reverse=True):
+                print("{: <10} {: >4}".format(mnemonic, count))
+                insn_sum += count
+        print(insn_sum, "instructions executed")
+
+
+def arg_gen(_state):
+    return [I32(1337)]
+
+
+m = ManticoreWASM("collatz.wasm")
+
+# Registering the plugin connects its callbacks to the correct events
+m.register_plugin(CounterPlugin())
+
+# Invoke `collatz(1337)`
+m.invoke("collatz", arg_gen)
+
+# Run the collatz function
+m.run()
+
+# Manually collect return values
+for idx, val_list in enumerate(m.collect_returns()):
+    print("State", idx, "::", val_list[0])
+
+# Finalize the run (dump testcases)
+m.finalize()
diff --git a/examples/wasm/collatz/collatz.wasm b/examples/wasm/collatz/collatz.wasm
diff --git a/examples/wasm/collatz/collatz.wat b/examples/wasm/collatz/collatz.wat
@@ -0,0 +1,116 @@
+(module
+ (type $FUNCSIG$i (func (result i32)))
+ (type $FUNCSIG$ii (func (param i32) (result i32)))
+ (import "env" "getchar" (func $getchar (param i32) (result i32)))
+ (table 0 anyfunc)
+ (memory $0 1)
+ (data (i32.const 16) ">\00")
+ (export "memory" (memory $0))
+ (export "collatz" (func $collatz))
+ (export "main" (func $main))
+ (func $collatz (; 1 ;) (param $0 i32) (result i32)
+  (local $1 i32)
+  (set_local $1
+   (i32.const 0)
+  )
+  (block $label$0
+   (br_if $label$0
+    (i32.lt_s
+     (get_local $0)
+     (i32.const 2)
+    )
+   )
+   (set_local $1
+    (i32.const 0)
+   )
+   (loop $label$1
+    (set_local $1
+     (i32.add
+      (get_local $1)
+      (i32.const 1)
+     )
+    )
+    (br_if $label$1
+     (i32.gt_s
+      (tee_local $0
+       (select
+        (i32.add
+         (i32.mul
+          (get_local $0)
+          (i32.const 3)
+         )
+         (i32.const 1)
+        )
+        (i32.shr_u
+         (get_local $0)
+         (i32.const 1)
+        )
+        (i32.and
+         (get_local $0)
+         (i32.const 1)
+        )
+       )
+      )
+      (i32.const 1)
+     )
+    )
+   )
+  )
+  (get_local $1)
+ )
+ (func $main (; 2 ;) (result i32)
+  (local $0 i32)
+  (local $1 i32)
+  (set_local $1
+   (i32.const 0)
+  )
+  (block $label$0
+   (br_if $label$0
+    (i32.lt_s
+     (tee_local $0
+      (call $getchar
+       (i32.const 16)
+      )
+     )
+     (i32.const 2)
+    )
+   )
+   (set_local $1
+    (i32.const 0)
+   )
+   (loop $label$1
+    (set_local $1
+     (i32.add
+      (get_local $1)
+      (i32.const 1)
+     )
+    )
+    (br_if $label$1
+     (i32.gt_s
+      (tee_local $0
+       (select
+        (i32.add
+         (i32.mul
+          (get_local $0)
+          (i32.const 3)
+         )
+         (i32.const 1)
+        )
+        (i32.shr_u
+         (get_local $0)
+         (i32.const 1)
+        )
+        (i32.and
+         (get_local $0)
+         (i32.const 1)
+        )
+       )
+      )
+      (i32.const 1)
+     )
+    )
+   )
+  )
+  (get_local $1)
+ )
+)
diff --git a/manticore/__main__.py b/manticore/__main__.py
@@ -10,6 +10,7 @@
 from crytic_compile import is_supported, cryticparser
 from .core.manticore import ManticoreBase, set_verbosity
 from .ethereum.cli import ethereum_main
+from .wasm.cli import wasm_main
 from .utils import config, log, install_helper
 
 consts = config.get_group("main")
@@ -39,6 +40,8 @@ def main():
 
     if args.argv[0].endswith(".sol") or is_supported(args.argv[0]):
         ethereum_main(args, logger)
+    elif args.argv[0].endswith(".wasm") or args.argv[0].endswith(".wat"):
+        wasm_main(args, logger)
     else:
         install_helper.ensure_native_deps()
         native_main(args, logger)

diff --git a/manticore/core/manticore.py b/manticore/core/manticore.py
@@ -327,7 +327,7 @@ def _fork(self, state, expression, policy="ALL", setstate=None):
         to the ready list.
 
         """
-        assert isinstance(expression, Expression)
+        assert isinstance(expression, Expression), f"{type(expression)} is not an Expression"
 
         if setstate is None:
 

diff --git a/manticore/core/smtlib/operators.py b/manticore/core/smtlib/operators.py
@@ -248,6 +248,8 @@ def SREM(a, b):
         return a.srem(b)
     elif isinstance(b, BitVec):
         return b.rsrem(a)
+    elif isinstance(a, int) and isinstance(b, int):
+        return a - int(a / b) * b
     return a % b
-Original file line number
+Diff line change
@@ Expand Up @@
        states
        evm
        native
+       wasm
        plugins
        gotchas
@@ Expand Down @@