Reduce usage of concrete evaluation in x86 semantics #1241

Open
@Colton1skees

Hi @JonathanSalwan,

In x86semantics.cpp, there are 46 instruction handlers where the emitted ASTs are dependent upon a concrete value retrieved through .evaluate():

  • x86Semantics::cmpsb_s
  • x86Semantics::cmpsd_s
  • x86Semantics::cmpsq_s
  • x86Semantics::cmpxchg_s
  • x86Semantics::cmpxchg8b_s
  • x86Semantics::cpuid_s
  • x86Semantics::div_s
  • x86Semantics::idiv_s
  • x86Semantics::lodsb_s
  • x86Semantics::lodsd_s
  • x86Semantics::lodsq_s
  • x86Semantics::lodsw_s
  • x86Semantics::movsb_s
  • x86Semantics::movsd_s
  • x86Semantics::movsq_s
  • x86Semantics::movsw_s
  • x86Semantics::pextrb_s
  • x86Semantics::pextrd_s
  • x86Semantics::pextrq_s
  • x86Semantics::pextrw_s
  • x86Semantics::pinsrb_s
  • x86Semantics::pinsrd_s
  • x86Semantics::pinsrq_s
  • x86Semantics::pinsrw_s
  • x86Semantics::rcl_s
  • x86Semantics::rcr_s
  • x86Semantics::rol_s
  • x86Semantics::ror_s
  • x86Semantics::sar_s
  • x86Semantics::scasb_s
  • x86Semantics::scasd_s
  • x86Semantics::scasq_s
  • x86Semantics::scasw_s
  • x86Semantics::shl_s
  • x86Semantics::shld_s
  • x86Semantics::shr_s
  • x86Semantics::shrd_s
  • x86Semantics::stosb_s
  • x86Semantics::stosd_s
  • x86Semantics::stosq_s
  • x86Semantics::stosw_s
  • x86Semantics::vextracti128_s
  • x86Semantics::vperm2i128_s
  • x86Semantics::vpextrb_s
  • x86Semantics::vpextrq_s
  • x86Semantics::vpextrw_s

There are some similar cases (push/pushfq/pop/popfq, fxsave/fxrstor, ret, etc.) involving symbolic memory, but IMO they're out of scope for this issue since Triton does not have STORE/LOAD AST nodes.

One step towards #473 would be to replace .evaluate() usages with code to emit AST nodes which depend on a symbolic value. Any objections to me opening a PR for this?

You can break the usages of .evaluate() into four categories:

  1. String instructions (e.g. scasb_s) where the concrete value of cx is used to determine whether any nodes should be emitted
  2. Exception raisers (e.g. div_s) where the exception variable is set for cases where x86 would raise an exception
  3. General cases (e.g. pinsrb_s, vextracti128_s) where I'm not sure why a concrete value is used instead of a symbolic value
  4. Conditional undefines of a register (e.g. rol_s, where of is set to an undefined value if the src operand is greater than 1)

For these categories:

  1. An ITE node (e.g. `dst = ite(cx == 0, original_value, new_value)`) could be used instead
  2. Exception raising is probably out of scope for this issue, since the IR has no way of modeling exception raising. I would leave this as is.
  3. A symbolic value could be used
  4. An ITE node (e.g. `of = ite(src > 1, undef, of)`) could be used
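
To illustrate the difference for category 1, here is a minimal sketch in Python using a toy AST. None of these classes or functions are Triton's API; `Var`, `Const`, `Eq`, `Ite`, `evaluate`, `concretized_dst`, and `symbolic_dst` are hypothetical names made up for this example. It only shows why an expression chosen from a concrete cx at lift time stops being valid once cx takes a different value, while an ITE node keeps the dependency.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Toy AST nodes (hypothetical, not Triton's node types).
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Eq:
    lhs: "Node"
    rhs: "Node"


@dataclass(frozen=True)
class Ite:
    cond: "Node"
    then: "Node"
    other: "Node"


Node = Union[Var, Const, Eq, Ite]


def evaluate(node: Node, env: Dict[str, int]) -> int:
    """Evaluate a toy AST under a concrete assignment of variables."""
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Eq):
        return int(evaluate(node.lhs, env) == evaluate(node.rhs, env))
    if isinstance(node, Ite):
        branch = node.then if evaluate(node.cond, env) else node.other
        return evaluate(branch, env)
    raise TypeError(f"unknown node: {node!r}")


def concretized_dst(env: Dict[str, int], original: Node, new: Node) -> Node:
    """Pick the result from the concrete cx at lift time (the pattern to avoid)."""
    return original if evaluate(Var("cx"), env) == 0 else new


def symbolic_dst(original: Node, new: Node) -> Node:
    """Emit ite(cx == 0, original, new) so the result still depends on cx."""
    return Ite(Eq(Var("cx"), Const(0)), original, new)


# Lift with cx == 0, then ask what the expression means when cx == 3.
lift_env = {"cx": 0, "dst_old": 1, "dst_new": 2}
other_env = {"cx": 3, "dst_old": 1, "dst_new": 2}

concrete_expr = concretized_dst(lift_env, Var("dst_old"), Var("dst_new"))
symbolic_expr = symbolic_dst(Var("dst_old"), Var("dst_new"))
```

Under `other_env`, `concrete_expr` still evaluates to the old destination value because the cx check was resolved at lift time, whereas `symbolic_expr` selects the new value. The same shape applies to category 4 by swapping the condition for `src > 1`.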
