-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Compile-time Target Architecture Tags #143
base: master
Are you sure you want to change the base?
Conversation
This commit is the first commit among a series of patches that implement the compile-time architecture tag proposal in capstone-rust#118. It contains the following major changes: - Add the ArchTag trait. This trait indicates a "tag" type that represents a specific architecture. Its associated types specify specific data types corresponding to the architecture it represents. - Add ArchTag implementations for supported architectures, and a special DynamicArchTag that indicates the target architecture is unknown at compile time. - Update existing general type definitions (e.g. Capstone, Insn, etc.) to acquire a general arch tag type parameter.
This commit changes the bindgen settings so that bindgen generates newtype enums instead of modules of constants for arch insn group types and reg types. Usages of these binding types are updated as well.
I like your change which uses newtype_enum instead of the constified enum module! It's already a big improvement. Bindgen didn't supported them when I first did this work. In fact, I contributed the "constified enum module" feature to bindgen (rust-lang/rust-bindgen#741) to avoid the less ergonomic constified enum. However, I think we may be able to do even better (although it might be more annoying). Ideally, we would expose actual Rust enums in the high-level bindings. That way, users could take advantage of the exhaustive pattern matching with // Have internal "invalid" variant
enum ArmReg {
Invalid = 0,
APSR,
APSR_NZCV,
// ...
} // No invalid variant, so converting gives `Option<ArmReg>`
enum ArmReg {
APSR = 1,
APSR_NZCV,
// ...
} You don't need to implement this "Rust enum" change as a part of this PR--I just thought I'd let you know what I was thinking. |
Actually, I tried this at the beginning when writing the second commit. However, I quickly ran into problems. Take ARM64 registers as an example. The problem is that there are actually 2 underlying C enums that both represent valid ARM64 reg IDs: On the other side, if we generate newtype enums for these two underlying C enums, we can easily convert an impl From<Arm64Sysreg> for Arm64Reg {
fn from(sysreg: Arm64Sysreg): Self {
Self(sysreg.0)
}
} |
This commit contains changes that make existing examples and benches compile with the latest API design. Unit tests will be covered in later commits. The primary changes include: - Implement DetailArchInsn for ArchDetail. - Add ArchOperandIterator enum that iterates values of ArchOperand inside an ArchDetail. The enum is basically a wrapper that holds all available arch- specific operand iterators in different variants. This commit also contains some refactors and minor changes, including: - Rename ArchDetail to ArchInsnDetail as the old name is a bit confusing. - Fully add sysz support. - Update existing examples and benches. In most cases, to migrate these existing examples and benches, I just need to add a small number of generic type arguments (e.g. X86ArchTag, ArmArchTag, DynamicArchTag, etc.) and the code just compiles.
This commit fixes a minor problem that prevents tests from compiling and running to success. Arch-specific register IDs (represented by ArchTag::RegId) and instruction group IDs (represented by ArchTag::InsnGroupId) are generated as newtype structs whose inner type is c_uint because their C definition are just C enums. However, in a cs_detail struct, the regs_read field is an array of u16, not an array of c_uint. So we cannot just type pun on the underlying array to get &[A::RegId] because the layout is totally different. The similar problem exists for InsnDetail::regs_write and InsnDetail::groups. This commit fixes the problem by making these function return an Iterator rather than a slice. The iterator will map the underlying array elements to actual arch-specific types when iterated.
Codecov Report
@@ Coverage Diff @@
## master #143 +/- ##
==========================================
- Coverage 93.99% 86.51% -7.49%
==========================================
Files 22 18 -4
Lines 2733 1379 -1354
Branches 2687 1377 -1310
==========================================
- Hits 2569 1193 -1376
- Misses 164 186 +22
... and 4 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Makes sense to me. If we want "handcrafted enums" that are more ergonomic, it would certainly be a separate PR. |
I've updated some source code documentation to reflect the new architecture tag. However, I don't add a systematic overview or tutorial about the architecture tag as I'm not sure whether that's necessary. @tmfink Please have a look and see if the documentation can be further improved. The examples and benches are already up-to-date with the latest API design. I'm checking whether the test coverage can be further improved and after that I belive this PR is ready to be formally reviewed and merged. |
This PR resolves #118, which proposes compile-time target architecture tags. It contains a series of commits that implement the proposal. Problems may arise as I author these commits so I'm opening this PR in an early stage to make sure that we can agree on the specific designs in the end.
Here's a list of unresolved problems:
CapstoneBuilder
that buildsCapstone
instances whose target architectures are "really" unknown at compile-time. When we call functions likex86
, we have explicitly tell that the target architecture isx86
. We should implement a new builder that sets the target architecture via a runtimeArch
value.RegId
andInsnGroupId
(some architectures even theInsnId
) are simply defined as a bunch ofu32
values inside a mod rather than a dedicated enum (e.g. ARM, ARM64, x86, etc.). We’d better re-define them to be dedicated enums so that definingArchTag::RegId
andArchTag::InsnGroupId
is meaningful.After we support compile-time arch tags, many operations need not to return anOption
orResult
any more as they should always succeed given the exact target architecture. But forDynamicArchTag
these operations may still fail so we need to keep their return types as it is now. How do we resolve the conflict?Here's a TODO list to make this PR fully ready:
examples, benches andtests.