Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust bindings refactor #6309

Open
wants to merge 78 commits into
base: dev
Choose a base branch
from
Open

Rust bindings refactor #6309

wants to merge 78 commits into from

Conversation

emesare
Copy link
Member

@emesare emesare commented Jan 11, 2025

Opening this PR for public discussion. This needs more testing and also lacks a few major changes that make sense to land alongside the already existing set of breaking changes.

The Rust bindings are being heavily refactored for the 4.3 release. This is to address many of the shortcomings and historical issues, namely around maintaining and safety.

Some of the additions of this branch include:

  • Revised project structure
  • Consistent formatting (with rustfmt)
  • Many UB fixes
  • Unit tests
  • More/updated documentation
  • Real rust examples (i.e. cargo run --example)
  • Many of the already opened PR's
  • Additional APIs
  • Updated and idiomatic linking to the core
  • Consolidation of API's (removal of historical API's)
  • Rust main thread handler
  • Type driven API's (ex. RegisterId)
  • Updated CI

Reviewer Guide

TODO

  • Explain the CI and add a bunch of links to relevant GitHub Actions documentation
  • Explain the project structure (this isn't really necessary for consumers)
  • Explain the usage of Ref
  • Explain the usage of BnString (and strings in general)
  • Explain how to review PR's
  • Explain how to run tests

emesare and others added 30 commits December 22, 2024 01:21
Moves a bunch of stuff out of src/types.rs that did not belong:
- Confidence
- Variable
- Function specific stuff

- Refactored InstructionInfo, see the msp430 and riscv examples.
- Renamed Function::from_raw to Function::ref_from_raw and fixed places where the ref was incremented twice
- Fixed FunctionRecognizer leaking functions (see above)
- Fixed some apis incorrectly returning Result where Option is clearer
- Started to move destructured types to the From trait for converting to an from ffi types, see Location for an example
- Started to remove bad module level imports (importing std modules like mem all over the place)
- Moved some wrapper handle types to named handle field (this improves readability), see CoreArchitecture for an example
- Removed some unintuitive getters, this is bad practice for Rust code, just access the field directly, see DataVariable for an example
- General code cleanup, purposely did not run rustfmt, that will be a single seperate commit
- Fixed invalid views being able to invoke UB when dealing with databases
- Cleaned up some helper code in dwarf_import
- Fixed inverted is_null checks causing crashes! Oops!
Still a WIP, I think branch info is still invalid, need to figure out the issue there.

- Fixed some invalid Ref lifetimes when constructing indirectly, see Array<DataVariable> for example
- Added some more comments
- Removed some "magic" functions like MLIL Function::ssa_variables

There are still a bunch of invalid lifetimes that aren't crashing us due to the usage of those API's not living long enough. But they ARE an issue.
Trying to comment more TODO's as I go along.

- Renamed llil::Function to llil::LowLevelILFunction for clarity and consistency
- Take base structures by ref in StructureBuilder::set_base_structures to prevent premature drops
- Added more raw to wrapper conversions
- Removed UB prone apis
- Getting rid of more core module references, use std!
- Removed improper Guard usage in array impls for wrapper types with no context
- Untangling the UB of the Label api, still only half done :/
- Misc formatting
- Made Logger ref counted
- Fixed leaking name of logger every time something was logged
- Fixed the last (hopefully) of the unresolved labels
- Simplified some code
componentArray was never freed
When linking you must depend on the -sys crate.

This is because linker arguments (what says where to find binaryninjacore) are NOT transitive. The top level application crate MUST provide it.

For more information see:
- rust-lang/cargo#9554 (comment)
- oxidecomputer/omicron#225
Use cargo to manage the git repo ref instead
This is where all shipped public plugins that are not arch/view/platform/lang will be at from now on

Originally they were in the rust workspace, meaning they all shared a Cargo.lock which is ill-advised.
- More clarification on plugin/executable requirements
- Made examples actually rust examples
- Add Display impl for InstructionTextToken
- Renamed feature "noexports" to "no_exports"
We don't register a compatible log sink so they will just get sent into the void
This is being done in the hopes of curbing the multi-thousand line files that will occur once we flesh out the tests
Still need to add support for running tests in ci
- Architecture id's are now typed accordingly
- Fix some clippy lints
- Make instruction index public in LLIL
- Removed useless helper functions
- LLIL expressions and instruction indexes are now typed accordingly
This should show binaryninjacore-sys alongside binaryninja crate
- Remove lazy_static dependency
- Remove hacky impl Debug for Type and just use the display impl
- Add more debug impls
- Reorder some top level namespace items
- Add first type test
- Added main thread handler api
- Register a headless main thread handler by default in initialization
- Refactor QualifiedName to be properly owned
- Loosened some type constraints on some apis involving QualifiedName
- Fixed some apis that were crashing due to incorrect param types
- Removed extern crate cruft for log crate
- Simplified headless initialization using more wrapper apis
- Fixed segments leaking because of no ref wrapper, see BinaryViewExt::segment_at
- Added rstest to manage headless init in unit tests
- Added some more unit tests
- Refactored demangler api to be more ergonomic
- Fixed minidump plugin not building
- Fixup usage of QualifiedName in plugins
- Make QualifiedName more usable
- Rename Label to LowLevelILLabel
- Update the functions label map automatically

This fixed a major api blunder where the label map is returned as a reference and originally resulted in UB prone lifetime semantics. It was temporarily "fixed" with explicit label updates in the architecture implementation code. But that was less than ideal and was easy to mess up. Now the label map will be updated automatically as the location of labels is now tracked.
- Get rid of RawFunctionViewType
- Add better Debug impl for Function
- Fixed the documentation icon using local files (thank you @mkrasnitski)
- Fixed labels being updated and overwriting the label location used to update the label map
@emesare
Copy link
Member Author

emesare commented Jan 20, 2025

Before this is merged we should add a migration guide somewhere, most likely in the body of the first PR message.

We need to wrap this up this week, there are a few downstream items that depend on this, namely WARP (see: type containers / type archives)

We should put a pin in any further breaking changes and document the modules that are going to be changed in future releases (after next stable).

@rbran
Copy link
Contributor

rbran commented Jan 20, 2025

One suggestion is to remove the generic A: Architecture from the CallingConvention: https://github.com/Vector35/binaryninja-api/compare/rust_cleanup_0...rbran:binaryninja-api:cc-core-arch?expand=1

For two main reasons:

First the implementation seems to only use CoreArchitecture by default anyways, so a custom architecture is never specified, adding a complexity with no benefits. Also the CallingConvention struct don't make use of the function BNGetCallingConventionArchitecture.

Second to allow the user to define one Custom-CallingConvention and implement it to multiple architectures. The currently implementation only allow one CallingConvention (struct) per architecture, because CallingConventionBase define the Arch as an associated type

@rbran
Copy link
Contributor

rbran commented Jan 20, 2025

Another good to have feature is to BinaryViewReader implementing Read correctly. The idb-import plugin creates a custom implementation of std::io::Read for BinaryView instead of using BinaryViewReader because it don't implement SeekFrom::End correctly.

This can be fixed by adding a function that returns the BinaryView from the BinaryViewReader (or at least the size of it) is added to the core API, like this:

pub fn BNGetBinaryReaderView(stream: *mut BNBinaryReader) -> *mut BNBinaryView;
// or at least the size of the view
pub fn BNGetBinaryReaderBinaryViewLen(stream: *mut BNBinaryReader) -> u64;

The same applies to BinaryViewWriter

@emesare
Copy link
Member Author

emesare commented Jan 20, 2025

@rbran both of those suggestions seem completely reasonable, I will look into integrating both with this PR. Thank you!

- Added unit tests for MLIL and HLIL
- "Fixed" MLIL, LLIL, and HLIL having issues regarding Instruction vs Expression indexes
- Renamed CallingConvention to CoreCallingConvention and removed architecture generic
- Renamed CallingConventionBase to CallingConvention
- Simplified calling convention code and fixed some bugs with implicit registers
- Added impl Debug to MLIL and HLIL instructions

Still need to at some point add an Expression to MLIL and HLIL. We also might want to look into having the Instruction kind just return the expression kind.
@emesare
Copy link
Member Author

emesare commented Jan 21, 2025

Rolling up this PR is gonna be unfortunate from a git blame perspective but that was always going to be the case.

All contiguous "More rust cleanup" commits will most likely be squashed with an incrementing number.

  • Rust refactor
  • Some commit
  • Rust refactor 2
  • Another commit
  • Rust refactor 3

That or we just squash this PR down to 1 commit. I don't really mind either.

- Allow calling conventions to be registered for multiple architectures
- Swapped a unreachable statement to an unimplemented statement
@emesare
Copy link
Member Author

emesare commented Jan 21, 2025

@rbran does this look reasonable for reusing a calling convention across architectures?

b1693fd

@emesare
Copy link
Member Author

emesare commented Jan 21, 2025

test struct_grouper::test_dispatcher_header ... FAILED
test struct_grouper::test_bool_modifier ... FAILED

These were failing prior to my changes. However there still seems to be some regression when running through the python tests regarding the PDB import plugin.

@rbran
Copy link
Contributor

rbran commented Jan 21, 2025

@emesare In my view yes, You implement a CC, then register it for each architecture.

eg:

let my_cc = MyCC::new();
let my_cc_x86 = register_calling_convention(arch_x86, "MyCC-x86", my_cc.clone());
let my_cc_arm = register_calling_convention(arch_arm, "MyCC-arm", my_cc);

Although it depends on how binja understand the "cc-arch" associations. Maybe it requires a unique CC per arch somewhere inside.

@emesare
Copy link
Member Author

emesare commented Jan 21, 2025

Although it depends on how binja understand the "cc-arch" associations. Maybe it requires a unique CC per arch somewhere inside.

The cc provides the register id so as long as the arch keeps the same register id mapping everything should be ok.

All of the same checks are performed that were done prior to this change so while in theory you could receive a bad register mapping, you cannot produce a bad mapping (i.e. rust api will prohibit you from giving the core a bad register id in your CC)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Rust API Issue needs changes to the Rust API Effort: High Issue should take > 1 month Impact: Medium Issue is impactful with a bad, or no, workaround Type: Enhancement Issue is a small enhancement to existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add binaryninjacore-sys to online docs
3 participants