Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bitvector library #284

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from
Draft

Conversation

filipeom
Copy link
Member

@filipeom filipeom commented Feb 11, 2025

A library of bitvectors which aims to address the issues of the following libraries:

  • bitv: No functions for add, sum, mul, div, etc...
  • bitvec: No concat/append or extract/sub functions.

This will allow us to finally solve the issues with unsimplified extract/concat operators over concrete values

@filipeom filipeom marked this pull request as draft February 11, 2025 14:37
@filipeom filipeom linked an issue Feb 11, 2025 that may be closed by this pull request
2 tasks
@zapashcanon
Copy link
Collaborator

I believe there's now a nice bitvector library in alt-ergo, you may be able to re-use it?

@filipeom filipeom force-pushed the add-bitvector branch 6 times, most recently from 67c303a to 3262846 Compare February 16, 2025 11:50
@filipeom filipeom linked an issue Feb 17, 2025 that may be closed by this pull request
@filipeom filipeom marked this pull request as ready for review February 17, 2025 21:20
@filipeom
Copy link
Member Author

This is not 100% yet. But I think I'm going to merge it and then use Owi's fuzzer to guide the remaining implementation efforts, i.e. creating unit tests and fixing bugs (which there will be plenty 😅)

@zapashcanon
Copy link
Collaborator

I'm trying to use these changes in Owi. I already have a few comments.

Are you sure about removing Num.{I32,I64}.t ? They could be kept (but implemented using bitvectors).

On the same vein, do you think it is useful to expose the whole bitvector library through smtml ? It may be possible to keep it completely hidden and only expose the bits commonly needed (I8, I32, I62, I128) ?

Having only val make : Z.t -> int -> t to create bitvectors is no so convenient, it would be easir to use if we also had:

val make_int  : int -> int -> t (* so I don't have to look-up Z.t library's documentation :-) *)
val make_int32: Int32.t -> t (* no need to give a mask, we expect the user to *)
val make_int64: Int64.t -> t (* same *)

@@ -0,0 +1,6 @@
(library
(wrapped false)
(name bitvector)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line can be removed

@filipeom
Copy link
Member Author

filipeom commented Feb 18, 2025

Thanks for the awesome comments! This might be a long one so bear with me.

Minor comment

I'm trying to use these changes in Owi. I already have a few comments.

It might be too soon to try it in Owi. There are still many missing operators, and I’m sure there are bugs in the conversion of integers.

Question 1

Are you sure about removing Num.{I32,I64}.t ? They could be kept (but implemented using bitvectors).

The main reason I want to remove them is to make simplifications easier and expressions smaller. For instance, consider the rules for simplifying the extract and concat operators:

(* The extract rule taken from owi's symbolic_memory.ml *)
  match view v with
  | Val (Num (I32 i)) ->
    let i' = Int32.(to_int @@ logand 0xffl @@ shr_s i @@ of_int (pos * 8)) in
    value (Num (I8 i'))
  | Val (Num (I64 i)) ->
    let i' = Int64.(to_int @@ logand 0xffL @@ shr_s i @@ of_int (pos * 8)) in
    value (Num (I8 i'))

(* We could instead be written as: *)
  | Val (Bitv bv) -> value (Bitv (Bitvector.extract bv ~high ~low))

The change isn’t massive, but as we add larger integers, more rules need to be introduced.

The concat operator is where things get more annoying:

(* The concat simplifications taken from Owi's symbolic_memory.ml *)
  assert (offset > 0 && offset <= 8);
  match (view msb, view lsb) with
  | Val (Num (I8 i1)), Val (Num (I8 i2)) ->
    Value.const_i32 Int32.(logor (shl (of_int i1) 8l) (of_int i2))
  | Val (Num (I8 i1)), Val (Num (I32 i2)) ->
    let offset = offset * 8 in
    if offset < 32 then
      Value.const_i32 Int32.(logor (shl (of_int i1) (of_int offset)) i2)
    else
      let i1' = Int64.of_int i1 in
      let i2' = Int64.of_int32 i2 in
      Value.const_i64 Int64.(logor (shl i1' (of_int offset)) i2')
  | Val (Num (I8 i1)), Val (Num (I64 i2)) ->
    let offset = Int64.of_int (offset * 8) in
    Value.const_i64 Int64.(logor (shl (of_int i1) offset) i2)


(* Now becomes: *)
  match (view msb, view lsb) with
  | Val (Bitv a), Val (Bitv b) -> value (Bitv (Bitvector.concat a b))

This is why I believe that dropping I32 and I64 is the better approach. Currently, when we concatenate two I8s, we create an I32 and maintain it as long as the offset is ≤ 3. However, semantically, it would be more precise if two I8s created an I16, an I8 and an I16 created an I24, and an I8 and an I24 created an I32. Generalising this to the remaining concatenations for a 64-bit load, we would need:

| I8 | I16 | I24 | I32 | I40 | I48 | I56 | I64

This results in more pattern-matching cases in the simplification rules.

The Bitv approach also simplifies formulas such as:

Concat (Extract (Num (I32 0), 2, 0), Extract (Symbol _, 1, 0))
(* Becomes *)
Concat (Bitv bv, Extract (Symbol _, 1, 0))

where we load a single byte from memory and prepend zeros. However, to be fair, the former formula can be simplified using:

(* If we had an I24 *)
Concat (Num (I24 0), Extract (Symbol _, 1, 0))
(* If we did something more clever *)
Cvtop (Zero_extend 24, Extract(Symbol _, 1, 0))

and also allows us to eagerly remove extracts from concrete values since we would always know the bit length of a bitvector.

Question 2

On the same vein, do you think it is useful to expose the whole bitvector library through smtml ? It may be possible to keep it completely hidden and only expose the bits commonly needed (I8, I32, I62, I128) ?

I’d like it to be as accessible as the previously mentioned bitvector libraries, but I acknowledge that adding an additional library is awkward. Perhaps we could simply expose the module in the smtml public library? That way, anyone interested in the bitvector module could just import smtml and be done with it.

Question 3

Having only val make : Z.t -> int -> t to create bitvectors is no so convenient, it would be easir to use if we also had:
val make_int : int -> int -> t (* so I don't have to look-up Z.t library's documentation :-) )
val make_int32: Int32.t -> t (
no need to give a mask, we expect the user to )
val make_int64: Int64.t -> t (
same *)

That’s true. I also found it annoying while migrating the tests. I’ll add this in.

@filipeom filipeom marked this pull request as draft February 18, 2025 14:43
@filipeom
Copy link
Member Author

In any case, I can’t rush this at the moment. I have a lot of work for the next two weeks, so I’ll leave this pending. I’ll get back to it and experiment with the feedback you gave to see if there are more things I can do to improve it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for I8 and I16 bitvectors
2 participants