Improve model of heap memory management #354

The current default model for `malloc` simply assumes that it generates a fresh result every time it is called.

This is inconvenient for comparative analysis: if the two programs make an "identical" call to `malloc`, we currently assume the two calls give different results.  This causes false positives, where we conclude the programs differ because of the differing result values or, more subtly, because the memories differ after writes through those pointers.

What to do?  We certainly shouldn't assume `malloc`'s result is determined by its arguments alone, as it is stateful.  In the real world, `malloc`'s result is determined by its arguments and a bunch of allocator state stored in memory.  So it would be technically accurate to say `malloc`'s result is determined by its arguments and the current state of memory, but this doesn't really work for CBAT because we aren't modeling all that allocator state in memory.  Indeed, if the model of `malloc` doesn't also change memory, two consecutive calls to `malloc` with the same arguments would return the same result under this model.

In principle we could build an accurate model of libc's state in memory and dutifully track the ways `malloc` changes it.  But that's very time consuming (and would need to be redone for each libc implementation) and also probably extremely inefficient for analysis.

So, how about this:  Model `malloc` with a pair of uninterpreted functions: `malloc_result_model`, which is like `malloc` but with an extra integer parameter representing the current allocator state, and `malloc_state_update`, which takes the same arguments and models the change to the heap state when `malloc` is called.

The high level idea is for `malloc_result_model`'s extra allocator state parameter to be different at each call to `malloc` within one program, but the same between the two programs in comparative analysis.  Of course, we don't actually have a way to "match up" calls to `malloc` in the two programs, but perhaps it's good enough to just assume that the heap state is the same at the start of the two functions being analyzed.  The function `malloc_state_update` returns the new allocator state value for the next call to malloc.
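To make the proposal concrete, here is a minimal Python sketch of the two-function model. It is only an illustration, not CBAT code: in the real analysis both functions would stay uninterpreted in the solver, whereas here a hash stands in for "arbitrary but deterministic". The names `_uninterpreted`, `ModeledHeap`, and `run_program` are hypothetical helpers introduced for this example.

```python
import hashlib


def _uninterpreted(name, *args):
    """Concrete stand-in for an uninterpreted function (assumption:
    any deterministic function of the arguments would do here)."""
    digest = hashlib.sha256(repr((name, args)).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def malloc_result_model(size, state):
    # Like malloc, but with the allocator state as an extra parameter.
    return _uninterpreted("malloc_result_model", size, state)


def malloc_state_update(size, state):
    # Same arguments; yields the allocator state for the next call.
    return _uninterpreted("malloc_state_update", size, state)


class ModeledHeap:
    """Threads the abstract allocator state through successive calls."""

    def __init__(self, initial_state):
        self.state = initial_state

    def malloc(self, size):
        ptr = malloc_result_model(size, self.state)
        self.state = malloc_state_update(size, self.state)
        return ptr


def run_program(initial_state, sizes):
    """Simulate one program as a sequence of malloc sizes."""
    heap = ModeledHeap(initial_state)
    return [heap.malloc(size) for size in sizes]


# Both programs are assumed to start from the same allocator state.
original = run_program(0, [16, 16, 32])
modified = run_program(0, [16, 16, 32])
```

With equal initial states and the same sequence of calls, `original` and `modified` hold the same pointers, while the two consecutive `malloc(16)` calls within one program see different states and so are not forced to return the same value.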

I think this would be an improvement to our current model.  We still keep the idea that each consecutive call to `malloc` returns a different result (because the state parameter changes), but in a way that is deterministic enough that we can get the same results in comparative analysis.  I can, however, see two potential issues, which maybe we can ignore for now:

1) This model pretends that the allocator state is totally separate from memory.  But of course this allocator state parameter is a model for a whole bunch of data that malloc relies on, which is stored at various different places in memory.  Writes to memory don't change the allocator state in our model, but might in the real world.  This is a potential source of false NEGATIVES.

2) This model allows the allocator state to change after each call to malloc, but doesn't require the allocator state to be a fresh unique result every time.  This is a potential source of false positives, where malloc in this model can return the same value more often than is really the case.  On the other hand, if we somehow added the assumption that each change to the allocator state results in a truly unique result, that would also be wrong, and in a way that can cause false negatives.
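Issue 2 can be illustrated with a small sketch. Since both functions are uninterpreted, a solver may pick any interpretation for them, including a degenerate one. The interpretation below is a hypothetical choice made up for this example (as is the helper `run_program`), not something CBAT produces:

```python
def malloc_result_model(size, state):
    # Hypothetical interpretation: the pointer depends only on size and state.
    return 0x1000 + 0x100 * state + size


def malloc_state_update(size, state):
    # Nothing in the model forbids the identity function here,
    # so the allocator state never actually changes.
    return state


def run_program(initial_state, sizes):
    """Thread the allocator state through a sequence of malloc calls."""
    state = initial_state
    results = []
    for size in sizes:
        results.append(malloc_result_model(size, state))
        state = malloc_state_update(size, state)
    return results


# Two consecutive allocations of the same size alias each other.
aliased = run_program(0, [16, 16])
```

Under this interpretation both calls return the same pointer, which a real allocator would not do for two live allocations. This is the kind of spurious behavior that could surface as a false positive.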
