Improve model of heap memory management #354

@ccasin

The current default model for malloc simply assumes that it generates a fresh result every time it is called.

This is inconvenient for comparative analysis, because if the two programs have an "identical" call to malloc, we currently assume each call gives a different result. This causes false positives where we think the programs differ because of this different result value or, more subtly, because the memories differ after writes through those pointers.
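To make the false positive concrete, here is a toy Python sketch. It is not CBAT code: `FreshMalloc` and the string "symbols" are made up for illustration, and a real analysis would use unconstrained solver variables rather than strings.

```python
import itertools


class FreshMalloc:
    """Toy stand-in for the current model: every call yields a new symbol."""

    def __init__(self):
        self._counter = itertools.count()

    def malloc(self, size):
        # The size is ignored; the result is fresh no matter what.
        return f"ptr_{next(self._counter)}"


model = FreshMalloc()

# The original and modified programs each make the "same" call malloc(16).
p_orig = model.malloc(16)
p_mod = model.malloc(16)

# Nothing ties the two results together, so the comparison sees a difference.
results_differ = p_orig != p_mod
```

Here `results_differ` is true even though the two calls are identical, which is exactly the spurious difference described above.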

What to do? We certainly shouldn't assume malloc's result is determined by its arguments, as it is stateful. In the real world, malloc's result is determined by its arguments and a bunch of allocator state stored in memory. So it would be technically accurate to say malloc's result is determined by its arguments and the current state of memory, but this doesn't really work for CBAT because we aren't modeling all that allocator state in memory. Indeed, if the model of malloc doesn't also change memory, then under this model two consecutive calls to malloc with the same arguments would return the same result.
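A small Python sketch of that last point, under the assumption that malloc is modeled as a pure function of its argument and memory. The name `malloc_of_memory` and the dict-based memory are hypothetical, chosen only for this example.

```python
def malloc_of_memory(size, memory):
    # Stand-in for an uninterpreted function: the result is a symbolic term
    # that depends only on the size and the memory contents.
    return ("malloc", size, tuple(sorted(memory.items())))


memory = {0x1000: 0xAB}

first = malloc_of_memory(16, memory)
# The model of malloc did not write to memory, so the inputs are unchanged.
second = malloc_of_memory(16, memory)

same_pointer_twice = first == second
```

Because the inputs are unchanged between the calls, `same_pointer_twice` is true, which is clearly not how a real allocator behaves.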

In principle we could build an accurate model of libc's state in memory and dutifully track the ways malloc changes it. But that would be very time-consuming (and would need to be redone for each libc implementation) and also probably extremely inefficient for analysis.

So, how about this: Model malloc with a pair of uninterpreted functions: malloc_result_model, which is like malloc but with an extra integer parameter representing the current allocator state, and malloc_state_update, which takes the same arguments and models the change to the heap state when malloc is called.

The high-level idea is for malloc_result_model's extra allocator state parameter to be different at each call to malloc within one program, but the same between the two programs in comparative analysis. Of course, we don't actually have a way to "match up" calls to malloc in the two programs, but perhaps it's good enough to just assume that the heap state is the same at the start of the two functions being analyzed. The function malloc_state_update returns the new allocator state value for the next call to malloc.
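Here is a rough Python sketch of how the state would be threaded through a sequence of calls. The two functions are represented as symbolic terms (nested tuples) rather than real solver functions, and `run` and `heap_state_0` are names invented for this example.

```python
def malloc_result_model(size, state):
    # Uninterpreted: we only record which function was applied to what.
    return ("malloc_result_model", size, state)


def malloc_state_update(size, state):
    # Uninterpreted: yields the allocator state seen by the next call.
    return ("malloc_state_update", size, state)


def run(sizes, init_state="heap_state_0"):
    """Symbolically execute a sequence of malloc calls, threading the state."""
    state = init_state
    pointers = []
    for size in sizes:
        pointers.append(malloc_result_model(size, state))
        state = malloc_state_update(size, state)
    return pointers


# Both programs start from the same assumed initial heap state.
orig = run([16, 32])
mod = run([16, 32])

same_across_programs = orig == mod          # identical terms, so equal results
distinct_terms_within = orig[0] != orig[1]  # state argument differs per call
```

Identical terms are equal under every interpretation, so the two programs agree on their malloc results. Distinct terms within one program only mean the solver is not forced to equate them; it is still free to.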

I think this would be an improvement to our current model. We still keep the idea that each consecutive call to malloc returns a different result (because the state parameter changes), but in a way that is deterministic enough so that we can get the same results in comparative analysis. I can, however, see two potential issues, which maybe we can ignore for now:

  1. This model pretends that the allocator state is totally separate from memory. But of course this allocator state parameter is a model for a whole bunch of data that malloc relies on, which is stored at various different places in memory. Writes to memory don't change the allocator state in our model, but might in the real world. This is a potential source of false NEGATIVES.

  2. This model allows the allocator state to change after each call to malloc, but doesn't require the allocator state to be a fresh unique result every time. This is a potential source of false positives, where malloc in this model can return the same value more often than is really the case. On the other hand, if we somehow added the assumption that each change to the allocator state results in a truly unique result, that would also be wrong, and in a way that can cause false negatives.
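The second issue can be illustrated by picking one concrete interpretation the solver would be allowed to choose. This is a hypothetical sketch: the identity state update and the `0x1000 + state` result are arbitrary choices made only to show that nothing in the model rules them out.

```python
def malloc_state_update(size, state):
    # A permitted interpretation: the "update" leaves the state unchanged.
    return state


def malloc_result_model(size, state):
    # Any function of (size, state) is allowed; this one ignores the size.
    return 0x1000 + state


state = 0
first = malloc_result_model(16, state)
state = malloc_state_update(16, state)
second = malloc_result_model(16, state)

consecutive_calls_alias = first == second
```

Under this interpretation `consecutive_calls_alias` is true: two live allocations share an address, so the solver could report a difference that no real allocator would produce.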
