Skip to content

[LibOS] Use RW locks in the VMA tree #1794

@dimakuv

Description

@dimakuv

Problem

Multi-threaded workloads with many syscalls stress the VMA subsystem a lot, because almost all syscalls verify their buffers for read/write access using the following functions:

  • is_user_memory_readable()
  • is_user_memory_writable()
  • is_user_string_readable()
  • is_user_memory_writable_no_skip()

All these functions call test_user_memory() helper:

/*
* Tests whether whole range of memory `[addr; addr+size)` is readable, or, if `writable` is true,
* writable. The intended usage of this function is checking memory pointers passed to system calls.
* Note that this does not check the accesses to the memory themselves and is only meant to handle
* invalid syscall arguments (e.g. LTP test suite checks syscall arguments validation).
*/
static bool test_user_memory(const void* addr, size_t size, bool writable) {
if (!access_ok(addr, size)) {
return false;
}
return is_in_adjacent_user_vmas(addr, size, writable ? PROT_WRITE : PROT_READ);
}

This helper in turn calls the is_in_adjacent_user_vmas() func:

bool is_in_adjacent_user_vmas(const void* addr, size_t length, int prot) {
uintptr_t begin = (uintptr_t)addr;
uintptr_t end = begin + length;
assert(begin <= end);
struct adj_visitor_ctx ctx = {
.prot = prot,
.is_ok = true,
};
spinlock_lock(&vma_tree_lock);
bool is_continuous = _traverse_vmas_in_range(begin, end, adj_visitor, &ctx);
spinlock_unlock(&vma_tree_lock);
return is_continuous && ctx.is_ok;
}

The important part is spinlock_lock(&vma_tree_lock) and spinlock_unlock(&vma_tree_lock). On a multi-threaded app, this lock contention becomes the bottleneck.

Gramine introduced a workaround to sidestep this bottleneck, via the libos.check_invalid_pointers manifest option; it translates to the g_check_invalid_ptrs variable. However, this cannot be used in all cases:

  • Some runtimes like Java rely on being able to check invalid pointers. Thus they cannot set libos.check_invalid_pointers = false; this would lead to Java apps failing.
  • The is_user_memory_writable_no_skip() function does not honor the libos.check_invalid_pointers manifest option; this is because in certain situations Gramine really must decide whether the VMA is writable or read-only, see e.g. the ppoll() case which emulates how Linux works.

Solution

Use the RW lock that was previously introduced in the Gramine codebase: https://github.com/gramineproject/gramine/blob/master/libos/include/libos_rwlock.h

Example usage of this RW lock: f071450

Benchmark results

TODO

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions