Conversation
dimakuv
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 1 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
libos/src/bookkeep/libos_vma.c line 106 at r1 (raw file):
static struct avl_tree vma_tree = {.cmp = vma_tree_cmp}; static struct libos_rwlock vma_tree_lock; static bool vma_tree_lock_created = false;
Technically all these variables must have the prefix g_. But I didn't want to add this unrelated change in this PR.
libos/src/bookkeep/libos_vma.c line 108 at r1 (raw file):
static bool vma_tree_lock_created = false; static inline void vma_rwlock_read_lock(struct libos_rwlock* l) {
It's important to use these wrappers because at the very startup, we don't have the lock because it wasn't yet created. But at startup we have only one thread, so the lock would be redundant anyway.
Note that we can't create the lock as the very first step, because creating the lock itself requires the memory subsystem (VMA) to be fully initialized. So we disable the locking first, init the VMA subsystem, then create the lock and only then the VMA can be used in thread-safe manner.
libos/src/bookkeep/libos_vma.c line 146 at r1 (raw file):
#endif /* VMA code is supposed to use the vma_* wrappers of RW lock; hide the actual RW lock funcs */
Not sure if these define tricks are needed -- I wanted to make sure that future developers won't accidentally use rwlock_ functions but only wrappers.
libos/src/bookkeep/libos_vma.c line 1267 at r1 (raw file):
vma_rwlock_read_lock(&vma_tree_lock); bool is_continuous = _traverse_vmas_in_range(begin, end, adj_visitor, &ctx);
FYI: This is the main perf optimization (hopefully).
vasanth-intel
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)
a discussion (no related file):
We don't see any problems with the PR and it improves perf for some workloads (MySQL and MariaDB) and doesn't degrade perf for other workloads like NginX, Tensorflow, SpecPower, Tensorflow Serving and Openvino(Latency).
dimakuv
left a comment
There was a problem hiding this comment.
Dismissed @vasanth-intel from a discussion.
Reviewable status: 0 of 1 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
a discussion (no related file):
Previously, vasanth-intel wrote…
We don't see any problems with the PR and it improves perf for some workloads (MySQL and MariaDB) and doesn't degrade perf for other workloads like NginX, Tensorflow, SpecPower, Tensorflow Serving and Openvino(Latency).
Thank you @vasanth-intel for the performance evaluation!
We're done with internal testing (on the Intel side); this PR is moved from Draft to Ready for Review.
|
Jenkins, test this please (just for sanity) |
c208c3c to
b27f24e
Compare
kailun-qin
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)
libos/src/bookkeep/libos_vma.c line 108 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
It's important to use these wrappers because at the very startup, we don't have the lock because it wasn't yet created. But at startup we have only one thread, so the lock would be redundant anyway.
Note that we can't create the lock as the very first step, because creating the lock itself requires the memory subsystem (VMA) to be fully initialized. So we disable the locking first, init the VMA subsystem, then create the lock and only then the VMA can be used in thread-safe manner.
can we add these explanations into the comments?
dimakuv
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)
libos/src/bookkeep/libos_vma.c line 108 at r1 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
can we add these explanations into the comments?
Done.
kailun-qin
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners
dimakuv
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners
a discussion (no related file):
@jkr0103 Could you point to the instructions on how to run those MongoDB and/or MySQL experiments that you did?
Instructions for MySQL:sudo chown -R $USER:$USER /var/lib/mysql-files sudo ln -s /etc/apparmor.d/usr.sbin.mysqld /etc/apparmor.d/disable/ sudo vim /etc/security/limits.conf
gramine-sgx mysqld --skip-log-bin --datadir /var/run/mysql-data numactl -N 0,1 -l gramine-sgx mysqld --skip-log-bin --datadir /var/run/mysql-data Sysbench Run:sudo apt install -y sysbench sysbench --db-driver=mysql --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-db=sbtest --time=90 sysbench --db-driver=mysql --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-db=sbtest --time=300 ======================================================================
|
dimakuv
left a comment
There was a problem hiding this comment.
@jkr0103 Where can we take the MySQL and MongoDB makefiles/manifests for Gramine?
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners
|
You may find MySQL manifest amd makefiles here https://github.com/gramineproject/examples/pull/28/files#diff-3280189625504cf8039a0453946ea1b585f74319278ffd7abb72e4e71d019f57 For MongoDB: @vasanth-intel Please update here if it differ in your setup. |
chiache
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
libos/src/bookkeep/libos_vma.c line 144 at r3 (raw file):
rwlock_write_unlock(l); }
Since only vma_tree_lock will be locked and unlocked here, why not remove the parameter to these functions and dedicate them to vma_tree_lock?
libos/src/bookkeep/libos_vma.c line 149 at r3 (raw file):
if (!vma_tree_lock_created) return true; return rwlock_is_read_locked(l);
To me, if a thread is holding the write lock, it should be able to perform a read-only operation atomically. So, IMO, the condition should be `rwlock_is_read_locked(l) or rwlock_is_write_locked(l).
libos/src/bookkeep/libos_vma.c line 158 at r3 (raw file):
} #endif
Shouldn't vma_rwlock_is_read_locked and vma_rwlock_is_write_locked be defined even if DEBUG is not?
mkow
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @chiache and @dimakuv)
libos/src/bookkeep/libos_vma.c line 149 at r3 (raw file):
Previously, chiache (Chia-Che Tsai) wrote…
To me, if a thread is holding the write lock, it should be able to perform a read-only operation atomically. So, IMO, the condition should be `rwlock_is_read_locked(l) or rwlock_is_write_locked(l).
If we decide so, then the function name will require changing, otherwise it's misleading.
libos/src/bookkeep/libos_vma.c line 158 at r3 (raw file):
Previously, chiache (Chia-Che Tsai) wrote…
Shouldn't
vma_rwlock_is_read_lockedandvma_rwlock_is_write_lockedbe defined even ifDEBUGis not?
No, these functions are inherently race'y and should be used only inside asserts (i.e. only in debug builds), I don't think there's any legitimate production use for them.
kailun-qin
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @chiache and @mkow)
libos/src/bookkeep/libos_vma.c line 144 at r3 (raw file):
Previously, chiache (Chia-Che Tsai) wrote…
Since only
vma_tree_lockwill be locked and unlocked here, why not remove the parameter to these functions and dedicate them tovma_tree_lock?
Done.
libos/src/bookkeep/libos_vma.c line 149 at r3 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
If we decide so, then the function name will require changing, otherwise it's misleading.
pls check if this is something you'd expect
6e40925 to
e79d27d
Compare
kailun-qin
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 1 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @chiache and @mkow)
a discussion (no related file):
I have to rebase to revive the CI (hopefully this won't disrupt your reviews).
e79d27d to
143fb04
Compare
Multi-threaded workloads with many syscalls stress the VMA subsystem of LibOS, because almost all syscalls verify their buffers for read/write access using the functions `is_user_memory_readable()`, `is_user_memory_writable()`, etc. All these functions end up in VMA-specific `is_in_adjacent_user_vmas()` that grabs a global VMA lock. On some multi-threaded apps like MongoDB, this lock contention becomes the performance bottleneck. This commit tries to remove this bottleneck by switching from a spinlock to the Read-Write (RW) lock. The intuition is that most of the time, a read-only `is_in_adjacent_user_vmas()` func is called, which now uses the read lock. Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
143fb04 to
50d5246
Compare
efu39
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 1 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @chiache, @dimakuv, @kailun-qin, and @mkow)
libos/src/bookkeep/libos_vma.c line 697 at r6 (raw file):
assert(1 + idx == ARRAY_SIZE(init_vmas)); vma_rwlock_write_lock();
rwlock_create() is not yet done here.
Code quote:
vma_rwlock_write_lock();libos/src/bookkeep/libos_vma.c line 775 at r6 (raw file):
return -ENOMEM; } vma_tree_lock_created = true;
Should these be moved to front before line 697 where vma_rwlock_write_lock() is invoked earlier?
Code quote:
if (!rwlock_create(&vma_tree_lock)) {
return -ENOMEM;
}
vma_tree_lock_created = true;
kailun-qin
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 1 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @chiache, @dimakuv, @efu39, and @mkow)
libos/src/bookkeep/libos_vma.c line 697 at r6 (raw file):
Previously, efu39 (Erica Fu) wrote…
rwlock_create() is not yet done here.
Yes, I think this is what Dmitrii's comments were describing:
gramine/libos/src/bookkeep/libos_vma.c
Lines 114 to 121 in 50d5246
Well, upon rereading this, I don't understand the limitation and don't recall why we cannot create vma_tree_lock at the very beginning of the LibOS startup or why creating this lock requires the memory subsystem to be fully initialized. I'll need to double-check.
Signed-off-by: Kailun Qin <[email protected]>
kailun-qin
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 1 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @chiache, @efu39, and @mkow)
libos/src/bookkeep/libos_vma.c line 697 at r6 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
Yes, I think this is what Dmitrii's comments were describing:
gramine/libos/src/bookkeep/libos_vma.c
Lines 114 to 121 in 50d5246
Well, upon rereading this, I don't understand the limitation and don't recall why we cannot create
vma_tree_lockat the very beginning of the LibOS startup or why creating this lock requires the memory subsystem to be fully initialized. I'll need to double-check.
I rechecked and didn't get the limitation. There seemed to be no issues during testing when creating the lock early enough in the init. Let me know if I misunderstood anything.
libos/src/bookkeep/libos_vma.c line 775 at r6 (raw file):
Previously, efu39 (Erica Fu) wrote…
Should these be moved to front before line 697 where vma_rwlock_write_lock() is invoked earlier?
Not relevant any more.
Signed-off-by: Kailun Qin <[email protected]>
kailun-qin
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 1 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @chiache, @efu39, and @mkow)
libos/src/bookkeep/libos_vma.c line 697 at r6 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
I rechecked and didn't get the limitation. There seemed to be no issues during testing when creating the lock early enough in the init. Let me know if I misunderstood anything.
Ok, it turns out my understanding is wrong (though I can't reproduce on my setup); I've reverted back to the previous version.
mkow
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r8, all commit messages.
Reviewable status: all files reviewed, 10 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @chiache, @dimakuv, @efu39, and @kailun-qin)
libos/src/bookkeep/libos_vma.c line 106 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Technically all these variables must have the prefix
g_. But I didn't want to add this unrelated change in this PR.
Please add g_ to the global variables refactored in this PR (at least vma_tree_lock and vma_tree_lock_created).
libos/src/bookkeep/libos_vma.c line 697 at r6 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
Ok, it turns out my understanding is wrong (though I can't reproduce on my setup); I've reverted back to the previous version.
But the code up to the point of rwlock_create(&vma_tree_lock) doesn't need to take the lock? Could we just remove the lock calls from it? This way we wouldn't need this weird and rather dangerous semantics, that if the lock wasn't initialized then everything succeeds silently, without actually taking it.
In case there are some function which we really need to use which have assert, then we could split them into 2, one checking the lock and calling the other, and the other just doing the implementation without checking the lock.
libos/src/bookkeep/libos_vma.c line 123 at r8 (raw file):
* the VMA subsystem can be used in thread-safe manner. */ static inline void vma_rwlock_read_lock(void) {
I'd drop inline and let the compiler decide what's better.
Code quote:
static inlinelibos/src/bookkeep/libos_vma.c line 124 at r8 (raw file):
*/ static inline void vma_rwlock_read_lock(void) { if (!vma_tree_lock_created)
This should be assert(vma_tree_lock_created) after applying my suggestion from below.
libos/src/bookkeep/libos_vma.c line 1720 at r8 (raw file):
/* returns whether prot_refresh_vma() must be applied on a VMA */ static bool vma_update_valid_end(struct libos_vma* vma, void* _args) { assert(vma_rwlock_is_read_or_write_locked());
Hmm, I think this actually modifies VMAs, not only reads?
vma->valid_end = vma->begin + valid_length;It seems to be a bug in this PR, because dump_vmas_with_buf() only takes a read lock, not write? It takes a read-only lock, but passes non-const struct libos_vma* to the visitor function. It should either be read-only and pass const struct libos_vma*, or take a read-write lock and then pass struct libos_vma*.
6fc90e8 to
2925150
Compare
Signed-off-by: Kailun Qin <[email protected]>
2925150 to
f0f71be
Compare
kailun-qin
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 1 files reviewed, 10 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @chiache, @dimakuv, @efu39, and @mkow)
libos/src/bookkeep/libos_vma.c line 106 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Please add
g_to the global variables refactored in this PR (at leastvma_tree_lockandvma_tree_lock_created).
Done.
libos/src/bookkeep/libos_vma.c line 697 at r6 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
But the code up to the point of
rwlock_create(&vma_tree_lock)doesn't need to take the lock? Could we just remove the lock calls from it? This way we wouldn't need this weird and rather dangerous semantics, that if the lock wasn't initialized then everything succeeds silently, without actually taking it.
In case there are some function which we really need to use which have assert, then we could split them into 2, one checking the lock and calling the other, and the other just doing the implementation without checking the lock.
I like the idea, but removing lock calls from some functions may not be that straightforward?
E.g., create_mem_mgr() (before rwlock_create(&vma_tree_lock) and thus doesn't need to take the lock):
gramine/common/include/memmgr.h
Lines 136 to 137 in 6bde814
would call into
bkeep_mmap_any_in_range() where the lock call inside is hard to untangle?gramine/libos/src/bookkeep/libos_vma.c
Lines 352 to 357 in 6bde814
gramine/libos/src/bookkeep/libos_vma.c
Line 1116 in 6bde814
libos/src/bookkeep/libos_vma.c line 123 at r8 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
I'd drop
inlineand let the compiler decide what's better.
Done.
libos/src/bookkeep/libos_vma.c line 1720 at r8 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Hmm, I think this actually modifies VMAs, not only reads?
vma->valid_end = vma->begin + valid_length;It seems to be a bug in this PR, because
dump_vmas_with_buf()only takes a read lock, not write? It takes a read-only lock, but passes non-conststruct libos_vma*to the visitor function. It should either be read-only and passconst struct libos_vma*, or take a read-write lock and then passstruct libos_vma*.
Good catch! Updated.
kailun-qin
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 1 files reviewed, 10 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @chiache, @dimakuv, @efu39, and @mkow)
libos/src/bookkeep/libos_vma.c line 697 at r6 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
I like the idea, but removing lock calls from some functions may not be that straightforward?
E.g.,create_mem_mgr()(beforerwlock_create(&vma_tree_lock)and thus doesn't need to take the lock):gramine/common/include/memmgr.h
Lines 136 to 137 in 6bde814
would call intobkeep_mmap_any_in_range()where the lock call inside is hard to untangle?
gramine/libos/src/bookkeep/libos_vma.c
Lines 352 to 357 in 6bde814
gramine/libos/src/bookkeep/libos_vma.c
Line 1116 in 6bde814
Besides, could you pls remind me why there's a limitation that "creating this lock itself requires the memory subsystem (VMA) to be fully initialized"?
mkow
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r9, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @chiache, @dimakuv, @efu39, and @kailun-qin)
libos/src/bookkeep/libos_vma.c line 697 at r6 (raw file):
Besides, could you pls remind me why there's a limitation that "creating this lock itself requires the memory subsystem (VMA) to be fully initialized"?
No idea, it's you who said it doesn't work if you move the creation to the earlier place? _PalEventCreate() uses calloc(), but I'm don't remember which allocator it uses, maybe this one and that's the problem?
donporter
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 7 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @chiache, @dimakuv, @efu39, @kailun-qin, and @mkow)
libos/src/bookkeep/libos_vma.c line 697 at r6 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Besides, could you pls remind me why there's a limitation that "creating this lock itself requires the memory subsystem (VMA) to be fully initialized"?
No idea, it's you who said it doesn't work if you move the creation to the earlier place?
_PalEventCreate()usescalloc(), but I'm don't remember which allocator it uses, maybe this one and that's the problem?
FWIW, I don't see this dependency either. What happens if you just try to create/initialize the lock here?
Description of the changes
Multi-threaded workloads with many syscalls stress the VMA subsystem of LibOS, because almost all syscalls verify their buffers for read/write access using the functions
is_user_memory_readable(),is_user_memory_writable(), etc. All these functions end up in VMA-specificis_in_adjacent_user_vmas()that grabs a global VMA lock. On some multi-threaded apps like MongoDB, this lock contention becomes the performance bottleneck.This commit tries to remove this bottleneck by switching from a spinlock to the Read-Write (RW) lock. The intuition is that most of the time, a read-only
is_in_adjacent_user_vmas()func is called, which now uses the read lock.Fixes #1794.
How to test this PR?
CI for functionality testing. Manual benchmarks for perf testing.
My quick tests on Memcached, Blender and iperf show no visible change in performance. This makes sense: these workloads use no more than 4 threads, so almost no contention.
WIP: I asked to run big workloads.UPDATE 1: @jkr0103 reported these results (thanks!):
check_invalid_pointers = false: 104,694 ops/seccheck_invalid_pointers = false: 258,343 QPS (latency 4.95ms)This change is