This patch series introduces BPF_MAP_TYPE_RHASH, a new hash map type that leverages the kernel's rhashtable to provide a resizable hash map for BPF.

The existing BPF_MAP_TYPE_HASH uses a fixed number of buckets determined at map creation time. While this works well for many use cases, it presents challenges when:

1. The number of elements is unknown at creation time
2. The element count varies significantly during runtime
3. Memory efficiency is important (over-provisioning wastes memory, under-provisioning hurts performance)

BPF_MAP_TYPE_RHASH addresses these issues by using rhashtable, which automatically grows and shrinks based on load factor.

The implementation wraps the kernel's rhashtable with BPF map operations:

- Uses bpf_mem_alloc for RCU-safe memory management
- Supports all standard map operations (lookup, update, delete, get_next_key)
- Supports batch operations (lookup_batch, lookup_and_delete_batch)
- Supports BPF iterators for traversal
- Supports BPF_F_LOCK for spin locks in values
- Requires BPF_F_NO_PREALLOC flag (elements allocated on demand)
- max_entries serves as a hard limit, not bucket count

The series includes comprehensive tests:

- Basic operations in test_maps (lookup, update, delete, get_next_key)
- BPF program tests for lookup/update/delete semantics
- BPF_F_LOCK tests with concurrent access
- Stress tests for get_next_key during concurrent resize operations
- Seq file tests

Signed-off-by: Mykyta Yatsenko <[email protected]>

---

The current implementation of BPF_MAP_TYPE_RHASH does not provide the same strong guarantees on value consistency under concurrent reads/writes as BPF_MAP_TYPE_HASH.

BPF_MAP_TYPE_HASH allocates a new element and atomically swaps the pointer, so RCU readers always see a complete value. BPF_MAP_TYPE_RHASH does a memcpy in place with no lock held.

rhash trades consistency for speed (5x improvement in the update benchmark): concurrent readers can observe partially updated data. Two concurrent writers to the same key can also interleave, producing mixed values. As a solution, users can set BPF_F_LOCK to guarantee consistent reads and write serialization.

Summary of the read consistency guarantees:

  map type     | write mechanism  | read consistency
  -------------+------------------+---------------------------
  htab         | alloc, swap ptr  | always consistent (RCU)
  htab F_LOCK  | in-place + lock  | consistent if reader locks
  -------------+------------------+---------------------------
  rhtab        | in-place memcpy  | torn reads
  rhtab F_LOCK | in-place + lock  | consistent if reader locks

Changes in v2:
- Added benchmarks
- Link to v1: https://lore.kernel.org/r/[email protected]

--- b4-submit-tracking ---
{
  "series": {
    "revision": 2,
    "change-id": "20251103-rhash-7b70069923d8",
    "prefixes": [
      "RFC bpf-next"
    ],
    "history": {
      "v1": [
        "[email protected]"
      ]
    }
  }
}
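
The difference between the two write mechanisms described in the cover letter can be illustrated in plain userspace C. This is only a sketch under stated assumptions, not the kernel code: struct val, swap_slot, inplace_slot and every function name here are hypothetical, the "in flight" memcpy is modelled by copying a prefix of the value, and free() stands in for the RCU-deferred free the kernel would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical value: every writer sets both fields to the same number. */
struct val {
	uint64_t a;
	uint64_t b;
};

/* htab-style slot: readers follow a pointer that is swapped atomically. */
struct swap_slot {
	_Atomic(struct val *) cur;
};

/* rhtab-style slot: the value is stored inline and overwritten in place. */
struct inplace_slot {
	struct val v;
};

/* Build a complete copy, then publish it; readers see old or new, never a mix. */
int swap_update(struct swap_slot *s, const struct val *nv)
{
	struct val *fresh = malloc(sizeof(*fresh));
	struct val *old;

	if (!fresh)
		return -1;
	*fresh = *nv;
	old = atomic_exchange(&s->cur, fresh);
	free(old); /* assumption: the kernel would wait for an RCU grace period */
	return 0;
}

/*
 * Copy only the first `bytes` bytes of the new value. This models what a
 * reader may observe while an unlocked in-place memcpy is still running.
 */
void inplace_update_partial(struct inplace_slot *s, const struct val *nv,
			    size_t bytes)
{
	assert(bytes <= sizeof(*nv));
	memcpy(&s->v, nv, bytes);
}

/* A value is consistent when both fields come from the same write. */
int val_consistent(const struct val *v)
{
	return v->a == v->b;
}
```

Stopping inplace_update_partial() after sizeof(uint64_t) bytes leaves a holding the new number and b the old one, which is the torn read listed in the table. A value published through swap_update() always passes val_consistent().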
Add resizable hash map into enums where it is needed.

Signed-off-by: Mykyta Yatsenko <[email protected]>
Introduce basic operations for BPF_MAP_TYPE_RHASH, a new hash map type built on top of the kernel's rhashtable.

Key implementation details:

- Uses rhashtable for automatic resizing with RCU-safe operations
- Elements allocated via bpf_mem_alloc for lock-free allocation
- Supports BPF_F_LOCK for spin_lock protected values
- Requires BPF_F_NO_PREALLOC

Implemented map operations:

* map_alloc/map_free: Initialize and destroy the rhashtable
* map_lookup_elem: RCU-protected lookup via rhashtable_lookup
* map_update_elem: Insert or update with BPF_NOEXIST/EXIST/ANY
* map_delete_elem: Remove element with RCU-deferred freeing
* map_get_next_key: Returns the next key in the table
* map_release_uref: Free internal structs (timers, workqueues)

Other operations (batch, seq file) are implemented in the next patch.

Signed-off-by: Mykyta Yatsenko <[email protected]>
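
The BPF_NOEXIST/EXIST/ANY handling in map_update_elem can be sketched as a small flag check. This is a minimal userspace illustration and not code from the patch: the SKETCH_* constants mirror the values in include/uapi/linux/bpf.h, check_update_flags() and update_flags_demo() are hypothetical names, and the assumption is that rhtab follows the same semantics as the existing hash map.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the UAPI flag values from include/uapi/linux/bpf.h. */
enum {
	SKETCH_BPF_ANY = 0,     /* create a new element or update an existing one */
	SKETCH_BPF_NOEXIST = 1, /* create only if the key is absent */
	SKETCH_BPF_EXIST = 2,   /* update only if the key is present */
	SKETCH_BPF_F_LOCK = 4,  /* spin_lock-protected update, orthogonal to the above */
};

/* Decide whether an update may proceed, given whether the key was found. */
int check_update_flags(bool key_exists, uint64_t flags)
{
	/* BPF_F_LOCK does not affect the existence check, so mask it out. */
	uint64_t mode = flags & ~(uint64_t)SKETCH_BPF_F_LOCK;

	if (key_exists && mode == SKETCH_BPF_NOEXIST)
		return -EEXIST;
	if (!key_exists && mode == SKETCH_BPF_EXIST)
		return -ENOENT;
	return 0;
}

/* Usage example: the two failing combinations and one that succeeds. */
void update_flags_demo(void)
{
	assert(check_update_flags(true, SKETCH_BPF_NOEXIST) == -EEXIST);
	assert(check_update_flags(false, SKETCH_BPF_EXIST) == -ENOENT);
	assert(check_update_flags(false, SKETCH_BPF_ANY) == 0);
}
```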
Add batch operations and BPF iterator support for BPF_MAP_TYPE_RHASH.

Batch operations:

* rhtab_map_lookup_batch: Bulk lookup of elements by bucket
* rhtab_map_lookup_and_delete_batch: Atomic bulk lookup and delete

The batch implementation iterates through buckets under RCU protection, copying keys and values to userspace buffers. When the buffer fills mid-bucket, it rolls back to the bucket boundary so the next call can retry that bucket completely.

BPF iterator:

* Uses rhashtable_walk_* API for safe iteration
* Handles -EAGAIN during table resize transparently
* Tracks skip_elems to resume iteration across read() calls

Also implements rhtab_map_mem_usage() to report memory consumption.

Signed-off-by: Mykyta Yatsenko <[email protected]>
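
The rollback-to-bucket-boundary rule for batch lookup can be shown with a toy model. This is a sketch under assumptions, not the patch's code: toy_table and toy_lookup_batch() are hypothetical, buckets are fixed arrays of integers rather than rhashtable chains, and returning -ENOSPC when not even one bucket fits is an assumption borrowed from the existing hash map batch behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_BUCKETS 3
#define MAX_PER_BUCKET 4

/* Toy table: a fixed array of buckets, each holding a few integer keys. */
struct toy_table {
	int keys[NR_BUCKETS][MAX_PER_BUCKET];
	size_t cnt[NR_BUCKETS];
};

/*
 * Copy whole buckets into out[], starting at bucket *cursor. A bucket that
 * does not fit in the remaining space is not copied at all, and *cursor is
 * left pointing at it so the next call retries it from its first element.
 * Returns the number of keys copied, or -ENOSPC if the first bucket alone
 * is larger than the buffer.
 */
int toy_lookup_batch(const struct toy_table *t, size_t *cursor, int *out,
		     size_t cap)
{
	size_t total = 0;

	assert(t && cursor && out);
	while (*cursor < NR_BUCKETS) {
		size_t n = t->cnt[*cursor];

		if (n > cap - total) {
			if (total == 0)
				return -ENOSPC;
			break; /* roll back: leave this bucket for the next call */
		}
		for (size_t i = 0; i < n; i++)
			out[total++] = t->keys[*cursor][i];
		(*cursor)++;
	}
	return (int)total;
}
```

With buckets of 2, 3 and 1 keys and a 4-slot buffer, the first call copies only bucket 0 and stops with the cursor on bucket 1. The second call then copies buckets 1 and 2 in full, so no bucket is ever split across calls.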
Signed-off-by: Mykyta Yatsenko <[email protected]>
Test basic map operations (lookup, update, delete) for BPF_MAP_TYPE_RHASH including boundary conditions like duplicate key insertion and deletion of nonexistent keys.

Signed-off-by: Mykyta Yatsenko <[email protected]>
Signed-off-by: Mykyta Yatsenko <[email protected]>
Add tests validating that the resizable hash map handles the BPF_F_LOCK flag as expected.

Signed-off-by: Mykyta Yatsenko <[email protected]>
Test get_next_key behavior under concurrent modification:

* Resize test: verify all elements visited after resize trigger
* Stress test: concurrent iterators and modifiers to detect races

Signed-off-by: Mykyta Yatsenko <[email protected]>
Test BPF iterator functionality for BPF_MAP_TYPE_RHASH:

* Basic iteration verifying all elements are visited
* Overflow test triggering seq_file restart, validating correct resume behavior via skip_elems tracking

Signed-off-by: Mykyta Yatsenko <[email protected]>
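
The resume behavior that the overflow test exercises can be pictured with a toy iterator. This is one plausible reading of skip_elems, written as a hedged userspace sketch: toy_iter and toy_iter_read() are hypothetical, the table is a flat array, and the assumption is that each read() restarts the walk from the beginning and skips the elements already delivered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator state kept between read() calls. */
struct toy_iter {
	size_t skip_elems; /* elements already delivered to the reader */
};

/*
 * Emit up to `budget` elements into out[]. The walk restarts at element 0
 * on every call (as after a seq_file restart) and skips whatever earlier
 * calls already emitted. Returns the number of elements written.
 */
size_t toy_iter_read(struct toy_iter *it, const int *elems, size_t n,
		     int *out, size_t budget)
{
	size_t seen = 0, emitted = 0;

	assert(it && out);
	for (size_t i = 0; i < n; i++) {
		if (seen++ < it->skip_elems)
			continue; /* delivered by an earlier read() */
		if (emitted == budget)
			break;    /* output buffer is full, resume later */
		out[emitted++] = elems[i];
	}
	it->skip_elems += emitted;
	return emitted;
}
```

Reading five elements with a budget of two takes three calls that return 2, 2 and 1 elements, with no element repeated or dropped. A fourth call returns 0.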
Make bpftool documentation aware of the resizable hash map.

Signed-off-by: Mykyta Yatsenko <[email protected]>
Support resizable hashmap in BPF map benchmarks.

Results:

$ sudo ./bench -w3 -d10 -a bpf-rhashmap-full-update
0:hash_map_full_perf 21641414 events per sec

$ sudo ./bench -w3 -d10 -a bpf-hashmap-full-update
0:hash_map_full_perf 4392758 events per sec

$ sudo ./bench -w3 -d10 -a -p8 htab-mem --use-case overwrite --value-size 8
Iter  0 (302.834us): per-prod-op 62.85k/s, memory usage 2.70MiB
Iter  1 (-44.810us): per-prod-op 62.81k/s, memory usage 2.70MiB
Iter  2 (-45.821us): per-prod-op 62.81k/s, memory usage 2.70MiB
Iter  3 (-63.658us): per-prod-op 62.92k/s, memory usage 2.70MiB
Iter  4 ( 32.887us): per-prod-op 62.85k/s, memory usage 2.70MiB
Iter  5 (-76.948us): per-prod-op 62.75k/s, memory usage 2.70MiB
Iter  6 (157.235us): per-prod-op 63.01k/s, memory usage 2.70MiB
Iter  7 (-118.761us): per-prod-op 62.85k/s, memory usage 2.70MiB
Iter  8 (127.139us): per-prod-op 62.92k/s, memory usage 2.70MiB
Iter  9 (-169.908us): per-prod-op 62.99k/s, memory usage 2.70MiB
Iter 10 (101.962us): per-prod-op 62.97k/s, memory usage 2.70MiB
Iter 11 (-64.330us): per-prod-op 63.05k/s, memory usage 2.70MiB
Iter 12 (-20.543us): per-prod-op 62.86k/s, memory usage 2.70MiB
Iter 13 ( 55.382us): per-prod-op 62.95k/s, memory usage 2.70MiB
Summary: per-prod-op 62.92 ± 0.09k/s, memory usage 2.70 ± 0.00MiB, peak memory usage 2.96MiB

$ sudo ./bench -w3 -d10 -a -p8 rhtab-mem --use-case overwrite --value-size 8
Iter  0 (316.805us): per-prod-op 96.40k/s, memory usage 2.71MiB
Iter  1 (-35.225us): per-prod-op 96.54k/s, memory usage 2.71MiB
Iter  2 (-12.431us): per-prod-op 96.54k/s, memory usage 2.71MiB
Iter  3 (-56.537us): per-prod-op 96.58k/s, memory usage 2.71MiB
Iter  4 ( 27.108us): per-prod-op 96.62k/s, memory usage 2.71MiB
Iter  5 (-52.491us): per-prod-op 96.57k/s, memory usage 2.71MiB
Iter  6 ( -2.777us): per-prod-op 96.52k/s, memory usage 2.71MiB
Iter  7 (108.963us): per-prod-op 96.45k/s, memory usage 2.71MiB
Iter  8 (-61.575us): per-prod-op 96.48k/s, memory usage 2.71MiB
Iter  9 (-21.595us): per-prod-op 96.14k/s, memory usage 2.71MiB
Iter 10 (  3.243us): per-prod-op 96.36k/s, memory usage 2.71MiB
Iter 11 (  3.102us): per-prod-op 94.70k/s, memory usage 2.71MiB
Iter 12 (109.102us): per-prod-op 95.77k/s, memory usage 2.71MiB
Iter 13 ( 16.153us): per-prod-op 95.91k/s, memory usage 2.71MiB
Summary: per-prod-op 96.19 ± 0.57k/s, memory usage 2.71 ± 0.00MiB, peak memory usage 2.71MiB

$ sudo ./bench -w3 -d10 -a bpf-hashmap-lookup --key_size 4\
  --max_entries 1000 --nr_entries 500 --nr_loops 1000000
cpu00: lookup 28.603M ± 0.536M events/sec (approximated from 32 samples of ~34ms)

$ sudo ./bench -w3 -d10 -a bpf-rhashmap-lookup --key_size 4\
  --max_entries 1000 --nr_entries 500 --nr_loops 1000000
cpu00: lookup 27.340M ± 0.864M events/sec (approximated from 32 samples of ~36ms)

Signed-off-by: Mykyta Yatsenko <[email protected]>
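
For reading the results above, the rhash-to-hash ratios can be computed directly from the quoted figures. The helper below is only a convenience sketch: rate_ratio() and print_rhash_ratios() are hypothetical names, and the inputs are the full-update rates, the overwrite summaries, and the lookup rates copied from this commit message. The ratios come out at roughly 4.9x for full update, 1.5x for overwrite, and 0.96x for lookup.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of a measured rate to its baseline; above 1.0 means faster. */
double rate_ratio(double measured, double baseline)
{
	assert(baseline > 0.0);
	return measured / baseline;
}

/* Print rhash relative to hash for the three benchmarks quoted above. */
void print_rhash_ratios(void)
{
	/* events per second, bpf-rhashmap-full-update vs bpf-hashmap-full-update */
	printf("full update: %.2fx\n", rate_ratio(21641414.0, 4392758.0));
	/* k ops/s per producer, rhtab-mem vs htab-mem summaries */
	printf("overwrite:   %.2fx\n", rate_ratio(96.19, 62.92));
	/* M events/s, bpf-rhashmap-lookup vs bpf-hashmap-lookup */
	printf("lookup:      %.2fx\n", rate_ratio(27.340, 28.603));
}
```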
Signed-off-by: Mykyta Yatsenko <[email protected]>