Conversation

quantumish (Contributor)

No description provided.

// We *shouldn't* add them to the free list (since that's used to write new
// entries), but we should still remember them so that we can add them back to
// the free list when a later growth occurs. This has the same scalability
// flaws as the free list itself.
hole_list: Mutex<Vec<CacheBlock>>,
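For context, a minimal sketch of how this is meant to feed back into the free list on growth. Only the `hole_list` field above comes from the diff; `grow` and the `free_list` field name are assumptions:

```rust
// Hypothetical sketch: on growth, holes punched during earlier shrinks
// become writable blocks again, so drain them back into the free list.
fn grow(&self, new_num_blocks: usize) {
    // ... extend the hashmap / LFC file to new_num_blocks ...
    let mut free_list = self.free_list.lock().unwrap();
    free_list.extend(self.hole_list.lock().unwrap().drain(..));
}
```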
Contributor

For later: the filesystem already tracks the holes (SEEK_HOLE), can we get away with relying on that? Okay to leave that as a TODO.

Contributor Author

Oh, good catch! Yeah, I can try to modify it to rely on that. I was mostly going off the original strategy in file_cache.c, which did something similar by tracking holes in the hashmap.

Contributor Author

One minor caveat is that SEEK_HOLE isn't guaranteed to find the next actual hole in the filesystem:

However, a filesystem is not obliged to report holes [...] In the simplest implementation, a filesystem can support the operations by making SEEK_HOLE always return the offset of the end of the file

Most popular filesystems do support it, though, so it's probably safe to rely on regardless? I'll document it in the code, but there may need to be some conditional compilation at some point.
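For the record, a minimal sketch of enumerating holes via lseek (using the libc crate). This assumes a Linux filesystem that actually reports holes; on one that doesn't, SEEK_HOLE just returns EOF and we find nothing:

```rust
use std::os::unix::io::AsRawFd;

/// Enumerate (start, end) offsets of holes in `file`. Sketch only; ignores
/// concurrent writers. Costs two lseek calls per hole found.
fn holes(file: &std::fs::File) -> std::io::Result<Vec<(u64, u64)>> {
    let fd = file.as_raw_fd();
    let len = file.metadata()?.len() as libc::off_t;
    let mut out = Vec::new();
    let mut pos: libc::off_t = 0;
    while pos < len {
        // Find the start of the next hole at or after `pos`. Every file has
        // an implicit hole at EOF, so this succeeds whenever pos < len.
        let hole = unsafe { libc::lseek(fd, pos, libc::SEEK_HOLE) };
        if hole < 0 {
            return Err(std::io::Error::last_os_error());
        }
        if hole >= len {
            break; // only the implicit EOF hole remains
        }
        // Find where data resumes; ENXIO means the hole runs to EOF.
        let data = unsafe { libc::lseek(fd, hole, libc::SEEK_DATA) };
        let end = if data < 0 {
            let err = std::io::Error::last_os_error();
            if err.raw_os_error() == Some(libc::ENXIO) {
                len
            } else {
                return Err(err);
            }
        } else {
            data
        };
        out.push((hole as u64, end as u64));
        pos = end;
    }
    Ok(out)
}
```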

Contributor

We currently only care about ext4, but it's worth a note and test case.

It's also possible that this is significantly slower, idk. It's not a big deal, just figured it might be nice to avoid the manual tracking.

Contributor Author

Yep, was just gonna comment about performance. Not entirely sure if it's worse though, will experiment!

Contributor Author (quantumish, Jul 25, 2025)

It also seems like this could be accomplished in far fewer syscalls (possibly just one) with FIEMAP? One issue with the lseek approach is that if there are many holes it could lead to some nontrivial overhead, and we almost certainly will have far too many holes, since our current way of shrinking is to punch single-block holes randomly spread across the LFC file.
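For reference, a hedged sketch of the FIEMAP approach, with the structs hand-rolled per <linux/fiemap.h> (Linux/glibc assumed, untested; a real version would loop, advancing fm_start past the last extent until it sees FIEMAP_EXTENT_LAST, instead of using a single fixed batch):

```rust
use std::os::unix::io::AsRawFd;

const FS_IOC_FIEMAP: libc::c_ulong = 0xC020660B; // _IOWR('f', 11, struct fiemap)
const FIEMAP_FLAG_SYNC: u32 = 0x0001;

#[repr(C)]
#[derive(Clone, Copy, Default)]
struct FiemapExtent {
    fe_logical: u64,
    fe_physical: u64,
    fe_length: u64,
    fe_reserved64: [u64; 2],
    fe_flags: u32,
    fe_reserved: [u32; 3],
}

#[repr(C)]
struct FiemapReq {
    fm_start: u64,
    fm_length: u64,
    fm_flags: u32,
    fm_mapped_extents: u32,
    fm_extent_count: u32,
    fm_reserved: u32,
    fm_extents: [FiemapExtent; 64], // fixed batch for the sketch
}

/// Fetch up to 64 data extents in one ioctl; the holes are the gaps
/// between consecutive extents.
fn data_extents(file: &std::fs::File) -> std::io::Result<Vec<(u64, u64)>> {
    let mut req = FiemapReq {
        fm_start: 0,
        fm_length: u64::MAX,
        fm_flags: FIEMAP_FLAG_SYNC,
        fm_mapped_extents: 0,
        fm_extent_count: 64,
        fm_reserved: 0,
        fm_extents: [FiemapExtent::default(); 64],
    };
    let ret = unsafe { libc::ioctl(file.as_raw_fd(), FS_IOC_FIEMAP, &mut req) };
    if ret != 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(req.fm_extents[..req.fm_mapped_extents as usize]
        .iter()
        .map(|e| (e.fe_logical, e.fe_logical + e.fe_length))
        .collect())
}
```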

Comment on lines +145 to +147
// TODO(quantumish): possibly implement some batching? lots of syscalls...
// Unfortunately batching is at odds with our access pattern, as entries in the
// hashmap have no correlation with the location of blocks in the actual LFC file.
Contributor

The existing LFC does the same, but it may be getting away with it because of the 128-page chunking which amortizes the cost. It's probably okay as long as we do this async after the actual hashmap has shrunk and allow concurrent access.
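For concreteness, the per-block syscall in question looks roughly like this (a sketch; `BLOCK_SZ` and the function shape are assumptions). Batching would mean coalescing adjacent block numbers into a single larger punch:

```rust
use std::os::unix::io::AsRawFd;

const BLOCK_SZ: libc::off_t = 8192; // assumed Postgres-style block size

/// Punch one block out of the LFC file, returning its space to the
/// filesystem while keeping the file length unchanged.
fn punch_block(file: &std::fs::File, block_no: u64) -> std::io::Result<()> {
    let ret = unsafe {
        libc::fallocate(
            file.as_raw_fd(),
            libc::FALLOC_FL_PUNCH_HOLE | libc::FALLOC_FL_KEEP_SIZE,
            block_no as libc::off_t * BLOCK_SZ,
            BLOCK_SZ,
        )
    };
    if ret != 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(())
}
```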

Comment on lines +714 to +719
// Try again at evicting entries in the to-be-shrunk region, except don't give up this time.
// Not a great solution all around: unnecessary scanning, spinning, and code duplication.
// Not sure what a good alternative is, though, as there may be enough of these entries that
// we can't store a Vec of them, and we ultimately can't proceed until they're evicted.
// Maybe a notification system for unpinning somehow? This also makes me think that pinning
// of entries should be a first-class concept within the hashmap implementation...
Contributor

How long do we expect entries to remain pinned for? To avoid starvation under heavy reads, can we mark the entry as soft-removed such that new accesses incur a miss and will allocate a new bucket? Can we do a best-effort 1ms sleep for each pinned entry as we find them under the assumption that the reader will be done soon?
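To make the suggestion concrete, something shaped like this inside the shrink path (entirely hypothetical names; `entry_at`, `is_pinned`, and `evict` are not the actual hashmap API):

```rust
// Hypothetical retry loop: evict what we can in the to-be-shrunk region,
// then briefly sleep on the assumption that pinned readers finish soon.
loop {
    let mut still_pinned = 0;
    for idx in new_num_blocks..old_num_blocks {
        match block_map.entry_at(idx) {
            Some(entry) if entry.is_pinned() => still_pinned += 1,
            Some(entry) => block_map.evict(entry),
            None => {}
        }
    }
    if still_pinned == 0 {
        break;
    }
    // Best-effort backoff instead of hard spinning.
    std::thread::sleep(std::time::Duration::from_millis(1));
}
```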

Contributor Author

I have to experiment a bit more to get a good grasp on how long entries remain pinned (and whether re-pinning is the main issue or just long pin times), but in theory it depends on the workload. I wrote this code with pessimistic assumptions.

Soft-removal sounds like a good strategy, though, and the sleep can definitely be optimized. Will update as I get more information!

{
    let mut clock_hand = self.clock_hand.lock().unwrap();

    block_map.begin_shrink(num_blocks);
Contributor

Just to make sure I understand the overall behavior here:

  1. Update alloc_limit. No new bucket allocations will be made beyond the new size.

    • Cheap O(1) under write lock.
    • Dictionary remains unchanged, as does the hashing.
    • Existing dictionary entries may still point beyond alloc_limit.
    • New dictionary entries will allocate buckets below alloc_limit.
  2. Evict buckets beyond alloc_limit.

    • O(removed_blocks) operations, each O(1) under write lock.
    • Existing buckets beyond alloc_limit can still see both reads and writes.
    • After a bucket has been evicted, new writes will allocate a bucket below alloc_limit.
  3. Resize dictionary.

    • O(remaining_blocks) operations under write lock.
    • Rehashes dictionary, keeps existing buckets.
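
As a sketch, the sequence described above in code (only `begin_shrink` appears in the diff; `shrink`, `evict_beyond`, and `finish_shrink` are assumed names):

```rust
fn shrink(&self, new_num_blocks: usize) {
    // Phase 1: O(1) under the write lock. Caps alloc_limit; existing
    // entries may still point beyond it, new ones won't.
    self.block_map.begin_shrink(new_num_blocks);

    // Phase 2: one O(1) eviction per removed block; buckets beyond the
    // limit can still serve reads and writes until they're evicted.
    self.evict_beyond(new_num_blocks);

    // Phase 3: rehash down to the new capacity, keeping surviving buckets.
    self.block_map.finish_shrink(new_num_blocks);
}
```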

Contributor Author

Yep, this is mostly accurate, although it excludes the retry logic and the LFC file behavior. Step 1 also changes the clock hand and the "logical" number of buckets, so that the eviction algorithm treats the hashmap as if it were already shrunk (we're going to manually evict things in the to-be-shrunk region anyway). Step 3 is technically O(capacity) if you include the rehash; once incremental rehashing is merged it'll still be O(capacity), but at that point only for checking that invariants are maintained, so that pass could be removed from release builds.


github-actions bot commented Aug 3, 2025

If this PR added a GUC in the Postgres fork or neon extension,
please regenerate the Postgres settings in the cloud repo:

make NEON_WORKDIR=path/to/neon/checkout \
  -C goapp/internal/shareddomain/postgres generate

If you're an external contributor, a Neon employee will assist in
making sure this step is done.


github-actions bot commented Aug 4, 2025

No tests were run or test report is not available

Test coverage report is not available

The comment gets automatically updated with the latest test results
5f5d512 at 2025-08-04T01:24:10.697Z :recycle:
