fixing a bug in card mark stealing #117968

Maoni0 · 2025-07-23T06:06:22Z

this fixes a problem with card mark stealing where card_transition did not clamp card clearing to the card stealing unit. for this bug to appear, the following conditions need to be met -

an object A straddles the boundary of a 2mb card stealing unit, and originally two cards are set for that object: one below the 2mb boundary and one that corresponds to an address at least 256 bytes above the 2mb boundary. there are no reference fields in between.
one thread T0 is working on the 1st 2mb and discovers A and the first set card bit. this card doesn't need to be set, so poo is set to the address described by the 2nd card, since there are no reference fields in between. card_transition is then called, which calls clear_cards on [1st card, 2nd card), and T0 stops at this line -
card_table [end_word] &= highbits (~0, bits);
where it has seen the 2nd card still set, but has not yet written the result back to card_table[end_word].
meanwhile, another thread T1 needs to be working on the memory starting from this 2mb boundary. it discovers that the 2nd card doesn't need to be set, and that none of the cards that correspond to the card bundle bit need to be set, so it clears the cards and the card bundle bit.
now T0 writes back to card_table[end_word] with the 2nd card bit set.
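
The interleaving above can be replayed deterministically without real threads. The following is a minimal sketch, not the runtime's code: `high_bits`, `ReplayResult` and `replay_lost_update` are hypothetical names, the card word is a plain `uint32_t`, and the card bundle bit is modeled as a single `bool`. It only illustrates how a stale read-modify-write by T0 can bring back a card bit after T1 has cleared both the card and its bundle bit.

```cpp
#include <cassert>
#include <cstdint>

// Keep the bits at positions >= `bits`. This models the highbits (~0, bits)
// mask in the quoted line (assumption: 32 cards per card word).
inline std::uint32_t high_bits(unsigned bits) {
    assert(bits < 32);
    return ~std::uint32_t{0} << bits;
}

struct ReplayResult {
    bool second_card_set;  // state of the 2nd card after the replay
    bool bundle_bit_set;   // state of the card bundle bit covering it
};

// Replays the T0/T1 interleaving from the description in a single thread.
inline ReplayResult replay_lost_update() {
    const std::uint32_t second_card = std::uint32_t{1} << 1;

    // The card word just above the 2mb boundary; bit 1 is the "2nd card".
    std::uint32_t end_word = second_card;
    bool bundle_bit = true;

    // T0: loads the word and masks off the cards below the 2nd card,
    // but has not stored the result yet.
    const std::uint32_t t0_value = end_word & high_bits(1);

    // T1: owns this stealing unit and decides that neither the 2nd card
    // nor the bundle bit is needed.
    end_word &= ~second_card;
    bundle_bit = false;

    // T0: stores its stale value, which still has the 2nd card set.
    end_word = t0_value;

    return ReplayResult{(end_word & second_card) != 0, bundle_bit};
}
```

After the replay the 2nd card is set while its card bundle bit is clear, which is the inconsistent state described next.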

it's not a problem when a card that shouldn't be set is set, as long as its corresponding card bundle bit is also set. but it's definitely a problem if a card is set and its card bundle bit isn't. the next time we have a cross gen reference, the write barrier is supposed to see one of two cases: either the card isn't already set, and the WB sets the card and its corresponding card bundle bit, or the card is already set and the WB does nothing. now we have a situation where the card is set but the card bundle bit isn't, so the WB does nothing, and the next GC that should be looking at this card won't, unless some other card covered by that card bundle bit gets newly set by the WB.
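
To make the consequence concrete, here is a small model of the two cases the write barrier handles. It is a sketch under stated assumptions: `CardState`, `write_barrier` and `gc_would_scan` are hypothetical names, one card and one bundle bit are modeled as `bool`s, and the GC is assumed to check the bundle bit before looking at any card it covers.

```cpp
// One card bit and the card bundle bit that covers it (illustrative model).
struct CardState {
    bool card;
    bool bundle;
};

// Simplified write barrier for a cross gen reference: it only acts when the
// card is not already set, and then sets both the card and the bundle bit.
inline void write_barrier(CardState& s) {
    if (!s.card) {
        s.card = true;
        s.bundle = true;
    }
}

// Assumed GC behavior: the card is only looked at when its bundle bit is set.
inline bool gc_would_scan(const CardState& s) {
    return s.bundle && s.card;
}
```

Starting from `{card = true, bundle = false}`, calling `write_barrier` changes nothing and `gc_would_scan` stays false, so the new cross gen reference would be missed in this model. Starting from `{false, false}`, the barrier sets both bits and the card is scanned.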

the cleanest fix is to make sure we don't step outside of the 2mb boundary when we call clear_cards in card_transition.
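
The clamping idea can be sketched as follows. This is an illustration only, not the actual change in this PR: the constants and the helpers `card_of`, `unit_limit_card` and `clamp_clear_end` are hypothetical, and the sketch assumes 256-byte cards, 32 cards per card word, and card indices counted from a 2mb-aligned base.

```cpp
#include <cstddef>

// Illustrative constants (assumptions, not the runtime's definitions).
constexpr std::size_t kCardSize = 256;                  // bytes covered by one card
constexpr std::size_t kStealingUnit = 2 * 1024 * 1024;  // 2mb card stealing unit
constexpr std::size_t kCardsPerUnit = kStealingUnit / kCardSize;
constexpr std::size_t kCardsPerWord = 32;               // cards per card word

// A unit boundary is also a card word boundary, so a range that ends exactly
// on the boundary never needs to modify the first card word of the next unit.
static_assert(kCardsPerUnit % kCardsPerWord == 0, "unit must be word aligned");

// Index of the card covering a byte offset from the 2mb-aligned base.
constexpr std::size_t card_of(std::size_t offset) {
    return offset / kCardSize;
}

// First card that belongs to the stealing unit after the one containing `card`.
constexpr std::size_t unit_limit_card(std::size_t card) {
    return (card / kCardsPerUnit + 1) * kCardsPerUnit;
}

// Clamp the half-open clearing range [start_card, end_card) so it never
// leaves the stealing unit that contains start_card.
constexpr std::size_t clamp_clear_end(std::size_t start_card, std::size_t end_card) {
    return end_card < unit_limit_card(start_card) ? end_card
                                                  : unit_limit_card(start_card);
}
```

With these assumed constants a unit covers 8192 cards, so a requested clear of [8190, 8193) is clamped to [8190, 8192), and the card word that belongs to the next unit is left for the thread that owns it.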

this issue was very hard to observe and debug - full credit goes to @ChrisAhna who also verified the fix.

Copilot

Pull Request Overview

This PR fixes a race condition bug in the garbage collector's card mark stealing mechanism where multiple threads could incorrectly manage card table state across 2MB boundaries. The fix prevents one thread from clearing cards beyond its assigned 2MB card stealing unit, which could lead to inconsistent state between card bits and card bundle bits.

Introduces proper boundary checking when clearing cards in card_transition
Ensures card clearing operations are clamped to the card stealing unit limit
Prevents race conditions that could leave cards set without corresponding card bundle bits

dotnet-policy-service · 2025-07-23T13:55:51Z

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Maoni0 · 2025-07-24T06:17:23Z

I also did some stress runs and didn't find any problems.

Maoni0 · 2025-07-24T06:19:08Z

/ba-g Known issue dotnet/dnceng#6004

Maoni0 · 2025-07-24T06:21:10Z

/backport to release/8.0-staging

github-actions · 2025-07-24T06:21:18Z

Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/16489375887

fixing a bug in card mark stealing

b5a47e9

Copilot AI review requested due to automatic review settings July 23, 2025 06:06

github-actions bot added the needs-area-label label Jul 23, 2025

Copilot AI reviewed Jul 23, 2025

dotnet-policy-service bot assigned Maoni0 Jul 23, 2025

jkotas added area-GC-coreclr and removed needs-area-label labels Jul 23, 2025

mangod9 approved these changes Jul 23, 2025

Maoni0 enabled auto-merge (squash) July 23, 2025 21:41

Maoni0 merged commit 587f703 into dotnet:main Jul 24, 2025
97 of 100 checks passed

github-actions bot mentioned this pull request Jul 24, 2025

[release/8.0-staging] fixing a bug in card mark stealing #118009

Open

4 tasks
