-
-
Notifications
You must be signed in to change notification settings - Fork 31.6k
gh-129824: Fix some data races with types in subinterpreters #129829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Also add test_interpreters to the list of TSAN tests. The test_running and test_is_running test cases are skipped for now as they have file descriptor races.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm seeing two other races with TSan:
NEXT_GLOBAL_VERSION_TAG
should use an atomic store/load, not rely on the per-interpreter type lock.- Same thing for the
tp_versions_used
field inset_version_unlocked
, because that will be shared for static types.
type->tp_flags |= Py_TPFLAGS_DISALLOW_INSTANTIATION; | ||
} | ||
else { | ||
assert(_PyType_HasFeature(type, Py_TPFLAGS_DISALLOW_INSTANTIATION)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried that these assertions might be racy. If two interpreters are trying to initialize the type at the same time, TSan might start complaining, because the store is non-atomic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no race because tp_flags
is only set in the initial interpreter (when initial=1
), and all static types are initialized at interpreter startup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might not understand the type lifecycle correctly. I'm imagining a case where:
- Two subinterpreters attempt to import a module that has not been initialized by any interpreter, and that module contains static types.
- Interpreter A will get
initial
atomically, and then interpreter B falls back to the assertion. - If interpreter A sets
tp_flags
at the same time as interpreter B callsPyType_HasFeature
, wouldn't that race?
Is there some extra layer of synchronization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that modules are always imported in the main interpreter first, but @ericsnowcurrently would know better:
Lines 1973 to 1979 in 8d9d3e4
* To avoid all of that, we make sure the module's init function | |
* is always run first with the main interpreter active. If it was | |
* already the main interpreter then we can continue loading the | |
* module like normal. Otherwise, right after the init function, | |
* we take care of some import state bookkeeping, switch back | |
* to the subinterpreter, check for single-phase init, | |
* and then continue loading like normal. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I think that just means it will have a thread state under the main interpreter, which should synchronize it on a GILicious build, but there would be no synchronization under FT. It's probably worth investigating that once we get the rest of the type races fixed. I'll yield to you on whether to keep the assertions in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is more complicated than I originally thought. I'm going to close this for now.
@ZeroIntensity - there are a number of additional races not fixed by this PR (see the linked issue). I think the two you mentioned are already listed there, but if you find additional ones, please edit the issue and add them. |
The
test_running
andtest_is_running
test cases are skipped for now because they have file descriptor races.