Skip to content

gh-129824: Fix some data races with types in subinterpreters #129829

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

colesbury
Copy link
Contributor

@colesbury colesbury commented Feb 7, 2025

The test_running and test_is_running test cases are skipped for now because they have file descriptor races.

Also add test_interpreters to the list of TSAN tests. The test_running
and test_is_running test cases are skipped for now as they have file
descriptor races.
@colesbury colesbury changed the title gh-129824: Fix data races in type_ready with subinterpreters gh-129824: Fix some data races with types in subinterpreters Feb 7, 2025
Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing two other races with TSan:

  • NEXT_GLOBAL_VERSION_TAG should use an atomic store/load, not rely on the per-interpreter type lock.
  • Same thing for the tp_versions_used field in set_version_unlocked, because that will be shared for static types.

type->tp_flags |= Py_TPFLAGS_DISALLOW_INSTANTIATION;
}
else {
assert(_PyType_HasFeature(type, Py_TPFLAGS_DISALLOW_INSTANTIATION));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried that these assertions might be racy. If two interpreters are trying to initialize the type at the same time, TSan might start complaining, because the store is non-atomic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no race because tp_flags is only set in the initial interpreter (when initial=1), and all static types are initialized at interpreter startup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might not understand the type lifecycle correctly. I'm imagining a case where:

  • Two subinterpreters attempt to import a module that has not been initialized by any interpreter, and that module contains static types.
  • Interpreter A will get initial atomically, and then interpreter B falls back to the assertion.
  • If interpreter A sets tp_flags at the same time as interpreter B calls PyType_HasFeature, wouldn't that race?

Is there some extra layer of synchronization?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that modules are always imported in the main interpreter first, but @ericsnowcurrently would know better:

cpython/Python/import.c

Lines 1973 to 1979 in 8d9d3e4

* To avoid all of that, we make sure the module's init function
* is always run first with the main interpreter active. If it was
* already the main interpreter then we can continue loading the
* module like normal. Otherwise, right after the init function,
* we take care of some import state bookkeeping, switch back
* to the subinterpreter, check for single-phase init,
* and then continue loading like normal. */

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I think that just means it will have a thread state under the main interpreter, which should synchronize it on a GILicious build, but there would be no synchronization under FT. It's probably worth investigating that once we get the rest of the type races fixed. I'll yield to you on whether to keep the assertions in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is more complicated than I originally thought. I'm going to close this for now.

@colesbury
Copy link
Contributor Author

@ZeroIntensity - there are a number of additional races not fixed by this PR (see the linked issue). I think the two you mentioned are already listed there, but if you find additional ones, please edit the issue and add them.

@colesbury colesbury closed this Mar 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants