
Conversation

rwgk (Collaborator) commented Dec 22, 2025

Description

This is a proof-of-concept for discussion, not a final implementation.

The Problem

In traditional Python, gil_scoped_acquire means:

  • "I have exclusive access to Python and C++ shared state"

In free-threaded Python (Py_GIL_DISABLED), gil_scoped_acquire means:

  • "I can call Python APIs" — but there's no mutual exclusion.

Code written assuming the first meaning is silently broken under free-threading. This includes pybind11 internals and user code.

With pybind11 as-is, free-threading turns every unaudited gil_scoped_acquire into a potential data race. Given pybind11's scale of adoption, it's not a question of if these races will bite users, but how many and how badly.
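
To make the failure mode concrete, here is a minimal sketch (illustrative only, not from this PR) of the pattern that silently breaks:

#include <pybind11/pybind11.h>
namespace py = pybind11;

// Shared C++ state, historically protected by the GIL alone:
static int counter = 0;

void bump() {
    py::gil_scoped_acquire gil;  // traditional build: exclusive access
    ++counter;                   // Py_GIL_DISABLED build: unsynchronized -> data race
}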

I'd much rather hand users a speed limit they can see and choose to remove, than landmines they discover in production.

The Proposal: Better Safe Than Sorry

  • Phase 1 (this POC): Add compat mutex — existing code works safely
  • Phase 2 (future): Introduce scoped_ensure_thread_state — just thread state, no locking
  • Phase 3 (future): Audit and migrate callsites one-by-one

This POC adds a global compatibility mutex that restores the mutual exclusion guarantee for free-threaded builds:

class gil_scoped_acquire {
public:
    gil_scoped_acquire() {
        state_ = PyGILState_Ensure();
#ifdef Py_GIL_DISABLED
        detail::get_compat_mutex().lock();  // Restore mutual exclusion
#endif
    }
    ~gil_scoped_acquire() {
#ifdef Py_GIL_DISABLED
        detail::get_compat_mutex().unlock();  // Release before detaching
#endif
        PyGILState_Release(state_);
    }
    // ... (the POC also tracks per-thread mutex ownership; see the commit message below)

private:
    PyGILState_STATE state_;
};

This makes existing code safe by default under free-threading, at the cost of serializing gil_scoped_acquire sections.
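
The compat mutex itself is not shown in the snippet above; here is a minimal sketch of what detail::get_compat_mutex() and the per-thread ownership tracking could look like (my reading of the POC, not the actual patch):

#include <mutex>

namespace detail {

// Process-wide singleton; a single global mutex is also why this POC
// conflicts with per-interpreter GILs (see Known Limitation below).
inline std::mutex &get_compat_mutex() {
    static std::mutex m;
    return m;
}

// Per-thread flag so nested acquires (and the main thread) do not
// try to re-lock a mutex this thread already holds.
inline bool &compat_mutex_held() {
    static thread_local bool held = false;
    return held;
}

} // namespace detail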

Trade-offs

Aspect        Assessment
Safety        ✅ No silent data races
Performance   ⚠️ Serializes free-threaded code
Migration     ✅ Incremental, no big bang
Complexity    ✅ Simple implementation

The performance cost is the price of safety, but users who need maximum free-threading performance can migrate to the new helper.
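
For comparison, the Phase 2 helper could be as simple as this (hypothetical sketch; its name and exact shape are precisely what is up for discussion below):

// Attaches a Python thread state, nothing more. No compat mutex, so no
// mutual exclusion is implied; callers that touch shared state must
// bring their own locking.
class scoped_ensure_thread_state {
public:
    scoped_ensure_thread_state() : state_(PyGILState_Ensure()) {}
    ~scoped_ensure_thread_state() { PyGILState_Release(state_); }

    scoped_ensure_thread_state(const scoped_ensure_thread_state &) = delete;
    scoped_ensure_thread_state &operator=(const scoped_ensure_thread_state &) = delete;

private:
    PyGILState_STATE state_;
};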

Known Limitation of this POC

The global compat mutex conflicts with per-interpreter GILs (Py_MOD_PER_INTERPRETER_GIL_SUPPORTED), so the Per-Subinterpreter GIL test is skipped under free-threading.

Future work could use per-interpreter mutexes instead of a global one.
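
One possible shape for that future work (hypothetical; the map and locking discipline here are assumptions, not part of this POC):

#include <mutex>
#include <unordered_map>

namespace detail {

inline std::mutex &get_compat_mutex(PyInterpreterState *interp) {
    static std::mutex map_guard;
    static std::unordered_map<PyInterpreterState *, std::mutex> per_interp;
    std::lock_guard<std::mutex> guard(map_guard);
    return per_interp[interp];  // node-based map: the mutex never moves
}

} // namespace detail

// Callers would pass PyInterpreterState_Get() instead of using a global.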

Questions for Discussion

  1. Is safety-first the right default?
  2. Should we prioritize per-interpreter mutexes to support per-interpreter GILs?
  3. What should the new "just thread state, no locking" helper be called?
    • scoped_ensure_thread_state?
    • Something else?

In free-threaded Python (Py_GIL_DISABLED), the GIL no longer provides
mutual exclusion. Existing code that assumes gil_scoped_acquire provides
mutual exclusion would have data races.

This adds a global compatibility mutex that restores the safety guarantee:
- Acquired when gil_scoped_acquire is constructed (if not already held)
- Released when gil_scoped_release is constructed (if held)
- Ownership is tracked per-thread to handle the main thread case

KNOWN LIMITATION: The global mutex conflicts with per-interpreter GILs
(Py_MOD_PER_INTERPRETER_GIL_SUPPORTED). The "Per-Subinterpreter GIL" test
deadlocks with this change. Future work could use per-interpreter mutexes.

This is a proof-of-concept for discussion.
rwgk (Collaborator, Author) commented Dec 22, 2025

@oremanj @b-pass @XuehaiPan

What do you guys think about the general idea?

oremanj (Collaborator) commented Dec 22, 2025

I would suggest thread_scoped_attach or scoped_thread_attach as the name for the new version (and detach for its converse), since that accords with the modern terminology of "attached thread state" and "detached thread state" for what were previously called "GIL acquired" and "GIL released".

Even on free-threaded Python, attaching a thread state can block in order to support stop-the-world garbage collection. That makes acquiring the compat mutex after attaching the thread state deadlock-prone: imagine that thread A has an attached thread state and is blocking to try to acquire the compat mutex, while thread B holds the compat mutex and is waiting for everyone else to detach their thread states so it can perform a GC run. Changing the order so we acquire the compat mutex before attaching the thread state would improve the situation, but I'm not convinced it will totally eliminate the deadlock possibility.

We could also consider using _PyEval_StopTheWorld directly, which should eliminate the possibility of deadlocks since it's the same interface used by GC. It has the advantage of being natively subinterpreter-specific, and the disadvantage of being a private API.
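
The reordering described above would look roughly like this (sketch of the suggestion, not tested):

gil_scoped_acquire::gil_scoped_acquire() {
#ifdef Py_GIL_DISABLED
    // Wait for the compat mutex while still *detached*, so a thread
    // blocked here cannot stall a stop-the-world GC pass.
    detail::get_compat_mutex().lock();
#endif
    state_ = PyGILState_Ensure();
}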

It is also worth considering whether we want to enforce mutual exclusion against all Python code (in which case we kind of have to reach for the stop-the-world API and its associated significant performance penalty) or only against other uses of py::gil_scoped_acquire (in which case the compat-mutex approach is feasible).
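
For illustration of that distinction (my example, not part of the proposal):

#include <vector>

// Assumed shared container touched from multiple threads:
static std::vector<int> shared_state;

void from_cpp() {
    py::gil_scoped_acquire gil;  // takes the compat mutex
    shared_state.push_back(42);  // serialized against other gil_scoped_acquire users
}

// A thread already running Python bytecode (e.g. calling a bound function
// that also touches shared_state) never constructs gil_scoped_acquire, so
// the compat mutex does not exclude it; only a stop-the-world pause would.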

b-pass (Collaborator) commented Dec 22, 2025

I like this. A pybind11 module can require the GIL to exist (by not specifying py::gil_not_used() in its module definition), but code that needs mutual exclusion while staying GIL-agnostic currently has to implement its own locking on top of gil_scoped_acquire, and juggling two locks like that is deadlock-prone. This change would make it easier to get right.

But I think it needs an opt-out mechanism ... some way to say "this code needs the GIL but does not need mutual exclusion". Currently (without this change) that is what gil_scoped_acquire does.

Options for that approach:

  • Make gil_scoped_acquire do both things through a flag ... maybe something like py::gil_scoped_acquire gil(py::gil_not_required()) (sketched below)
  • Keep gil_scoped_acquire GIL-focused, and make this new behavior a separate thing (name TBD).
  • Make gil_scoped_acquire do both, and make a new thing that is GIL-only (name TBD).
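
A rough shape for the first option (hypothetical sketch; py::gil_not_required is the name floated above, and nothing here exists in pybind11 today):

struct gil_not_required {};  // tag type: "I need a thread state, not mutual exclusion"

class gil_scoped_acquire {
public:
    gil_scoped_acquire() : gil_scoped_acquire(/*lock_compat=*/true) {}
    explicit gil_scoped_acquire(gil_not_required) : gil_scoped_acquire(false) {}

    ~gil_scoped_acquire() {
#ifdef Py_GIL_DISABLED
        if (locked_) {
            detail::get_compat_mutex().unlock();
        }
#endif
        PyGILState_Release(state_);
    }

private:
    explicit gil_scoped_acquire(bool lock_compat) : locked_(false) {
        state_ = PyGILState_Ensure();
#ifdef Py_GIL_DISABLED
        if (lock_compat) {
            detail::get_compat_mutex().lock();
            locked_ = true;
        }
#else
        (void) lock_compat;  // no compat mutex when the GIL exists
#endif
    }

    PyGILState_STATE state_;
    bool locked_;
};

Usage would then look like py::gil_scoped_acquire gil(py::gil_not_required()); for callers that only need Python API access.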
