Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed data race in all_type_info in free-threading mode #5419

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

vfdev-5
Copy link
Contributor

@vfdev-5 vfdev-5 commented Oct 24, 2024

Description

  • fixed data race in all_type_info in free-threading mode
  • added test

For example, we have 2 threads entering all_type_info. Both enter all_type_info_get_cache`` function and there is a first one which inserts a tuple (type, empty_vector) to the map and second is waiting. Inserting thread gets the (iter_to_key, True) and non-inserting thread after waiting gets (iter_to_key, False). Inserting thread than will add a weakref and will then call into all_type_info_populate. However, non-inserting thread is not entering if (ins.second) {clause and returnsins.first->second;`` which is just empty_vector. Finally, non-inserting thread is failing the check in allocate_layout:

if (n_types == 0) {
    pybind11_fail(
        "instance allocation failed: new instance has no pybind11-registered base types");
}

On master running this test gives:

terminate called after throwing an instance of 'std::runtime_error'
  what():  instance allocation failed: new instance has no pybind11-registered base types
Fatal Python error: Aborted

Suggested changelog entry:

cc @colesbury

@vfdev-5 vfdev-5 force-pushed the fix-all_type_info_populate-free-threading branch 2 times, most recently from d98dd4b to 8beaa20 Compare October 24, 2024 13:31
@vfdev-5 vfdev-5 changed the title Fix data race all_type_info_populate in free-threading mode Fixed data race in all_type_info in free-threading mode Oct 24, 2024
@vfdev-5 vfdev-5 force-pushed the fix-all_type_info_populate-free-threading branch from 7d1a270 to 1a07f3c Compare October 24, 2024 13:38
Description:
- fixed data race all_type_info_populate in free-threading mode
- added test

For example, we have 2 threads entering `all_type_info`.
Both enter `all_type_info_get_cache`` function and
there is a first one which inserts a tuple (type, empty_vector) to the map
and second is waiting. Inserting thread gets the (iter_to_key, True) and non-inserting thread
after waiting gets (iter_to_key, False).
Inserting thread than will add a weakref and will then call into `all_type_info_populate`.
However, non-inserting thread is not entering `if (ins.second) {` clause and
returns `ins.first->second;`` which is just empty_vector.
Finally, non-inserting thread is failing the check in `allocate_layout`:
```c++
if (n_types == 0) {
    pybind11_fail(
        "instance allocation failed: new instance has no pybind11-registered base types");
}
```
@vfdev-5 vfdev-5 force-pushed the fix-all_type_info_populate-free-threading branch from ef3fb1c to 6ab21db Compare October 24, 2024 13:42
@vfdev-5 vfdev-5 force-pushed the fix-all_type_info_populate-free-threading branch from cdc1663 to adbbbba Compare October 24, 2024 14:03
@colesbury
Copy link
Contributor

I think we may need a further rework of all_type_info(). It's used by get_type_info() and some of the callers to that function modify the returned detail::type_info*.

I'm not entirely sure what the locking strategy should be here. It's possible that all the callers to get_type_info() and all_type_info() should hold the internals lock.

@colesbury
Copy link
Contributor

On further thought, I think this approach makes sense. I don't think we need to rework the locking strategy and can address the other data races on type_info fields separately.

I'll write up a summary of the issues.

@vfdev-5
Copy link
Contributor Author

vfdev-5 commented Nov 4, 2024

@rwgk can you please review this PR?

@rwgk
Copy link
Collaborator

rwgk commented Nov 5, 2024

Will try tomorrow.

@cryos could you please also take a look?

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colesbury could you please formally approve?

Looks good to me, but I'm not a free-threading expert. I'll merge when I see approvals from @colesbury and @cryos.

include/pybind11/detail/type_caster_base.h Outdated Show resolved Hide resolved
include/pybind11/pybind11.h Outdated Show resolved Hide resolved
tests/pybind11_tests.h Show resolved Hide resolved
tests/test_class.py Outdated Show resolved Hide resolved
tests/test_class.py Outdated Show resolved Hide resolved
@vfdev-5 vfdev-5 force-pushed the fix-all_type_info_populate-free-threading branch from 4336d12 to 22723d0 Compare November 6, 2024 13:13
Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but waiting for approvals from @colesbury and @cryos.

Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this LGTM.

I'll try to put up a follow up PR later this week for the remaining issues described in #5421.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants