Skip to content

Conversation

@RogerSanders
Copy link
Contributor

As discussed on slack, this is a workaround for a very inefficient implementation of std::atomic_flag wait and notify functions on Windows under libc++ currently. Issue is known and ticketed in libc++ here:
llvm/llvm-project#127221

Copy link
Contributor

@jcm93 jcm93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we'll end up re-lowering the macOS deployment target with this workaround in place.

Even if we don't lower the target though, we can still account for macOS here and allow compiling these cores with a deployment target below 11.0.

Comment on lines +4 to +6
#if !defined(USE_ATOMIC_FLAG_NOTIFY_FALLBACK) && !defined(_MSC_VER) && defined(_WIN32)
#define USE_ATOMIC_FLAG_NOTIFY_FALLBACK
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#if !defined(USE_ATOMIC_FLAG_NOTIFY_FALLBACK) && !defined(_MSC_VER) && defined(_WIN32)
#define USE_ATOMIC_FLAG_NOTIFY_FALLBACK
#endif
#if !defined(USE_ATOMIC_FLAG_NOTIFY_FALLBACK)
// Work around a slow implementation of atomic flag waits with libc++ on Windows; see https://github.com/llvm/llvm-project/issues/127221
#if !defined(_MSC_VER) && defined(_WIN32)
#define USE_ATOMIC_FLAG_NOTIFY_FALLBACK
// Atomic flag waits are also unavailable below macOS 11.0
#elif defined(__APPLE__) && __MAC_OS_X_VERSION_MIN_REQUIRED < 110000
#define USE_ATOMIC_FLAG_NOTIFY_FALLBACK
#endif
#endif

Comment on lines +4 to +6
#if !defined(USE_ATOMIC_FLAG_NOTIFY_FALLBACK) && !defined(_MSC_VER) && defined(_WIN32)
#define USE_ATOMIC_FLAG_NOTIFY_FALLBACK
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#if !defined(USE_ATOMIC_FLAG_NOTIFY_FALLBACK) && !defined(_MSC_VER) && defined(_WIN32)
#define USE_ATOMIC_FLAG_NOTIFY_FALLBACK
#endif
#if !defined(USE_ATOMIC_FLAG_NOTIFY_FALLBACK)
// Work around a slow implementation of atomic flag waits with libc++ on Windows; see https://github.com/llvm/llvm-project/issues/127221
#if !defined(_MSC_VER) && defined(_WIN32)
#define USE_ATOMIC_FLAG_NOTIFY_FALLBACK
// Atomic flag waits are also unavailable below macOS 11.0
#elif defined(__APPLE__) && __MAC_OS_X_VERSION_MIN_REQUIRED < 110000
#define USE_ATOMIC_FLAG_NOTIFY_FALLBACK
#endif
#endif

Copy link
Contributor Author

@RogerSanders RogerSanders Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern with going this route, is I'd like to strip out the entire alternate implementation if/when we cut over to the Microsoft STL, which is being discussed (or if/when the libc++ ticket is resolved). If we tie it to MacOS platform support on older versions, support for which has already been dropped, it's not clear to me when the alternate implementation would be cut. Ares could benefit greatly in other areas from better use of modern threading approaches, particularly as we move away from the nall threading implementation, and I feel like having this slower, more error prone threading method in there muddies the waters. To properly support older MacOS versions, we'd have to carry this same dual-system threading model into other parts of the software, or just use the worse version, both of which seem like significant costs to me.

Copy link
Contributor Author

@RogerSanders RogerSanders Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add here though, it's possible to set the "USE_ATOMIC_FLAG_NOTIFY_FALLBACK" define externally, IE, through a CMake rule or CXXFLAGS. If you do want to do an unsupported build targeting an older MacOS release. This would allow you do that without having to modify the source, at least under this changeset today. Future changes elsewhere in the code would probably cause more problems over time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the platform support question as a totally separate concern, myself. I'm not proposing re-lowering the macOS deployment target. I just think if we're adding a workaround like this, we may as well provide complete information in the code about what that workaround enables. I do not think that the mere presence of this definition impacts our goals or criteria for whether something should be removed or an OS dropped, for whatever reason.

One other thing: I am not sure I personally see a time horizon where ares drops support for libc++ on Windows. A large plurality of homebrew and emulator developers target libc++ via MSYS2/MinGW on Windows, and from my experience I do not see that situation changing in any near-term timeframe. Even if LLVM ships an update that improves the underlying implementation of atomic flag waits, the code being introduced here will still be fairly long-lived in ares, I would expect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll defer to @LukeUsher on what he thinks is the right way to go here. We could adopt the preprocessor change you propose here, I'm not strictly opposed to it. Shifting the platform/compiler detection logic centrally to CMakeLists might be more appropriate if it's going to be a pattern that's replicated or kept for the longer term though, rather than just a temporary workaround.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly have half a mind to just fix it in llvm-project myself and be done with it, but I can't spare the time for a few more weeks.

@rasky
Copy link
Collaborator

rasky commented Oct 14, 2025

In general, we should avoid polluting the core files too much with anything which is not pure core emulation of the system. The way Ares is designed, cores only contain the bare minimum required to emulate the system and is indeed a simple hardware description.

The addition of prefetch has already pushed this too far in my opinion, but this PR is doubling down in that direction. I think you should encapsulate the notification primitives in a new API in nall and use just that in the core. The ifdefs to workaround stdc++ really don’t believe to the core but they are something that should be kept contained in nall.

@RogerSanders
Copy link
Contributor Author

RogerSanders commented Oct 14, 2025

In general, we should avoid polluting the core files too much with anything which is not pure core emulation of the system. The way Ares is designed, cores only contain the bare minimum required to emulate the system and is indeed a simple hardware description.

The addition of prefetch has already pushed this too far in my opinion, but this PR is doubling down in that direction. I think you should encapsulate the notification primitives in a new API in nall and use just that in the core. The ifdefs to workaround stdc++ really don’t believe to the core but they are something that should be kept contained in nall.

The problem there is, you're essentially talking about wrapping basic std::atomic_flag functionality in a nall helper, which is the opposite of the direction that has been set lately, where the goal has been to use stl types more directly. The actual prefetch synchronization was clean and simple in the original implementation, and easily understandable by anyone familiar with C++ STL threading/synchronization primitives. This change undoubtedly complicates matters, but it's only done because the STL implementation we're using has a deficiency. I wouldn't want these two distinct threading implementations to live very long at all, I think either libc++ STL should be fixed, or we switch over to the Microsoft STL, which works as intended. In either case, no wrapper or dual implementation would be required if this occurred.

The puristic/idealistic thing to do here would be to simply leave it broken, say the code is correct as written, and say it'll work when the STL works as it's supposed to. This change was done as a pragmatic workaround to support the userbase trying to run this on Windows from official releases. I really don't want this modification to be any kind of "template" for what to do elsewhere though. It's horrible. It's a hack for a broken STL. My personal preference is to fix the STL or shift to a working STL, and revert this commit entirely afterwards.

Ultimately though, I think @LukeUsher has to weigh in on this one.

@rasky
Copy link
Collaborator

rasky commented Oct 14, 2025

The problem in the first place is that no code in the core should ever mention std::atomic_flag, IMO. Cores should be as close as possible to just describe the hardware being emulated. Issues related to performance like prefetching or even details like how to fetch an image from QON should be moved elsewhere.

The core should only say the moral equivalent of "laserdisc, fetch frame N" (simplifying). Then there would a "Laserdisc on MMI" class, in nall, which takes care also of the prefetching with the performance-related annoyances. If you want, you can have an intermediate ares component for each specific Pioneer model or whatever, but even those should avoid performance-related matters, or MMI-related matters, and just contain the "hardware HLE" code so to speak.

The fact that we're adding a workaround for a STL bug in the core is IMO a sign that we've moved too far into the wrong direction here.

@jcm93
Copy link
Contributor

jcm93 commented Oct 14, 2025

I agree that, ideally, prefetching would be abstracted to be an implementation detail of something intermediate, rather than living this far in the weeds of the core-level code.

I'd also push back on the notion that a nall helper object for multithreading necessarily runs contrary to the project's goals with nall->STL replacements (that is, replacing basic nall types in the cases where STL types are now equally functional as well as readable and usable, or at least very close). nall is full of useful abstractions for cumbersome functionality, and will remain that way after that project is completed.

While it doesn't live in nall, and I'm sure a better C++ programmer could find warts in my implementation, the Program::Guard object I wrote I think is a decent example; if anything outside of emulation needs to modify emulator state, it only needs to create an instance of that object and have safe access for the duration of that object's scope, rather than needing to directly deal with any sync primitives.

C++20 atomic flag waits are certainly more usable and less cumbersome than sync primitives available in prior standards that would achieve similar functionality and performance, as this PR demonstrates quite effectively. But they do still have some particularities and a fair amount of cognitive overhead to implement correctly. If there's a viable way to abstract that complexity (for generalized tasks inside cores that are suited to multithreading, not necessarily this specific prefetch routine), I think that's worth looking at, so that developers working on their cores can focus more on the desired functionality and less on C++ particularities.

@RogerSanders
Copy link
Contributor Author

I don't fundamentally disagree with anything that's been said. Creating logical helpers for synchronization is sensible, and I've done it in many other projects. I also agree the cores themselves should, as much as possible, be dedicated to emulating the hardware and not do anything "extraneous". Maybe this prefetch logic could be factored out to some more general "Laserdisc in mmi" type classes in mia or elsewhere as part of a future job. They may never be used elsewhere, but it could be done during cleanup. My next change will actually be to unify and factor out the duplicated implementations of the Pioneer PD6103A IC that currently live in the megald.cpp and ldrom2.cpp files, into a single reusable component for the IC that the cores pull in. This particular job is really the last piece to get to that point, where I can say "the games now work well and are playable on almost all systems, let's focus on cleanup without breaking that".

@RogerSanders
Copy link
Contributor Author

To keep this updated, I found some time tonight and submitted a fix for this issue upstream in libc++, which is currently working its way through review:
llvm/llvm-project#163524

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants