Skip to content

Conversation

evetsso
Copy link
Contributor

@evetsso evetsso commented Sep 5, 2025

Fix a failure in the rtc_cache unit test on Windows when repeated in the same process (--gtest_repeat=N), due to a number of factors:

  • rocFFT uses std::async to launch runtime compilation work asynchronously.

  • The standard library is allowed to implement std::async in terms of a threadpool, and in this case the lifetimes of those threads are not well-defined from rocFFT's perspective.

  • The rtc_cache unit test goes through several iterations of setting up and cleaning up the rocFFT logging system and also uses logs to infer whether kernels were compiled when they were supposed to be.

  • rocFFT's logging system uses thread_local storage for some rocfft_ostream objects.

The timeline of a failure goes like:

  1. Start rtc_cache test iteration 1
  2. Build some plans with RTC logging enabled
    • Compilation work is done using std::async, so logging is done by a threadpool thread
    • Thread creates a thread_local rocfft_ostream object associated to the logging system.
  3. Reset logging system and start rtc_cache test iteration 2
  4. Build the same plans with RTC logging enabled
    • Compilation work is done using std::async again, and the same threadpool thread is recycled.
    • No new thread_local rocfft_ostream object is created, because this thread already has one. But it's not correctly connected to the logging system, as that was reset in the previous step. The test fails because it's not able to observe the log messages it expects.

So instead, explicitly manage the thread that does the asynchronous work so that it doesn't live past a cleanup/setup iteration.

Fix a failure in the rtc_cache unit test on Windows when repeated in the
same process (--gtest_repeat=N), due to a number of factors:

* rocFFT uses std::async to launch runtime compilation work asynchronously.

* The standard library is allowed to implement std::async in terms of a
  threadpool, and in this case the lifetimes of those threads are not
  well-defined from rocFFT's perspective.

* The rtc_cache unit test goes through several iterations of setting up and
  cleaning up the rocFFT logging system and also uses logs to infer whether
  kernels were compiled when they were supposed to be.

* rocFFT's logging system uses thread_local storage for some rocfft_ostream
  objects.

The timeline of a failure goes like:

1. Start rtc_cache test iteration 1
2. Build some plans with RTC logging enabled
   - Compilation work is done using std::async, so logging is done by a
     threadpool thread
   - Thread creates a thread_local rocfft_ostream object associated to the
     logging system.
3. Reset logging system and start rtc_cache test iteration 2
4. Build the same plans with RTC logging enabled
   - Compilation work is done using std::async again, and the same
     threadpool thread is recycled.
   - No new thread_local rocfft_ostream object is created, because this
     thread already has one.  But it's not correctly connected to the
     logging system, as that was reset in the previous step.  The test fails
     because it's not able to observe the log messages it expects.

So instead, explicitly manage the thread that does the asynchronous work so
that it doesn't live past a cleanup/setup iteration.
@malcolmroberts
Copy link
Contributor

Is there any convenient way to test this?

@evetsso
Copy link
Contributor Author

evetsso commented Sep 5, 2025

Is there any convenient way to test this?

Hmm... One option would be to bake the idea of the --gtest_repeat=2 into the test, so that it basically does the interesting things twice. Apart from that it's hard to think of ways to trigger this nicely from outside of the library, considering that the thread pooling and the thread-local storage are very deep implementation details.

@af-ayala af-ayala self-requested a review September 6, 2025 00:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants