Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transient cache not properly invalidated #523

Open
JUVOJustin opened this issue May 21, 2024 · 4 comments
Open

Transient cache not properly invalidated #523

JUVOJustin opened this issue May 21, 2024 · 4 comments
Labels

Comments

@JUVOJustin
Copy link

Through two async requests, the transient cache is not properly invalidated

Description

I have two async jobs utilizing mutual locking by transient. This means each job trys to achieve a lock by setting a transient. Once finished, the "lock" transient is deleted. When the second job arrives in parallel, it waits till the lock is released. The problem now is that even when job 1 releases the lock by deleting the transient, the get_transient function of job 2 sill returns the old "lock" status.

Expected Behavior

After deleting the "lock" transient in job 1 a get_transient call in job 2 should return false.

Actual Behavior

After deleting the "lock" transient in job 1 a get_transient call in job 2 returns "1" which is the value representing the "lock" status.

Steps to Reproduce

  1. Create a script emitting two jobs using https://actionscheduler.org/
  2. With step debugging enabled, trigger the first job which creates a "lock" transient
  3. Start the second job manually and check the return of the get_transient for the same transient name previously set in job 1
  4. Continue with job 1 until the transient is deleted, meaning the lock is released.
  5. Check the return of get_transeint in job 2. Should return "false" will return "1"

Additional context

I am importing a lot of data using action-scheduler jobs. Each processed post_id is stored in a transient. To avoid race conditions, i integrated mutual locking using a transient. If a job is locked the running job will wait 1 second and try again.

Environment

  • Plugin version: 2.5.2
  • PHP version: 8.1
  • WordPress version: 6.5.3
@JUVOJustin JUVOJustin added the bug label May 21, 2024
@tillkruss tillkruss added question and removed bug labels May 21, 2024
@tillkruss
Copy link
Member

Transients are handled entirely by WordPress core.

If you can post exact code samples to reproduce this, we might be able to help.

@JUVOJustin
Copy link
Author

JUVOJustin commented May 21, 2024

Hi,
the problem only occurs with persisted object cache. Not saying it is caused by your plugin :) Appreciate any help.

The importer is open source. The main function can be found here: https://github.com/JUVOJustin/as-processor/blob/main/src/Sync_Data.php#L158

There is a clear_caches function that is required for non-persistent object caches: https://github.com/JUVOJustin/as-processor/blob/main/src/Sync_Data.php#L346 . This method basically handles the case where the first get_transient call for the lock returns "1" and is then stored inside the object cache. When another job releases the lock, the non-persistent object cache has to be deleted to query the actual database value again.

This is an example integration. The code is run in multiple action-scheduler jobs in parallel:

/**
* @throws Exception
*/
function process_chunk_data(\Generator $chunkData): void
{
  $chunk_post_ids = [];
  foreach ($chunkData as $row) {

      if (empty($row['Händler-SKU']) || empty($row['Artikelbezeichnung'])) {
          continue;
      }

      $sku = $row['Händler-SKU'];
      $asin = $row['ASIN 1'];

      $product_id = $this->upsert_product_by_sku($sku, $asin, $row);
      $chunk_post_ids[] = $product_id;
  }

  $attempts = 0;

  // Update sync data
  do {
      try {
          $this->update_sync_data(['post_ids' => $chunk_post_ids], true, true);
          return;
      } catch (Exception $e) {
          $attempts++;
          sleep(1);
          continue;
      }
  } while($attempts < 5);

  throw new Exception('Sync Data could not be added');

}

Not sure if it does matter, but i did the step debugging using: https://github.com/ddev/ddev-redis

@tillkruss
Copy link
Member

If the action-scheduler is a long running process it's cache might go stale, because it's cached in PHP memory as well.

If you switch to wp_cache_get() with the $force parameter instead of get_transient() the cache would always fetch from Redis and not go stale, try using that.

@JUVOJustin
Copy link
Author

JUVOJustin commented May 25, 2024

@tillkruss thank you very much. This fixed it! For reference, I fixed this by creating my own get_transient function: https://github.com/JUVOJustin/as-processor/blob/f898df613b7bc47beff2787a4d57237a434f54ce/src/Sync_Data.php#L361

Personally, I would love to see this fixed in core. I opened a ticket and would love to hear your opinion since you seem to be very invested in these APIs: https://core.trac.wordpress.org/ticket/61296#ticket

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants