Handle WaitForCacheSync failures for resources without watch support #9384

kaovilai · 2025-11-04T03:19:01Z

Problem:
Velero's restore process ignores return values from WaitForCacheSync(),
causing error logs when informer caches fail to sync for API groups that
don't support watch operations (e.g., authorization.openshift.io/v1 on
OpenShift clusters). While restore operations complete successfully via
fallback to direct API calls, the error logs create confusion.

Solution:

- Track resources that fail to sync in resourcesWithoutInformerCache set
- Check WaitForCacheSync return values at two locations:
  1. Initial cache sync for all resources (restore.go:609-617)
  2. Per-resource sync for CRDs/RIA-added resources (restore.go:1070-1078)
- Bypass informer cache for tracked resources in getResource() (restore.go:1099)
- Use direct API calls via getResourceClient() for resources without cache
- Log informational messages (not errors) explaining API server restrictions
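
The steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in restore.go: the GVR struct stands in for client-go's schema.GroupVersionResource, the two maps stand in for the informer listers and the client returned by getResourceClient(), and recordSyncResults and newRestoreContext are hypothetical helper names. It assumes the sync call reports a per-resource boolean, as the map returned by client-go's dynamic informer factory WaitForCacheSync does.

```go
package main

import (
	"fmt"
	"sort"
)

// GVR is a simplified stand-in for schema.GroupVersionResource (assumption).
type GVR struct {
	Group, Version, Resource string
}

func (g GVR) String() string { return g.Group + "/" + g.Version + "/" + g.Resource }

// restoreContext holds only the state this sketch needs (hypothetical type).
type restoreContext struct {
	// resourcesWithoutInformerCache records resources whose informer never synced.
	resourcesWithoutInformerCache map[GVR]struct{}
	cache                         map[GVR]map[string]string // stands in for informer listers
	api                           map[GVR]map[string]string // stands in for direct API calls
}

func newRestoreContext() *restoreContext {
	return &restoreContext{
		resourcesWithoutInformerCache: map[GVR]struct{}{},
		cache:                         map[GVR]map[string]string{},
		api:                           map[GVR]map[string]string{},
	}
}

// recordSyncResults inspects the per-resource sync results instead of
// discarding them, tracks every resource that failed to sync, and returns
// the failures in a stable order so the caller can log them.
func (c *restoreContext) recordSyncResults(results map[GVR]bool) []GVR {
	var failed []GVR
	for gvr, synced := range results {
		if !synced {
			c.resourcesWithoutInformerCache[gvr] = struct{}{}
			failed = append(failed, gvr)
		}
	}
	sort.Slice(failed, func(i, j int) bool { return failed[i].String() < failed[j].String() })
	return failed
}

// getResource reads from the cache unless the resource is tracked as having
// no informer cache, in which case it goes straight to the API stand-in.
// The second return value names the source that was consulted.
func (c *restoreContext) getResource(gvr GVR, name string) (string, string, bool) {
	if _, bypass := c.resourcesWithoutInformerCache[gvr]; !bypass {
		obj, ok := c.cache[gvr][name]
		return obj, "cache", ok
	}
	obj, ok := c.api[gvr][name]
	return obj, "api", ok
}

func main() {
	widgets := GVR{"example.com", "v1", "widgets"}
	pods := GVR{"", "v1", "pods"}

	ctx := newRestoreContext()
	ctx.cache[pods] = map[string]string{"web": "pod/web"}
	ctx.api[widgets] = map[string]string{"w1": "widget/w1"}

	// Pretend the widgets informer could not sync because watch is unsupported.
	for _, gvr := range ctx.recordSyncResults(map[GVR]bool{pods: true, widgets: false}) {
		// Informational, not an error: the restore can still proceed.
		fmt.Printf("info: cache for %s did not sync, using direct API calls\n", gvr)
	}

	obj, source, found := ctx.getResource(widgets, "w1")
	fmt.Println(obj, source, found)
}
```

Keeping the tracked set on the restore context means the decision is made once per resource at sync time, and later lookups only pay for a map check.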

Testing:

- Added waitforcachesync_test.go with comprehensive unit tests
- Tests use generic example.com/v1/widgets to demonstrate pattern
- All existing restore package tests pass with no regressions

Impact:

- No functional changes - restore operations continue to work correctly
- Eliminates confusing error logs for expected API limitations
- Clear informational logging about cache bypass behavior
- Better handling of API groups with architectural watch restrictions

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>


codecov · 2025-11-04T03:38:22Z

Codecov Report

❌ Patch coverage is 64.70588% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.03%. Comparing base (45755e1) to head (458d8d7).

Files with missing lines	Patch %	Lines
pkg/restore/restore.go	64.70%	5 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9384      +/-   ##
==========================================
- Coverage   60.03%   60.03%   -0.01%     
==========================================
  Files         384      384              
  Lines       35150    35165      +15     
==========================================
+ Hits        21103    21112       +9     
- Misses      12493    12498       +5     
- Partials     1554     1555       +1


Fixes vmware-tanzu#9381

Problem:
Velero's restore process ignores return values from WaitForCacheSync(),
causing error logs when informer caches fail to sync for API groups that
don't support watch operations (e.g., authorization.openshift.io/v1 on
OpenShift clusters). While restore operations complete successfully via
fallback to direct API calls, the error logs create confusion.

Solution:
- Track resources that fail to sync in resourcesWithoutInformerCache set
- Check WaitForCacheSync return values at two locations:
  1. Initial cache sync for all resources (restore.go:609-617)
  2. Per-resource sync for CRDs/RIA-added resources (restore.go:1070-1078)
- Bypass informer cache for tracked resources in getResource() (restore.go:1099)
- Use direct API calls via getResourceClient() for resources without cache
- Log informational messages (not errors) explaining API server restrictions

Testing:
- Added waitforcachesync_test.go with comprehensive unit tests
- Tests use generic example.com/v1/widgets to demonstrate pattern
- All existing restore package tests pass with no regressions

Impact:
- No functional changes - restore operations continue to work correctly
- Eliminates confusing error logs for expected API limitations
- Clear informational logging about cache bypass behavior
- Better handling of API groups with architectural watch restrictions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

kaovilai · 2025-11-05T00:18:01Z

Moving to draft until we learn more about this problem.

github-actions bot assigned kaovilai Nov 4, 2025

github-actions bot added the has-unit-tests label Nov 4, 2025

github-actions bot requested review from Lyndon-Li and ywk253100 November 4, 2025 03:19

kaovilai force-pushed the issue9381 branch from cb62bc4 to 4d6a871 Compare November 4, 2025 03:19

github-actions bot added the has-changelog label Nov 4, 2025

kaovilai force-pushed the issue9381 branch 2 times, most recently from 26b7405 to fc82ad3 Compare November 4, 2025 03:28

kaovilai force-pushed the issue9381 branch from fc82ad3 to 458d8d7 Compare November 4, 2025 04:41

kaovilai marked this pull request as draft November 5, 2025 00:18
