Collaborator

@kaovilai kaovilai commented Nov 4, 2025

Fixes #9381

Problem:
Velero's restore process ignores the return values of WaitForCacheSync(),
so an informer cache that fails to sync goes unhandled and error logs
appear for API groups that don't support watch operations (e.g.,
authorization.openshift.io/v1 on OpenShift clusters). Restore operations
still complete successfully via fallback to direct API calls, but the
error logs create confusion.

Solution:

  • Track resources that fail to sync in a resourcesWithoutInformerCache set
  • Check WaitForCacheSync return values at two locations:
    1. Initial cache sync for all resources (restore.go:609-617)
    2. Per-resource sync for CRDs/RIA-added resources (restore.go:1070-1078)
  • Bypass informer cache for tracked resources in getResource() (restore.go:1099)
  • Use direct API calls via getResourceClient() for resources without cache
  • Log informational messages (not errors) explaining API server restrictions
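
The tracking-and-bypass pattern in the list above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the code from this PR: it assumes the sync result is a per-resource boolean map (the shape that client-go's dynamic shared informer factory returns from WaitForCacheSync), and the names GVR, unsyncedResources, and lookupPath are hypothetical stand-ins. Only resourcesWithoutInformerCache comes from the description.

```go
package main

import "fmt"

// GVR is a hypothetical stdlib-only stand-in for client-go's
// schema.GroupVersionResource, used here so the sketch has no dependencies.
type GVR struct {
	Group, Version, Resource string
}

func (g GVR) String() string {
	return g.Group + "/" + g.Version + "/" + g.Resource
}

// unsyncedResources returns the set of resources whose informer cache did
// not sync, given a per-resource sync result map. The returned set plays
// the role of resourcesWithoutInformerCache from the description.
func unsyncedResources(synced map[GVR]bool) map[GVR]struct{} {
	withoutCache := make(map[GVR]struct{})
	for gvr, ok := range synced {
		if !ok {
			withoutCache[gvr] = struct{}{}
		}
	}
	return withoutCache
}

// lookupPath reports which read path a resource should use: tracked
// resources bypass the informer cache and go straight to the API server.
func lookupPath(gvr GVR, withoutCache map[GVR]struct{}) string {
	if _, bypass := withoutCache[gvr]; bypass {
		return "direct API call"
	}
	return "informer cache"
}

func main() {
	widgets := GVR{"example.com", "v1", "widgets"}
	pods := GVR{"", "v1", "pods"}

	// Assumed sync result for illustration: widgets cannot be watched, pods can.
	synced := map[GVR]bool{widgets: false, pods: true}
	resourcesWithoutInformerCache := unsyncedResources(synced)

	for _, gvr := range []GVR{widgets, pods} {
		// An informational message, not an error, since the bypass is expected.
		fmt.Printf("info: %s -> %s\n", gvr, lookupPath(gvr, resourcesWithoutInformerCache))
	}
}
```

Keeping the failed resources in a set means the lookup path can decide per resource, so one unwatchable API group does not force every other resource off the cache.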

Testing:

  • Added waitforcachesync_test.go with comprehensive unit tests
  • Tests use generic example.com/v1/widgets to demonstrate pattern
  • All existing restore package tests pass with no regressions

Impact:

  • No functional changes - restore operations continue to work correctly
  • Eliminates confusing error logs for expected API limitations
  • Clear informational logging about cache bypass behavior
  • Better handling of API groups with architectural watch restrictions

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>


codecov bot commented Nov 4, 2025

Codecov Report

❌ Patch coverage is 64.70588% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.03%. Comparing base (45755e1) to head (458d8d7).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/restore/restore.go | 64.70% | 5 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9384      +/-   ##
==========================================
- Coverage   60.03%   60.03%   -0.01%     
==========================================
  Files         384      384              
  Lines       35150    35165      +15     
==========================================
+ Hits        21103    21112       +9     
- Misses      12493    12498       +5     
- Partials     1554     1555       +1     

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Collaborator Author

kaovilai commented Nov 5, 2025

Moving to draft until we learn more about this problem.

@kaovilai kaovilai marked this pull request as draft November 5, 2025 00:18
Development

Successfully merging this pull request may close these issues.

Unhandled Error: Failed to watch authorization
