Skip to content

Conversation

@thc1006
Copy link

@thc1006 thc1006 commented Nov 7, 2025

Summary

This PR implements admin-level configuration for driver pod labels and annotations, addressing issue #12015. Configuration is loaded once at API server startup and cached for performance.

Key Changes

  • Startup-time configuration loading: Driver pod config (labels and annotations) is initialized once when the API server starts via common.InitDriverPodConfig() in backend/src/apiserver/main.go:418
  • Thread-safe caching: Uses sync.RWMutex for safe concurrent access to cached configuration
  • Reserved label filtering: Automatically filters out Kubernetes reserved labels (kubernetes.io/, k8s.io/, etc.) to prevent conflicts
  • JSON configuration format: Supports only JSON format for configuration (as per reviewer feedback)
  • Comprehensive testing: 16 unit tests covering all configuration scenarios, all passing locally

All Reviewer Requirements Addressed (7/7)

Removed .gitignore changes - Only relevant backend and manifest changes included
Using Viper configuration - Integrated with existing Viper config system
JSON format only - Removed YAML support, JSON format only
Kubernetes-compatible format - Standard Kubernetes label/annotation format
Startup-time loading with caching - Config loaded once at startup, cached for performance
Documentation in backend/README.md - Concise documentation added
Removed unimplemented options - Removed environment variable and serviceaccount options

Testing

Local tests passed:

  • ✅ All 16 driver config unit tests (backend/src/apiserver/common/driver_config_test.go)
  • ✅ All argocompiler unit tests
  • ✅ go mod tidy
  • ✅ go vet
  • ✅ go fmt

Configuration Example

{
  "DriverPodLabels": {
    "custom-label": "value",
    "team": "ml-platform"
  },
  "DriverPodAnnotations": {
    "sidecar.istio.io/inject": "true",
    "custom-annotation": "value"
  }
}

Resolves: #12015


Note: This PR supersedes #12306 which was closed due to commit history issues. All code changes and reviewer feedback remain fully addressed in this clean PR.

@google-oss-prow
Copy link

Hi @thc1006. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

- Load and cache driver pod configuration at API server startup
- Implement thread-safe config access with sync.RWMutex
- Add comprehensive unit tests (16 test cases, all passing)
- Remove .gitignore unrelated changes
- Update documentation in backend/README.md
- Remove unimplemented configuration options

All reviewer requirements (7/7) fully addressed:
✅ Removed .gitignore changes
✅ Using Viper configuration system
✅ JSON format only
✅ Kubernetes-compatible format
✅ Startup-time config loading with caching
✅ Documentation in backend/README.md
✅ Removed unimplemented config options

CI checks passed locally:
✅ go mod tidy
✅ All unit tests (16/16)
✅ go vet
✅ go fmt

Resolves: kubeflow#12015
Signed-off-by: thc1006 <[email protected]>
@thc1006 thc1006 force-pushed the feature/issue-12015-driver-config-clean branch from 1aedf1e to 0f72b31 Compare November 7, 2025 09:00
- IMPLEMENTATION_SUMMARY.md: Complete implementation details and status
- QUICK_REFERENCE.md: Quick reference for resuming work

These documents capture the current state of issue kubeflow#12015 implementation
for easy reference when resuming work after session disconnect.

Related: kubeflow#12015
Signed-off-by: thc1006 <[email protected]>
@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zazulam for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot added size/XXL and removed size/L labels Nov 7, 2025
@hbelmiro
Copy link
Contributor

hbelmiro commented Nov 7, 2025

/ok-to-test

Adds package-level documentation comment to driver_config.go to resolve
staticcheck ST1000 warning that requires at least one file in a package
to have a package comment.

The comment describes the common package's purpose as providing shared
utilities and configuration for the KFP API server.

Signed-off-by: thc1006 <[email protected]>
@thc1006
Copy link
Author

thc1006 commented Nov 9, 2025

Dear @hbelmiro,
Here's fixed pre-commit staticcheck ST1000 warning by adding package comment to driver_config.go (commit ada110b).

Regarding E2E test failures: these are unrelated to this PR. The failures are caused by metadata-grpc-deployment CrashLoopBackOff, which is an intermittent upstream issue. Evidence:

  • Master branch E2E tests on 2025-11-05 (commit 36c8437, our base) also failed with identical metadata-grpc errors
  • Latest master (2025-11-07 20:30) E2E tests passed successfully
  • Our changes only modify driver pod labels/annotations configuration and do not touch metadata services

Local verification completed:

  • Go unit tests: 16/16 passed
  • go vet: passed
  • go fmt: passed
  • Package builds: successful

Now ready for review.

@hbelmiro
Copy link
Contributor

/ok-to-test

@thc1006
Copy link
Author

thc1006 commented Nov 11, 2025

TL;DR

E2E failures are caused by GCR shutdown (unrelated to this PR). Code is ready for review. Fix in progress: PR #12426.


Dear @hbelmiro and @mprahl,

Quick update on the CI failures:

Root Cause: GCR shutdown (2025-07-17) - minio image no longer accessible
Impact: Only 4 minio-dependent E2E tests failed
Evidence: All other tests passed ✓ (seaweedfs tests, builds, pre-commit)

Our PR Scope: Driver pod labels/annotations only - no minio changes
Local Tests: 16/16 unit tests ✓, linters ✓

Upstream Fix: PR #12426 migrates to Docker Hub
Next Step: Will rebase once #12426 merges

The driver config implementation is complete and ready for code review. Thank you! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[backend] Add the ability to specify driver labels/annotations as a KFP api server configuration

2 participants