Skip to content

Conversation

AYDEV-FR
Copy link

@AYDEV-FR AYDEV-FR commented Jul 4, 2025

Comprehensive Summary of your change

This PR updates the proxy cache logic so that if an image exists in the local cache but has been deleted from the remote repository, Harbor will serve the cached image instead of failing with a "not found" error. This brings the implementation in line with the documented behavior and improves reliability when remote repositories are unavailable or images have been removed upstream.

You can find more details about this change in issue #22106.

Implementation Notes

I hesitated between modifying the remote.ManifestExist function in harbor/src/controller/proxy/controller.go (since it does not return a 404 error directly in harbor/src/pkg/registry/client.go):

func (c *client) ManifestExist(repository, reference string) (bool, *distribution.Descriptor, error) {
	req, err := http.NewRequest(http.MethodHead, buildManifestURL(c.url, repository, reference), nil)
	if err != nil {
		return false, nil, err
	}
	for _, mediaType := range accepts {
		req.Header.Add("Accept", mediaType)
	}
	resp, err := c.do(req)
	if err != nil {
		if errors.IsErr(err, errors.NotFoundCode) {
			return false, nil, nil
		}
		return false, nil, err
	}
	defer resp.Body.Close()
	dig := resp.Header.Get("Docker-Content-Digest")
	contentType := resp.Header.Get("Content-Type")
	contentLen := resp.Header.Get("Content-Length")
	lenth, _ := strconv.Atoi(contentLen)
	return true, &distribution.Descriptor{Digest: digest.Digest(dig), MediaType: contentType, Size: int64(lenth)}, nil
}

However, to avoid unintended side effects by changing this function's behavior, I decided to keep the current "404 not found" handling (which returns no error) and instead update the UseLocalManifest function in harbor/src/controller/proxy/controller.go:

remoteRepo := getRemoteRepo(art)
exist, desc, err := remote.ManifestExist(remoteRepo, getReference(art)) // HEAD
if err != nil {
	if errors.IsRateLimitError(err) && a != nil { // if rate limit, use local if it exists, otherwise return error
		return true, nil, nil
	}
	return false, nil, err
}
if !exist || desc == nil {
	if a != nil { // if not found, use local if it exists, because a exist, otherwise return error
		log.Errorf("Artifact not found in remote registry but exists in local cache, serving from local: %v:%v", art.Repository, art.Tag)
		return true, nil, nil
	}
	return false, nil, errors.NotFoundError(fmt.Errorf("repo %v, tag %v not found", art.Repository, art.Tag))
}

Issue being fixed

Fixes #22106

No documentation modification is needed because this PR enforces behavior that is already described in the documentation:

If the image has not been updated in the target registry, the cached image is served from the proxy cache project.
If the image has been updated in the target registry, the new image is pulled from the target registry, then served and cached in the proxy cache project.
If the target registry is not reachable, the proxy cache project serves the cached image.
If the image is no longer in the target registry, but is still in the proxy cache project, the cached image is served from the proxy cache project.

Please confirm you've completed the following:

  • Well Written Title and Summary of the PR
  • Labeled the PR as needed ("release-note/ignore-for-release", "release-note/new-feature", "release-note/update", "release-note/enhancement", "release-note/community", "release-note/breaking-change", "release-note/docs", "release-note/infra", "release-note/deprecation")
  • Accepted the DCO. Commits without the DCO will delay acceptance.
  • Made sure tests are passing and test coverage is added if needed.
  • Considered the docs impact and opened a new docs issue or PR with docs changes if needed in website repository.

@AYDEV-FR AYDEV-FR requested a review from a team as a code owner July 4, 2025 08:27
@Vad1mo Vad1mo added the release-note/update Update or Fix label Jul 4, 2025
Copy link

codecov bot commented Jul 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 65.88%. Comparing base (c8c11b4) to head (813dc3d).
⚠️ Report is 573 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             main   #22153       +/-   ##
===========================================
+ Coverage   45.36%   65.88%   +20.51%     
===========================================
  Files         244     1073      +829     
  Lines       13333   116013   +102680     
  Branches     2719     2927      +208     
===========================================
+ Hits         6049    76430    +70381     
- Misses       6983    35350    +28367     
- Partials      301     4233     +3932     
Flag Coverage Δ
unittests 65.88% <100.00%> (+20.51%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/controller/proxy/controller.go 14.72% <100.00%> (ø)

... and 986 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@wy65701436 wy65701436 assigned stonezdj and unassigned OrlinVasilev Jul 7, 2025
@reasonerjt
Copy link
Contributor

@AYDEV-FR

Thanks for the PR. But IMO this is a "feat" rather than a "fix".

If the image is no longer in the target registry, but is still in the proxy cache project, the cached image is served from the proxy cache project.

This is a break-change to the original design, which returns 404 when the remote content is removed.

Because we have passed the "feature freeze" and will hit "feature complete" in July, I don't think it can be made into v2.14. Are you willing to continue working on this after the branch for 2.14 is cut? We can continue the discussion in the issue. IMO at least an option should be added, so user can choose whether to server local content when the remote artifact is removed.

Copy link
Contributor

@reasonerjt reasonerjt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my understanding, this is a break change. More discussion needed.

@AYDEV-FR
Copy link
Author

AYDEV-FR commented Jul 7, 2025

Hi @reasonerjt ,

I understand your point of view regarding this being considered a feature rather than a fix. From my perspective, it was more of a fix to align the behavior with what is described in the documentation: goharbor/website@2ee87ae
I don’t understand why the documentation was updated by @stonezdj, but nothing was changed in the code regarding this.

In the meantime, I found a workaround by returning a 429 response for each 404 returned by the remote registry. Since I have this workaround, I am willing to wait for your decision on whether to update the documentation or merge my change.

However, I think that giving users a choice via a parameter might create confusion about the feature. In my opinion, serving the local cache when the remote registry responds with a 404 is the very concept of a proxy cache. And the Harbor documentation seems to agree with me.

@reasonerjt
Copy link
Contributor

@AYDEV-FR Thanks for pointing this out, let me double check why this change in the doc was made, if this is on purpose, I agree this is a fix not a feat.

@AYDEV-FR AYDEV-FR requested a review from reasonerjt July 29, 2025 17:43
@khaibenz
Copy link

In the meantime, I found a workaround by returning a 429 response for each 404 returned by the remote registry. Since I have this workaround, I am willing to wait for your decision on whether to update the documentation or merge my change.

Hi @AYDEV-FR

I am also running into the same problem and I would be interested to know how did you implement the workaround to return a 429 from the registry?

@tamcore
Copy link

tamcore commented Sep 9, 2025

Is there any update on this? Especially with bitnami's upcoming removal of images from docker.io, having the pull-through cache continue serving removed, but cached, images would be incredibly usefull. After all, that's probably one of the reasons people configure Harbor as a pull-through cache. To not just reduce traffic, but also keep workloads operational in case upstream images go away for whatever reason.

@bupd
Copy link
Contributor

bupd commented Sep 19, 2025

@AYDEV-FR great work

https://github.com/goharbor/community/pull/144/files#diff-78dea958499f3e23826611cf839d9a96615a0b420f33520e92d564f2fb17d24fR127 as per the Harbor proxy spec. proxy cache is said to delete the local manifest if it does not exist in remote. But in the document it is said to be serving the artifact even when the remote artifact is deleted. Also adding to that we did a change only in 2.14 to delete the local artifact which makes it pretty clear that harbor is previously serving the artifact. Which makes this a fix.
With that said this is a much needed change.

This also delves into the question of data integrity and what to do with stale data..

I believe we should do this as a fix since I feel like harbor was serving in the past. And this is a behaviour different from the spec mentioned above. So we should update the spec.

there is also this parity between the pull through cache vs proxy cache.

Harbor doesn't support pull through cache. maybe we can do that.

@marevers
Copy link

marevers commented Sep 19, 2025

@bupd Harbor 2.14 made the opposite change: #22175
As you say, artifacts are now deleted from the cache if they don't exist anymore. I agree with @tamcore though, an important reason for us and I'm sure many to use Harbor as a cache is to safeguard deployments if somehow the original artifact is deleted or the repository is not reachable.

Maybe the best option is to make it configurable, leaving the current implementation in place as default? It could be a checkbox setting on the proxy cache level, something like Serve unavailable artifacts from cache.

@bupd
Copy link
Contributor

bupd commented Sep 19, 2025

Also from the distribution spec it is clear that Harbor should serve artifacts that it locally has even if the remote artifact is deleted.

Also given the use case and demand and Harbor already might been doing this in the past makes it clear to have it in 2.14.1

@bupd
Copy link
Contributor

bupd commented Sep 19, 2025

goharbor/community#144

Given that the Proposal PR is merged with lazy consensus with lack of reviews. given that spec is clearly titled as Pull through proxy cache rather than online-only proxy cache makes it clear what the intention was.

A true pull-through cache should be able to serve content even when the remote resource is deleted or completely offline, which is a critical distinction from an online-only proxy.

Copy link
Contributor

@bupd bupd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AYDEV-FR Thanks for your contribution

please do rebase, otherwise LGTM a much needed change in harbor

@AYDEV-FR AYDEV-FR force-pushed the fix/proxy-cache-serve-local-on-remote-not-found branch from f2e80f9 to 69adb22 Compare September 19, 2025 21:47
@AYDEV-FR
Copy link
Author

AYDEV-FR commented Sep 22, 2025

Hi @bupd and @reasonerjt,

Thanks for your review !
I agree, this change seems to be expected behavior for a solution like Harbor. Often used as a cache to avoid production outages or to overcome issues with image name changes, licences or maintenance (like the recent Bitnami changes).

I’ve rebased my code and run multiple tests with the changes in my PR.
The modification works well and handles several scenarios, including:

  • 404 Not Found Manifest
  • 401 Unauthorized Manifest
  • 503 Service Unavailable
  • Remote not available

When the repo is not available, the cache is correctly served from /server/middleware/repoproxy/proxy.go:266.

If the repo returns 503, while Harbor is still detecting it as unhealthy:

[ERROR] [/server/middleware/repoproxy/proxy.go:142]: failed to proxy manifest, fallback to local, request uri: /v2/local/debian/debian/manifests/latest, error: http status code: 503

If the repo returns 404 for the manifest (e.g., the image has been deleted), it's my code behaves as expected:

[ERROR] [/controller/proxy/controller.go:176]: Artifact not found in remote registry but exists in local cache, serving from local: local/debian/debian:latest

The change is working well. Would it be possible to merge this and release a v2.14.1 as @bupd suggested?
Thanks!

@patsevanton
Copy link

any update?

@Vad1mo Vad1mo requested a review from Copilot October 2, 2025 13:31
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes the proxy cache logic to serve locally cached images when they exist locally but are no longer found in the remote repository, aligning implementation with documented behavior.

  • Updates UseLocalManifest function to check for local artifact existence before returning "not found" error
  • Removes automatic deletion logic for locally cached manifests when remote returns 404
  • Updates corresponding test case to expect success instead of error when local cache exists but remote doesn't

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/controller/proxy/controller.go Modified logic to serve from local cache when remote artifact is not found
src/controller/proxy/controller_test.go Updated test case to reflect new behavior of serving from local cache

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Vadim Bauer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note/update Update or Fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Proxy cache should serve cached images even if remote image is deleted