Kaniko fails to pull from GCR in Gitlab #3328

Open
jgsuess opened this issue Oct 2, 2024 · 13 comments

@jgsuess

jgsuess commented Oct 2, 2024

Actual behavior

In GitLab, kaniko obtains the manifest of an image but then fails to pull the image itself from GCR.

Expected behavior

In GitLab, kaniko pulls the image from GCR.

To Reproduce
Steps to reproduce the behavior:

  1. In a GitLab repository, create the two files below.
  2. Run the pipeline and observe the build failure.

.gitlab-ci.yml:

build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.23.2-debug
    entrypoint: [""]
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"

Dockerfile:

FROM gcr.io/distroless/java17-debian12:nonroot

Additional Information

  • Dockerfile
    Please provide either the Dockerfile you're trying to build or one that can reproduce this error.
  • Build Context
    Please provide or clearly describe any files needed to build the Dockerfile (ADD/COPY commands)
  • Kaniko Image (fully qualified with digest)
Using docker image sha256:16b383e1c3b259d59f75a2720a45ccf15b3a716cef44c6a5c521ceb471117168 for gcr.io/kaniko-project/executor:v1.23.2-debug with digest gcr.io/kaniko-project/executor@sha256:c3109d5926a997b100c4343944e06c6b30a6804b2f9abe0994d3de6ef92b028e ...
$ /kaniko/executor --context "${CI_PROJECT_DIR}" --dockerfile "${CI_PROJECT_DIR}/Dockerfile" --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"
INFO[0000] Using dockerignore file: /builds/australian-e-health-research-centre/digital-health-strengthening-standards-capability/infrastructure/containers/hapi-fhir-jpa-server/.dockerignore 
INFO[0000] Retrieving image manifest gcr.io/distroless/java17-debian12:nonroot 
INFO[0000] Retrieving image gcr.io/distroless/java17-debian12:nonroot from registry gcr.io 
error building image: unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fjava17-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed

Triage Notes for the Maintainers

Description Yes/No
Please check if this is a new feature you are proposing
Please check if the build works in docker but not in kaniko
Please check if this error is seen when you use the --cache flag
Please check if your Dockerfile is a multistage Dockerfile
@mzihlmann

mzihlmann commented Oct 15, 2024

When running as a GitLab job, kaniko automatically sets up authentication for you based on the predefined variables:

  • CI_REGISTRY
  • CI_REGISTRY_USER
  • CI_REGISTRY_PASSWORD

If you need different credentials, either inside GitLab or for other registries, you must set them manually, for example:

build:
  variables:
    DOCKER_CONFIG_JSON: |
      {
          "auths":{
              "$MY_REGISTRY":{
                  "auth":"$MY_AUTH"
              }
          }
      }
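  before_script:
    # Create the config directory if needed; quote the variable so the JSON is written verbatim.
    - mkdir -p /kaniko/.docker
    - echo "$DOCKER_CONFIG_JSON" > /kaniko/.docker/config.json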
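
The snippet above leaves open where the "auth" value comes from. In a Docker config.json, "auth" is the base64 encoding of "username:password", so one way to build it inside the job might look like the sketch below. MY_USER and MY_PASSWORD are hypothetical CI/CD variables introduced only for illustration (they do not appear in this thread), and the sketch assumes printf, base64 and tr are available in the debug image's shell.

```yaml
build:
  before_script:
    # MY_USER and MY_PASSWORD are hypothetical CI/CD variables (an assumption, not from this thread).
    # "auth" is base64("username:password"); tr strips the newlines base64 may insert.
    - MY_AUTH=$(printf "%s:%s" "${MY_USER}" "${MY_PASSWORD}" | base64 | tr -d '\n')
    - mkdir -p /kaniko/.docker
    # Write the registry entry for MY_REGISTRY into kaniko's Docker config.
    - printf '{"auths":{"%s":{"auth":"%s"}}}' "${MY_REGISTRY}" "${MY_AUTH}" > /kaniko/.docker/config.json
```

Keeping the password in a masked CI/CD variable rather than in the repository is probably the safer choice here.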

That's the downside of doing things implicitly, it works out of the box until it doesn't, and then everybody gets confused.

@jameshartig

jameshartig commented Oct 15, 2024

It looks like the OP is trying to fetch gcr.io/distroless/java17-debian12:nonroot which doesn't need authentication.

This is a duplicate of #1984

@mzihlmann

mzihlmann commented Oct 16, 2024

Indeed! Sorry for not reading carefully enough. For me, however, this only happens on gitlab.com with gitlab.com-hosted runners (Docker). On our self-hosted GitLab with self-hosted runners (k8s) it is not reproducible. I experimented a bit and found this workaround:

build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.23.2-debug
    entrypoint: [""]
  variables:
    FF_NETWORK_PER_BUILD: "true"
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"

The only change is the FF_NETWORK_PER_BUILD feature flag, but I have not yet understood why it helps.

@mzihlmann

mzihlmann commented Oct 16, 2024

As the other thread mentioned, this works with 1.7.0 and fails starting from 1.8.0. A good opportunity for some good old git bisect action over the range v1.7.0...v1.8.0.

At first glance, commit 09e70e4 looks a bit suspicious.

When I short-circuit the resolve function, it indeed no longer fails:

func resolve(ctx context.Context) authn.Authenticator {
	// Debugging patch: skip credential discovery and always use anonymous access.
	return authn.Anonymous
}

@mzihlmann

Running git bisect led me to commit 633f555. I am currently looking at the changes in that commit; it was meant to fix #1856.

@jameshartig

It looks like when the registry is gcr.io or pkg.dev, kaniko ends up calling google.FindDefaultCredentialsWithParams, which talks to the GCE metadata server to fetch credentials. I speculate that this is what happens on GitLab's runners and that the token it gets back is unusable.

We ended up fixing this by setting GOOGLE_APPLICATION_CREDENTIALS="/dev/null", which short-circuits that function and returns an error, causing the Google keychain to fall back to anonymous.
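
Applied to the job from the original report, that workaround might look like the sketch below. It assumes the variable is set at job level (a project-level CI/CD variable should behave the same way), and everything except the variables block is copied from the reproduction above.

```yaml
build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.23.2-debug
    entrypoint: [""]
  variables:
    # Point Google credential discovery at an unusable file so that, as described above,
    # the lookup errors out and the Google keychain falls back to anonymous access.
    GOOGLE_APPLICATION_CREDENTIALS: "/dev/null"
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"
```

Note that this presumably also disables Google credentials for images that genuinely need them, so it fits best when only public gcr.io images are pulled.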

@mzihlmann

mzihlmann commented Oct 17, 2024

Sorry for not posting this yesterday already. I was contacting GitLab to make sure it is OK for me to write this message publicly, but now that the cat is out of the bag I must say it's quite hilarious: kaniko does exactly what it is supposed to do in this case 😄.

Luckily the access token it receives can't be used for much, i.e. downloading images doesn't work 🤣. I tried to write the issue report into their logging system but got stuck because I couldn't guess their logstream name 🤣.

@mzihlmann

I think we can still improve the situation out of the box for users. The fundamental problem is that if authentication works and a token is received, it is of course used, but if the permissions on that token don't allow image pulling, we simply give up. We never try to run without a token as a fallback. Even better, when we first request the token we should already set the correct scope; the request should then be denied and everything should work as expected.

@jameshartig

I agree the situation can be improved. Currently the list of credential sources is hardcoded. Could that be a flag instead so it could be customized if you don't need specific ones, like Google, for example?

@jgsuess
Author

jgsuess commented Oct 18, 2024

At the moment, what is the recommended workaround for this? It should probably be added to the other bug and to the GitLab documentation while this remains open. Using the public GCR is likely a very common use case.

@mzihlmann

Good morning,

I have not yet received an explicit OK from GitLab, but so far they have said they are not concerned, so I think you're right that it's high time to inform the other channel. I think @jameshartig's solution is the neatest so far, as adding the feature flag impacts a lot more than just this bug; at the very least it also takes much longer to start a runner on their side (hence it is not the default).

I agree the situation can be improved. Currently the list of credential sources is hardcoded. Could that be a flag instead so it could be customized if you don't need specific ones, like Google, for example?

At the risk of repeating myself:

That's the downside of doing things implicitly, it works out of the box until it doesn't, and then everybody gets confused.

But even though the change would be easy, I think it would be very difficult to get it merged, as it breaks both the user interface and the philosophy.

@jgsuess
Author

jgsuess commented Oct 19, 2024

I would agree that implicit is a bad idea. Good documentation that explains how this happens would be the preferred choice from my perspective. Otherwise, when you start trying to understand how it does something, you end up asking: so how does that even work? The behaviour becomes counterintuitive.

@jameshartig

I think the default could be what is hardcoded now, so that nothing becomes more complicated for anyone who is fine with the current keychain. Something like --cred-sources with a default value of "google,ecr,acr,gitlab".
