Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for non-anonymous login for Network Isolated Clusters #5650

Draft
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

AlisonB319
Copy link
Collaborator

@AlisonB319 AlisonB319 commented Jan 23, 2025

What type of PR is this?

This PR adds support for non-anonymous ACR pull. It currently is not available, as there is a front end block on using a non-anonymous ACR. At the moment there is no way to get the kubelet identity, so the login cannot be preformed. However, it was decided that we wanted to have the logic in place.

This is a combination of two PRs
#4879
#5508

TODO:
Going to attach a vmss identity through e2es to test out the functionality of oras login

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Requirements:

Special notes for your reviewer:

Release note:

none

@AlisonB319 AlisonB319 changed the title Alburgess/oras identity2 feat: add support for non-anonymous login for Network Isolated Clusters Jan 23, 2025
@@ -405,6 +405,40 @@ getPrimaryNicIP() {
echo "$ip"
}

orasLogin() {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djsly right now the flow is
check anonymous pull -> success return (this is the current case in prod atm)

check anonymous pull -> failure -> check non-anonymous

In the chat, it was stated to do the anonymous check first, it might be better to do anonymous first since that is what is currently happening in production? When the customer turns off anonymous in the ACR, then the first check should fail and the oras login will take place.

it currently does the anonymous check by pulling a helloworld from the acr

this flow is mostly XInhe's logic, I thought it made sense, but I can adjust as we see fit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should download a binary to test access.

Also, I think that they are looking to not support anonymous pull at all in the future, so it might be best to try non-anonymous first ?

local acr_url=$3

echo "${access_retries} retries for acr access check"
sample_image="$acr_url/mcr/hello-world:latest"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this image is used to check that the cache is correct? seems a bit brittle, is there a different way we can check this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we need to do it this way, I'd opt for also removing this image from disk afterwards as well

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a quick chatgpt check said that I could try pulling a non-existant image
oras pull .azurecr.io/nonexistent:tag
and if it says like error unknown than your logged in, verses an auth error - I'll look into this when I get the e2e testing working

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cam@workbook ~$ oras repo                                                                                                                                                                                   
Repository operations

Usage:
  oras repo [command]

Aliases:
  repo, repository

Available Commands:
  ls          List the repositories under the registry
  tags        Show tags of the target repository

Flags:
  -h, --help   help for repo

Use "oras repo [command] --help" for more information about a command.

would oras repo ls work?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if use oras pull .azurecr.io/nonexistent:tag, we need to remove ERR_ORAS_PULL_INCORRECT_CACHE error logic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if the hello world image, for some reason ever, is removed? I agree with Cameron, that this is brittle.

@@ -683,4 +714,42 @@ removeKubeletFlag() {
fi
}

oras_login_with_identity() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might want to set +x at the top of this so we don't log the AAD or ACR tokens

Copy link
Collaborator Author

@AlisonB319 AlisonB319 Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, for some reason set +x makes the shellspec unit tests very unhappy, so I'll need to dig a bit here.

they stall out and then throw a huge error message

Copy link
Collaborator

@cameronmeissner cameronmeissner Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohhh I actually ran into this myself recently

I found that if you invoke the function you're trying test using run (as opposed to call) in the spec, then the function will be called within a sub-shell rather than the parent shell where shellspec is actually running - that will prevent the breaking and infinte hang you're seeing

the other option is to just have the caller invoke set instead of doing it directly within the function itself

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I see you're already using run in your other specs, so yeah you can just do that to test oras_login_with_kubelet_identity as well

local client_id=$2
local tenant_id=$3

raw_access_token=$(curl "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com%2F&client_id=$client_id" -H Metadata:true -s)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably want to wrap the calls to IMDS + ACR exchange endpoint in retries

@@ -405,6 +405,40 @@ getPrimaryNicIP() {
echo "$ip"
}

orasLogin() {
echo "Checking access to ACR with anonymous pull"
logs_to_events "AKS.CSE.orasLogin.retrycmd_acr_access_check_anon" retrycmd_acr_access_check 10 1 "${BOOTSTRAP_PROFILE_CONTAINER_REGISTRY_SERVER}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: keep naming consistent, i.e either consider adding _anon to the actual function call, or remove it from the logs_to_events name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious on your thoughts here, having a slightly different name allows me to essentially mock both function calls in the unit test

logs_to_events() {
                logs_to_events() {
                case "$1" in
                    "AKS.CSE.orasLogin.retrycmd_acr_access_check_anon")
                        return $ERR_ORAS_PULL_UNAUTHORIZED
                        ;;
                    "AKS.CSE.orasLogin.oras_login_with_kubelet_identity")
                        return 0
                        ;;
                    "AKS.CSE.orasLogin.retrycmd_acr_access_check_non_anon")
                        return 1 
                        ;;
                    *)
                        return -1
                        ;;
                esac
            }

also while logging, it might make it more clear which attempt its trying, anonymous vs non-anonymous.

However, I am going to tweak the flow so I'll see if I even end up making the retrycmd_acr_access_check call twice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants