-
Notifications
You must be signed in to change notification settings - Fork 211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add support for non-anonymous login for Network Isolated Clusters #5650
base: master
Are you sure you want to change the base?
Conversation
@@ -405,6 +405,40 @@ getPrimaryNicIP() { | |||
echo "$ip" | |||
} | |||
|
|||
orasLogin() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@djsly right now the flow is
check anonymous pull -> success return (this is the current case in prod atm)
check anonymous pull -> failure -> check non-anonymous
In the chat, it was stated to do the anonymous check first, it might be better to do anonymous first since that is what is currently happening in production? When the customer turns off anonymous in the ACR, then the first check should fail and the oras login will take place.
it currently does the anonymous check by pulling a helloworld from the acr
this flow is mostly XInhe's logic, I thought it made sense, but I can adjust as we see fit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we should download a binary to test access.
Also, I think that they are looking to not support anonymous pull at all in the future, so it might be best to try non-anonymous first ?
local acr_url=$3 | ||
|
||
echo "${access_retries} retries for acr access check" | ||
sample_image="$acr_url/mcr/hello-world:latest" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this image is used to check that the cache is correct? seems a bit brittle, is there a different way we can check this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we need to do it this way, I'd opt for also removing this image from disk afterwards as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a quick chatgpt check said that I could try pulling a non-existant image
oras pull .azurecr.io/nonexistent:tag
and if it says like error unknown than your logged in, verses an auth error - I'll look into this when I get the e2e testing working
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cam@workbook ~$ oras repo
Repository operations
Usage:
oras repo [command]
Aliases:
repo, repository
Available Commands:
ls List the repositories under the registry
tags Show tags of the target repository
Flags:
-h, --help help for repo
Use "oras repo [command] --help" for more information about a command.
would oras repo ls
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if use oras pull .azurecr.io/nonexistent:tag
, we need to remove ERR_ORAS_PULL_INCORRECT_CACHE
error logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if the hello world image, for some reason ever, is removed? I agree with Cameron, that this is brittle.
@@ -683,4 +714,42 @@ removeKubeletFlag() { | |||
fi | |||
} | |||
|
|||
oras_login_with_identity() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might want to set +x
at the top of this so we don't log the AAD or ACR tokens
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, for some reason set +x makes the shellspec unit tests very unhappy, so I'll need to dig a bit here.
they stall out and then throw a huge error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohhh I actually ran into this myself recently
I found that if you invoke the function you're trying test using run
(as opposed to call
) in the spec, then the function will be called within a sub-shell rather than the parent shell where shellspec is actually running - that will prevent the breaking and infinte hang you're seeing
the other option is to just have the caller invoke set
instead of doing it directly within the function itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I see you're already using run
in your other specs, so yeah you can just do that to test oras_login_with_kubelet_identity
as well
local client_id=$2 | ||
local tenant_id=$3 | ||
|
||
raw_access_token=$(curl "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com%2F&client_id=$client_id" -H Metadata:true -s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably want to wrap the calls to IMDS + ACR exchange endpoint in retries
@@ -405,6 +405,40 @@ getPrimaryNicIP() { | |||
echo "$ip" | |||
} | |||
|
|||
orasLogin() { | |||
echo "Checking access to ACR with anonymous pull" | |||
logs_to_events "AKS.CSE.orasLogin.retrycmd_acr_access_check_anon" retrycmd_acr_access_check 10 1 "${BOOTSTRAP_PROFILE_CONTAINER_REGISTRY_SERVER}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: keep naming consistent, i.e either consider adding _anon to the actual function call, or remove it from the logs_to_events name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious on your thoughts here, having a slightly different name allows me to essentially mock both function calls in the unit test
logs_to_events() {
logs_to_events() {
case "$1" in
"AKS.CSE.orasLogin.retrycmd_acr_access_check_anon")
return $ERR_ORAS_PULL_UNAUTHORIZED
;;
"AKS.CSE.orasLogin.oras_login_with_kubelet_identity")
return 0
;;
"AKS.CSE.orasLogin.retrycmd_acr_access_check_non_anon")
return 1
;;
*)
return -1
;;
esac
}
also while logging, it might make it more clear which attempt its trying, anonymous vs non-anonymous.
However, I am going to tweak the flow so I'll see if I even end up making the retrycmd_acr_access_check call twice
What type of PR is this?
This PR adds support for non-anonymous ACR pull. It currently is not available, as there is a front end block on using a non-anonymous ACR. At the moment there is no way to get the kubelet identity, so the login cannot be preformed. However, it was decided that we wanted to have the logic in place.
This is a combination of two PRs
#4879
#5508
TODO:
Going to attach a vmss identity through e2es to test out the functionality of oras login
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Requirements:
Special notes for your reviewer:
Release note: