BUG: fix race condition #477


Open: wants to merge 4 commits into base main

Conversation

sdickhoven

@sdickhoven sdickhoven commented Feb 15, 2025

this pr fixes a race condition that occurs when new cni config files are created while the install-cni.sh script is executing anywhere in this range:

https://github.com/linkerd/linkerd2-proxy-init/blob/cni-plugin/v1.6.0/cni-plugin/deployment/scripts/install-cni.sh#L323-L348

we have observed this race condition several times in our eks clusters over the past few days where we are chaining cilium to the aws vpc cni.

the install-cni.sh script simply fails to patch the cilium cni config sometimes. i.e. cilium and linkerd-cni must be starting up and manipulating /etc/cni/net.d at just about the same time.

i have a temporary workaround for this race condition in the form of the following kustomize patch:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: linkerd-cni
resources:
- manifest.yaml
patches:
- patch: |
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: linkerd-cni
      namespace: linkerd-cni
    spec:
      template:
        spec:
          containers:
          - name: install-cni
            lifecycle:
              postStart:
                exec:
                  command:
                  - /bin/sh
                  - -c
                  - |
                    sleep 10 # wait for `install-cni.sh` to enter into `inotifywait` loop
                    exec &> /tmp/cni_config_repair.out
                    set -x
                    HOST_CNI_NET="${CONTAINER_MOUNT_PREFIX:-/host}${DEST_CNI_NET_DIR:-/etc/cni/net.d}"
                    find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \) -print0 |
                    while read -r -d $'\0' file
                    do
                      sleep 2 # try to avoid interfering with non-atomic filesystem operations
                      grep -qw linkerd "${file}" && continue || echo "Found unpatched CNI config file: ${file}"
                      tmp_file="$(mktemp -ut linkerd-cni.XXXXXX)"
                      cp -fp "${file}" "${tmp_file}"
                      echo >> "${tmp_file}" # force hash diff
                      mv -f "${tmp_file}" "${file}" # IMPORTANT: use atomic filesystem operation!
                    done

@sdickhoven sdickhoven requested a review from a team as a code owner February 15, 2025 03:24
@sdickhoven sdickhoven force-pushed the BUG-fix-race-condition branch from bf4f198 to a72d7b3 Compare February 15, 2025 04:56
Signed-off-by: Simon Dickhoven <[email protected]>
@sdickhoven sdickhoven force-pushed the BUG-fix-race-condition branch from a72d7b3 to ea4cf3b Compare February 18, 2025 15:39

@alpeb alpeb left a comment

Thanks @sdickhoven for this. This certainly makes the logic more robust and less repetitive 👍
Can I ask you to please create a separate PR for the changes unrelated to the fix? We always squash commits when merging, so it'd be nice to have those as a separate commit.
You can disregard for now the CI failure; we're currently addressing that in other PRs.

@sdickhoven sdickhoven force-pushed the BUG-fix-race-condition branch from 9c26082 to 438915e Compare February 20, 2025 15:14
@sdickhoven

hi @alpeb 👋

i have taken out the unrelated changes. ✅

happy to submit another pr for those unrelated changes but they are 100% cosmetic and 0% functional. so there's really no need.

@sdickhoven

sdickhoven commented Feb 21, 2025

by the way, i just thought of a way in which install-cni.sh can go into an infinite patching loop.

specifically, the current sha256sum logic assumes that there is only ever going to be one cni config file.

picture this scenario: two new cni config files are created at very nearly the same time.

let's call them

  • cni1.conflist
  • cni2.conflist

sync() updates cni1.conflist and sets cni_conf_sha to the sha sum for that file.

moving the updated cni1.conflist file into place triggers another inotifywait event.

but before we can deal with that inotifywait event, we first have to deal with the one that is already waiting in the queue for cni2.conflist...

sync() updates cni2.conflist and sets cni_conf_sha to the sha sum for that file.

moving the updated cni2.conflist file into place triggers another inotifywait event.

but before we can deal with that inotifywait event, we first have to deal with the one that is already waiting in the queue for cni1.conflist...

...and since cni_conf_sha now contains the sha sum of cni2.conflist, sync() will not detect that it has already updated cni1.conflist...

so the cycle repeats... forever.
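to make this concrete, here's a toy replay of that event queue against a single sha variable (purely illustrative; no inotifywait or real files involved — it just re-enacts the bookkeeping):

```shell
#!/bin/sh
# toy simulation: replay alternating events for two files against a single
# cni_conf_sha variable (illustrative only; not the real install-cni.sh code)
sha() { printf '%s' "$1" | sha256sum | cut -d' ' -f1; }

cni_conf_sha="__init__"
patched=0
# the event queue: cni1 and cni2 keep re-triggering each other
for file in cni1.conflist cni2.conflist cni1.conflist cni2.conflist; do
  new_sha="$(sha "${file}-patched")"
  if [ "$new_sha" != "$cni_conf_sha" ]; then
    # sync() sees a sha mismatch and re-patches, emitting yet another event
    patched=$((patched + 1))
  fi
  cni_conf_sha="$new_sha"  # clobbers the sha recorded for the *other* file
done
echo "re-patched ${patched} times"
```

with a per-file sha lookup, the third and fourth events would be no-ops; with the single variable, every event mismatches the sha left behind by the other file, so the queue never drains.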

should i fix this behavior in this pr or create a new pr?

it is technically a different error case (not a race condition but an infinite loop).

i would address this by updating the sha sum logic to store all sha sums in a variable so that the sha sum for the correct file can be looked up.

if you want me to submit a separate pr, i'm going to wait until this pr is merged... because the required changes would be different for pre-/post-merge install-cni.sh scripts.

@sdickhoven

sdickhoven commented Feb 22, 2025

this is what my fix for the infinite patching loop would look like (using this pr as base):

diff --git a/cni-plugin/deployment/scripts/install-cni.sh b/cni-plugin/deployment/scripts/install-cni.sh
index cdf166c..b6156d7 100755
--- a/cni-plugin/deployment/scripts/install-cni.sh
+++ b/cni-plugin/deployment/scripts/install-cni.sh
@@ -264,14 +264,16 @@ sync() {
 
 # monitor_cni_config starts a watch on the host's CNI config directory
 monitor_cni_config() {
+  local new_sha
   inotifywait -m "${HOST_CNI_NET}" -e create,moved_to,modify |
     while read -r directory action filename; do
       if [[ "$filename" =~ .*.(conflist|conf)$ ]]; then 
         log "Detected change in $directory: $action $filename"
-        sync "$filename" "$action" "$cni_conf_sha"
+        sync "$filename" "$action" "$(jq -r --arg file "$filename" '.[$file] | select(.)' <<< "$cni_conf_sha")"
         # calculate file SHA to use in the next iteration
         if [[ -e "$directory/$filename" ]]; then
-          cni_conf_sha="$(sha256sum "$directory/$filename" | while read -r s _; do echo "$s"; done)"
+          new_sha="$(sha256sum "$directory/$filename" | while read -r s _; do echo "$s"; done)"
+          cni_conf_sha="$(jq -c --arg file "$filename" --arg sha "$new_sha" '. * {$file: $sha}' <<< "$cni_conf_sha")"
         fi
       fi
     done
@@ -315,7 +317,7 @@ install_cni_bin
 # Otherwise, new CNI config files can be created just _after_ the initial round
 # of patching and just _before_ we set up the `inotifywait` loop to detect new
 # CNI config files.
-cni_conf_sha="__init__"
+cni_conf_sha='{}'
 monitor_cni_config &
 
 # Append our config to any existing config file (*.conflist or *.conf)

this will maintain cni config sha hashes in a json object like this:

{
  "05-cilium.conflist" : "0a08ee0b9360e2ee2c3ed1d83263a3168832101346d0528a2474c3f80b7c73d6",
  "10-aws.conflist"    : "7ed380c9100362003cde9861cc6ef09307245eba3ea963cdba36186a30284acd"
}

the select(.) rejects null as output.

since the string null is not a valid sha sum, it's not really necessary to do this but seems like the right thing to do. 🤷

$ filename=abc
$ jq -r --arg file "$filename" '.[$file]' <<< '{}'
null
$ jq -r --arg file "$filename" '.[$file] | select(.)' <<< '{}'
$ jq -r --arg file "$filename" '.[$file] | select(.)' <<< '{"abc":"def"}'
def

also <<< only works with bash, not sh. if you prefer we can change

... <<< "$cni_conf_sha"

to

... < <(echo "$cni_conf_sha")

or

echo "$cni_conf_sha" | ...

@alpeb

alpeb commented Mar 5, 2025

Sorry for the late reply... As you correctly realized, this script was designed supposing the existence of a single cni config file. I'd like to spend more time investigating how linkerd-cni should behave in the presence of multiple such files before evaluating your solution. So let's for now leave your last diff posted as a comment out of the current PR. I'll give another look and round of tests to the current commits in the following days, so we can move forward.

@alpeb

alpeb commented Mar 7, 2025

Testing this results in the following output:

$ k -n linkerd-cni logs -f linkerd-cni-sq6cf
[2025-03-07 15:29:23] Wrote linkerd CNI binaries to /host/opt/cni/bin
Setting up watches.
Watches established.
[2025-03-07 15:29:23] Trigger CNI config detection for /host/etc/cni/net.d/10-calico.conflist
[2025-03-07 15:29:23] Detected change in /host/etc/cni/net.d/: CREATE 10-calico.conflist
Setting up watches.
Watches established.
[2025-03-07 15:29:23] New/changed file [10-calico.conflist] detected; re-installing
[2025-03-07 15:29:23] Using CNI config template from CNI_NETWORK_CONFIG environment variable.
[2025-03-07 15:29:23] CNI config: {
  "name": "linkerd-cni",
  "type": "linkerd-cni",
  "log_level": "debug",
  "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig"
  },
  "linkerd": {
    "incoming-proxy-port": 4143,
    "outgoing-proxy-port": 4140,
    "proxy-uid": 2102,
    "ports-to-redirect": [],
    "inbound-ports-to-ignore": ["4191","4190"],
    "simulate": false,
    "use-wait-flag": true,
    "iptables-mode": "legacy",
    "ipv6": false
  }
}
[2025-03-07 15:29:23] Created CNI config /host/etc/cni/net.d/10-calico.conflist
[2025-03-07 15:29:23] Detected change in /host/etc/cni/net.d/: MODIFY 10-calico.conflist
[2025-03-07 15:29:23] Ignoring event: MODIFY /host/etc/cni/net.d/10-calico.conflist; no real changes detected
[2025-03-07 15:29:23] Detected change in /host/etc/cni/net.d/: CREATE 10-calico.conflist
[2025-03-07 15:29:23] Ignoring event: CREATE /host/etc/cni/net.d/10-calico.conflist; no real changes detected
[2025-03-07 15:29:23] Detected change in /host/etc/cni/net.d/: MODIFY 10-calico.conflist
[2025-03-07 15:29:23] Ignoring event: MODIFY /host/etc/cni/net.d/10-calico.conflist; no real changes detected

Those extra events at the bottom are new with this change. Given that we're now calling monitor_cni_config & earlier, the entire block that follows it should no longer be necessary, right?

@sdickhoven

sdickhoven commented Mar 7, 2025

Given that we're now calling monitor_cni_config & earlier, the entire block that follows it should no longer be necessary, right?

which block are you referring to?

this?:

# Append our config to any existing config file (*.conflist or *.conf)
config_files=$(find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \))
if [ -z "$config_files" ]; then
    log "No active CNI configuration files found"
else
  config_file_count=$(echo "$config_files" | grep -v linkerd | sort | wc -l)
  if [ "$config_file_count" -eq 0 ]; then
    log "No active CNI configuration files found"
  else
    find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \) -print0 |
      while read -r -d $'\0' file; do
        log "Installing CNI configuration for $file"
...

this block is definitely needed!

because cni config files can already exist before install-cni.sh even starts up. and you would never get an event from inotifywait for those files that already exist in /etc/cni/net.d.

the logic is:

  1. set up inotifywait
  2. force any already existing files to trigger inotifywait

i guess that the change in the block that follows monitor_cni_config & is not strictly necessary (because all that really matters is that the script doesn't miss new cni config files).

but i figured that consolidating the patching logic made sense.

does that answer your question? apologies if i did not understand your question correctly.

the log output looks exactly like what i would expect from my changes.

@sdickhoven

sdickhoven commented Mar 7, 2025

oh wait... i think i understand what you're asking...

Those extra events at the bottom are new with this change.

you are right. i would not expect those last two events. however, i thought that this was maybe your ci check making sure that trying to update the same file twice would not lead to double-patching. 🤷

if that's not the case then you are correct: those last two events shouldn't happen and i'm not sure where they would be coming from. 😕

...but only the last two. the two events before the last two are expected and a result of this patch. yes.

before this change, monitor_cni_config was not already running when install-cni.sh started patching existing cni config files.

so there wouldn't have been an inotifywait event associated with that activity.

now, when install-cni.sh patches an existing cni config file, inotifywait is already watching for new/modified files.

so the act of patching will trigger inotifywait. that part is new.

so:

1st inotifywait event comes from:

log "Trigger CNI config detection for $file"
...
mv -f "$tmp_file" "$file"

2nd inotifywait event comes from install_cni_conf actually patching that file and moving the patched file into place:

https://github.com/linkerd/linkerd2-proxy-init/blob/proxy-init/v2.4.2/cni-plugin/deployment/scripts/install-cni.sh#L219

both of these events did not show up before because monitor_cni_config was not running yet (and was not used to accomplish the actual patching).

but, as you can see, the second event is ignored (as expected/intended).

but then there are two more CREATE/MODIFY events at the very end that i can't explain... again, i thought this was a ci check making sure that the shasum logic is doing the right thing. 🤷

@sdickhoven

i hope this answers your question. please let me know if i'm not making sense.

@sdickhoven

any progress on this?

cni chaining is kind of a thing and it would be good for linkerd to work correctly in environments where cni chaining is already in place... i.e. where linkerd isn't the only chained cni plugin.

chaining cilium to aws vpc cni is definitely a pretty common setup based on what i see in the cilium slack workspace.

let me know if there's anything else i can do to help get this very real race condition fixed.

@alpeb

alpeb commented May 26, 2025

Hi, thanks for pinging back; looking again into this...
You're right about requiring the logic to be triggered for the initial file. Quick question: is there a more atomic way to trigger that than having the copy followed by the move? Would just calling touch on the file trigger the sync? (that'll likely require adding another event to the list of inotifywait events).

As a side note, I've just realized libcni recently released a way of doing safe subdirectory-based plugin config loading, which would save us from all this config patching nonsense... but that approach will have to wait till it gets picked up by the major cloud providers. Perhaps we could have users opt into that behavior. Not asking you to implement any of that, just wanted to share what I found out 🙂

@sdickhoven

sdickhoven commented May 27, 2025

is there a more atomic way to trigger that than having the copy followed by the move?

yes. good point. touching a file would be cleaner indeed. something like this should work:

root@5a06a8dd2c88:~# inotifywait -m . -e moved_to,close_write &
[1] 211
Setting up watches.
Watches established.

root@5a06a8dd2c88:~# touch ttt
./ CLOSE_WRITE,CLOSE ttt
root@5a06a8dd2c88:~# echo > abc
./ CLOSE_WRITE,CLOSE abc
root@5a06a8dd2c88:~# cat ttt
root@5a06a8dd2c88:~# echo > def
./ CLOSE_WRITE,CLOSE def
root@5a06a8dd2c88:~# touch def
./ CLOSE_WRITE,CLOSE def
root@5a06a8dd2c88:~# touch /tmp/ttt
root@5a06a8dd2c88:~# mv /tmp/ttt .
./ MOVED_TO ttt
root@5a06a8dd2c88:~# rm ttt
root@5a06a8dd2c88:~#

I've just realized libcni recently released a way of doing safe subdirectory-based plugin config loading, which would save us from all this config patching nonsense

😍 that's great! i hope it becomes available soon. that would certainly make things easier.

do you want me to update my pr with the touch logic (which requires an update to the inotifywait events we want to watch for)?

also, do you want me to open another pr for the other race condition i pointed out about the infinite patching loop due to the flawed shasum logic?

if so, then i'd want to wait until this pr is merged before i submit the second pr.

@alpeb

alpeb commented May 27, 2025

Yes, please update the PR with the touch logic. As for the looping issue, I think we're gonna hold off on further logic changes to this script, as doing concurrency handling in bash is getting a little too complex. We'll be considering rewriting this in the future in a proper language, or perhaps start relying on the new libcni feature I mentioned, after getting a clearer picture about its availability.

@sdickhoven

sdickhoven commented May 27, 2025

As for the looping issue, I think we're gonna hold off on further logic changes to this script, as doing concurrency handling in bash is getting a little too complex.

i personally think that this is a mistake and here's why:

handling the concurrency (whether in bash or in another language) is actually not very complex.

inotifywait serializes all the filesystem events, which makes them very easy to deal with... i.e. the bash code executes entirely sequentially.

the only thing you have to do is make sure that your logic can deal with filesystem events for multiple files in any order.

and i can trivially provoke an infinite patching loop (given the current shasum logic) by running the following two commands in quick succession:

echo '{"cniVersion": "0.4.0", "name": "aws-cni", "disableCheck": true, "plugins": [{"name": "foo"}]}' > /etc/cni/net.d/01-foo.conflist
echo '{"cniVersion": "0.4.0", "name": "aws-cni", "disableCheck": true, "plugins": [{"name": "bar"}]}' > /etc/cni/net.d/02-bar.conflist

...which i just did on one of my worker nodes and it sent linkerd-cni into said infinite patching loop:

[2025-05-27 19:32:18] Detected change in /host/etc/cni/net.d/: CREATE 01-foo.conflist
[2025-05-27 19:32:18] New/changed file [01-foo.conflist] detected; re-installing
[2025-05-27 19:32:18] Using CNI config template from CNI_NETWORK_CONFIG environment variable.
[2025-05-27 19:32:18] CNI config: {
  "name": "linkerd-cni",
  "type": "linkerd-cni",
  "log_level": "info",
  "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig"
  },
  "linkerd": {
    "incoming-proxy-port": 4143,
    "outgoing-proxy-port": 4140,
    "proxy-uid": 2102,
    "ports-to-redirect": [],
    "inbound-ports-to-ignore": ["4191","4190","25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "outbound-ports-to-ignore": ["25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "simulate": false,
    "use-wait-flag": false,
    "iptables-mode": "plain",
    "ipv6": false
  }
}
[2025-05-27 19:32:18] Created CNI config /host/etc/cni/net.d/01-foo.conflist
[2025-05-27 19:32:18] Detected change in /host/etc/cni/net.d/: MODIFY 01-foo.conflist
[2025-05-27 19:32:18] Ignoring event: MODIFY /host/etc/cni/net.d/01-foo.conflist; no real changes detected
[2025-05-27 19:32:18] Detected change in /host/etc/cni/net.d/: CREATE 02-bar.conflist
[2025-05-27 19:32:18] New/changed file [02-bar.conflist] detected; re-installing
[2025-05-27 19:32:18] Using CNI config template from CNI_NETWORK_CONFIG environment variable.
[2025-05-27 19:32:18] CNI config: {
  "name": "linkerd-cni",
  "type": "linkerd-cni",
  "log_level": "info",
  "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig"
  },
  "linkerd": {
    "incoming-proxy-port": 4143,
    "outgoing-proxy-port": 4140,
    "proxy-uid": 2102,
    "ports-to-redirect": [],
    "inbound-ports-to-ignore": ["4191","4190","25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "outbound-ports-to-ignore": ["25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "simulate": false,
    "use-wait-flag": false,
    "iptables-mode": "plain",
    "ipv6": false
  }
}
[2025-05-27 19:32:18] Created CNI config /host/etc/cni/net.d/02-bar.conflist
[2025-05-27 19:32:18] Detected change in /host/etc/cni/net.d/: MODIFY 02-bar.conflist
[2025-05-27 19:32:18] Ignoring event: MODIFY /host/etc/cni/net.d/02-bar.conflist; no real changes detected
[2025-05-27 19:32:18] Detected change in /host/etc/cni/net.d/: CREATE 01-foo.conflist
[2025-05-27 19:32:18] New/changed file [01-foo.conflist] detected; re-installing
[2025-05-27 19:32:19] Using CNI config template from CNI_NETWORK_CONFIG environment variable.
[2025-05-27 19:32:19] CNI config: {
  "name": "linkerd-cni",
  "type": "linkerd-cni",
  "log_level": "info",
  "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig"
  },
  "linkerd": {
    "incoming-proxy-port": 4143,
    "outgoing-proxy-port": 4140,
    "proxy-uid": 2102,
    "ports-to-redirect": [],
    "inbound-ports-to-ignore": ["4191","4190","25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "outbound-ports-to-ignore": ["25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "simulate": false,
    "use-wait-flag": false,
    "iptables-mode": "plain",
    "ipv6": false
  }
}
[2025-05-27 19:32:19] Created CNI config /host/etc/cni/net.d/01-foo.conflist
[2025-05-27 19:32:19] Detected change in /host/etc/cni/net.d/: MODIFY 01-foo.conflist
[2025-05-27 19:32:19] Ignoring event: MODIFY /host/etc/cni/net.d/01-foo.conflist; no real changes detected
[2025-05-27 19:32:19] Detected change in /host/etc/cni/net.d/: CREATE 02-bar.conflist
...

...and it's perfectly plausible that two cni plugin files are created in short succession.

the race condition that is fixed by the current version of my pr may be "masking" the infinite patch loop race condition to a large extent.

so fixing this race condition will probably make the other race condition more likely to occur.

anyway, my patch to address the infinite patch loop is quite trivial imo. and i would personally include it in this pr.

but... i don't want to talk you into doing something that you're not comfortable with. so... your call.

log "Trigger CNI config detection for $file"
# The following will trigger the `sync()` function via `inotifywait` in
# `monitor_cni_config()`.
touch "$file"

fyi: there may be filesystems out there that round the mtime attribute to seconds.

see kubernetes/minikube#1594

this may render a touch ineffective (because the old and new mtime could end up being the same which may then not trigger an ATTRIB event) but i have no way of knowing/testing that.

the safest / most universally compatible way to do this is probably what i had before. but this is definitely "cleaner".

up to you if you want to keep the touch logic or revert back to the mv logic. happy to revert if you tell me to.
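for reference, reverting would look roughly like this (a sketch; trigger_sync is a made-up name, and the mv is only atomic when the temp file and the target live on the same filesystem):

```shell
#!/bin/sh
# hypothetical fallback trigger: copy + rename always fires a MOVED_TO event,
# regardless of the filesystem's mtime resolution
trigger_sync() {
  file="$1"
  tmp_file="$(mktemp -ut linkerd-cni.XXXXXX)"
  cp -fp "$file" "$tmp_file"  # preserve mode/ownership/timestamps
  mv -f "$tmp_file" "$file"   # moving back into place triggers inotifywait
}

# demo on a throwaway directory
cd "$(mktemp -d)"
printf '{"name":"demo"}\n' > 10-demo.conflist
trigger_sync 10-demo.conflist
cat 10-demo.conflist
```

the file contents are untouched; the rename alone is enough to wake the watcher.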

@sdickhoven

but... i don't want to talk you into doing something that you're not comfortable with. so... your call.

also... no rush. ☺️

we currently have the workaround outlined in the pr description deployed in all of our clusters. so we are fine. ✅

if you'd rather rewrite the cni patch logic in go or wait for the new subdirectory-based plugin config, that's fine by us.

but other linkerd-cni users may run into this race condition and won't be able to effectively troubleshoot / mitigate it. 🤷
