Flakey test on kindtest related to slack api #197

@gokken-roko

What

kindtest failed with this error.

Run make -C kindtest test
  make -C kindtest test
  shell: /usr/bin/bash -e {0}
  env:
    GIT_SSH_COMMAND: ssh -i /tmp/deploy-key.pem
make: Entering directory '/home/runner/work/meows/meows/kindtest'
mkdir -p /home/runner/work/meows/meows/kindtest/..//tmp/repo
rm -rf /home/runner/work/meows/meows/kindtest/..//tmp/repo/.git /home/runner/work/meows/meows/kindtest/..//tmp/repo/.github
. /home/runner/work/meows/meows/kindtest/..//.secret.env.sh; \
	PATH=/home/runner/work/meows/meows/kindtest/..//tmp/bin:/home/runner/go/bin:/opt/hostedtoolcache/go/1.23.6/x64/bin:/snap/bin:/home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
	KINDTEST=1 BIN_DIR=/home/runner/work/meows/meows/kindtest/..//tmp/bin \
	TEST_REPO_WORK_DIR=/home/runner/work/meows/meows/kindtest/..//tmp/repo \
	GITHUB_APP_PRIVATE_KEY_PATH=/home/runner/work/meows/meows/kindtest/..//.secret.private-key.pem \
	go test . -v -timeout=15m -ginkgo.v --ginkgo.fail-fast 
=== RUN   TestOnKind
Running Suite: KindTest Suite - /home/runner/work/meows/meows/kindtest
======================================================================
Random Seed: 1741760459

Will run 15 of 15 specs
------------------------------
[BeforeSuite] 
/home/runner/work/meows/meows/kindtest/suite_test.go:60
testID: kindtest-2025-03-12-062059
  STEP: checking env variables @ 03/12/25 06:20:59.644
This test uses the binaries under /home/runner/work/meows/meows/kindtest/..//tmp/bin
  STEP: initializing github client @ 03/12/25 06:20:59.644
  STEP: creating test branch in CI test repository @ 03/12/25 06:20:59.644
[email protected]:neco-test/meows-ci
[BeforeSuite] PASSED [2.149 seconds]
------------------------------
meows bootstrap delete namespaces
/home/runner/work/meows/meows/kindtest/bootstrap_test.go:9
• [0.098 seconds]
------------------------------
meows bootstrap create namespaces
/home/runner/work/meows/meows/kindtest/bootstrap_test.go:18
• [0.584 seconds]
------------------------------
meows bootstrap should deploy CRD
/home/runner/work/meows/meows/kindtest/bootstrap_test.go:29
  STEP: applying manifests @ 03/12/25 06:21:02.476
• [0.326 seconds]
------------------------------
meows bootstrap should deploy controller successfully
/home/runner/work/meows/meows/kindtest/bootstrap_test.go:36
  STEP: applying manifests @ 03/12/25 06:21:02.802
  STEP: confirming all controller pods are ready @ 03/12/25 06:21:04.171
• [21.526 seconds]
------------------------------
meows bootstrap should deploy slack-agent successfully
/home/runner/work/meows/meows/kindtest/bootstrap_test.go:46
  STEP: creating secret for slack-agent @ 03/12/25 06:21:24.328
  STEP: applying manifests @ 03/12/25 06:21:24.374
  STEP: confirming all slack-agent pods are ready @ 03/12/25 06:21:24.627
• [10.438 seconds]
------------------------------
meows runner should create runner pods
/home/runner/work/meows/meows/kindtest/runner_test.go:14
  STEP: creating repo-runnerpool1 @ 03/12/25 06:21:34.767
  STEP: creating repo-runnerpool2 @ 03/12/25 06:21:34.895
  STEP: creating org-runnerpool1 @ 03/12/25 06:21:35.015
  STEP: confirming all repo-runner1 pods are ready @ 03/12/25 06:21:35.141
  STEP: confirming all repo-runner2 pods are ready @ 03/12/25 06:21:56.056
  STEP: confirming all org-runner1 pods are ready @ 03/12/25 06:21:56.378
• [21.897 seconds]
------------------------------
meows runner should run the job-success on a runner pod and delete the pod immediately
/home/runner/work/meows/meows/kindtest/runner_test.go:46
  STEP: running 'job-success' workflow @ 03/12/25 06:21:56.664
  STEP: checking status @ 03/12/25 06:22:07.406
  STEP: confirming the pod terminating @ 03/12/25 06:22:07.406
- FinishedAt  :  2025-03-12 06:22:06.987127414 +0000 UTC
- DeletedAt   :  2025-03-12 06:22:46 +0000 UTC
  STEP: confirming a slack message is successfully sent @ 03/12/25 06:22:17.365
  STEP: waiting for the pod deleted @ 03/12/25 06:22:17.445
• [30.890 seconds]
------------------------------
meows runner should run the job-cancelled on a runner pod and delete the pod immediately
/home/runner/work/meows/meows/kindtest/runner_test.go:77
  STEP: running 'job-cancelled' workflow @ 03/12/25 06:22:27.554
  STEP: checking status @ 03/12/25 06:22:37.404
  STEP: confirming the pod terminating @ 03/12/25 06:22:37.404
- FinishedAt  :  2025-03-12 06:22:37.320073609 +0000 UTC
- DeletedAt   :  2025-03-12 06:23:16 +0000 UTC
  STEP: confirming a slack message is successfully sent @ 03/12/25 06:22:47.34
  STEP: waiting for the pod deleted @ 03/12/25 06:22:47.398
• [29.948 seconds]
------------------------------
meows runner should run the job-failure on a runner pod and delete the pod after a while
/home/runner/work/meows/meows/kindtest/runner_test.go:108
  STEP: running 'job-failure' workflow @ 03/12/25 06:22:57.502
  STEP: checking status @ 03/12/25 06:23:07.174
  STEP: checking pdb @ 03/12/25 06:23:07.323
  STEP: trying to evict the pod @ 03/12/25 06:23:07.373
  STEP: confirming the pod terminating @ 03/12/25 06:23:07.441
- FinishedAt  :  2025-03-12 06:23:06.354633775 +0000 UTC
- DeletedAt   :  2025-03-12 06:24:06 +0000 UTC
  STEP: confirming a slack message is successfully sent @ 03/12/25 06:23:36.7
  [FAILED] in [It] - /home/runner/work/meows/meows/kindtest/runner_test.go:148 @ 03/12/25 06:23:36.761
• [FAILED] [39.258 seconds]
meows runner [It] should run the job-failure on a runner pod and delete the pod after a while
/home/runner/work/meows/meows/kindtest/runner_test.go:108

  [FAILED] no match line, pod: %!d(string=kindtest-2025-03-12-062059-test-repo-runner1/repo-runnerpool1-755f9c78c9-8f6tn), stdout: {"level":"info","ts":1741760600.3485188,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connecting","data":{"Attempt":1,"ConnectionCount":154}}
  {"level":"info","ts":1741760600.776868,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connected","data":{"ConnectionCount":154,"Info":{"url":"wss://wss-primary.slack.com/link/?ticket=225bd57d-bc6f-45b5-9250-1f72466d89b3&app_id=1d30027e21dbebcfa368a86b0fead7a73a81de0f7802e3562a954da8fab959b5"}}}
  {"level":"info","ts":1741760600.7921557,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"incoming_error","data":"read tcp 10.244.2.6:40974->44.235.135.203:443: use of closed network connection"}
  {"level":"info","ts":1741760600.7922094,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connecting","data":{"Attempt":1,"ConnectionCount":155}}
  {"level":"info","ts":1741760600.9181917,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connection_error","data":"slack rate limit exceeded, retry after 10s"}
  {"level":"info","ts":1741760610.922869,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connecting","data":{"Attempt":2,"ConnectionCount":155}}
  {"level":"info","ts":1741760611.38605,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connected","data":{"ConnectionCount":155,"Info":{"url":"wss://wss-primary.slack.com/link/?ticket=a2638008-0a0f-47a7-a1e8-157350406fca&app_id=1d30027e21dbebcfa368a86b0fead7a73a81de0f7802e3562a954da8fab959b5"}}}
  {"level":"info","ts":1741760611.3961225,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"incoming_error","data":"read tcp 10.244.2.6:51536->54.189.250.135:443: use of closed network connection"}
  {"level":"info","ts":1741760611.3961556,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connecting","data":{"Attempt":1,"ConnectionCount":156}}
  {"level":"info","ts":1741760611.5207477,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connection_error","data":"slack rate limit exceeded, retry after 10s"}
  {"level":"info","ts":1741760615.177441,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connected","data":{"ConnectionCount":163,"Info":{"url":"wss://wss-primary.slack.com/link/?ticket=8a6cd05e-409e-4cc3-a4e2-ac79b0be3dd5&app_id=1d30027e21dbebcfa368a86b0fead7a73a81de0f7802e3562a954da8fab959b5"}}}
  {"level":"info","ts":1741760615.192736,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connecting","data":{"Attempt":1,"ConnectionCount":164}}
  {"level":"info","ts":1741760615.6293435,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connected","data":{"ConnectionCount":164,"Info":{"url":"wss://wss-primary.slack.com/link/?ticket=3935bd07-d22a-4a2b-8d76-fc16d2bc9682&app_id=1d30027e21dbebcfa368a86b0fead7a73a81de0f7802e3562a954da8fab959b5"}}}
  {"level":"info","ts":1741760615.6510894,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connecting","data":{"Attempt":1,"ConnectionCount":165}}
  {"level":"info","ts":1741760616.119527,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connected","data":{"ConnectionCount":165,"Info":{"url":"wss://wss-primary.slack.com/link/?ticket=bfac787b-faff-454f-9b50-cca02b6b4b60&app_id=1d30027e21dbebcfa368a86b0fead7a73a81de0f7802e3562a954da8fab959b5"}}}
  {"level":"info","ts":1741760616.1287186,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"incoming_error","data":"read tcp 10.244.1.3:51090->52.41.113.249:443: use of closed network connection"}
  {"level":"info","ts":1741760616.1287708,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connecting","data":{"Attempt":1,"ConnectionCount":166}}
  {"level":"info","ts":1741760616.5675313,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connected","data":{"ConnectionCount":166,"Info":{"url":"wss://wss-primary.slack.com/link/?ticket=8568329d-d82c-4e9b-ba77-dd998be72b74&app_id=1d30027e21dbebcfa368a86b0fead7a73a81de0f7802e3562a954da8fab959b5"}}}
  {"level":"info","ts":1741760616.5809798,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connecting","data":{"Attempt":1,"ConnectionCount":167}}
  {"level":"info","ts":1741760616.7099845,"caller":"agent/socket.go:66","msg":"skipped event because type is not interactive","type":"connection_error","data":"slack rate limit exceeded, retry after 10s"}

  Unexpected error:
      <*errors.errorString | 0x197a3c0>: 
      EOF
      {s: "EOF"}
  occurred
  In [It] at: /home/runner/work/meows/meows/kindtest/runner_test.go:148 @ 03/12/25 06:23:36.761
------------------------------
SSSSSS

Summarizing 1 Failure:
  [FAIL] meows runner [It] should run the job-failure on a runner pod and delete the pod after a while
  /home/runner/work/meows/meows/kindtest/runner_test.go:148

Ran 9 of 15 Specs in 157.117 seconds
FAIL! -- 8 Passed | 1 Failed | 0 Pending | 6 Skipped
--- FAIL: TestOnKind (157.12s)
FAIL
FAIL	github.com/cybozu-go/meows/kindtest	157.125s
FAIL
make: *** [Makefile:56: test] Error 1
make: Leaving directory '/home/runner/work/meows/meows/kindtest'
Error: Process completed with exit code 2.

The kindtest appears to have failed because the Slack rate limit was exceeded. I re-ran this GitHub Actions workflow, and the test passed.
However, the root cause is not clear, and the test is flaky.

How

Ideas to solve this problem

  • Fix the flaky test related to the Slack API.
    • Retry the Slack API call after an interval when a rate-limit error occurs.
  • Add documentation describing the Slack API rate limit.
  • Make the error message clearer, so that developers can easily see what to do next when the test fails due to flakiness (e.g. re-run kindtest).

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions
