Timed out waiting for scheduling events during reinstall #778

Open
NeonSludge opened this issue Oct 31, 2024 · 2 comments · May be fixed by #794


NeonSludge commented Oct 31, 2024

During a reinstall triggered by changes to installFlags, k0sctl 0.19.0 timed out while waiting for scheduling events and exited with an error:

failed to observe scheduling events after api start-up, you can ignore this check by using --force: context deadline exceeded\ndidn't find any 'Scheduled' kube-system events after ...

It seems that during the reinstall phase k0sctl looks for fresh scheduling-related events in the kube-system namespace:

k0sctl/phase/reinstall.go

Lines 120 to 126 in 9246ddc

	log.Infof("%s: waiting for the scheduler to become ready", h)
	if err := retry.Timeout(context.TODO(), retry.DefaultTimeout, node.ScheduledEventsAfterFunc(h, time.Now())); err != nil {
		if !Force {
			return fmt.Errorf("failed to observe scheduling events after api start-up, you can ignore this check by using --force: %w", err)
		}
		log.Warnf("%s: failed to observe scheduling events after api start-up: %s", h, err)
	}

	// ScheduledEventsAfterFunc returns a function that returns an error unless a kube-system 'Scheduled' event has occurred after the given time
	// The returned function is intended to be used with pkg/retry.
	func ScheduledEventsAfterFunc(h *cluster.Host, since time.Time) retryFunc {
		return func(_ context.Context) error {
			output, err := h.ExecOutput(h.Configurer.KubectlCmdf(h, h.K0sDataDir(), "-n kube-system get events --field-selector reason=Scheduled -o json"), exec.HideOutput(), exec.Sudo(h))
			if err != nil {
				return fmt.Errorf("failed to get kube system events: %w", err)
			}
			events := &statusEvents{}
			if err := json.Unmarshal([]byte(output), &events); err != nil {
				return fmt.Errorf("failed to decode kubectl output for kube-system events: %w", err)
			}
			for _, e := range events.Items {
				if e.EventTime.Before(since) {
					log.Tracef("%s: skipping prior event for %s: %s < %s", h, e.InvolvedObject.Name, e.EventTime.Format(time.RFC3339), since.Format(time.RFC3339))
					continue
				}
				log.Debugf("%s: found a 'Scheduled' event occuring after %s", h, since)
				return nil
			}
			return fmt.Errorf("didn't find any 'Scheduled' kube-system events after %s", since)
		}
	}

This particular cluster has an almost empty kube-system namespace: there are just four CoreDNS pods in it. Nothing really happens there and it can take a while for new scheduling events to show up.

Contributor

kke commented Nov 18, 2024

Looks like k0sctl has to stop waiting for scheduling events when installFlags have changed, since changing them does not necessarily trigger any scheduling events at all.

kke linked a pull request on Nov 18, 2024 that will close this issue
Contributor

kke commented Nov 18, 2024

As a workaround, --no-wait will skip that part, but PR #794 should resolve this issue.
