tls error after system reboot #5124

Open · 4 tasks done · ckt114 opened this issue Oct 17, 2024 · 6 comments
Labels: bug (Something isn't working)

Comments

ckt114 commented Oct 17, 2024

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Platform

Linux 6.1.0-26-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30) x86_64 GNU/Linux
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Version

v1.31.1+k0s.1

Sysinfo

`k0s sysinfo`
Total memory: 12.6 GiB (pass)
File system of /var/lib/k0s: ext4 (pass)
Disk space available for /var/lib/k0s: 390.8 GiB (pass)
Relative disk space available for /var/lib/k0s: 85% (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 6.1.0-26-amd64 (pass)
  Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
  AppArmor: active (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

After I restart the OS, I get the error below when running any kubectl command.

Unable to connect to the server: tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, ::1, 127.0.1.1, 10.96.0.1, not 192.168.2.10

To fix it I had to run

sudo k0s stop
sudo k0s install controller --single --force
sudo k0s start

after which kubectl works again.

This is my /etc/k0s/k0s.config file:

apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  api:
    address: 192.168.2.10
    k0sApiPort: 9443
    port: 6443
    sans:
    - 192.168.2.10
  telemetry:
    enabled: false

After the system rebooted and kubectl started throwing the TLS error, I ran sudo k0s kubeconfig admin and saw that the server address in the kubeconfig is 127.0.0.1 instead of 192.168.2.10. Also, I don’t know where the 10.96.0.1 IP in the error message comes from.
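
A quick way to confirm the mismatch (plain kubectl/openssl usage, nothing k0s-specific) is to compare the server address in the kubeconfig against the SANs in the certificate the API server actually presents:

# address the kubeconfig points at
sudo k0s kubeconfig admin | grep 'server:'

# SANs in the serving certificate on the node address
echo | openssl s_client -connect 192.168.2.10:6443 2>/dev/null | openssl x509 -noout -ext subjectAltName

If 192.168.2.10 is missing from the subjectAltName list after a reboot, the certificate was regenerated without the configured SAN.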

Steps to reproduce

  1. Install k0s using the config above
  2. Generate kubeconfig file
  3. Reboot system

Expected behavior

K0s should retain the api.address from the installation config.

Actual behavior

k0s reverted the API address to 127.0.0.1 after the system reboot instead of retaining the custom 192.168.2.10 address.

Screenshots and logs

No response

Additional context

No response

ckt114 added the bug label on Oct 17, 2024
juanluisvaladas (Contributor) commented:

Hi, we haven't seen this before, so we believe it has to be triggered by something in your environment.

What happens if you reboot and instead of:

sudo k0s stop
sudo k0s install controller --single --force
sudo k0s start

You just do:

sudo k0s stop
sudo k0s start

IMPORTANT: don't do this immediately after the reboot; give it some time, maybe 5 minutes, because we suspect it may be a timing issue with network interfaces not being ready yet.

Finally, would it be possible to provide k0s logs after the reboot?
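
Something like this should capture them (standard journalctl usage; the unit name assumes the default controller install):

sudo journalctl -u k0scontroller.service -b --no-pager > k0s-after-reboot.log

The -b flag limits the output to the current boot, which keeps the log focused on the failing startup.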


Tokynet commented Oct 29, 2024

FWIW, I installed the latest k0s on Sunday.
I rebooted my master node today and ran into this issue.

k0sctl version
version: v0.19.2
commit: 081dfeb

Cluster definition

kind: Cluster
metadata:
  name: kruzter
spec:
  hosts:
  - ssh:
      address: 192.168.55.248
      user: root
      port: 22
    role: controller
  - ssh:
      address: 192.168.55.251
      user: root
      port: 22
    role: worker
  - ssh:
      address: 192.168.55.252
      user: root
      port: 22
    role: worker
  - ssh:
      address: 192.168.55.253
      user: root
      port: 22
    role: worker
  - ssh:
      address: 192.168.55.254
      user: root
      port: 22
    role: worker
  k0s:
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: Cluster
      metadata:
        name: k0s
      spec:
        api:
          k0sApiPort: 9443
          port: 6443
        installConfig:
          users:
            etcdUser: etcd
            kineUser: kube-apiserver
            konnectivityUser: konnectivity-server
            kubeAPIserverUser: kube-apiserver
            kubeSchedulerUser: kube-scheduler
        konnectivity:
          adminPort: 8133
          agentPort: 8132
        network:
          kubeProxy:
            disabled: false
            mode: iptables
          kuberouter:
            autoMTU: true
            mtu: 0
            peerRouterASNs: ""
            peerRouterIPs: ""
          podCIDR: 10.244.0.0/16
          provider: custom
          serviceCIDR: 10.96.0.0/12
        podSecurityPolicy:
          defaultPolicy: 00-k0s-privileged
        storage:
          type: etcd
        telemetry:
          enabled: true

FWIW, it did get fixed after I did:

k0s stop
k0s start

After a few mins (all the time it took me to edit this post :) ) all the nodes became available on their own:

> k get nodes
NAME         STATUS   ROLES    AGE   VERSION
k8sworker1   Ready    <none>   25h   v1.31.1+k0s
k8sworker2   Ready    <none>   25h   v1.31.1+k0s
k8sworker3   Ready    <none>   25h   v1.31.1+k0s
k8sworker4   Ready    <none>   25h   v1.31.1+k0s

jnummelin (Member) commented Oct 30, 2024

This really sounds like a timing issue: k0s may start before the network has assigned the address to the interface(s).

In which infra are you guys seeing this?

The k0s-generated systemd unit does have a dependency on the network-online target:

After=network-online.target 
Wants=network-online.target 

Maybe in your case that is not enough for some reason. 🤔

To test whether that is the case, you could add an ExecStartPre to dump out the interface info before k0s actually starts. Something like:

ExecStartPre=-/usr/sbin/ip a s > /root/ip.info

That could give us some hints if this is actually the case.

You can also look for the critical chain of services with something like:

systemd-analyze critical-chain k0scontroller.service

That shows a tree of the order in which things got started during boot. Note: you need to analyse this after the reboot, NOT after the manual restart of k0s.

This is for example what I see on Ubuntu:

root@mothership:/# systemd-analyze critical-chain k0scontroller.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

k0scontroller.service @10min 24.386s
└─basic.target @8.069s
  └─sockets.target @8.068s
    └─uuidd.socket @8.062s
      └─sysinit.target @8.036s
        └─cloud-init.service @5.640s +2.390s
          └─systemd-networkd-wait-online.service @3.806s +1.830s
            └─systemd-networkd.service @3.731s +72ms
              └─network-pre.target @3.727s
                └─cloud-init-local.service @2.251s +1.474s
                  └─systemd-remount-fs.service @755ms +31ms
                    └─systemd-fsck-root.service @696ms +54ms
                      └─systemd-journald.socket @574ms
                        └─-.mount @471ms
                          └─-.slice @471ms

jnummelin (Member) commented:

Also, I don’t know where the 10.96.0.1 IP in the error message comes from

@ckt114 That is the default cluster-internal service address for the API.
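
For context, this is general Kubernetes behaviour rather than anything k0s-specific: the API server is exposed inside the cluster as the kubernetes Service, whose ClusterIP is the first address of the service CIDR. With the default serviceCIDR of 10.96.0.0/12 that works out to 10.96.0.1, so it is included in the serving certificate's SANs. You can see it with:

kubectl get svc kubernetes -n default
# NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
# kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   ...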

ckt114 (Author) commented Oct 30, 2024

@jnummelin I added the ExecStartPre to /etc/systemd/system/k0scontroller.service, ran daemon-reload, and rebooted my system (not just k0s), but nothing was written to /root/ip.info.

This is the hierarchy of my k0scontroller service.

k0scontroller.service +7ms
└─network-online.target @1.707s
  └─network.target @1.707s
    └─networking.service @1.573s +133ms
      └─apparmor.service @1.550s +20ms
        └─local-fs.target @1.550s
          └─run-credentials-systemd\x2dtmpfiles\x2dsetup.service.mount @1.554s
            └─local-fs-pre.target @207ms
              └─keyboard-setup.service @163ms +43ms
                └─systemd-journald.socket @159ms
                  └─-.mount @123ms
                    └─-.slice @123ms

This is my k0scontroller.service

[Unit]
Description=k0s - Zero Friction Kubernetes
Documentation=https://docs.k0sproject.io
ConditionFileIsExecutable=/usr/local/bin/k0s

After=network-online.target
Wants=network-online.target

[Service]
StartLimitInterval=5
StartLimitBurst=10
ExecStartPre=-/usr/sbin/ip a s > /root/ip.info 2>&1
ExecStart=/usr/local/bin/k0s controller --single=true
Environment=""

RestartSec=10
Delegate=yes
KillMode=process
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
LimitNOFILE=999999
Restart=always

[Install]
WantedBy=multi-user.target

jnummelin (Member) commented:

but nothing was written to /root/ip.info.

Seems that systemd does not handle shell redirection in Exec lines (they are not run through a shell). I should've actually tried to run this myself, as it fails:

Error: either "dev" is duplicate, or "/root/ip.info" is a garbage.

🤦

So just remove the file redirection; what you'll get is the output of that command in the journal logs.

Your service hierarchy looks correct to me.
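
For reference, a minimal sketch of the corrected setup, assuming the default k0scontroller unit name (a systemd drop-in avoids editing the generated unit file directly):

# /etc/systemd/system/k0scontroller.service.d/10-debug.conf (hypothetical drop-in name)
[Service]
ExecStartPre=-/usr/sbin/ip a s

Then reload and reboot:

sudo systemctl daemon-reload
# after the next boot, the interface state at start time shows up in the journal:
sudo journalctl -u k0scontroller.service -b --no-pager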
