tls error after system reboot #5124

Open · 4 tasks done · ckt114 opened this issue Oct 17, 2024 · 6 comments
Labels: bug (Something isn't working)

Comments

ckt114 commented Oct 17, 2024

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Platform

Linux 6.1.0-26-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30) x86_64 GNU/Linux
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Version

v1.31.1+k0s.1

Sysinfo

`k0s sysinfo`
Total memory: 12.6 GiB (pass)
File system of /var/lib/k0s: ext4 (pass)
Disk space available for /var/lib/k0s: 390.8 GiB (pass)
Relative disk space available for /var/lib/k0s: 85% (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 6.1.0-26-amd64 (pass)
  Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
  AppArmor: active (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

After I restart the OS, I get the error below when running any kubectl command.

Unable to connect to the server: tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, ::1, 127.0.1.1, 10.96.0.1, not 192.168.2.10

To fix it I had to run

sudo k0s stop
sudo k0s install controller --single --force
sudo k0s start

after which kubectl works again.

This is my /etc/k0s/k0s.config file:

apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  api:
    address: 192.168.2.10
    k0sApiPort: 9443
    port: 6443
    sans:
    - 192.168.2.10
  telemetry:
    enabled: false

After the system rebooted and kubectl started throwing the TLS error, I ran sudo k0s kubeconfig admin and saw that the server address in the kubeconfig is 127.0.0.1 instead of 192.168.2.10. Also, I don’t know where the 10.96.0.1 IP in the error message comes from.
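
A quick way to confirm the mismatch (plain kubectl/openssl usage, nothing k0s-specific) is to compare the server address in the kubeconfig against the SANs in the certificate the API server actually presents:

# address the kubeconfig points at
sudo k0s kubeconfig admin | grep 'server:'

# SANs in the serving certificate on the node address
echo | openssl s_client -connect 192.168.2.10:6443 2>/dev/null | openssl x509 -noout -ext subjectAltName

If 192.168.2.10 is missing from the subjectAltName list after a reboot, the certificate was regenerated without the configured SAN.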

Steps to reproduce

  1. Install k0s using the config above
  2. Generate kubeconfig file
  3. Reboot system

Expected behavior

K0s should retain the api.address from the installation config.

Actual behavior

k0s reverted the API address to 127.0.0.1 after the system reboot instead of retaining the custom 192.168.2.10 address.

Screenshots and logs

No response

Additional context

No response

ckt114 added the bug label on Oct 17, 2024
juanluisvaladas (Contributor) commented:

Hi, we haven't seen this before, so we believe it has to be triggered by something in your environment.

What happens if you reboot and instead of:

sudo k0s stop
sudo k0s install controller --single --force
sudo k0s start

You just do:

sudo k0s stop
sudo k0s start

IMPORTANT: don't do this immediately after the reboot; give it some time, maybe 5 minutes, because we suspect it may be a timing issue with network interfaces not being ready yet.

Finally, would it be possible to provide k0s logs after the reboot?
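
Something like this should capture them (standard journalctl usage; the unit name assumes the default controller install):

sudo journalctl -u k0scontroller.service -b --no-pager > k0s-after-reboot.log

The -b flag limits the output to the current boot, which keeps the log focused on the failing startup.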


Tokynet commented Oct 29, 2024

FWIW, I installed the latest k0s on Sunday.
I rebooted my master node today and ran into this issue.

k0sctl version
version: v0.19.2
commit: 081dfeb

Cluster definition

kind: Cluster
metadata:
  name: kruzter
spec:
  hosts:
  - ssh:
      address: 192.168.55.248
      user: root
      port: 22
    role: controller
  - ssh:
      address: 192.168.55.251
      user: root
      port: 22
    role: worker
  - ssh:
      address: 192.168.55.252
      user: root
      port: 22
    role: worker
  - ssh:
      address: 192.168.55.253
      user: root
      port: 22
    role: worker
  - ssh:
      address: 192.168.55.254
      user: root
      port: 22
    role: worker
  k0s:
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: Cluster
      metadata:
        name: k0s
      spec:
        api:
          k0sApiPort: 9443
          port: 6443
        installConfig:
          users:
            etcdUser: etcd
            kineUser: kube-apiserver
            konnectivityUser: konnectivity-server
            kubeAPIserverUser: kube-apiserver
            kubeSchedulerUser: kube-scheduler
        konnectivity:
          adminPort: 8133
          agentPort: 8132
        network:
          kubeProxy:
            disabled: false
            mode: iptables
          kuberouter:
            autoMTU: true
            mtu: 0
            peerRouterASNs: ""
            peerRouterIPs: ""
          podCIDR: 10.244.0.0/16
          provider: custom
          serviceCIDR: 10.96.0.0/12
        podSecurityPolicy:
          defaultPolicy: 00-k0s-privileged
        storage:
          type: etcd
        telemetry:
          enabled: true

FWIW, it did get fixed after I did:

k0s stop
k0s start

After a few mins (all the time it took me to edit this post :) ) all the nodes became available on their own:

> k get nodes
NAME         STATUS   ROLES    AGE   VERSION
k8sworker1   Ready    <none>   25h   v1.31.1+k0s
k8sworker2   Ready    <none>   25h   v1.31.1+k0s
k8sworker3   Ready    <none>   25h   v1.31.1+k0s
k8sworker4   Ready    <none>   25h   v1.31.1+k0s

jnummelin (Member) commented Oct 30, 2024

This really sounds like a timing issue: k0s may start before the network has assigned the address to the interface(s).

In which infra are you guys seeing this?

The k0s-generated systemd unit does have a dependency on the network-online target:

After=network-online.target 
Wants=network-online.target 

Maybe in your case that is not enough for some reason. 🤔

To test whether that is the case, you could add an ExecStartPre to dump out the interface info before k0s actually starts. Something like:

ExecStartPre=-/usr/sbin/ip a s > /root/ip.info

That could give us some hints if this is actually the case.

You can also look for the critical chain of services with something like:

systemd-analyze critical-chain k0scontroller.service

That shows a tree of the order in which things got started during boot. Note: you need to analyse this after the reboot, NOT after the manual restart of k0s.

This is for example what I see on Ubuntu:

root@mothership:/# systemd-analyze critical-chain k0scontroller.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

k0scontroller.service @10min 24.386s
└─basic.target @8.069s
  └─sockets.target @8.068s
    └─uuidd.socket @8.062s
      └─sysinit.target @8.036s
        └─cloud-init.service @5.640s +2.390s
          └─systemd-networkd-wait-online.service @3.806s +1.830s
            └─systemd-networkd.service @3.731s +72ms
              └─network-pre.target @3.727s
                └─cloud-init-local.service @2.251s +1.474s
                  └─systemd-remount-fs.service @755ms +31ms
                    └─systemd-fsck-root.service @696ms +54ms
                      └─systemd-journald.socket @574ms
                        └─-.mount @471ms
                          └─-.slice @471ms

jnummelin (Member) commented:

Also, I don’t know where the 10.96.0.1 IP in the error message comes from

@ckt114 That is the default cluster-internal service address for the API.
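
For context, this is general Kubernetes behaviour rather than anything k0s-specific: the API server is exposed inside the cluster as the kubernetes Service, whose ClusterIP is the first address of the service CIDR. With the default serviceCIDR of 10.96.0.0/12 that works out to 10.96.0.1, so it is included in the serving certificate's SANs. You can see it with:

kubectl get svc kubernetes -n default
# NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
# kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   ...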

ckt114 (Author) commented Oct 30, 2024

@jnummelin I added the ExecStartPre to /etc/systemd/system/k0scontroller.service, ran daemon-reload, and rebooted my system (not just k0s), but nothing was written to /root/ip.info.

This is the hierarchy of my k0scontroller service.

k0scontroller.service +7ms
└─network-online.target @1.707s
  └─network.target @1.707s
    └─networking.service @1.573s +133ms
      └─apparmor.service @1.550s +20ms
        └─local-fs.target @1.550s
          └─run-credentials-systemd\x2dtmpfiles\x2dsetup.service.mount @1.554s
            └─local-fs-pre.target @207ms
              └─keyboard-setup.service @163ms +43ms
                └─systemd-journald.socket @159ms
                  └─-.mount @123ms
                    └─-.slice @123ms

This is my k0scontroller.service

[Unit]
Description=k0s - Zero Friction Kubernetes
Documentation=https://docs.k0sproject.io
ConditionFileIsExecutable=/usr/local/bin/k0s

After=network-online.target
Wants=network-online.target

[Service]
StartLimitInterval=5
StartLimitBurst=10
ExecStartPre=-/usr/sbin/ip a s > /root/ip.info 2>&1
ExecStart=/usr/local/bin/k0s controller --single=true
Environment=""

RestartSec=10
Delegate=yes
KillMode=process
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
LimitNOFILE=999999
Restart=always

[Install]
WantedBy=multi-user.target

jnummelin (Member) commented:

but nothing was written to /root/ip.info.

Seems that systemd does not handle shell redirection in Exec lines (they are not run through a shell). I should've actually tried to run this myself, as it fails:

Error: either "dev" is duplicate, or "/root/ip.info" is a garbage.

🤦

So just remove the file redirection; what you'll get is the output of that command in the journal logs.

Your service hierarchy looks correct to me.
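
For reference, a minimal sketch of the corrected setup, assuming the default k0scontroller unit name (a systemd drop-in avoids editing the generated unit file directly):

# /etc/systemd/system/k0scontroller.service.d/10-debug.conf (hypothetical drop-in name)
[Service]
ExecStartPre=-/usr/sbin/ip a s

Then reload and reboot:

sudo systemctl daemon-reload
# after the next boot, the interface state at start time shows up in the journal:
sudo journalctl -u k0scontroller.service -b --no-pager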
