Failed to start k0s on arch fresh installation #3351

Closed
4 tasks done
hehehenri opened this issue Aug 7, 2023 · 23 comments
@hehehenri

hehehenri commented Aug 7, 2023

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Platform

Linux 6.4.7-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 27 Jul 2023 22:02:18 +0000 x86_64 GNU/Linux
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
PRIVACY_POLICY_URL="https://terms.archlinux.org/docs/privacy-policy/"
LOGO=archlinux-logo

Version

v1.27.4+k0s.0

Sysinfo

`k0s sysinfo`
Machine ID: "9053313309a029437f8acf5805b6351c0af0e835b945656df826f90d015fde8c" (from machine) (pass)
Total memory: 38.4 GiB (pass)
Disk space available for /var/lib/k0s: 735.0 GiB (pass)
Operating system: Linux (pass)
  Linux kernel release: 6.4.7-arch1-1 (pass)
  Max. file descriptors per process: current: 524288 / max: 524288 (pass)
  Executable in path: modprobe: /usr/bin/modprobe (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (pass)
    cgroup controller "memory": available (pass)
    cgroup controller "devices": available (assumed) (pass)
    cgroup controller "freezer": available (assumed) (pass)
    cgroup controller "pids": available (pass)
    cgroup controller "hugetlb": available (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

Got error: status: can't do http request: /run/k0s/status.sock status after running sudo k0s status on a fresh install.

$ journalctl -u k0scontroller

Aug 06 23:37:25 archlinux k0s[170281]: Error: error detecting local IP: lookup localhost: Try again
Aug 06 23:37:25 archlinux systemd[1]: k0scontroller.service: Main process exited, code=exited, statu>
Aug 06 23:37:25 archlinux systemd[1]: k0scontroller.service: Failed with result 'exit-code'.

Steps to reproduce

  1. curl -sSLf https://get.k0s.sh | sudo sh
  2. sudo k0s install controller --single
  3. sudo k0s start
  4. sudo k0s status

Expected behavior

Version: v1.27.4+k0s.0
Process ID: 4315
Role: controller
Workloads: true
SingleNode: true
Kube-api probing successful: true
Kube-api probing last error:

Actual behavior

error: status: can't do http request: /run/k0s/status.sock status

Screenshots and logs

No response

Additional context

It worked after creating the config file

mkdir -p /etc/k0s
k0s config create > /etc/k0s/k0s.yaml
@hehehenri hehehenri added the bug Something isn't working label Aug 7, 2023
@jnummelin
Member

hmm, looking where the error stems from:

addrs, err := resolver.LookupIPAddr(ctx, "localhost")

That errors out for some reason. Is the machine configured to have any mapping for localhost?

@twz123 twz123 self-assigned this Aug 7, 2023
@hehehenri
Author

hmm, looking where the error stems from:

addrs, err := resolver.LookupIPAddr(ctx, "localhost")

That errors out for some reason. Is the machine configured to have any mapping for localhost?

I thought about the same thing and added the localhost entry to the /etc/hosts file, then reinstalled k0s from scratch, but I still got the same error when running sudo k0s status.

This is my /etc/hosts

# Static table lookup for hostnames.
# See hosts(5) for details.

127.0.0.1 localhost

@kke
Contributor

kke commented Aug 8, 2023

The HTTP error from k0s status just means k0s isn't running. The status command talks to a local Unix socket, so no hostname lookup is involved.

sudo k0s reset may be required between installation attempts.

The lookup happens when k0s builds the list of SANs for the certificate:

	hostnames := []string{
		"kubernetes",
		"kubernetes.default",
		"kubernetes.default.svc",
		"kubernetes.default.svc.cluster",
		fmt.Sprintf("kubernetes.svc.%s", c.ClusterSpec.Network.ClusterDomain),
		"localhost",
		"127.0.0.1",
	}

	localIPs, err := detectLocalIPs(ctx)
	if err != nil {
		return fmt.Errorf("error detecting local IP: %w", err) // <-- here the error is decorated
	}
	hostnames = append(hostnames, localIPs...)
	hostnames = append(hostnames, c.ClusterSpec.API.Sans()...)

        ....
func detectLocalIPs(ctx context.Context) ([]string, error) {
	resolver := net.DefaultResolver

	addrs, err := resolver.LookupIPAddr(ctx, "localhost") /// <-- this triggers the error
	if err != nil {
		return nil, err
	}

It appears k0s adds localhost and 127.0.0.1 anyway, plus whatever comes from detectLocalIPs and spec.api.sans (so the list could contain duplicates).
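To illustrate the remark about duplicates, here is a minimal sketch of how a SAN list assembled from several sources could be deduplicated while keeping its order. The `dedupe` helper and the sample list are hypothetical and not taken from the k0s sources.

```go
package main

import "fmt"

// dedupe returns names with duplicates removed, keeping first-seen order.
// Hypothetical helper for illustration; k0s may handle this differently.
func dedupe(names []string) []string {
	seen := make(map[string]struct{}, len(names))
	out := make([]string, 0, len(names))
	for _, name := range names {
		if _, ok := seen[name]; ok {
			continue
		}
		seen[name] = struct{}{}
		out = append(out, name)
	}
	return out
}

func main() {
	// Made-up SAN list: fixed entries followed by detected local IPs.
	sans := []string{"kubernetes", "localhost", "127.0.0.1", "127.0.0.1", "::1"}
	fmt.Println(dedupe(sans))
}
```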

This happens every time a k0s controller is started and having /etc/k0s/k0s.yaml or not shouldn't make any difference.

The Try again in the error message kind of hints that maybe k0s could retry the lookup.

@twz123
Member

twz123 commented Aug 10, 2023

A failure to look up localhost smells a bit like a more general name resolution issue to me. Note that it's not resolving to an empty list of addresses (which would be fine for k0s); it fails with a hard error. What results do you get when you look up localhost with other tools, say nslookup or dig? Do other names work, like your machine's hostname or, say, github.com? There are a variety of things that could influence name resolution. To name a few:

  • some problematic settings in /etc/resolv.conf
  • some problematic settings in /etc/nsswitch.conf
  • some issues with systemd-resolved

I prepared PR #3366 that'll ignore and log the localhost resolution error. However, if resolving localhost continues to fail, I'm pretty sure that other components will fail as well. So not sure if the PR will help much. I quickly checked the sources, there are quite a few references to localhost, e.g. in some etcd certificates, konnectivity, kube-proxy, NLLB. They all assume that localhost resolves to something in the loopback range of 127.x.x.x.

@twz123
Member

twz123 commented Aug 11, 2023

I can repro this on an Arch Linux VM. So I have something to test on.

@twz123
Member

twz123 commented Aug 11, 2023

The error doesn't occur on main, I bisected it and found out that #3115 fixed it. Probably something in musl. @ncopa Do you maybe remember some noteworthy changes in musl that would affect name resolution of localhost that landed in Alpine 3.18?

Maybe this?

It also makes a number of other bug fixes and improvements in DNS and related functionality, including making both the modern and legacy API results differentiate between NODATA and NxDomain conditions so that the caller can handle them differently.

@twz123
Copy link
Member

twz123 commented Aug 14, 2023

Note that I cannot repro this with the docker.io/library/archlinux Docker image. My host machine is running a 6.1 kernel, the Arch VM is running 6.4. I also tried it in another VM using Linux 6.4.8 #1-NixOS SMP PREEMPT_DYNAMIC Thu Aug 3 08:26:15 UTC 2023 x86_64 GNU/Linux. Works. So it seems to me that there is some peculiarity in the way Arch builds/configures the kernel that triggers this?

@ncopa
Collaborator

ncopa commented Aug 15, 2023

I was able to reproduce this in an Arch Linux VM as well. However, when I added 127.0.0.1 localhost to /etc/hosts, it started up.

I also notice that the docker image has localhost in its /etc/hosts.

$ docker run --rm -it archlinux grep localhost /etc/hosts
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback

I think this is related to systemd's /etc/nsswitch.conf:

hosts: mymachines resolve [!UNAVAIL=return] files myhostname dns

@ncopa
Collaborator

ncopa commented Aug 15, 2023

This is the problem: https://www.openwall.com/lists/musl/2022/08/31/5. It is caused by a bug in musl libc that is triggered when /etc/resolv.conf contains a search . line (which systemd introduced to work around glibc behavior).

It does not look like the fix was backported to alpine 3.17. I suppose we should make sure we use alpine:3.18 which has the fix.
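To check whether a machine has the resolv.conf setting described above, something like the following sketch could be used. `hasRootSearchDomain` is a hypothetical helper; it simply reports whether any `search` line lists `.` as a domain and does not model every detail of resolv.conf(5).

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hasRootSearchDomain reports whether any "search" line in the given
// resolv.conf content lists "." as a search domain.
func hasRootSearchDomain(conf string) bool {
	scanner := bufio.NewScanner(strings.NewReader(conf))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		// Skip blank lines and comments.
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue
		}
		fields := strings.Fields(line)
		if fields[0] != "search" {
			continue
		}
		for _, domain := range fields[1:] {
			if domain == "." {
				return true
			}
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/etc/resolv.conf")
	if err != nil {
		fmt.Println("cannot read /etc/resolv.conf:", err)
		return
	}
	if hasRootSearchDomain(string(data)) {
		fmt.Println(`found "search ." in /etc/resolv.conf`)
	} else {
		fmt.Println(`no "search ." in /etc/resolv.conf`)
	}
}
```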

ncopa added a commit to ncopa/k0s that referenced this issue Aug 15, 2023
The 3.18 includes a fix for a bug in musl libc when there is a
`search .` in /etc/resolv.conf.

Fixes: k0sproject#3351
Ref: https://www.openwall.com/lists/musl/2022/08/31/5

Signed-off-by: Natanael Copa <[email protected]>
@twz123
Member

twz123 commented Aug 16, 2023

Great finding. Note to self: We link statically using CGO, which also means we're using musl for name resolution, not the Go network stack (#3384 (comment)).

ncopa added a commit that referenced this issue Aug 17, 2023
[release-1.27] Use go with alpine 3.18 (#3351)
@github-actions
Contributor

The issue is marked as stale since no activity has been recorded in 30 days

@github-actions github-actions bot added the Stale label Sep 15, 2023
@twz123 twz123 removed the Stale label Sep 18, 2023
@xrendan

xrendan commented Oct 12, 2023

I'm also seeing this (or something similar) on Arch. I was getting:

> sudo k0s status

Error: status: can't get "status" via "/run/k0s/status.sock": Get "http://localhost/status": dial unix /run/k0s/status.sock: connect: no such file or directory

Running sysinfo

> sudo k0s sysinfo
...
Name resolution: localhost: error: lookup localhost: no such host
...

Which led me to https://wiki.archlinux.org/title/Network_configuration#localhost_is_resolved_over_the_network

My fix was to add static resolution of localhost to the hosts file:

127.0.0.1        localhost
::1              localhost

I know nothing of the internals of k0s, but it should use the system's name resolution.
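Whether a hosts file already carries the static entries from the fix above can be checked with a small sketch like this one. `hostsMapsName` is a hypothetical helper that follows the usual hosts(5) layout (address, then names, with `#` comments).

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"strings"
)

// hostsMapsName reports whether the given hosts-file content contains an
// entry that maps a valid IP address to name.
func hostsMapsName(hosts, name string) bool {
	scanner := bufio.NewScanner(strings.NewReader(hosts))
	for scanner.Scan() {
		line := scanner.Text()
		// Drop trailing comments.
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || net.ParseIP(fields[0]) == nil {
			continue
		}
		for _, candidate := range fields[1:] {
			if strings.EqualFold(candidate, name) {
				return true
			}
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/etc/hosts")
	if err != nil {
		fmt.Println("cannot read /etc/hosts:", err)
		return
	}
	fmt.Println("static localhost entry present:", hostsMapsName(string(data), "localhost"))
}
```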

@ncopa
Collaborator

ncopa commented Oct 16, 2023

I know nothing of the internals of k0s, but it should use the system's name resolution.

The price for that is dynamic linking to system libc, which brings another set of problems.

@twz123
Member

twz123 commented Oct 19, 2023

Is Golang's native DNS resolver emulating the myhostname setting from /etc/nsswitch.conf? It would be an option to use that instead of musl's DNS resolver.

Also, are there more distros out there that are configured in a similar manner by default (i.e. not having localhost in /etc/hosts but relying on this being resolved via some NSS module)? I'm curious about the reasoning behind that. This makes DNS resolution for statically linked executables quite tricky, as we're seeing here.

Dynamic linking against the system's glibc somehow defeats k0s's goal of being zero-dependency. This would directly impact non-glibc based distros, as they wouldn't have the default dynamic linker and other shared libraries in place to even start k0s.

@ncopa
Collaborator

ncopa commented Oct 24, 2023

The way NSS works is that you can build any dynamic library and configure it in nsswitch.conf. You cannot really replicate that in Go without making it load dynamic libs. In this case it is the nss-mymachines. So the logic in the Go resolver would need to be: parse nsswitch.conf, and emulate every currently available and future module out there. I'd say that is not sustainable.

But that said, it is quite possible that go resolver already has dirty hacks for this.

@twz123
Member

twz123 commented Oct 24, 2023

In this case it is the nss-mymachines

I thought it would be myhostname?

But that said, it is quite possible that go resolver already has dirty hacks for this.

That's what I somehow suspect. Arch's default config for DNS resolution is a real challenge for statically linked applications. All those statically compiled Go binaries out there would suffer from this problem if Go didn't have some "emulation" for this. (OTOH, maybe they do suffer from it. I didn't check.)

@ncopa
Collaborator

ncopa commented Oct 24, 2023

I thought it would be myhostname?

Yes, I'm mixing things up.

I suppose you could put it this way, if your distro does not have localhost in /etc/hosts, it is either:

  • broken by default
  • does not support static binaries

I see no reason why anybody (Go or musl libc) should implement and maintain lots of questionable code for a problem that can be solved by adding a simple line in a text file.

I also don't think it is worth dropping support for everything except glibc/systemd systems, only to avoid adding a line in a text file.

So I think the fix here is to either add localhost to your /etc/hosts file or build the binaries yourself.

Contributor

The issue is marked as stale since no activity has been recorded in 30 days

@github-actions github-actions bot added the Stale label Nov 23, 2023
@twz123
Member

twz123 commented Nov 24, 2023

I agree that there's probably not much k0s can do for configurations that rely on glibc NSS plug-ins for name resolution, at least not with the precompiled binaries that k0s ships via GitHub releases. For folks who really need this, there might still be the possibility to build k0s themselves, dynamically linking against glibc.

However, as always, we should still add some notes about this in the external runtime dependencies section of the docs, and maybe also include a link to those docs somewhere in the k0s sysinfo command.

@github-actions github-actions bot removed the Stale label Nov 24, 2023
Contributor

The issue is marked as stale since no activity has been recorded in 30 days

@github-actions github-actions bot added the Stale label Dec 25, 2023
@github-actions github-actions bot closed this as not planned Jan 1, 2024
twz123 pushed a commit to twz123/k0s that referenced this issue Jan 10, 2024
The 3.18 includes a fix for a bug in musl libc when there is a
`search .` in /etc/resolv.conf.

Fixes: k0sproject#3351
Ref: https://www.openwall.com/lists/musl/2022/08/31/5

Signed-off-by: Natanael Copa <[email protected]>
(cherry picked from commit 1217281)
@twz123
Member

twz123 commented Feb 7, 2024

I suppose you could put it this way, if your distro does not have localhost in /etc/hosts, it is either:

  • broken by default
  • does not support static binaries

Actually, I've read a bit in that Arch bug about resolving localhost over the network, and the "Arch way" of adding a line to a text file seems to be to use a stub resolver, i.e. systemd-resolved, which will resolve magic names like localhost locally without reaching out to the network.

People who do not want stub resolvers on their systems are free to add these lines if this is enough to correct their problem with software that does not respect the NSS APIs.

Our current documentation is misleading, the best practice is to have a stub resolver enabled and not to add some lines from the past in /etc/hosts. I don't think we have anything to correct here.

Cheers.

@lchunleo

lchunleo commented Feb 17, 2024

I'm using the latest version of k0s on RHEL 8.

I encountered this error when running k0s status:

Error: status: can't get "status" via "/run/k0s/status.sock": Get "http://localhost/status": dial unix /run/k0s/status.sock: connect: no such file or directory

I checked that my hosts file contains:

127.0.0.1 localhost
::1 localhost

But it does not work. Am I supposed to change something in /etc/resolv.conf? I'm unsure what to do with it; can you advise?

@twz123
Member

twz123 commented Feb 19, 2024

I'm using the latest version of k0s on RHEL 8.

I encountered this error when running k0s status:

Error: status: can't get "status" via "/run/k0s/status.sock": Get "http://localhost/status": dial unix /run/k0s/status.sock: connect: no such file or directory

I checked that my hosts file contains:

127.0.0.1 localhost
::1 localhost

But it does not work. Am I supposed to change something in /etc/resolv.conf? I'm unsure what to do with it; can you advise?

@lchunleo Your problem is most likely unrelated to this specific issue. The message you're seeing is a strong indicator that k0s is not running. Please check the k0s logs for any errors. If your problem persists, consider filing a new issue or feel free to reach out via the forums.
