Failed to start k0s on arch fresh installation #3351
Comments
hmm, looking where the error stems from:
That errors out for some reason. Is the machine configured to have any mapping for |
I thought about the same thing and added the localhost entry to This is my
|
The http error in
The lookup happens when k0s tries to generate SAN certificate:
hostnames := []string{
	"kubernetes",
	"kubernetes.default",
	"kubernetes.default.svc",
	"kubernetes.default.svc.cluster",
	fmt.Sprintf("kubernetes.svc.%s", c.ClusterSpec.Network.ClusterDomain),
	"localhost",
	"127.0.0.1",
}
localIPs, err := detectLocalIPs(ctx)
if err != nil {
	return fmt.Errorf("error detecting local IP: %w", err) // <-- here the error is decorated
}
hostnames = append(hostnames, localIPs...)
hostnames = append(hostnames, c.ClusterSpec.API.Sans()...)
....
func detectLocalIPs(ctx context.Context) ([]string, error) {
	resolver := net.DefaultResolver
	addrs, err := resolver.LookupIPAddr(ctx, "localhost") // <-- this triggers the error
	if err != nil {
		return nil, err
	}
It appears k0s adds This happens every time a k0s controller is started and having The |
A failure to look up localhost smells a bit like a more general name resolution issue to me. Note that it's not resolving to an empty list of addresses (which would be fine for k0s), but it fails with a hard error. What results do you get when trying to look up localhost with other tools, say
I prepared PR #3366 that'll ignore and log the localhost resolution error. However, if resolving localhost continues to fail, I'm pretty sure that other components will fail as well. So not sure if the PR will help much. I quickly checked the sources, there are quite a few references to localhost, e.g. in some etcd certificates, konnectivity, kube-proxy, NLLB. They all assume that localhost resolves to something in the loopback range of 127.x.x.x. |
I can repro this on an Arch Linux VM. So I have something to test on. |
The error doesn't occur on main, I bisected it and found out that #3115 fixed it. Probably something in musl. @ncopa Do you maybe remember some noteworthy changes in musl that would affect name resolution of localhost that landed in Alpine 3.18? Maybe this?
|
Note that I cannot repro this with the |
I was able to reproduce this in an Arch Linux VM as well. However, when I added I also noticed that the Docker image has localhost in its
I think this is related to systemd's
|
This is the problem: https://www.openwall.com/lists/musl/2022/08/31/5 caused by a bug in musl libc when there is a It does not look like the fix was backported to Alpine 3.17. I suppose we should make sure we use alpine:3.18, which has the fix. |
The 3.18 includes a fix for a bug in musl libc when there is a `search .` in /etc/resolv.conf. Fixes: k0sproject#3351 Ref: https://www.openwall.com/lists/musl/2022/08/31/5 Signed-off-by: Natanael Copa <[email protected]>
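To check whether a host is in the situation the commit message describes, one could scan `/etc/resolv.conf` for a `search .` line. The following is a minimal sketch under simplifying assumptions: `hasRootSearchDomain` is a hypothetical helper, and it reports a match if any `search` line lists `.`, without modelling how a particular libc treats multiple `search` lines.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hasRootSearchDomain reports whether resolv.conf content contains a
// "search" line that lists the root domain ".". Comment lines are skipped
// implicitly because their first field is not "search".
func hasRootSearchDomain(resolvConf string) bool {
	for _, line := range strings.Split(resolvConf, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != "search" {
			continue
		}
		for _, domain := range fields[1:] {
			if domain == "." {
				return true
			}
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/etc/resolv.conf")
	if err != nil {
		fmt.Println("cannot read resolv.conf:", err)
		return
	}
	fmt.Println("search . present:", hasRootSearchDomain(string(data)))
}
```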
Great finding. Note to self: We link statically using CGO, which also means we're using musl for name resolution, not the Go network stack (#3384 (comment)). |
[release-1.27] Use go with alpine 3.18 (#3351)
The issue is marked as stale since no activity has been recorded in 30 days |
I'm also seeing this (or something similar) on Arch. I was seeing:
Running sysinfo
Which led me to my fix was to add static resolution of localhost to the hosts file:
I know nothing of the internals of k0s, but it should use the system's name resolution. |
The price for that is dynamic linking to system libc, which brings another set of problems. |
Is Golang's native DNS resolver emulating the Also, are there more distros out there that are configured in a similar manner by default (i.e. not having localhost in /etc/hosts but relying on this being resolved via some NSS module)? I'm curious about the reasoning behind that. This makes DNS resolution for statically linked executables quite tricky, as we're seeing here. Dynamic linking against the system's glibc somehow defeats k0s's goal of being zero-dependency. This would directly impact non-glibc based distros, as they wouldn't have the default dynamic linker and other shared libraries in place to even start k0s. |
The way NSS works is that you can build any dynamic library and configure it in nsswitch.conf. You cannot really replicate that in Go without making it load dynamic libs. In this case it is nss-mymachines. So the logic in the Go resolver would need to be: parse nsswitch.conf, and emulate every currently available and future module out there. I'd say that is not sustainable. But that said, it is quite possible that the Go resolver already has dirty hacks for this. |
I thought it would be myhostname?
That's what I somehow suspect. Arch's default config for DNS resolution is a real challenge for statically linked applications. All those statically compiled Go binaries out there would suffer from this problem if Go didn't have some "emulation" for this. (OTOH, maybe they do suffer from it. I didn't check.) |
Yes, I'm mixing things up. I suppose you could put it this way: if your distro does not have localhost in /etc/hosts, it is either:
I see no reason why anybody (Go or musl libc) should implement and maintain lots of questionable code for a problem that can be solved by adding a simple line in a text file. I also don't think it is worth dropping support for everything except glibc/systemd systems, only to avoid adding a line in a text file. So I think the fix here is to either add localhost to your |
I agree that there's probably not much k0s can do for configurations that rely on glibc NSS plug-ins for name resolution, at least not with the precompiled binaries that k0s ships via GitHub releases. For folks who really need this, there might still be the possibility to build k0s themselves, dynamically linking against glibc. However, as always, we should still add some notes about this in the external runtime dependencies section of the docs, and maybe also include a link to those docs somewhere in the |
The 3.18 includes a fix for a bug in musl libc when there is a `search .` in /etc/resolv.conf. Fixes: k0sproject#3351 Ref: https://www.openwall.com/lists/musl/2022/08/31/5 Signed-off-by: Natanael Copa <[email protected]> (cherry picked from commit 1217281)
Actually, I've read a bit in that Arch bug about resolving localhost over the network, and the "Arch way" of adding a line to a text file seems to be to use a stub resolver, i.e.
Cheers. |
I am using the latest version of k0s on RHEL 8. I encountered the issue (Error: status: can't get "status" via "/run/k0s/status.sock": Get "http://localhost/status": dial unix /run/k0s/status.sock: connect: no such file or directory), but it does not work. Am I supposed to do something to /etc/resolv.conf? I am unsure what to do with it. Can you advise? |
@lchunleo Your problem is most likely unrelated to this specific issue. The message you're seeing is a strong indicator that k0s is not running. Please check the k0s logs for any errors. If your problem persists, consider filing a new issue or feel free to reach out via the forums. |
Before creating an issue, make sure you've checked the following:
Platform
Version
v1.27.4+k0s.0
Sysinfo
`k0s sysinfo`
What happened?
Got
error: status: can't do http request: /run/k0s/status.sock status
after running `sudo k0s status` on a fresh install.
Steps to reproduce
Expected behavior
Version: v1.27.4+k0s.0
Process ID: 4315
Role: controller
Workloads: true
SingleNode: true
Kube-api probing successful: true
Kube-api probing last error:
Actual behavior
error: status: can't do http request: /run/k0s/status.sock status
Screenshots and logs
No response
Additional context
It worked after creating the config file
mkdir -p /etc/k0s
k0s config create > /etc/k0s/k0s.yaml