Description
While building test nodes with LXD and running the etcd test program from the tutorial, I found that the nodes could not communicate with each other:
2025-03-14 10:33:49.711760 W | rafthttp: health check for peer 9b116f88cab4dc9 could not connect: dial tcp 10.148.173.139:2380: getsockopt: connection refused
2025-03-14 10:33:49.716165 W | rafthttp: health check for peer 5aa594b5d9b66c42 could not connect: dial tcp 10.148.173.249:2380: getsockopt: connection refused
2025-03-14 10:33:49.722938 W | rafthttp: health check for peer 7f6143cbd22aca00 could not connect: dial tcp 10.148.173.95:2380: getsockopt: connection refused
2025-03-14 10:33:49.723323 W | rafthttp: health check for peer f82e563e5c75137e could not connect: dial tcp 10.148.173.142:2380: getsockopt: connection refused
Root Cause
After debugging, I found that n1 resolved its own hostname to 127.0.1.1. As a result, the node uses 127.0.1.1 as its bind address, so connections from the other nodes are refused. The cause is the 127.0.1.1 entry in /etc/hosts. LXD writes this entry when the container is created; other container runtimes such as Docker do not add it.
2025-03-14 03:16:43.405882 W | etcdmain: no data-dir provided, using default data-dir ./n1.etcd
2025-03-14 03:16:43.405906 W | embed: expected IP in URL for binding (http://n1:2380)
2025-03-14 03:16:43.405925 W | embed: expected IP in URL for binding (http://n1:2379)
2025-03-14 03:16:43.406274 I | embed: listening for peers on http://n1:2380
2025-03-14 03:16:43.406384 I | embed: listening for client requests on n1:2379
2025-03-14 03:16:43.439387 I | pkg/netutil: resolving n1:2380 to 127.0.1.1:2380
The lxd-imagebuilder documentation and source code confirm this (the documentation shows 127.0.0.1, but the code actually writes 127.0.1.1):
https://github.com/canonical/lxd-imagebuilder/blob/7574f6883f23d88716937e5951de7f4d5301ca93/doc/reference/lxd-imagebuilder/generators.md?plain=1#L90-L95
https://github.com/canonical/lxd-imagebuilder/blob/7574f6883f23d88716937e5951de7f4d5301ca93/generators/hosts.go#L37-L38
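To check whether a node is affected, it can help to see what the node's own hostname resolves to. The following is a minimal Python sketch, not part of etcd or LXD; the function names `is_loopback` and `check_hostname` are made up for illustration, and it assumes the Python resolver consults /etc/hosts the same way etcd's lookup did in the log above.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the address lies in a loopback range such as 127.0.0.0/8."""
    return ipaddress.ip_address(address).is_loopback


def check_hostname(hostname: str) -> str:
    """Resolve hostname with the system resolver and describe the result."""
    address = socket.gethostbyname(hostname)
    if is_loopback(address):
        # A loopback bind address is not reachable from other nodes.
        return f"{hostname} -> {address} (loopback: peers cannot reach this address)"
    return f"{hostname} -> {address}"
```

Running `print(check_hostname(socket.gethostname()))` inside the container should flag the node if its name maps to a 127.x.x.x address.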
A possible fix
Delete the "127.0.1.1 hostname" entry from /etc/hosts; the nodes can then resolve their own names correctly through DNS. Alternatively, with more effort, set the explicit IP addresses manually in the configuration.
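Removing the entry can be scripted. Below is a rough Python sketch that works on the text of a hosts file; `drop_loopback_alias` is a hypothetical helper name, and the sketch assumes the entry has the plain "127.0.1.1 hostname" form described above. It only returns the filtered text. Writing it back to /etc/hosts needs root and is left out on purpose.

```python
def drop_loopback_alias(hosts_text: str, hostname: str, alias_ip: str = "127.0.1.1") -> str:
    """Return hosts_text without the lines that map alias_ip to hostname."""
    kept = []
    for line in hosts_text.splitlines():
        # Ignore trailing comments when inspecting the entry.
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == alias_ip and hostname in fields[1:]:
            continue  # skip the entry that makes the node resolve itself to loopback
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Note that the whole line is dropped, so any other names listed on the same 127.0.1.1 line are removed with it.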
Other related DNS issues
By the way, when configuring the search domain: if "sudo resolvectl status lxdbr0" shows the correct search domain but "ping n1" is still not resolved as "ping n1.lxd", the problem can be fixed by adding a Domains setting, such as "Domains=lxd", to /etc/systemd/resolved.conf.
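To confirm that the setting is in place, the file can be parsed as INI-style text. This is a small Python sketch with a hypothetical helper name, `has_search_domain`; it assumes the Domains line sits under the [Resolve] section, which is where resolved.conf keeps it, and that the value is a space-separated list.

```python
import configparser


def has_search_domain(resolved_conf_text: str, domain: str = "lxd") -> bool:
    """Return True if the [Resolve] section's Domains= value lists the domain."""
    # strict=False tolerates repeated keys; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(resolved_conf_text)
    # configparser matches option names case-insensitively by default.
    domains = parser.get("Resolve", "Domains", fallback="")
    return domain in domains.split()
```

For example, `has_search_domain(open("/etc/systemd/resolved.conf").read())` checks the file on the node. A commented-out "#Domains=" line counts as not set.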