Replies: 1 comment
/cc @theblobinthesky Maybe you have an idea where to start debugging, @Hialus or @bensofficial?
I have a question regarding Hazelcast in a multi-instance setup: How can I debug if the Hazelcast nodes do not connect to each other?
Artemis setup
I want to try out the LocalCI setup with the build-agent running in a separate VM. We deploy our Artemis as a container instead of directly via the WAR file. Since Artemis is running in a container, it cannot bind directly to the public address of the machine, but Hazelcast allows for different bind-interfaces and public addresses with the addition of a configuration parameter in Artemis [1].
[1] develop...feature/general/hazelcast-public-address, built container image: ghcr.io/uni-passau-artemis/artemis:8.2.2-hazelcast
Artemis main instance:
hazelcast.interface: 0.0.0.0
hazelcast.publicAddress: 172.16.1.10:5701
Artemis build agent:
hazelcast.interface: 0.0.0.0
hazelcast.publicAddress: 172.16.1.20:5701
In the log and in the Eureka web interface I can now see that both instances correctly connect to the registry:
However, for example in the log of artemis-container I then get
and every two minutes the same (on the build agent the same with swapped IPs). So it looks like the Hazelcast cluster does not form, but there is no error log explaining what might go wrong. It looks like it correctly uses the public addresses as defined in the config to try to connect.
Using nmap, I verified that port 5701 is open on both machines when checking from the respective other one (i.e. there should not be a firewall that blocks the communication). I also get an error log in Artemis that the Hazelcast REST API is not enabled when I try to curl http://172.16.0.{10,20}:5701/hazelcast/rest/, so it seems the public address/port is successfully routed to Artemis itself and Artemis listens on it.
Artemis config
Sanity-check using plain Hazelcast container
To check if this is an issue with Hazelcast itself, or the way it is used in Artemis, I used the plain Hazelcast containers on both hosts to check if they can connect using the public/private IP mapping.
hazelcast.yml (identical on both hosts)
Then I start the containers on the two hosts:
In the log I can see that the two instances successfully connect:
Network traffic
Using the Hazelcast containers, I can see network traffic to the destination port 5702 on the other host as expected.
When I tcpdump the traffic to port 5701 to check for potential connections Artemis (tries) to open, there is nothing. Also, using strace -f -t -e network -p $ARTEMIS_PID, I can see nothing at the time when Artemis logs "Adding Hazelcast cluster member".
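Since neither tcpdump nor strace shows an outgoing connection attempt, one further check would be to open a TCP connection to the other member's public address from inside the Artemis container's network namespace, rather than from the host as with nmap. The following is only a minimal sketch of such a check. It assumes Python 3 is available in that environment (which may not be the case in the Artemis image), and the helper names can_connect and check_members are hypothetical. The addresses in the usage note are the hazelcast.publicAddress values from the configuration above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the full TCP handshake, so a True
        # result means the remote port accepted the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False


def check_members(members, timeout: float = 3.0) -> dict:
    """Map each 'host:port' string to its reachability as a boolean."""
    return {
        f"{host}:{port}": can_connect(host, port, timeout)
        for host, port in members
    }
```

Run on the main instance, check_members([("172.16.1.20", 5701)]) would test the build agent's public address, and check_members([("172.16.1.10", 5701)]) on the build agent would test the reverse direction. A successful connect here while tcpdump still shows no traffic from Artemis would suggest that Artemis never attempts the connection, rather than that the network blocks it.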