Multi-node setup with Hazelcast and Artemis in container not working #11097

b-fein · 2025-07-03T17:34:47Z · Collaborator

I have a question regarding Hazelcast in a multi-instance setup: How can I debug if the Hazelcast nodes do not connect to each other?

Artemis setup

I want to try out the LocalCI setup with the build agent running in a separate VM. We deploy Artemis as a container instead of directly via the WAR file. Since Artemis runs in a container, it cannot bind directly to the machine's public address. Hazelcast, however, supports binding to one interface while advertising a different public address, so I added a configuration parameter to Artemis to expose this [1].

[1] develop...feature/general/hazelcast-public-address, built container image ghcr.io/uni-passau-artemis/artemis:8.2.2-hazelcast

Artemis main instance:

running in Artemis VM
hazelcast.interface: 0.0.0.0
hazelcast.publicAddress: 172.16.1.10:5701

Artemis build agent:

running in Build Agent VM
hazelcast.interface: 0.0.0.0
hazelcast.publicAddress: 172.16.1.20:5701

In the log and in the Eureka web interface, I can now see that both instances correctly connect to the registry:

Current Registry members: [172.16.1.10, 172.16.1.20]

However, in the log of artemis-container, for example, I then get

Current Hazelcast members: [172.16.1.10]
Adding Hazelcast cluster member 172.16.1.20:5701

and the same two lines repeat every two minutes (the build agent logs the same with the IPs swapped). So the Hazelcast cluster apparently does not form, but no error log explains what goes wrong. Artemis does appear to use the public addresses defined in the config when it tries to connect.
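To tell whether this two-minute loop ever makes progress, the log can be scanned for the two messages quoted above. A minimal Python sketch, assuming each log entry contains the messages exactly as quoted (timestamps and logger prefixes are skipped by using `re.search`); the function name `summarize_join_attempts` is hypothetical:

```python
import re

# Message texts copied from the log excerpt above; the surrounding log
# format (timestamps, logger names) is an assumption and is ignored.
MEMBERS_RE = re.compile(r"Current Hazelcast members: \[(.*?)\]")
ADDING_RE = re.compile(r"Adding Hazelcast cluster member (\S+)")


def summarize_join_attempts(lines):
    """Return (largest member count seen, {address: number of add attempts})."""
    max_members = 0
    attempts = {}
    for line in lines:
        members_match = MEMBERS_RE.search(line)
        if members_match:
            members = [m.strip() for m in members_match.group(1).split(",") if m.strip()]
            max_members = max(max_members, len(members))
            continue
        adding_match = ADDING_RE.search(line)
        if adding_match:
            address = adding_match.group(1)
            attempts[address] = attempts.get(address, 0) + 1
    return max_members, attempts
```

If the largest member count stays at 1 while the same address is added over and over, the join never succeeds, which matches the symptom described here. Note that the "Current Registry members" line is deliberately not matched.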

Using nmap, I verified from each machine that port 5701 is open on the other one, so there should be no firewall blocking the communication. When I curl http://172.16.0.{10,20}:5701/hazelcast/rest/, Artemis logs an error that the Hazelcast REST API is not enabled, so the public address/port seems to be routed correctly to Artemis, which is listening on it.
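The same reachability check can be scripted without nmap, for example to run it repeatedly from inside the container rather than from the host. A minimal Python sketch using only the standard library; `tcp_reachable` is a hypothetical helper name. Like the nmap check, it only confirms that the TCP handshake succeeds and says nothing about whether Hazelcast then accepts the peer:

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the address and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

For example, running `tcp_reachable("172.16.1.20", 5701)` inside the Artemis container and `tcp_reachable("172.16.1.10", 5701)` inside the build agent container would show whether the containers themselves (not only the hosts) can reach each other's published port.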

Artemis config

spring:
  hazelcast:
    interface: 0.0.0.0  # use all interfaces inside the container (10.89.0.*)
    publicAddress: 172.16.1.10:5701  # or 172.16.1.20 on the other host

eureka:
  client:
    enabled: true
    service-url:
      defaultZone: https://admin:${jhipster.registry.password}@172.16.1.10:10443/eureka
  instance:
    prefer-ip-address: true
    ip-address: "172.16.1.10"  # or 172.16.1.20 on the other host
    appname: Artemis
    instanceId: Artemis:artemis-test  # or Artemis:artemis-buildagent on the other host
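Because each node must advertise an address the other node can actually reach, a small consistency check over each node's settings can rule out typos. A minimal Python sketch, assuming the values are copied by hand from the config above (no YAML parsing) and that the address is an IPv4 literal; `check_node_config` is a hypothetical helper, and the rule that the Eureka `ip-address` should match the host part of `publicAddress` is an assumption suggested by the log lines, not documented Artemis behavior:

```python
import ipaddress


def check_node_config(public_address, eureka_ip):
    """Return a list of problems in one node's settings (empty list if none).

    public_address: value like spring.hazelcast.publicAddress, e.g. "172.16.1.10:5701"
    eureka_ip: value like eureka.instance.ip-address, e.g. "172.16.1.10"
    """
    host, sep, port = public_address.rpartition(":")
    if not sep or not port.isdigit() or not 0 < int(port) < 65536:
        return [f"publicAddress {public_address!r} is not in IPv4 host:port form"]
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # This sketch only handles IP literals, not hostnames.
        return [f"publicAddress host {host!r} is not an IP address"]
    problems = []
    if ip.is_unspecified or ip.is_loopback:
        # A wildcard or loopback address cannot be reached from the other VM.
        problems.append(f"publicAddress host {host} is not reachable from other hosts")
    if str(ip) != eureka_ip:
        # Assumed rule: the address peers learn via Eureka should match
        # the address Hazelcast advertises for this member.
        problems.append(f"publicAddress host {host} differs from Eureka ip-address {eureka_ip}")
    return problems
```

For the two nodes here, `check_node_config("172.16.1.10:5701", "172.16.1.10")` and `check_node_config("172.16.1.20:5701", "172.16.1.20")` both return an empty list, so the configured addresses themselves look consistent.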

Sanity-check using plain Hazelcast container

To find out whether this is a problem in Hazelcast itself or in the way Artemis uses it, I ran the plain Hazelcast container on both hosts and checked whether the two members can connect using the public/private IP mapping.

hazelcast.yml (identical on both hosts):

hazelcast:
  network:
    join:
      multicast:
        enabled: false
      tcp-ip:
        enabled: true
        member-list:
          - 172.16.1.10:5702
          - 172.16.1.20:5702

Then I start the containers on the two hosts:

# on host 1 with IP 172.16.1.10
podman run --rm --name member1 -e HZ_NETWORK_PUBLICADDRESS=172.16.1.10:5702 -p 5702:5701 -v "$(pwd)"/hazelcast.yml:/opt/hazelcast/hazelcast.yml -e HAZELCAST_CONFIG=hazelcast.yml docker.io/hazelcast/hazelcast:5

# on host 2 with IP 172.16.1.20
podman run --rm --name member2 -e HZ_NETWORK_PUBLICADDRESS=172.16.1.20:5702 -p 5702:5701 -v "$(pwd)"/hazelcast.yml:/opt/hazelcast/hazelcast.yml -e HAZELCAST_CONFIG=hazelcast.yml docker.io/hazelcast/hazelcast:5
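In this sanity check, each container listens on 5701 internally while podman publishes it on host port 5702, which is why both `HZ_NETWORK_PUBLICADDRESS` and the `member-list` use 5702: peers connect to the host side of the mapping. A minimal Python sketch of that rule, assuming the simple `-p HOST_PORT:CONTAINER_PORT` form used above; podman also accepts forms with a bind IP, which this hypothetical helper `public_address_for` rejects:

```python
def public_address_for(host_ip, port_mapping):
    """Derive the address other members must use from a -p HOST:CONTAINER mapping.

    Peers connect to the host side of the mapping, so the advertised port is
    the host port, not the port the process listens on inside the container.
    """
    parts = port_mapping.split(":")
    # Only the plain "HOST_PORT:CONTAINER_PORT" form is handled in this sketch.
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unsupported -p mapping: {port_mapping!r}")
    host_port, _container_port = parts
    return f"{host_ip}:{host_port}"
```

For host 1, `public_address_for("172.16.1.10", "5702:5701")` yields `"172.16.1.10:5702"`, the value passed as `HZ_NETWORK_PUBLICADDRESS` above.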

In the log I can see that the two instances successfully connect:

Members {size:2, ver:2} [
        Member [172.16.1.10]:5702 - 0ed987bc-6342-4b63-acbc-2e74f80b9440
        Member [172.16.1.20]:5702 - da27ec4b-39ee-4561-b18f-2428dc1cd623 this
]

Network traffic

Using the Hazelcast containers, I can see network traffic to the destination port 5702 on the other host as expected.

When I run tcpdump on port 5701 to look for connections that Artemis opens (or tries to open), nothing shows up.
Likewise, strace -f -t -e network -p $ARTEMIS_PID shows no network calls at the time Artemis logs Adding Hazelcast cluster member.

⇒ I'm no longer sure whether Artemis even tries to connect to the other node.

b-fein · 2025-07-03T17:35:32Z · Collaborator, Author

/cc @theblobinthesky

Maybe you have an idea where to start debugging, @Hialus or @bensofficial?
