I have a database setup with 3 InfluxDB nodes in a primary/replica configuration. The primary node receives all reads and writes, and the two replica nodes are maintained as backups. Each node has the same four buckets, so my total replication stream count is 4 (buckets) * 2 (replicas) = 8.
I have a writer sending ~300,000 points/second across all the buckets to the primary node (the points are of variable size, some a few KB each). When the primary starts replicating data to the replicas, I start seeing the following error:
This error generally starts showing up after around 20-30 seconds.
From what I know, this is happening because too many TCP connections are being opened from the primary node to the replicas, which exhausts all available local ephemeral ports. One fix I have tried on my end is setting the following kernel parameters:
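Roughly along these lines (an illustrative sketch of typical Linux sysctl tunables for ephemeral port exhaustion; these values are assumptions, not necessarily the exact ones I set):

```sh
# Illustrative only: widen the ephemeral port range and recycle
# TIME_WAIT sockets faster (assumed values).
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
sysctl -w net.ipv4.tcp_tw_reuse=1      # reuse TIME_WAIT sockets for new outbound connections
sysctl -w net.ipv4.tcp_fin_timeout=15  # release closing sockets sooner
```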
I have also increased the ulimits in my docker compose:
```yaml
ulimits:
  nofile:
    soft: 400000
    hard: 800000
```
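To verify the limits are actually in effect inside the container, something like this should work (assuming a service named influxdb):

```sh
docker exec influxdb sh -c 'ulimit -n'   # should print the soft nofile limit (400000)
```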
With these fixes I still see the error after a while, and then, when one of the TCP connections becomes available again, the retry count resets back to 0.
I have a hunch that this can be resolved by tuning the HTTP client parameters, such as MaxIdleConnsPerHost, but the replication writeAPI client parameters are not configurable from outside the codebase.
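For illustration, the kind of transport tuning I have in mind looks like this in Go's net/http (a hypothetical sketch, not the actual replication client code):

```go
package main

import (
	"net/http"
	"time"
)

// newReplicationHTTPClient is a hypothetical example of an HTTP client
// tuned to reuse TCP connections to a small set of hosts (the replicas)
// instead of dialing a fresh connection for every replicated write.
func newReplicationHTTPClient() *http.Client {
	transport := &http.Transport{
		MaxIdleConns:        100,              // idle connections kept across all hosts
		MaxIdleConnsPerHost: 50,               // net/http default is 2, which causes dial/close churn
		IdleConnTimeout:     90 * time.Second, // how long idle connections stay pooled
	}
	return &http.Client{Transport: transport, Timeout: 30 * time.Second}
}
```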
For further clarification on the setup: all of the InfluxDB nodes are on separate VMs, but they are on the same local network with < 0.5 ms ping.
Steps to reproduce:
List the minimal actions needed to reproduce the behavior.
1. Set up a primary InfluxDB node and two replica nodes on three separate servers.
2. On the primary, set up replications of the 4 buckets to each of the 2 replicas (see the CLI sketch after this list).
3. Write a large amount of variable-sized data to all the buckets.
4. Wait about a minute for the errors to start appearing.
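For reference, the replication setup on the primary can be done along these lines (a sketch using the influx CLI's remote/replication commands; names and IDs are placeholders):

```sh
# Register a replica as a remote (repeat per replica).
influx remote create \
  --name replica-1 \
  --remote-url http://replica-1:8086 \
  --remote-api-token <token> \
  --remote-org-id <org-id>

# Create one replication stream per bucket (repeat for each of the 4 buckets x 2 replicas).
influx replication create \
  --name bucket1-to-replica-1 \
  --remote-id <remote-id> \
  --local-bucket-id <local-bucket-id> \
  --remote-bucket-id <remote-bucket-id>
```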
Expected behavior:
Expected behavior is that the data is transported to the replicas over a shared connection pool, or that the replication client is configured to reuse TCP connections.
Actual behavior:
A large number of short-lived TCP connections exhausts the local ephemeral ports, leaving no open port available for binding a connection to the target address.
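The port exhaustion is visible on the primary with something like the following (illustrative diagnostics; assumes the replicas listen on port 8086):

```sh
# Count sockets stuck in TIME_WAIT toward the replicas; a count near the
# size of the ephemeral port range means new connections cannot bind.
ss -tan state time-wait '( dport = :8086 )' | wc -l
cat /proc/sys/net/ipv4/ip_local_port_range
```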
Environment info:
System info: Linux 5.15.0-71-generic x86_64
InfluxDB version: Run influxd version and copy the output here
Other relevant environment details: running under Docker Compose (see ulimits above)
Config:
Copy any non-default config values here or attach the full config as a gist or file.
Logs:
Include snippet of errors in log.
Performance:
Generate profiles with the following commands for bugs related to performance, locking, out of memory (OOM), etc.
```sh
# Commands should be run when the bug is actively happening.
# Note: This command will run for ~30 seconds.
curl -o profiles.tar.gz "http://localhost:8086/debug/pprof/all?cpu=30s"
iostat -xd 1 30 > iostat.txt
# Attach the `profiles.tar.gz` and `iostat.txt` output files.
```