@@ -39,6 +39,16 @@ public synchronized void init() throws IOException {
storageProperties.getTypes().get(HDFS_TYPE.getValue());
org.apache.hadoop.conf.Configuration configuration = new org.apache.hadoop.conf.Configuration();
configuration.set("fs.defaultFS", hdfsStorageProperties.getEndpoint());

// Connection timeout configuration - fail fast on unreachable nodes
configuration.set("ipc.client.connect.timeout", "10000"); // default: 20000ms, override to 10s
Collaborator

Should these be config driven?

Member

+1 for getting them from config. Can be injected through li internal config.

configuration.set(
Member

Are all these values scaled down to fail fast? What if the call would succeed with more retries or a higher timeout?

Collaborator Author

cbb330, Oct 28, 2025

There are IO retries, block retries, and client retries. IO retries may retry the same host several times, block retries will retry the same block across datanodes, and client retries will retry from the namenode and refresh block locations.

So if a host is down, IO retries will keep trying that host even though it is not responding.

And if a block is missing, we would not want to check every datanode for that block, but instead re-request its locations from the namenode.

I'm not particularly knowledgeable here, so I've asked an HDFS SME offline to review.

"ipc.client.connect.max.retries", "3"); // default: 10, override to 3 per address

// Socket timeout configuration - fail fast per datanode attempt
configuration.set(
"dfs.client.socket-timeout", "30000"); // default: 60000ms, override to 30s per node

fs = FileSystem.get(configuration);
}
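
The reviewers' question about making these values config driven could be addressed along the following lines. This is only a sketch under stated assumptions: the `HdfsClientTimeouts` class and the `overrides` map are hypothetical and not part of this PR, and where the overrides would come from (for example the storage properties or an internal config) is left open. The fallback values are the ones hard-coded in the diff above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not in the PR): resolves HDFS client timeout settings. */
final class HdfsClientTimeouts {
  // Fallbacks mirror the values hard-coded in this PR.
  private static final Map<String, String> PR_DEFAULTS = new LinkedHashMap<>();

  static {
    PR_DEFAULTS.put("ipc.client.connect.timeout", "10000"); // ms
    PR_DEFAULTS.put("ipc.client.connect.max.retries", "3"); // attempts per address
    PR_DEFAULTS.put("dfs.client.socket-timeout", "30000"); // ms
  }

  private HdfsClientTimeouts() {}

  /**
   * Returns the three settings, taking each value from {@code overrides} when
   * present and falling back to the PR's value otherwise. Keys outside the
   * known set are ignored so unrelated config cannot leak in.
   */
  static Map<String, String> resolve(Map<String, String> overrides) {
    Map<String, String> resolved = new LinkedHashMap<>(PR_DEFAULTS);
    for (Map.Entry<String, String> entry : overrides.entrySet()) {
      if (!PR_DEFAULTS.containsKey(entry.getKey())) {
        continue;
      }
      // Fail at startup on malformed values rather than at first HDFS call.
      int value = Integer.parseInt(entry.getValue().trim());
      if (value < 0) {
        throw new IllegalArgumentException(
            entry.getKey() + " must be non-negative, got " + value);
      }
      resolved.put(entry.getKey(), Integer.toString(value));
    }
    return resolved;
  }
}
```

Inside `init()`, the three hard-coded `configuration.set(...)` calls could then become something like `HdfsClientTimeouts.resolve(overrides).forEach(configuration::set);`, which keeps the current behavior when no overrides are supplied and lets operators raise the timeouts or retry counts without a code change.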
