-
Notifications
You must be signed in to change notification settings - Fork 4k
AWS peer discovery: add multi-hostname path support #14705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
AWS peer discovery: add multi-hostname path support #14705
Conversation
Paths when is_list(Paths) -> | ||
?LOG_DEBUG("AWS peer discovery using multiple hostname paths"), | ||
Paths; | ||
_Invalid -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we get a non list configuration? Can this check be pushed to the cuttlefish schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is already handled in the cuttlefish schema but I was trying to be extra defensive here which is probably unnecessary. In theory if the env variable was somehow updated to be an invalid value (e.g. non-list) after cuttlefish checks but before peer discovery starts, this would allow fallback instead of failure. But that's quite impractical, will remove the _Invalid
case to simplify.
deps/rabbitmq_peer_discovery_aws/src/rabbit_peer_discovery_aws.erl
Outdated
Show resolved
Hide resolved
true -> List; | ||
_ -> "" | ||
end | ||
catch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you think there would be a need for a try/catch here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to guard against get_value/2
, not latin1_char_list
. I'll update the try
to scope this correctly like this (sorry for the confusion):
-spec get_hostname(path(), props()) -> string().
get_hostname([], _Props) ->
""; %% Handle empty paths gracefully
get_hostname(Path, Props) ->
List = try
lists:foldl(fun get_value/2, Props, Path)
catch
_:_ -> ""
end,
case io_lib:latin1_char_list(List) of
true -> List;
_ -> ""
end.
The reason why a catch for get_value/2
is needed is I noticed that if an invalid path is provided (e.g. this one when a third network interface doesn't exist on the instance)
cluster_formation.aws.hostname_path.2 = networkInterfaceSet,3,privateIpAddressesSet,1,privateIpAddress
I see this startup failure:
[error] <0.208.0> BOOT FAILED
[error] <0.208.0> ===========
[error] <0.208.0> Exception during startup:
[error] <0.208.0>
[error] <0.208.0> error:function_clause
[error] <0.208.0>
[error] <0.208.0> lists:nth_1/2, line 301
[error] <0.208.0> args: [1,[]]
[error] <0.208.0> rabbit_peer_discovery_aws:get_value/2, line 416
[error] <0.208.0> lists:foldl_1/3, line 2151
[error] <0.208.0> rabbit_peer_discovery_aws:get_hostname/2, line 406
[error] <0.208.0> rabbit_peer_discovery_aws:-extract_unique_hostnames/2-lc$^1/1-1-/4, line 465
[error] <0.208.0> rabbit_peer_discovery_aws:-extract_unique_hostnames/2-lc$^1/1-1-/4, line 465
[error] <0.208.0> rabbit_peer_discovery_aws:extract_unique_hostnames/2, line 465
[error] <0.208.0> rabbit_peer_discovery_aws:get_hostname_name_from_reservation_set/2, line 327
During hostname migration scenarios we may encounter invalid paths like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just account for that case in get_value/2
if it is a possibility?
https://github.com/rabbitmq/rabbitmq-server/pull/14705/files#r2417353623
try/catch
should be very, very rarely used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try catches like that are a bit dangerous as you are catching 'everything' that could go wrong there, even cases where a crash might be preferred to catch a bug. Train yourself to think of ways to not use them, unless cases where you are forced to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely, in this case I wonder if there's a better way that doesn't add significant complexity though 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be aware that the genie is extremely lazy with whitespace and will gladly add space characters on blank lines, at the end of lines, etc.
get_hostname_paths() -> | ||
M = ?CONFIG_MODULE:config_map(?BACKEND_CONFIG_KEY), | ||
UsePrivateIP = get_config_key(aws_use_private_ip, M), | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is extraneous whitespace here - in the diff, I see four space characters here.
true -> List; | ||
_ -> "" | ||
end | ||
catch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deps/rabbitmq_peer_discovery_aws/src/rabbit_peer_discovery_aws.erl
Outdated
Show resolved
Hide resolved
Adds support for multiple hostname paths in the AWS peer discovery plugin to enable zero-downtime rolling upgrades during hostname migration scenarios. The implementation allows RabbitMQ nodes to discover peers using multiple hostname paths, ensuring cluster formation succeeds even when nodes are configured with different hostname paths during rolling upgrades. Example usage: cluster_formation.aws.hostname_path.1 = privateDnsName cluster_formation.aws.hostname_path.2 = privateIpAddress
ca6b404
to
175a956
Compare
catch | ||
_:_ -> false | ||
end | ||
end, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to avoid try catch statements, i.e you could do:
case string:to_integer(Str) of
{_, ""} -> true;
_ -> false
end
true -> List; | ||
_ -> "" | ||
end | ||
catch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try catches like that are a bit dangerous as you are catching 'everything' that could go wrong there, even cases where a crash might be preferred to catch a bug. Train yourself to think of ways to not use them, unless cases where you are forced to.
true -> List; | ||
_ -> "" | ||
end | ||
catch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just account for that case in get_value/2
if it is a possibility?
https://github.com/rabbitmq/rabbitmq-server/pull/14705/files#r2417353623
try/catch
should be very, very rarely used.
end. | ||
|
||
-spec get_value(string()|integer(), props()) -> props(). | ||
get_value(_, []) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_value(Key, []) when is_integer(Key) -> | |
[]; | |
get_value(Key, Props) when is_integer(Key) -> | |
{"item", Props2} = lists:nth(Key, Props), | |
Props2; | |
get_value(Key, Props) -> | |
Value = proplists:get_value(Key, Props), | |
sort_ec2_hostname_path_set_members(Key, Value). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second clause would be redundant, no? It should always be covered by the first clause
(suggested edited)
Proposed Changes
This PR adds support for multiple hostname paths in the AWS peer discovery plugin to enable zero-downtime rolling upgrades during hostname migration scenarios. The implementation allows RabbitMQ nodes to discover peers using multiple hostname paths, ensuring cluster formation succeeds even when nodes are configured with different hostname paths during rolling upgrades.
Backward Compatibility
Existing single
hostname_path
configuration continues to work (unchanged).We fallback to single path behavior when no numbered paths are configured.
Configuration Examples
Multiple paths for zero-downtime migration
Note: This follows the existing pattern we have for
classic_config
:Single path (backward compatible)
Types of Changes
What types of changes does your code introduce to this project?
Put an
x
in the boxes that applyChecklist
Put an
x
in the boxes that apply.You can also fill these out after creating the PR.
This is simply a reminder of what we are going to look for before merging your code.
CONTRIBUTING.md
document