Skip to content

Commit 3337ce0

Browse files
authored
Remove LocalVersions.clear() (#1278)
This changes behavior so that we won't remove all resources that were added by a given agent from the relay Config when that agent disconnects gracefully. This could be disrupting in the usual case where the agent disconnects simply due to a rollout or rescheduling event. The LocalVersion struct exits scope when the connection is closed so we don't need to clear the local state it has either. Since the Cluster (EndpointSet) resource is versioned deterministically by hashing the endpoints, and updated as a batch, we don't have to worry about old endpoints lingering if we miss an update when one agent shuts down before another picks up. In the case of the Datacenter resource this is handled separately through the `client_disconnected()` method on the `Configuration` trait, which in the case of a relay will remove the disconnecting agent from its DatacenterMap. We also need to make sure we remove the client from ClusterMap.localities so that the next agent can take over, so I've added the `remove_contributor()` method that is called in `client_disconnected()`.
1 parent 1543057 commit 3337ce0

File tree

4 files changed

+18
-18
lines changed

4 files changed

+18
-18
lines changed

crates/xds/src/config.rs

Lines changed: 0 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -53,22 +53,6 @@ impl LocalVersions {
5353
panic!("unable to retrieve `{ty}` versions, available versions are {versions:?}");
5454
}
5555
}
56-
57-
#[inline]
58-
pub fn clear<C: crate::config::Configuration>(
59-
&self,
60-
config: &Arc<C>,
61-
remote_addr: Option<std::net::IpAddr>,
62-
) {
63-
for (type_url, map) in &self.versions {
64-
let mut map = map.lock();
65-
let remove = map.keys().cloned().collect::<Vec<_>>();
66-
if let Err(error) = config.apply_delta(type_url, vec![], &remove, remote_addr) {
67-
tracing::warn!(%error, count = remove.len(), type_url, "failed to remove resources upon connection loss");
68-
}
69-
map.clear();
70-
}
71-
}
7256
}
7357

7458
pub struct ClientState {

crates/xds/src/server.rs

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -636,8 +636,6 @@ impl<C: crate::config::Configuration> AggregatedControlPlaneDiscoveryService for
636636
tracing::info!("xds stream terminated");
637637
}
638638

639-
local.clear(&config, Some(remote_addr));
640-
641639
res
642640
}
643641
.instrument(tracing::trace_span!("handle_delta_discovery_response")),

src/config.rs

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -335,6 +335,13 @@ impl quilkin_xds::config::Configuration for Config {
335335
dc.remove(ip);
336336
});
337337
}
338+
if let Some(cl) = self.dyn_cfg.clusters() {
339+
cl.modify(|cl| {
340+
// Make sure we remove this agent as a contributor to any locality that it has
341+
// contributed endpoints for
342+
cl.remove_contributor(Some(ip));
343+
});
344+
}
338345
}
339346
}
340347

src/net/cluster.rs

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -547,6 +547,17 @@ where
547547
ret
548548
}
549549

550+
#[inline]
551+
pub fn remove_contributor(&self, remote_addr: Option<std::net::IpAddr>) {
552+
self.localities.retain(|k, v| {
553+
let keep = *v != remote_addr;
554+
if !keep {
555+
tracing::debug!(locality=?k, ?remote_addr, "removing locality contributor");
556+
}
557+
keep
558+
});
559+
}
560+
550561
#[inline]
551562
pub fn remove_locality(
552563
&self,

0 commit comments

Comments
 (0)