feat(persistence): update replica health condition as atomic operation #1879
base: develop
bors try
try: Build succeeded
    /// Information about children.
    pub children: Vec<ChildInfo>,
}

impl Display for NexusInfo {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{{children: [")?;
Could you just call the debug impl here?
Sure, I'll remove this; I was only using it during initial testing.
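For reference, a minimal sketch of what "calling the debug impl" could look like if a Display impl were kept at all. The struct definitions below are stand-ins assembled from the field names visible in this thread (`uuid`, `healthy`, `clean_shutdown`, `do_self_shutdown`); the real types in the PR may differ.

```rust
use std::fmt;

// Stand-in types modelled on the fields visible in the diff (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct ChildInfo {
    pub uuid: String,
    pub healthy: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct NexusInfo {
    pub clean_shutdown: bool,
    pub do_self_shutdown: bool,
    pub children: Vec<ChildInfo>,
}

// Display simply forwards to the derived Debug output, so there is no
// hand-written formatting to keep in sync when fields are added.
impl fmt::Display for NexusInfo {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Debug::fmt(self, f)
    }
}
```

If nothing needs `{}` formatting, dropping the Display impl and logging with `{:?}` gives the same output with less code.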
nexus_info.children.iter_mut().for_each(|c| {
    if c.uuid == uuid {
        c.healthy = *healthy;
    }
});

let mut txn = NexusInfoTxn {
    key_info: &mut persistent_nexus_info,
Do we need this cross-call mutation?
This is mutated in the failure case to set the new flag's value in the in-core object key_info, in case it is needed for debugging.
c85178e to c9a0c46
bors try
try: Build succeeded
c9a0c46 to c436075
bors try
try: Build succeeded
Fail the transaction operation and shut down the nexus so that IO cannot keep succeeding via the remaining replicas. This ensures data integrity in case the replica being marked here has been picked up as the source of truth elsewhere for this volume.
Signed-off-by: Diwakar Sharma <[email protected]>
c436075 to f20e613
bors try
try: Build succeeded
//impl Display for NexusInfo {
//    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
//        write!(f, "{{children: [")?;
//
//        for (i, cinfo) in self.children.iter().enumerate() {
//            if i != 0 {
//                write!(f, ", ")?;
//            }
//            write!(f, "uuid: {}, healthy: {}", cinfo.uuid, cinfo.healthy)?;
//        }
//        write!(
//            f,
//            "], clean_shutdown: {}, do_self_shutdown: {}}}",
//            self.clean_shutdown, self.do_self_shutdown
//        )
//    }
//}
Do we need this?
})?;

if !txn_resp.succeeded() {
    if let TxnOpResponse::Get(g) = &txn_resp.op_responses()[0] {
Need to check if len > 1?
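One way to address the length concern is to avoid indexing altogether, so an empty response list cannot panic. The sketch below uses a hypothetical `OpResponse` enum as a stand-in; it is not the `etcd_client` type, and the helper name is made up for illustration.

```rust
// Hypothetical stand-in for a transaction op response (assumption).
#[derive(Debug)]
pub enum OpResponse {
    Put,
    Get(Vec<Vec<u8>>),
}

/// Returns the first value of a leading Get response.
/// Yields None when the list is empty, when the first response is not a
/// Get, or when the Get carries no values, instead of panicking on `[0]`.
pub fn first_get_value(responses: &[OpResponse]) -> Option<&[u8]> {
    match responses.first()? {
        OpResponse::Get(values) => values.first().map(|v| v.as_slice()),
        _ => None,
    }
}
```

The caller then has a single `None` case to handle, which it can map to an error rather than treating it as success.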
if !txn_resp.succeeded() {
    if let TxnOpResponse::Get(g) = &txn_resp.op_responses()[0] {
        if let Some(kv) = g.kvs().first() {
Better to take the value here rather than copying it on return.
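To illustrate the difference between copying and taking, here is a small sketch with a hypothetical `Kv` struct standing in for the response's key-value pair. It assumes the response can be borrowed mutably; whether the real response type allows that is not shown in this thread.

```rust
// Hypothetical stand-in for a key-value pair in a response (assumption).
#[derive(Debug, Default)]
pub struct Kv {
    pub key: Vec<u8>,
    pub value: Vec<u8>,
}

/// Copying: borrows the response and clones the bytes.
pub fn first_value_cloned(kvs: &[Kv]) -> Option<Vec<u8>> {
    kvs.first().map(|kv| kv.value.clone())
}

/// Taking: moves the bytes out of the response, leaving an empty Vec
/// behind, so no allocation or copy is needed.
pub fn first_value_taken(kvs: &mut [Kv]) -> Option<Vec<u8>> {
    kvs.first_mut().map(|kv| std::mem::take(&mut kv.value))
}
```

Taking only makes sense when the response is not read again afterwards, since the value left behind is empty.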
// This is needed because absence of above response data shouldn't get
// implicitly assumed as success. With this the top level caller can
// compare this returned blank value with some expected value.
return Ok(Some(vec![]));
Shouldn't we return a StoreError here as this seems like an etcd bug if we hit this?
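A rough sketch of the alternative: surface the missing response data as an error instead of returning a blank value. The `MissingTxnResponse` variant and the `failed_txn_value` helper are hypothetical; the PR's actual `StoreError` variants are not shown in this thread.

```rust
use std::fmt;

// Hypothetical error type for illustration only (assumption).
#[derive(Debug, PartialEq)]
pub enum StoreError {
    MissingTxnResponse { key: String },
}

impl fmt::Display for StoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StoreError::MissingTxnResponse { key } => {
                write!(f, "failed transaction returned no value for key '{key}'")
            }
        }
    }
}

impl std::error::Error for StoreError {}

/// `current` is the value read by the failure branch of the transaction.
/// A missing value is reported as an error rather than as an empty Vec.
pub fn failed_txn_value(key: &str, current: Option<Vec<u8>>) -> Result<Vec<u8>, StoreError> {
    current.ok_or_else(|| StoreError::MissingTxnResponse { key: key.to_string() })
}
```

With this shape the caller no longer has to compare against a sentinel empty value to detect the unexpected case.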
if let Some(current_value) = txn_resp {
    let val = serde_json::from_slice::<NexusInfo>(&current_value).unwrap();

    warn!("current state found: key - {key}, value - {val:?}");
move this to after the if statement?
        name: self.name.clone(),
    });
}
error!("{self:?}: failed to persist nexus info transaction: {err}");
Use the same log-once strategy we have in save.
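The strategy used in save is not shown in this thread, so the following is only a generic sketch of a log-once guard built on an atomic flag. `LogOnce` and its methods are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical helper: reports true only for the first call after
/// construction or after a reset, so repeated failures log once.
#[derive(Debug, Default)]
pub struct LogOnce {
    logged: AtomicBool,
}

impl LogOnce {
    /// Returns true if the caller should emit the log line now.
    pub fn should_log(&self) -> bool {
        // swap returns the previous value; only the first caller sees false.
        !self.logged.swap(true, Ordering::Relaxed)
    }

    /// Re-arms the guard, e.g. after the operation succeeds again.
    pub fn reset(&self) {
        self.logged.store(false, Ordering::Relaxed);
    }
}
```

A retry loop would call `should_log()` before the `error!` line and `reset()` once a later attempt succeeds.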
    });
} else {
    // Don't need to check individual op responses.
    debug!(?key, "{self:?}: state save transaction successful");
same as save
debug!(?key, "{self:?}: state save transaction successful");
trace!(?key, "{self:?}: the state was saved successfully");
@@ -1,7 +1,7 @@
//! Definition of a trait for a key-value store together with its error codes.

use async_trait::async_trait;
use etcd_client::{Compare, Error, TxnOp, TxnResponse};
use etcd_client::Error;
Not needed now I hope?
    ops_success: Vec<TxnOp>,
    ops_failure: Option<Vec<TxnOp>>,
) -> Result<TxnResponse, StoreError> {
    new_value: &[u8],
These should be &impl StoreValue?
pub struct NexusInfoTxn<'a> {
    pub key_info: &'a mut PersistentNexusInfo,
    // Expected value for the key.
    pub expected: NexusInfo,
}
Do any of these need to be pub?
Fail the transaction operation and shut down the nexus so that IO cannot keep succeeding via the remaining replicas. This ensures data integrity in case the replica being marked here has been picked up as the source of truth elsewhere for the volume.
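The compare-then-update pattern discussed in this thread (write the new value only if the stored value matches what the caller expects, otherwise read back the current value) can be modelled in memory as below. This is a simplified sketch, not the PR's code: `MemStore`, `TxnOutcome`, and `compare_and_put` are hypothetical names, and the real change runs this as a transaction against the persistent store.

```rust
use std::collections::HashMap;

/// Result of a compare-and-put attempt.
#[derive(Debug, PartialEq)]
pub enum TxnOutcome {
    /// The stored value matched `expected` and was replaced.
    Committed,
    /// The comparison failed; carries whatever is currently stored, if anything.
    Conflict(Option<Vec<u8>>),
}

/// Hypothetical in-memory stand-in for the key-value store (assumption).
#[derive(Debug, Default)]
pub struct MemStore {
    data: HashMap<String, Vec<u8>>,
}

impl MemStore {
    pub fn put(&mut self, key: &str, value: &[u8]) {
        self.data.insert(key.to_string(), value.to_vec());
    }

    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.data.get(key).map(|v| v.as_slice())
    }

    /// Replaces the value only when the stored bytes equal `expected`.
    /// On mismatch nothing is written and the current value is returned,
    /// so the caller can fail the operation and shut down.
    pub fn compare_and_put(&mut self, key: &str, expected: &[u8], new_value: &[u8]) -> TxnOutcome {
        let current = self.data.get(key).cloned();
        if current.as_deref() == Some(expected) {
            self.data.insert(key.to_string(), new_value.to_vec());
            TxnOutcome::Committed
        } else {
            TxnOutcome::Conflict(current)
        }
    }
}
```

In this model a `Conflict` corresponds to the case described above: someone else has changed the persisted state, so the caller should not assume its update took effect.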