-
Notifications
You must be signed in to change notification settings - Fork 53
flux-content: support new checkpoints list command #6798
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
68cf097
to
cdd8810
Compare
re-pushed, rebasing on top of #6772 |
4ed972b
to
9bb06d9
Compare
110b841
to
d53113d
Compare
d53113d
to
95f0f74
Compare
95f0f74
to
8934f84
Compare
removed WIP, now that #6772 has been merged |
Would it be useful to add an option to Also is the |
Good point. I originally wrote this back when "any checkpoint format" was something we were still allowing / supporting, but we've moved on from that notion.
Hmmm, I didn't think of it that way. I think the |
8934f84
to
f6a2bff
Compare
I had forgotten we have a checkpoint version number, so I guess it wouldn't be horrible to up the version. We could return an array of checkpoints vs. a single one. But the big negative is the libkvs |
f6a2bff
to
79daf5b
Compare
Re-pushed, going w/ this default output. It's admittedly hand created. I ponder if remaking
I ended up keeping the |
Nice improvement on the command output!
I guess I wasn't thrilled with the SQL query change that I had to look up (because I am a complete SQL noob), given that the index probably isn't necessary given the expected small size of the query result. Easier to just filter it on the client side IMHO. We own the clients and servers as well as the convenience API (internal only), so I think we could change it all we want. But if you feel strongly, this is OK I guess. We can always fix it later. |
Don't feel strongly. I struggled the most with how to convert these API functions
i.e. they only return one checkpoint entry's info. Hmmmm, pondering this for a second, perhaps we could adjust the API to be similar to the |
Sounds good. Just a convenience library, so whatever is going to work out for the localized use cases is fine IMHO. |
43ccae7
to
89efe1b
Compare
re-pushed, going with a revamped
As expected it does lead to a healthy amount of "churn". Less in the code than I expected, but a lot more in the tests than I expected (alot of tests call checkpoint-get and parsed the response via |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6798 +/- ##
=======================================
Coverage 83.90% 83.90%
=======================================
Files 540 540
Lines 90539 90616 +77
=======================================
+ Hits 75963 76028 +65
- Misses 14576 14588 +12
🚀 New features to boost your workflow:
|
hit this build failure, it's a very new test from #6911 so just documenting in case this is a racy pattern (could hypotheticlaly just be a timeout)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I had a couple of minor suggestions that you can ignore if you want.
if (json_unpack (checkpoint, | ||
"{s:i s:s}", | ||
"version", &version, | ||
"rootref", &tmp_rootref) < 0) | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this set errno=EPROTO on error?
if (json_unpack (checkpoint, | ||
"{s:i s?f}", | ||
"version", &version, | ||
"timestamp", &ts) < 0) | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set errno?
if (json_unpack (checkpoint, | ||
"{s:i s?i}", | ||
"version", &version, | ||
"sequence", &seq) < 0) | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set errno
int kvs_checkpoint_lookup_get_checkpoints (flux_future_t *f, | ||
const json_t **checkpoints); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this one could just be called kvs_checkpoint_lookup_get()
since it returns the whole enchilada?
src/common/libkvs/kvs_checkpoint.c
Outdated
if (flux_rpc_get_unpack (f, | ||
"{s:o}", | ||
"value", &o) < 0) | ||
goto error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maybe check that o
is an array and that it contains at least one entry, then fail with EPROTO if so?
Some users access element 0 without checking if it is NULL.
checkpoint_get | jq -r .value | jq -r .rootref >rootref.out && | ||
checkpoint_get | jq -r .value[0] | jq -r .rootref >rootref.out && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe for later cleanup, but these jq invocations could easily be combined, e.g.
checkpoint_get | jq -r .value[0].rootref
Two potential gotchas with the way these tests are structured:
- I think without
-e
, jq returns success even if the requested key/array entry is not found - A pipeline succeeds if the last command in in succeeds
Not saying these tests are broken but they might be a little fragile as is.
Problem: Several functions in the checkpoint API still take a key parameter, even though the key is no longer used. Remove the key input parameter in kvs_checkpoint_commit() and kvs_checkpoint_lookup(). Update all callers accordingly.
Problem: The internal KVS checkpoint API only supports the retrieval of a single checkpoint. However, some content backing modules may store multiple checkpoints. Update the KVS checkpoint API to support an array of checkpoints to be returned and parsing functions to parse the individual entries in the array. Note that the content backing modules do not yet support returning multiple entries. This update is in preparation for that future change. Update all callers accordingly.
Problem: There is currently no way to get multiple checkpoints from the content modules. Update checkpoint lookup to return all checkpoints in a json array. Update all clients to handle new protocol response.
Problem: There is no way for a user to list the checkpoints that are available for recovering from. Support a new "flux content checkpoints" command that will list all of the available checkpoints for the currently configured content backing store.
Problem: There is no documentation for the new flux content checkpoints command. Add it to the flux-content(1) manpage.
Problem: There are no tests for the new flux content checkpoints command. Add coverage in all content backing store tests.
89efe1b
to
feeadef
Compare
Built on top of #6772. So I'll list this as WIP for the time being.
Problem: There is currently no way to get multiple checkpoints from the content modules.
Support an optional "index" key when getting content checkpoints. If the index is not available, return ENOENT to the caller. Support a new "flux-content checkpoints" command.