Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement snapshot/incremental reload functionality #293

Merged
merged 7 commits into from
Aug 31, 2023

Conversation

skypodolsky
Copy link

@skypodolsky skypodolsky commented Aug 28, 2023

The main concept is to create a reload script when the snapshot is created or changes its state, allowing the kernel to call it before the root file system is mounted. This functionality has been implemented in libelastio-snap as it's used in both elastio and elioctl.

Works everywhere but Fedora 38, Ubuntu 20.04 and Ubuntu 22.04. The relevant issue is here: #292,

Closes #289

#include "libelastio-snap.h"

#define RELOAD_SCRIPT_PATH "/etc/elastio/dla"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does dla stand for here? Also I read this as the path to the reload script is /etc/elastio/dla, but looking at how this is used, it looks more like that's a directory that contains reload scripts.

I think this would benefit from an explanatory comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, to me, it's also unclear what exactly dla means, but this directory name was used initially in dattobd. I'll add an explanatory comment.

return 0;
}

return 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know nothing about the established conventions in this code, so I won't second guess them here. But the returning 0 for success and non-zero on error is a bit shocking to my hasnt-written-C-code-in-ages sensibilities. The purpose of this comment is just to make sure this is the right convention for the code here. Elsewhere in this change set I see -1 used as an error signal.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think -1 makes indeed more sense here. Replaced this.

return -1;
}

ret = sscanf(buf, "/usr/bin/elioctl reload-%s -c %u %s %s %u %s", mode, &rp->cache_size, rp->bdev, rp->cow, &rp->minor, ignore_errors);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to make this assumption about the path to elioctl?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we see that installation scripts use exactly this path:

dist/initramfs/dracut/module-setup.sh:    inst /usr/bin/elioctl
dist/initramfs/initramfs-tools/hooks/elastio-snap:copy_exec /usr/bin/elioctl
dist/initramfs/initrd/boot-elastio-snap.sh:#%programs: /usr/bin/elioctl

But I think it would be prettier defining this path above near RELOAD_SCRIPT_PATH

@skypodolsky
Copy link
Author

@e-kov what do you think about this PR?

Copy link
Collaborator

@e-kov e-kov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skypodolsky I'm glad this laconic change makes incrementals/snapshots functionality working after reboot.
However there are few concerns:

  1. Logging. I'd like to have it, but not from the library or not to stdout. See corresponding comment.
  2. Docs. It would be great to update some docs (doc/elioctl.8 or README.md) with an info about reload scripts in the dla dir which are managed by the library. There is almost nothing about it even now, when users of this driver can manage some scripts there manually. Maybe this info can be added to the description of the reload-* commands.
  3. Possibility to generate incorrect reload scripts after deletion of the existing ones. Here we can just stop to update them or generate them correctly using another sources. See appropriate comments.

Feel free to ignore my comments or use your judgement to handle some. The PR is already approved by respected experienced developers in the field of drivers and backups 😉

Comment on lines 259 to 266
struct reload_script_params rp;
if (elastio_snap_get_reload_params(minor, &rp))
printf("get script params failed\n");

strcpy(rp.cow, cow);

if (elastio_snap_set_reload_params(minor, true, &rp))
printf("update script params failed\n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably you should not call elastio_snap_set_reload_params in case if elastio_snap_get_reload_params returns an error. I've removed /etc/elastio/dla/reload_0.sh after few snapshot and it's appeared again after transition-to-snapshot. As result, new reload_0.sh contains incorrect params:

/usr/bin/elioctl reload-snapshot -c 0 ^A /cow3 0 
^@

I'd suggest to use info about a snap from /proc/elastio-snap-info or by an info IOCTL and rewrite reload script each time without parsing the existing one (or maybe not???).

[nit] Also I don't know what ^@ does mean. Does it something legitimate as EOL or not. It always appears in the reload_[M].sh. However scripts are working fine.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @e-kov, thank you for such a detailed review. Indeed, I think that these symbols you saw were caused by incorrect parsing of a script file. I have reconsidered this place: now, if an error took place while parsing the script file, we don't allow any transitions.

Comment on lines 234 to 239
struct reload_script_params rp;
if (elastio_snap_get_reload_params(minor, &rp))
printf("get script params failed\n");

if (elastio_snap_set_reload_params(minor, false, &rp))
printf("update script params failed\n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably you should not call elastio_snap_set_reload_params in case if elastio_snap_get_reload_params returns an error. It's same as in the comment for elastio_snap_transition_snapshot.

Comment on lines 285 to 292
struct reload_script_params rp;
if (elastio_snap_get_reload_params(minor, &rp))
printf("get script params failed\n");

rp.cache_size = cache_size;

if (elastio_snap_set_reload_params(minor, rp.snapshot, &rp))
printf("update script params failed\n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same about elastio_snap_set_reload_params call after failed elastio_snap_get_reload_params call.

strcpy(rp.cow, cow);

if (elastio_snap_set_reload_params(minor, true, &rp))
printf("update script params failed\n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging. It was not present here before. Logically small wrapper around IOCTLs has nothing to log. And it looks fine when this library is used from elioctl... But it is a big problem for elastio CLI. Uncontrolled by elastio writes to STDOUT will break JSON outputs. Hence all "job's" output is screwed up and not suitable then for the support folks.
As an option, STDERR, is not a good option too. In this case we'll break progress reporting of elastio. It's not so dramatic, but still a bug.
I'd suggest to write errors into dmesg. Another option is to return errors w/o logging here and then log appropriate messages from elioctl or whatever uses this library.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that clarification, I didn't know such a thing about elastio. I have replaced all logging with an error that is returned if script-related stuff didn't work properly.

@skypodolsky skypodolsky merged commit 8d287ba into master Aug 31, 2023
33 checks passed
@skypodolsky skypodolsky deleted the feat/289-reload-snapshots branch August 31, 2023 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reload snapshots and incrementals on reboot automatically
4 participants