Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix infinite copy attempts when tmp and final dir are the same, but different #256

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

Marandil
Copy link

@Marandil Marandil commented May 23, 2021

See Chia-Network/chia-blockchain#5863

To elaborate on the title, when plotting and final and tmp2 dirs are the same canonically, but their paths are given differently (e.g. one expllicitly, the other as '.'), the code enters an infinite loop trying to copy "./plot-k32-[...].plot.2.tmp" to "$PWD/plot-k32-[...].plot.2.tmp" with an obvious error that the file exists.

In the issue above (in the chia-blockchain) I suggested that files should simply be renamed when both paths point to the same filesystem, but apparently std::filesystem doesn't expose st_dev field explicitly, so instead the code attempts to create a hard link and reverts back to copying when it fails (e.g. when the files reside on different filesystems, or the filesystem doesn't support hard links).

To summarize the changes:

  • All dirnames are first canonicalized to ensure differently given paths to the same directory are treated correctly.
  • Before the final rename, the .plot.2.tmp file is first hardlinked from final dir to tmp2 dir. If the hard linking fails, the previous copy routine is used. Final rename requires either a successful copy or a hard link.
  • The final rename or copy & rename step has been reworked. The code now always attempts to rename first, falling back to copy only in case of a rename failure. Copy no longer uses two steps and creates the final file directly. If this is seen as incorrect / undesireable, please write so, this needs an appropriate comment in the code and possibly another solution. Removal of the final rename handles some of the bizzare cases where fs::rename doesn't recognize that the underlying filesystems for temp 2 and final directories are the same.

The code may still fail, if the tmp2 dir and final dir are essentially the same, but given through different canonical paths. An example of this happening is mounting the same block device under different mount points, e.g. mounting /dev/sda1 as /mnt/plots and /mnt/tmp2, and passing the mount points accordingly. The potential fixes for this are:

  • Always attempt to hardlink final plot to tmp2 plot before going the copy route; still requires care in case the hard link fails, but the directories are essentially the same;
  • Use std::filesystem::equivalent instead of comparing the parent paths;
  • Use pure stat syscall without the std::filesystem wrapper to compare st_dev of tmp2 and final dirs; if they match rename directly from tmp2 to final plot file, otherwise go safely the copy & rename route

The code should now properly handle cases where the same physical directory is mounted in different logical locations, although platform-dependent caveats may prove to make them suboptimal. It seems the original idea of recognizing both /mnt/plots and /mnt/tmp2 as the same filesystem and using hard links/rename in those cases doesn't work in many cases due to Linux deliberate (?) limitations (c.f. https://unix.stackexchange.com/a/380033). That's why I ended up with direct copy instead of copy + rename.

@arvidn
Copy link
Contributor

arvidn commented Jun 9, 2021

perhaps this is a dumb question, but wouldn't it be a lot simpler and more robust to just rename the file, and if that fails, we can assume the files are on different volumes and fall back to copy.

I don't quite understand why we check:

if (tmp_2_filename.parent_path() == final_filename.parent_path())

and only attempt the rename in that case.

To cover the case where the final path points to the same file, I suppose we would technically have to stat() and look at the inode number. On the other hand, is this a case likely to come up in practice (except when the user is trying to mess things up)?

@Marandil
Copy link
Author

Marandil commented Jul 1, 2021

perhaps this is a dumb question, but wouldn't it be a lot simpler and more robust to just rename the file, and if that fails, we can assume the files are on different volumes and fall back to copy.

I initially thought that std::filesystem::rename would succeed in moving cross-volume, but it appears it's platform dependent, and at least on Linux it actually fails with errno 18, so you're right this could be used.

I'm not sure why the original authors went with this implementation, but now I see no reason not to go with the "try rename then copy" route. I'll check it out, test & update the request afterwards.

@Marandil
Copy link
Author

Marandil commented Jul 1, 2021

Alright @arvidn, I've tested & pushed the simpler solution. From what I can tell, it appears that rename is more or less equal to hardlink + remove, and neither works reliably across mount points. I've updated the request post accordingly.

Nevertheless, changing the "copy & rename" fallback strategy to simply "copy" ought to be enough to achieve "correctness" in all cases, there may be performance (or temporary space) penalties in the fallback cases.

We might still want to check e.g. if the directory paths given are equivalent via fs::equivalent and then select one of them as the canonical, but this should be beyond the scope here.

@github-actions
Copy link

'This PR has been flagged as stale due to no activity for over 60
days. It will not be automatically closed, but it has been given
a stale-pr label and should be manually reviewed.'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants