Skip to content

Avoid global umask for setting file mode. #7547

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ryan-clancy
Copy link

This PR updates the method for setting the permissions on cache_path after calling shutil.move. The call to shutil.move may not preserve permissions if the source and destination are on different filesystems. Reading and resetting umask can cause race conditions, so directly read what permissions were set for the temp_file instead.

This fixes #7536.

@@ -411,11 +412,14 @@ def temp_file_manager(mode="w+b"):
# GET file object
fsspec_get(url, temp_file, storage_options=storage_options, desc=download_desc, disable_tqdm=disable_tqdm)

# get the permissions of the temp file
temp_file_mode = stat.S_IMODE(os.stat(temp_file.name).st_mode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't necessarily return the user's umask no ?

we need the user's umask since some people use datasets with shared disks and use group permissions (see #2157 #5799)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When looking more at the history of this code, I am not sure the umask block below is needed at all anymore and could just be deleted.

with FileLock(lock_path):
if resume_download:
incomplete_path = cache_path + ".incomplete"
@contextmanager
def _resumable_file_manager():
with open(incomplete_path, "a+b") as f:
yield f
temp_file_manager = _resumable_file_manager
if os.path.exists(incomplete_path):
resume_size = os.stat(incomplete_path).st_size
else:
resume_size = 0
else:
temp_file_manager = partial(tempfile.NamedTemporaryFile, dir=cache_dir, delete=False)
resume_size = 0
# Download to temporary file, then copy to cache dir once finished.
# Otherwise you get corrupt cache entries if the download gets interrupted.
with temp_file_manager() as temp_file:
logger.info(f"{url} not found in cache or force_download set to True, downloading to {temp_file.name}")
# GET file object
if scheme == "ftp":
ftp_get(url, temp_file)
elif scheme not in ("http", "https"):
fsspec_get(url, temp_file, storage_options=storage_options, desc=download_desc)
else:
http_get(
url,
temp_file,
proxies=proxies,
resume_size=resume_size,
headers=headers,
cookies=cookies,
max_retries=max_retries,
desc=download_desc,
)
logger.info(f"storing {url} in cache at {cache_path}")
shutil.move(temp_file.name, cache_path)
umask = os.umask(0o666)
os.umask(umask)
os.chmod(cache_path, 0o666 & ~umask)
is the code just after the umask was added and NamedTemporaryFile was used to create the files. NamedTemporaryFile always set 600 permissions and didn't respect umask so umask code was added to fix the permissions.

d536e37 removed the NamedTemporaryFile and moved to open(), which respects the umask.

We should just be able to remove this temp_file_mode + updated chmod lines and then remove:

umask = os.umask(0o666)
os.umask(umask)
os.chmod(cache_path, 0o666 & ~umask)

Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Errno 13] Permission denied: on .incomplete file
2 participants