Add is_file property to RemoteData #6807

Open: wants to merge 1 commit into base: main


GeigerJ2
Contributor

No description provided.

@GeigerJ2 GeigerJ2 changed the title Add is_file property to RemoteData. Add is_file property to RemoteData Mar 25, 2025
@GeigerJ2 GeigerJ2 requested a review from agoscinski March 25, 2025 14:57
codecov bot commented Mar 25, 2025

Codecov Report

Attention: Patch coverage is 87.50000% with 1 line in your changes missing coverage. Please review.

Project coverage is 78.31%. Comparing base (660fec7) to head (7b80993).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/aiida/orm/nodes/data/remote/base.py | 87.50% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6807      +/-   ##
==========================================
+ Coverage   78.31%   78.31%   +0.01%     
==========================================
  Files         566      566              
  Lines       42762    42770       +8     
==========================================
+ Hits        33484    33491       +7     
- Misses       9278     9279       +1     

Contributor

@agoscinski agoscinski left a comment

We should at least extend the tests in tests/orm/nodes/data/test_remote.py to check whether they work with a RemoteData constructed from a file (we can extend remote_data_factory with an argument and then parametrize the tests). The same goes for cmdline/commands/test_data.py::TestVerdiDataRemote. We might also want to extend the test tests/engine/daemon/test_execmanager.py::test_upload_file_copy_operation_order.

We also need to make the RemoteData documentation more consistent, since it does not always refer to files.

Also, the RemoteData methods getfile, listdir, and listdir_withattributes should fail with a dedicated error message if the target is not a directory, and we need to document this error.

Sadly, we cannot use the type checker to enforce error handling in this case (that is, to ensure that a developer who calls listdir also catches a specific error, e.g. ErrorIsFile). That only works when an additional argument is passed, but that would be odd, since the user is exposed to this extra argument even if we give it a default value:

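    from enum import Enum
    from typing import Literal, Never, assert_never, overload  # Python 3.11+

    class Type(Enum):  # assumed definition, not shown in the original sketch
        FILE = 'file'
        FOLDER = 'folder'

    @overload
    def listdir(self, file_type: Literal[Type.FILE]) -> Never:
        ...
    @overload
    def listdir(self, file_type: Literal[Type.FOLDER]) -> int:
        ...
    def listdir(self, file_type: Type) -> Never | int:
        if file_type is Type.FILE:
            # Callers passing Type.FILE are typed as never returning.
            raise ValueError('target is a file, not a directory')
        elif file_type is Type.FOLDER:
            return 5  # everything fine
        else:
            # Exhaustiveness check: flagged by the type checker if Type grows.
            assert_never(file_type)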
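
For the runtime side of this, a minimal sketch of what a dedicated error could look like. Everything here is an assumption for illustration: `ErrorIsFile` is just the name floated above, `FakeRemote` is a stand-in that inspects the local filesystem instead of going through a transport, and none of it is the actual RemoteData code.

```python
import tempfile
from pathlib import Path


class ErrorIsFile(OSError):
    """Hypothetical error: a directory-only operation was called on a file target."""


class FakeRemote:
    """Stand-in for a RemoteData-like object; only looks at the local filesystem."""

    def __init__(self, remote_path: str) -> None:
        self.remote_path = Path(remote_path)

    @property
    def is_file(self) -> bool:
        return self.remote_path.is_file()

    def listdir(self) -> list[str]:
        # Fail early with a dedicated, documented error instead of a generic one.
        if self.is_file:
            raise ErrorIsFile(f'{self.remote_path} is a file, not a directory')
        return sorted(entry.name for entry in self.remote_path.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / 'file.txt').write_bytes(b'content')

    # Listing the folder works as usual.
    listing = FakeRemote(str(folder)).listdir()

    # Listing a file target raises the dedicated error, which callers can catch.
    try:
        FakeRemote(str(folder / 'file.txt')).listdir()
    except ErrorIsFile as exc:
        caught = str(exc)
    else:
        caught = None
```

Deriving the error from `OSError` is only one option; the point is that callers get a single, documented exception type to catch.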

remote_data = remote_data_factory(mode=mode)
assert remote_data.is_file is False

remote_data = remote_data_factory(mode=mode, store=False)
Contributor

Why can't we store the node?

Contributor Author

Once we store it, we cannot use set_remote_path anymore. Also, in the current remote_data_factory, I'm not sure if we can directly pass a path to a file, due to the hardcoded call:

        (tmp_path / 'file.txt').write_bytes(content)
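
One way the factory could be extended, as a rough sketch only. The `as_file` argument, the `make_remote_data` helper, and the `FakeRemoteData` stand-in are all hypothetical; the real remote_data_factory fixture is not reproduced here.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FakeRemoteData:
    """Stand-in for RemoteData; only records the path it points to."""

    remote_path: str


def make_remote_data(tmp_path: Path, as_file: bool = False, content: bytes = b'some content') -> FakeRemoteData:
    """Write file.txt into tmp_path and point either at that file or at its folder."""
    target = tmp_path / 'file.txt'
    target.write_bytes(content)
    return FakeRemoteData(remote_path=str(target if as_file else tmp_path))


with tempfile.TemporaryDirectory() as tmp:
    results = {}
    # Exercise both variants, as a parametrized test would.
    for as_file in (False, True):
        node = make_remote_data(Path(tmp), as_file=as_file)
        results[as_file] = Path(node.remote_path).is_file()
```

In a pytest suite the loop would become `@pytest.mark.parametrize('as_file', (False, True))` on the tests that use the fixture.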

def is_file(self):
    """Return whether the ``RemoteData`` points to a file, rather than a folder."""
    if self.is_cleaned:
        return False
Contributor

I feel like we should ignore is_cleaned here.

Contributor Author

Fair enough, I was also unsure if this is a case we should cover.

@GeigerJ2
Contributor Author

Hi @agoscinski, thanks for the input! Agree with all points, this was a very low-effort PR :D Though, the more I think about this, the more I think we should solve this issue properly by introducing a RemoteSinglefileData class... If RemoteData is assumed to point to a directory throughout the whole code base, it might otherwise give us a lot of headaches. In that case, rather than adding the is_file check to RemoteData like I did in this PR, I'd introduce RemoteSinglefileData and disallow even constructing a RemoteData with a path to a file in the first place. What do you think?

@agoscinski
Contributor

I agree. We should probably create a base class for RemoteData and RemoteSinglefileData that contains the get_authinfo and get_size_on_disk methods.

Something I have not thought through, because I am not sure how the database migration works: maybe we can create a new class RemoteFolderData that replaces RemoteData, so we can use RemoteData as the base class. It is not a breaking change if a database migration fixes it. In that case we just need to change the class by reconstructing the objects in the migration process.
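
A rough sketch of the base-class idea from this comment, under heavy assumptions: the class names `RemoteBaseSketch`, `RemoteFolderSketch`, and `RemoteSinglefileSketch` are placeholders, paths are checked on the local filesystem (a real remote path would need a transport, so validation at construction time may not be possible in practice), and get_authinfo is left out because it depends on the computer and user setup.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class RemoteBaseSketch(ABC):
    """Hypothetical shared base holding what both node types need."""

    def __init__(self, remote_path: str) -> None:
        self._validate(Path(remote_path))
        self.remote_path = remote_path

    @abstractmethod
    def _validate(self, path: Path) -> None:
        """Reject paths of the wrong kind for this class."""

    def get_size_on_disk(self) -> int:
        """Return the size in bytes of the file, or of all files below the folder."""
        path = Path(self.remote_path)
        if path.is_file():
            return path.stat().st_size
        return sum(p.stat().st_size for p in path.rglob('*') if p.is_file())


class RemoteFolderSketch(RemoteBaseSketch):
    def _validate(self, path: Path) -> None:
        if path.is_file():
            raise ValueError(f'{path} is a file; use the single-file class instead')


class RemoteSinglefileSketch(RemoteBaseSketch):
    def _validate(self, path: Path) -> None:
        if path.is_dir():
            raise ValueError(f'{path} is a directory; use the folder class instead')


with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / 'file.txt'
    file_path.write_bytes(b'content')  # 7 bytes

    folder_size = RemoteFolderSketch(tmp).get_size_on_disk()
    file_size = RemoteSinglefileSketch(str(file_path)).get_size_on_disk()

    # Constructing the folder class with a file path is rejected up front.
    try:
        RemoteFolderSketch(str(file_path))
    except ValueError:
        rejected = True
    else:
        rejected = False
```

With this split, directory-only methods such as listdir would live only on the folder class, so the file case never reaches them.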
