Skip to content

Documented API update from RFC 0072 #737

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mesemus
Copy link

@mesemus mesemus commented Jan 19, 2025

Description

This PR contains documentation of API changes proposed in RFC 0072.

Checklist

Ticks in all boxes and 🟢 on all GitHub actions status checks are required to merge:

  • I'm aware of the code of conduct.
  • I've created logical separate commits and followed the commit message format.
  • I've targeted the master branch.
  • If this documentation change impacts the current release of InvenioRDM, I will backport it to the production branch following approval or indicate to a maintainer that it should be backported.

Reminder

By using GitHub, you have already agreed to the GitHub’s Terms of Service including that:

  1. You license your contribution under the same terms as the current repository’s license.
  2. You agree that you have the right to license your contribution under the current repository’s license.

@mesemus
Copy link
Author

mesemus commented Jan 19, 2025

Connected pull requests:

RFC 0072
Implementation in invenio-records-resources
S3 multipart implementation
invenio-rdm-records - depends on this PR

...
]
```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add in that file size and checksum are optional parameters?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Contributor

@tmorrell tmorrell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be included in v13. There is a bit on the permissions that should be checked before merging.

from invenio_administration.generators import Administration

class MyRepositoryPermissionPolicy(RDMRecordPermissionPolicy):
can_draft_create_files = RDMRecordPermissionPolicy.can_draft_transfer_files + [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure these permissions have been implemented. It works out of the box for me without modifying permissions

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have a look at it today.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmorrell

You are right, thanks for pointing this out. @slint - please have a look at what decision you prefer:

For FETCH and REMOTE transfers, one needs to specify a list of domains inside invenio.cfg (RECORDS_RESOURCES_FILES_ALLOWED_DOMAINS for fetch, RECORDS_RESOURCES_FILES_ALLOWED_REMOTE_DOMAINS for remote) from which users can fetch remote files/link to. This list is initially empty, meaning no one can use fetch or remote. If this variable is defined, due to my omission, anyone who can edit a record could use (via API - no support in UI at the moment) these transports.

As these operations might be dangerous (depletion of background workers/invalidation of repository best practices), it would be better to disable these in RDM permissions and let the repository administrator enable them by defining explicit RDM permissions. This process isn't straightforward at the moment as care must be taken when overriding can_xyz which is used elsewhere.

The decision needed:

  1. Should we add these restricted permissions? It's a simple PR going as:
# inside invenio_rdm_records/services/permissions.py

- can_draft_create_files = can_review
+ can_draft_create_files = [
        IfTransferType(LOCAL_TRANSFER_TYPE, can_review),
        IfTransferType(MULTIPART_TRANSFER_TYPE, can_review),
        SystemProcess(),
    ]

And users would then inherit from RDM permissions as outlined in: https://github.com/inveniosoftware/docs-invenio-rdm/blob/8fefe9e26604a3685a517f59209b714805b5a515/docs/reference/file_storage.md#security-1

  1. Keep the current can_draft_create_files form, and explain in docs that if repository administrator wants to tighten security, they can do it by overriding as:
# inside repository invenio.cfg
+ can_draft_create_files = [
        IfTransferType(LOCAL_TRANSFER_TYPE, can_review),
        IfTransferType(MULTIPART_TRANSFER_TYPE, can_review),
        IfTransferType(REMOTE_TRANSFER_TYPE, can_something_else),
        SystemProcess(),
    ]
  1. Do something else, such as allow fetch & remote only for administration access by default:
# inside invenio_rdm_records/services/permissions.py
+ can_draft_create_files = [
        IfTransferType(LOCAL_TRANSFER_TYPE, can_review),
        IfTransferType(MULTIPART_TRANSFER_TYPE, can_review),
        IfTransferType(FETCH_TRANSFER_TYPE, [Administration()]),
        IfTransferType(REMOTE_TRANSFER_TYPE, [Administration()]),
        SystemProcess(),
    ]

What would you prefer? Thanks.

Copy link
Member

@slint slint Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clear options :)

I'm leaning towards 1., not allowing the Remote and Fetch transfer types at all, because this is already a pretty advanced feature and it's very likely that one will need to change both configuration + permissions anyways.

Once we have a better idea on how to integrate these features in the UI, we could explore the path for making the config ergonomics for the future better.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@slint I've added:

Please have a look if can be merged.

@tmorrell tmorrell added this to v13.0.0 Jun 10, 2025
@tmorrell tmorrell moved this to 👀 In review in v13.0.0 Jun 10, 2025
@mesemus mesemus force-pushed the contribution-file-storage branch from ca49043 to 8fefe9e Compare June 11, 2025 11:03
@mesemus mesemus marked this pull request as ready for review June 11, 2025 11:04
@fenekku
Copy link
Collaborator

fenekku commented Jun 11, 2025

Note that in upcoming "Reference" section refactor, File storage is going to be moved around/edited for clarity of audience and purpose. Currently it mixes REST API usage, with explanation of internals, with configuration for site administrators. You can separate those concerns out if you want, otherwise I will if that reorg lands after this PR.

@mesemus
Copy link
Author

mesemus commented Jun 23, 2025

Note that in upcoming "Reference" section refactor, File storage is going to be moved around/edited for clarity of audience and purpose. Currently it mixes REST API usage, with explanation of internals, with configuration for site administrators. You can separate those concerns out if you want, otherwise I will if that reorg lands after this PR.

@fenekku I'd prefer if you could separate the concerns and move them around as I have not been following the changes to the documentation and would probably put those inconsistently with the rest of the documentation :) Thanks.

@fenekku
Copy link
Collaborator

fenekku commented Jun 25, 2025

Note that in upcoming "Reference" section refactor, File storage is going to be moved around/edited for clarity of audience and purpose. Currently it mixes REST API usage, with explanation of internals, with configuration for site administrators. You can separate those concerns out if you want, otherwise I will if that reorg lands after this PR.

@fenekku I'd prefer if you could separate the concerns and move them around as I have not been following the changes to the documentation and would probably put those inconsistently with the rest of the documentation :) Thanks.

No problem! It's been done now, so you can rebase. The separation is not yet ideal, but keeps content close enough to where it should be. If any issue with the rebase let me know, and I can help.

@mesemus mesemus force-pushed the contribution-file-storage branch 2 times, most recently from 467cd68 to 4e7825b Compare June 27, 2025 10:12
@mesemus mesemus force-pushed the contribution-file-storage branch from 4e7825b to 69b741a Compare June 27, 2025 10:22
@mesemus
Copy link
Author

mesemus commented Jun 27, 2025

No problem! It's been done now, so you can rebase. The separation is not yet ideal, but keeps content close enough to where it should be. If any issue with the rebase let me know, and I can help.

@fenekku Rebased, please have a look if it was done correctly. Thanks.

Copy link
Collaborator

@fenekku fenekku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rebase and content is really good. Let's see if @slint or @tmorrell have remaining comments and then we can merge.


- *Local*, which represents the files that are managed by the InvenioRDM instance,
independently of the backend.
- *Fetch*, these are files that are not immediately managed by the instance as they need to be downloaded first.
This means that they will eventually become _local_ files.
- *Multipart*, these are files that are uploaded in parts. User can upload parts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- *Multipart*, these are files that are uploaded in parts. User can upload parts
- *Multipart*, these are files that are uploaded in parts. Users can upload parts

"type": "F",
"url": "https://example.org/files/dataset.zip?token=<auth token>"
}
"metadata": {...}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that addition in the RFC. Is this new? (I thought metadata was an auto-filled read-only field post upload). Actually, perhaps we can remove that section ("Example of selecting transfer type ... " with code block) since it's covered below for each individual case with appropriate content?

At this point an asynchronous task will be launched and the file will be transported into
the InvenioRDM instance. Once the file transfer is completed, the status field will be
changed to `completed`. At this point the `storage_class` of the files has also changed
changed to `completed`. At this point the `transfer.type` of the files has also changed
to `L`. The status can be checked using the _files_ url (`/api/records/{id}/draft/files`).
Note, until all the files have been transferred (i.e. their status is `completed`) the
record cannot be published.

More over, while files are being transferred requests to the `content` and `commit`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
More over, while files are being transferred requests to the `content` and `commit`
Moreover, while files are being transferred requests to the `content` and `commit`

1. **Network Access**: The file’s URL is reachable from the user’s network.
2. **No Sensitive Data**: The URL does not include any sensitive information (such as tokens).

By default, Invenio refuses references to external files. Files can only be referenced
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to keep the whole doc in the context of InvenioRDM

Suggested change
By default, Invenio refuses references to external files. Files can only be referenced
By default, InvenioRDM refuses references to external files. Files can only be referenced

@@ -3,25 +3,45 @@
Two different concepts are involved in the storing of files in InvenioRDM. One is the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well rename this file: file_transfer.md and title it File transfer since that's really what you've eloquently made clear with this and the RFC 👍

@fenekku fenekku mentioned this pull request Jun 27, 2025
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 👀 In review
Development

Successfully merging this pull request may close these issues.

4 participants