Migration of CERN Digitized Videos #1926
Comments
Current state of the project:
Note: everything was tested only against a local instance/sandbox, not against the real CDS Videos.
The migration service fetches records from the (real) CDS that belong to the digital memory project, are public, are not yet migrated, and are not announcements, and parses them properly (see the documentation for the options for processing a single record or multiple records).
Most tags are now identified and handled (see the CDS-Dojson PR with all the fixes and updates) to create a
The upload function also works and properly triggers the upload process to CDS Videos.
The record status check works, and so does the chunking process, which ensures that only 10 videos are transferred at a time.
A
Additionally, the script that generates the CDS
Things that still need to be done:
Useful links
Data model changes
The first step is to analyze the data model of CDS Videos and understand what changes need to be made. Given that we will migrate CDS Videos to CDS in the future, the data model changes should be compatible with the InvenioRDM data model (and custom fields).
Extra fields
We should evaluate if these extra fields could go to a JSON blob field, allowing key/values, and the impact of this solution on search capabilities.
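To make the trade-off concrete, here is a minimal sketch of the key/value blob idea. The field names, the `extra` key, and both helper functions are hypothetical and are not taken from the CDS Videos schema. Whether the blob entries end up searchable depends on how the search index maps them, which this sketch does not model; the flattening step only shows one way the keys could be exposed.

```python
# Hypothetical set of fields that the target schema models explicitly.
KNOWN_FIELDS = {"title", "description", "date"}


def split_extra_fields(record, known_fields=KNOWN_FIELDS, blob_key="extra"):
    """Keep known fields at the top level and move the rest into one blob."""
    core = {k: v for k, v in record.items() if k in known_fields}
    extra = {k: v for k, v in record.items() if k not in known_fields}
    if extra:
        core[blob_key] = extra
    return core


def flatten_for_search(record, blob_key="extra"):
    """Expose blob entries as dotted keys, e.g. 'extra.tape_id'."""
    flat = {k: v for k, v in record.items() if k != blob_key}
    for key, value in record.get(blob_key, {}).items():
        flat[f"{blob_key}.{key}"] = value
    return flat


stored = split_extra_fields({"title": "Some video", "tape_id": "X1"})
indexed = flatten_for_search(stored)
```

Here `stored` keeps `tape_id` inside the `extra` blob, while `indexed` exposes it as `extra.tape_id`, which is the kind of key a search query would have to target.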
Category
It makes sense to import them with category:CERN, given that these are official CERN videos.
Owners
It is not yet clear who should be the owner of these records and who can edit their metadata. To be discussed and decided.
For curation, we should probably create a group "multimedia curators" and decide who goes in.
Considerations
There are duplicated videos: some videos that are already in CDS Videos have been re-digitized and have the same recid. Both videos, old and new, should be kept. We need to check whether the data model supports this.
The same applies to metadata: the metadata of existing videos should be enriched with the metadata of the newly digitized ones.
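One possible enrichment policy is sketched below. The policy itself is an assumption, since the issue does not decide it: existing non-empty values win, empty or missing values are filled from the digitized record, and lists are merged without duplicates. The function name and the sample fields are hypothetical.

```python
def enrich_metadata(existing, digitized):
    """Return a copy of `existing` enriched with values from `digitized`.

    Assumed policy: existing non-empty values are kept, gaps are filled,
    and list values are merged while preserving order.
    """
    merged = dict(existing)
    for key, new_value in digitized.items():
        old_value = merged.get(key)
        if old_value in (None, "", [], {}):
            # Missing or empty in the existing record: take the new value.
            merged[key] = new_value
        elif isinstance(old_value, list) and isinstance(new_value, list):
            # Append only the items that are not already present.
            merged[key] = old_value + [v for v in new_value if v not in old_value]
    return merged


existing = {"title": "Old title", "keywords": ["a"], "description": ""}
digitized = {
    "title": "New title",
    "keywords": ["a", "b"],
    "description": "New description",
    "duration": "00:01:00",
}
enriched = enrich_metadata(existing, digitized)
```

With this policy the old title is preserved, while the empty description, the missing duration, and the extra keyword come from the digitized record. A different policy (for example, preferring the digitized values) would only change the branch conditions.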
Relevant code
We should reuse the cds-dojson module and its field rules for CDS Videos.
See the dojson documentation for examples: https://dojson.readthedocs.io/en/latest/usage.html
We will create a branch, e.g. digitization-2023, in cds-dojson, where we can apply the modifications to the CDS Videos schema and add new conversion rules.
We should update the README to explain why the new branch exists, with relevant links to the digitization project/process.
cds-dojson usage example: