videos: transform doi and collections #262

Merged
merged 1 commit into from
Jun 13, 2025

Conversation

zubeydecivelek
Contributor

@zubeydecivelek zubeydecivelek commented Jun 10, 2025

closes CERNDocumentServer/cds-videos#2041
closes CERNDocumentServer/cds-videos#2042

New rules

  • Tag 0247 (DOI) is transformed to doi if it starts with 10.17181; otherwise it goes to alternate_identifiers with the DOI scheme.
  • Tag 980 collections transformed as collections
  • Tag 964 transformed as _curation.964 with marc tags (964__a:....)
  • Tag 853 transformed as _curation.853 with marc tags (853__a:....)
  • Tag 336 transformed as _curation.336 with marc tags (336__a:....)
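
A minimal sketch of the DOI rule listed above, assuming the value of tag 0247 has already been extracted as a plain string. The function name `transform_doi` and the shape of the returned dictionary are hypothetical and not taken from this PR; only the `10.17181` prefix check appears in the diff.

```python
def transform_doi(doi):
    """Route a DOI to `doi` or `alternate_identifiers` (hypothetical output shape)."""
    doi = doi.strip()
    if doi.startswith("10.17181"):
        # Prefix check as in the diff: keep the value as the record's own DOI.
        return {"doi": doi}
    # Assumption: any other DOI is stored as an alternate identifier
    # with the DOI scheme.
    return {"alternate_identifiers": [{"scheme": "DOI", "value": doi}]}
```

Returning a dictionary keeps the two outcomes explicit; the real rule may instead write directly into the record being built.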

Improvements

  • Tag 269 (imprint): subfield b (name of publication) is transformed to a contributor with the role Producer.
  • Tag 041 (language): if there are multiple languages, the first one is used as the main language and the others are added as additional_languages.
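
The language improvement could look roughly like the sketch below, assuming the 041 values arrive as an ordered list of language codes. The helper name `split_languages` is hypothetical, and dropping empty or duplicate values is an assumption rather than something stated in this PR.

```python
def split_languages(languages):
    """Return (main_language, additional_languages) from an ordered list."""
    unique = []
    for lang in languages:
        # Assumption: skip empty values and duplicates, preserving order.
        if lang and lang not in unique:
            unique.append(lang)
    if not unique:
        return None, []
    # The first language is the main one, the rest are additional.
    return unique[0], unique[1:]
```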

@for_each_value
def collection_tags(self, key, value):
    """Translates collection_tags."""
    collection_mapping = {
Contributor

We also have this collection, whose records are identified by the following query: 490:'CERN Accelerator School' and 690C:'TALK'. For these records, we will also need to add the extra tag.

Contributor Author

So we'll check the 490:'CERN Accelerator School' and 690C:'TALK' and add the tag for this collection?

Contributor

Yes :)

Contributor Author

490 is Series and it doesn’t have a rule yet. I'm adding this to my TODO and will add it later together with the 490 rule; is that okay? Also, do we need to check both 490 and 690, or is 490 alone enough?

Contributor

Hmm, because our search query returns only video content, checking only 490 should be enough. You can verify that you get all the records of the above collection.
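
A rough sketch of the 490-only check discussed here, assuming the series titles of a record are available as a list of strings. The helper name `extra_collection_tags` is hypothetical, and using the series title itself as the tag name is an assumption; the thread does not say what the extra tag is called.

```python
CAS_SERIES = "CERN Accelerator School"

def extra_collection_tags(series_titles):
    """Return extra collection tags implied by the 490 series titles."""
    tags = []
    # Only 490 is checked: the search query already returns video content,
    # so the 690C:'TALK' condition is assumed to be redundant.
    if any(title.strip() == CAS_SERIES for title in series_titles):
        tags.append(CAS_SERIES)  # assumed tag name, not confirmed in the thread
    return tags
```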

Contributor

@ntarocco any comment on how to store the collection tags? Do you think it would be better to keep something of the tree structure, e.g. tags: ['Lectures/Academic training lectures'], or is a flat representation enough, e.g. tags: ['Lectures', 'Academic training lectures']?
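
To make the two options concrete, here is a small sketch that derives both representations from comma-separated mapping values like those in this PR's `collection_mapping`. The two entries are copied from the diff; the helper names `flat_tags` and `tree_tags` are hypothetical, and treating every comma as a level separator is an assumption.

```python
collection_mapping = {
    "e-learning": "Lectures,E-learning modules",
    "Restricted_ATLAS_Talks": "Lectures,Restricted ATLAS Talks",
}

def flat_tags(key):
    """Flat representation: one tag per level of the collection path."""
    return [part.strip() for part in collection_mapping[key].split(",")]

def tree_tags(key):
    """Tree representation: a single tag that keeps the path with '/'."""
    return ["/".join(flat_tags(key))]

print(flat_tags("e-learning"))  # ['Lectures', 'E-learning modules']
print(tree_tags("e-learning"))  # ['Lectures/E-learning modules']
```

The flat form is easier to filter on a single level, while the tree form keeps the parent-child relation that would otherwise need to be reconstructed.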

Contributor Author

@zzacharo I checked the collection: 350 of the 396 records are in our search query, and 20 of the missing ones don’t have video files. I don’t know about the remaining 26 records 😅

Contributor

For the 20 without video files, maybe ask Rene whether they can retrieve the videos; otherwise we will need to import them as metadata only, or in a format that we could potentially edit in the future. For the other 26 missing records, we need to understand why...

Comment on lines +690 to +733
if doi.startswith("10.17181"):
    return doi
Contributor Author

@zubeydecivelek zubeydecivelek Jun 10, 2025

We just reuse the same DOI value, and this DOI still points to the CDS record. It redirects to the CDS record, and after migration the CDS record will redirect to videos. Is this okay, or do we need to update DataCite? @zzacharo

Contributor

In fact, it would be nice to gather all the related records and update DataCite.

Contributor Author

If I'm not missing anything, we have only one record with a 10.17181 DOI.

Contributor

But we have more with a different DOI?

Contributor Author

Yes, but I guess those records don’t have videos:
search

Contributor

Note the list down so we can decide what to do with these records; maybe add an issue, or it may actually be worth sending the list to Jens. Btw, https://cds.cern.ch/record/276197 also has a link to the CERN library catalogue, so if we migrate these records we need to keep the link!

@zubeydecivelek force-pushed the new-rules branch 4 times, most recently from d8052cd to d832319, on June 13, 2025 12:59
"TP": "Lectures,Talks Seminars and Other Events,Teacher Programmes",
"e-learning": "Lectures,E-learning modules",
"E-LEARNING": "Lectures,E-learning modules",
"Restricted_ATLAS_Talks": "Lectures,Restricted ATLAS Talks",
Contributor

I am thinking that this can be just

Suggested change
- "Restricted_ATLAS_Talks": "Lectures,Restricted ATLAS Talks",
+ "Restricted_ATLAS_Talks": "Lectures, ATLAS Talks",

with just the proper restrictions, but let's leave it like this for now and discuss it when we see the collection tree.

@zubeydecivelek force-pushed the new-rules branch 2 times, most recently from dbb8965 to 5dcb91b, on June 13, 2025 14:05
@zzacharo merged commit b09f00e into CERNDocumentServer:master on Jun 13, 2025
2 checks passed
Successfully merging this pull request may close these issues.

  • Implement rule for DOI
  • Implement transformation for collections
2 participants