Add script to fix duplicated URLs annotations #57

Open: wants to merge 3 commits into base: master

dominikl
Member

The script generates SQL commands to delete duplicated URLs.

E.g. to remove duplicated gene URLs like http://ncbi... / https://ncbi..., run:
python fix_annotations.py ".+ncbi\.nlm\.nih\.gov\/gene\/(?P<ID>.+)" "openmicroscopy.org/mapr/gene"

Output:

-- Symbol: FAM96B - Annotation ID: 6733961
-- URLs: ['https://www.ncbi.nlm.nih.gov/gene/51647', 'http://www.ncbi.nlm.nih.gov/gene/51647'] - keep: https://www.ncbi.nlm.nih.gov/gene/51647
DELETE FROM annotation_mapvalue mv WHERE mv.annotation_id = 6733961 AND mv.value = 'http://www.ncbi.nlm.nih.gov/gene/51647';

-- Symbol: TLR10 - Annotation ID: 6742661
-- URLs: ['https://www.ncbi.nlm.nih.gov/gene/81793', 'http://www.ncbi.nlm.nih.gov/gene/81793'] - keep: https://www.ncbi.nlm.nih.gov/gene/81793
DELETE FROM annotation_mapvalue mv WHERE mv.annotation_id = 6742661 AND mv.value = 'http://www.ncbi.nlm.nih.gov/gene/81793';
...
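
The script itself is not shown in this conversation, so the following is only a minimal sketch of how output like the above could be produced. It assumes the map values are already available as `(annotation_id, value)` pairs and that the `https://` variant is the one to keep, as in the sample output. The function names `pick_url_to_keep` and `delete_statements` are hypothetical, and the database query and namespace filtering are left out.

```python
import re
from collections import defaultdict


def pick_url_to_keep(urls):
    """Prefer the first https URL; otherwise fall back to the first URL."""
    for url in urls:
        if url.startswith("https://"):
            return url
    return urls[0]


def delete_statements(rows, pattern):
    """Yield one DELETE statement per redundant URL.

    rows    -- iterable of (annotation_id, value) pairs (assumed input shape)
    pattern -- regex with a named group ``ID`` identifying the resource
    """
    regex = re.compile(pattern)
    grouped = defaultdict(list)
    for annotation_id, value in rows:
        match = regex.match(value)
        if match:
            # URLs of one annotation that point at the same ID are duplicates.
            grouped[(annotation_id, match.group("ID"))].append(value)

    for (annotation_id, _), urls in grouped.items():
        # Drop exact repeats while keeping the original order.
        urls = list(dict.fromkeys(urls))
        if len(urls) < 2:
            continue
        keep = pick_url_to_keep(urls)
        for url in urls:
            if url != keep:
                escaped = url.replace("'", "''")  # escape quotes for SQL
                yield (
                    "DELETE FROM annotation_mapvalue mv "
                    f"WHERE mv.annotation_id = {annotation_id} "
                    f"AND mv.value = '{escaped}';"
                )


if __name__ == "__main__":
    sample = [
        (6733961, "https://www.ncbi.nlm.nih.gov/gene/51647"),
        (6733961, "http://www.ncbi.nlm.nih.gov/gene/51647"),
    ]
    gene_pattern = r".+ncbi\.nlm\.nih\.gov\/gene\/(?P<ID>.+)"
    for statement in delete_statements(sample, gene_pattern):
        print(statement)
```

Grouping by annotation ID and the captured `ID` means that only URLs pointing at the same resource within one annotation are treated as duplicates. Because the DELETE matches on value, annotations holding the exact same URL twice are not handled by this sketch.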

Tested on pilot-idr150, see for example gene annotation FAM96B:
[Screenshot 2023-08-22 at 11 37 48]

IDR (https://idr.openmicroscopy.org/webclient/?show=image-12715003):
[Screenshot 2023-08-22 at 11 38 43]

I've not tested it on compounds or other URLs yet.

@dominikl
Member Author

Also, are there other annotation types that need to be considered (cell lines, etc.)? /cc @francesw @sbesson. Maybe add some examples to this PR that could be used for testing? Thanks!

@sbesson
Member

sbesson commented Aug 22, 2023

To the best of my knowledge, genes, compounds, phenotypes & antibodies are the primary categories where the map annotations include URLs linking to reference resources transformed into icons via the omero-mapr app.

@dominikl
Member Author

Added compounds, phenotypes & antibodies to the script, but couldn't detect any duplicated annotation URLs for those. Gene map annotations seem to be the only place with this issue.

@sbesson
Member

sbesson commented Aug 22, 2023

Do you have a rough estimate of how many genes would be updated by this cleanup script?

@dominikl
Member Author

Running it against IDR, that would be 16643 map annotations. Quite a lot, but the output looks reasonable; I've not seen anything obviously wrong.

@dominikl
Member Author

One potential issue: although the "NamedValue" with the URL will be deleted, the indexes of the other NamedValues of the MapAnnotation won't be updated (unless that happens automatically at the DB level!?). But I can't imagine that this will be a problem, is it?

@sbesson
Member

sbesson commented Aug 22, 2023

> But I can't imagine that this will be a problem, is it?

It should not, but it's definitely worth testing by creating a test MapAnnotation with several mapValue elements and deleting them.

From https://github.com/ome/openmicroscopy/blob/4cf02434d556ebc24a3a8ccf333d9af78b8d4bf4/sql/psql/OMERO5.4__0/schema.sql#L64, the combination of annotation ID and mapValue index must be unique for each row because it constitutes the primary key, but there does not seem to be any constraint that the index value must be in the [0, N] range.
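
As a quick illustration of the point about the primary key, here is a small sketch that mimics the table in an in-memory SQLite database. It is not the OMERO PostgreSQL schema: the column set is a simplified assumption based on the linked `schema.sql`, the key names and the last row are made up for the demo, and the helper names are hypothetical. It only shows that deleting one row leaves a gap in the index values without violating the composite primary key. Whether OMERO itself copes with such gaps still needs the test suggested above.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table resembling annotation_mapvalue (simplified)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE annotation_mapvalue (
            annotation_id INTEGER NOT NULL,
            name TEXT NOT NULL,
            value TEXT NOT NULL,
            "index" INTEGER NOT NULL,
            PRIMARY KEY (annotation_id, "index")
        )
        """
    )
    # Illustrative rows only; the key names are assumptions.
    rows = [
        (6733961, "Gene Symbol", "FAM96B", 0),
        (6733961, "Gene Identifier URL",
         "https://www.ncbi.nlm.nih.gov/gene/51647", 1),
        (6733961, "Gene Identifier URL",
         "http://www.ncbi.nlm.nih.gov/gene/51647", 2),
        (6733961, "Example Key", "example value", 3),
    ]
    conn.executemany(
        "INSERT INTO annotation_mapvalue VALUES (?, ?, ?, ?)", rows
    )
    return conn


def delete_value(conn, annotation_id, value):
    """Delete the rows of one annotation that hold the given value."""
    conn.execute(
        "DELETE FROM annotation_mapvalue "
        "WHERE annotation_id = ? AND value = ?",
        (annotation_id, value),
    )


def remaining_indexes(conn, annotation_id):
    """Return the index values still present for the annotation, sorted."""
    cursor = conn.execute(
        'SELECT "index" FROM annotation_mapvalue '
        'WHERE annotation_id = ? ORDER BY "index"',
        (annotation_id,),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    db = build_demo_db()
    delete_value(db, 6733961, "http://www.ncbi.nlm.nih.gov/gene/51647")
    # The deleted row's index is simply missing afterwards.
    print(remaining_indexes(db, 6733961))
```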
