Add script to fix duplicated URLs annotations #57
base: master
To the best of my knowledge, genes, compounds, phenotypes & antibodies are the main categories whose map annotations include URLs to reference resources, which the omero-mapr app renders as icons.
Added compounds, phenotypes & antibodies to the script, but couldn't detect any duplicated annotation URLs for those. Gene map annotations seem to be the only place with this issue.
Do you have a rough estimate of how many genes would be updated by this cleanup script?
Running it against IDR, that would be 16643 map annotations. Quite a lot, but the output looks reasonable; I've not seen anything obviously wrong.
One potential issue: although the "NamedValue" with the URL will be deleted, the indices of the MapAnnotation's other NamedValues won't be updated (unless that happens automatically at the DB level?). But I can't imagine this will be a problem, can it?
It should not, but it's definitely worth testing by creating a test MapAnnotation with several mapValue elements and deleting some of them. According to https://github.com/ome/openmicroscopy/blob/4cf02434d556ebc24a3a8ccf333d9af78b8d4bf4/sql/psql/OMERO5.4__0/schema.sql#L64, the combination of annotation ID and mapValue index must be unique for each row, as it constitutes the primary key, but there does not seem to be any constraint that the …
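A quick way to see what the primary key does and does not enforce is to replay the delete against a throwaway table. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL, and the table layout (`annotation_mapvalue` with `annotation_id`, `name`, `value`, `"index"`) is a simplified assumption modeled on the schema linked above, not a copy of it. It only checks the database constraint; how the OMERO server reads a map whose indices have a gap is not covered and still needs testing on a real instance.

```python
import sqlite3
from typing import List, Tuple


def delete_leaves_index_gap() -> Tuple[List[int], bool]:
    """Delete one mapValue row and inspect what the primary key allows.

    Returns (remaining indices for annotation 1, whether inserting a row
    with an already-used (annotation_id, index) pair was rejected).
    """
    conn = sqlite3.connect(":memory:")
    # Simplified, assumed stand-in for OMERO's annotation_mapvalue table.
    # "index" is quoted because INDEX is an SQL keyword.
    conn.execute(
        'CREATE TABLE annotation_mapvalue ('
        ' annotation_id INTEGER NOT NULL,'
        ' name TEXT NOT NULL,'
        ' value TEXT NOT NULL,'
        ' "index" INTEGER NOT NULL,'
        ' PRIMARY KEY (annotation_id, "index"))'
    )
    rows = [
        (1, "Gene Symbol", "GENE_A", 0),
        (1, "Gene Identifier URL", "https://www.ncbi.nlm.nih.gov/gene/123", 1),
        (1, "Organism", "Homo sapiens", 2),
    ]
    conn.executemany("INSERT INTO annotation_mapvalue VALUES (?, ?, ?, ?)", rows)

    # Remove the URL NamedValue (the kind of row the cleanup deletes).
    conn.execute('DELETE FROM annotation_mapvalue WHERE annotation_id = 1 AND "index" = 1')
    remaining = [
        row[0]
        for row in conn.execute(
            'SELECT "index" FROM annotation_mapvalue '
            'WHERE annotation_id = 1 ORDER BY "index"'
        )
    ]

    # The primary key still forbids a second row with the same (annotation_id, index).
    try:
        conn.execute(
            "INSERT INTO annotation_mapvalue VALUES (?, ?, ?, ?)",
            (1, "Duplicate", "x", 0),
        )
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    conn.close()
    return remaining, duplicate_rejected


if __name__ == "__main__":
    print(delete_leaves_index_gap())  # ([0, 2], True)
```

The gap at index 1 is accepted, while a duplicate `(annotation_id, index)` pair is rejected, which matches the reading of the schema above: uniqueness is enforced, contiguity is not.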
The script generates SQL commands to delete duplicated URLs.
E.g. to remove duplicated gene URLs like
http://ncbi...
/https://ncbi...
run:
python fix_annotations.py ".+ncbi\.nlm\.nih\.gov\/gene\/(?P<ID>.+)" "openmicroscopy.org/mapr/gene"
Output:
Tested on pilot-idr150; see, for example, the gene annotation FAM96B on IDR (https://idr.openmicroscopy.org/webclient/?show=image-12715003).
I've not tested it on compounds or other URLs yet.
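The description only says that the script takes a regex with a named group `ID` plus a second argument and generates SQL commands that delete the matching URL NamedValues. The sketch below illustrates that idea under several assumptions, none of which come from `fix_annotations.py` itself: the second argument is treated as the map annotation namespace (presumably `openmicroscopy.org/mapr/gene`), the rows are a hypothetical flattened `(annotation_id, namespace, index, name, value)` view, and the table and column names follow the simplified `annotation_mapvalue` layout assumed for the schema linked in the conversation. The sample values and the second namespace are made up.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical flattened view of map annotation rows:
# (annotation_id, namespace, index, name, value)
Row = Tuple[int, str, int, str, str]


def url_delete_statements(rows: Iterable[Row], pattern: str, namespace: str) -> List[str]:
    """Emit one DELETE per NamedValue whose value matches `pattern`.

    Only rows whose annotation namespace equals `namespace` are considered.
    The captured named group ID is written into an SQL comment so the
    generated output can be reviewed before it is run.
    """
    regex = re.compile(pattern)
    statements = []
    for annotation_id, ns, index, _name, value in rows:
        if ns != namespace:
            continue
        match = regex.match(value)
        if match is None:
            continue
        # annotation_id and index are integers, so formatting them is safe here.
        statements.append(
            f'DELETE FROM annotation_mapvalue WHERE annotation_id = {annotation_id} '
            f'AND "index" = {index}; -- ID {match.group("ID")}'
        )
    return statements


if __name__ == "__main__":
    rows = [
        (10, "openmicroscopy.org/mapr/gene", 0, "Gene Symbol", "GENE_A"),
        (10, "openmicroscopy.org/mapr/gene", 1, "Gene Identifier URL",
         "http://www.ncbi.nlm.nih.gov/gene/123/https://www.ncbi.nlm.nih.gov/gene/123"),
        (11, "hypothetical/other", 0, "Some URL",
         "https://www.ncbi.nlm.nih.gov/gene/999"),
    ]
    pattern = r".+ncbi\.nlm\.nih\.gov\/gene\/(?P<ID>.+)"
    for statement in url_delete_statements(rows, pattern, "openmicroscopy.org/mapr/gene"):
        print(statement)
```

Because the leading `.+` is greedy, the `ID` group captures what follows the last `ncbi.nlm.nih.gov/gene/` in the value, so even a doubled value like the made-up one above yields `123`. Only the row in the gene namespace produces a DELETE; the row in the other namespace is skipped.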