Duplicate records with incorrect versioning - NEON #54

Open
1 of 3 tasks
aclum opened this issue Dec 21, 2023 · 12 comments

@aclum
Contributor

aclum commented Dec 21, 2023

This is a generic issue for when there are multiple records of a given workflow execution activity that are not versioned correctly. Specifically, there are two records with different IDs where the version of a single ID should have been incremented instead.
We need to:

  • determine why this happened
  • fix bugs if necessary
  • clean up records on the file system and in mongo

Example: I can see two directories on CFS, and there are two records in mongo.
There are two mags_activity_set records, both on the file system and in mongo, with a query of {'was-informed_by' : 'nmdc:omprc-11-14ermv40'}:
nmdc:wfmag-11-wpcgh271.1
nmdc:wfmag-11-v5475v49.1
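
A minimal sketch of how duplicates like these could be flagged, assuming the records from one collection (for example mags_activity_set) have already been fetched from mongo as plain dicts. The function names, the `id` key, and the sample records are illustrative assumptions, not part of the existing tooling. The linking field is passed in as a parameter and spelled here as it appears in the query above.

```python
from collections import defaultdict


def split_id(workflow_id):
    """Split 'nmdc:wfmag-11-wpcgh271.1' into ('nmdc:wfmag-11-wpcgh271', 1).

    Assumes every ID ends in '.<integer version>'.
    """
    base, _, version = workflow_id.rpartition(".")
    return base, int(version)


def find_unversioned_duplicates(records, link_field, id_field="id"):
    """Return {linked ID: [base IDs]} where one linked ID has several base IDs.

    A correctly versioned rerun shares its base ID with the earlier run and
    only bumps the version, so it collapses to one base ID and is not flagged.
    """
    bases_by_link = defaultdict(set)
    for record in records:
        base, _version = split_id(record[id_field])
        bases_by_link[record[link_field]].add(base)
    return {
        link: sorted(bases)
        for link, bases in bases_by_link.items()
        if len(bases) > 1
    }


# Hypothetical records shaped like the two found above.
records = [
    {"id": "nmdc:wfmag-11-wpcgh271.1", "was-informed_by": "nmdc:omprc-11-14ermv40"},
    {"id": "nmdc:wfmag-11-v5475v49.1", "was-informed_by": "nmdc:omprc-11-14ermv40"},
]

duplicates = find_unversioned_duplicates(records, "was-informed_by")
```

With the two records above, `duplicates` maps nmdc:omprc-11-14ermv40 to both base IDs, which is the symptom described in this issue.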

@aclum
Contributor Author

aclum commented Dec 21, 2023

related to microbiomedata/issues#547

@mbthornton-lbl
Contributor

@Michal-Babins any chance that this might be related?
#22

@Michal-Babins

Yes, it very well might be. I would check with Shane.

@aclum
Contributor Author

aclum commented Feb 9, 2024

This is still happening. We should address this before we process anything else or we are just creating a cleanup exercise for ourselves. @Michal-Babins Do you have time to work on this next sprint?

@aclum
Contributor Author

aclum commented Feb 9, 2024

TRiP that ran a few days ago has 3 annotations and 3 MAGs:
aclum@perlmutter:login13:/global/cfs/cdirs/m3408/results/nmdc:omprc-11-9mvz7z22> ls -ltr
total 9
drwxrws--- 2 nmdcda m3408 4096 Jan 11 13:16 nmdc:wfrqc-11-t0tvnp52.1
drwxrws--- 2 nmdcda m3408 4096 Feb 7 13:11 nmdc:wfrbt-11-pmdhac23.1
drwxrws--- 2 nmdcda m3408 4096 Feb 7 14:54 nmdc:wfmgas-11-rcs4bt79.1
drwxrws--- 2 nmdcda m3408 4096 Feb 7 22:03 nmdc:wfmgan-11-4sc85678.1
drwxrws--- 2 nmdcda m3408 4096 Feb 7 22:04 nmdc:wfmgan-11-mmt28267.1
drwxrws--- 2 nmdcda m3408 4096 Feb 7 22:04 nmdc:wfmgan-11-hdaenp36.1
drwxrws--- 2 nmdcda m3408 4096 Feb 7 22:13 nmdc:wfmag-11-m8tn3y26.1
drwxrws--- 2 nmdcda m3408 4096 Feb 7 22:13 nmdc:wfmag-11-zcwca422.1
drwxrws--- 2 nmdcda m3408 4096 Feb 7 22:13 nmdc:wfmag-11-9dgz7m72.1
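
A rough sketch of how a listing like this could be checked automatically, assuming directory names follow the `nmdc:<type>-<number>-<shoulder>.<version>` shape seen above. The regular expression and the function name are assumptions for illustration only. The idea is that a workflow type with more than one distinct ID under the same results directory is a candidate duplicate.

```python
import re
from collections import defaultdict

# Assumed shape of a workflow directory name, e.g. nmdc:wfmgan-11-4sc85678.1
ID_PATTERN = re.compile(
    r"^nmdc:(?P<wf_type>[a-z]+)-\d+-(?P<shoulder>[a-z0-9]+)\.(?P<version>\d+)$"
)


def duplicated_workflow_types(dir_names):
    """Return {workflow type: number of distinct IDs} for types seen more than once."""
    ids_by_type = defaultdict(set)
    for name in dir_names:
        match = ID_PATTERN.match(name)
        if match is None:
            continue  # skip anything that is not a workflow directory
        # Versions of the same ID share a shoulder, so they count once.
        ids_by_type[match["wf_type"]].add(match["shoulder"])
    return {
        wf_type: len(shoulders)
        for wf_type, shoulders in ids_by_type.items()
        if len(shoulders) > 1
    }


# Directory names copied from the listing above.
listing = [
    "nmdc:wfrqc-11-t0tvnp52.1",
    "nmdc:wfrbt-11-pmdhac23.1",
    "nmdc:wfmgas-11-rcs4bt79.1",
    "nmdc:wfmgan-11-4sc85678.1",
    "nmdc:wfmgan-11-mmt28267.1",
    "nmdc:wfmgan-11-hdaenp36.1",
    "nmdc:wfmag-11-m8tn3y26.1",
    "nmdc:wfmag-11-zcwca422.1",
    "nmdc:wfmag-11-9dgz7m72.1",
]

flagged = duplicated_workflow_types(listing)
```

For this listing, `flagged` reports three distinct IDs each for wfmgan and wfmag. In practice the names could come from `os.listdir` on the results directory instead of a hard-coded list.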

@Michal-Babins

I generated records of the duplicates found in annotations and MAGs. @mbthornton-lbl, do we want to add these JSON dumps to the re-IDing workflow and do it all in one sweep?

@mbthornton-lbl
Contributor

Yes.

@Michal-Babins

I added those here:
f86f4f6

@mbthornton-lbl
Contributor

@Michal-Babins delete-old-records for the duplicates has been applied to the Napa DB instance

@ssarrafan

Appears to be active. Moving to next sprint.

@ssarrafan

@mbthornton-lbl can you please check to see if this got merged?
FYI @aclum

@aclum
Contributor Author

aclum commented Mar 22, 2024

It appears we are done with the first two parts of this: Shane's new MAG runs incremented correctly. We still need to do data cleanup, but that can wait until a future sprint. Let's backlog the remaining work for this.
