Description
I hope this message finds you well. I am currently working on a genome analysis project inspired by your paper "Taurine pangenome uncovers a segmental duplication upstream of KIT associated with depigmentation in white-headed cattle." I am following a similar approach: partitioning the entire pangenome into windows and calculating the Jaccard similarity within each window.
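To be explicit about the per-window metric, here is a minimal sketch of the Jaccard similarity I have in mind. It assumes each genome's path through a window is reduced to the set of node IDs it visits. The function name and the example sets are hypothetical, not taken from odgi or from the paper.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; treated as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Hypothetical node-ID sets for two genomes within one window.
genome_a = {1, 2, 3, 4}
genome_b = {3, 4, 5}
similarity = jaccard(genome_a, genome_b)  # 2 shared nodes out of 5 distinct nodes
```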
However, I have run into a significant challenge. The command `odgi extract -i pig.renamed.sorted.og -r A#chr1:1-1000 -o chr1_1_1000.window_subgraph.og` is very resource-intensive: extracting a single subgraph from the full pangenome graph uses around 10 CPU cores. My pangenome consists of 30 genomes, each approximately 2G in size, so extracting subgraphs in 1000 bp windows across the entire graph is difficult to scale.
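For concreteness, the windows I am iterating over can be generated roughly as follows. This is a sketch only: the helper name `window_regions` and the path length used in the example are hypothetical, and the coordinates simply follow the `A#chr1:1-1000` form from the command above.

```python
from typing import Iterator


def window_regions(path_name: str, path_length: int, window: int = 1000) -> Iterator[str]:
    """Yield 'path:start-end' strings covering 1..path_length in fixed-size windows."""
    for start in range(1, path_length + 1, window):
        # The last window is clipped to the end of the path.
        end = min(start + window - 1, path_length)
        yield f"{path_name}:{start}-{end}"


# Hypothetical 2,500 bp path, just to show the shape of the output.
regions = list(window_regions("A#chr1", 2500))
# ['A#chr1:1-1000', 'A#chr1:1001-2000', 'A#chr1:2001-2500']
```

Each region string currently becomes its own `odgi extract` call, so a 100 Mb path alone means 100,000 separate extractions.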
I was wondering if you might have any suggestions or alternative strategies to reduce the computational burden of this step, or to make the extraction process more efficient.
Thank you very much for your time and any insights you can share.