Skip to content

Conversation

@bkille
Copy link
Contributor

@bkille bkille commented Feb 5, 2025

Many CIGAR strings were being parsed multiple times. This PR adds a hashmap which stores recently parsed CIGARs. Right now, the hashmap is cleared once it reaches some predefined size (1000 entries), but that should likely be changed to be dependent on some property of the input. The fetch_cigar_ops

Also, as far as I understand it, the use of RefCell means that the Impg struct cannot be shared across threads, so if that's an issue other options can be explored. The only reason for using RefCell was to avoid making the entire struct mutable for all calls. Another option could be to just keep the hashmap completely separate from the impg struct...

Main:

        Command being timed: "./target/release/impg-main partition --paf-file test-data/primates16.20231205_wfmash-v0.12.5.paf --sequence-prefix chm13#1 --window-size 1000000000 -v 1 --max-depth 20 --min-transitive-region-size 500 --min-distance-between-ranges 10 --min-mask-proximity 5000"
        User time (seconds): 11772.75
        System time (seconds): 150.78
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 3:19:17
        Maximum resident set size (kbytes): 164952

This PR:

Command being timed: "./target/release-with-debug/impg partition --paf-file test-data/primates16.20231205_wfmash-v0.12.5.paf --sequence-prefix chm13#1 --window-size 1000000000 -v 1 --max-depth 20 --min-transitive-region-size 500 --min-distance-between-ranges 10 --min-mask-proximity 5000"
        User time (seconds): 2747.47
        System time (seconds): 173.49
        Percent of CPU this job got: 92%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 52:40.53
        Maximum resident set size (kbytes): 3269220

@AndreaGuarracino
Copy link
Member

I will hold with this one for now, as tests are not 100% clear, particularly regarding how many CIGAR strings to cache. The more is not the better, of course ;(

@AndreaGuarracino AndreaGuarracino marked this pull request as draft February 17, 2025 08:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants