cdskit rmseq
Kenji Fukushima edited this page Mar 6, 2023 · 2 revisions
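cdskit rmseq removes sequences from a FASTA file. A sequence is dropped if its name matches a regular expression (--seqname) or if its percentage of problematic characters, such as N, reaches a threshold (--problematic_percent).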
cdskit rmseq -s input.fasta --seqname "Arabidopsis_thaliana.*" --problematic_percent 50 -o output.fasta
Input (input.fasta):
>Aquilegia_coerulea_1
AGAGTTCAATATGCTTTGAGTCGAATTCGTAACAATGCTAGAAATCTTCTTACTCTTGAT
>Aquilegia_coerulea_2
AGAGTTCAATATGCTTTAAGTCGAATTCGAAACAATGCTAGAAATCTTCTCACTCTGGAT
>Aquilegia_coerulea_3
AGAGTTCAATATGCTTTAAGTCGAATTCGTAACAATGCAAGAAATCTTCTTACACTTGAT
>Hylocereus_undatus_1
AGGGTCCAATATGTTCTGAGCCGTATCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>Hylocereus_undatus_2
AGGGTTCAATACGTTCTGAGCCGTATCCGTAATGCTGCAAGGCATCTTCTTACCCTGGAT
>Hylocereus_undatus_3
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGCGGCAAGGCACCTTCTCACTCTGGAT
>Arabidopsis_thaliana_1
AGAGTTCAATATACACTTAGCAGAATCCGTAATGCTGCAAGAGAACTCTTAACTCTTGAT
>Arabidopsis_thaliana_2
AGAGTGCAGTACTCTCTTAGCCGTATCCGTAATGCTGCTAGAGATCTTTTGACTCTTGAT
Output (output.fasta). The two Arabidopsis_thaliana sequences are removed because their names match the regex, and Hylocereus_undatus_1 and Hylocereus_undatus_3 are removed because 32 of their 60 characters (about 53%) are N:
>Aquilegia_coerulea_1
AGAGTTCAATATGCTTTGAGTCGAATTCGTAACAATGCTAGAAATCTTCTTACTCTTGAT
>Aquilegia_coerulea_2
AGAGTTCAATATGCTTTAAGTCGAATTCGAAACAATGCTAGAAATCTTCTCACTCTGGAT
>Aquilegia_coerulea_3
AGAGTTCAATATGCTTTAAGTCGAATTCGTAACAATGCAAGAAATCTTCTTACACTTGAT
>Hylocereus_undatus_2
AGGGTTCAATACGTTCTGAGCCGTATCCGTAATGCTGCAAGGCATCTTCTTACCCTGGAT
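
The filtering shown in this example can be sketched in plain Python. This is an illustration of the logic, not cdskit's actual implementation. The function name `filter_records`, the set of problematic characters (`N`, `X`, `-`, `?`), the full-name regex match, and the inclusive threshold are all assumptions chosen to reproduce the output above.

```python
import re

def filter_records(records, seqname_regex, problematic_percent,
                   problematic_chars="NX-?"):
    """Return the (name, sequence) pairs that survive both filters.

    Hypothetical helper: the character set and the ">=" comparison are
    assumptions, not taken from the cdskit source.
    """
    pattern = re.compile(seqname_regex)
    kept = []
    for name, seq in records:
        # Filter 1: drop sequences whose whole name matches the regex.
        if pattern.fullmatch(name):
            continue
        # Filter 2: drop sequences with too many problematic characters.
        if seq:
            n_bad = sum(seq.upper().count(c) for c in problematic_chars)
            if 100.0 * n_bad / len(seq) >= problematic_percent:
                continue
        kept.append((name, seq))
    return kept

# A subset of the records from the example input.
records = [
    ("Aquilegia_coerulea_1",
     "AGAGTTCAATATGCTTTGAGTCGAATTCGTAACAATGCTAGAAATCTTCTTACTCTTGAT"),
    ("Hylocereus_undatus_1",
     "AGGGTCCAATATGTTCTGAGCCGTATCC" + "N" * 32),  # 32 of 60 characters are N
    ("Hylocereus_undatus_2",
     "AGGGTTCAATACGTTCTGAGCCGTATCCGTAATGCTGCAAGGCATCTTCTTACCCTGGAT"),
    ("Arabidopsis_thaliana_1",
     "AGAGTTCAATATACACTTAGCAGAATCCGTAATGCTGCAAGAGAACTCTTAACTCTTGAT"),
]

kept = filter_records(records, r"Arabidopsis_thaliana.*", 50)
for name, seq in kept:
    print(f">{name}\n{seq}")
```

With these four records, only Aquilegia_coerulea_1 and Hylocereus_undatus_2 are printed, matching the corresponding entries in the output above.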