Hi,
I would like to understand why the graph I generated using the tutorial commands looks different from the one shown in your example.
./pggb -i data/HLA/DRB1-3123.fa.gz -n 12 -t 16 -V 'gi|568815561' -o out -M
The tutorial uses the command above. Since gi|568815561 does not follow the PanSN naming convention (sample#haplotype#contig), I renamed the sequences with:
zcat DRB1-3123.fa.gz | awk '/^>/{print ">sample"++i"#1#chr6"} !/^>/' > renamed.fa
This generated a new FASTA file, and I then created a new .fai index with faidx. After that, I ran:
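One caveat with the renaming above is that the new names are assigned in input order and the old names are discarded, so it is not obvious which sampleN the original -V reference (gi|568815561) became. Below is a minimal shell sketch, run on a hypothetical toy FASTA rather than the real DRB1-3123.fa.gz. It applies the same renaming, writes an old-to-new table (name_map.tsv, a file name chosen only for illustration), and checks that every header has the three #-separated PanSN fields:

```shell
# Hypothetical toy input standing in for DRB1-3123.fa.gz.
printf '>seqA some description\nACGT\n>seqB\nACGA\n>seqC\nACGG\n>gi|568815561 toy\nACGC\n' | gzip > toy.fa.gz

# Same renaming as the one-liner above (sampleN#1#chr6, numbered in input order),
# but also record "old<TAB>new" in name_map.tsv, keyed on the first word of each header.
gzip -dc toy.fa.gz | awk '
  /^>/ {
    split(substr($0, 2), w, /[ \t]/)
    new = "sample" (++i) "#1#chr6"
    print w[1] "\t" new > "name_map.tsv"
    print ">" new
    next
  }
  { print }
' > renamed.fa

# Look up the new name of the original reference: prints sample4#1#chr6 for this toy input.
awk -F'\t' '$1 == "gi|568815561" { print $2 }' name_map.tsv

# Check that every header has exactly three #-separated fields (sample#haplotype#contig).
grep '^>' renamed.fa | awk -F'#' 'NF != 3 { bad = 1 } END { exit(bad ? 1 : 0) }' && echo "headers look like PanSN"

# The renamed FASTA can then be indexed as before (e.g. samtools faidx renamed.fa; not run here).
```

With the real data, the same lookup against the table would show which sampleN prefix holds the original gi|568815561 sequence, and therefore which prefix to pass to -V.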
pggb -i ./HLA/renamed.fa -n 12 -t 16 -V 'sample4' -o ./out -M
However, the resulting 2D plot is still different from the one in the tutorial.
I noticed that the image file name is:
DRB1-3123.fa.gz.pggb-E-s5000-l15000-p80-n10-a0-K16-k8-w50000-j5000-e5000-I0-R0-N.smooth.chop.og.lay.draw_mqc
This name encodes parameters that differ from those in the tutorial command, so I tried:
pggb -i ./HLA/renamed.fa -n 12 -t 16 -V 'sample4' -o ./out -M
pggb -i ./HLA/renamed.fa -p 80 -n 12 -K 16 -k 8 -t 16 -V 'sample4' -o ./out -M
But the graph still looks different.
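To compare the commands above with the parameters encoded in the image file name, the name can be split into its hyphen-separated tokens. Below is a small shell sketch. Reading a token such as p80 as option -p with value 80 is an assumption based on the commands above, not on pggb's documentation:

```shell
# Image file name as quoted above.
name='DRB1-3123.fa.gz.pggb-E-s5000-l15000-p80-n10-a0-K16-k8-w50000-j5000-e5000-I0-R0-N.smooth.chop.og.lay.draw_mqc'

# Keep only the part between ".pggb-" and ".smooth".
params=${name#*.pggb-}
params=${params%%.smooth*}

# Split on "-" so that each token is printed on its own line.
tokens=$(printf '%s\n' "$params" | awk -F'-' '{ for (i = 1; i <= NF; i++) print $i }')
printf '%s\n' "$tokens"

# Tokens for the option letters p, n, K, k and s (assumed to be letter + value).
printf '%s\n' "$tokens" | grep -E '^[pnKks][0-9]+$'
```

Listed this way, the name contains n10 and s5000, whereas the commands above pass -n 12 and no -s option.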
I also tried supplying the provided PAF file from /data/paf (after updating the sequence names in it accordingly), but the resulting graph was still relatively simple.
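For the PAF to match the renamed FASTA, both the query name (column 1) and the target name (column 6) of every record have to be rewritten. Below is a minimal shell sketch on toy data. The two-column old-to-new table (toy_map.tsv) and the single PAF record are hypothetical:

```shell
# Hypothetical old -> new name table (old name <TAB> new name).
printf 'seqA\tsample1#1#chr6\nseqB\tsample2#1#chr6\n' > toy_map.tsv

# Toy PAF record with the 12 mandatory columns; column 1 = query name, column 6 = target name.
printf 'seqA\t100\t0\t100\t+\tseqB\t100\t0\t100\t95\t100\t60\n' > toy.paf

# Rewrite columns 1 and 6 through the table; names missing from the table are left unchanged.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { map[$1] = $2; next }
     ($1 in map) { $1 = map[$1] }
     ($6 in map) { $6 = map[$6] }
     { print }' toy_map.tsv toy.paf > toy.renamed.paf

# Show the query and target names now used in the PAF; they should match the FASTA headers.
cut -f1,6 toy.renamed.paf
```

Any name left unchanged by this step would not match a renamed FASTA header, so checking the output of the last command against the .fai index is a quick consistency test.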
Could you share the exact command used to generate the example graph? I’d like to understand how this dataset was processed to produce a graph with so many cycles.
