Haploid Haplotype Reconstruction #88

@gsc74

What is your question?
@eblerjana I am working on reconstructing a haploid haplotype using the imputed genotypes from PanGenie. Currently, I am using the following commands:

PanGenie -i Reads.fq -r MHC-CHM13.ref.fa -v MHC_49-MC.vcf -o temp/APD_PG -t32 && bgzip temp/APD_PG_genotyping.vcf
tabix -p vcf temp/APD_PG_genotyping.vcf.gz && rm -rf APD_rec_PG.fasta
bcftools view -e 'GT="het"' temp/APD_PG_genotyping.vcf.gz | bgzip > temp/APD_PG_genotyping_no_homo.vcf.gz && tabix -p vcf temp/APD_PG_genotyping_no_homo.vcf.gz
bcftools consensus -f MHC-CHM13.ref.fa -o Rec_PG.fasta temp/APD_PG_genotyping_no_homo.vcf.gz

In the commands above, I genotype the haploid reads with PanGenie, remove the heterozygous calls, and then apply the remaining imputed genotypes to the reference with bcftools consensus to reconstruct the haploid haplotype.
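Before dropping heterozygous calls, it may help to know how many sites that filter would actually remove. The following is a minimal sketch, not part of PanGenie or bcftools: `count_gt_classes` is a hypothetical helper that tallies genotype classes in an uncompressed, single-sample VCF using only awk. It assumes the sample is in column 10 and that GT is the first FORMAT subfield.

```shell
# Hypothetical helper (not a PanGenie or bcftools command): tally genotype
# classes in an uncompressed single-sample VCF.
# Assumptions: the sample is column 10 and GT is the first FORMAT subfield.
count_gt_classes() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    {
      split($10, f, ":")                   # f[1] holds the GT string, e.g. 0/1
      n = split(f[1], a, "[/|]")           # alleles, unphased (/) or phased (|)
      if (f[1] ~ /\./)                 c["missing"]++
      else if (n == 2 && a[1] != a[2]) c["het"]++
      else if (a[1] == "0")            c["hom_ref"]++
      else                             c["hom_alt"]++
    }
    END {
      printf "hom_ref=%d hom_alt=%d het=%d missing=%d\n",
             c["hom_ref"], c["hom_alt"], c["het"], c["missing"]
    }' "$1"
}
```

To try it on the genotyping output, one option is to decompress a copy first (for example with `bgzip -dc temp/APD_PG_genotyping.vcf.gz > genotypes.vcf`) and then run `count_gt_classes genotypes.vcf`. A large `het` count for a haploid sample would mean the filter discards many sites, and the consensus would keep the reference allele at each of them.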

My question is: Is this the correct way to use PanGenie to reconstruct haplotypes? The input VCF is a phased diploid VCF generated by the minigraph-cactus pipeline and preprocessed with the "prepare-mc-vcf" pipeline.

If applicable: which version of PanGenie are you using?
v3.1.0

If applicable: how did you run PanGenie?
Please provide the command lines used. Did you run it using Singularity?
I've used conda to install PanGenie

If applicable: what data are you running PanGenie on?
Which species are you analyzing? Which input reads are used? What does the input VCF look like (number of input samples, how was it produced etc.)?
MHC VCF file generated using Minigraph-Cactus pipeline and preprocessed using "prepare-mc-vcf" pipeline.
