AlgoLab/pangebin

Pangebin

Setup Python virtual environment

  • [For dev] Create the conda environment

    conda env create -n pangebin-dev -f config/condaenv_311-dev.yml
  • [For dev] Activate the conda environment

    conda activate pangebin-dev

Usage

Example with the dataset SAMN16357463:

dataset_dir="test/SAMN16357463"

Note: each script also provides a clean command:

./script.sh clean $dataset_dir
  1. Standardize the GFA assembly graphs

    ./test/std_asm_graph.sh run $dataset_dir
  2. Make the pangenome graph with Nextflow (make sure you have installed the command for the Nextflow profile)

    ./test/pangenome.sh run $dataset_dir
  3. Make the pan-assembly graph

    ./test/panassembly.sh run $dataset_dir
  4. Obtain the GC probability scores of the fragments

    ./test/gc_prob_scores.sh run $dataset_dir
  5. Obtain gene density on the fragments

    1. Map the genes onto the contigs from the two assemblers

      ./test/gene_mapping.sh blast $dataset_dir
    2. Filter the gene mappings

      ./test/gene_mapping.sh filter $dataset_dir
    3. Obtain the gene density on the fragments

      ./test/frag_gene_densities.sh run $dataset_dir
  6. Obtain the seeds from positive gene densities

    ./test/fragment_seeds.sh run $dataset_dir
  7. Execute PlasBin-flow modified for pan-assembly

    ./test/plasbin_panasm.sh run $dataset_dir
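
The steps above can be chained in a small wrapper. The following is a minimal sketch, not a script shipped with the repository: the function name run_pipeline and the DRY_RUN variable are hypothetical, while the script names and commands are copied from the steps listed above. It assumes it is run from the repository root. With DRY_RUN=1 (the default here) the commands are only printed, so the order can be checked before anything is executed.

```shell
# Sketch: run the pipeline steps in order, stopping at the first failure.
# run_pipeline and DRY_RUN are illustrative names, not part of Pangebin.
run_pipeline() {
    rp_dataset_dir="$1"
    # The step list is read on file descriptor 3 so that the called
    # scripts keep their own standard input.
    while read -r rp_script rp_cmd <&3; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: only show what would be executed.
            echo "./test/${rp_script} ${rp_cmd} ${rp_dataset_dir}"
        else
            "./test/${rp_script}" "${rp_cmd}" "${rp_dataset_dir}" || return 1
        fi
    done 3<<'EOF'
std_asm_graph.sh run
pangenome.sh run
panassembly.sh run
gc_prob_scores.sh run
gene_mapping.sh blast
gene_mapping.sh filter
frag_gene_densities.sh run
fragment_seeds.sh run
plasbin_panasm.sh run
EOF
}

# Print the commands for the example dataset (dry run by default).
run_pipeline "test/SAMN16357463"
```

Setting DRY_RUN=0 before calling run_pipeline executes the scripts instead of printing them.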

Pangebin-PlasBin-flow conversion

Use PlasBin-flow inputs with Pangebin

# pbf : PlasBin-flow
# pg : Pangebin
# Convert plasmidness
pangebin utils pbf-comp plm pbf_plasmidness.tsv pg_plasmidness.tsv
# Convert seeds
pangebin utils pbf-comp seeds pbf_seeds.tsv pg_seeds.tsv
# Recompute GC contents
pangebin sub gc from-gfa graph.gfa pg_gc_scores.tsv
# Run Pangebin on assembly graph
pangebin sub plasbin asm graph.gfa pg_seeds.tsv pg_gc_scores.tsv pg_plasmidness.tsv --outdir pg_outdir

Convert Pangebin outputs to PlasBin-flow output

pangebin utils pbf-comp bins pg_outdir pbf_bins.tsv
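
The two conversion directions can be combined into one round trip. The sketch below is only an illustration: the helper names run and pbf_roundtrip, the DRY_RUN variable, and the working-directory argument are assumptions, while the pangebin commands and file names are the ones shown above. By default it only prints the commands; it assumes pangebin is on the PATH when DRY_RUN=0 is used.

```shell
# run: print the command when DRY_RUN=1 (default), otherwise execute it.
# Both helpers are illustrative and not part of Pangebin.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pbf_roundtrip GRAPH PBF_PLASMIDNESS PBF_SEEDS WORKDIR
# Convert PlasBin-flow inputs, run Pangebin on the assembly graph,
# then convert the resulting bins back to the PlasBin-flow format.
pbf_roundtrip() {
    rt_graph="$1"
    rt_plm="$2"
    rt_seeds="$3"
    rt_dir="$4"
    run pangebin utils pbf-comp plm "$rt_plm" "$rt_dir/pg_plasmidness.tsv" &&
    run pangebin utils pbf-comp seeds "$rt_seeds" "$rt_dir/pg_seeds.tsv" &&
    run pangebin sub gc from-gfa "$rt_graph" "$rt_dir/pg_gc_scores.tsv" &&
    run pangebin sub plasbin asm "$rt_graph" "$rt_dir/pg_seeds.tsv" \
        "$rt_dir/pg_gc_scores.tsv" "$rt_dir/pg_plasmidness.tsv" \
        --outdir "$rt_dir/pg_outdir" &&
    run pangebin utils pbf-comp bins "$rt_dir/pg_outdir" "$rt_dir/pbf_bins.tsv"
}

# Print the five commands for files in the current directory (dry run).
pbf_roundtrip graph.gfa pbf_plasmidness.tsv pbf_seeds.tsv .
```

The steps are joined with && so that a failed conversion stops the chain before Pangebin is run on incomplete inputs.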

Going further into the details

Understanding GFA tags system:

About

Extracting plasmid contigs from Pangenome graphs of bacterial isolates.
