-
[For dev] Create the conda environment
conda env create -n pangebin-dev -f config/condaenv_311-dev.yml
-
[For dev] Activate the conda environment
conda activate pangebin-dev
Example with the dataset SAMN16357463:
dataset_dir="test/SAMN16357463"
Note: each script has a "clean" command:
./script.sh clean $dataset_dir
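The clean command can be applied to the whole example at once. The following is a minimal sketch, assuming the scripts live in ./test/ under the names used in the steps of this example; clean_all and the DRY_RUN variable are hypothetical helpers and are not part of the repository.

```shell
# Sketch: call the "clean" command of every test script for one dataset.
# Set DRY_RUN=1 to print the commands instead of executing them.
clean_all() {
    _dataset_dir="$1"
    _failed=0
    for _script in \
        std_asm_graph.sh pangenome.sh panassembly.sh gc_prob_scores.sh \
        gene_mapping.sh frag_gene_densities.sh fragment_seeds.sh plasbin_panasm.sh
    do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "./test/$_script clean $_dataset_dir"
        elif ! "./test/$_script" clean "$_dataset_dir"; then
            # Keep cleaning the other steps, but remember the failure.
            echo "clean failed: ./test/$_script" >&2
            _failed=1
        fi
    done
    return "$_failed"
}

# Usage (from the repository root):
#   clean_all "$dataset_dir"
```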
-
Standardize the GFA assembly graphs
./test/std_asm_graph.sh run $dataset_dir
-
Make the pangenome graph with Nextflow (make sure you have installed the command required by the Nextflow profile)
./test/pangenome.sh run $dataset_dir
-
Make the pan-assembly graph
./test/panassembly.sh run $dataset_dir
-
Obtain the GC probability scores of the fragments
./test/gc_prob_scores.sh run $dataset_dir
-
Obtain gene density on the fragments
-
Map the genes onto the contigs from the two assemblers
./test/gene_mapping.sh blast $dataset_dir
-
Filter the gene mappings
./test/gene_mapping.sh filter $dataset_dir
-
Obtain the gene density on the fragments
./test/frag_gene_densities.sh run $dataset_dir
-
Obtain the seeds from positive gene densities
./test/fragment_seeds.sh run $dataset_dir
-
Execute PlasBin-flow modified for pan-assembly
./test/plasbin_panasm.sh run $dataset_dir
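The steps above can be chained in a small wrapper that stops at the first failing step. This is a sketch under stated assumptions, not a script shipped with the repository: run_pipeline and DRY_RUN are hypothetical names, and the function is assumed to be called from the repository root so that the ./test/ paths resolve.

```shell
# Sketch: run every step of the example in order, stopping at the first failure.
# Set DRY_RUN=1 to print the commands instead of executing them.
run_pipeline() {
    _dataset_dir="$1"
    # One "script command" pair per step, in the order listed above.
    for _step in \
        "std_asm_graph.sh run" \
        "pangenome.sh run" \
        "panassembly.sh run" \
        "gc_prob_scores.sh run" \
        "gene_mapping.sh blast" \
        "gene_mapping.sh filter" \
        "frag_gene_densities.sh run" \
        "fragment_seeds.sh run" \
        "plasbin_panasm.sh run"
    do
        _script="./test/${_step% *}"   # text before the space
        _cmd="${_step#* }"             # text after the space
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$_script $_cmd $_dataset_dir"
        else
            echo ">>> $_script $_cmd $_dataset_dir" >&2
            "$_script" "$_cmd" "$_dataset_dir" || {
                echo "Step failed: $_script $_cmd" >&2
                return 1
            }
        fi
    done
}

# Usage:
#   DRY_RUN=1 run_pipeline "$dataset_dir"   # preview the commands
#   run_pipeline "$dataset_dir"             # execute them
```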
# pbf : PlasBin-flow
# pg : Pangebin
# Convert plasmidness
pangebin utils pbf-comp plm pbf_plasmidness.tsv pg_plasmidness.tsv
# Convert seeds
pangebin utils pbf-comp seeds pbf_seeds.tsv pg_seeds.tsv
# Recompute GC contents
pangebin sub gc from-gfa graph.gfa pg_gc_scores.tsv
# Run Pangebin on assembly graph
pangebin sub plasbin asm graph.gfa pg_seeds.tsv pg_gc_scores.tsv pg_plasmidness.tsv --outdir pg_outdir
# Convert bins
pangebin utils pbf-comp bins pg_outdir pbf_bins.tsv
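Before running the commands above, it can help to confirm that the input files are present. The following is a minimal sketch; check_inputs is a hypothetical helper, and the file names are the placeholders used in the commands above.

```shell
# Sketch: report every missing input file and return non-zero if any is absent.
check_inputs() {
    _missing=0
    for _file in "$@"; do
        if [ ! -f "$_file" ]; then
            echo "Missing input file: $_file" >&2
            _missing=1
        fi
    done
    return "$_missing"
}

# Usage: only start the conversions when every input exists
#   check_inputs pbf_plasmidness.tsv pbf_seeds.tsv graph.gfa \
#       && pangebin utils pbf-comp plm pbf_plasmidness.tsv pg_plasmidness.tsv
```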
Understanding the GFA tag system: