Hi :)
I am annotating VCF files from the SAVANA CNA caller using AnnotSV and am a bit curious about how the split function works.
It seems very useful to have one row in the df corresponding to each gene effect instead of each SV. However, I am having trouble getting AnnotSV to split the data like that: each SV row still lists numerous gene names.
Is this simply because I am running larger CNAs through the algorithm, so it cannot separate them into individual genes? Or am I not setting it up correctly?
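One way to check whether the output contains split rows at all is to count rows per annotation mode. The sketch below is only a diagnostic aid, not a fix. It assumes the AnnotSV TSV has a header column named `Annotation_mode` with values such as `full` and `split` (the column name may differ between AnnotSV versions), and `count_annotation_modes` is a hypothetical helper name.

```shell
#!/bin/bash
# Count rows per annotation mode in an AnnotSV TSV output.
# Assumption: the header row has a column called "Annotation_mode";
# adjust the name if your AnnotSV version labels it differently.
count_annotation_modes() {
    local tsv="$1"
    awk -F'\t' '
        NR == 1 {
            # Locate the column by name instead of hard-coding its position.
            for (i = 1; i <= NF; i++) if ($i == "Annotation_mode") col = i
            if (!col) {
                print "Annotation_mode column not found" > "/dev/stderr"
                exit 1
            }
            next
        }
        { count[$col]++ }
        END { for (mode in count) print mode "\t" count[mode] }
    ' "$tsv" | sort
}
```

Calling `count_annotation_modes "$OUTPUT_TSV"` after the run prints one line per mode with its row count, which shows whether the rows being inspected are `split` rows or `full` rows.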
Here is the script I have run:
#!/bin/bash
#SBATCH --job-name=annotsv_81_split
#SBATCH --output=annotsv_81_split.out
#SBATCH --error=annotsv_81_split.err
#SBATCH --time=4:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --account=Renal_long_read
####################################### File Paths and Directories #######################################
INPUT_VCF="/faststorage/project/Renal_long_read/derived_data/jesperj/savana/CNA_analysis/SAVANA_CNA_output/patient_81/81_segmented_absolute_copy_number.vcf"
FIXED_VCF="/faststorage/project/Renal_long_read/derived_data/jesperj/savana/CNA_analysis/SAVANA_CNA_output/patient_81/81_segmented_absolute_copy_number_fixed.vcf"
BASE_OUTPUT_DIR="/faststorage/project/Renal_long_read/derived_data/jesperj/savana/annotation/Patient_81_split"
OUTPUT_DIR="$BASE_OUTPUT_DIR"
OUTPUT_TSV="$OUTPUT_DIR/81_CNA_annotated.tsv"
OUTPUT_VCF="$OUTPUT_DIR/81_CNA_annotated.vcf"
GENOME_BUILD="GRCh38"
# Directory for annotation files
ANNOTSV_ANNOTATIONS="/home/jesperjespersen/AnnotSV_annotations"
####################################### Create Output Directory #######################################
mkdir -p "$OUTPUT_DIR"
####################################### Input File Check and VCF Header Fix #######################################
if [[ ! -f "$INPUT_VCF" ]]; then
echo "Error: Input VCF file not found!"
exit 1
fi
# Fix VCF header if the FORMAT field is missing
grep "^#CHROM" "$INPUT_VCF" | grep -q "FORMAT" ||
awk 'BEGIN {OFS="\t"}
/^#CHROM/ { print $0, "FORMAT", "SAMPLE"; next }
!/^#/ { print $0, "GT", "0/1" }' "$INPUT_VCF" > "$FIXED_VCF"
# If no change was made, copy the original file to FIXED_VCF
if [[ ! -s "$FIXED_VCF" ]]; then
cp "$INPUT_VCF" "$FIXED_VCF"
fi
####################################### Run AnnotSV with Split Analysis #######################################
AnnotSV \
    -SVinputFile "$FIXED_VCF" \
    -genomeBuild "$GENOME_BUILD" \
    -annotationsDir "$ANNOTSV_ANNOTATIONS" \
    -outputFile "$OUTPUT_TSV" \
    -outputDir "$OUTPUT_DIR" \
    -annotationMode split \
    -vcf 1
# Move the generated VCF to the final location (if created)
GENERATED_VCF="${OUTPUT_DIR}/$(basename "$FIXED_VCF" .vcf).annotated.vcf"
if [[ -f "$GENERATED_VCF" ]]; then
mv "$GENERATED_VCF" "$OUTPUT_VCF"
else
echo "Warning: AnnotSV did not generate a VCF file as expected."
fi
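A side note on the header-fix step in the script: its awk command has rules only for the `#CHROM` line and for data records, so any `##` meta-information lines would not be written to the fixed file. Whether that matters for AnnotSV is not established here. A minimal sketch of a variant that keeps those lines is shown below; `add_format_column` is a hypothetical helper name, and the `GT` / `0/1` placeholders simply mirror the script.

```shell
#!/bin/bash
# Append FORMAT and sample columns to a VCF that lacks them, while keeping
# all "##" meta-information lines. Hypothetical helper; the "GT" and "0/1"
# values are placeholders copied from the original script, not real genotypes.
add_format_column() {
    local in_vcf="$1" out_vcf="$2"
    if grep "^#CHROM" "$in_vcf" | grep -q "FORMAT"; then
        # Header already declares FORMAT: copy the file unchanged.
        cp "$in_vcf" "$out_vcf"
    else
        awk 'BEGIN { OFS = "\t" }
             /^##/      { print; next }                          # keep meta lines
             /^#CHROM/  { print $0, "FORMAT", "SAMPLE"; next }   # extend column header
             !/^#/ && NF { print $0, "GT", "0/1" }               # placeholder genotype
        ' "$in_vcf" > "$out_vcf"
    fi
}
```

Used as `add_format_column "$INPUT_VCF" "$FIXED_VCF"`, this always rewrites the output file, so a fixed file left over from an earlier run is not picked up by mistake.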
Best regards
Jesper :)