Hello!
I have a BED file from GISTIC that I converted to the format accepted by AnnotSV. It contains these columns:
chr chromStart chromEnd SVTYPE Samples_ID q.values cn_confidency
Each sample is on its own line, even when several samples share the same chromStart and chromEnd.
I have used:
AnnotSV -SvinputFile input..bed -annotationsDir /annotsv/AnnotSV_annotations/ -genomeBuild GRCh38 -samplesidBEDcol 5 -svtBEDcol 4 -outputFile output.tsv
The annotations work fine, but I have two suggestions for improvement:
First, my extra columns q.values and cn_confidency get renamed to user#1 and user#2. With two columns that's not a problem, but with many it becomes tedious to rename them by hand. I think they should automatically keep their original names.
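For now I can work around this with a small post-processing step. This is only a sketch, not AnnotSV functionality: `rename_user_columns` is a hypothetical helper, and it assumes the user#N labels follow the order of my extra BED columns, which is what I see in my output. The header list in the example is shortened for illustration.

```python
import re

def rename_user_columns(header, extra_names):
    """Replace each 'user#N' label with the N-th original column name.

    header: list of column names from the output file's first line.
    extra_names: original names of the extra BED columns, in BED order
    (assumption: user#1 is the first extra column, user#2 the second, ...).
    """
    renamed = []
    for name in header:
        match = re.fullmatch(r"user#(\d+)", name)
        if match and 1 <= int(match.group(1)) <= len(extra_names):
            renamed.append(extra_names[int(match.group(1)) - 1])
        else:
            # Leave every other column, and any unmatched user#N, untouched.
            renamed.append(name)
    return renamed

# Shortened, illustrative header
header = ["SV_chrom", "SV_start", "SV_end", "user#1", "user#2"]
print(rename_user_columns(header, ["q.values", "cn_confidency"]))
```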
Second, the CNVs are duplicated for each sample that carries them, even when SV_start and SV_end are exactly the same. Is there a way to collapse identical intervals into a single row with a comma-separated sample list? For example, if samples A, B and C have the same variant of the same type, SAMPLES_ID would show "A,B,C". Maybe intervals that differ slightly could even be merged according to a user-defined distance or overlap percentage?
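To make the collapsing request concrete, here is a minimal sketch of the behaviour I have in mind, applied to the BED rows before annotation. `collapse_exact_intervals` is a hypothetical helper, not part of AnnotSV; it handles only exact matches on chromosome, start, end and SV type, and I have not checked whether AnnotSV accepts a comma-separated list in the samples column. The extra columns are left out because they can differ per sample.

```python
def collapse_exact_intervals(rows):
    """Merge rows that share chrom, start, end and SV type.

    rows: iterable of (chrom, start, end, svtype, sample_id) tuples.
    Returns one tuple per distinct interval and type, with the sample IDs
    joined by commas in first-seen order.
    """
    grouped = {}
    for chrom, start, end, svtype, sample in rows:
        samples = grouped.setdefault((chrom, start, end, svtype), [])
        if sample not in samples:  # skip repeated sample IDs
            samples.append(sample)
    return [key + (",".join(samples),) for key, samples in grouped.items()]

# Made-up coordinates, for illustration only
rows = [
    ("chr1", 1000, 5000, "DEL", "A"),
    ("chr1", 1000, 5000, "DEL", "B"),
    ("chr2", 200, 900, "DUP", "A"),
    ("chr1", 1000, 5000, "DEL", "C"),
]
for row in collapse_exact_intervals(rows):
    print(*row, sep="\t")
```

Merging intervals that are only close to each other would need an extra rule on top of this, such as a minimum reciprocal overlap, which is the part I would like to be user-defined.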