Collapsing multisample BED file #292

@GACGAMA

Hello!
I have a BED file from GISTIC that I converted to the format accepted by AnnotSV. It contains these columns:

chr chromStart chromEnd SVTYPE Samples_ID q.values cn_confidency

Each sample is on its own line, even when several samples share the same chromStart and chromEnd.

I have used:
AnnotSV -SvinputFile input..bed -annotationsDir /annotsv/AnnotSV_annotations/ -genomeBuild GRCh38 -samplesidBEDcol 5 -svtBEDcol 4 -outputFile output.tsv

The annotations work fine, but I have two suggestions for improvement:

First, my extra columns q.values and cn_confidency get renamed to user#1 and user#2. For two columns that's not a problem, but with many columns renaming them by hand becomes tedious. I think they should keep their original names automatically.
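Until something like this exists, the renaming can be done as a post-processing step. Here is a minimal Python sketch, assuming the extra columns appear in the output header as user#1, user#2, ... in the same order as in the input BED. The function name restore_user_columns and the example header are hypothetical, not part of AnnotSV.

```python
import re

# Hypothetical helper, not part of AnnotSV: maps "user#N" labels back to
# the N-th original column name supplied by the caller.
USER_LABEL = re.compile(r"^user#(\d+)$")

def restore_user_columns(header, original_names):
    """Return a copy of header with 'user#N' replaced by original_names[N-1]."""
    restored = []
    for name in header:
        match = USER_LABEL.match(name)
        if match and 1 <= int(match.group(1)) <= len(original_names):
            restored.append(original_names[int(match.group(1)) - 1])
        else:
            # Leave every other column (and unmatched user#N labels) untouched.
            restored.append(name)
    return restored

# Illustrative header only; a real output file has many more columns.
header = ["SV_start", "SV_end", "user#1", "user#2"]
print(restore_user_columns(header, ["q.values", "cn_confidency"]))
```

The same function can be applied to the first line of the output TSV before writing the file back out.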

Second, the CNVs are duplicated for each sample that carries them, even when SV_start and SV_end are exactly the same. Is there a way to collapse identical intervals into one line, with the sample IDs separated by commas? So if samples A, B and C have the same variant with the same type, Samples_ID would show "A,B,C". Maybe it could even merge different intervals according to a user-defined distance or overlap percentage?
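To make the request concrete, here is a rough Python sketch of the collapsing I have in mind, applied to the input BED rows before annotation. It assumes every row has the same number of columns and that the first four columns (chr, chromStart, chromEnd, SVTYPE) identify a variant. The function names and the example rows are made up for illustration, and this is not how AnnotSV works internally.

```python
def collapse_bed_rows(rows, key_cols=(0, 1, 2, 3)):
    """Merge rows that share the key columns; join distinct values with commas."""
    groups = {}  # plain dicts preserve insertion order in Python 3.7+
    for row in rows:
        key = tuple(row[i] for i in key_cols)
        merged = groups.setdefault(key, [[] for _ in row])
        for i, value in enumerate(row):
            if value not in merged[i]:
                merged[i].append(value)
    return [[",".join(values) for values in merged] for merged in groups.values()]

def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions; 0.0 when the intervals are disjoint."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))

# Made-up rows: chr, chromStart, chromEnd, SVTYPE, Samples_ID, q.values, cn_confidency
rows = [
    ["chr1", "100", "200", "DEL", "A", "0.01", "0.9"],
    ["chr1", "100", "200", "DEL", "B", "0.01", "0.9"],
    ["chr1", "100", "200", "DEL", "C", "0.01", "0.9"],
    ["chr2", "500", "900", "DUP", "A", "0.05", "0.8"],
]
for row in collapse_bed_rows(rows):
    print("\t".join(row))
```

Extra columns that differ between samples end up comma-joined as well, which may or may not be what is wanted. reciprocal_overlap only shows the kind of test a user-defined overlap threshold could use to decide whether two non-identical intervals qualify for merging; it does not perform the merge itself.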
