Insertions
When BLAST searches for the alignment, the length of the match may be longer than the allele in the schema. This happens because the sample contains additional nucleotides, which BLAST reports as insertions in the sample when it looks for the match.
Information about this scenario is written to a file called insertions.tsv.
This file records each case in which an insertion occurred in the sample. It is a tab-separated file with the following heading.
Core Gene | Sample Name | Insertion item | Allele | Contig | Bitscore | Query length | Contig length | New sequence length | Mismatch | gaps | Contig start | Contig end | New sequence |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
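As a rough illustration of how such a file could be consumed, the sketch below reads a tab-separated file into one dictionary per row with Python's standard `csv` module. The column names are copied from the heading above and their exact spelling in a real insertions.tsv is an assumption. The function name `read_insertions` is hypothetical, and the single data row is a made-up placeholder, not real output.

```python
import csv
import io

# Column names copied from the heading above; the exact spelling used in a
# real insertions.tsv is an assumption and may differ.
COLUMNS = [
    "Core Gene", "Sample Name", "Insertion item", "Allele", "Contig",
    "Bitscore", "Query length", "Contig length", "New sequence length",
    "Mismatch", "gaps", "Contig start", "Contig end", "New sequence",
]


def read_insertions(handle):
    """Return one dict per data row of a tab-separated insertions file.

    The first line of the file is taken as the heading.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)


# Placeholder content, invented only to show the layout of the file.
placeholder_row = [
    "lmo0088", "sample_1", "ALM_INSERT_lmo0088_1", "3", "contig_12",
    "950", "500", "503", "503", "0", "3", "1200", "1702", "ATGAAA",
]
sample_text = "\t".join(COLUMNS) + "\n" + "\t".join(placeholder_row) + "\n"

rows = read_insertions(io.StringIO(sample_text))
for row in rows:
    print(row["Core Gene"], row["Sample Name"], row["Insertion item"])
```

For a file on disk, the same function can be called on a handle opened with `open("insertions.tsv", newline="")`.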
Core Gene is the name of the gene in the schema.
Sample Name is the name of the sample file.
Insertion item describes the impact of the insertion. An insertion affects the translation to protein: the protein from the sample can be longer than the one in the schema (it is then named ALM) or, on the contrary, shorter (ASM).
The same protein could also be generated by another sample, so to keep track of this each new protein is named with the name of the core gene plus a sequential number. The value of this field looks like this:
ALM_INSERT_lmo0088_1.
ALM indicates that the protein is longer than in the schema. There has been an insertion in the gene from the sample, and lmo0088_1 shows that the core gene name is lmo0088. The "1" is the sequential number, meaning that the new protein found is not the same as one previously found for this core gene.
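The naming scheme can be taken apart programmatically. The following is a minimal sketch, assuming the label always has the form prefix, `INSERT`, core gene name and sequential number joined by underscores, and that ASM labels follow the same pattern as the ALM example (the text above only shows an ALM label). `InsertionItem` and `parse_insertion_item` are hypothetical names used for illustration.

```python
from typing import NamedTuple


class InsertionItem(NamedTuple):
    effect: str     # "ALM" (protein longer than schema) or "ASM" (shorter)
    core_gene: str  # core gene name, e.g. "lmo0088"
    number: int     # sequential number of the new protein for this gene


def parse_insertion_item(label: str) -> InsertionItem:
    """Split a label such as 'ALM_INSERT_lmo0088_1' into its parts."""
    # Split off the two leading tokens; the remainder may itself contain
    # underscores if the core gene name does.
    effect, kind, rest = label.split("_", 2)
    if effect not in ("ALM", "ASM") or kind != "INSERT":
        raise ValueError(f"unexpected insertion item: {label!r}")
    # The sequential number is the part after the last underscore.
    core_gene, number = rest.rsplit("_", 1)
    return InsertionItem(effect, core_gene, int(number))


item = parse_insertion_item("ALM_INSERT_lmo0088_1")
print(item.effect, item.core_gene, item.number)
```

Splitting the number off from the right keeps the parser working for core gene names that contain underscores themselves.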
Allele. The allele number in the core gene schema that BLAST identified as the best match.
Contig. The name of the contig in the sample.
Bitscore. The bitscore provided by BLAST when looking for the match.
Query length. The length of the core gene allele.
Contig length. The length of the match in the sample.
New sequence length. It is the new length that