Preparation and Analysis
When modelling is carried out by MODELLER (ASMC default), the percentage of identity between the target sequences and the reference structure(s) is calculated. This information is used to determine the reference(s) that will be used to model and align the target. Active sites are extracted on the basis of the alignment between the targets and their respective reference.
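As a rough illustration of the selection step described above, the sketch below computes a percentage of identity between two aligned sequences and picks the reference with the highest value. The function names, the gap handling (columns where either sequence has a gap are ignored) and the "highest identity wins" rule are assumptions made for illustration; how ASMC defines and uses the identity internally is not stated here.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns do not count
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def best_reference(target, references):
    """Return (reference id, identity) for the reference closest to the target.

    `references` maps a reference id to its aligned sequence (hypothetical layout).
    """
    scores = {ref_id: percent_identity(target, seq) for ref_id, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

For example, `best_reference("ACDE", {"r1": "ACDF", "r2": "AAAA"})` selects `r1`, which shares three of the four compared positions with the target.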
To extract and cluster the active sites from an MSA with multiple reference sequences, the user must first generate the file identity_targets_refs.tsv using the following subcommand:
usage: asmc identity [-h] (-s | -m ) (-r | -R )
options:
-h, --help show this help message and exit
-s , --seqs multi fasta file
-m , --models file containing all PDB paths
-r , --ref-str file containing the reference structure paths
-R , --ref-seq file containing the reference sequences id
The identity percentage can be calculated from either target sequences, if ASMC was run on a set of sequences, or target 3D structures, if ASMC was run on pre-built 3D models (MODELLER, AlphaFold...).
To use target sequences, the user must provide a set of homologous protein sequences and a reference sequence file, passed with the --ref-seq option.
asmc identity -s sequences.fasta --ref-seq ref_seq.txt
To use target structures, the user must provide a set of homologous protein structures and a reference structure file, passed with the --ref-str option.
asmc identity -m models.txt --ref-str ref_str.txt
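The --models and --ref-str options expect a text file listing structure paths. A minimal sketch for producing such a file from a directory of PDB files follows. It assumes one path per line and a .pdb extension, neither of which is specified in this document, and the function name is hypothetical.

```python
from pathlib import Path


def write_path_list(structure_dir, list_file, pattern="*.pdb"):
    """Write the absolute path of every matching file in structure_dir, one per line.

    Returns the number of paths written.
    """
    paths = sorted(Path(structure_dir).resolve().glob(pattern))
    Path(list_file).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

Calling `write_path_list("models", "models.txt")` would then produce a file suitable for the -m option, under the assumptions above.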
The extract subcommand returns the lines of groups_x_min_y.tsv that contain a specific amino acid or residue type at a queried position.
usage: asmc extract [-h] -f -p -a [-g]
options:
-h, --help show this help message and exit
-f , --file tsv file from run_asmc.py
-p , --position position where to find the specified amino acid type, e.g: 5
-a , --aa-type amino acid type to search, must be either 1-letter amino acid, 'aromatic', 'acidic', 'basic', 'polar' or 'hydrophobic'
-g , --group group id, if not used, search in all groups
The position numbering corresponds to the position in the active site sequences within groups_x_min_y.tsv; e.g., if the user is looking for a tyrosine (Y) at position 5, the command line is as follows:
asmc extract -f groups_x_min_y.tsv -p 5 -a Y
The output is written to stdout.
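The filtering performed by extract can be pictured with the sketch below. It is not ASMC's code: the three-column layout (sequence ID, active site sequence, group ID), the 1-based position and the membership of each residue class are all assumptions chosen for illustration.

```python
import csv
import io

# Assumed residue classes; the sets ASMC actually uses are not listed in this document.
RESIDUE_CLASSES = {
    "aromatic": set("FWY"),
    "acidic": set("DE"),
    "basic": set("KRH"),
    "polar": set("STNQCY"),
    "hydrophobic": set("AVLIPMFW"),
}


def matching_rows(handle, position, aa_type, group=None):
    """Yield rows whose active site sequence has aa_type at the 1-based position."""
    # A class name selects a set of residues; anything else is treated as one letter.
    allowed = RESIDUE_CLASSES.get(aa_type.lower(), {aa_type.upper()})
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        site, row_group = row[1], row[2]
        if group is not None and row_group != str(group):
            continue
        if 0 < position <= len(site) and site[position - 1].upper() in allowed:
            yield row


# Hypothetical data in the assumed layout.
sample = "seq1\tACDEY\t1\nseq2\tACDEF\t1\nseq3\tWWWWY\t2\n"
hits = [row[0] for row in matching_rows(io.StringIO(sample), 5, "Y")]
print(hits)  # ['seq1', 'seq3']
```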
The compare subcommand compares the active sites found in two groups_x_min_y.tsv files.
usage: asmc compare [-h] -f1 -f2 -id
options:
-h, --help show this help message and exit
-f1 Group file 1
-f2 Group file 2
-id identity_targets_refs.tsv
The user must provide a TSV file for each clustering method (MSA, structure, pairwise) and the identity_targets_refs.tsv file, passed with the -id option.
asmc compare -f1 groups_x_min_y.tsv -f2 groups_a_min_b.tsv -id identity_targets_refs.tsv
The output file is named active_site_checking.tsv.
The subcommand unique returns the unique active sites per group and some statistics.
usage: asmc unique [-h] -f
Returns the unique active sites per group and some statistics
options:
-h, --help show this help message and exit
-f , --file tsv group file with all active sites from asmc run
The user must provide the file groups_x_min_y.tsv to the following subcommand:
asmc unique -f groups_x_min_y.tsv
The output files are unique_sequences.tsv and groups_stats.tsv.
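To make "unique active sites per group" concrete, here is a small sketch of one way to derive them. The three-column input layout (sequence ID, active site sequence, group ID) and the statistics shown (total, unique count, ratio) are assumptions; the statistics ASMC actually writes to groups_stats.tsv are not described in this document.

```python
import csv
import io
from collections import defaultdict


def unique_sites_per_group(handle):
    """Map each group id to {active site sequence: number of occurrences}."""
    groups = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        site, group = row[1], row[2]
        groups[group][site] = groups[group].get(site, 0) + 1
    return dict(groups)


def group_stats(unique_sites):
    """Per group: total sequences, number of distinct sites, and their ratio."""
    stats = {}
    for group, counts in unique_sites.items():
        total = sum(counts.values())
        stats[group] = {
            "total": total,
            "unique": len(counts),
            "ratio": len(counts) / total,
        }
    return stats


# Hypothetical data in the assumed layout.
sample = "s1\tACD\t1\ns2\tACD\t1\ns3\tAAD\t1\ns4\tWWW\t2\n"
sites = unique_sites_per_group(io.StringIO(sample))
print(sites["1"])  # {'ACD': 2, 'AAD': 1}
```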
The subcommand pymol returns the path of the script to load into the PyMOL console. The script runs PyMOL commands that show the superposed active site residues. To visualise the active sites:
- Open PyMOL and set a directory containing all the ASMC outputs as the working directory
- Use the command run <path>/ASMC/asmc/zoom_active_site.py to load the functions
- Use the command target ID, where ID is the ID of a built model, to load the target model and its reference structure
- Use the command active_site to zoom on the two active sites
The last command displays the list of corresponding positions in the PyMOL console, e.g.:
Ref - Target
189 SER - 94 VAL
190 THR - 95 SER
191 GLY - 96 SER
192 ILE - 97 ILE
193 CYS - 98 CYS
197 SER - 102 ALA
200 LEU - Gap
202 PHE - 103 ALA
235 THR - 131 ASP
268 PRO - 163 PRO
271 GLN - 166 GLN
272 TYR - 167 TYR
275 TYR - Gap
278 GLU - 170 SER
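If the listing needs to be reused outside PyMOL, it can be parsed with a few lines of Python. The sketch below is a hypothetical helper, not part of ASMC, and it assumes the console output keeps exactly the format shown above (position, three-letter residue name, and "Gap" where the target has no counterpart).

```python
import re

# One line of the listing: "<ref pos> <ref res> - <target pos> <target res>" or "... - Gap".
PAIR_RE = re.compile(r"^\s*(\d+)\s+([A-Z]{3})\s+-\s+(?:(\d+)\s+([A-Z]{3})|Gap)\s*$")


def parse_correspondences(text):
    """Return a list of ((ref_pos, ref_res), target) tuples; target is None for a gap."""
    pairs = []
    for line in text.splitlines():
        match = PAIR_RE.match(line)
        if not match:
            continue  # header ("Ref - Target") or unrelated console output
        ref_pos, ref_res, tgt_pos, tgt_res = match.groups()
        target = (int(tgt_pos), tgt_res) if tgt_pos else None
        pairs.append(((int(ref_pos), ref_res), target))
    return pairs
```

Gapped positions are kept as `None` so that the reference numbering stays complete.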
The subcommand to_xlsx converts a TSV file into an XLSX file. In the new file, each position in the sequence is in its own column. The colors refer to the WebLogo 3 "Chemistry (AA)" color scheme, summarized below:
Chemical properties | Amino acids (1-letter code) | Color |
---|---|---|
Polar | G,S,T,Y,C | Green |
Neutral | Q,N | Purple |
Basic | K,R,H | Blue |
Acidic | D,E | Red |
Hydrophobic | A,V,L,I,P,W,F,M | Black |
usage: asmc to_xlsx [-h] [-o] -f
options:
-h, --help show this help message and exit
-f, --file Group tsv file
-o, --outdir output name (default: <input_name>.xlsx)
The user must provide a TSV file and run the following command:
asmc to_xlsx -f groups_x_min_y.tsv
The output is an XLSX file, which can be opened with a spreadsheet program.
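The "one column per position" layout and the per-residue class lookup can be sketched as follows. The residue classes are those of the table above; the function name is hypothetical, and writing the actual spreadsheet is left out because the library ASMC uses for it is not stated here.

```python
# Residue classes from the WebLogo 3 "Chemistry (AA)" table above.
CHEMISTRY = {
    "polar": "GSTYC",
    "neutral": "QN",
    "basic": "KRH",
    "acidic": "DE",
    "hydrophobic": "AVLIPWFM",
}

# Reverse lookup: one-letter residue code -> class name.
RESIDUE_TO_CLASS = {aa: name for name, residues in CHEMISTRY.items() for aa in residues}


def to_cells(sequence):
    """Split a sequence into one (residue, class) pair per position.

    Characters outside the 20 standard residues (e.g. gaps) get class None.
    """
    return [(aa, RESIDUE_TO_CLASS.get(aa.upper())) for aa in sequence]
```

Each pair would map to one spreadsheet cell, with the class deciding the cell color.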