1+
12This is swMutSel, a program to estimate fitnesses of amino acids in protein-
23coding genes using the evolution model of Halpern and Bruno (1998) and Tamuri et
34al. (2012, 2014). The program takes as input an alignment of protein-coding
45gene sequences and a phylogeny (tree) of the sequences, and outputs the
56fitnesses of each amino acid at each location in the protein-coding gene.
67
7- DOWNLOAD
8- https://github.com/tamuri/swmutsel/releases/download/v1.0/swmutsel.jar
98
109SYNOPSIS
10+
11+ Analyse Data Using the SwMutSel Model:
12+
1113 java -jar swmutsel.jar
1214 -name <run_name>
1315 -sequences <sequence_file_name>
1416 -tree <tree_file_name | tree_newick_string>
1517 -geneticcode <standard | vertebrate_mit | plastid>
1618 [-penalty mvn,<sigma> | dirichlet,<alpha>]
1719 [-kappa <kappa>]
18- [-pi T,C,A,G ]
19- [-scaling branch_scaling_factor]
20- [-fitness A,R,N,D,C,Q,E,G,H,I,L,K,M,F,P,S,T,W,Y,V]
21- [-fix mutation|branches|all]
20+ [-pi <T>,<C>,<A>,<G>]
21+ [-scaling <branch_scaling_factor>]
22+ [-fitness <site>, A,R,N,D,C,Q,E,G,H,I,L,K,M,F,P,S,T,W,Y,V [-fitness ...], ... ]
23+ [-fix mutation|branches|all [-fix mutation|branches|all], ... ]
2224 [-threads <cpu_cores>]
2325 [-distributed -host <host>:<port> [-host <host>:<port>], ...]
2426 [-sites <site>|<site_range>]
2527 [-restart-opt <no_of_restarts> [-restart-int <n_iterations>]]
2628 [-clademodel clade_label,clade_label[,clade_label[,...]]]
29+ [-hessian]
2730 [-help]
2831
32+ Simulate Data Using the SwMutSel Model:
33+
34+ java -jar swmutsel.jar
35+ -simulate
36+ -name <run_name>
37+ -tree <tree_file_name | tree_newick_string>
38+ -geneticcode <standard | vertebrate_mit | plastid>
39+ -sites <number_of_sites>
40+ -kappa <kappa>
41+ -pi <T>,<C>,<A>,<G>
42+ -scaling <branch_scaling_factor>
43+ [-fitness A,R,N,D,C,Q,E,G,H,I,L,K,M,F,P,S,T,W,Y,V [-fitness ...], ...]
44+ [-fitnessfile <filename> [-fitnessfile <filename>], ...]
45+ [-clademodel <clade_labels>]
46+ [-shiftfrac <percentage>]
47+
2948OPTIONS
3049 Required
3150 -n, -name
3251 Specifies name for the run. Output files are prefixed with this name.
3352 Be careful! The program will overwrite files with the same name.
34-
53+
3554 -s, -sequences
3655 Coding sequences alignment file name in PHYLIP format. Spaces are
3756 not allowed in sequence names.
38-
57+
3958 -t, -tree
4059 Newick-formatted tree file name. The tree string can be supplied
4160 instead e.g. "-tree (A:0.1,(B:0.1,C:0.1));". Spaces are not allowed
4261 in the string.
43-
62+
4463 -gc, -geneticcode
4564 The genetic code for the coding-sequences:
46-
65+
4766 -gc standard : The Standard Code
4867 -gc vertebrate_mit : The Vertebrate Mitochondrial Code
4968 -gc plastid : The Bacterial, Archaeal and Plant Plastid Code
50-
69+
5170 Model Parameters
5271 -p, -penalty
5372 The penalty to use for the penalised likelihood method. If not
5473 supplied, the usual (unpenalised) maximum likelihood method is used.
5574 Valid options are:
56-
75+
5776 -p mvn,<s> : Multivariate normal penalty with variance 2*<s>^2
5877 -p dirichlet,<a> : Dirichlet-based penalty with shape <a>
59-
78+
6079 -k, -kappa
6180 The starting parameter value for the transition-transversion rate
6281 ratio. If you specify "-fix mutation", the parameter will not be estimated.
6382 DEFAULT: 1.0
64-
83+
6584 -pi
6685 The starting parameter value for nucleotide base frequencies. Must
6786 be comma-separated with order T,C,A,G. The values are normalised to
6887 sum to 1. If you specify "-fix mutation", the parameter will not be estimated.
6988 DEFAULT: [0.25]
70-
89+
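To illustrate what "normalised to sum to 1" means for the -pi values, here is a
minimal Python sketch. It is not taken from the program's source; the function
name normalise_pi and the example values are assumptions for illustration only.

```python
def normalise_pi(values):
    """Scale four base frequencies (order T,C,A,G) so that they sum to 1."""
    if len(values) != 4:
        raise ValueError("expected four values in the order T,C,A,G")
    total = sum(values)
    if total <= 0:
        raise ValueError("base frequencies must have a positive sum")
    return [v / total for v in values]


# Hypothetical input: unnormalised weights for T, C, A, G.
pi = normalise_pi([1.0, 2.0, 3.0, 4.0])

# Format the result as a comma-separated string suitable for "-pi".
pi_argument = ",".join(f"{p:.2f}" for p in pi)
print(pi_argument)
```

Under this reading, "-pi 1,2,3,4" and "-pi 0.1,0.2,0.3,0.4" would describe the
same starting frequencies.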
7190 -c, -scaling
7291 The starting parameter value for branch scaling factor (applied to
7392 all branches). If you specify "-fix mutation", the parameter will not be
7493 estimated.
7594 DEFAULT: 1.0
76-
95+
7796 -f, -fitness
7897 Comma-separated fitness parameters in canonical amino acid order.
7998 It is recommended that you do not construct these by hand but
@@ -83,27 +102,23 @@ OPTIONS
83102 Optimisation
84103 -fix
85104 Indicate whether you want the program to skip estimation of
86- mutational parameters, branch lengths, both or no optimisation at
87- all.
88-
105+ mutational parameters, branch lengths or all parameters. For example,
106+ to estimate the fitness parameters only, use "-fix mutation -fix branches".
107+
89108 -fix mutation : Fix the values (k, pi, c) of the mutational
90109 model.
91110
92111 -fix branches : Fix the branch lengths on the tree.
93112
94- -fix mutation,branches : Fix both mutational model parameters and
95- branch lengths, and only estimate the
96- fitness parameters.
97-
98113 -fix all : Calculate the log-likelihood only.
99-
114+
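Because "-fix" can be given more than once, a wrapper script may find it easier
to assemble the command line programmatically. The Python sketch below builds
(but does not run) such a command using only options listed in the SYNOPSIS.
The helper name build_command and the file names run1, alignment.phy and
tree.nwk are hypothetical.

```python
import shlex

ALLOWED_FIX = {"mutation", "branches", "all"}


def build_command(name, sequences, tree, genetic_code, fix=()):
    """Return the argument list for a swMutSel run with repeated -fix flags."""
    cmd = [
        "java", "-jar", "swmutsel.jar",
        "-name", name,
        "-sequences", sequences,
        "-tree", tree,
        "-geneticcode", genetic_code,
    ]
    for target in fix:
        if target not in ALLOWED_FIX:
            raise ValueError(f"unknown -fix target: {target}")
        # Each target gets its own "-fix" flag.
        cmd += ["-fix", target]
    return cmd


# Fix the mutational model and the branch lengths; estimate fitnesses only.
cmd = build_command("run1", "alignment.phy", "tree.nwk", "standard",
                    fix=["mutation", "branches"])
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd) once the jar and input
files are in place.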
100115 -restart-opt
101116 Specifies the number of optimiser restarts for site-wise fitness
102117 parameter estimation. This is to prevent estimates getting stuck at a
103118 local optimum. The program will restart fitness estimation, with
104119 random initial values, the specified number of times.
105120 DEFAULT: 1
106-
121+
107122 -restart-int
108123 Specify how often to estimate fitness parameters with multiple
109124 restarts. Restarting the fitness parameter estimation is expensive
@@ -112,25 +127,25 @@ OPTIONS
112127 single round is one iteration of mutation, branch length and fitness
113128 estimation.
114129 DEFAULT: 5
115-
130+
116131 -sites
117132 Specify a single site, or a range of sites, for site-wise fitness
118133 estimation. If you provide this option, you implicitly fix the
119134 mutation and branch length parameters, i.e. "-fix mutation -fix branches".
120135 A range is specified using a dash e.g. "-sites 10-20" will estimate
121136 the site-wise fitnesses for all sites between site 10 and site 20,
122137 inclusive.
123-
138+
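The inclusive range semantics of "-sites" can be sketched in Python as follows.
This is an illustration of the rule described above, not the program's own
parser; the function name parse_sites is an assumption.

```python
def parse_sites(spec):
    """Expand a "-sites" value ("15" or "10-20") into a list of site numbers."""
    if "-" in spec:
        start_text, end_text = spec.split("-", 1)
        start, end = int(start_text), int(end_text)
    else:
        start = end = int(spec)
    if end < start:
        raise ValueError(f"range end is before range start: {spec}")
    # The end of the range is included, as stated in the option description.
    return list(range(start, end + 1))


sites = parse_sites("10-20")
print(len(sites), sites[0], sites[-1])
```

Note that "10-20" covers eleven sites, because both end points are included.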
124139 Parallelisation
125140 -T, -threads
126141 Specify the number of cores to use for multi-threaded operation.
127-
142+
128143 -D, -distributed
129144 Indicate the program will run in distributed mode. This requires the
130145 initialisation of (usually) multiple slaves. Each slave will have
131146 an associated IP address (or hostname) and port (which you supply
132147 using "-H").
133-
148+
134149 -H, -host
135150 If the program is running in distributed mode (using the "-D" option),
136151 supply slaves' host IP and port using "-H <slave_ip>:<port>"
@@ -175,5 +190,6 @@ CITATION
175190
176191 Tamuri AU, Goldman N and dos Reis M. (2014) A penalized likelihood method
177192 for estimating the distribution of selection coefficients from
178- phylogenetic data. Genetics.
193+ phylogenetic data. Genetics, 197: 257-271.
194+
179195