From de55308127a9b9f789624a20ad8a16020142af53 Mon Sep 17 00:00:00 2001 From: ASLeonard Date: Wed, 24 Apr 2019 11:57:09 +0100 Subject: [PATCH] improved functionality --- README.md | 73 +++++++++++++++++++++++------------ includes/interface_model.hpp | 17 ++++---- makefile | 12 +++--- scripts/interface_analysis.py | 25 ++++++------ scripts/interface_methods.py | 33 +++++++++++----- scripts/interface_plotting.py | 48 +++++++++++++++++------ src/interface_model.cpp | 1 - src/interface_simulator.cpp | 27 +++++-------- 8 files changed, 146 insertions(+), 90 deletions(-) diff --git a/README.md b/README.md index d8df350..194037b 100644 --- a/README.md +++ b/README.md @@ -3,41 +3,71 @@ [![Build Status](https://travis-ci.org/ASLeonard/polyomino_interfaces.svg?branch=master)](https://travis-ci.org/ASLeonard/polyomino_interfaces) ![License Badge](https://img.shields.io/github/license/ASLeonard/polyomino_interfaces.svg?style=flat) -Polyomimo lattice self-assembly model with binary string interfaces. +Code repository for the generalised polyomimo lattice self-assembly model with binary string interfaces. + + ### Core modules The majority of polyomino core features are found in the `polyomino_core' submodule, designed to provide a single, consistent set of definitions in all models. +Assembly algorithms, phenotype classifications, and selection dynamics can be found within this repository. A direct link to the current core development is [here](https://github.com/ASLeonard/polyomino_core). ### Installation -Building the simulator requires a c++ compiler that has general support for c++17. It has been tested with g++7 and greater, and clang++6 and greater. The compiling itself is taken care of automatically through the makefile. The recommended steps are as follows -``` +Building the simulator requires a c++ compiler that has general support for c++17. It has been tested with g++7 and greater, and clang++6 and greater. The compiling itself is taken care of through the makefile. The recommended steps are as follows +```shell git clone --recurse-submodules https://github.com/ASLeonard/polyomino_interfaces cd polyomino_interfaces make ``` -At which point the simulation program is in `/bin/ProteinEvolution`. -Errors at this stage probably indicate the compiler (or the CXX environment variable) is not modern enough. +At which point the simulation program is callable at `./bin/ProteinEvolution`. +Errors at this stage probably indicate the compiler (set through the CXX environment variable) is not modern enough. -Several compiler flags can be added depenending on what is available on the user's system if desired, like g++ has support for multi-threading (-fopenmp) and link-time optimization (-flto). +Several compiler flags can be added depending on what is available on the user's system if desired, like g++ has support for multi-threading (-fopenmp) and link-time optimization (-flto), but are not required to be used. #### Python requirements -The analysis and plotting has been tested with the following versions, although many older/newer versions are likely to work. These are common packages, but not always installed by default. -+ python (3.6.5) +The analysis and plotting is mostly handled in python3, and should be useable with any python version 3.6+ (tested on 3.6.5). +Some common additional packages have been used, but are not always installed by default. Many older/newer versions are likely to work, but versions used during development were: + Numpy (1.16.2) + Matplotlib (3.03) + SciPy (1.2.1) (only necessary for scripts within **polyomino_core**, not used by default) #### Testing -The install, c++, and python can be tested after making, by calling `make test`. This will run an evolution simulation with default parameters, analyse the generated files, save the analysis, and then erase all the generated files and analysis. +The installation, c++ program, and python can be tested after making, by calling `make test`. This will run an evolution simulation with default parameters, analyse the generated files, save the analysis, and then erase all the generated files and analysis. Note this erases generated files by pattern matching, so will erase any generated txt or pkl files within the root directory. -#### Directory layout -The main folders of interest are bin/ and scripts/, although a more curious user can modify the c++ in the other folders. + +### Simulations +The code is split into two main parts. The polyomino evolutions are written in c++, while the analysis is in python3. The c++ code, in the executable **ProteinEvolution**, can be run directly, with parameters mostly detailed by calling `./bin/ProteinEvolution -H`. There is one main parameter that is hardcoded in the c++, the interface length, found in `includes/interface_model.hpp` as `using interface_type = ...`. Details on how to change it are described there. -+ bin/ - + exectuable -+ build/ (internal compiling only) -+ scripts/ +A simple example has been provided below to automatically generate, analyse, and plot some data. The default parameters are sufficient to generate some interesting data, and can and should be modified to run over more simulations and different parameters. They can be found in `scripts/interface_analysis.py`, as a dictionary within the `runEvolutionSequence()` method. + +The files are generated in the calling directory unless the default parameter for file_path is changed, and can be large (>100s of MB) for larger choices of population size and longer simulations. + +#### Example usage +A very minimal example can be achieved from the root directory with +```shell +python3 scripts/interface_analysis.py [optional parameter for number of simulations, e.g. 10] +python3 scripts/interface_plotting.py +``` + +### Plotting +Several visualising methods have been included to interpret the generated data, in the `interface_plotting` file. They mostly involve loading simulation results, called through e.g. `loadPickle(params...)`, and passing those to the plotting functions. Each method has comments explaining how to use in more detail. + +After generating data through the first line of the minimal example, you can plot the data more selective by calling different plots and options via +```python +data=loadPickle(.6875,25,1,5) + +calculatePathwaySucess(data) +#or +plotEvolution(data) +#or +plotPhaseSpace(data) +``` +Where various method default parameters have been commented on in the files. + +### Directory layout +The main folder of interest is scripts/, although a more curious user can modify the c++ in the other folders. In the main directory there is also the makefile, which can be modified with the previously mentioned compiler flags. + ++ scripts/ (main directory for user) + utility methods + analysis module + plotting module @@ -50,15 +80,8 @@ The main folders of interest are bin/ and scripts/, although a more curious user + c++ headers + scripts/ + generic polyomino plotting - -### Simulations -The code is split into two main parts. The polyomino evolutions are written in c++, while the analysis is in python3. The c++ code, in the executable **ProteinEvolution**, can be run directly, with parameters mostly detailed by calling the `ProteinEvolution -H` feature. - -However, the easiest way is to use the main method in the _interface\_analysis_ file, which calls `runEvolutionSequence`. The default parameters are sufficient to generate some interesting data, and can be modified if desired. - -The files are generated in the calling directory unless the default parameter for file_path is changed, and can be large (>100s of mB) for larger choices of population size and longer simulations. - -### Plotting -Several visualising methods have been included to interpret the generated data, in the _interface\_plotting_ file. They mostly involve loading simulation results, called through e.g. `loadPickle(params...)`, and passing those to the plotting functions. Each method has comments explaining how to use in more detail. ++ bin/ + + executable ++ build/ (internal compiling only) diff --git a/includes/interface_model.hpp b/includes/interface_model.hpp index 99f0326..7b2c03d 100644 --- a/includes/interface_model.hpp +++ b/includes/interface_model.hpp @@ -6,12 +6,21 @@ #include "core_evolution.hpp" //using interfaces of 64 bits length, can use any width of unsigned integer +//i.e. can alternatively use uint8_t, uint16_t, uint32_t +//[UNTESTED] potentially could use further width types, like the GNU specific "unsigned __int128" using interface_type = uint64_t; //set genotype to be a vector of these elements constexpr uint8_t interface_size=CHAR_BIT*sizeof(interface_type); using BGenotype = std::vector; +//parameters used +namespace simulation_params +{ + extern uint8_t model_type,n_tiles,samming_threshold; + extern double temperature,binding_threshold,mu_prob; +} + //extension of base assembly (in polyomino_core/include/core_genotype) class InterfaceAssembly : public PolyominoAssembly { @@ -32,14 +41,6 @@ class InterfaceAssembly : public PolyominoAssembly { static void Mutation(BGenotype& genotype); }; -//parameters used -namespace simulation_params -{ - extern uint8_t model_type,n_tiles,samming_threshold; - extern uint16_t dissociation_time; - extern double temperature,binding_threshold,mu_prob; -} - namespace interface_model { diff --git a/makefile b/makefile index 2e1b90b..b4c2637 100644 --- a/makefile +++ b/makefile @@ -53,12 +53,12 @@ test: @echo "Running basic test to check c++ and python3 integration\nRunning default generation and analysis" @python3 scripts/interface_analysis.py @echo "Test seems successful, cleaning up generated files" - @rm PIDs_Run0.txt - @rm Strengths_Run0.txt - @rm PhenotypeTable_Run0.txt - @rm Selections_Run0.txt - @rm Y0.6875T10.0Mu1.0F5.0.pkl - @rm Mu1.0Y0.6875T10.0F5.0O0.pkl + @rm PIDs_Run*.txt + @rm Strengths_Run*.txt + @rm PhenotypeTable_Run*.txt + @rm Selections_Run*.txt + @rm Y*.pkl + @rm Mu*.pkl #Compile diff --git a/scripts/interface_analysis.py b/scripts/interface_analysis.py index 3c68c41..bd075f0 100644 --- a/scripts/interface_analysis.py +++ b/scripts/interface_analysis.py @@ -158,7 +158,7 @@ def __valid_descendent(gen_idx,kid): return True ##helper method to identify where a tree branches into a new ancestry - def __addBranch(): + def __addBranch(): if g_idx: if len(bond_ref)=len(interactions[g_idx,c_idx].bonds))>POPULATION_FIXATION: continue - ##only consider it failed if it "should" have survived based on fitness advantage + ##only consider it failed if it "should" have survived based on fitness advantage if not __growDescendentTree(Tree(pid_c,interactions[g_idx,c_idx].bonds,(-1,-1),g_idx,[[c_idx]]),SURVIVAL_DEPTH): failed_jumps[tuple(tuple(_) for _ in (pid_c,pid_d))]+=1 @@ -329,15 +329,17 @@ def analyseHomogeneousPopulation(run,temperature): return np.asarray(param_trajectory) -def runEvolutionSequence(): +##main wrapper function, uses hardcoded parameters given below to run an evolution simulation (from c++ files), and then analyse results in python3 +##results are then written into a pickle or npz file depending on parameters, and saved for later plotting/printing +def runEvolutionSequence(runs=1): + + ##change parameters here for data generation, meanings for parameter flags can be found in the src/interface_simulation.cpp file at the end - ##change parameters here for data generation - ##defaults well-suited for generating evolution of fixed system - default_parameters={'file_path' : 'bin/', 'N' : 2, 'P' : 100, 'K' : 400, 'B' : 20, 'X': .5, 'F': 5, 'A' : 1, 'D' : 1, 'J': 5, 'M': 1, 'Y' : .6875, 'T': 10, 'O' : 100, 'G' : 10} + default_parameters={'file_path' : 'bin/', 'N' : 2, 'P' : 100, 'K' : 500, 'B' : 25, 'X': 0, 'F': 5, 'A' : 1, 'D' : runs, 'J': 5, 'M': 1, 'Y' : .6875, 'T': 25, 'O' : 0, 'G' : 0} - ##defaults for dynamic fitness landscape - #default_parameters={'file_path' : 'bin/', 'N' : 2, 'P' : 100, 'K' : 600, 'B' : 150, 'X': 0, 'F': 1, 'A' : 2, 'D' : 1, 'J': 1, 'M': 1, 'Y' : .6875, 'T': 25, 'O' : 200, 'G' : 10} + ##defaults for dynamic fitness landscape + #default_parameters={'file_path' : 'bin/', 'N' : 2, 'P' : 100, 'K' : 600, 'B' : 150, 'X': 0, 'F': 1, 'A' : 2, 'D' : 1, 'J': 1, 'M': 1, 'Y' : .6875, 'T': 25, 'O' : 200, 'G' : 10} ##helper method to stringify parameters def generateParameterString(): @@ -386,13 +388,14 @@ def collateByMode(mode,run_range, params): elif mode == 2: np.savez_compressed(file_base,collateNPZs(*params,runs=run_range)) - + else: print('unknown mode request, set parameter \'A\'') if __name__ == '__main__': try: - runEvolutionSequence() + ##by default run two simulations, otherwise use user input + runEvolutionSequence(2 if len(sys.argv)!=2 else int(sys.argv[1])) except Exception as e: print(e) else: diff --git a/scripts/interface_methods.py b/scripts/interface_methods.py index 6240865..d7cd4f8 100644 --- a/scripts/interface_methods.py +++ b/scripts/interface_methods.py @@ -43,8 +43,7 @@ def __repr__(self): def loadSelectionHistory(run): selections=[] for line in open(BASE_PATH.format('Selections',run)): - converted=[int(i) for i in line.split()] - selections.append(converted) + selections.append([int(i) for i in line.split()]) return np.array(selections,np.uint16) def loadPIDHistory(run): @@ -110,25 +109,34 @@ def __init__(self,S_star,T,mu,gamma,data): print("Clearing out 10s") del form[(10,0)] - def __repr__(self): return r'Data struct for S*={:.3f},T={:.3g},mu={:.2g},gamma={:.2g}'.format(self.S_star,self.T,self.mu,self.gamma) - + +##primary method to load pickle data +##parameters: +## S_star (float): critical interaction strength +## t (float): temperature +## mu (float): mutation rate +## gamma (float): fitness punishment value def loadPickle(S_star,t,mu,gamma): with open('Y{}T{}Mu{}F{}.pkl'.format(S_star,*(float(i) for i in (t,mu,gamma))), 'rb') as f: return EvoData(S_star,t,mu,gamma,load(f)) - +##primary method to load numpyzip data +##same parameters as above def loadNPZ(S_star,t,mu,gamma): return np.load('Y{}T{}Mu{}F{}.npz'.format(S_star,*(float(i) for i in (t,mu,gamma))), 'rb')['arr_0'] ################# ## RANDOM WALK ## ################# + +##internal method for calculating Markov chain def RandomWalk(I_size=64,n_steps=1000,phi=0.5,S_star=0.6,analytic=False): s_hats=np.linspace(0,1,I_size+1) N_states=int(I_size*(1-S_star))+1 + ##find normalised eigenvector for Markov matrix def __getSteadyStates(val): rows=[[1-phi,phi*(1-val[0])]+[0]*(N_states-2)] for i in range(1,N_states-1): @@ -138,14 +146,17 @@ def __getSteadyStates(val): eigval,eigvec=LA.eig(matrix) ve=eigvec.T[np.argmax(eigval)] return max(eigval),ve/sum(ve) - + + ##return matrix calculation if analytic: analytic_states=__getSteadyStates(s_hats[-N_states:])[1] return sum(s_hats[-N_states:]*analytic_states) - + + ##otherwise simulate directly states=np.array([1]+[0]*(N_states-1),dtype=float) progressive_states=[sum(s_hats[-N_states:]*states)] + ##simulate random walk on strength states def __updateStates(states,val): states_updating=states.copy() for i in range(states.shape[0]): @@ -156,6 +167,7 @@ def __updateStates(states,val): states_updating[i]+=states[i+1]*phi*val[i+1] return states_updating/sum(states_updating) + ##walk for n_steps, and then return state evolutions for _ in range(n_steps): states=__updateStates(states,s_hats[-N_states:]) progressive_states.append(sum(s_hats[-N_states:]*states)) @@ -165,6 +177,7 @@ def __updateStates(states,val): ## PHASE SPACE## ################ +##all various methods for calculated steps def HetTet(a,b): return 1/(1+2*b)*(2*b+2/(a+2)*(1+a*1/(a+2))) @@ -197,7 +210,8 @@ def seed2(): return seed2() else: return .5*seed1()+.5*seed2() - + +##internal method to calculate phase space coordinates for transitions def calcTransitionParams(evo_strs,transitions,T,S_star): param_dict={} @@ -225,8 +239,7 @@ def calcTransitionParams(evo_strs,transitions,T,S_star): param_dict[(1,(16,0),(8,0))]=(S_star/evo_strs[(8,0)][(2,2)][-1])**T param_dict[(0,(16,0),(8,0))]=(evo_strs[(16,0)][(4,4)][0]/evo_strs[(16,0)][(3,4)][0])**T param_dict[(0,(16,0),(16,0))]=(evo_strs[(16,0)][(4,4)][-1]/evo_strs[(16,0)][(3,4)][-1])**T - - + if (12,0) in evo_strs: if (4,1) in transitions[(12,0)]: if (2,0) in transitions[(4,1)]: diff --git a/scripts/interface_plotting.py b/scripts/interface_plotting.py index a455b40..861ea1f 100644 --- a/scripts/interface_plotting.py +++ b/scripts/interface_plotting.py @@ -18,12 +18,27 @@ from collections import defaultdict + +##example sequence of loading data and plotting +def examplePlot(): + ##if data not generated and analysed, can uncomment and run below line + #runEvolutionSequence() + + ##load the data from the pickle file (should be placed in this directory_ + data=loadPickle(.6875,25,1,5) + + ##plot the data as in Fig 4, and then print pathway success as in Fig 6 + calculatePathwaySucess(data) + plotEvolution(data,True,False) + + + ##plot evolution of phenotypes over time, used in Fig 4 ##parameters ##evo_data (custom structure from loadPickle method): structure holding simulation results ##add_samps (bool): if false plots only bulk trends, if true plots a sample of individual simulations -def plotEvolution(evo_data_struct,add_samps=False): +def plotEvolution(evo_data_struct,add_samps=False,interactive_mode=True): ##pull simulation parameters from data structure evo_data=evo_data_struct.evo_strs @@ -54,13 +69,12 @@ def plotEvolution(evo_data_struct,add_samps=False): ##set panel border colour plt.setp(ax_d[phen].spines.values(), color=phen_cols[phen],lw=(2 if phen[0]<10 else 1)) - + ##plot baseline and random walk expectation ax_d[phen].axhline(S_star,c='k',ls='-',lw=0.75) max_len=min(MAX_G,max(len(vals) for vals in data.values())) ax_d[phen].plot(range(max_len+1),RandomWalk(64,max_len,mu*1./6,S_star,False),ls='--',c='k',zorder=20) - ##iterate over each edge in the assembly graph for (bond,new),strs in data.items(): @@ -83,18 +97,17 @@ def plotValues(values,alph): ##plot averaged edge trend given plotting parameters above plotValues(strs,1) - + ##if adding sampled individual runs, plot now if add_samps and evo_data_samp is not None: for samp in evo_data_samp[phen][(bond,new)]: plotValues(samp,.2) - ##add labels f.tight_layout() axarr[1,1].set_xlabel('generations') axarr[1,0].set_ylabel(r'$\langle \hat{S} \rangle$') - plt.show(block=False) + plt.show(block= not interactive_mode) ##print the transition success rates used in Fig 6 @@ -120,12 +133,12 @@ def calculatePathwaySucess(evo_data): ##get total number of failed transitions fail_rate=0 - if parent in failed: - if child in failed[parent]: - fail_rate=failed[parent][child] + if child in failed: + if parent in failed[child]: + fail_rate=failed[child][parent] ##print formatted success/fail rates - print('{}→{} with {} successes and {} failures ({:.2f}%)'.format(parent,child,count,fail_rate,count/(count+fail_rate))) + print('{}→{} with {} successes and {} failures ({:.2f}%)'.format(parent,child,count,fail_rate,100*count/(count+fail_rate))) ##plot grid of parameters for the 16-mer phenotype, used in S1 Fig ##parameters are @@ -287,12 +300,12 @@ def plotDynamicLandscape(data,period=100): ##great 2 colour gradients for the fitness landscape changing c_grad_1 = LinearSegmentedColormap.from_list('grad_1', ['aqua','darkblue'], N=period) c_grad_2 = LinearSegmentedColormap.from_list('grad_2', ['mistyrose','firebrick'], N=period) - + ##create colour list with changing landscape phase_color=[] for _ in range(data.shape[1]//period//2): phase_color.extend(list(c_grad_1(np.linspace(0,1,period)))+list(c_grad_2(np.linspace(0,1,period)))) - + ##can the data burn in a few hundred generations burn_in=0 (x1,y1)=np.log10(np.mean(data[:,burn_in:,:],axis=0)).T @@ -309,3 +322,14 @@ def plotDynamicLandscape(data,period=100): plt.show(block=False) return + +if __name__ == '__main__': + try: + examplePlot() + except Exception as e: + print(e) + else: + print('Plotting sequence successful') + sys.exit(0) + print('Something went wrong, data loading or plotting potentially incomplete') + sys.exit(1) diff --git a/src/interface_model.cpp b/src/interface_model.cpp index 2528073..7f0f43f 100644 --- a/src/interface_model.cpp +++ b/src/interface_model.cpp @@ -3,7 +3,6 @@ namespace simulation_params { - uint16_t dissociation_time=0; uint8_t n_tiles=2,model_type=0,samming_threshold=10; double temperature=0,binding_threshold=1,mu_prob=1; } diff --git a/src/interface_simulator.cpp b/src/interface_simulator.cpp index e842979..10f481c 100644 --- a/src/interface_simulator.cpp +++ b/src/interface_simulator.cpp @@ -1,21 +1,18 @@ #include "interface_simulator.hpp" #include - +//internal flag to limit fatal mutations to less fit phenotypes bool KILL_BACK_MUTATIONS=false; //set the file path to be where the (large) files will be written, by default the location where the const std::string file_base_path=""; -const std::map phen_stages{{{0,0},0},{{10,0},4},{{1,0},1},{{2,0},2},{{4,0},2},{{4,1},3},{{8,0},3},{{12,0},4},{{16,0},4}}; -void EvolutionRunner() { - +//wrapper to run many (parallel) independent evolutions +void EvolutionRunner() { const uint16_t N_runs=simulation_params::independent_trials; #pragma omp parallel for schedule(dynamic) - for(uint16_t r=0;r < N_runs;++r) { + for(uint16_t r=0;r < N_runs;++r) EvolvePopulation("_Run"+std::to_string(r+simulation_params::run_offset)); - - } } //populate fixed phenotype table with example system @@ -50,8 +47,10 @@ void FinalModelTable(FitnessPhenotypeTable* pt) { pt->phenotype_fitnesses[10].emplace_back(0); } +//vertical columns of phenotype subset +const std::map phen_stages{{{0,0},0},{{10,0},4},{{1,0},1},{{2,0},2},{{4,0},2},{{4,1},3},{{8,0},3},{{12,0},4},{{16,0},4}}; - +//main evolution simulator function void EvolvePopulation(const std::string& run_details) { //create file names and make text files @@ -86,17 +85,13 @@ void EvolvePopulation(const std::string& run_details) { break; } - - - //main evolution loop for(uint32_t generation=0;generation