Commit ea001f7

Merge pull request #39 from francoep/master
Added script for extending types files
2 parents f8334cd + 1ff99df commit ea001f7

2 files changed, +78 -4 lines changed

README.md

Lines changed: 31 additions & 4 deletions
@@ -11,6 +11,7 @@
 * generate_unique_lig_poses.py - Script for counter-example generation which computes all of the unique ligand poses in a directory
 * counterexample_generation_jobs.py - Script which generates a file containing all of the gnina commands to generate new counter-examples
 * generate_counterexample_typeslines.py - Script which generates a file containing the lines to add to the types file for a pocket.
+* types_extender.py - Script to generate a new types file containing the lines generated from the counterexamples from an existing types file.
 
 ## Dependencies
 
@@ -281,16 +282,16 @@ Lastly, we run clustering.py as follows
 ```
 clustering.py --cpickle matrix.pickle --input my_types.types --output my_types_cv_
 ```
-## Generating new counterexamples
-There are 3 scripts here which form a pipeline to generate new counter-examples for a data directory.
+## Adding new counterexamples to types files
+There are 4 scripts here which form a pipeline to generate new counter-examples for a data directory.
 
-The pipeline is as follows: 1) generate_unique_lig_poses.py; 2) counterexample_generation_jobs.py; 3) generate_counterexample_typeslines.py.
+The pipeline is as follows: 1) generate_unique_lig_poses.py; 2) counterexample_generation_jobs.py; 3) generate_counterexample_typeslines.py; 4) types_extender.py.
 
 Global Assumptions: 1) The data directory structure is <ROOT>/<POCKET>/<FILES>, 2) Crystal ligand files are named <PDBid>_<ligname><CRYSTAL SUFFIX>,
 3) Receptors are PDB files, 4) output poses are SDF files.
 
 ### Step 1) Generating the unique poses for a Pocket
-In order to avoid extra calculations, we need to find the unique poses.
+In order to avoid extra calculations, we need to find the unique poses. NOTE - This process needs to be done exactly once when generating new counterexamples. After a round of counterexamples are generated, script 3 in the pipeline will generate the updated unique_poses.sdf file.
 
 WARNING -- this script performs an O(n^2) calcualtion for each unique ligand name in the pocket!!
 
@@ -466,6 +467,32 @@ The above command will be need to run for each directory in cd2020_pockets.txt.
 
 That text file contains the lines that need to be added to the training/test types files. The default values match what we used for the CrossDocked2020 paper.
 
+### Step 4 -- Adding the lines for the counterexamples to the types file
+Now that the lines we need to add are generated for each pocket, we can run types_extender.py on each of the types files that we use for training and testing to generate new types files with these added lines.
+```
+usage: types_extender.py [-h] -i INPUT -o OUTPUT -n NAME [-r ROOT]
+
+Add lines to types file and create a new one. Assumes data file structure is
+ROOT/POCKET/FILES.
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -i INPUT, --input INPUT
+                        Types file you will be extending.
+  -o OUTPUT, --output OUTPUT
+                        Name of the extended types file.
+  -n NAME, --name NAME  Name of the file containing the lines to add for a
+                        given pocket. This is the output of
+                        generate_counterexample_typeslines.py.
+  -r ROOT, --root ROOT  Root of the data directory. Defaults to current
+                        working directory.
+```
+Continuing our example, after running script 3 there will be an it3_typeslines_toadd.txt file in each pocket. So now we generate a new train types file and new test types file as below:
+```
+python3 types_extender.py -i my_initial_train.types -o my_new_train.types -n it3_typeslines_toadd.txt -r MYROOT
+python3 types_extender.py -i my_initial_test.types -o my_new_test.types -n it3_typeslines_toadd.txt -r MYROOT
+```
+
 ## Using visualization script
 There are two scripts to help you visualize how the model scores atoms: 1) simple_grid_visualization.py; 2) grid_visualization.py
 
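Reviewer note on Step 4: types_extender.py opens ROOT/POCKET/NAME for every pocket it finds in the input types file, so a pocket where script 3 has not produced that file would stop the run partway through. The sketch below is one possible pre-flight check, not part of this commit. The helper names `pockets_in_types` and `missing_typeslines` are hypothetical; only the regular expression is taken from the script itself.

```python
import os
import re

# Same pattern types_extender.py uses to pull the pocket out of a types line.
POCKET_RE = re.compile(r' (\S+)/')


def pockets_in_types(types_path):
    """Return the pockets referenced in a types file, in first-seen order."""
    seen = []
    with open(types_path) as infile:
        for line in infile:
            m = POCKET_RE.search(line)
            if m and m.group(1) not in seen:
                seen.append(m.group(1))
    return seen


def missing_typeslines(types_path, root, name):
    """List pockets that have no non-empty <root>/<pocket>/<name> file."""
    missing = []
    for pocket in pockets_in_types(types_path):
        path = os.path.join(root, pocket, name)
        # Hypothetical policy: an empty file counts as missing.
        if not (os.path.isfile(path) and os.path.getsize(path) > 0):
            missing.append(pocket)
    return missing
```

Calling `missing_typeslines('my_initial_train.types', 'MYROOT', 'it3_typeslines_toadd.txt')` before the two commands above would list the pockets that still need script 3 to be run.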
types_extender.py

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+#!/usr/bin/env python3
+'''
+This script will generate the new types file with the lines from generate_counterexample_typeslines.py
+
+Assumptions
+i) The data structure is <ROOT>/<POCKET>/<FILES>
+ii) The name of the file containing the types lines to add is <NAME> for each pocket in the types file.
+iii) the input types file has <POCKET>/<receptor file> from which to parse the needed pockets from.
+
+INPUT
+i) Original types file
+ii) New types filename
+iii) Name of file in Pocket that contains the lines to add
+iv) The ROOT of the data directory
+
+OUTPUT
+i) The new types file -- note that the lines of the new types file will not necessarily be in order.
+'''
+
+import argparse, os, re, glob
+
+def check_exists(filename):
+    if os.path.isfile(filename) and os.path.getsize(filename)>0:
+        return True
+    else:
+        return False
+
+parser=argparse.ArgumentParser(description='Add lines to types file and create a new one. Assumes data file structure is ROOT/POCKET/FILES.')
+parser.add_argument('-i','--input',type=str,required=True,help='Types file you will be extending.')
+parser.add_argument('-o','--output',type=str,required=True,help='Name of the extended types file.')
+parser.add_argument('-n','--name',type=str,required=True,help='Name of the file containing the lines to add for a given pocket. This is the output of generate_counterexample_typeslines.py.')
+parser.add_argument('-r','--root',default='',help='Root of the data directory. Defaults to current working directory.')
+args=parser.parse_args()
+
+completed=set()
+with open(args.output,'w') as outfile:
+    with open(args.input) as infile:
+        for line in infile:
+            outfile.write(line)
+            m=re.search(r' (\S+)/',line)
+            pocket=m.group(1)
+
+            if pocket not in completed:
+                completed.add(pocket)
+                with open(os.path.join(args.root,pocket,args.name)) as linesfile:
+                    for line2 in linesfile:
+                        outfile.write(line2)
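
Reviewer note on the script as committed: `check_exists` is defined but never called, `glob` is imported but unused, and `m.group(1)` raises `AttributeError` on any input line the pattern does not match (a blank line, for example). The sketch below is one possible more defensive arrangement of the same loop, not the committed behavior. The function name `extend_types` and the policy of skipping unmatched lines and warning about missing per-pocket files are assumptions made for illustration.

```python
import os
import re
import sys


def extend_types(input_path, output_path, name, root=''):
    """Copy input_path to output_path, adding each pocket's extra lines once.

    Returns the pockets whose <root>/<pocket>/<name> file was not found.
    """
    completed = set()
    skipped = []
    with open(output_path, 'w') as outfile, open(input_path) as infile:
        for line in infile:
            outfile.write(line)
            m = re.search(r' (\S+)/', line)
            if m is None:
                # Assumed policy: pass blank or malformed lines through untouched.
                continue
            pocket = m.group(1)
            if pocket in completed:
                continue
            completed.add(pocket)
            extra = os.path.join(root, pocket, name)
            if not os.path.isfile(extra):
                # Assumed policy: warn and carry on instead of raising.
                print('warning: no %s for pocket %s' % (name, pocket),
                      file=sys.stderr)
                skipped.append(pocket)
                continue
            # As in the committed script, the extra lines go directly after
            # the first line that mentions the pocket.
            with open(extra) as linesfile:
                for line2 in linesfile:
                    outfile.write(line2)
    return skipped
```

As in the original, a pocket's lines are inserted right after the first input line that names it, which is why the docstring warns that the output is not necessarily in order.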
