Importing unindexed external results #11

Open
iandoxsee opened this issue Feb 17, 2021 · 2 comments
@iandoxsee

(Posted here at Ben's request from private correspondence. Thanks, Ben!)

I have a question about importing external results: I've looked through all the example notebooks and bro.py but I'm still unable to figure out how to import a .csv file containing existing results, either one with the experimental index numbers or ideally one without them. Specifically, here's what I'm trying to do:

  1. Use BO_express module to easily encode some components with Mordred from the SMILES strings (e.g., ligand, base, solvent, while other variables use numeric encoding)
  2. Specify an external initialization (init_method='external') so that I can include pre-existing data from earlier screening (e.g., a ligand screen with all other variables held constant at levels which are included in the search space)
  3. Populate a .csv file with data from (e.g.) external ligand screen in the same format as the "init" or "round0" files, but ideally not requiring the experiment index numbers since the design is created after the ligand screen was run.
  4. Import the existing results .csv file into BO and use this to initialize the first round of screening.
@NLente-link

Hey iandoxsee,

I put together a small workaround for the workflow you described.
It may not be the most elegant way to do it, but it works:

  1. A pandas data frame is created containing the whole reaction space and indices
  2. A .csv file named initial.csv is created in /results with already given column names to fill in your results
  3. The reaction space and the initial.csv are compared and the original indices are added to your experiments automatically
  4. Your results are added via bo.add_results from initial.csv with matching indices
  5. bo is initialized with your experiments
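
The index-matching part of this workaround (steps 1 to 3) can be sketched with plain pandas. This is not the code from the attached notebook: the component names, levels, yields, and the `match_indices` helper are hypothetical, and the reaction space is built directly with `itertools.product` instead of being taken from an EDBO object.

```python
import itertools
import pandas as pd

# Hypothetical components and levels (not taken from the attached notebook)
components = {
    "ligand": ["L1", "L2", "L3"],
    "base": ["K2CO3", "Cs2CO3"],
    "temperature": [60, 80],
}

# Step 1: full reaction space, one row per combination, indexed 0..N-1
space = pd.DataFrame(list(itertools.product(*components.values())),
                     columns=list(components))

# Step 2: externally collected results with no experiment index
initial = pd.DataFrame({
    "ligand": ["L3", "L1"],
    "base": ["K2CO3", "K2CO3"],
    "temperature": [80, 80],
    "yield": [42.0, 17.5],
})

def match_indices(space, results, target):
    """Return `results` re-indexed by the matching rows of the reaction space."""
    keys = [c for c in results.columns if c != target]
    # A left merge keeps the row order of `results`
    merged = results.merge(space.reset_index(), on=keys, how="left")
    missing = merged["index"].isna()
    if missing.any():
        raise ValueError(
            f"Rows not in the reaction space: {list(merged.index[missing])}"
        )
    return merged.set_index(merged["index"].astype(int)).drop(columns="index")

# Step 3: the original indices are attached automatically
indexed = match_indices(space, initial, target="yield")
print(indexed)
```

The merge matches on exact values, so a row whose ligand name or SMILES string is spelled differently from the search space raises an error instead of being silently dropped.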

I'm attaching my Jupyter notebook so that you can have a look and maybe use it for your optimization.

Reaction Optimization External.ipynb.zip

@b-shields
Owner

Thanks for posting, and sorry for taking a while to respond. For now, this function will allow users to import external results into an edbo.bro.BO object.

import pandas as pd
from edbo.objective import objective

# Define a function to load experiments
def add_unindexed_experiments(bo, results_path):
    """
    EDBO is currently designed to be used from end to end. This function will
    load experimental data which is not indexed by the optimizers search space
    so that we can use the data that has already been collected without having
    to look up the indices.
    """
    
    # Import points and load the reaction space
    results = pd.read_csv(results_path)
    domain_points = results.iloc[:,:-1]
    index = bo.reaction.base_data[bo.reaction.index_headers].copy()

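    # Get corresponding points. Iterate to maintain order.
    union_index = []
    for i in range(len(domain_points)):
        # Inner merge on the shared columns finds the matching search space row
        match = pd.merge(index.reset_index(),
                         domain_points.iloc[[i]],
                         how='inner')['index']
        # Fail with a clear message if the row is not in the search space
        if match.empty:
            raise ValueError(f'Row {i} of {results_path} is not in the search space.')
        union_index.append(match.iloc[0])

    index_out = index.iloc[union_index]

    # Make sure points are aligned
    assert False not in (index_out.values == domain_points.values).flatten()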
    
    # Get encoded results
    encoded_results = bo.obj.domain.iloc[union_index].copy()
    encoded_results[results.columns.values[-1]] = results.iloc[:,-1].values 
    
    # Update the objective
    bo.obj = objective(domain=bo.obj.domain, results=encoded_results)
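
The function above treats every column except the last as a search space column and the last column as the target, and its alignment check compares values by position. A results file therefore has to list its columns in the same order as `bo.reaction.index_headers`, with the target last. Below is a minimal sketch of a pre-processing step that enforces this layout; `prepare_results`, the header list, and the sample data are hypothetical and not part of EDBO.

```python
import io
import pandas as pd

# Hypothetical stand-in for bo.reaction.index_headers
index_headers = ["ligand", "base", "temperature"]

# Hypothetical external results: target first, descriptors in a different order
raw_csv = io.StringIO(
    "yield,temperature,ligand,base\n"
    "42.0,80,L3,K2CO3\n"
    "17.5,80,L1,K2CO3\n"
)

def prepare_results(results, index_headers, target):
    """Reorder columns to the search space headers, with the target last."""
    expected = set(index_headers) | {target}
    found = set(results.columns)
    if found != expected:
        raise ValueError(
            f"Unexpected columns: missing {sorted(expected - found)}, "
            f"extra {sorted(found - expected)}"
        )
    return results[list(index_headers) + [target]]

results = prepare_results(pd.read_csv(raw_csv), index_headers, target="yield")
print(list(results.columns))
```

The reordered frame can then be saved with `results.to_csv(path, index=False)` and that path passed to `add_unindexed_experiments`.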

I'm leaving the issue open as a reminder to include a function for this in the next release.
