This repository contains comp chem/ML-ready versions of the data in https://github.com/asapdiscovery/COVID_moonshot_submissions. All of this data was downloaded from the COVID Moonshot CDD Vault.
The IC50 values in this repository were measured using the fluorescence inhibition assay described in:
Melissa L. Boby, Daren Fearon, Matteo Ferla, Mihajlo Filep, Lizbé Koekemoer, Matthew C. Robinson, The COVID Moonshot Consortium, John D. Chodera, Alpha A Lee, Nir London, Annette von Delft, Frank von Delft. “Open Science Discovery of Potent Non-Covalent SARS-CoV-2 Main Protease Inhibitors.” bioRxiv, September 06, 2023. https://www.biorxiv.org/content/10.1101/2020.10.29.339317v5.
On suspected_SMILES
: Many of the enantiopure compounds in this project were obtained by chiral separation, so they are often available as a single enantiomer with unknown absolute stereochemistry. The suspected_SMILES column represents the current understanding of each compound's actual identity given the available information, which explains the frequent use of enhanced/relative stereochemistry representations.
cdd_noncovalent_dates_2023_10_18_filt.csv
: All noncovalent available data as of October 10, 2023, filtered to remove data without a noted SMILES string or IC50 value. Note that some of the entries in this dataset are racemates.
cdd_achiral_enantiopure_dates_2023_10_18_filt.csv
: All noncovalent available data as of October 10, 2023, filtered to contain only achiral and enantiopure molecules.
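
The "no SMILES string or IC50 value" filter described above can be illustrated with a minimal sketch. This is not the script used to produce the files in this repository. The rows and compound names are hypothetical placeholders, and only the column names are taken from this README.

```python
import csv
import io

# Hypothetical rows standing in for an unfiltered export; only the column
# names (suspected_SMILES, Canonical_PostEra_ID, IC50_(µM)) come from this README.
RAW = """suspected_SMILES,Canonical_PostEra_ID,IC50_(µM)
CCO,EXAMPLE-001,1.5
,EXAMPLE-002,0.3
c1ccccc1,EXAMPLE-003,
"""


def has_smiles_and_ic50(row):
    """Return True only if the row has both a SMILES string and an IC50 value."""
    smiles = (row.get("suspected_SMILES") or "").strip()
    ic50 = (row.get("IC50_(µM)") or "").strip()
    return bool(smiles) and bool(ic50)


rows = list(csv.DictReader(io.StringIO(RAW)))
kept = [row for row in rows if has_smiles_and_ic50(row)]

# Only the first hypothetical row has both fields populated.
print([row["Canonical_PostEra_ID"] for row in kept])
```

To apply the same check to a real file, the `io.StringIO(RAW)` call could be replaced with `open(path, newline="", encoding="utf-8")`, assuming the CSV is UTF-8 encoded.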
Below is a brief explanation of each column present in the CSV files.
suspected_SMILES
: SMILES string of the synthesized molecule
Canonical_PostEra_ID
: Molecule's canonical name
IC50_(µM)
: Experimental IC50 value, in µM units. Unless otherwise noted in the file description, these values are semi-quantitative, meaning that some values may have been outside the range of the assay used. Whether a given measurement is semi-quantitative is indicated by the Semiquant field
Semiquant
: Whether the given compound was too strong a binder to be resolved ("Strong"), too weak a binder to be resolved ("Weak"), or within the assay range ("Quant")
IC50_CI_(Lower)_(µM)
: Lower bound of the 95% CI of the experimental IC50 value, in µM units
IC50_CI_(Upper)_(µM)
: Upper bound of the 95% CI of the experimental IC50 value, in µM units
Hill_slope
: Calculated slope of the Hill equation for this measurement
CDD_curve_class
: Calculated curve class from CDD for this measurement
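
As an illustration of how the Semiquant field might be used, the sketch below separates in-range ("Quant") measurements from out-of-range ones and converts the in-range IC50 values to pIC50. The sample rows and compound names are hypothetical. The pIC50 conversion is a common convention rather than something defined in this repository, and handling out-of-range values separately is an assumption about downstream use.

```python
import csv
import io
import math

# Hypothetical rows; only the column names and the Semiquant labels
# ("Quant", "Weak", "Strong") come from this README.
SAMPLE = """Canonical_PostEra_ID,IC50_(µM),Semiquant
EXAMPLE-001,0.1,Quant
EXAMPLE-002,99.5,Weak
EXAMPLE-003,10.0,Quant
"""


def pic50_from_um(ic50_um):
    """Convert an IC50 in µM to pIC50 = -log10(IC50 in M) = 6 - log10(IC50 in µM)."""
    return 6.0 - math.log10(ic50_um)


quant = {}     # compound name -> pIC50, for measurements within the assay range
censored = {}  # compound name -> "Strong" or "Weak", for out-of-range measurements

for row in csv.DictReader(io.StringIO(SAMPLE)):
    name = row["Canonical_PostEra_ID"]
    if row["Semiquant"] == "Quant":
        quant[name] = pic50_from_um(float(row["IC50_(µM)"]))
    else:
        # Out-of-range values are kept aside rather than treated as exact.
        censored[name] = row["Semiquant"]

print(sorted(quant))
print(censored)
```

Keeping the "Strong" and "Weak" rows in a separate mapping leaves the choice open between discarding them and modeling them as bounds.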