Description
I've been putting together some example notebooks that involve MD exports using Interchange, and noticed that the GROMACS export in particular was taking an inordinately long time. To make that claim more precise, I benchmarked the export times for each format.
Env setup:

```shell
mamba create -n interchange-export python=3.11 openff-interchange=0.4.0
```
Benchmark code (took ~30 min to complete on my machine!):
```python
from time import perf_counter
from tempfile import TemporaryDirectory
from collections import defaultdict

import numpy as np
import pandas as pd

from openff.toolkit import Molecule, ForceField
from openff.units import unit
from openff.interchange import Interchange

runtimes = defaultdict(dict)
DOPs: list[int] = [1, 5, 10, 50, 100, 500, 1000]  # degrees of polymerization

for DOP in DOPs:
    # build a linear chain by repeating the CCO unit, with random coordinates
    # and cheap Gasteiger charges (geometry/charge quality doesn't matter here)
    offmol = Molecule.from_smiles('CCO' * DOP)
    offmol.add_conformer(np.random.rand(offmol.n_atoms, 3) * unit.nanometer)
    offmol.assign_partial_charges(partial_charge_method='gasteiger')

    interchange = Interchange.from_smirnoff(
        force_field=ForceField('openff-2.0.0.offxml'),
        topology=offmol.to_topology(),
        charge_from_molecules=[offmol],
    )
    interchange.box = np.eye(3, dtype=float) * unit.nanometer

    with TemporaryDirectory() as tmpdir:
        ## OpenMM
        omm_start = perf_counter()
        interchange.to_openmm()
        runtimes['OpenMM'][DOP] = perf_counter() - omm_start

        ## LAMMPS
        lmp_start = perf_counter()
        interchange.to_lammps(prefix=f'{tmpdir}/mol')
        runtimes['LAMMPS'][DOP] = perf_counter() - lmp_start

        ## GROMACS
        gmx_start = perf_counter()
        interchange.to_gromacs(prefix=f'{tmpdir}/mol')
        runtimes['GROMACS'][DOP] = perf_counter() - gmx_start

runtimes_df = pd.DataFrame(runtimes)  # columns: engines; index: DOP values
runtimes_df.index.name = 'DOP'  # atom count grows linearly with DOP (see below)
runtimes_df.to_csv('inc_export_times.csv')
```
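For reference, atom count is linear in DOP for this SMILES (each CCO repeat contributes 7 atoms once hydrogens are added, plus 2 terminal hydrogens), so fitting runtimes against DOP gives the same scaling exponent as fitting against atom count. A quick check:

```python
from openff.toolkit import Molecule

# n_atoms = 7 * DOP + 2 for this repeat unit (hydrogens included)
for DOP in (1, 2, 10):
    print(DOP, Molecule.from_smiles('CCO' * DOP).n_atoms)  # 9, 16, 72
```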
Runtime data: inc_export_times.csv
Plotting results (note the log scale):
```python
import pandas as pd
import matplotlib.pyplot as plt

runtimes_df = pd.read_csv('inc_export_times.csv', index_col=0)

fig, ax = plt.subplots()
ax.set_title('Interchange export times by MD format')
ax.set_xlabel('Degree of polymerization (DOP)')
ax.set_ylabel('Runtime (seconds)')

for engine, times in runtimes_df.items():
    ax.loglog(runtimes_df.index, times, 'o-', label=engine)
ax.legend()

fig.savefig('inc_export_times.png')
```

Log-log fit to estimate the runtime prefactor and exponent, assuming the runtime follows a power law in system size:
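To spell out the model behind the fit (with $A$ the prefactor and $b$ the exponent):

```math
T(N) \approx A N^{b} \quad\Longrightarrow\quad \ln T \approx \ln A + b \ln N
```

so a straight-line fit in log-log space recovers the exponent as the slope and the prefactor as $e^{\text{intercept}}$, which is what the snippet below prints.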
```python
import numpy as np

aeq: str = '\u2248'  # "approximately equal" symbol for printing
discard_first_n: int = 2  # drop the smallest systems, which are dominated by fixed overhead

for engine, times in runtimes_df.items():
    slope, intercept = np.polyfit(
        np.log(runtimes_df.index[discard_first_n:]),
        np.log(times[discard_first_n:]),
        deg=1,
    )
    print(f'{engine}: T {aeq} O(N^({slope:.4f})) (prefactor {aeq} {np.exp(intercept):.4f})')
```
Output of the above:

```
GROMACS: T ≈ O(N^(1.9678)) (prefactor ≈ 0.0017)
LAMMPS: T ≈ O(N^(0.9818)) (prefactor ≈ 0.0005)
OpenMM: T ≈ O(N^(0.9903)) (prefactor ≈ 0.0049)
```
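To put those exponents in perspective, here's a rough extrapolation from the fitted constants (a back-of-envelope sketch; the DOP = 10,000 case is hypothetical and assumes the power laws continue to hold beyond the benchmarked range):

```python
# Back-of-envelope extrapolation from the fitted constants above
fits = {
    'GROMACS': (0.0017, 1.9678),  # (prefactor A, exponent b), fit vs. DOP
    'LAMMPS':  (0.0005, 0.9818),
    'OpenMM':  (0.0049, 0.9903),
}
DOP = 10_000  # hypothetical: 10x larger than the largest system benchmarked
for engine, (A, b) in fits.items():
    print(f'{engine}: ~{A * DOP**b:,.0f} s')
# GROMACS comes out around 1e5 s (~1.5 days), while the
# linearly-scaling exporters stay under a minute
```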
---
This isn't a huge amount of data, but it points pretty clearly to the GROMACS writer scaling roughly quadratically with system size, while the OpenMM and LAMMPS exports scale roughly linearly. Any thoughts folks might have on what's causing the slowdown here?
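In case it's useful for narrowing things down, here's the profiling sketch I'd reach for next (a minimal example assuming an `interchange` object built as in the benchmark script above; I haven't attached profile output here):

```python
import cProfile
import pstats
from tempfile import TemporaryDirectory

# Profile a single GROMACS export to see which calls dominate
# (assumes `interchange` was built as in the benchmark script)
with TemporaryDirectory() as tmpdir:
    profiler = cProfile.Profile()
    profiler.enable()
    interchange.to_gromacs(prefix=f'{tmpdir}/mol')
    profiler.disable()

pstats.Stats(profiler).sort_stats('cumulative').print_stats(20)  # top 20 by cumulative time
```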