
Improve quadratic runtime of Interchange 0.4.0 GMX export #1264

@timbernat

Description

I've been putting together some example notebooks that involve MD exports using Interchange, and noticed that the GROMACS export in particular was taking an inordinate amount of time. I decided to benchmark the export times to make that claim more precise:

Env setup:

mamba create -n interchange-export python=3.11 openff-interchange=0.4.0

Benchmark code: (took ~30 min to complete on my machine!)

from time import perf_counter
from tempfile import TemporaryDirectory
from collections import defaultdict

import numpy as np
import pandas as pd

from openff.toolkit import Molecule, ForceField
from openff.units import unit
from openff.interchange import Interchange
from openff.interchange.components._packmol import UNIT_CUBE


runtimes = defaultdict(dict)
DOPs : list[int] = [1, 5, 10, 50, 100, 500, 1000]

for DOP in DOPs:
    offmol = Molecule.from_smiles('CCO' * DOP)
    offmol.add_conformer(np.random.rand(offmol.n_atoms, 3) * unit.nanometer)
    offmol.assign_partial_charges(partial_charge_method='gasteiger')

    interchange = Interchange.from_smirnoff(
        force_field=ForceField('openff-2.0.0.offxml'),
        topology=offmol.to_topology(),
        charge_from_molecules=[offmol],
    )
    interchange.box = np.eye(3, dtype=float) * unit.nanometer
    
    with TemporaryDirectory() as tmpdir:
        ## OpenMM (return value unused; we only care about timing)
        omm_start = perf_counter()
        interchange.to_openmm()
        runtimes['OpenMM'][DOP] = perf_counter() - omm_start

        ## LAMMPS
        lmp_start = perf_counter()
        interchange.to_lammps(prefix=f'{tmpdir}/mol')
        runtimes['LAMMPS'][DOP] = perf_counter() - lmp_start

        ## GROMACS
        gmx_start = perf_counter()
        interchange.to_gromacs(prefix=f'{tmpdir}/mol')
        runtimes['GROMACS'][DOP] = perf_counter() - gmx_start

runtimes_df = pd.DataFrame.from_records(runtimes)
runtimes_df.index.name = 'N_atoms'  # NOTE: index is really the SMILES repeat count (DOP); atom count scales linearly with it
runtimes_df.to_csv('inc_export_times.csv')

Runtime data: inc_export_times.csv

Plotting results (note the log scale):

import pandas as pd
import matplotlib.pyplot as plt


runtimes_df = pd.read_csv('inc_export_times.csv', index_col=0)

fig, ax = plt.subplots()
ax.set_title('Interchange export times by MD format')
ax.set_xlabel('Number of atoms in Topology')
ax.set_ylabel('Runtime (seconds)')

for engine, times in runtimes_df.items():
    ax.loglog(runtimes_df.index, times, 'o-', label=engine)
ax.legend()

fig.savefig('inc_export_times.png')
[Figure: "Interchange export times by MD format", log-log plot of runtime vs. system size for OpenMM, LAMMPS, and GROMACS]

Log-log fit to estimate runtime prefactor and exponent: assuming $t = C \cdot N^\alpha$, taking logs gives $\log t = \log C + \alpha \log N$, so a degree-1 fit in log-log space recovers $\alpha$ as the slope and $\log C$ as the intercept (disregarding the first couple of "flat" points as outliers):

import numpy as np

aeq : str = '\u2248'
discard_first_n : int = 2
for engine, times in runtimes_df.items():
    slope, intercept = np.polyfit(
        np.log(runtimes_df.index[discard_first_n:]),
        np.log(times[discard_first_n:]),
        deg=1,
    )
    print(f'{engine}: T {aeq} O(N^({slope:.4f})) (prefactor {aeq} {np.exp(intercept):.4f})')

Output of above:

GROMACS: T ≈ O(N^(1.9678)) (prefactor ≈ 0.0017)
LAMMPS: T ≈ O(N^(0.9818)) (prefactor ≈ 0.0005)
OpenMM: T ≈ O(N^(0.9903)) (prefactor ≈ 0.0049)
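As a rough sanity check on the fit, extrapolating the fitted GROMACS power law to the largest benchmarked point (index value 1000) predicts a runtime on the order of the ~30 min the whole sweep took on my machine:

```python
# Sanity check: extrapolate the fitted power law t = C * N**alpha for
# GROMACS to the largest benchmarked system (index value 1000)
C, alpha = 0.0017, 1.9678  # prefactor and exponent from the fit above

t_pred = C * 1000 ** alpha
print(f'Predicted GROMACS export time at N=1000: {t_pred:.0f} s (~{t_pred / 60:.0f} min)')
```

i.e. the single largest GROMACS export alone accounts for most of the total benchmark wall time, which is what you'd expect if the writer really is quadratic.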

===

This isn't a huge amount of data, but it nevertheless points pretty clearly to something asymptotically slower going on in the GROMACS writer. Any thoughts on what might be causing the slowdown here?
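For what it's worth, one common way writers end up accidentally quadratic is a linear scan performed once per particle, e.g. deduplicating entries with `in` against a list instead of a set. A minimal sketch of that pattern (purely illustrative; `dedup_list`/`dedup_set` are hypothetical helpers, not Interchange's actual code):

```python
from time import perf_counter

def dedup_list(items: list[int]) -> list[int]:
    # Accidentally O(N^2): `in` on a list is a linear scan, done once per item
    seen = []
    for x in items:
        if x not in seen:
            seen.append(x)
    return seen

def dedup_set(items: list[int]) -> list[int]:
    # O(N): membership test against a set is amortized constant time
    seen, out = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

items = list(range(20_000))  # worst case: all entries distinct

start = perf_counter()
slow = dedup_list(items)
t_slow = perf_counter() - start

start = perf_counter()
fast = dedup_set(items)
t_fast = perf_counter() - start

assert slow == fast
print(f'list: {t_slow:.3f} s, set: {t_fast:.3f} s')
```

Doubling N roughly quadruples the list version's runtime but only doubles the set version's, which is the same ~N^2 vs. ~N^1 split as the fitted exponents above.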

Labels: feedback needed (could use feedback from users) · gromacs (relating to GROMACS) · question (further information is requested)
