
Improve quadratic runtime of Interchange 0.4.0 GMX export #1264

@timbernat

Description

I've been putting together some example notebooks that involve MD exports using Interchange, and noticed that the GROMACS export in particular was taking an inordinate amount of time. I decided to benchmark the export times to make that claim more precise:

Env setup:

mamba create -n interchange-export python=3.11 openff-interchange=0.4.0

Benchmark code: (took ~30 min to complete on my machine!)

from time import perf_counter
from tempfile import TemporaryDirectory
from collections import defaultdict

import numpy as np
import pandas as pd

from openff.toolkit import Molecule, ForceField
from openff.units import unit
from openff.interchange import Interchange
from openff.interchange.components._packmol import UNIT_CUBE


runtimes = defaultdict(dict)
DOPs : list[int] = [1, 5, 10, 50, 100, 500, 1000]

for DOP in DOPs:
    offmol = Molecule.from_smiles('CCO' * DOP)
    offmol.add_conformer(np.random.rand(offmol.n_atoms, 3) * unit.nanometer)
    offmol.assign_partial_charges(partial_charge_method='gasteiger')

    interchange = Interchange.from_smirnoff(
        force_field=ForceField('openff-2.0.0.offxml'),
        topology=offmol.to_topology(),
        charge_from_molecules=[offmol],
    )
    interchange.box = np.eye(3, dtype=float) * unit.nanometer
    
    with TemporaryDirectory() as tmpdir:
        ## OpenMM (return value unused; we only care about timing)
        omm_start = perf_counter()
        interchange.to_openmm()
        runtimes['OpenMM'][DOP] = perf_counter() - omm_start

        ## LAMMPS
        lmp_start = perf_counter()
        interchange.to_lammps(prefix=f'{tmpdir}/mol')
        runtimes['LAMMPS'][DOP] = perf_counter() - lmp_start

        ## GROMACS
        gmx_start = perf_counter()
        interchange.to_gromacs(prefix=f'{tmpdir}/mol')
        runtimes['GROMACS'][DOP] = perf_counter() - gmx_start

runtimes_df = pd.DataFrame.from_records(runtimes)
runtimes_df.index.name = 'N_atoms'  # NOTE: index is really the SMILES repeat count (DOP); atom count scales linearly with it
runtimes_df.to_csv('inc_export_times.csv')

Runtime data: inc_export_times.csv

Plotting results (note the log scale):

import pandas as pd
import matplotlib.pyplot as plt


runtimes_df = pd.read_csv('inc_export_times.csv', index_col=0)

fig, ax = plt.subplots()
ax.set_title('Interchange export times by MD format')
ax.set_xlabel('Number of atoms in Topology')
ax.set_ylabel('Runtime (seconds)')

for engine, times in runtimes_df.items():
    ax.loglog(runtimes_df.index, times, 'o-', label=engine)
ax.legend()

fig.savefig('inc_export_times.png')
[Figure: "Interchange export times by MD format", log-log plot of runtime vs. system size for OpenMM, LAMMPS, and GROMACS]

Log-log fit to estimate runtime prefactor and exponent: assuming $t = C \cdot N^\alpha$, taking logs gives $\log t = \log C + \alpha \log N$, so a degree-1 fit in log-log space recovers $\alpha$ as the slope and $\log C$ as the intercept (disregarding the first couple of "flat" points as outliers):

import numpy as np

aeq : str = '\u2248'
discard_first_n : int = 2
for engine, times in runtimes_df.items():
    slope, intercept = np.polyfit(
        np.log(runtimes_df.index[discard_first_n:]),
        np.log(times[discard_first_n:]),
        deg=1,
    )
    print(f'{engine}: T {aeq} O(N^({slope:.4f})) (prefactor {aeq} {np.exp(intercept):.4f})')

Output of above:

GROMACS: T ≈ O(N^(1.9678)) (prefactor ≈ 0.0017)
LAMMPS: T ≈ O(N^(0.9818)) (prefactor ≈ 0.0005)
OpenMM: T ≈ O(N^(0.9903)) (prefactor ≈ 0.0049)
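As a rough sanity check on the fit, extrapolating the fitted GROMACS power law to the largest benchmarked point (index value 1000) predicts a runtime on the order of the ~30 min the whole sweep took on my machine:

```python
# Sanity check: extrapolate the fitted power law t = C * N**alpha for
# GROMACS to the largest benchmarked system (index value 1000)
C, alpha = 0.0017, 1.9678  # prefactor and exponent from the fit above

t_pred = C * 1000 ** alpha
print(f'Predicted GROMACS export time at N=1000: {t_pred:.0f} s (~{t_pred / 60:.0f} min)')
```

i.e. the single largest GROMACS export alone accounts for most of the total benchmark wall time, which is what you'd expect if the writer really is quadratic.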

===

This isn't a huge amount of data, but it nevertheless points pretty clearly to something asymptotically slower going on in the GROMACS writer. Any thoughts on what might be causing the slowdown here?
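For what it's worth, one common way writers end up accidentally quadratic is a linear scan performed once per particle, e.g. deduplicating entries with `in` against a list instead of a set. A minimal sketch of that pattern (purely illustrative; `dedup_list`/`dedup_set` are hypothetical helpers, not Interchange's actual code):

```python
from time import perf_counter

def dedup_list(items: list[int]) -> list[int]:
    # Accidentally O(N^2): `in` on a list is a linear scan, done once per item
    seen = []
    for x in items:
        if x not in seen:
            seen.append(x)
    return seen

def dedup_set(items: list[int]) -> list[int]:
    # O(N): membership test against a set is amortized constant time
    seen, out = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

items = list(range(20_000))  # worst case: all entries distinct

start = perf_counter()
slow = dedup_list(items)
t_slow = perf_counter() - start

start = perf_counter()
fast = dedup_set(items)
t_fast = perf_counter() - start

assert slow == fast
print(f'list: {t_slow:.3f} s, set: {t_fast:.3f} s')
```

Doubling N roughly quadruples the list version's runtime but only doubles the set version's, which is the same ~N^2 vs. ~N^1 split as the fitted exponents above.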

Labels: feedback needed (could use feedback from users) · gromacs (relating to GROMACS) · question (further information is requested)
