Refactor LAMMPS example with MPI support and improved structure by yafshar · Pull Request #10 · openkim/kim-convergence

yafshar · 2026-01-26T16:58:12Z

Summary

This PR refactors the LAMMPS example to add proper MPI support and improve code organization.

Changes

File Structure

Renamed: run_length_control.py -> cr_rlc_lammps.py (avoids naming conflict with kim_convergence.run_length_control)
Added: cr_rlc_lammps_pkg/ package with modular components:
- _argument_parser.py: Parses LAMMPS-style arguments (variable, compute, fix, bounds)
- _trajectory_callback.py: Builds callback for trajectory extraction with MPI support
- driver.py: Main driver with MPI master-worker pattern

MPI Support

Master (rank 0): Runs convergence detection, directs execution
Workers (rank 1+): Receive run lengths via MPI, execute lmp.command(f"run {nstep}")
Uses mpi4py for communication (optional dependency, only needed for nprocs > 1)
Graceful exit: rank 0 broadcasts -1 to signal termination

Documentation

Updated README.md with:
- MPI architecture section
- mpi4py requirement note
- Clarification on function naming (run_length_control in cr_rlc_lammps.py vs package function)

Testing

Single-process run (lmp -in in.lj)
MPI run (mpirun -np 4 lmp -in in.lj)

Breaking Changes

Users must update their LAMMPS input scripts:

-python run_length_control input ... file run_length_control.py
+python run_length_control input ... file cr_rlc_lammps.py

Summary by CodeRabbit

New Features
- New LAMMPS Python entry point with orchestration, argument parsing, trajectory callback, and MPI master-worker support for run-length control.
Documentation
- README updated: new entry point, nevery-based argument naming, examples, input type clarifications, and return-value description.
Refactor
- Run-length control broken into modular components and reorganized; legacy single-file implementation removed.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

- Rename run_length_control.py to cr_rlc_lammps.py to avoid confusion with kim_convergence.run_length_control function - Add proper MPI support with master-worker pattern (rank 0 controls, workers execute runs) - Modularize code into cr_rlc_lammps_pkg/ with: - _argument_parser.py: Handle LAMMPS argument parsing - _trajectory_callback.py: Build trajectory extraction callback - driver.py: Orchestrate convergence detection and MPI communication - Add mpi4py as optional dependency for multi-process runs - Update README.md with: - MPI architecture documentation - mpi4py requirement note BREAKING CHANGE: File renamed from run_length_control.py to cr_rlc_lammps.py

coderabbitai · 2026-01-26T16:58:35Z

📝 Walkthrough

Walkthrough

Refactors the LAMMPS example by removing the monolithic run_length_control.py and adding a modular package: new entry point cr_rlc_lammps.py, plus cr_rlc_lammps_pkg containing an ArgumentParser, trajectory callback factory, and Driver; updates README and example inputs; adds MPI-aware orchestration.

Changes

Cohort / File(s)	Summary
Documentation `examples/lammps/README.md`	Update references to use `cr_rlc_lammps.py`; add MPI/master-worker notes; change examples to `nevery` usage; add Return value section and clarify input types.
Entry point `examples/lammps/cr_rlc_lammps.py`	New LAMMPS Python entry point defining `run_length_control(lmpptr, nevery: int, *argv)` that instantiates `Driver` and runs it.
Argument parsing `examples/lammps/cr_rlc_lammps_pkg/_argument_parser.py`	New `ArgumentParser` class: parses/validates CLI-like args, builds LAMMPS fix vector command, resolves variables/computes/fixes, handles bounds and population-stat options, returns structured config dict.
Trajectory callback `examples/lammps/cr_rlc_lammps_pkg/_trajectory_callback.py`	New `build_get_trajectory(lmp, config, nevery)` producing `get_trajectory(nstep, args)` closure; advances LAMMPS runs, extracts fix data, enforces bounds, handles single/multi-var shapes and MPI communication.
Driver / Orchestration `examples/lammps/cr_rlc_lammps_pkg/driver.py`	New `Driver` class wiring LAMMPS to `kim_convergence.run_length_control`; prepares params, coordinates root/worker MPI behavior, stores/prints `run_var`, and optionally dumps trajectory.
Example inputs `examples/lammps/in.lj`, `examples/lammps/in.lj2`, `examples/lammps/in.lj3`	Updated LAMMPS input files to invoke `cr_rlc_lammps.py` instead of `run_length_control.py`.
Removed `examples/lammps/run_length_control.py`	Removed legacy monolithic script; functionality refactored into the new package modules.

Sequence Diagram(s)

sequenceDiagram
  participant LMP as LAMMPS (Python)
  participant Driver as cr_rlc_lammps.Driver
  participant Parser as ArgumentParser
  participant KC as kim_convergence
  participant MPI as mpi4py (workers)

  LMP->>Driver: run_length_control(lmpptr, nevery, argv)
  Driver->>Parser: parse(lmp, nevery, argv) -> config
  Driver->>Driver: build_get_trajectory(lmp, config, nevery) -> get_trajectory
  Driver->>KC: run_length_control(get_trajectory, params...)
  KC->>get_trajectory: request trajectory(nstep, args)
  get_trajectory->>LMP: run steps / extract fix data
  alt MPI world_size > 1
    get_trajectory->>MPI: broadcast/recv step info / workers run local steps
  end
  get_trajectory-->>KC: return trajectory array
  KC-->>Driver: convergence report
  Driver->>LMP: set_string_variable("run_var", report) / print
  alt MPI world_size > 1
    Driver->>MPI: signal workers to exit
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Major overhaul: Modularize run_length_control, full docs, and deadlock fix #6: Refactors/modularizes kim_convergence.run_length_control which the new Driver invokes and integrates with.

Suggested labels

documentation

Poem

🐇 I parsed the args and hopped through the loop,
Built tidy trajectories, nevery in my scoop,
Driver at helm, MPI in tow,
Bounds checked, reports set to show—
A rabbit cheers for modular flow!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately reflects the main objective: refactoring the LAMMPS example code structure and adding MPI support as demonstrated by the new package organization and MPI-aware components.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

examples/lammps/README.md (1)
119-127: Typo: "varable" should be "variable".

This typo appears in multiple places in the examples section (lines 125, 145, 174).
📝 Suggested fix for line 125
-- `pea` is a string name of the LAMMPS varable
+- `pea` is a string name of the LAMMPS variable
The same fix should be applied to lines 145 and 174.

🤖 Fix all issues with AI agents

In `@examples/lammps/cr_rlc_lammps_pkg/_trajectory_callback.py`:
- Around line 64-88: The conditionals using truthiness for bounds (if lb and ub,
elif lb, elif ub) incorrectly treat 0.0 as False; update these checks in
_trajectory_callback.py to use explicit None comparisons (e.g., lb is not None
and ub is not None, lb is not None, ub is not None) so the loops that call
lmp.extract_fix and potentially raise cr.CRError still run when a bound is zero;
keep all other logic (args["nstep"], ncountmax, var_name, and the cr.CRError
messages) unchanged.

In `@examples/lammps/cr_rlc_lammps_pkg/driver.py`:
- Around line 104-148: The call to cr.run_length_control can raise and prevent
rank 0 from sending the worker exit signals, causing MPI deadlock; wrap the
cr.run_length_control(...) invocation in a try/finally so that the existing
comm.send(-1, dest=r, tag=56789) loop (for r in range(1, nprocs)) always runs in
the finally block even if run_length_control raises, and re-raise the exception
after sending the signals so error behavior is preserved; locate the call to
cr.run_length_control and the comm.send loop in driver.py to implement this
change.
- Around line 150-152: The code assumes LAMMPS has set_string_variable, which
may be missing; update the block around set_string_variable/print to check
hasattr(self.lmp, "set_string_variable") and call it when present, otherwise
fall back to issuing a LAMMPS command to create the string variable via
self.lmp.command("variable run_var string '...'" ) (ensure the report string is
properly escaped/quoted before embedding); keep the final
self.lmp.command('print "${run_var}"') unchanged so both paths produce the same
printed output.

In `@examples/lammps/README.md`:
- Around line 7-13: The README contains a punctuation error where the sentence
ends with the period inside the inline code backticks around cr_rlc_lammps.py
(`cr_rlc_lammps.py.`); update the README entry so the inline code is
`cr_rlc_lammps.py` and move the trailing period outside the closing backtick to
read `cr_rlc_lammps.py`. Ensure you edit the sentence referencing the Python
auxiliary file (cr_rlc_lammps.py) in the README so the punctuation is outside
the code span.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

…eptions

ilia-nikiforov-umn · 2026-01-26T18:22:32Z

Hi @yafshar , let me know when this is ready for me to take a look at. I believe I have a much quicker example that triggers the deadlock on Hydra, so testing should be faster

yafshar · 2026-01-26T18:24:00Z

@ilia-nikiforov-umn I believe this change should resolve the hang issue when running LAMMPS at scale with the kim-convergence package. Please try this PR. Note that there should no longer be a need to set any environment variables related to multithreading.
Just make sure to update the entry point to use the new orchestration when launching your run. You can refer to the in.lj input script as an example. If you need to modify any default runtime parameters, please update them in cr_rlc_lammps_pkg/driver.py.

yafshar added bug Something isn't working enhancement New feature or request labels Jan 26, 2026

yafshar assigned ilia-nikiforov-umn Jan 26, 2026

coderabbitai bot reviewed Jan 26, 2026

View reviewed changes

Comment thread examples/lammps/cr_rlc_lammps_pkg/_trajectory_callback.py Outdated

Comment thread examples/lammps/cr_rlc_lammps_pkg/driver.py

Comment thread examples/lammps/cr_rlc_lammps_pkg/driver.py

Comment thread examples/lammps/README.md Outdated

yafshar and others added 5 commits January 26, 2026 09:32

Update examples/lammps/cr_rlc_lammps_pkg/driver.py

a94a3e8

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

fix typo in the README

3c7e3cf

Replace checks with None checks to handle zero bounds correctly

01a0523

fix minor punctuation issue

fd5e432

Wrap run_length_control in try/finally to prevent MPI deadlock on exc…

bc13a69

…eptions

yafshar mentioned this pull request Mar 24, 2026

fix(tools): auto-batch long time series to prevent MPI/BLAS deadlock #11

Closed

Merge branch 'main' into refactor/lammps-example-mpi-support

0741998

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor LAMMPS example with MPI support and improved structure#10

Refactor LAMMPS example with MPI support and improved structure#10
yafshar wants to merge 7 commits intomainfrom
refactor/lammps-example-mpi-support

yafshar commented Jan 26, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Jan 26, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ilia-nikiforov-umn commented Jan 26, 2026

Uh oh!

yafshar commented Jan 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yafshar commented Jan 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

File Structure

MPI Support

Documentation

Testing

Breaking Changes

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jan 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ilia-nikiforov-umn commented Jan 26, 2026

Uh oh!

yafshar commented Jan 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

yafshar commented Jan 26, 2026 •

edited

Loading

coderabbitai bot commented Jan 26, 2026 •

edited

Loading