Skip to content

Refactor LAMMPS example with MPI support and improved structure#10

Open
yafshar wants to merge 7 commits intomainfrom
refactor/lammps-example-mpi-support
Open

Refactor LAMMPS example with MPI support and improved structure#10
yafshar wants to merge 7 commits intomainfrom
refactor/lammps-example-mpi-support

Conversation

@yafshar
Copy link
Copy Markdown
Member

@yafshar yafshar commented Jan 26, 2026

Summary

This PR refactors the LAMMPS example to add proper MPI support and improve code organization.

Changes

File Structure

  • Renamed: run_length_control.py -> cr_rlc_lammps.py (avoids naming conflict with kim_convergence.run_length_control)
  • Added: cr_rlc_lammps_pkg/ package with modular components:
    • _argument_parser.py: Parses LAMMPS-style arguments (variable, compute, fix, bounds)
    • _trajectory_callback.py: Builds callback for trajectory extraction with MPI support
    • driver.py: Main driver with MPI master-worker pattern

MPI Support

  • Master (rank 0): Runs convergence detection, directs execution
  • Workers (rank 1+): Receive run lengths via MPI, execute lmp.command(f"run {nstep}")
  • Uses mpi4py for communication (optional dependency, only needed for nprocs > 1)
  • Graceful exit: rank 0 broadcasts -1 to signal termination

Documentation

  • Updated README.md with:
    • MPI architecture section
    • mpi4py requirement note
    • Clarification on function naming (run_length_control in cr_rlc_lammps.py vs package function)

Testing

  • Single-process run (lmp -in in.lj)
  • MPI run (mpirun -np 4 lmp -in in.lj)

Breaking Changes

Users must update their LAMMPS input scripts:

-python run_length_control input ... file run_length_control.py
+python run_length_control input ... file cr_rlc_lammps.py

Summary by CodeRabbit

  • New Features

    • New LAMMPS Python entry point with orchestration, argument parsing, trajectory callback, and MPI master-worker support for run-length control.
  • Documentation

    • README updated: new entry point, nevery-based argument naming, examples, input type clarifications, and return-value description.
  • Refactor

    • Run-length control broken into modular components and reorganized; legacy single-file implementation removed.

✏️ Tip: You can customize this high-level summary in your review settings.

- Rename run_length_control.py to cr_rlc_lammps.py to avoid confusion
  with kim_convergence.run_length_control function
- Add proper MPI support with master-worker pattern (rank 0 controls,
  workers execute runs)
- Modularize code into cr_rlc_lammps_pkg/ with:
  - _argument_parser.py: Handle LAMMPS argument parsing
  - _trajectory_callback.py: Build trajectory extraction callback
  - driver.py: Orchestrate convergence detection and MPI communication
- Add mpi4py as optional dependency for multi-process runs
- Update README.md with:
  - MPI architecture documentation
  - mpi4py requirement note

BREAKING CHANGE: File renamed from run_length_control.py to cr_rlc_lammps.py
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 26, 2026

📝 Walkthrough

Walkthrough

Refactors the LAMMPS example by removing the monolithic run_length_control.py and adding a modular package: new entry point cr_rlc_lammps.py, plus cr_rlc_lammps_pkg containing an ArgumentParser, trajectory callback factory, and Driver; updates README and example inputs; adds MPI-aware orchestration.

Changes

Cohort / File(s) Summary
Documentation
examples/lammps/README.md
Update references to use cr_rlc_lammps.py; add MPI/master-worker notes; change examples to nevery usage; add Return value section and clarify input types.
Entry point
examples/lammps/cr_rlc_lammps.py
New LAMMPS Python entry point defining run_length_control(lmpptr, nevery: int, *argv) that instantiates Driver and runs it.
Argument parsing
examples/lammps/cr_rlc_lammps_pkg/_argument_parser.py
New ArgumentParser class: parses/validates CLI-like args, builds LAMMPS fix vector command, resolves variables/computes/fixes, handles bounds and population-stat options, returns structured config dict.
Trajectory callback
examples/lammps/cr_rlc_lammps_pkg/_trajectory_callback.py
New build_get_trajectory(lmp, config, nevery) producing get_trajectory(nstep, args) closure; advances LAMMPS runs, extracts fix data, enforces bounds, handles single/multi-var shapes and MPI communication.
Driver / Orchestration
examples/lammps/cr_rlc_lammps_pkg/driver.py
New Driver class wiring LAMMPS to kim_convergence.run_length_control; prepares params, coordinates root/worker MPI behavior, stores/prints run_var, and optionally dumps trajectory.
Example inputs
examples/lammps/in.lj, examples/lammps/in.lj2, examples/lammps/in.lj3
Updated LAMMPS input files to invoke cr_rlc_lammps.py instead of run_length_control.py.
Removed
examples/lammps/run_length_control.py
Removed legacy monolithic script; functionality refactored into the new package modules.

Sequence Diagram(s)

sequenceDiagram
  participant LMP as LAMMPS (Python)
  participant Driver as cr_rlc_lammps.Driver
  participant Parser as ArgumentParser
  participant KC as kim_convergence
  participant MPI as mpi4py (workers)

  LMP->>Driver: run_length_control(lmpptr, nevery, argv)
  Driver->>Parser: parse(lmp, nevery, argv) -> config
  Driver->>Driver: build_get_trajectory(lmp, config, nevery) -> get_trajectory
  Driver->>KC: run_length_control(get_trajectory, params...)
  KC->>get_trajectory: request trajectory(nstep, args)
  get_trajectory->>LMP: run steps / extract fix data
  alt MPI world_size > 1
    get_trajectory->>MPI: broadcast/recv step info / workers run local steps
  end
  get_trajectory-->>KC: return trajectory array
  KC-->>Driver: convergence report
  Driver->>LMP: set_string_variable("run_var", report) / print
  alt MPI world_size > 1
    Driver->>MPI: signal workers to exit
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

documentation

Poem

🐇 I parsed the args and hopped through the loop,
Built tidy trajectories, nevery in my scoop,
Driver at helm, MPI in tow,
Bounds checked, reports set to show—
A rabbit cheers for modular flow!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main objective: refactoring the LAMMPS example code structure and adding MPI support as demonstrated by the new package organization and MPI-aware components.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@yafshar yafshar added bug Something isn't working enhancement New feature or request labels Jan 26, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/lammps/README.md (1)

119-127: Typo: "varable" should be "variable".

This typo appears in multiple places in the examples section (lines 125, 145, 174).

📝 Suggested fix for line 125
-- `pea` is a string name of the LAMMPS varable
+- `pea` is a string name of the LAMMPS variable

The same fix should be applied to lines 145 and 174.

🤖 Fix all issues with AI agents
In `@examples/lammps/cr_rlc_lammps_pkg/_trajectory_callback.py`:
- Around line 64-88: The conditionals using truthiness for bounds (if lb and ub,
elif lb, elif ub) incorrectly treat 0.0 as False; update these checks in
_trajectory_callback.py to use explicit None comparisons (e.g., lb is not None
and ub is not None, lb is not None, ub is not None) so the loops that call
lmp.extract_fix and potentially raise cr.CRError still run when a bound is zero;
keep all other logic (args["nstep"], ncountmax, var_name, and the cr.CRError
messages) unchanged.

In `@examples/lammps/cr_rlc_lammps_pkg/driver.py`:
- Around line 104-148: The call to cr.run_length_control can raise and prevent
rank 0 from sending the worker exit signals, causing MPI deadlock; wrap the
cr.run_length_control(...) invocation in a try/finally so that the existing
comm.send(-1, dest=r, tag=56789) loop (for r in range(1, nprocs)) always runs in
the finally block even if run_length_control raises, and re-raise the exception
after sending the signals so error behavior is preserved; locate the call to
cr.run_length_control and the comm.send loop in driver.py to implement this
change.
- Around line 150-152: The code assumes LAMMPS has set_string_variable, which
may be missing; update the block around set_string_variable/print to check
hasattr(self.lmp, "set_string_variable") and call it when present, otherwise
fall back to issuing a LAMMPS command to create the string variable via
self.lmp.command("variable run_var string '...'" ) (ensure the report string is
properly escaped/quoted before embedding); keep the final
self.lmp.command('print "${run_var}"') unchanged so both paths produce the same
printed output.

In `@examples/lammps/README.md`:
- Around line 7-13: The README contains a punctuation error where the sentence
ends with the period inside the inline code backticks around cr_rlc_lammps.py
(`cr_rlc_lammps.py.`); update the README entry so the inline code is
`cr_rlc_lammps.py` and move the trailing period outside the closing backtick to
read `cr_rlc_lammps.py`. Ensure you edit the sentence referencing the Python
auxiliary file (cr_rlc_lammps.py) in the README so the punctuation is outside
the code span.

Comment thread examples/lammps/cr_rlc_lammps_pkg/_trajectory_callback.py Outdated
Comment thread examples/lammps/cr_rlc_lammps_pkg/driver.py
Comment thread examples/lammps/cr_rlc_lammps_pkg/driver.py
Comment thread examples/lammps/README.md Outdated
@ilia-nikiforov-umn
Copy link
Copy Markdown
Member

Hi @yafshar , let me know when this is ready for me to take a look at. I believe I have a much quicker example that triggers the deadlock on Hydra, so testing should be faster

@yafshar
Copy link
Copy Markdown
Member Author

yafshar commented Jan 26, 2026

@ilia-nikiforov-umn I believe this change should resolve the hang issue when running LAMMPS at scale with the kim-convergence package. Please try this PR. Note that there should no longer be a need to set any environment variables related to multithreading.
Just make sure to update the entry point to use the new orchestration when launching your run. You can refer to the in.lj input script as an example. If you need to modify any default runtime parameters, please update them in cr_rlc_lammps_pkg/driver.py.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants