Refactor LAMMPS example with MPI support and improved structure#10
Refactor LAMMPS example with MPI support and improved structure#10
Conversation
- Rename run_length_control.py to cr_rlc_lammps.py to avoid confusion with kim_convergence.run_length_control function - Add proper MPI support with master-worker pattern (rank 0 controls, workers execute runs) - Modularize code into cr_rlc_lammps_pkg/ with: - _argument_parser.py: Handle LAMMPS argument parsing - _trajectory_callback.py: Build trajectory extraction callback - driver.py: Orchestrate convergence detection and MPI communication - Add mpi4py as optional dependency for multi-process runs - Update README.md with: - MPI architecture documentation - mpi4py requirement note BREAKING CHANGE: File renamed from run_length_control.py to cr_rlc_lammps.py
📝 WalkthroughWalkthroughRefactors the LAMMPS example by removing the monolithic Changes
Sequence Diagram(s)sequenceDiagram
participant LMP as LAMMPS (Python)
participant Driver as cr_rlc_lammps.Driver
participant Parser as ArgumentParser
participant KC as kim_convergence
participant MPI as mpi4py (workers)
LMP->>Driver: run_length_control(lmpptr, nevery, argv)
Driver->>Parser: parse(lmp, nevery, argv) -> config
Driver->>Driver: build_get_trajectory(lmp, config, nevery) -> get_trajectory
Driver->>KC: run_length_control(get_trajectory, params...)
KC->>get_trajectory: request trajectory(nstep, args)
get_trajectory->>LMP: run steps / extract fix data
alt MPI world_size > 1
get_trajectory->>MPI: broadcast/recv step info / workers run local steps
end
get_trajectory-->>KC: return trajectory array
KC-->>Driver: convergence report
Driver->>LMP: set_string_variable("run_var", report) / print
alt MPI world_size > 1
Driver->>MPI: signal workers to exit
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/lammps/README.md (1)
119-127: Typo: "varable" should be "variable".This typo appears in multiple places in the examples section (lines 125, 145, 174).
📝 Suggested fix for line 125
-- `pea` is a string name of the LAMMPS varable +- `pea` is a string name of the LAMMPS variableThe same fix should be applied to lines 145 and 174.
🤖 Fix all issues with AI agents
In `@examples/lammps/cr_rlc_lammps_pkg/_trajectory_callback.py`:
- Around line 64-88: The conditionals using truthiness for bounds (if lb and ub,
elif lb, elif ub) incorrectly treat 0.0 as False; update these checks in
_trajectory_callback.py to use explicit None comparisons (e.g., lb is not None
and ub is not None, lb is not None, ub is not None) so the loops that call
lmp.extract_fix and potentially raise cr.CRError still run when a bound is zero;
keep all other logic (args["nstep"], ncountmax, var_name, and the cr.CRError
messages) unchanged.
In `@examples/lammps/cr_rlc_lammps_pkg/driver.py`:
- Around line 104-148: The call to cr.run_length_control can raise and prevent
rank 0 from sending the worker exit signals, causing MPI deadlock; wrap the
cr.run_length_control(...) invocation in a try/finally so that the existing
comm.send(-1, dest=r, tag=56789) loop (for r in range(1, nprocs)) always runs in
the finally block even if run_length_control raises, and re-raise the exception
after sending the signals so error behavior is preserved; locate the call to
cr.run_length_control and the comm.send loop in driver.py to implement this
change.
- Around line 150-152: The code assumes LAMMPS has set_string_variable, which
may be missing; update the block around set_string_variable/print to check
hasattr(self.lmp, "set_string_variable") and call it when present, otherwise
fall back to issuing a LAMMPS command to create the string variable via
self.lmp.command("variable run_var string '...'" ) (ensure the report string is
properly escaped/quoted before embedding); keep the final
self.lmp.command('print "${run_var}"') unchanged so both paths produce the same
printed output.
In `@examples/lammps/README.md`:
- Around line 7-13: The README contains a punctuation error where the sentence
ends with the period inside the inline code backticks around cr_rlc_lammps.py
(`cr_rlc_lammps.py.`); update the README entry so the inline code is
`cr_rlc_lammps.py` and move the trailing period outside the closing backtick to
read `cr_rlc_lammps.py`. Ensure you edit the sentence referencing the Python
auxiliary file (cr_rlc_lammps.py) in the README so the punctuation is outside
the code span.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
Hi @yafshar , let me know when this is ready for me to take a look at. I believe I have a much quicker example that triggers the deadlock on Hydra, so testing should be faster |
|
@ilia-nikiforov-umn I believe this change should resolve the hang issue when running LAMMPS at scale with the kim-convergence package. Please try this PR. Note that there should no longer be a need to set any environment variables related to multithreading. |
Summary
This PR refactors the LAMMPS example to add proper MPI support and improve code organization.
Changes
File Structure
run_length_control.py->cr_rlc_lammps.py(avoids naming conflict withkim_convergence.run_length_control)cr_rlc_lammps_pkg/package with modular components:_argument_parser.py: Parses LAMMPS-style arguments (variable, compute, fix, bounds)_trajectory_callback.py: Builds callback for trajectory extraction with MPI supportdriver.py: Main driver with MPI master-worker patternMPI Support
lmp.command(f"run {nstep}")mpi4pyfor communication (optional dependency, only needed fornprocs > 1)-1to signal terminationDocumentation
README.mdwith:mpi4pyrequirement noterun_length_controlincr_rlc_lammps.pyvs package function)Testing
lmp -in in.lj)mpirun -np 4 lmp -in in.lj)Breaking Changes
Users must update their LAMMPS input scripts:
Summary by CodeRabbit
New Features
Documentation
Refactor
✏️ Tip: You can customize this high-level summary in your review settings.