fix: use `numb_steps` in learning-rate docs and split out the stop-LR parameters

OutisLi · OutisLi · commit d494ba81b044 · 2026-03-01T12:03:42.000+08:00
diff --git a/doc/train/learning-rate.md b/doc/train/learning-rate.md
@@ -5,7 +5,7 @@ DeePMD-kit supports two learning rate schedules:
 - **`exp`**: Exponential decay with optional stepped or smooth mode
 - **`cosine`**: Cosine annealing for smooth decay curve
 
-Both schedules support optional warmup phase where the learning rate gradually increases from a small initial value to the target `start_lr`.
+Both schedules support an optional warmup phase where the learning rate gradually increases from a small initial value to the target `start_lr`.
 
 ## Quick Start
 
@@ -36,32 +36,37 @@ The following parameters are shared by both `exp` and `cosine` schedules.
 
 ### Required parameters
 
-| Parameter       | Type  | Description                                                                                                                                                           |
-| --------------- | ----- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `start_lr`      | float | The learning rate at the start of training (after warmup).                                                                                                            |
-| `stop_lr`       | float | The learning rate at the end of training. **Mutually exclusive** with `stop_lr_ratio`. When `decay_rate` is explicitly set, this serves as the minimum learning rate. |
-| `stop_lr_ratio` | float | The ratio of `stop_lr` to `start_lr`. `stop_lr = start_lr * stop_lr_ratio`. **Mutually exclusive** with `stop_lr`.                                                    |
+| Parameter  | Type  | Description                                                |
+| ---------- | ----- | ---------------------------------------------------------- |
+| `start_lr` | float | The learning rate at the start of training (after warmup). |
 
-You must provide exactly one of `stop_lr` or `stop_lr_ratio`.
+### Stopping learning rate
+
+You must specify exactly one of the following two mutually exclusive parameters:
+
+| Parameter       | Type  | Description                                                                                                              |
+| --------------- | ----- | ------------------------------------------------------------------------------------------------------------------------ |
+| `stop_lr`       | float | The learning rate at the end of training. When `decay_rate` is explicitly set, this serves as the minimum learning rate. |
+| `stop_lr_ratio` | float | The ratio of `stop_lr` to `start_lr`. Computed as `stop_lr = start_lr * stop_lr_ratio`.                                  |
 
 ### Optional parameters
 
 | Parameter             | Type  | Default  | Description                                                                                                                                               |
 | --------------------- | ----- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `warmup_steps`        | int   | 0        | Number of steps for warmup. Learning rate increases linearly from `warmup_start_factor * start_lr` to `start_lr`. Mutually exclusive with `warmup_ratio`. |
-| `warmup_ratio`        | float | None     | Ratio of warmup steps to total training steps. `warmup_steps = int(warmup_ratio * num_steps)`. Mutually exclusive with `warmup_steps`.                    |
+| `warmup_ratio`        | float | None     | Ratio of warmup steps to total training steps. `warmup_steps = int(warmup_ratio * numb_steps)`. Mutually exclusive with `warmup_steps`.                   |
 | `warmup_start_factor` | float | 0.0      | Factor for initial warmup learning rate. Warmup starts from `warmup_start_factor * start_lr`.                                                             |
 | `scale_by_worker`     | str   | "linear" | How to alter learning rate in parallel training. Options: `"linear"`, `"sqrt"`, `"none"`.                                                                 |
 
 ### Type-specific parameters
 
 #### Exponential decay (`type: "exp"`)
 
-| Parameter     | Type  | Default | Description                                                                                                                                                                                                               |
-| ------------- | ----- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `decay_steps` | int   | 5000    | Interval (in steps) at which learning rate decays. If `decay_steps` exceeds the total decay steps (`num_steps - warmup_steps`) and `decay_rate` is not provided, it will be automatically adjusted to a sensible default. |
-| `decay_rate`  | float | None    | Explicit decay rate. If not provided, computed from `start_lr` and `stop_lr`.                                                                                                                                             |
-| `smooth`      | bool  | false   | If `true`, use smooth exponential decay. If `false`, stepped decay.                                                                                                                                                       |
+| Parameter     | Type  | Default | Description                                                                                                                                                                                                                |
+| ------------- | ----- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `decay_steps` | int   | 5000    | Interval (in steps) at which learning rate decays. If `decay_steps` exceeds the total decay steps (`numb_steps - warmup_steps`) and `decay_rate` is not provided, it will be automatically adjusted to a sensible default. |
+| `decay_rate`  | float | None    | Explicit decay rate. If not provided, computed from `start_lr` and `stop_lr`.                                                                                                                                              |
+| `smooth`      | bool  | false   | If `true`, use smooth exponential decay. If `false`, stepped decay.                                                                                                                                                        |
 
 #### Cosine annealing (`type: "cosine"`)
 
@@ -92,10 +97,10 @@ lr(t) = start_lr * decay_rate ^ ((t - warmup_steps) / decay_steps)
 If `decay_rate` is not explicitly provided, it is computed from `start_lr` and `stop_lr`:
 
 ```text
-decay_rate = (stop_lr / start_lr) ^ (decay_steps / (num_steps - warmup_steps))
+decay_rate = (stop_lr / start_lr) ^ (decay_steps / (numb_steps - warmup_steps))
 ```
 
-where `num_steps` is the total training steps from the training configuration.
+where `numb_steps` is the internal total number of training steps (derived from `training.numb_steps` in the training configuration).
 
 ### Examples
 
@@ -150,7 +155,7 @@ Learning rate starts from `0.0001` (i.e., `0.1 * 0.001`), increases linearly to
 }
 ```
 
-If `num_steps` is 1,000,000, warmup lasts 50,000 steps. Learning rate starts from `0.0` (default `warmup_start_factor`) and increases to `0.001`.
+If `numb_steps` is 1,000,000, warmup lasts 50,000 steps. Learning rate starts from `0.0` (default `warmup_start_factor`) and increases to `0.001`.
 
 **Smooth exponential decay:**
 
@@ -175,7 +180,7 @@ The cosine annealing schedule smoothly decreases the learning rate following a c
 During the decay phase (after warmup), the learning rate follows:
 
 ```text
-lr(t) = stop_lr + (start_lr - stop_lr) / 2 * (1 + cos(pi * (t - warmup_steps) / (num_steps - warmup_steps)))
+lr(t) = stop_lr + (start_lr - stop_lr) / 2 * (1 + cos(pi * (t - warmup_steps) / (numb_steps - warmup_steps)))
 ```
 
 At the middle of training (relative to decay phase), the learning rate is approximately `(start_lr + stop_lr) / 2`.
@@ -245,7 +250,7 @@ When `warmup_start_factor` is 0.0 (default), warmup starts from 0:
 You can specify warmup duration using either `warmup_steps` (absolute) or `warmup_ratio` (relative):
 
 - `warmup_steps`: Explicit number of warmup steps
-- `warmup_ratio`: Ratio of total training steps. Computed as `int(warmup_ratio * num_steps)`
+- `warmup_ratio`: Ratio of total training steps. Computed as `int(warmup_ratio * numb_steps)`, where `numb_steps` is derived from `training.numb_steps`
 
 These are mutually exclusive.
 
@@ -257,7 +262,7 @@ These are mutually exclusive.
 | ---------------------- | ---------------------------------------------------- |
 | $\tau$                 | Global step index (0-indexed)                        |
 | $\tau^{\text{warmup}}$ | Number of warmup steps                               |
-| $\tau^{\text{decay}}$  | Number of decay steps = `num_steps - warmup_steps`   |
+| $\tau^{\text{decay}}$  | Number of decay steps = `numb_steps - warmup_steps`  |
 | $\gamma^0$             | `start_lr`: Learning rate at start of decay phase    |
 | $\gamma^{\text{stop}}$ | `stop_lr`: Learning rate at end of training          |
 | $f^{\text{warmup}}$    | `warmup_start_factor`: Initial warmup LR factor      |
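
Review note: the formulas touched by this diff can be sanity-checked with a small standalone sketch. The code below is not DeePMD-kit code; the function names are hypothetical, and only the parameter names (`start_lr`, `stop_lr`, `numb_steps`, `warmup_steps`, `warmup_ratio`, `warmup_start_factor`, `decay_steps`, `decay_rate`, `smooth`) come from the document. Two details are assumptions rather than statements from the doc: stepped mode is modelled by flooring the exponent, and warmup is modelled as a straight linear interpolation over `warmup_steps`.

```python
import math


def resolve_warmup_steps(numb_steps, warmup_steps=0, warmup_ratio=None):
    """Return the warmup length; the two options are mutually exclusive."""
    if warmup_ratio is not None:
        if warmup_steps:
            raise ValueError("warmup_steps and warmup_ratio are mutually exclusive")
        # Formula from the doc: int(warmup_ratio * numb_steps)
        return int(warmup_ratio * numb_steps)
    return warmup_steps


def warmup_lr(t, start_lr, warmup_steps, warmup_start_factor=0.0):
    """Linear ramp from warmup_start_factor * start_lr to start_lr.

    Assumption: the ramp reaches start_lr exactly at t == warmup_steps.
    """
    frac = t / warmup_steps
    return start_lr * (warmup_start_factor + (1.0 - warmup_start_factor) * frac)


def exp_decay_rate(start_lr, stop_lr, decay_steps, numb_steps, warmup_steps):
    """decay_rate = (stop_lr / start_lr) ^ (decay_steps / (numb_steps - warmup_steps))."""
    return (stop_lr / start_lr) ** (decay_steps / (numb_steps - warmup_steps))


def exp_lr(t, start_lr, decay_rate, decay_steps, warmup_steps, smooth=False):
    """lr(t) = start_lr * decay_rate ^ ((t - warmup_steps) / decay_steps)."""
    exponent = (t - warmup_steps) / decay_steps
    if not smooth:
        # Assumption: stepped mode only updates every decay_steps steps.
        exponent = math.floor(exponent)
    return start_lr * decay_rate**exponent


def cosine_lr(t, start_lr, stop_lr, numb_steps, warmup_steps):
    """Cosine annealing over the decay phase (numb_steps - warmup_steps)."""
    progress = (t - warmup_steps) / (numb_steps - warmup_steps)
    return stop_lr + (start_lr - stop_lr) / 2 * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    numb_steps = 1_000_000
    warmup = resolve_warmup_steps(numb_steps, warmup_ratio=0.05)
    rate = exp_decay_rate(1e-3, 1e-5, 5000, numb_steps, warmup)
    print("warmup_steps:", warmup)
    print("exp lr at end:", exp_lr(numb_steps, 1e-3, rate, 5000, warmup, smooth=True))
    print("cosine lr at end:", cosine_lr(numb_steps, 1e-3, 1e-5, numb_steps, warmup))
```

With `numb_steps` set to 1,000,000 and `warmup_ratio` set to 0.05, the sketch reproduces the 50,000-step warmup quoted in the diff, and both schedules land on `stop_lr` at the final step under the stated assumptions.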