
Commit d494ba8

committed
fix
1 parent b7caf37 commit d494ba8

1 file changed

Lines changed: 24 additions & 19 deletions


doc/train/learning-rate.md

@@ -5,7 +5,7 @@ DeePMD-kit supports two learning rate schedules:
 - **`exp`**: Exponential decay with optional stepped or smooth mode
 - **`cosine`**: Cosine annealing for smooth decay curve

-Both schedules support optional warmup phase where the learning rate gradually increases from a small initial value to the target `start_lr`.
+Both schedules support an optional warmup phase where the learning rate gradually increases from a small initial value to the target `start_lr`.

 ## Quick Start

@@ -36,32 +36,37 @@ The following parameters are shared by both `exp` and `cosine` schedules.

 ### Required parameters

-| Parameter | Type | Description |
-| --------------- | ----- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `start_lr` | float | The learning rate at the start of training (after warmup). |
-| `stop_lr` | float | The learning rate at the end of training. **Mutually exclusive** with `stop_lr_ratio`. When `decay_rate` is explicitly set, this serves as the minimum learning rate. |
-| `stop_lr_ratio` | float | The ratio of `stop_lr` to `start_lr`. `stop_lr = start_lr * stop_lr_ratio`. **Mutually exclusive** with `stop_lr`. |
+| Parameter | Type | Description |
+| ---------- | ----- | ---------------------------------------------------------- |
+| `start_lr` | float | The learning rate at the start of training (after warmup). |

-You must provide exactly one of `stop_lr` or `stop_lr_ratio`.
+### Stopping learning rate
+
+You must specify exactly one of the following two mutually exclusive parameters:
+
+| Parameter | Type | Description |
+| --------------- | ----- | ------------------------------------------------------------------------------------------------------------------------ |
+| `stop_lr` | float | The learning rate at the end of training. When `decay_rate` is explicitly set, this serves as the minimum learning rate. |
+| `stop_lr_ratio` | float | The ratio of `stop_lr` to `start_lr`. Computed as `stop_lr = start_lr * stop_lr_ratio`. |

 ### Optional parameters

 | Parameter | Type | Default | Description |
 | --------------------- | ----- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `warmup_steps` | int | 0 | Number of steps for warmup. Learning rate increases linearly from `warmup_start_factor * start_lr` to `start_lr`. Mutually exclusive with `warmup_ratio`. |
-| `warmup_ratio` | float | None | Ratio of warmup steps to total training steps. `warmup_steps = int(warmup_ratio * num_steps)`. Mutually exclusive with `warmup_steps`. |
+| `warmup_ratio` | float | None | Ratio of warmup steps to total training steps. `warmup_steps = int(warmup_ratio * numb_steps)`. Mutually exclusive with `warmup_steps`. |
 | `warmup_start_factor` | float | 0.0 | Factor for initial warmup learning rate. Warmup starts from `warmup_start_factor * start_lr`. |
 | `scale_by_worker` | str | "linear" | How to alter learning rate in parallel training. Options: `"linear"`, `"sqrt"`, `"none"`. |

 ### Type-specific parameters

 #### Exponential decay (`type: "exp"`)

-| Parameter | Type | Default | Description |
-| ------------- | ----- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `decay_steps` | int | 5000 | Interval (in steps) at which learning rate decays. If `decay_steps` exceeds the total decay steps (`num_steps - warmup_steps`) and `decay_rate` is not provided, it will be automatically adjusted to a sensible default. |
-| `decay_rate` | float | None | Explicit decay rate. If not provided, computed from `start_lr` and `stop_lr`. |
-| `smooth` | bool | false | If `true`, use smooth exponential decay. If `false`, stepped decay. |
+| Parameter | Type | Default | Description |
+| ------------- | ----- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `decay_steps` | int | 5000 | Interval (in steps) at which learning rate decays. If `decay_steps` exceeds the total decay steps (`numb_steps - warmup_steps`) and `decay_rate` is not provided, it will be automatically adjusted to a sensible default. |
+| `decay_rate` | float | None | Explicit decay rate. If not provided, computed from `start_lr` and `stop_lr`. |
+| `smooth` | bool | false | If `true`, use smooth exponential decay. If `false`, stepped decay. |

 #### Cosine annealing (`type: "cosine"`)
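The "Stopping learning rate" subsection added in this hunk requires exactly one of `stop_lr` or `stop_lr_ratio`. The rule can be sketched in Python as follows. The function name `resolve_stop_lr` and the use of `None` to mean "not set" are assumptions made for illustration; this is not DeePMD-kit's actual validation code.

```python
def resolve_stop_lr(start_lr, stop_lr=None, stop_lr_ratio=None):
    """Return the final learning rate from exactly one of the two options.

    Hypothetical helper: the name and the None-means-unset convention
    are illustrative assumptions, not part of DeePMD-kit.
    """
    # Exactly one of the two parameters must be supplied.
    if (stop_lr is None) == (stop_lr_ratio is None):
        raise ValueError("provide exactly one of stop_lr or stop_lr_ratio")
    if stop_lr is not None:
        return stop_lr
    # Documented relation: stop_lr = start_lr * stop_lr_ratio
    return start_lr * stop_lr_ratio
```

For instance, `resolve_stop_lr(1e-3, stop_lr_ratio=0.01)` yields a stopping rate of about `1e-5`, while passing both arguments or neither raises `ValueError`.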

@@ -92,10 +97,10 @@ lr(t) = start_lr * decay_rate ^ ((t - warmup_steps) / decay_steps)
 If `decay_rate` is not explicitly provided, it is computed from `start_lr` and `stop_lr`:

 ```text
-decay_rate = (stop_lr / start_lr) ^ (decay_steps / (num_steps - warmup_steps))
+decay_rate = (stop_lr / start_lr) ^ (decay_steps / (numb_steps - warmup_steps))
 ```

-where `num_steps` is the total training steps from the training configuration.
+where `numb_steps` is the internal total number of training steps (derived from `training.numb_steps` in the training configuration).

 ### Examples
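As a quick check on the `decay_rate` formula in this hunk, the sketch below evaluates it together with the `lr(t)` expression quoted in the hunk header. The function names are hypothetical, and only the smooth form of the schedule is covered; it illustrates the arithmetic under those assumptions and does not reproduce the library's implementation.

```python
def exp_decay_rate(start_lr, stop_lr, decay_steps, numb_steps, warmup_steps=0):
    """decay_rate = (stop_lr / start_lr) ^ (decay_steps / (numb_steps - warmup_steps))."""
    return (stop_lr / start_lr) ** (decay_steps / (numb_steps - warmup_steps))


def exp_lr(t, start_lr, decay_rate, decay_steps, warmup_steps=0):
    """lr(t) = start_lr * decay_rate ^ ((t - warmup_steps) / decay_steps).

    Valid for the decay phase only (t >= warmup_steps).
    """
    return start_lr * decay_rate ** ((t - warmup_steps) / decay_steps)


# Example values chosen for illustration only.
rate = exp_decay_rate(start_lr=1e-3, stop_lr=1e-8, decay_steps=5000, numb_steps=1_000_000)
final_lr = exp_lr(1_000_000, start_lr=1e-3, decay_rate=rate, decay_steps=5000)
```

With a rate computed this way, the exponents cancel at `t = numb_steps`, so `final_lr` lands on `stop_lr` up to floating-point rounding.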

@@ -150,7 +155,7 @@ Learning rate starts from `0.0001` (i.e., `0.1 * 0.001`), increases linearly to
 }
 ```

-If `num_steps` is 1,000,000, warmup lasts 50,000 steps. Learning rate starts from `0.0` (default `warmup_start_factor`) and increases to `0.001`.
+If `numb_steps` is 1,000,000, warmup lasts 50,000 steps. Learning rate starts from `0.0` (default `warmup_start_factor`) and increases to `0.001`.

 **Smooth exponential decay:**

@@ -175,7 +180,7 @@ The cosine annealing schedule smoothly decreases the learning rate following a c
 During the decay phase (after warmup), the learning rate follows:

 ```text
-lr(t) = stop_lr + (start_lr - stop_lr) / 2 * (1 + cos(pi * (t - warmup_steps) / (num_steps - warmup_steps)))
+lr(t) = stop_lr + (start_lr - stop_lr) / 2 * (1 + cos(pi * (t - warmup_steps) / (numb_steps - warmup_steps)))
 ```

 At the middle of training (relative to decay phase), the learning rate is approximately `(start_lr + stop_lr) / 2`.
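
The cosine formula in this hunk translates directly into Python. The sketch below uses the hypothetical name `cosine_lr`; it is a transcription of the documented expression for the decay phase, not the library's own code.

```python
import math


def cosine_lr(t, start_lr, stop_lr, numb_steps, warmup_steps=0):
    """Cosine annealing during the decay phase (t >= warmup_steps)."""
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (t - warmup_steps) / (numb_steps - warmup_steps)
    return stop_lr + (start_lr - stop_lr) / 2 * (1 + math.cos(math.pi * progress))
```

At `progress = 0` the cosine term is 1 and the result is `start_lr`; at `progress = 1` it is -1 and the result is `stop_lr`; halfway through, the cosine vanishes, which gives the midpoint value `(start_lr + stop_lr) / 2` stated in the text.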
@@ -245,7 +250,7 @@ When `warmup_start_factor` is 0.0 (default), warmup starts from 0:
 You can specify warmup duration using either `warmup_steps` (absolute) or `warmup_ratio` (relative):

 - `warmup_steps`: Explicit number of warmup steps
-- `warmup_ratio`: Ratio of total training steps. Computed as `int(warmup_ratio * num_steps)`
+- `warmup_ratio`: Ratio of total training steps. Computed as `int(warmup_ratio * numb_steps)`, where `numb_steps` is derived from `training.numb_steps`

 These are mutually exclusive.
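
The warmup options touched by this hunk can be sketched as below. The helper names, the use of `None` for "not set", and the exact interpolation (reaching `start_lr` at `t = warmup_steps`) are assumptions for illustration; the documentation only states that warmup is linear from `warmup_start_factor * start_lr` to `start_lr`.

```python
def resolve_warmup_steps(numb_steps, warmup_steps=None, warmup_ratio=None):
    """Resolve the warmup length from at most one of the two options."""
    if warmup_steps is not None and warmup_ratio is not None:
        raise ValueError("warmup_steps and warmup_ratio are mutually exclusive")
    if warmup_ratio is not None:
        # Documented relation: warmup_steps = int(warmup_ratio * numb_steps)
        return int(warmup_ratio * numb_steps)
    # Documented default when neither option is given: no warmup.
    return warmup_steps if warmup_steps is not None else 0


def warmup_lr(t, start_lr, warmup_steps, warmup_start_factor=0.0):
    """Linear ramp from warmup_start_factor * start_lr up to start_lr."""
    if t >= warmup_steps:
        # Warmup is over (or disabled); the decay schedule takes over from here.
        return start_lr
    begin = warmup_start_factor * start_lr
    return begin + (start_lr - begin) * t / warmup_steps
```

The figures in the warmup-ratio example earlier in the diff (1,000,000 total steps, 50,000 warmup steps) imply a ratio of 0.05, and `resolve_warmup_steps(1_000_000, warmup_ratio=0.05)` returns `50000`.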

@@ -257,7 +262,7 @@ These are mutually exclusive.
 | ---------------------- | ---------------------------------------------------- |
 | $\tau$ | Global step index (0-indexed) |
 | $\tau^{\text{warmup}}$ | Number of warmup steps |
-| $\tau^{\text{decay}}$ | Number of decay steps = `num_steps - warmup_steps` |
+| $\tau^{\text{decay}}$ | Number of decay steps = `numb_steps - warmup_steps` |
 | $\gamma^0$ | `start_lr`: Learning rate at start of decay phase |
 | $\gamma^{\text{stop}}$ | `stop_lr`: Learning rate at end of training |
 | $f^{\text{warmup}}$ | `warmup_start_factor`: Initial warmup LR factor |
