Commit 1db331e
[core][autoscaler] Retry GCP project metadata updates on HTTP 412 errors (ray-project#60429)
When the autoscaler launches a Ray cluster on GCP, it adds a new
SSH key to the project metadata if necessary. The update may result
in an HTTP 412 precondition failure if there are concurrent attempts to
update the metadata. The error looks like this:
```text
googleapiclient.errors.HttpError: <HttpError 412 when requesting https://compute.googleapis.com/compute/v1/projects/my_gcp_project/setCommonInstanceMetadata?alt=json returned "Supplied fingerprint does not match current metadata fingerprint.". Details: "[{'message': 'Supplied fingerprint does not match current metadata fingerprint.', 'domain': 'global', 'reason': 'conditionNotMet', 'location': 'If-Match', 'locationType': 'header'}]">
```
The error can only be resolved by retrying. Therefore, to provide a
better user experience, this PR retries automatically on the user's
behalf:
1. Catch the HTTP 412 error.
2. Reload the metadata to pick up the current fingerprint, then apply the update again.
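
The two steps above can be sketched as a small reload-and-retry loop. This is only an illustration under stated assumptions, not the code in this commit: `PreconditionFailed`, `FakeProjectMetadata`, `update_with_retry`, and the `max_attempts` and `backoff_s` parameters are all hypothetical names, and the in-memory store merely imitates the fingerprint check that produces the 412.

```python
import time


class PreconditionFailed(Exception):
    """Hypothetical stand-in for an HTTP 412 response (not a real client class)."""

    status = 412


class FakeProjectMetadata:
    """Hypothetical in-memory store that imitates a fingerprint check."""

    def __init__(self):
        self.fingerprint = 0
        self.items = {}
        self.conflicts_to_inject = 0  # number of simulated concurrent writers

    def load(self):
        # Return a copy together with the fingerprint it was read at.
        return {"fingerprint": self.fingerprint, "items": dict(self.items)}

    def save(self, metadata):
        if self.conflicts_to_inject > 0:
            # Simulate another writer changing the metadata first.
            self.conflicts_to_inject -= 1
            self.fingerprint += 1
        if metadata["fingerprint"] != self.fingerprint:
            raise PreconditionFailed(
                "Supplied fingerprint does not match current metadata fingerprint."
            )
        self.items = dict(metadata["items"])
        self.fingerprint += 1
        return self.fingerprint


def update_with_retry(load, apply_change, save, max_attempts=5, backoff_s=0.5):
    """Reload, re-apply the change, and save again whenever a 412 occurs."""
    for attempt in range(1, max_attempts + 1):
        metadata = load()  # always start from the current fingerprint
        apply_change(metadata)
        try:
            return save(metadata)
        except PreconditionFailed:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            time.sleep(backoff_s * attempt)


def add_ssh_key(metadata):
    # Placeholder key value, purely for illustration.
    metadata["items"]["ssh-keys"] = "ubuntu:ssh-rsa AAAA"


store = FakeProjectMetadata()
store.conflicts_to_inject = 2  # the first two saves hit a stale fingerprint
final_fingerprint = update_with_retry(store.load, add_ssh_key, store.save, backoff_s=0.0)
```

Reloading inside the loop matters: retrying the save with the stale fingerprint would fail again, whereas a fresh read lets the change be re-applied on top of whatever the concurrent writer stored.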
---------
Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>

1 parent da5a1e3 commit 1db331e
1 file changed, 18 additions and 1 deletion (original line 552 replaced by new lines 552-569)