Commit ddef397
[core][autoscaler] Retry GCP project metadata updates on HTTP 412 errors (ray-project#60429)
When the autoscaler launches a Ray cluster on GCP, it adds a new SSH key
to the project metadata if necessary. This update can fail with an
HTTP 412 (Precondition Failed) error if another process updates the
metadata concurrently. The error looks like this:
```python
googleapiclient.errors.HttpError: <HttpError 412 when requesting https://compute.googleapis.com/compute/v1/projects/my_gcp_project/setCommonInstanceMetadata?alt=json returned "Supplied fingerprint does not match current metadata fingerprint.". Details: "[{'message': 'Supplied fingerprint does not match current metadata fingerprint.', 'domain': 'global', 'reason': 'conditionNotMet', 'location': 'If-Match', 'locationType': 'header'}]">
```
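The 412 comes from the fingerprint check on project metadata: a `setCommonInstanceMetadata` request must carry the fingerprint from the caller's last read (sent via `If-Match`, per the error above), and the server rejects it if another writer changed the metadata in between. Below is a minimal in-memory simulation of that check. `FakeProjectMetadata` and `PreconditionFailed` are hypothetical stand-ins, not the Compute Engine client, and metadata items are simplified to a plain dict.

```python
import hashlib
import json


class PreconditionFailed(Exception):
    """Hypothetical stand-in for an HTTP 412 response (not googleapiclient's HttpError)."""


class FakeProjectMetadata:
    """Hypothetical in-memory model of project metadata guarded by a fingerprint."""

    def __init__(self, items=None):
        self._items = dict(items or {})

    def _fingerprint(self):
        # Derive the fingerprint from the current contents; any change yields a new value.
        payload = json.dumps(self._items, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()[:16]

    def get(self):
        # Return a copy of the items plus the fingerprint the caller must echo back on write.
        return {"items": dict(self._items), "fingerprint": self._fingerprint()}

    def set(self, items, fingerprint):
        # Reject the write if the metadata changed since the caller's read.
        if fingerprint != self._fingerprint():
            raise PreconditionFailed(
                "Supplied fingerprint does not match current metadata fingerprint."
            )
        self._items = dict(items)


# Two launchers read the same snapshot, then both try to write their own SSH key.
store = FakeProjectMetadata({"ssh-keys": ""})
snapshot_a = store.get()
snapshot_b = store.get()

store.set({"ssh-keys": "ray-a"}, snapshot_a["fingerprint"])  # first write succeeds

try:
    store.set({"ssh-keys": "ray-b"}, snapshot_b["fingerprint"])  # stale fingerprint
except PreconditionFailed as exc:
    print("412:", exc)
```

The second writer fails only because its snapshot is stale, not because its request is malformed. That is why re-reading and retrying fixes it.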
The conflict is transient, and the only remedy is to retry. To provide a
better user experience, this PR retries automatically on the user's
behalf:
1. Catch the error.
2. Reload the metadata (to obtain the current fingerprint) and apply the update again.
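The two steps above form a read-modify-write retry loop. The sketch below illustrates the pattern and is not Ray's actual implementation. It assumes hypothetical `get_metadata()` / `set_metadata(items, fingerprint)` callables, a hypothetical `PreconditionFailed` exception in place of a 412 `HttpError`, and metadata items modeled as a plain dict (the real API uses a list of key/value entries). The retry limit and backoff values are also assumptions.

```python
import time
from typing import Callable, Dict


class PreconditionFailed(Exception):
    """Hypothetical stand-in for an HTTP 412 response from the metadata API."""


def add_ssh_key_with_retry(
    get_metadata: Callable[[], Dict],
    set_metadata: Callable[[Dict, str], None],
    key_name: str,
    key_value: str,
    max_attempts: int = 5,
    backoff_seconds: float = 0.1,
) -> int:
    """Add one metadata entry, re-reading and retrying on 412. Returns attempts used."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        # Reload the metadata so the write carries the current fingerprint
        # and preserves entries added by other writers.
        current = get_metadata()
        items = dict(current["items"])
        items[key_name] = key_value
        try:
            set_metadata(items, current["fingerprint"])
            return attempt
        except PreconditionFailed:
            # Catch the conflict; re-raise once the attempt budget is spent.
            if attempt >= max_attempts:
                raise
            time.sleep(backoff_seconds * attempt)  # simple linear backoff


# Demo: a fake store where another writer updates metadata right before our first write.
state = {"items": {}, "version": 0}
interference = {"remaining": 1}


def get_metadata():
    return {"items": dict(state["items"]), "fingerprint": str(state["version"])}


def set_metadata(items, fingerprint):
    if interference["remaining"] > 0:
        # Simulate a concurrent update landing between our read and our write.
        interference["remaining"] -= 1
        state["items"]["other-key"] = "from-another-launcher"
        state["version"] += 1
    if fingerprint != str(state["version"]):
        raise PreconditionFailed("fingerprint mismatch")
    state["items"] = dict(items)
    state["version"] += 1


attempts = add_ssh_key_with_retry(
    get_metadata, set_metadata, "ssh-keys", "example-public-key", backoff_seconds=0
)
print(attempts, state["items"])  # prints 2, and both keys are present
```

Reloading before each retry, rather than resending the original payload with a fresh fingerprint, keeps the other writer's entry instead of overwriting it.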
---------
Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
1 parent 37ef6ba
1 file changed, 18 additions and 1 deletion: one hunk replaces original line 552 with new lines 552 to 569 (diff content not available).