[core][autoscaler] Retry GCP project metadata updates on HTTP 412 errors #60429

Merged
edoakes merged 4 commits into ray-project:master from rueian:retry-gcp-meta-412 on Jan 23, 2026
@rueian rueian commented Jan 22, 2026

When the autoscaler launches a Ray cluster on GCP, it adds a new SSH key to the project metadata if necessary. The update may result in an HTTP 412 precondition failure if there are concurrent attempts to update the metadata. The error looks like this:

googleapiclient.errors.HttpError: <HttpError 412 when requesting https://compute.googleapis.com/compute/v1/projects/my_gcp_project/setCommonInstanceMetadata?alt=json returned "Supplied fingerprint does not match current metadata fingerprint.". Details: "[{'message': 'Supplied fingerprint does not match current metadata fingerprint.', 'domain': 'global', 'reason': 'conditionNotMet', 'location': 'If-Match', 'locationType': 'header'}]">

The error can only be resolved by retrying. Therefore, to provide a better user experience, this PR retries automatically on the user's behalf:

  1. Catch the error.
  2. Reload the metadata and update it again.
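
The two steps above can be sketched as a small compare-and-set loop. This is a minimal illustration, not the code from this PR: `FakeProjectMetadata`, `PreconditionFailed`, `add_ssh_key`, and the `before_write` hook are all hypothetical names, and the in-memory store only imitates the fingerprint check that produces the 412 response. The loop is left unbounded to keep the sketch short; production code would likely cap the number of attempts.

```python
import hashlib
import json


class PreconditionFailed(Exception):
    """Hypothetical stand-in for an HTTP 412 response."""


class FakeProjectMetadata:
    """In-memory imitation of metadata guarded by a fingerprint (assumption)."""

    def __init__(self):
        self._items = {}

    def _fingerprint(self):
        payload = json.dumps(self._items, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def get(self):
        # Return a copy so callers cannot mutate the stored state directly.
        return {"items": dict(self._items), "fingerprint": self._fingerprint()}

    def set(self, items, fingerprint):
        # Reject the write if the metadata changed since the caller read it.
        if fingerprint != self._fingerprint():
            raise PreconditionFailed("Supplied fingerprint does not match")
        self._items = dict(items)


def add_ssh_key(store, key, before_write=None):
    """Add `key` to the "ssh-keys" entry; return the number of attempts used."""
    attempts = 0
    while True:
        attempts += 1
        snapshot = store.get()  # (re)load the current metadata and fingerprint
        items = snapshot["items"]
        keys = [k for k in items.get("ssh-keys", "").split("\n") if k]
        if key not in keys:
            keys.append(key)
        items["ssh-keys"] = "\n".join(keys)
        if before_write is not None:
            before_write(attempts)  # test hook: lets a rival writer interleave
        try:
            store.set(items, snapshot["fingerprint"])
            return attempts
        except PreconditionFailed:
            continue  # catch the error, then reload and update again
```

Because each retry starts from a fresh read, a concurrent writer's changes are preserved rather than overwritten. Against the real API, the `except` clause would presumably match `googleapiclient.errors.HttpError` with `resp.status == 412` instead of the hypothetical exception used here.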

@rueian rueian requested a review from a team as a code owner January 22, 2026 23:02
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a retry mechanism to handle HTTP 412 precondition failures when updating GCP project metadata for SSH keys. This is a good improvement for robustness in concurrent environments. My main feedback is to make the retry loop bounded to prevent potential infinite loops and to add a small backoff delay. This will make the retry logic safer and more robust.
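
The suggestion to bound the loop and add a backoff delay could look roughly like the helper below. It is a sketch under stated assumptions, not the change that was merged: `retry_with_backoff`, its parameters, and the jitter range are all hypothetical choices made for illustration.

```python
import random
import time


def retry_with_backoff(fn, should_retry, max_attempts=5, base_delay=0.1,
                       max_delay=2.0, sleep=time.sleep):
    """Call `fn` up to `max_attempts` times, sleeping between failed attempts.

    `should_retry(exc)` decides whether an exception is worth retrying.
    `sleep` is injectable so tests can record delays instead of waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on a non-retryable error.
            if attempt == max_attempts or not should_retry(exc):
                raise
            # Exponential backoff, capped at max_delay, with jitter so that
            # concurrent writers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

For the 412 case, `should_retry` might be something like `lambda e: isinstance(e, HttpError) and e.resp.status == 412`, with `fn` reloading the metadata before each write so that every attempt uses a fresh fingerprint.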

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b722b8dcd1

@rueian rueian force-pushed the retry-gcp-meta-412 branch from b722b8d to 71a0e8c on January 22, 2026 23:10
Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
@rueian rueian force-pushed the retry-gcp-meta-412 branch from 71a0e8c to e9418e0 on January 22, 2026 23:12
@rueian rueian added the core-autoscaler, core, and go labels on Jan 22, 2026
@rueian rueian force-pushed the retry-gcp-meta-412 branch from a8677a0 to 9495ec7 on January 22, 2026 23:44
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
@rueian rueian force-pushed the retry-gcp-meta-412 branch from 9495ec7 to ab909bd on January 23, 2026 00:00
@rueian rueian requested a review from edoakes January 23, 2026 19:52
rueian commented Jan 23, 2026

Hi @edoakes, please review it again 🙏

@edoakes edoakes merged commit b84e58c into ray-project:master Jan 23, 2026
6 checks passed
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Jan 26, 2026
jinbum-kim pushed a commit to jinbum-kim/ray that referenced this pull request Jan 29, 2026
400Ping pushed a commit to 400Ping/ray that referenced this pull request Feb 1, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026