Labels
- area/robustness: Robustness, reliability, resilience related
- kind/enhancement: Enhancement, improvement, extension
- priority/3: Priority (lower number equals higher priority)
- status/closed: Issue is closed (either delivered or triaged)
Description
How to categorize this issue?
/area robustness
/kind enhancement
/priority 3
What would you like to be added:
- Add a check for codes.ResourceExhausted with a higher retry period.
machine-controller-manager/pkg/util/provider/machinecontroller/machine_util.go
Lines 503 to 515 in 1e2563f
    func (c *controller) machineCreateErrorHandler(ctx context.Context, machine *v1alpha1.Machine, createMachineResponse *driver.CreateMachineResponse, err error) (machineutils.RetryPeriod, error) {
        var (
            retryRequired  = machineutils.MediumRetry
            lastKnownState string
        )
        machineErr, ok := status.FromError(err)
        if ok {
            switch machineErr.Code() {
            case codes.Unknown, codes.DeadlineExceeded, codes.Aborted, codes.Unavailable:
                retryRequired = machineutils.ShortRetry
                lastKnownState = machine.Status.LastKnownState
            }
        }
Why is this needed:
Currently, machines that fail with codes.ResourceExhausted are retried using machineutils.MediumRetry, which means every 3 minutes.
When a resource in the underlying infrastructure is exhausted, it's unlikely that this will change in that short period of time.
For example, provider-openstack first creates the volume and then the machine.
Depending on the size of the nodePool, retrying this sequence every few minutes can lead to a large number of unnecessary create/delete API calls.