Machine creation is retried too frequently for machines failing with ResourceExhausted #977

@RAPSNX

Description

How to categorize this issue?
/area robustness
/kind enhancement
/priority 3

What would you like to be added:

  • Add a check for codes.ResourceExhausted in machineCreateErrorHandler that applies a longer retry period. The current handler does not treat this code separately:

func (c *controller) machineCreateErrorHandler(ctx context.Context, machine *v1alpha1.Machine, createMachineResponse *driver.CreateMachineResponse, err error) (machineutils.RetryPeriod, error) {
	var (
		retryRequired  = machineutils.MediumRetry
		lastKnownState string
	)
	machineErr, ok := status.FromError(err)
	if ok {
		switch machineErr.Code() {
		case codes.Unknown, codes.DeadlineExceeded, codes.Aborted, codes.Unavailable:
			retryRequired = machineutils.ShortRetry
			lastKnownState = machine.Status.LastKnownState
		}
	}

Why is this needed:
Currently, machines that fail with codes.ResourceExhausted are retried using machineutils.MediumRetry, which means every 3 minutes.
When a resource in the underlying infrastructure is exhausted, that is unlikely to change within such a short period.

For example, provider-openstack first creates the volume and then the machine, so a retry that fails at machine creation still issues API calls for the volume.
Depending on the size of the node pool, this can lead to a large number of unnecessary create/delete API calls.

Metadata

Labels

  • area/robustness: Robustness, reliability, resilience related
  • kind/enhancement: Enhancement, improvement, extension
  • priority/3: Priority (lower number equals higher priority)
  • status/closed: Issue is closed (either delivered or triaged)
