Which component are you using?:
Cluster Autoscaler
What version of the component are you using?:
registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.4
Component version:
What k8s version are you using (kubectl version)?:
kubectl version Output
$ kubectl version
Client Version: v1.26.6+k3s1
Kustomize Version: v4.5.7
Server Version: v1.26.6+k3s1
What environment is this in?:
Hetzner Cloud
What did you expect to happen?:
When the cluster autoscaler is configured with the priority expander and multiple node groups of differing priorities are provided, the cluster autoscaler should back off after some time if the cloud provider fails to provision nodes in the high-priority node group due to resource unavailability, and proceed to the lower-priority node groups.
What happened instead?:
The high-priority node group (pool1 in the log below) currently has no resources available to provision the requested nodes.
The cluster autoscaler is stuck in a loop trying to provision nodes in the high-priority group and never proceeds to pool2 (lower priority, resources available). I've also tried setting --max-node-group-backoff-duration=1m, with no effect.
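For completeness, these are the backoff-related flags as I understand them from the v1.26 flag list (the values below other than --max-node-group-backoff-duration are what I believe the defaults to be; only that one flag was changed in my tests):

```shell
# Backoff-related cluster-autoscaler flags (values other than
# --max-node-group-backoff-duration are the documented defaults, to my reading):
./cluster-autoscaler \
  --initial-node-group-backoff-duration=5m \
  --max-node-group-backoff-duration=1m \
  --node-group-backoff-reset-timeout=3h
```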
W1101 05:15:05.399825 1 hetzner_servers_cache.go:94] Fetching servers from Hetzner API
I1101 05:15:15.806488 1 hetzner_node_group.go:438] Set node group draining-node-pool size from 0 to 0, expected delta 0
I1101 05:15:15.806519 1 hetzner_node_group.go:438] Set node group pool1 size from 1 to 1, expected delta 0
I1101 05:15:15.806525 1 hetzner_node_group.go:438] Set node group pool2 size from 0 to 0, expected delta 0
I1101 05:15:15.808727 1 scale_up.go:608] Scale-up: setting group pool1 size to 4
E1101 05:15:16.068533 1 hetzner_node_group.go:117] failed to create error: could not create server type ccx43 in region fsn1: we are unable to provision servers for this location, try with a different location or try later (resource_unavailable)
E1101 05:15:16.079704 1 hetzner_node_group.go:117] failed to create error: could not create server type ccx43 in region fsn1: we are unable to provision servers for this location, try with a different location or try later (resource_unavailable)
E1101 05:15:16.126786 1 hetzner_node_group.go:117] failed to create error: could not create server type ccx43 in region fsn1: we are unable to provision servers for this location, try with a different location or try later (resource_unavailable)
W1101 05:15:16.126816 1 hetzner_servers_cache.go:94] Fetching servers from Hetzner API
I1101 05:15:26.655179 1 hetzner_node_group.go:438] Set node group pool1 size from 1 to 1, expected delta 0
I1101 05:15:26.655243 1 hetzner_node_group.go:438] Set node group pool2 size from 0 to 0, expected delta 0
I1101 05:15:26.655257 1 hetzner_node_group.go:438] Set node group draining-node-pool size from 0 to 0, expected delta 0
I1101 05:15:26.660093 1 scale_up.go:608] Scale-up: setting group pool1 size to 4
E1101 05:15:26.948368 1 hetzner_node_group.go:117] failed to create error: could not create server type ccx43 in region fsn1: we are unable to provision servers for this location, try with a different location or try later (resource_unavailable)
E1101 05:15:26.981452 1 hetzner_node_group.go:117] failed to create error: could not create server type ccx43 in region fsn1: we are unable to provision servers for this location, try with a different location or try later (resource_unavailable)
E1101 05:15:27.044150 1 hetzner_node_group.go:117] failed to create error: could not create server type ccx43 in region fsn1: we are unable to provision servers for this location, try with a different location or try later (resource_unavailable)
How to reproduce it (as minimally and precisely as possible):
apiVersion: v1
data:
  priorities: |
    10:
    - pool2
    20:
    - pool1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - command:
        - ./cluster-autoscaler
        - --scale-down-unneeded-time=5m
        - --cloud-provider=hetzner
        - --stderrthreshold=info
        - --nodes=0:4:CCX43:FSN1:pool1
        - --nodes=0:4:CCX43:NBG1:pool2
        - --expander=priority
        env:
        - name: HCLOUD_IMAGE
          value: debian-11
        - name: HCLOUD_TOKEN
          valueFrom:
            secretKeyRef:
              key: token
              name: hcloud
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.4
        name: cluster-autoscaler
      serviceAccountName: cluster-autoscaler
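To trigger the scale-up loop, apply the manifests above and create any workload that can't fit on the existing node, so the autoscaler tries pool1 first (file name, workload name, and CPU request below are just examples):

```shell
# Apply the priority-expander ConfigMap and the autoscaler Deployment from above
kubectl apply -f cluster-autoscaler.yaml

# Create an unschedulable workload: pods requesting more CPU than the current
# node can offer force a scale-up of pool1 (the highest-priority group)
kubectl create deployment cpu-hog --image=registry.k8s.io/pause:3.9 --replicas=4
kubectl set resources deployment cpu-hog --requests=cpu=7
```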
Anything else we need to know?: