vault-agent-injector ClusterRole missing 'get nodes' permission for leader election #1174

@chittalpatel

Description

The vault-agent-injector-clusterrole ClusterRole is missing get permission on nodes, which causes continuous error logs from the leader election mechanism in the non-leader injector pod(s).

Current behavior

When running the injector with replicas > 1, the non-leader pod logs the following error approximately every 16 seconds:

{"@level":"error","@message":"Failed to get Node","@module":"handler.operator-lib.leader","Node.Name":"<node-name>","error":"nodes \"<node-name>\" is forbidden: User \"system:serviceaccount:vault:vault-agent-injector\" cannot get resource \"nodes\" in API group \"\" at the cluster scope"}

This happens because operator-framework/operator-lib's leader election calls isNotReadyNode() during the election loop to check whether the current leader's node has gone NotReady. This function calls getNode(), which requires get on nodes at cluster scope — a permission not included in the chart's ClusterRole.
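One way to confirm the missing permission independently of the injector logs is to ask the API server directly. The manifest below is a minimal sketch that assumes the service account identity shown in the error message above, and that the submitting user is allowed to create SubjectAccessReview objects.

```yaml
# Asks the API server whether the injector's service account may "get" nodes.
# The user string is copied from the error log; adjust it if your namespace
# or service account name differs.
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: system:serviceaccount:vault:vault-agent-injector
  resourceAttributes:
    group: ""        # core API group
    resource: nodes
    verb: get
```

Submitting it with `kubectl create -f <file> -o yaml` returns the object with a `status.allowed` field. With the chart's current ClusterRole that field would be expected to be `false`, unless the permission is granted through some other binding.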

Impact

The error is handled gracefully: isNotReadyNode() returns false on failure, so the non-leader pod keeps waiting as usual. However:

  1. Noisy logs: The error is logged every ~16 seconds per non-leader pod (roughly 5,400 entries per pod per day), which creates significant noise in environments with log aggregation/alerting.
  2. Degraded failover: If the leader pod's node enters a NotReady state, the non-leader cannot detect this and proactively take over leadership. Instead, it must wait for Kubernetes garbage collection to delete the leader ConfigMap, which can take significantly longer (see also Leader Elect Duration Configurations #743).

Current ClusterRole

rules:
- apiGroups: ["admissionregistration.k8s.io"]
  resources: ["mutatingwebhookconfigurations"]
  verbs: ["get", "list", "watch", "patch"]

Proposed fix

Add get permission on nodes to the vault-agent-injector-clusterrole:

rules:
- apiGroups: ["admissionregistration.k8s.io"]
  resources: ["mutatingwebhookconfigurations"]
  verbs: ["get", "list", "watch", "patch"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get"]

This is a minimal, read-only permission that enables the operator-lib leader elector to function as designed.
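Until the chart ships this rule, the same permission could be granted out of band. The following is a sketch of a supplementary ClusterRole and ClusterRoleBinding. The name `vault-agent-injector-node-reader` is hypothetical, and the service account name and namespace are assumed from the error message above.

```yaml
# Hypothetical supplementary RBAC, managed outside the Helm chart.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vault-agent-injector-node-reader   # illustrative name
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vault-agent-injector-node-reader   # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vault-agent-injector-node-reader
subjects:
- kind: ServiceAccount
  name: vault-agent-injector   # taken from the error log
  namespace: vault             # taken from the error log
```

Keeping this in a separate ClusterRole leaves the chart-managed vault-agent-injector-clusterrole untouched, so a later chart upgrade does not overwrite it and the workaround can be deleted once the fix lands upstream.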

Environment

  • Chart version: 0.22.1
  • Kubernetes: 1.34 (EKS)
  • Injector replicas: 2
