What problem are you facing?
When my Kubernetes cluster moves pods to different nodes (e.g., during node maintenance, scaling, or resource rebalancing), the OpenTofu provider pod can be terminated while it's running a `tofu apply` operation. These operations often take longer than 30 seconds to complete.
Currently, I'm seeing the following error in the pod logs:
crossplane-opentofu-provider: error: Cannot start controller manager: failed waiting for all runnables to end within grace period of 30s: context deadline exceeded
Current situation:
- I've set `terminationGracePeriodSeconds` to a larger value (e.g., 300 seconds) in the pod spec, which gives Kubernetes more time before forcefully killing the pod
- However, the controller manager's `GracefulShutdownTimeout` is not exposed by the provider, so it is effectively hardcoded to its 30-second default
- This means even though Kubernetes is willing to wait 5 minutes, the controller manager only waits 30 seconds for its workers (which run `tofu apply` commands) to finish gracefully
- After 30 seconds, the controller manager gives up and exits, potentially leaving Terraform/OpenTofu operations in an inconsistent state (see the sketch below)
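For context, this timeout comes from controller-runtime's manager options rather than from the provider itself. Below is a minimal sketch (not the provider's actual `main.go`) of how controller-runtime applies `GracefulShutdownTimeout`: when the option is left unset it falls back to a 30-second default, regardless of the pod's `terminationGracePeriodSeconds`.

```go
package main

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// If GracefulShutdownTimeout is left nil, controller-runtime falls back to its
	// 30s default -- the "grace period of 30s" in the error message above.
	shutdown := 300 * time.Second // illustrative value matching terminationGracePeriodSeconds

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		GracefulShutdownTimeout: &shutdown,
	})
	if err != nil {
		panic(err)
	}

	// SIGTERM from the kubelet cancels this context; the manager then waits at most
	// GracefulShutdownTimeout for running reconciles (the tofu apply workers) to finish.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```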
Why this is a problem:
- Long-running `tofu apply` operations (>30s) get interrupted during pod migrations
- This can leave infrastructure in an inconsistent state
- The `terminationGracePeriodSeconds` setting alone doesn't solve the problem because the controller manager has its own internal timeout
- There's currently no way to configure the controller manager's `GracefulShutdownTimeout` to match the pod's termination grace period
How could Upbound help solve your problem?
Add a new command-line flag (similar to `--poll` and `--max-reconcile-rate`) to configure the controller manager's `GracefulShutdownTimeout`. This would allow users to set a timeout that matches their `terminationGracePeriodSeconds` and the expected duration of their `tofu apply` operations.
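As a rough illustration, the wiring could look something like the sketch below. The flag name `--graceful-shutdown-timeout` is only a suggestion, and the sketch assumes the provider parses flags with kingpin the way other Upbound providers do:

```go
package main

import (
	"os"
	"path/filepath"

	"gopkg.in/alecthomas/kingpin.v2"
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	app := kingpin.New(filepath.Base(os.Args[0]), "OpenTofu provider (sketch)")

	// Hypothetical new flag, following the pattern of --poll and --max-reconcile-rate.
	gracefulShutdownTimeout := app.Flag("graceful-shutdown-timeout",
		"How long the controller manager waits for in-flight reconciles (e.g. tofu apply) to finish after receiving SIGTERM.").
		Default("30s").Duration()

	kingpin.MustParse(app.Parse(os.Args[1:]))

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		// Pass the flag through to controller-runtime instead of relying on its 30s default.
		GracefulShutdownTimeout: gracefulShutdownTimeout,
	})
	kingpin.FatalIfError(err, "cannot create controller manager")

	kingpin.FatalIfError(mgr.Start(ctrl.SetupSignalHandler()), "cannot start controller manager")
}
```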
Benefits:
- Users can configure the graceful shutdown timeout to match their workload requirements
- Prevents premature termination of long-running `tofu apply` operations
- Aligns with Kubernetes pod lifecycle management best practices
- Follows the existing pattern of other configurable timeouts in the provider (like `--timeout`, etc.)
Related configuration:
This would work in conjunction with:
- `terminationGracePeriodSeconds` in the pod spec (Kubernetes-level timeout)
- `--timeout` flag (controls how long individual tofu processes may run; see the sketch below for how the three values relate)
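As a rule of thumb (my assumption, not documented guidance), the three values would need to be ordered so that each layer has room to finish before the layer above it gives up. A trivial sketch of that relationship, with illustrative values and hypothetical variable names:

```go
package main

import "time"

func main() {
	// Illustrative values only.
	tofuTimeout := 20 * time.Minute            // --timeout: longest a single tofu process may run
	gracefulShutdown := 25 * time.Minute       // proposed flag: how long the manager waits on shutdown
	terminationGracePeriod := 30 * time.Minute // terminationGracePeriodSeconds in the pod spec

	// tofuTimeout <= gracefulShutdown <= terminationGracePeriod, so an in-flight
	// tofu apply can finish before the manager stops waiting, and the manager can
	// exit before the kubelet sends SIGKILL.
	if tofuTimeout > gracefulShutdown || gracefulShutdown > terminationGracePeriod {
		panic("timeouts are ordered inconsistently")
	}
}
```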