Skip to content

feat(scheduler): make node lock timeout configurable#1117

Merged
archlitchi merged 1 commit intoProject-HAMi:masterfrom
Kevinz857:optimze-node-lock-time
Jun 16, 2025
Merged

feat(scheduler): make node lock timeout configurable#1117
archlitchi merged 1 commit intoProject-HAMi:masterfrom
Kevinz857:optimze-node-lock-time

Conversation

@Kevinz857
Copy link
Contributor

@Kevinz857 Kevinz857 commented Jun 12, 2025

Summary

This PR makes the HAMi scheduler's node lock timeout configurable, allowing operators to adjust the timeout duration based on their cluster characteristics and scheduling requirements.

Changes Made

  1. Configuration Layer (pkg/scheduler/config/config.go)

    • Added NodeLockTimeout int configuration parameter
    • Default value set to 1 minute to maintain backward compatibility
  2. Command Line Interface (cmd/scheduler/main.go)

    • Added --node-lock-timeout flag to scheduler command line options
    • Flag accepts timeout value in minutes
    • Integrated timeout initialization in the start() function
  3. Runtime Behavior (pkg/util/nodelock/nodelock.go)

    • The global NodeLockTimeout variable is now set from configuration
    • Maintains existing lock/unlock logic and timeout checking mechanism

Motivation

Previously, the node lock timeout was hardcoded to 1 minute. In some cluster environments, this timeout may be too long, causing unnecessary delays in scheduling operations. By making this configurable, operators can:

  • Reduce scheduling latency in fast clusters by setting shorter timeouts
  • Increase timeout for slower clusters to avoid premature lock releases
  • Fine-tune scheduler behavior based on workload characteristics

Backward Compatibility

  • Default timeout remains 1 minute, ensuring no behavior change for existing deployments
  • Existing configurations continue to work without modification
  • New flag is optional and uses sensible defaults

Default timeout is 5 minute

./scheduler

Custom set 30s

./scheduler --node-lock-timeout=30s

Custom set 2mins

./scheduler --node-lock-timeout=2m

@archlitchi
Copy link
Member

have you tested on your environment?

@Kevinz857
Copy link
Contributor Author

have you tested on your environment?
@archlitchi Yes, I have verified in our own production environment, and so far this lock time is working properly

@Kevinz857 Kevinz857 force-pushed the optimze-node-lock-time branch 2 times, most recently from 5791392 to 40d7b4b Compare June 15, 2025 12:14
@wawa0210
Copy link
Member

/lgtm

@codecov
Copy link

codecov bot commented Jun 15, 2025

Codecov Report

Attention: Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
pkg/util/nodelock/nodelock.go 33.33% 1 Missing and 1 partial ⚠️
Flag Coverage Δ
unittests 52.95% <33.33%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
pkg/util/nodelock/nodelock.go 53.92% <33.33%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

- Add configurable node lock timeout parameter to scheduler config
- Add --node-lock-timeout flag to scheduler command line options
- Change NodeLockTimeout type from int to time.Duration for better type safety
- Update command line flag to use DurationVar instead of IntVar
- Remove default value description from comment as suggested in code review
- Initialize nodelock.NodeLockTimeout from config at startup
- Default timeout set to 5 minutes but now can be customized
- Support flexible time units (e.g., 30s, 2m, 1h) instead of just minutes
- This allows operators to tune lock timeout based on cluster characteristics

This addresses code review feedback from wawa0210 regarding:
1. Using proper time.Duration type for timeout configuration
2. Removing redundant default value comments

Signed-off-by: Kevinz857 <kevinnz@foxmail.com>
@Kevinz857 Kevinz857 force-pushed the optimze-node-lock-time branch from 40d7b4b to 8e45320 Compare June 15, 2025 15:27
@hami-robott hami-robott bot removed the lgtm label Jun 15, 2025
@hami-robott
Copy link
Contributor

hami-robott bot commented Jun 15, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Kevinz857
Once this PR has been reviewed and has the lgtm label, please ask for approval from wawa0210. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hami-robott hami-robott bot added size/M and removed size/S labels Jun 15, 2025
@archlitchi
Copy link
Member

/lgtm

@hami-robott hami-robott bot added the lgtm label Jun 16, 2025
@archlitchi archlitchi merged commit a612dd1 into Project-HAMi:master Jun 16, 2025
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants