Fix: get_stats blocks main thread on storage failure#13550

Closed
UranusSeven wants to merge 2 commits into sgl-project:main from novitalabs:get_stats_blocks_main_thread_on_storage_failure

Contributor

@UranusSeven UranusSeven commented Nov 19, 2025

Hi from novita.ai team 👋

Motivation

When using HiCache with the 3fs backend, the main scheduler thread can block while collecting storage metrics via get_stats(). The cause is lock contention between two threads:

Backup thread: Holds the lock during long-running filesystem I/O operations in _batch_set() (decorated with @synchronized())
Main thread: Periodically calls get_stats() (also decorated with @synchronized()) to collect storage metrics for monitoring

When storage is slow or down, the backup thread holds the lock for extended periods, causing the main thread to block when trying to acquire the same lock.
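
The contention can be reproduced in isolation. The following is a minimal sketch, not the actual backend code: `SlowStorage`, `batch_set`, and `measure_get_stats_wait` are hypothetical names, and `synchronized()` here is an assumed stand-in for a decorator that runs a method while holding `self.lock`. The real decorator and client may differ.

```python
import threading
import time
from functools import wraps


def synchronized():
    """Hypothetical stand-in: run the decorated method while holding self.lock."""
    def decorator(func):
        @wraps(func)
        def wrapper(self, *args, **kwargs):
            with self.lock:
                return func(self, *args, **kwargs)
        return wrapper
    return decorator


class SlowStorage:
    """Hypothetical storage client used only to illustrate the contention."""

    def __init__(self, io_seconds):
        self.lock = threading.RLock()
        self.io_seconds = io_seconds
        self.started = threading.Event()

    @synchronized()
    def batch_set(self):
        self.started.set()           # signal that the lock is now held
        time.sleep(self.io_seconds)  # simulates slow filesystem I/O

    @synchronized()
    def get_stats(self):
        return {"ok": True}


def measure_get_stats_wait(io_seconds=0.3):
    """Return how long get_stats() waits while a backup thread holds the lock."""
    storage = SlowStorage(io_seconds)
    backup = threading.Thread(target=storage.batch_set)
    backup.start()
    storage.started.wait()  # the backup thread holds the lock from here on
    begin = time.monotonic()
    storage.get_stats()     # blocks until batch_set() releases the lock
    waited = time.monotonic() - begin
    backup.join()
    return waited
```

In this sketch the caller of get_stats() waits for roughly as long as the simulated I/O takes, which is the behavior the main scheduler thread sees when storage is slow.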

Modifications

Changed get_stats() to acquire the lock without blocking: it now calls self.lock.acquire(blocking=False) instead of relying on the @synchronized() decorator.
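
The pattern can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `StatsCollector` and its `stats` dict are hypothetical, and returning `None` when the lock is busy is an assumed policy (the caller would simply skip that metrics round).

```python
import threading


class StatsCollector:
    """Hypothetical class; only the locking pattern mirrors the description."""

    def __init__(self):
        self.lock = threading.RLock()
        self.stats = {"writes": 0}

    def get_stats(self):
        # Try to take the lock, but return immediately if another thread holds it.
        if not self.lock.acquire(blocking=False):
            return None  # assumption: the caller skips this metrics round
        try:
            # Copy the stats so the caller never sees later mutations.
            return dict(self.stats)
        finally:
            self.lock.release()
```

With this shape the calling thread never waits on the lock; the trade-off is that a metrics sample may be missed while a long write is in progress.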


github-actions bot added the hicache (Hierarchical Caching for SGLang) label on Nov 19, 2025
Collaborator

@ShangmingCai ShangmingCai left a comment

Looks reasonable to me. cc @xiezhq-hermann, what do you think?

Collaborator

hzh0425 commented Nov 19, 2025

This issue can be resolved by this PR: #13407. Thank you, @UranusSeven!

@ShangmingCai
Collaborator

Feel free to reopen this PR if #13407 is not sufficient.

Labels

hicache Hierarchical Caching for SGLang
