Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 49 additions & 2 deletions smoke/tests/tool/nydusd.go
Original file line number Diff line number Diff line change
Expand Up @@ -328,7 +328,16 @@ func (nydusd *Nydusd) Mount() error {
return err
}

return nydusd.WaitStatus("RUNNING")
err = nydusd.WaitStatus("RUNNING")
if err != nil {
return err
}

if nydusd.EnablePrefetch {
nydusd.waitPrefetch()
}

return nil
}

func (nydusd *Nydusd) MountByAPI(config NydusdConfig) error {
Expand Down Expand Up @@ -361,8 +370,15 @@ func (nydusd *Nydusd) MountByAPI(config NydusdConfig) error {
"application/json",
bytes.NewBuffer(body),
)
if err != nil {
return err
}

return err
if config.EnablePrefetch {
nydusd.waitPrefetch()
}

return nil
}

func (nydusd *Nydusd) Umount() error {
Expand Down Expand Up @@ -696,3 +712,34 @@ func Verify(t *testing.T, ctx Context, expectedFileTree map[string]*File) {
defer nydusd.Umount()
nydusd.Verify(t, expectedFileTree)
}

func (nydusd *Nydusd) waitPrefetch() {
// Wait for prefetch to start and stabilize
deadline := time.Now().Add(10 * time.Second)
var lastReadCount uint64
stableCount := 0

for time.Now().Before(deadline) {
time.Sleep(500 * time.Millisecond)

metrics, err := nydusd.GetBackendMetrics()
if err != nil {
Comment on lines +720 to +726
Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errors from GetBackendMetrics are silently ignored with continue statements. While this approach ensures the wait loop doesn't fail prematurely, it could mask underlying issues such as the API being unavailable, authentication problems, or other persistent errors. Consider adding logging for these errors or implementing a counter to fail fast if consecutive errors exceed a threshold, helping identify configuration or connectivity issues earlier.

Suggested change
stableCount := 0
for time.Now().Before(deadline) {
time.Sleep(500 * time.Millisecond)
metrics, err := nydusd.GetBackendMetrics()
if err != nil {
stableCount := 0
errCount := 0
for time.Now().Before(deadline) {
time.Sleep(500 * time.Millisecond)
metrics, err := nydusd.GetBackendMetrics()
if err != nil {
errCount++
logrus.WithError(err).Warn("waitPrefetch: failed to get backend metrics")
if errCount >= 5 {
logrus.WithError(err).Error("waitPrefetch: aborting due to repeated backend metrics retrieval failures")
return
}

Copilot uses AI. Check for mistakes.
continue
}

if metrics.ReadCount == 0 {
Comment on lines +729 to +730
Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic assumes ReadCount == 0 means prefetch hasn't started yet, but this doesn't distinguish between prefetch not starting and legitimate cases where there's nothing to prefetch (e.g., empty filesystem or all data already cached). This could lead to the full 10-second timeout in scenarios where prefetch completes instantly or isn't needed. Consider checking additional metrics like PrefetchDataAmount from BlobCacheMetrics or adding an early exit condition when it's clear there's no prefetch work to do.

Suggested change
if metrics.ReadCount == 0 {
// If there is no prefetch data configured or needed, return early instead of
// waiting for ReadCount to change.
if metrics.ReadCount == 0 {
// PrefetchDataAmount comes from BlobCacheMetrics; 0 here means there is
// nothing to prefetch (e.g., empty filesystem or everything already cached).
if metrics.PrefetchDataAmount == 0 {
return
}
// There is prefetch work, but it hasn't started generating reads yet.

Copilot uses AI. Check for mistakes.
continue
}

// Check if read count is stable (prefetch completed)
if metrics.ReadCount == lastReadCount {
stableCount++
if stableCount >= 3 {
return
Comment on lines +718 to +738
Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout duration of 10 seconds, polling interval of 500 milliseconds, and stability threshold of 3 consecutive identical reads are hardcoded. These values may not be appropriate for all environments (e.g., slower CI systems, larger prefetch operations). Consider making these configurable through environment variables or the NydusdConfig structure to allow tuning for different test scenarios and environments.

Copilot uses AI. Check for mistakes.
}
} else {
stableCount = 0
lastReadCount = metrics.ReadCount
}
}
}
Comment on lines +716 to +745
Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The waitPrefetch method silently returns without any indication of success or failure. When the deadline expires without achieving stable read counts, the function returns normally, which could lead to false positives in tests where prefetch actually failed or timed out. Consider adding logging to distinguish between successful completion and timeout scenarios, or returning an error when the timeout is reached. This would make test failures more debuggable and help identify when the timeout duration needs adjustment.

Copilot uses AI. Check for mistakes.
Loading