Add suspend stats related check (Bugfix) by hanhsuan · Pull Request #1700 · canonical/checkbox

hanhsuan · 2025-01-24T03:01:01Z

Description

WARNING: This modifies com.canonical.certification::sru-server

According to the suspend_advanced_auto is the dependency for all after suspend jobs, it would be better to do more tests for this concept to fix #1579.
Therefore, this PR is the first step to collect information by:

valid_suspend_status is used to check success should be non zero
any_fail_during_supend is used to check fail should be zero

If everything woks well, valid_suspend_status will be merge into suspend_advanced_auto in the future.

Resolved issues

#1579

Documentation

Tests

succ 202001-27667
fail 202412-36131

1. test case is_suspend_success will be merged into suspend_advanced_auto 2. is_device_supend_success is used to check any fail

codecov · 2025-01-24T03:03:10Z

Codecov Report

Attention: Patch coverage is 92.15686% with 4 lines in your changes missing coverage. Please review.

Project coverage is 49.16%. Comparing base (e5b7c6f) to head (6021828).
Report is 182 commits behind head on main.

Files with missing lines	Patch %	Lines
providers/base/bin/suspend_stats.py	92.15%	3 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1700      +/-   ##
==========================================
+ Coverage   49.01%   49.16%   +0.15%     
==========================================
  Files         372      374       +2     
  Lines       40348    40481     +133     
  Branches     6817     6845      +28     
==========================================
+ Hits        19777    19903     +126     
- Misses      19849    19853       +4     
- Partials      722      725       +3

Flag	Coverage Δ
provider-base	`∅ <92.15%> (∅)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Hook25

Well done, I really like what I see and I wouldn't be against merging (the half that checks we did actually suspend) in suspend advanced auto.

I have 2 big worries:

Is this supported on any kernel we have to support? Even xenial? Can you check? If not, what do we do about it?
Does this work from a strict snap? Can you check?

providers/base/bin/suspend_stats.py

providers/base/units/suspend/suspend.pxu

providers/base/units/suspend/test-plan.pxu

Hook25 · 2025-01-31T16:35:53Z

Adding this I dug up from the kernel docs: https://github.com/torvalds/linux/blob/69e858e0b8b2ea07759e995aa383e8780d9d140c/Documentation/ABI/testing/sysfs-power#L317C9-L317C40

If this means what I think it means, then we need a fallback mechanism

hanhsuan · 2025-02-12T03:36:19Z

Adding this I dug up from the kernel docs: https://github.com/torvalds/linux/blob/69e858e0b8b2ea07759e995aa383e8780d9d140c/Documentation/ABI/testing/sysfs-power#L317C9-L317C40

If this means what I think it means, then we need a fallback mechanism

I've checked it on the 201604-21849

ubuntu@201604-21849:/sys/power$ grep -r "" *
disk:[platform] shutdown reboot suspend 
image_size:6662103040
pm_async:1
pm_freeze_timeout:20000
pm_print_times:0
pm_test:[none] core processors platform devices freezer
pm_trace:0
pm_trace_dev_match:usb
grep: pm_wakeup_irq: No data available
reserved_size:1048576
resume:0:0
state:freeze mem disk
wake_lock:
wake_unlock:
wakeup_count:0

The things might be used to check are

ubuntu@201604-21849:~$ grep "PM\:" 1604_suspend.log 
Feb 11 22:13:20 201604-21849 kernel: PM: Syncing filesystems ... done.
Feb 11 22:13:20 201604-21849 kernel: PM: Preparing system for sleep (mem)
Feb 11 22:13:51 201604-21849 kernel: PM: Suspending system (mem)
Feb 11 22:13:51 201604-21849 kernel: PM: suspend of devices complete after 1155.793 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: late suspend of devices complete after 20.104 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: noirq suspend of devices complete after 17.105 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: Saving platform NVS memory
Feb 11 22:13:51 201604-21849 kernel: PM: Restoring platform NVS memory
Feb 11 22:13:51 201604-21849 kernel: PM: noirq resume of devices complete after 15.017 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: early resume of devices complete after 17.481 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: resume of devices complete after 449.473 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: Finishing wakeup.

and

ubuntu@201604-21849:~$ grep "systemd-sleep" 1604_suspend.log 
Feb 11 22:13:20 201604-21849 systemd-sleep[16925]: Selected interface 'mlan0'
Feb 11 22:13:20 201604-21849 systemd-sleep[16925]: OK
Feb 11 22:13:20 201604-21849 systemd-sleep[16925]: Suspending system...
Feb 11 22:13:51 201604-21849 systemd-sleep[16925]: System resumed.
Feb 11 22:13:51 201604-21849 systemd-sleep[16925]: Selected interface 'mlan0'
Feb 11 22:13:51 201604-21849 systemd-sleep[16925]: OK

Do you think it is enough to check the output of journal -b 0 -g systemd-sleep should not include error for validating the suspend status in 16.04 and 18.04 ? The log file is uploaded to CHECKBOX-1642

hanhsuan · 2025-02-17T00:08:38Z

According to this, I think the failed_* indicates one or more device failed. Therefore the kernel changes the success from 0 to 1.
The device failed log:

Feb 17 07:49:24 ubuntu-202412-36132 kernel: spd5118 2-0050: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
Feb 17 07:49:24 ubuntu-202412-36132 kernel: spd5118 2-0050: PM: failed to resume async: error -6

Therefore, I would like to change the method is_after_suspend to

    def is_after_suspend(self) -> bool:
        """
        The system is under after suspend status or not

        :returns: return Ture while system is under after suspend status
        """
        return self.contents["success"] != "0"

and is_any_failed to

    def is_any_failed(self) -> bool:
        """
        Is any failed during suspend

        :returns: return Ture while one failed during suspend
        """
        for c, v in self.contents.items():
            if c.startswith("fail") and v != "0":
                return True
        return False

@Hook25 How do you think ?

1. is_after_suspend --> success != 0 2. is_any_failed --> fail* != 0

hanhsuan · 2025-02-21T00:38:01Z

New check condition test result:

PASS
-- 16.04
FAIL
-- suspend failed
-- device failed

Hook25

+1 this is great, well done

* This is the first step to fix #1579. 1. test case is_suspend_success will be merged into suspend_advanced_auto 2. is_device_supend_success is used to check any fail * fix black error * Change to check fail only * fix black error * Following the comments to fix the scripts and pxu files * Fix black error * Fix script error on python 3.5 * change user to root to make it could read debugfs * Change the check conditions: 1. is_after_suspend --> success != 0 2. is_any_failed --> fail* != 0

hanhsuan added 2 commits January 24, 2025 10:45

This is the first step to fix #1579.

9b1cd14

1. test case is_suspend_success will be merged into suspend_advanced_auto 2. is_device_supend_success is used to check any fail

fix black error

c0c2536

hanhsuan added 2 commits January 24, 2025 13:30

Change to check fail only

10872f3

fix black error

3684954

hanhsuan marked this pull request as ready for review January 24, 2025 05:38

hanhsuan changed the title ~~Add suspend stats releated check (Bugfix)~~ Add suspend stats related check (Bugfix) Jan 24, 2025

pieqq assigned Hook25 Jan 24, 2025

Hook25 requested changes Jan 29, 2025

View reviewed changes

hanhsuan added 3 commits February 16, 2025 16:33

Following the comments to fix the scripts and pxu files

95742b5

Fix black error

6a31743

Fix script error on python 3.5

b347f1d

hanhsuan added 2 commits February 17, 2025 08:09

change user to root to make it could read debugfs

16c1844

Change the check conditions:

6021828

1. is_after_suspend --> success != 0 2. is_any_failed --> fail* != 0

Hook25 approved these changes Mar 20, 2025

View reviewed changes

Hook25 merged commit 8da3c4c into main Mar 20, 2025
18 checks passed

Hook25 deleted the add_suspend_stats_check branch March 20, 2025 15:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add suspend stats related check (Bugfix)#1700

Add suspend stats related check (Bugfix)#1700
Hook25 merged 9 commits intomainfrom
add_suspend_stats_check

hanhsuan commented Jan 24, 2025 •

edited

Loading

Uh oh!

codecov bot commented Jan 24, 2025 •

edited

Loading

Uh oh!

Hook25 left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Hook25 commented Jan 31, 2025

Uh oh!

hanhsuan commented Feb 12, 2025 •

edited by atlassian bot

Loading

Uh oh!

hanhsuan commented Feb 17, 2025 •

edited

Loading

Uh oh!

hanhsuan commented Feb 21, 2025

Uh oh!

Hook25 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hanhsuan commented Jan 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

WARNING: This modifies com.canonical.certification::sru-server

Resolved issues

Documentation

Tests

Uh oh!

codecov bot commented Jan 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Hook25 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Hook25 commented Jan 31, 2025

Uh oh!

hanhsuan commented Feb 12, 2025 • edited by atlassian bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hanhsuan commented Feb 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hanhsuan commented Feb 21, 2025

Uh oh!

Hook25 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hanhsuan commented Jan 24, 2025 •

edited

Loading

codecov bot commented Jan 24, 2025 •

edited

Loading

Hook25 left a comment •

edited

Loading

hanhsuan commented Feb 12, 2025 •

edited by atlassian bot

Loading

hanhsuan commented Feb 17, 2025 •

edited

Loading