Skip to content

Add suspend stats related check (Bugfix)#1700

Merged
Hook25 merged 9 commits intomainfrom
add_suspend_stats_check
Mar 20, 2025
Merged

Add suspend stats related check (Bugfix)#1700
Hook25 merged 9 commits intomainfrom
add_suspend_stats_check

Conversation

@hanhsuan
Copy link
Copy Markdown
Contributor

@hanhsuan hanhsuan commented Jan 24, 2025

Description

WARNING: This modifies com.canonical.certification::sru-server

According to the suspend_advanced_auto is the dependency for all after suspend jobs, it would be better to do more tests for this concept to fix #1579.
Therefore, this PR is the first step to collect information by:

  1. valid_suspend_status is used to check success should be non zero
  2. any_fail_during_supend is used to check fail should be zero

If everything woks well, valid_suspend_status will be merge into suspend_advanced_auto in the future.

Resolved issues

#1579

Documentation

Tests

succ 202001-27667
fail 202412-36131

1. test case is_suspend_success will be merged into
   suspend_advanced_auto
2. is_device_supend_success is used to check any fail
@codecov
Copy link
Copy Markdown

codecov bot commented Jan 24, 2025

Codecov Report

Attention: Patch coverage is 92.15686% with 4 lines in your changes missing coverage. Please review.

Project coverage is 49.16%. Comparing base (e5b7c6f) to head (6021828).
Report is 182 commits behind head on main.

Files with missing lines Patch % Lines
providers/base/bin/suspend_stats.py 92.15% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1700      +/-   ##
==========================================
+ Coverage   49.01%   49.16%   +0.15%     
==========================================
  Files         372      374       +2     
  Lines       40348    40481     +133     
  Branches     6817     6845      +28     
==========================================
+ Hits        19777    19903     +126     
- Misses      19849    19853       +4     
- Partials      722      725       +3     
Flag Coverage Δ
provider-base ∅ <92.15%> (∅)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@hanhsuan hanhsuan marked this pull request as ready for review January 24, 2025 05:38
@hanhsuan hanhsuan changed the title Add suspend stats releated check (Bugfix) Add suspend stats related check (Bugfix) Jan 24, 2025
Copy link
Copy Markdown
Collaborator

@Hook25 Hook25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done, I really like what I see and I wouldn't be against merging (the half that checks we did actually suspend) in suspend advanced auto.

I have 2 big worries:

  1. Is this supported on any kernel we have to support? Even xenial? Can you check? If not, what do we do about it?
  2. Does this work from a strict snap? Can you check?

@Hook25
Copy link
Copy Markdown
Collaborator

Hook25 commented Jan 31, 2025

Adding this I dug up from the kernel docs: https://github.com/torvalds/linux/blob/69e858e0b8b2ea07759e995aa383e8780d9d140c/Documentation/ABI/testing/sysfs-power#L317C9-L317C40

If this means what I think it means, then we need a fallback mechanism

@hanhsuan
Copy link
Copy Markdown
Contributor Author

hanhsuan commented Feb 12, 2025

Adding this I dug up from the kernel docs: https://github.com/torvalds/linux/blob/69e858e0b8b2ea07759e995aa383e8780d9d140c/Documentation/ABI/testing/sysfs-power#L317C9-L317C40

If this means what I think it means, then we need a fallback mechanism

I've checked it on the 201604-21849

ubuntu@201604-21849:/sys/power$ grep -r "" *
disk:[platform] shutdown reboot suspend 
image_size:6662103040
pm_async:1
pm_freeze_timeout:20000
pm_print_times:0
pm_test:[none] core processors platform devices freezer
pm_trace:0
pm_trace_dev_match:usb
grep: pm_wakeup_irq: No data available
reserved_size:1048576
resume:0:0
state:freeze mem disk
wake_lock:
wake_unlock:
wakeup_count:0

The things might be used to check are

ubuntu@201604-21849:~$ grep "PM\:" 1604_suspend.log 
Feb 11 22:13:20 201604-21849 kernel: PM: Syncing filesystems ... done.
Feb 11 22:13:20 201604-21849 kernel: PM: Preparing system for sleep (mem)
Feb 11 22:13:51 201604-21849 kernel: PM: Suspending system (mem)
Feb 11 22:13:51 201604-21849 kernel: PM: suspend of devices complete after 1155.793 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: late suspend of devices complete after 20.104 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: noirq suspend of devices complete after 17.105 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: Saving platform NVS memory
Feb 11 22:13:51 201604-21849 kernel: PM: Restoring platform NVS memory
Feb 11 22:13:51 201604-21849 kernel: PM: noirq resume of devices complete after 15.017 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: early resume of devices complete after 17.481 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: resume of devices complete after 449.473 msecs
Feb 11 22:13:51 201604-21849 kernel: PM: Finishing wakeup.

and

ubuntu@201604-21849:~$ grep "systemd-sleep" 1604_suspend.log 
Feb 11 22:13:20 201604-21849 systemd-sleep[16925]: Selected interface 'mlan0'
Feb 11 22:13:20 201604-21849 systemd-sleep[16925]: OK
Feb 11 22:13:20 201604-21849 systemd-sleep[16925]: Suspending system...
Feb 11 22:13:51 201604-21849 systemd-sleep[16925]: System resumed.
Feb 11 22:13:51 201604-21849 systemd-sleep[16925]: Selected interface 'mlan0'
Feb 11 22:13:51 201604-21849 systemd-sleep[16925]: OK

Do you think it is enough to check the output of journal -b 0 -g systemd-sleep should not include error for validating the suspend status in 16.04 and 18.04 ? The log file is uploaded to CHECKBOX-1642

@hanhsuan
Copy link
Copy Markdown
Contributor Author

hanhsuan commented Feb 17, 2025

According to this, I think the failed_* indicates one or more device failed. Therefore the kernel changes the success from 0 to 1.
The device failed log:

Feb 17 07:49:24 ubuntu-202412-36132 kernel: spd5118 2-0050: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
Feb 17 07:49:24 ubuntu-202412-36132 kernel: spd5118 2-0050: PM: failed to resume async: error -6

Therefore, I would like to change the method is_after_suspend to

    def is_after_suspend(self) -> bool:
        """
        The system is under after suspend status or not

        :returns: return Ture while system is under after suspend status
        """
        return self.contents["success"] != "0"

and is_any_failed to

    def is_any_failed(self) -> bool:
        """
        Is any failed during suspend

        :returns: return Ture while one failed during suspend
        """
        for c, v in self.contents.items():
            if c.startswith("fail") and v != "0":
                return True
        return False

@Hook25 How do you think ?

1. is_after_suspend --> success != 0
2. is_any_failed --> fail* != 0
@hanhsuan
Copy link
Copy Markdown
Contributor Author

New check condition test result:

Copy link
Copy Markdown
Collaborator

@Hook25 Hook25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 this is great, well done

@Hook25 Hook25 merged commit 8da3c4c into main Mar 20, 2025
18 checks passed
@Hook25 Hook25 deleted the add_suspend_stats_check branch March 20, 2025 15:24
stanley31huang pushed a commit that referenced this pull request Mar 28, 2025
* This is the first step to fix #1579.
1. test case is_suspend_success will be merged into
   suspend_advanced_auto
2. is_device_supend_success is used to check any fail

* fix black error

* Change to check fail only

* fix black error

* Following the comments to fix the scripts and pxu files

* Fix black error

* Fix script error on python 3.5

* change user to root to make it could read debugfs

* Change the check conditions:
1. is_after_suspend --> success != 0
2. is_any_failed --> fail* != 0
mreed8855 pushed a commit that referenced this pull request Jul 30, 2025
* This is the first step to fix #1579.
1. test case is_suspend_success will be merged into
   suspend_advanced_auto
2. is_device_supend_success is used to check any fail

* fix black error

* Change to check fail only

* fix black error

* Following the comments to fix the scripts and pxu files

* Fix black error

* Fix script error on python 3.5

* change user to root to make it could read debugfs

* Change the check conditions:
1. is_after_suspend --> success != 0
2. is_any_failed --> fail* != 0
mreed8855 pushed a commit that referenced this pull request Jul 31, 2025
* This is the first step to fix #1579.
1. test case is_suspend_success will be merged into
   suspend_advanced_auto
2. is_device_supend_success is used to check any fail

* fix black error

* Change to check fail only

* fix black error

* Following the comments to fix the scripts and pxu files

* Fix black error

* Fix script error on python 3.5

* change user to root to make it could read debugfs

* Change the check conditions:
1. is_after_suspend --> success != 0
2. is_any_failed --> fail* != 0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

suspend/suspend_advanced_auto always returns passed result

2 participants