Retire pm_test from PC suspend 30 cycles and separate IoT suspend cycles (New) by seankingyang · Pull Request #601 · canonical/checkbox

seankingyang · 2023-07-04T05:49:02Z

Description

Retire pm_test.py from the PC suspend 30 cycles test
Separate IoT's suspend cycles from one job into multiple jobs
Make the suspend-cycles-stress-test support both PC and IoT

The detailed story is here: https://warthogs.atlassian.net/browse/CQT-2475

Resolved issues

When the DUT hangs, it is now easy to identify which suspend iteration it was in.
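To make this concrete: with one job per suspend, the last job started before a hang encodes both the suspend counter and the reboot counter, and the overall suspend number follows from them. A minimal sketch, assuming job ids follow the `suspend_cycles_<suspend>_reboot<reboot>` pattern visible in the test logs in this thread and that `s3_iterations` holds the value of STRESS_S3_ITERATIONS; `global_suspend_number` is a hypothetical helper, not part of Checkbox:

```python
import re

# Hypothetical pattern, taken from job ids seen in the run logs
# (e.g. "stress-tests/suspend_cycles_2_reboot1"); not Checkbox's own parser.
JOB_ID_RE = re.compile(r"suspend_cycles_(\d+)_reboot(\d+)$")


def global_suspend_number(job_id: str, s3_iterations: int) -> int:
    """Return the 1-based overall suspend number encoded in a suspend job id.

    s3_iterations corresponds to STRESS_S3_ITERATIONS (suspends per reboot cycle).
    """
    match = JOB_ID_RE.search(job_id)
    if match is None:
        raise ValueError(f"not a suspend job id: {job_id!r}")
    suspend_cycle, reboot_cycle = (int(g) for g in match.groups())
    if not 1 <= suspend_cycle <= s3_iterations or reboot_cycle < 1:
        raise ValueError(f"counter out of range in {job_id!r}")
    # Every suspend of the earlier reboot cycles happened first.
    return (reboot_cycle - 1) * s3_iterations + suspend_cycle
```

For example, with STRESS_S3_ITERATIONS = 30 and a hang during `suspend_cycles_7_reboot2`, the DUT was in suspend 37 overall.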

Documentation

The following environment configs will be used in these test cases:
STRESS_S3_ITERATIONS : number of suspends in each reboot cycle
STRESS_SUSPEND_REBOOT_ITERATIONS : total number of reboot cycles

Note: Total suspend number = STRESS_S3_ITERATIONS x STRESS_SUSPEND_REBOOT_ITERATIONS

STRESS_S3_SLEEP_DELAY: sleep delay
STRESS_S3_WAIT_DELAY: device check delay
STRESS_SUSPEND_SLEEP_THRESHOLD : suspend time threshold
STRESS_SUSPEND_RESUME_THRESHOLD: resume time threshold
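The two iteration variables combine into one flat job sequence. The sketch below only illustrates that ordering; the function name is hypothetical, the real jobs are generated inside the provider's .pxu units, and it assumes (consistent with the run logs later in this thread) that each reboot cycle runs its suspend jobs first and ends with one reboot job:

```python
from typing import List


def suspend_reboot_job_ids(s3_iterations: int, reboot_iterations: int) -> List[str]:
    """List job ids in run order (hypothetical reconstruction, not Checkbox code).

    s3_iterations     -> STRESS_S3_ITERATIONS (suspends per reboot cycle)
    reboot_iterations -> STRESS_SUSPEND_REBOOT_ITERATIONS (reboot cycles)
    """
    if s3_iterations < 1 or reboot_iterations < 1:
        raise ValueError("both iteration counts must be >= 1")
    job_ids = []
    for reboot in range(1, reboot_iterations + 1):
        # All suspend/resume jobs of this reboot cycle...
        for suspend in range(1, s3_iterations + 1):
            job_ids.append(f"stress-tests/suspend_cycles_{suspend}_reboot{reboot}")
        # ...followed by the reboot job that closes the cycle (assumed).
        job_ids.append(f"stress-tests/suspend_cycles_reboot{reboot}")
    return job_ids
```

With STRESS_S3_ITERATIONS = 2 and STRESS_SUSPEND_REBOOT_ITERATIONS = 3, this yields 6 suspend jobs, matching the note above (2 x 3 = 6), plus 3 reboot jobs.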

Tests

DUT image: Desktop, Checkbox: deb version

Using Jenkins, excluding the memory stress test (checkbox bug #571): https://certification.canonical.com/hardware/202212-31030/submission/320761/
Using local run: https://certification.canonical.com/hardware/202212-31030/submission/321252/

DUT image: FDE, Checkbox: Slave - snap version, Master - deb version

Using checkbox remote, excluding the memory stress test: https://certification.canonical.com/hardware/202207-30448/submission/321677/

DUT image: Desktop, Checkbox: snap version

Using local run: https://certification.canonical.com/hardware/202212-31030/submission/321762/

kissiel

This is an awesome piece of work!

There are a few small problems I pointed out below.

In general, the numerous ifs here signify a lack of decomposition or a suboptimal distribution of responsibility among the parts of the solution.

Let's look at the FWTS ifs.
The way I understand it is that you want to have a different behavior (commands run) when the platform is x86_64 or i386.
In the current state of the branch there is a jinja2 branch which interpolates the command string differently depending on the value provided by the resource.
So the chain of concrete jobs may end up being one of these two:
suspend_(fwts_test) -> reboot -> do_checks
suspend_(using_rtcwake) -> reboot -> do_checks

I understand that you don't want to have two separate groups of all three steps depending on FWTS (that would be a strategy-level responsibility). But if it's just a detail of how to do a suspend, then the suspend job shouldn't change; the decision should be made inside it. In other words, imagine this:

command: suspend.py "${STRESS_S3_DURATION}"

or

command: suspend.sh "${STRESS_S3_DURATION}"

Within those, the machine could be checked and the appropriate method chosen. So the flow would be linear:
suspend->reboot->do_checks
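A minimal sketch of that idea, in Python: one entry point decides internally how to suspend, so the job chain stays suspend -> reboot -> do_checks on every platform. The x86_64/i386 split mirrors the PR's original condition, but the check is named after what it actually tests (the machine type), in line with the point below. `build_suspend_command` and `is_x86_machine` are hypothetical; the `fwts s3 --s3-sleep-delay` option is an assumption to verify with `fwts --help`, while `rtcwake -m mem -s <seconds>` is standard util-linux usage.

```python
import platform
from typing import List, Optional

# Hypothetical policy mirroring the PR's original condition: x86 machines
# suspend through FWTS, every other architecture through rtcwake.
X86_MACHINES = {"x86_64", "i386"}


def is_x86_machine(machine: Optional[str] = None) -> bool:
    """Report whether the machine type is x86; says nothing about FWTS support."""
    return (machine or platform.machine()) in X86_MACHINES


def build_suspend_command(duration: int, machine: Optional[str] = None) -> List[str]:
    """Return the argv for one suspend/resume cycle lasting about `duration` seconds."""
    if is_x86_machine(machine):
        # Assumed FWTS option name; verify with `fwts --help` on the DUT.
        return ["fwts", "s3", f"--s3-sleep-delay={duration}"]
    # rtcwake: -m mem suspends to RAM, -s sets the RTC wake-up delay in seconds.
    return ["rtcwake", "-m", "mem", "-s", str(duration)]
```

A caller would run the result with `subprocess.run(build_suspend_command(30), check=True)`. Returning the argv instead of executing it keeps the platform decision testable on any machine.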

Also, introducing the generic "FWTS is supported if the machine is either x86_64 or i386" conjecture is misleading. The check tests the machine type; it has nothing to do with FWTS. As a matter of fact, FWTS today is supported on arm64, armhf, riscv64 and other architectures as well.

providers/base/units/stress/s3s4.pxu

codecov · 2023-10-30T02:35:56Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (120bb96) 34.83% compared to head (8b67b95) 35.73%.
Report is 75 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #601      +/-   ##
==========================================
+ Coverage   34.83%   35.73%   +0.89%     
==========================================
  Files         302      302              
  Lines       34165    34245      +80     
  Branches     5909     5916       +7     
==========================================
+ Hits        11901    12236     +335     
+ Misses      21698    21441     -257     
- Partials      566      568       +2

Flag	Coverage Δ
provider-base	`5.50% <ø> (+2.36%)`	⬆️
provider-certification-client	`57.14% <ø> (ø)`

Flags with carried forward coverage won't be shown.

pieqq · 2023-11-03T06:01:57Z

/canonical/self-hosted-runners/run-workflows d75bbee

pieqq

Nice!

The newly created jobs are not easy to understand at first glance. It's quite a complicated setup to get everything you want (a mix of suspend cycles and reboot cycles). I cannot think of a better way to do it, though, so I think we should go ahead with this, especially if we can use it on both desktop and IoT projects!

I tested your changes on a laptop, using Checkbox remote (with your changes sideloaded in /var/tmp/checkbox-providers/).

On the controller (my computer) I created the following launcher:

[launcher]
launcher_version = 1

[environment]
STRESS_S3_ITERATIONS = 2
STRESS_SUSPEND_REBOOT_ITERATIONS = 3

[agent]
normal_user = u
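Checkbox parses the launcher itself; purely as an illustration, Python's standard `configparser` can read the same file to show how the two values multiply into the total suspend count from the PR description. `total_suspends` is a hypothetical helper, not Checkbox's launcher parser:

```python
import configparser

LAUNCHER = """\
[launcher]
launcher_version = 1

[environment]
STRESS_S3_ITERATIONS = 2
STRESS_SUSPEND_REBOOT_ITERATIONS = 3

[agent]
normal_user = u
"""


def total_suspends(launcher_text: str) -> int:
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the upper-case variable names as written
    parser.read_string(launcher_text)
    env = parser["environment"]
    # Total suspend number = STRESS_S3_ITERATIONS x STRESS_SUSPEND_REBOOT_ITERATIONS
    return int(env["STRESS_S3_ITERATIONS"]) * int(env["STRESS_SUSPEND_REBOOT_ITERATIONS"])


print(total_suspends(LAUNCHER))  # 6
```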

I then launched Checkbox remote using the following command:

checkbox.checkbox-cli control <DUT IP> my-launcher

I selected com.canonical.certification::suspend-cycles-stress-test test plan.

Both $STRESS_S3_ITERATIONS and $STRESS_SUSPEND_REBOOT_ITERATIONS were correctly taken into account, and no problems were reported! ✔️

The only complaint I have is that you used the summary field to provide a long description of what each new test is doing, resulting in rather long lines in the output:

----------------------------[ Running job 19 / 30 ]-----------------------------
[ This is part of automated stress suspend cycles test that will force the system to suspend/resume. ]
ID: com.canonical.certification::stress-tests/suspend_cycles_1_reboot1
Category: Suspend (S3) Stress Test
--------------------------------------------------------------------------------
(...)

----------------------------[ Running job 20 / 30 ]-----------------------------
[ This is part of automated stress suspend cycles (suspend_cycles_2_reboot1) test that will force the system to suspend/resume. ]
ID: com.canonical.certification::stress-tests/suspend_cycles_2_reboot1
Category: Suspend (S3) Stress Test
--------------------------------------------------------------------------------
(...)

----------------------------[ Running job 21 / 30 ]-----------------------------
[ This is part of automated stress suspend cycles test that will force the system to reboot (suspend_cycles_reboot1). ]
ID: com.canonical.certification::stress-tests/suspend_cycles_reboot1
Category: Suspend (S3) Stress Test
--------------------------------------------------------------------------------
Connection lost!
connection closed by peer
Reconnecting ...
Reconnected (took: 114s)
[ This is part of automated stress suspend cycles test that will force the system to reboot (suspend_cycles_reboot1). ]
ID: com.canonical.certification::stress-tests/suspend_cycles_reboot1
Category: Suspend (S3) Stress Test
--------------------------------------------------------------------------------
Outcome: job passed
----------------------------[ Running job 26 / 34 ]-----------------------------
[ This is part of automated stress suspend cycles (suspend_cycles_1_reboot2) test that will force the system to suspend/resume. ]
ID: com.canonical.certification::stress-tests/suspend_cycles_1_reboot2
Category: Suspend (S3) Stress Test
--------------------------------------------------------------------------------
(...)

Could you modify the summary for the new jobs to display something more concise? For instance:

This is part of automated stress suspend cycles (suspend_cycles_2_reboot1) test that will force the system to suspend/resume.

could be reworded, for instance, as:

Suspend and resume device (suspend cycle 2, reboot cycle 1)
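In the provider this wording would live in the generated jobs' summary field; the Python sketch below only shows the mapping from job id to concise summary. The regular expressions are assumptions based on the job ids in the log above, `concise_summary` is hypothetical, and the wording for the reboot job is an assumption, since the review only proposed the suspend wording:

```python
import re

# Hypothetical patterns based on job ids seen in the log above.
SUSPEND_ID = re.compile(r"suspend_cycles_(\d+)_reboot(\d+)$")
REBOOT_ID = re.compile(r"suspend_cycles_reboot(\d+)$")


def concise_summary(job_id: str) -> str:
    m = SUSPEND_ID.search(job_id)
    if m:
        return (f"Suspend and resume device "
                f"(suspend cycle {m.group(1)}, reboot cycle {m.group(2)})")
    m = REBOOT_ID.search(job_id)
    if m:
        # Assumed wording; the review only suggested the suspend summary.
        return f"Reboot device (reboot cycle {m.group(1)})"
    raise ValueError(f"unexpected job id: {job_id!r}")
```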

Apart from that, I think this can land.

diohe0311 · 2023-11-24T08:56:09Z

/canonical/self-hosted-runners/run-workflows 3766ce1

kissiel

This is a huge piece of work and it requires a lot of coffee to understand.
My recommendations:
The generation part is very complicated, and I won't argue its usefulness, but it would be awesome if all of those mechanics were put in a separate pxu file with a big explanation at the beginning of the file (bonus points for diagrams/flowcharts).

There are also unnecessary variables being introduced that, unfortunately, make the code more complex than it needs to be.

providers/base/units/stress/s3s4.pxu

diohe0311 · 2023-12-01T08:49:03Z

/canonical/self-hosted-runners/run-workflows 6bb969d

pieqq

Whooo, this is great stuff! Very nice documentation added!

I made a few suggestions and pointed out some typos; once those are addressed, I think this is good to be merged.

Thanks for your hard work, @seankingyang !

providers/base/units/stress/suspend_cycles_reboot.md

providers/base/units/stress/suspend_cycles_reboot.pxu

pieqq

Just a tiny comment regarding the use of \ to split lines (it's not necessary). Then we can land :)

providers/base/units/stress/suspend_cycles_reboot.md

providers/base/units/stress/suspend_cycles_reboot.pxu

pieqq

This had slipped my attention. Awesome, thanks!

pieqq · 2024-01-05T02:55:29Z

/canonical/self-hosted-runners/run-workflows 8b67b95

The documentation changes requested by Maciej have been made.

…les (New) (canonical#601)

* Retire pm_test from PC suspend 30 cycles and saperate IoT suspend cycles
* Remove the useless flag preserve-local
* Make suspend (fwts and rtcwake) flow more linear
* Correct the summary of resources jobs
* Modify the summary.
* Remove unnecessary variables
* Seperate the suspend_cycles_reboot test case to a new file
* Add the detail description in md file.
* Fix the typo, and add the short description at the beginning of suspend_cycles_reboot.pxu
* Break lines of text at 80 characters as possible as I can
* Fix some tiny problems

seankingyang force-pushed the Modify-suspend-cycles branch from 66a5285 to bc2aa40 Compare July 4, 2023 06:09

seankingyang requested a review from pieqq July 4, 2023 06:16

seankingyang marked this pull request as draft July 4, 2023 06:22

seankingyang force-pushed the Modify-suspend-cycles branch 2 times, most recently from 7977722 to 6d7bc92 Compare July 7, 2023 03:12

seankingyang marked this pull request as ready for review July 7, 2023 03:32

kissiel suggested changes Aug 8, 2023

seankingyang requested a review from kissiel August 14, 2023 09:09

seankingyang force-pushed the Modify-suspend-cycles branch from 50aa17e to 72e2e88 Compare October 30, 2023 02:32

seankingyang added 4 commits October 30, 2023 10:36

Retire pm_test from PC suspend 30 cycles and saperate IoT suspend cycles

619ff15

Remove the useless flag preserve-local

0084033

Make suspend (fwts and rtcwake) flow more linear

74234ec

Correct the summary of resources jobs

d75bbee

seankingyang force-pushed the Modify-suspend-cycles branch from 72e2e88 to d75bbee Compare October 30, 2023 02:36

seankingyang changed the title ~~Retire pm_test from PC suspend 30 cycles and separate IoT suspend cycles~~ Retire pm_test from PC suspend 30 cycles and separate IoT suspend cycles (New) Nov 2, 2023

pieqq requested changes Nov 7, 2023

Modify the summary.

3766ce1

seankingyang requested a review from pieqq November 24, 2023 08:45

kissiel previously requested changes Nov 27, 2023

providers/base/units/stress/s3s4.pxu (3 outdated review threads, resolved)

seankingyang added 3 commits November 30, 2023 15:46

Remove unnecessary variables

ba2afc7

Seperate the suspend_cycles_reboot test case to a new file

74355bc

Add the detail description in md file.

6bb969d

seankingyang requested a review from kissiel December 1, 2023 08:33

pieqq requested changes Dec 8, 2023

Fix the typo, and add the short description at the beginning of suspend_cycles_reboot.pxu

2bfa734

seankingyang requested a review from pieqq December 14, 2023 02:52

Break lines of text at 80 characters as possible as I can

a44f8a1

pieqq requested changes Dec 22, 2023

providers/base/units/stress/suspend_cycles_reboot.md (2 outdated review threads, resolved)

providers/base/units/stress/suspend_cycles_reboot.pxu (1 outdated review thread, resolved)

Fix some tiny problems

8b67b95

seankingyang requested a review from pieqq December 22, 2023 04:57

pieqq approved these changes Jan 5, 2024

pieqq merged commit cac5044 into canonical:main Jan 5, 2024

seankingyang mentioned this pull request Jan 5, 2024

Modify plainbox config to fit the new suspend_cycles_with_reboot jobs, and set the SANP_TASK_TIMEOUT canonical/oem-qa-tools#58

Merged

4 participants
