Retire pm_test from PC suspend 30 cycles and separate IoT suspend cycles (New)#601
Conversation
66a5285 to
bc2aa40
Compare
7977722 to
6d7bc92
Compare
There was a problem hiding this comment.
This is an awesome piece of work!
There are a few small problems I pointed out below.
In general the numerous ifs here signify lack of decomposition or suboptimal handling responsibility given to parts of the solution.
Let's look at the FWTS ifs.
The way I understand it is that you want to have a different behavior (commands run) when the platform is x86_64 or i386.
In the current state of the branch there is a jinja2 branch which interpolates the command string differently depending on the value provided by the resource.
So this means that the chain of concrete jobs may end up being one of those two:
suspend_(fwts_test) -> reboot -> do_checks
suspend_(using_rtcwake) -> reboot -> do_checks
I understand that you don't want to have two separate groups of all three steps depending on the FWTS (the strategy-level responsibility). But if that's just a detail on how to do a suspend, then the suspend job shouldn't change, and inside of it the decision should be made. In other words, imagine this:
command: suspend.py $"{STRESS_S3_DURATION}"
or
command: suspend.sh $"{STRESS_S3_DURATION}"
And within those the machine could be checked and appropriate method could be chosen. So the flow would be a linear:
suspend->reboot->do_checks
Also, introducing the generic "FTWS is supported if machine is either x86_64 or i386" conjecture is misleading. The check checks for machine type, it has nothing to do with FWTS. As a matter of fact FWTS today is supported for arm64, armhf, riscv64 and others as well.
50aa17e to
72e2e88
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #601 +/- ##
==========================================
+ Coverage 34.83% 35.73% +0.89%
==========================================
Files 302 302
Lines 34165 34245 +80
Branches 5909 5916 +7
==========================================
+ Hits 11901 12236 +335
+ Misses 21698 21441 -257
- Partials 566 568 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
72e2e88 to
d75bbee
Compare
|
/canonical/self-hosted-runners/run-workflows d75bbee |
pieqq
left a comment
There was a problem hiding this comment.
Nice!
The newly created jobs are not simple to understand at first glance. It's quite a complicated setup to be able to get everything you want (a mix of suspend cycles and reboot cycles). I cannot think of a better way to do, though, so I think we should go ahead with this, especially if we can use it on both desktop and IoT projects!
I tested your changes on a laptop, using Checkbox remote (with your changes sideloaded in /var/tmp/checkbox-providers/).
On the controller (my computer) I created the following launcher:
[launcher]
launcher_version = 1
[environment]
STRESS_S3_ITERATIONS = 2
STRESS_SUSPEND_REBOOT_ITERATIONS = 3
[agent]
normal_user = u
I then launched Checkbox remote using the following command:
checkbox.checkbox-cli control <DUT IP> my-launcher
I selected com.canonical.certification::suspend-cycles-stress-test test plan.
Both $STRESS_S3_ITERATIONS and $STRESS_SUSPEND_REBOOT_ITERATIONS were correctly taken into account, and no problems were reported! ✔️
The only complaint I have is that you used the summary field to provide a long description of what each new test is doing, resulting in rather long lines in the output:
----------------------------[ Running job 19 / 30 ]-----------------------------
[ This is part of automated stress suspend cycles test that will force the system to suspend/resume. ]
ID: com.canonical.certification::stress-tests/suspend_cycles_1_reboot1
Category: Suspend (S3) Stress Test
--------------------------------------------------------------------------------
(...)
----------------------------[ Running job 20 / 30 ]-----------------------------
[ This is part of automated stress suspend cycles (suspend_cycles_2_reboot1) test that will force the system to suspend/resume. ]
ID: com.canonical.certification::stress-tests/suspend_cycles_2_reboot1
Category: Suspend (S3) Stress Test
--------------------------------------------------------------------------------
(...)
----------------------------[ Running job 21 / 30 ]-----------------------------
[ This is part of automated stress suspend cycles test that will force the system to reboot (suspend_cycles_reboot1). ]
ID: com.canonical.certification::stress-tests/suspend_cycles_reboot1
Category: Suspend (S3) Stress Test
--------------------------------------------------------------------------------
Connection lost!
connection closed by peer
Reconnecting ...
Reconnected (took: 114s)
[ This is part of automated stress suspend cycles test that will force the system to reboot (suspend_cycles_reboot1). ]
ID: com.canonical.certification::stress-tests/suspend_cycles_reboot1
Category: Suspend (S3) Stress Test
--------------------------------------------------------------------------------
Outcome: job passed
----------------------------[ Running job 26 / 34 ]-----------------------------
[ This is part of automated stress suspend cycles (suspend_cycles_1_reboot2) test that will force the system to suspend/resume. ]
ID: com.canonical.certification::stress-tests/suspend_cycles_1_reboot2
Category: Suspend (S3) Stress Test
--------------------------------------------------------------------------------
(...)
Could you modify the summary for the new jobs to display something more concise? For instance:
This is part of automated stress suspend cycles (suspend_cycles_2_reboot1) test that will force the system to suspend/resume.
could be reworded, for instance,
Suspend and resume device (suspend cycle 2, reboot cycle 1)
Apart from that, I think this can land.
|
/canonical/self-hosted-runners/run-workflows 3766ce1 |
kissiel
left a comment
There was a problem hiding this comment.
This is a huge piece of work and it requires a lot of coffee to understand.
My recommendations:
The generation part is very complicated, and I won't argue its usefulness, but It would be awesome if all of those mechanics would be put in a separate pxu file with a big explanation at the beginning of the file (bonus points for diagrams/flowcharts).
There are also unnecessary variables being introduced that make the code unfortunately, unnecessarily more complex.
|
/canonical/self-hosted-runners/run-workflows 6bb969d |
pieqq
left a comment
There was a problem hiding this comment.
Whooo, this is great stuff! Very nice documentation added!
I made a few suggestions and typos, and once landed I think this is good to be merged.
Thanks for your hard work, @seankingyang !
…nd_cycles_reboot.pxu
pieqq
left a comment
There was a problem hiding this comment.
Just a tiny comment regarding the use of \ to split lines (it's not necessary). Then we can land :)
pieqq
left a comment
There was a problem hiding this comment.
This had slipped my attention. Awesome, thanks!
|
/canonical/self-hosted-runners/run-workflows 8b67b95 |
The documentation changes requested by Maciej have been made.
…les (New) (canonical#601) * Retire pm_test from PC suspend 30 cycles and saperate IoT suspend cycles * Remove the useless flag preserve-local * Make suspend (fwts and rtcwake) flow more linear * Correct the summary of resources jobs * Modify the summary. * Remove unnecessary variables * Seperate the suspend_cycles_reboot test case to a new file * Add the detail description in md file. * Fix the typo, and add the short description at the beginning of suspend_cycles_reboot.pxu * Break lines of text at 80 characters as possible as I can * Fix some tiny problems
…les (New) (canonical#601) * Retire pm_test from PC suspend 30 cycles and saperate IoT suspend cycles * Remove the useless flag preserve-local * Make suspend (fwts and rtcwake) flow more linear * Correct the summary of resources jobs * Modify the summary. * Remove unnecessary variables * Seperate the suspend_cycles_reboot test case to a new file * Add the detail description in md file. * Fix the typo, and add the short description at the beginning of suspend_cycles_reboot.pxu * Break lines of text at 80 characters as possible as I can * Fix some tiny problems
…les (New) (canonical#601) * Retire pm_test from PC suspend 30 cycles and saperate IoT suspend cycles * Remove the useless flag preserve-local * Make suspend (fwts and rtcwake) flow more linear * Correct the summary of resources jobs * Modify the summary. * Remove unnecessary variables * Seperate the suspend_cycles_reboot test case to a new file * Add the detail description in md file. * Fix the typo, and add the short description at the beginning of suspend_cycles_reboot.pxu * Break lines of text at 80 characters as possible as I can * Fix some tiny problems
Description
suspend-cycles-stress-testcan support both PC and IoTThe detail story will be here: https://warthogs.atlassian.net/browse/CQT-2475
Resolved issues
Which suspend number the DUT is suspend can be easily identified when DUT is hang.
Documentation
The following environment configs will be used in these test cases:
STRESS_S3_ITERATIONS: suspend times in each reboot cyclesSTRESS_SUSPEND_REBOOT_ITERATIONS: total reboot cyclesSTRESS_S3_SLEEP_DELAY: sleep dealySTRESS_S3_WAIT_DELAY: device check delaySTRESS_SUSPEND_SLEEP_THRESHOLD: suspend time thresholdSTRESS_SUSPEND_RESUME_THRESHOLD: resume time thresholdTests
DUT image: Desktop , Checkbox: deb version
DUT image: FDE , Checkbox: Slave- snap version, Master- deb version
DUT image: Desktop , Checkbox: snap version