Fail snap refresh/revert jobs if the `snap_update_test.py` script fails (bugfix) by pieqq · Pull Request #1616 · canonical/checkbox

pieqq · 2024-11-22T15:04:31Z

Description

Adding set -eo pipefail in the snapd refresh/revert jobs that initiate
a reboot ensures that:

any failure in the snap_update_test.py is not masked by the pipe
command (this is achieved by set -o pipefail)
if an exception occurs while issuing a refresh or revert snap command, the script records the issue using a special Checkbox mechanism (__result file in the session sharing directory) and raise an exception. This will not prevent the job from rebooting, but when resuming, the job will be marked as failed.

Minor: increase timeout to 10 minutes to try to limit the number of issues on slow devices.

Resolved issues

Documentation

Tests

Adding `set -eo pipefail` in the snapd refresh/revert jobs that initiate a reboot ensures that: - any failure in the `snap_update_test.py` is not masked by the pipe command (this is achieved by `set -o pipefail`) - any failure at any step of the `command:` field just fails the whole job (this is ensured by `set -e`). This is important to avoid the `reboot` command from being issued if something goes wrong in the script. Fix #1615

Some devices in our test lab are really slow to download and apply snap updates. Increase timeout from 5 minutes to 10 minutes to hopefully help with this.

codecov · 2024-11-22T15:07:27Z

Codecov Report

Attention: Patch coverage is 91.30435% with 2 lines in your changes missing coverage. Please review.

Project coverage is 48.44%. Comparing base (6eba7e4) to head (6968d72).
Report is 101 commits behind head on main.

Files with missing lines	Patch %	Lines
providers/base/bin/snap_update_test.py	91.30%	0 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1616      +/-   ##
==========================================
+ Coverage   48.29%   48.44%   +0.14%     
==========================================
  Files         372      368       -4     
  Lines       40075    39880     -195     
  Branches     6758     6739      -19     
==========================================
- Hits        19356    19318      -38     
+ Misses      20002    19847     -155     
+ Partials      717      715       -2

Flag	Coverage Δ
provider-base	`24.79% <91.30%> (+0.06%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Hook25

Please confirm that noreturn will actually terminate when a job fails

pieqq · 2024-11-22T15:53:37Z

Please confirm that noreturn will actually terminate when a job fails

Unfortunately you're right, a noreturn job gets stuck there.

Hook25 · 2024-11-22T16:41:51Z

You can try to use $PAINBOX_SESSION_SHARE/__result, this will make a pointless restart, but it at least gives the right outcome

…mmand When running a snapd revert or refresh command, a SnapdRequestError exception might be raised (for example when the refresh command failed to run within the given time). If this is the case, the script fails, and if it is executed as part of a Checkbox run, a `__result` file is generated and stored in the shared session directory to record the fact that this job failed. When the session is resumed, Checkbox will automatically resume the session and mark this test as failed.

pieqq · 2024-11-26T20:20:32Z

I've side-loaded the provider on the same device that exhibited the original problem, and the jobs seem to pass now:

(...)
----------------------------[ Running job 26 / 37 ]-----------------------------
--------------[ Refresh cascade-kernel snap to its base revision ]--------------
ID: com.canonical.certification::snapd/snap-refresh-kernel-cascade-kernel-to-base-rev
Category: Snapd
--------------------------------------------------------------------------------
Refreshing snap cascade-kernel from revision 203 to 155
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Copy snap "cascade-kernel" data
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Setup snap "cascade-kernel" (155) security profiles
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Setup snap "cascade-kernel" (155) security profiles
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Setup snap "cascade-kernel" (155) security profiles
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Wait) Make snap "cascade-kernel" (155) available to the system
Snap operation finished. See `snap change 1578` for more info
Waiting for reboot...
Connection lost!
[Errno 32] Broken pipe
Reconnecting ...
Reconnected (took: 387s)
----------------------------[ Running job 28 / 38 ]-----------------------------
-----[ Attach logs after refreshing cascade-kernel snap to base revision ]------
ID: com.canonical.certification::snapd/log-attach-after-snap-refresh-kernel-cascade-kernel-to-base-rev
Category: Snapd
--------------------------------------------------------------------------------
Refreshing snap cascade-kernel from revision 203 to 155
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Copy snap "cascade-kernel" data
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Setup snap "cascade-kernel" (155) security profiles
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Setup snap "cascade-kernel" (155) security profiles
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Setup snap "cascade-kernel" (155) security profiles
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Wait) Make snap "cascade-kernel" (155) available to the system
Snap operation finished. See `snap change 1578` for more info
Waiting for reboot...
--------------------------------------------------------------------------------
Outcome: job passed
----------------------------[ Running job 29 / 38 ]-----------------------------
---[ Verify cascade-kernel snap revision after refreshing to base revision ]----
ID: com.canonical.certification::snapd/snap-verify-after-refresh-kernel-cascade-kernel-to-base-rev
Category: Snapd
--------------------------------------------------------------------------------
Beginning verify...
cascade-kernel snap refresh complete
Checking refresh status for snap cascade-kernel...
PASS: current revision (155) matches the expected revision
--------------------------------------------------------------------------------
Outcome: job passed
----------------------------[ Running job 30 / 38 ]-----------------------------
-----[ Revert cascade-kernel snap from base revision to original revision ]-----
ID: com.canonical.certification::snapd/snap-revert-kernel-cascade-kernel-from-base-rev
Category: Snapd
--------------------------------------------------------------------------------
Reverting snap cascade-kernel from revision 155 to 203
(info) (Doing) Setup snap "cascade-kernel" (203) security profiles
(info) (Doing) Setup snap "cascade-kernel" (203) security profiles
(info) (Doing) Setup snap "cascade-kernel" (203) security profiles
(info) (Wait) Make snap "cascade-kernel" (203) available to the system
Snap operation finished. See `snap change 1579` for more info
Waiting for reboot...
Connection lost!
[Errno 104] Connection reset by peer
Reconnecting ...
Reconnected (took: 364s)
----------------------------[ Running job 31 / 38 ]-----------------------------
------[ Attach logs after reverting cascade-kernel snap to base revision ]------
ID: com.canonical.certification::snapd/log-attach-after-snap-revert-kernel-cascade-kernel-from-base-rev
Category: Snapd
--------------------------------------------------------------------------------
Reverting snap cascade-kernel from revision 155 to 203
(info) (Doing) Setup snap "cascade-kernel" (203) security profiles
(info) (Doing) Setup snap "cascade-kernel" (203) security profiles
(info) (Doing) Setup snap "cascade-kernel" (203) security profiles
(info) (Wait) Make snap "cascade-kernel" (203) available to the system
Snap operation finished. See `snap change 1579` for more info
Waiting for reboot...
--------------------------------------------------------------------------------
Outcome: job passed
----------------------------[ Running job 32 / 38 ]-----------------------------
---[ Verify cascade-kernel snap revision after reverting from base revision ]---
ID: com.canonical.certification::snapd/snap-verify-after-revert-kernel-cascade-kernel-from-base-rev
Category: Snapd
--------------------------------------------------------------------------------
Beginning verify...
cascade-kernel snap revert complete
Checking revert status for snap cascade-kernel...
PASS: current revision (203) matches the expected revision
--------------------------------------------------------------------------------
Outcome: job passed
(...)

This exception can also be raised when calling snap commands using checkbox_support.snap_utils.snapd.Snapd.

pieqq · 2024-11-26T22:21:39Z

I've modified the script to timeout much earlier (a few seconds instead of 10 minutes) just to trigger some exceptions. I noticed another exception needed to be caught in order to apply the trick using __result file, so I added a patch for this.

I ran the modified script on the same device, and when the session is auto-resumed, the kernel refresh job is automatically marked as failed as expected (this is an abstract from the raw session:

      "com.canonical.certification::snapd/snap-refresh-kernel-cascade-kernel-to-base-rev": [
        {
          "comments": "Doing",
          "execution_duration": 493.60218334198,
          "io_log": [],
          "outcome": "fail",
          "return_code": null
        }
      ],

providers/base/bin/snap_update_test.py

Co-authored-by: Massimiliano <massimiliano.girardi@canonical.com>

pieqq added 2 commits November 22, 2024 16:01

Increase default timeout of the snap_update_test.py script

d3fab50

Some devices in our test lab are really slow to download and apply snap updates. Increase timeout from 5 minutes to 10 minutes to hopefully help with this.

Hook25 requested changes Nov 22, 2024

View reviewed changes

pieqq marked this pull request as draft November 22, 2024 15:41

pieqq added 3 commits November 26, 2024 15:05

Revert the use of set -e

1b3ea0d

Update unit tests for snap_update_test

94b1bb5

pieqq marked this pull request as ready for review November 26, 2024 20:18

Also fail job in case of an AsynException

9cc457f

This exception can also be raised when calling snap commands using checkbox_support.snap_utils.snapd.Snapd.

fernando79513 assigned Hook25 Nov 28, 2024

Hook25 approved these changes Dec 2, 2024

View reviewed changes

Hook25 previously approved these changes Dec 2, 2024

View reviewed changes

providers/base/bin/snap_update_test.py Outdated Show resolved Hide resolved

pieqq dismissed Hook25’s stale review via a06cefb December 2, 2024 14:55

pieqq and others added 3 commits December 2, 2024 15:55

Update providers/base/bin/snap_update_test.py

a06cefb

Co-authored-by: Massimiliano <massimiliano.girardi@canonical.com>

black is not happy

9683cb4

Fix flake8

6968d72

Hook25 approved these changes Dec 3, 2024

View reviewed changes

Hook25 merged commit 843f2a8 into main Dec 3, 2024

Hook25 deleted the fix-1615 branch December 3, 2024 09:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fail snap refresh/revert jobs if the `snap_update_test.py` script fails (bugfix)#1616

Fail snap refresh/revert jobs if the `snap_update_test.py` script fails (bugfix)#1616
Hook25 merged 9 commits intomainfrom
fix-1615

pieqq commented Nov 22, 2024 •

edited

Loading

Uh oh!

codecov bot commented Nov 22, 2024 •

edited

Loading

Uh oh!

Hook25 left a comment

Uh oh!

pieqq commented Nov 22, 2024

Uh oh!

Hook25 commented Nov 22, 2024

Uh oh!

pieqq commented Nov 26, 2024

Uh oh!

pieqq commented Nov 26, 2024

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pieqq commented Nov 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Resolved issues

Documentation

Tests

Uh oh!

codecov bot commented Nov 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Hook25 left a comment

Choose a reason for hiding this comment

Uh oh!

pieqq commented Nov 22, 2024

Uh oh!

Hook25 commented Nov 22, 2024

Uh oh!

pieqq commented Nov 26, 2024

Uh oh!

pieqq commented Nov 26, 2024

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pieqq commented Nov 22, 2024 •

edited

Loading

codecov bot commented Nov 22, 2024 •

edited

Loading