Skip to content

Fail snap refresh/revert jobs if the snap_update_test.py script fails (bugfix)#1616

Merged
Hook25 merged 9 commits intomainfrom
fix-1615
Dec 3, 2024
Merged

Fail snap refresh/revert jobs if the snap_update_test.py script fails (bugfix)#1616
Hook25 merged 9 commits intomainfrom
fix-1615

Conversation

@pieqq
Copy link
Copy Markdown
Collaborator

@pieqq pieqq commented Nov 22, 2024

Description

Adding set -eo pipefail in the snapd refresh/revert jobs that initiate
a reboot ensures that:

  • any failure in the snap_update_test.py is not masked by the pipe
    command (this is achieved by set -o pipefail)
  • if an exception occurs while issuing a refresh or revert snap command, the script records the issue using a special Checkbox mechanism (__result file in the session sharing directory) and raise an exception. This will not prevent the job from rebooting, but when resuming, the job will be marked as failed.

Minor: increase timeout to 10 minutes to try to limit the number of issues on slow devices.

Resolved issues

Documentation

Tests

Adding `set -eo pipefail` in the snapd refresh/revert jobs that initiate
a reboot ensures that:

- any failure in the `snap_update_test.py` is not masked by the pipe
command (this is achieved by `set -o pipefail`)
- any failure at any step of the `command:` field just fails the whole
job (this is ensured by `set -e`). This is important to avoid the
`reboot` command from being issued if something goes wrong in the
script.

Fix #1615
Some devices in our test lab are really slow to download and apply snap
updates. Increase timeout from 5 minutes to 10 minutes to hopefully help
with this.
@codecov
Copy link
Copy Markdown

codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 91.30435% with 2 lines in your changes missing coverage. Please review.

Project coverage is 48.44%. Comparing base (6eba7e4) to head (6968d72).
Report is 101 commits behind head on main.

Files with missing lines Patch % Lines
providers/base/bin/snap_update_test.py 91.30% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1616      +/-   ##
==========================================
+ Coverage   48.29%   48.44%   +0.14%     
==========================================
  Files         372      368       -4     
  Lines       40075    39880     -195     
  Branches     6758     6739      -19     
==========================================
- Hits        19356    19318      -38     
+ Misses      20002    19847     -155     
+ Partials      717      715       -2     
Flag Coverage Δ
provider-base 24.79% <91.30%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Collaborator

@Hook25 Hook25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please confirm that noreturn will actually terminate when a job fails

@pieqq pieqq marked this pull request as draft November 22, 2024 15:41
@pieqq
Copy link
Copy Markdown
Collaborator Author

pieqq commented Nov 22, 2024

Please confirm that noreturn will actually terminate when a job fails

Unfortunately you're right, a noreturn job gets stuck there.

@Hook25
Copy link
Copy Markdown
Collaborator

Hook25 commented Nov 22, 2024

You can try to use $PAINBOX_SESSION_SHARE/__result, this will make a pointless restart, but it at least gives the right outcome

…mmand

When running a snapd revert or refresh command, a SnapdRequestError
exception might be raised (for example when the refresh command failed
to run within the given time).

If this is the case, the script fails, and if it is executed as part of
a Checkbox run, a `__result` file is generated and stored in the shared
session directory to record the fact that this job failed.

When the session is resumed, Checkbox will automatically resume the
session and mark this test as failed.
@pieqq pieqq marked this pull request as ready for review November 26, 2024 20:18
@pieqq
Copy link
Copy Markdown
Collaborator Author

pieqq commented Nov 26, 2024

I've side-loaded the provider on the same device that exhibited the original problem, and the jobs seem to pass now:

(...)
----------------------------[ Running job 26 / 37 ]-----------------------------
--------------[ Refresh cascade-kernel snap to its base revision ]--------------
ID: com.canonical.certification::snapd/snap-refresh-kernel-cascade-kernel-to-base-rev
Category: Snapd
--------------------------------------------------------------------------------
Refreshing snap cascade-kernel from revision 203 to 155
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Copy snap "cascade-kernel" data
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Setup snap "cascade-kernel" (155) security profiles
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Setup snap "cascade-kernel" (155) security profiles
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Setup snap "cascade-kernel" (155) security profiles
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Wait) Make snap "cascade-kernel" (155) available to the system
Snap operation finished. See `snap change 1578` for more info
Waiting for reboot...
Connection lost!
[Errno 32] Broken pipe
Reconnecting ...
Reconnected (took: 387s)
----------------------------[ Running job 28 / 38 ]-----------------------------
-----[ Attach logs after refreshing cascade-kernel snap to base revision ]------
ID: com.canonical.certification::snapd/log-attach-after-snap-refresh-kernel-cascade-kernel-to-base-rev
Category: Snapd
--------------------------------------------------------------------------------
Refreshing snap cascade-kernel from revision 203 to 155
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Copy snap "cascade-kernel" data
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Setup snap "cascade-kernel" (155) security profiles
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Setup snap "cascade-kernel" (155) security profiles
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Doing) Setup snap "cascade-kernel" (155) security profiles
(info) (Doing) Monitoring snap "cascade-kernel" to determine whether extra refresh steps are required
(info) (Wait) Make snap "cascade-kernel" (155) available to the system
Snap operation finished. See `snap change 1578` for more info
Waiting for reboot...
--------------------------------------------------------------------------------
Outcome: job passed
----------------------------[ Running job 29 / 38 ]-----------------------------
---[ Verify cascade-kernel snap revision after refreshing to base revision ]----
ID: com.canonical.certification::snapd/snap-verify-after-refresh-kernel-cascade-kernel-to-base-rev
Category: Snapd
--------------------------------------------------------------------------------
Beginning verify...
cascade-kernel snap refresh complete
Checking refresh status for snap cascade-kernel...
PASS: current revision (155) matches the expected revision
--------------------------------------------------------------------------------
Outcome: job passed
----------------------------[ Running job 30 / 38 ]-----------------------------
-----[ Revert cascade-kernel snap from base revision to original revision ]-----
ID: com.canonical.certification::snapd/snap-revert-kernel-cascade-kernel-from-base-rev
Category: Snapd
--------------------------------------------------------------------------------
Reverting snap cascade-kernel from revision 155 to 203
(info) (Doing) Setup snap "cascade-kernel" (203) security profiles
(info) (Doing) Setup snap "cascade-kernel" (203) security profiles
(info) (Doing) Setup snap "cascade-kernel" (203) security profiles
(info) (Wait) Make snap "cascade-kernel" (203) available to the system
Snap operation finished. See `snap change 1579` for more info
Waiting for reboot...
Connection lost!
[Errno 104] Connection reset by peer
Reconnecting ...
Reconnected (took: 364s)
----------------------------[ Running job 31 / 38 ]-----------------------------
------[ Attach logs after reverting cascade-kernel snap to base revision ]------
ID: com.canonical.certification::snapd/log-attach-after-snap-revert-kernel-cascade-kernel-from-base-rev
Category: Snapd
--------------------------------------------------------------------------------
Reverting snap cascade-kernel from revision 155 to 203
(info) (Doing) Setup snap "cascade-kernel" (203) security profiles
(info) (Doing) Setup snap "cascade-kernel" (203) security profiles
(info) (Doing) Setup snap "cascade-kernel" (203) security profiles
(info) (Wait) Make snap "cascade-kernel" (203) available to the system
Snap operation finished. See `snap change 1579` for more info
Waiting for reboot...
--------------------------------------------------------------------------------
Outcome: job passed
----------------------------[ Running job 32 / 38 ]-----------------------------
---[ Verify cascade-kernel snap revision after reverting from base revision ]---
ID: com.canonical.certification::snapd/snap-verify-after-revert-kernel-cascade-kernel-from-base-rev
Category: Snapd
--------------------------------------------------------------------------------
Beginning verify...
cascade-kernel snap revert complete
Checking revert status for snap cascade-kernel...
PASS: current revision (203) matches the expected revision
--------------------------------------------------------------------------------
Outcome: job passed
(...)

This exception can also be raised when calling snap commands using
checkbox_support.snap_utils.snapd.Snapd.
@pieqq
Copy link
Copy Markdown
Collaborator Author

pieqq commented Nov 26, 2024

I've modified the script to timeout much earlier (a few seconds instead of 10 minutes) just to trigger some exceptions. I noticed another exception needed to be caught in order to apply the trick using __result file, so I added a patch for this.

I ran the modified script on the same device, and when the session is auto-resumed, the kernel refresh job is automatically marked as failed as expected (this is an abstract from the raw session:

      "com.canonical.certification::snapd/snap-refresh-kernel-cascade-kernel-to-base-rev": [
        {
          "comments": "Doing",
          "execution_duration": 493.60218334198,
          "io_log": [],
          "outcome": "fail",
          "return_code": null
        }
      ],

Hook25
Hook25 previously approved these changes Dec 2, 2024
pieqq and others added 3 commits December 2, 2024 15:55
Co-authored-by: Massimiliano <massimiliano.girardi@canonical.com>
@Hook25 Hook25 merged commit 843f2a8 into main Dec 3, 2024
@Hook25 Hook25 deleted the fix-1615 branch December 3, 2024 09:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants