Skip to content

Fixes a timeout on properties access (bugfix)#1887

Merged
Hook25 merged 1 commit intomainfrom
notimeout
Apr 24, 2025
Merged

Fixes a timeout on properties access (bugfix)#1887
Hook25 merged 1 commit intomainfrom
notimeout

Conversation

@Hook25
Copy link
Copy Markdown
Collaborator

@Hook25 Hook25 commented Apr 24, 2025

This disables the attribute access timeout from client to server and bumps the server to client one. This is to avoid that slow properties (like system infomation) make the connection timeout and the run void

Minor: this also fixes a very old bug where print_tb was used instead of print_exception

Description

On very slow devices, the tar exporter fails because the access to system_information triggers the (very generous imo) timeout of 2m. Regardless, the timeout is pointless, Checkbox should wait for a result on property access regardless of how much time it takes, as the controller has no way to recover either way if the agent is stuck on a task and failing would just make the problem worse.

This fixes both of the issues in this log:

  • This makes the timeout infinite from controller to agent (for the above described reason, controller has no way to recover either way)
  • This also fixes that dumb double exception at the bottom, the wrong traceback api was used to print the exception :(

Log reported by @LiaoU3

80.0kB [00:00, 1.76MB/s, file=/home/ubuntu/.local/share/checkbox-ng/submission_2025-04-24T11.38.56.460490.junit.xml]
[junit_file]: file:///home/ubuntu/.local/share/checkbox-ng/submission_2025-04-24T11.38.56.460490.junit.xml (URL added to: /home/ubuntu/.local/share/checkbox-ng/submission_2025-04-24T11.38.56.460490.submission.log)
Problem with a '2_tar_file' report using 'com.canonical.plainbox::tar' exporter sent to '/home/ubuntu/.local/share/checkbox-ng/submission_2025-04-24T11.38.56.460490.tar.xz' transport. Reason result expired
Traceback (most recent call last):
  File "/snap/checkbox22/current/lib/python3.10/site-packages/checkbox_ng/launcher/stages.py", line 708, in _export_results
    result = self._export_fn(exporter_id, transport)
  File "/snap/checkbox22/current/lib/python3.10/site-packages/checkbox_ng/launcher/controller.py", line 834, in local_export
    rf = self.sa.cache_report(exporter_id, options)
  File "/snap/checkbox22/current/lib/python3.10/site-packages/plainbox/vendor/rpyc/core/netref.py", line 240, in __call__
    return syncreq(_self, consts.HANDLE_CALL, args, kwargs)
  File "/snap/checkbox22/current/lib/python3.10/site-packages/plainbox/vendor/rpyc/core/netref.py", line 63, in syncreq
    return conn.sync_request(handler, proxy, *args)
  File "/snap/checkbox22/current/lib/python3.10/site-packages/plainbox/vendor/rpyc/core/protocol.py", line 718, in sync_request
    return _async_res.value
  File "/snap/checkbox22/current/lib/python3.10/site-packages/plainbox/vendor/rpyc/core/async_.py", line 106, in value
    self.wait()
  File "/snap/checkbox22/current/lib/python3.10/site-packages/plainbox/vendor/rpyc/core/async_.py", line 55, in wait
    raise AsyncResultTimeout("result expired")
TimeoutError: result expired

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/snap/checkbox22/current/bin/checkbox-cli", line 8, in <module>
    sys.exit(main())
  File "/snap/checkbox22/current/lib/python3.10/site-packages/checkbox_ng/launcher/checkbox_cli.py", line 182, in main
    return subcmd.invoked(ctx)
  File "/snap/checkbox22/current/lib/python3.10/site-packages/checkbox_ng/launcher/controller.py", line 178, in invoked
    return self.connect_and_run(ctx.args.host, port)
  File "/snap/checkbox22/current/lib/python3.10/site-packages/checkbox_ng/launcher/controller.py", line 306, in connect_and_run
    keep_running = {
  File "/snap/checkbox22/current/lib/python3.10/site-packages/checkbox_ng/launcher/controller.py", line 781, in run_jobs
    self.finish_session()
  File "/snap/checkbox22/current/lib/python3.10/site-packages/checkbox_ng/launcher/controller.py", line 707, in finish_session
    self._export_results()
  File "/snap/checkbox22/current/lib/python3.10/site-packages/checkbox_ng/launcher/stages.py", line 774, in _export_results
    traceback.print_tb(exc)
  File "/snap/checkbox22/current/usr/lib/python3.10/traceback.py", line 53, in print_tb
    print_list(extract_tb(tb, limit=limit), file=file)
  File "/snap/checkbox22/current/usr/lib/python3.10/traceback.py", line 72, in extract_tb
    return StackSummary.extract(walk_tb(tb), limit=limit)
  File "/snap/checkbox22/current/usr/lib/python3.10/traceback.py", line 364, in extract
    for f, lineno in frame_gen:
  File "/snap/checkbox22/current/usr/lib/python3.10/traceback.py", line 329, in walk_tb
    yield tb.tb_frame, tb.tb_lineno
AttributeError: 'TimeoutError' object has no attribute 'tb_frame'

Resolved issues

Fixes: https://warthogs.atlassian.net/browse/CHECKBOX-1877

Documentation

Documented the rationale above it.

Tests

Tested this interactively by using this launcher:

        #!/usr/bin/env checkbox-cli
        [launcher]
        launcher_version = 1
        stock_reports = submission_json
        [test plan]
        # filtering to avoid the test being out of bound
        forced = yes
        unit = 2021.com.canonical.certification::pass-and-fail
        [ui]
        type=silent
        [manifest]
        2021.com.canonical.certification::manifest_location = 0

First, I set the controller to agent timeout to 1 and added a breakpoint to the system_information property. When this is done the same exception as above is triggered after 1 second (at the end of the run when the agent tries to create the submission files).

I've modified the traceback call to the correct one and the following was printed: (Not crashing the controller this time!)

Problem with a '2_json_file' report using 'com.canonical.plainbox::json' exporter sent to '/home/h25/.local/share/checkbox-ng/submission_2025-04-24T12.31.32.274737.json' transport. Reason: TimeoutError('result expired')
Traceback (most recent call last):
  File "/home/h25/prj/canonical/checkbox/checkbox-ng/checkbox_ng/launcher/stages.py", line 708, in _export_results
    result = self._export_fn(exporter_id, transport)
  File "/home/h25/prj/canonical/checkbox/checkbox-ng/checkbox_ng/launcher/controller.py", line 836, in local_export
    rf = self.sa.cache_report(exporter_id, options)
  File "/home/h25/prj/canonical/checkbox/checkbox-ng/plainbox/vendor/rpyc/core/netref.py", line 240, in __call__
    return syncreq(_self, consts.HANDLE_CALL, args, kwargs)
  File "/home/h25/prj/canonical/checkbox/checkbox-ng/plainbox/vendor/rpyc/core/netref.py", line 63, in syncreq
    return conn.sync_request(handler, proxy, *args)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/h25/prj/canonical/checkbox/checkbox-ng/plainbox/vendor/rpyc/core/protocol.py", line 718, in sync_request
    return _async_res.value
           ^^^^^^^^^^^^^^^^
  File "/home/h25/prj/canonical/checkbox/checkbox-ng/plainbox/vendor/rpyc/core/async_.py", line 106, in value
    self.wait()
    ~~~~~~~~~^^
  File "/home/h25/prj/canonical/checkbox/checkbox-ng/plainbox/vendor/rpyc/core/async_.py", line 55, in wait
    raise AsyncResultTimeout("result expired")
TimeoutError: result expired

Good, now that we have the correct exception and not some broken mess, I changed the timeout to -1, doing so the controller is blocked forever on the breakpoint (tested 15m which is less than forever but more than 120s)

This disables the attribute access timeout from client to server and
bumps the server to client one. This is to avoid that slow properties
(like system infomation) make the connection timeout and the run void

Minor: this also fixes a very old bug where print_tb was used instead of
print_exception
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 24, 2025

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 50.44%. Comparing base (d600c6b) to head (763463f).
Report is 101 commits behind head on main.

Files with missing lines Patch % Lines
checkbox-ng/checkbox_ng/launcher/controller.py 0.00% 1 Missing ⚠️
checkbox-ng/checkbox_ng/launcher/stages.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1887   +/-   ##
=======================================
  Coverage   50.44%   50.44%           
=======================================
  Files         382      382           
  Lines       41026    41026           
  Branches     6890     6890           
=======================================
  Hits        20696    20696           
  Misses      19585    19585           
  Partials      745      745           
Flag Coverage Δ
checkbox-ng 70.20% <0.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Collaborator

@fernando79513 fernando79513 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1!
I didn't liked the infinite timeout in the beginning, but as you said, there is not a better way to handle the recovery in this case.

@Hook25 Hook25 merged commit b7c6055 into main Apr 24, 2025
26 of 28 checks passed
@Hook25 Hook25 deleted the notimeout branch April 24, 2025 16:00
Meulengracht pushed a commit to Meulengracht/checkbox that referenced this pull request May 19, 2025
Fixes a timeout on properties access

This disables the attribute access timeout from client to server and
bumps the server to client one. This is to avoid that slow properties
(like system infomation) make the connection timeout and the run void

Minor: this also fixes a very old bug where print_tb was used instead of
print_exception
mreed8855 pushed a commit that referenced this pull request Jul 30, 2025
Fixes a timeout on properties access

This disables the attribute access timeout from client to server and
bumps the server to client one. This is to avoid that slow properties
(like system infomation) make the connection timeout and the run void

Minor: this also fixes a very old bug where print_tb was used instead of
print_exception
mreed8855 pushed a commit that referenced this pull request Jul 31, 2025
Fixes a timeout on properties access

This disables the attribute access timeout from client to server and
bumps the server to client one. This is to avoid that slow properties
(like system infomation) make the connection timeout and the run void

Minor: this also fixes a very old bug where print_tb was used instead of
print_exception
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants