Fix race condition timeout decorator (bugfix)#1442
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1442 +/- ##
==========================================
- Coverage 45.72% 45.70% -0.02%
==========================================
Files 367 367
Lines 39134 39158 +24
Branches 6618 6622 +4
==========================================
+ Hits 17895 17899 +4
- Misses 20565 20584 +19
- Partials 674 675 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
bf61458 to
8ce0d84
Compare
|
I test the timeout on my side with this fix, the -- issue is gone. The test log is as below: |
|
Some more test : |
pieqq
left a comment
There was a problem hiding this comment.
It took me a while to review this, because I'm not familiar with the multiprocessing module (nor queuing, nor threading... :)).
I did some tests locally, adding a @timeout decorator to a function spawning other long running processes, and it seems to work as expected.
This looks OK to me, with some documentation improvements and tiny typos.
A few things that are related to this PR, but not part of it:
- Usage for this decorator is not documented anywhere. At the very least, there should be a very simple example at the top of the
timeout.pyfile. - Using
@timeoutdecorator (without passing the duration wanted) results in some weird outcome depending when you use it. Maybe there should be a check to make sure a duration is passed (and it's a number) - Nice stuff offered by
checkbox_supportshould be documented in the official doc, otherwise there is very little chance outside users and test authors use them.
Co-authored-by: Pierre Equoy <pierre.equoy@canonical.com>
pieqq
left a comment
There was a problem hiding this comment.
Thanks for the feedback and the fixes!
Description
The timeout decorator seldomly fails to receive either a value or an exception from the subprocess. I am currently unable to reproduce this issue in any way but I've seen it happen on a very slow device running core20. Either way this improves the implementation in several key ways:
emptybefore getting from the queue: this is an anti-pattern and not guaranteed to workQueue.This also contains a driveby fix to the issue described in this PR (removal of the
--before the pid): #1441And a new test to prove that the solution in the decorator works (while using the builtin kill or any signal wont, to the best of my knowledge)
Resolved issues
Fixes: CHECKBOX-1544
Documentation
New comments to explain why some things are done (including why
subprocess.runis used)Tests
New test that proves the kill behaviour works. Also new coverage for all the new lines