Gracefully notify orchestrator in case of a panic in transcoder#2094
Gracefully notify orchestrator in case of a panic in transcoder#2094leszko merged 6 commits intolivepeer:masterfrom
Conversation
There was a problem hiding this comment.
Looks great! Just a few minor nit comments.
Also, I believe the changes in LocalTranscoder and NvidiaTranscoder should be sufficient, but I wanted to just note that when the node does GPU transcoding, it actually uses the LoadBalancingTranscoder type which also implements the Transcoder interface. Under the hood, the LoadBalancingTranscoder uses NvidiaTranscoder [1]. When LoadBalancingTranscoder's Transcode() method is used, there is actually a nested NvidiaTranscoder.Transcode() call in a goroutine. I believe a panic in NvidiaTranscoder's Transcode call would be caught and returned as an error to LoadBalancingTranscoder and then it would be handled in the ot_rpc.go code the same way that it would be handled if the error was returned directly by NvidiaTranscoder (but feel free to correct me here if there is any disagreement on the expected behavior!).
[1] See this comment for more details #2070 (comment)
Yes, you're right, it'll work exactly as you expected. I just double-checked it. We could consider adding a unit test in |
yondonfu
left a comment
There was a problem hiding this comment.
Looks great! Last thing before merging...
We typically like to prefix commit messages with the package that the commit updates (see this doc - we loosely follow the guidelines described) as its helpful when sifting through commit history later on. Would be great to update the commits accordingly and to also squash the commits updating CHANGELOG_PENDING into a single one [1]. Then we're good to go!
[1] We've typically saved this process for the end of the PR review process and at that time rebasing/modifying commit history is considered fair game (generally we try to avoid doing so in middle of the review process just for the reviewer's convenience although there are exceptions).
Ok, good to know. Then, I think that this PR should be a single commit with the message |
jailuthra
left a comment
There was a problem hiding this comment.
LGTM! 🚢 🥳
When the PR is approved [..], I'll just click "Squash and merge" and update the commit message there.
Yeah 👍 I think it's okay to squash and merge using github UI and edit the commit message there, given the changes in this PR don't touch many things.
For more complicated changes it's preferred to run a local rebase+squash to split independent things in separate commits, and force-push on the PR branch before merging.
I think that this PR should be a single commit with the message
[core+server] Gracefully notify orchestrator in case of panic in transcoder
The message can probably be formatted as core,server: Gracefully notify.. or core+server: Gracefully notify.. to be closer to rest of the commit history. I don't think we've used the square brackets [] before.
What does this pull request do? Explain your changes. (required)
Fix transcoder to send an error to orchestrator before crashing. Before, when the transcoder crashed after
panic(), the orchestrator had to time out before trying another transcoder. The change is related only to the split O+T topology.Specific updates (required)
recover()in to thetranscoderfunctions (both for Nvidia and the standard CPU transcoding)UnrecoverableErrorinot_rpc.goto send the error message to orchestrator and only then to panic and crashHow did you test each of these updates (required)
panic(errors.New("Some error"))insidetranscoder.go#Transcode()functionThe same test without this PR, makes orchestrator wait for the timeout before selecting the second transcoder.
Does this pull request close any open issues?
fix #2088
Checklist:
makeruns successfully./test.shpass