Add error details for failed remote job#370
Conversation
lneuhaus
left a comment
There was a problem hiding this comment.
Nice, thanks a lot for having implemented this so quickly. I did not see any issues in the code, though I am not extremely familiar with this part of the codebase.
Have you been able to run an end-to-end test of this?
Codecov Report
@@ Coverage Diff @@
## master #370 +/- ##
=======================================
Coverage 97.69% 97.70%
=======================================
Files 52 52
Lines 6478 6485 +7
=======================================
+ Hits 6329 6336 +7
Misses 149 149
Continue to review full report at Codecov.
|
strawberryfields/engine.py
Outdated
| message = ( | ||
| "The remote job %s failed due to an internal " | ||
| "server error. Please try again." % job.id | ||
| "server error. Please try again. %s" % (job.id, job.meta) |
There was a problem hiding this comment.
Any reason not to use new-style formatting here?
There was a problem hiding this comment.
Oh, I think this is an artifact of Python's logger expecting printf string formatting. At one point, this line must have been inside the self.log.error() function.
Now that it's been refactored out, it's best to update this to use new-style formatting.
strawberryfields/engine.py
Outdated
| message = ( | ||
| "The remote job %s failed due to an internal " | ||
| "server error. Please try again." % job.id | ||
| "server error. Please try again. %s" % (job.id, job.meta) |
There was a problem hiding this comment.
Oh, I think this is an artifact of Python's logger expecting printf string formatting. At one point, this line must have been inside the self.log.error() function.
Now that it's been refactored out, it's best to update this to use new-style formatting.
| self._status = JobStatus(self._connection.get_job_status(self.id)) | ||
| job_info = self._connection.get_job(self.id) | ||
| self._status = JobStatus(job_info.status) | ||
| self._meta = job_info.meta |
There was a problem hiding this comment.
Would it be a good idea to log the retrieved metadata at the DEBUG level here? Combined with #360, this will allow retrieved job metadata to appear within the users logs (if they decide to configure it).
There was a problem hiding this comment.
Good idea, this has been added! Let me know if you think the message formatting needs to be changed.
|
Just FYI, @lneuhaus and I coordinated an end-to-end test for a deliberately failing job and didn't find any problems. |
| return | ||
| self._status = JobStatus(self._connection.get_job_status(self.id)) | ||
| job_info = self._connection.get_job(self.id) | ||
| self._status = JobStatus(job_info.status) |
There was a problem hiding this comment.
Logically for me, the meta data belongs to the status of the Job. At least this attribute will change the similarly to the change in status and as far as I understand it, is meant to provide more details about the status of the Job.
However, making this change in abstraction might not be so easy (would involve restructuring JobStatus, etc.), so I'm happy as is now.
There was a problem hiding this comment.
Yes, I agree with the general idea, and also that it would be a relatively big change, so it seems best to tackle that in the future 🙂
| "The remote job {} failed due to an internal " | ||
| "server error. Please try again. {}".format(job.id, job.meta) |
There was a problem hiding this comment.
These were previously made so such that they can be used for logging, but here the message is already pre-defined, so I guess format should work just fine as well?
tests/api/test_remote_engine.py
Outdated
| if self.request_count >= self.REQUESTS_BEFORE_COMPLETED | ||
| else JobStatus.QUEUED | ||
| ) | ||
| return Job(id_="123", status=status, connection=None) |
There was a problem hiding this comment.
A small dummy meta dictionary could be added here, and then later checked similarly to the status.
There was a problem hiding this comment.
Good idea, this has been added!
|
Thanks so much for this @pauljxtan, looks really good! 💯 Had a couple of suggestions which might be improvements to add, but I'm also happy with the current state. |
Co-Authored-By: antalszava <antalszava@gmail.com>
Co-Authored-By: Theodor <theodor@xanadu.ai>
Context:
Currently, a failed remote job produces a generic error message that is difficult to diagnose. The job info response from the remote platform contains a
metafield that is currently unused, but has some information that may be useful, e.g.,{ 'error_code': 'hardware-message-error', 'error_detail': 'The job failed because the message received from the hardware could not be understood.' }Description of the Change:
metaproperty to theJobclass. In the event of job failure, this dictionary will contain anerror-codeanderror-detail.metaattribute (along withstatus) whenJob.refresh()is called.metato the error message for a failed job.Example:
"The remote job 928ee53e-677a-48c0-8ba5-a0566ba3e295 failed due to an internal server error. Please try again.""The remote job 928ee53e-677a-48c0-8ba5-a0566ba3e295 failed due to an internal server error. Please try again. {'error_code': 'hardware-message-error', 'error_detail': 'The job failed because the message received from the hardware could not be understood.'}"Benefits:
The added context in a copy-pastable error message is likely to make it easier to diagnose and debug job failures.
Possible Drawbacks:
The new error message is slightly verbose.
Related GitHub Issues:
N/A