We often hit seemingly random jaccl errors like #1711, and so far the only way to recover seems to be disconnecting and reconnecting the Thunderbolt 5 cable, or rebooting the machines altogether.
When these errors happen, the ibv_devinfo command usually hangs, so a hanging ibv_devinfo (one that does not return within a short timeout) could serve as our detection signal.
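A minimal sketch of such a check, assuming it runs periodically on each node and that a few seconds is long enough to tell "hung" apart from "slow" (the 5-second default and the name `probe_hangs` are hypothetical). It uses `Popen` plus `wait(timeout=...)` rather than `subprocess.run`, because `run` blocks indefinitely while reaping a timed-out child, and a process stuck inside the driver may not die promptly even after SIGKILL.

```python
import subprocess

# Hypothetical default: ibv_devinfo normally returns almost immediately,
# so a few seconds is assumed to be enough to distinguish "hung" from "slow".
DEFAULT_TIMEOUT_S = 5.0


def probe_hangs(cmd=("ibv_devinfo",), timeout_s=DEFAULT_TIMEOUT_S):
    """Return True if cmd does not exit within timeout_s, else False.

    Raises FileNotFoundError if the command is not installed, so a missing
    binary is not mistaken for a hung RDMA stack.
    """
    proc = subprocess.Popen(
        list(cmd),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout_s)
        return False  # exited in time (exit code is not the hang signal)
    except subprocess.TimeoutExpired:
        proc.kill()
        try:
            # A process stuck in the driver may ignore SIGKILL for a while;
            # don't let the health check itself block forever on reaping it.
            proc.wait(timeout=1.0)
        except subprocess.TimeoutExpired:
            pass
        return True
```

A node whose probe returns True would then be flagged as unhealthy for the dashboard. A non-zero exit is deliberately not treated as a hang here, since that likely points to a different problem (e.g. no RDMA device present).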
When this happens, the dashboard should show a warning along with troubleshooting guidance: first try reconnecting the cables, and if that doesn't help, reboot the machines.
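One possible shape for that warning, sketched under the assumption that the dashboard renders a title plus an ordered list of steps; `DashboardWarning` and `rdma_hang_warning` are hypothetical names, not an existing API. The steps are ordered from least to most disruptive, matching the recovery order described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DashboardWarning:
    # Hypothetical warning payload; field names are illustrative only.
    title: str
    steps: List[str] = field(default_factory=list)


def rdma_hang_warning(node_name: str) -> DashboardWarning:
    """Build the warning shown when a node's ibv_devinfo probe hangs.

    Guidance escalates: reconnect the cable first, reboot only if that
    does not clear the error.
    """
    return DashboardWarning(
        title=f"RDMA link on {node_name} appears stuck (ibv_devinfo is not responding)",
        steps=[
            "Disconnect and reconnect the Thunderbolt 5 cable(s) on this node.",
            "If the warning persists, reboot the affected machines.",
        ],
    )
```

Usage: `rdma_hang_warning("node-1")` would be emitted when the probe first hangs, and the warning cleared once a later probe returns normally.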