The logs showed:
17:54:18:WU00:FS00:Requesting new work unit for slot 00: gpu:10:0 TU104 [GeForce RTX 2070 SUPER] 8218 from 54.157.202.86
17:54:18:WU00:FS00:Connecting to 54.157.202.86:8080
17:54:48:ERROR:WU00:FS00:Exception: Not connected
After changing Client Preference from "COVID" to "Any", the client was no longer assigned to mskcc1 and received work.
So it looks like mskcc1 may have an issue: it has jobs to give out but can't assign them, so the client keeps retrying and eventually drops to a failed state.
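To check whether the failures really cluster on one work server, something like the following could tally failed connection attempts per server from the client log. This is only a rough sketch: the regular expressions are based solely on the log lines quoted above (other client versions may format things differently), and the function name count_failed_connections is made up for illustration.

```python
import re
from collections import Counter

# Patterns derived only from the log lines quoted in this post;
# treat them as assumptions, not a full description of the log format.
CONNECT_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WU\d+):(FS\d+):Connecting to ([\d.]+):(\d+)"
)
ERROR_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:ERROR:(WU\d+):(FS\d+):Exception: (.*)"
)


def count_failed_connections(lines):
    """Count exceptions that follow a 'Connecting to' line, per 'host:port'."""
    last_target = {}      # (work unit, slot) -> most recent "host:port"
    failures = Counter()  # "host:port" -> number of failed attempts
    for raw in lines:
        line = raw.strip()
        m = CONNECT_RE.match(line)
        if m:
            wu, fs, host, port = m.groups()
            last_target[(wu, fs)] = f"{host}:{port}"
            continue
        m = ERROR_RE.match(line)
        if m:
            wu, fs, _message = m.groups()
            # Only count errors that can be paired with a connect attempt.
            target = last_target.pop((wu, fs), None)
            if target is not None:
                failures[target] += 1
    return failures


if __name__ == "__main__":
    sample = [
        "17:54:18:WU00:FS00:Connecting to 54.157.202.86:8080",
        "17:54:48:ERROR:WU00:FS00:Exception: Not connected",
    ]
    for server, count in count_failed_connections(sample).items():
        print(server, count)
```

Running it over a full log (for example by passing the lines of the log file to count_failed_connections) should show whether one address accounts for most of the "Not connected" errors.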
OS: Ubuntu 18.04.3 LTS; NVIDIA driver: 460.91.03; Client: 7.6.21