problem with exit-when-done
Posted: Wed Apr 15, 2020 7:35 am
Hi everyone,
I'm trying to do some backfilling on a GPU farm, i.e. starting some GPU load when work units are available and exiting when there are none. I am using FAHClient on Ubuntu 18.04. config.xml looks like this:
Code: Select all
<config>
<!-- Folding Slots -->
<slot id='0' type='GPU'/>
</config>
The full command line looks like this:
/usr/bin/FAHClient --user=ANALY_MANC_GPU --team=38188 --gpu=true --cuda-index=0 --smp=false --exit-when-done=true
The program hasn't received any work units and has been sitting idle for hours, blocking the GPU on the farm. I would have expected the --exit-when-done option to make FAHClient exit when no WUs are assigned.
From the log:
Code: Select all
20:52:32:WU00:FS00:Connecting to 18.218.241.186:80
20:52:33:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
20:52:33:ERROR:WU00:FS00:Exception: Could not get an assignment
20:59:23:WU00:FS00:Connecting to 65.254.110.245:8080
20:59:24:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
20:59:24:WU00:FS00:Connecting to 18.218.241.186:80
20:59:24:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
20:59:24:ERROR:WU00:FS00:Exception: Could not get an assignment
21:10:29:WU00:FS00:Connecting to 65.254.110.245:8080
21:10:29:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
21:10:29:WU00:FS00:Connecting to 18.218.241.186:80
21:10:30:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
21:10:30:ERROR:WU00:FS00:Exception: Could not get an assignment
21:28:25:WU00:FS00:Connecting to 65.254.110.245:8080
21:28:26:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
21:28:26:WU00:FS00:Connecting to 18.218.241.186:80
21:28:26:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
21:28:26:ERROR:WU00:FS00:Exception: Could not get an assignment
21:57:28:WU00:FS00:Connecting to 65.254.110.245:8080
21:57:28:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
21:57:28:WU00:FS00:Connecting to 18.218.241.186:80
21:57:29:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
21:57:29:ERROR:WU00:FS00:Exception: Could not get an assignment
22:44:26:WU00:FS00:Connecting to 65.254.110.245:8080
22:44:27:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
22:44:27:WU00:FS00:Connecting to 18.218.241.186:80
22:44:27:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
22:44:27:ERROR:WU00:FS00:Exception: Could not get an assignment
00:00:28:WU00:FS00:Connecting to 65.254.110.245:8080
00:00:28:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
00:00:28:WU00:FS00:Connecting to 18.218.241.186:80
00:00:29:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
00:00:29:ERROR:WU00:FS00:Exception: Could not get an assignment
02:03:27:WU00:FS00:Connecting to 65.254.110.245:8080
02:03:28:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
02:03:28:WU00:FS00:Connecting to 18.218.241.186:80
02:03:28:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
02:03:28:ERROR:WU00:FS00:Exception: Could not get an assignment
******************************* Date: 2020-04-15 *******************************
05:22:28:WU00:FS00:Connecting to 65.254.110.245:8080
05:22:28:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
05:22:28:WU00:FS00:Connecting to 18.218.241.186:80
05:22:29:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
05:22:29:ERROR:WU00:FS00:Exception: Could not get an assignment
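One possible workaround, sketched here under assumptions rather than as a FAHClient feature, is to have a wrapper watch the log and decide for itself when to give up. The snippet below only counts the "Could not get an assignment" ERROR lines in the format shown in the log above. The function names and the max_rounds threshold are hypothetical, and it does not try to detect a successful assignment, so it only fits the case where the client has not received any WU since it started.

```python
import re

# Matches the ERROR line written after a round of assignment requests fails,
# in the format seen in the log above (HH:MM:SS:ERROR:WUxx:FSxx:...).
FAILED_ROUND = re.compile(
    r"^\d{2}:\d{2}:\d{2}:ERROR:WU\d+:FS\d+:Exception: Could not get an assignment"
)


def count_failed_rounds(log_lines):
    """Return how many failed assignment rounds appear in the given log lines."""
    return sum(1 for line in log_lines if FAILED_ROUND.match(line.strip()))


def should_give_up(log_lines, max_rounds=3):
    """Hypothetical policy: give up once max_rounds failed rounds were logged."""
    return count_failed_rounds(log_lines) >= max_rounds
```

A wrapper script could read the client's log periodically, call should_give_up() on its lines, and terminate the FAHClient process once it returns True so that the GPU is released. Where the log file lives depends on how the client was started, so that part is left out here.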