problem with exit-when-done

gw666
Posts: 14
Joined: Thu Apr 09, 2020 8:53 am

Post by gw666 »

Hi everyone,

I'm trying to do some backfilling on a GPU farm, i.e. starting a GPU job when work is available and exiting when no work units (WUs) are available. I am using FAHClient on Ubuntu 18.04. My config.xml looks like this:

<config>
  <!-- Folding Slots -->
  <slot id='0' type='GPU'/>
</config>
The full command line looks like this:
/usr/bin/FAHClient --user=ANALY_MANC_GPU --team=38188 --gpu=true --cuda-index=0 --smp=false --exit-when-done=true

The program hasn't received any work units and has been sitting idle for hours, blocking the GPU on the farm. I expected the --exit-when-done option to make FAHClient exit if no WUs are assigned.

From the log:

20:52:32:WU00:FS00:Connecting to 18.218.241.186:80
20:52:33:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
20:52:33:ERROR:WU00:FS00:Exception: Could not get an assignment
20:59:23:WU00:FS00:Connecting to 65.254.110.245:8080
20:59:24:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
20:59:24:WU00:FS00:Connecting to 18.218.241.186:80
20:59:24:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
20:59:24:ERROR:WU00:FS00:Exception: Could not get an assignment
21:10:29:WU00:FS00:Connecting to 65.254.110.245:8080
21:10:29:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
21:10:29:WU00:FS00:Connecting to 18.218.241.186:80
21:10:30:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
21:10:30:ERROR:WU00:FS00:Exception: Could not get an assignment
21:28:25:WU00:FS00:Connecting to 65.254.110.245:8080
21:28:26:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
21:28:26:WU00:FS00:Connecting to 18.218.241.186:80
21:28:26:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
21:28:26:ERROR:WU00:FS00:Exception: Could not get an assignment
21:57:28:WU00:FS00:Connecting to 65.254.110.245:8080
21:57:28:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
21:57:28:WU00:FS00:Connecting to 18.218.241.186:80
21:57:29:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
21:57:29:ERROR:WU00:FS00:Exception: Could not get an assignment
22:44:26:WU00:FS00:Connecting to 65.254.110.245:8080
22:44:27:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
22:44:27:WU00:FS00:Connecting to 18.218.241.186:80
22:44:27:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
22:44:27:ERROR:WU00:FS00:Exception: Could not get an assignment
00:00:28:WU00:FS00:Connecting to 65.254.110.245:8080
00:00:28:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
00:00:28:WU00:FS00:Connecting to 18.218.241.186:80
00:00:29:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
00:00:29:ERROR:WU00:FS00:Exception: Could not get an assignment
02:03:27:WU00:FS00:Connecting to 65.254.110.245:8080
02:03:28:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
02:03:28:WU00:FS00:Connecting to 18.218.241.186:80
02:03:28:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
02:03:28:ERROR:WU00:FS00:Exception: Could not get an assignment
******************************* Date: 2020-04-15 *******************************
05:22:28:WU00:FS00:Connecting to 65.254.110.245:8080
05:22:28:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
05:22:28:WU00:FS00:Connecting to 18.218.241.186:80
05:22:29:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
05:22:29:ERROR:WU00:FS00:Exception: Could not get an assignment

PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Post by PantherX »

Welcome to the F@H Forum gw666,

It seems that you haven't been assigned a WU since you started the client; it hasn't exited because it never finished a WU. There's a known issue where demand for GPU WUs significantly exceeds supply. There's work in the pipeline to resolve this issue :)
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues

Post by gw666 »

I ran the same command on a different node. It finished a few WUs:

21:35:20:WU01:FS00:Upload 70.73%
21:35:26:WU01:FS00:Upload 81.37%
21:35:30:WU00:FS00:Connecting to 65.254.110.245:8080
21:35:30:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
21:35:30:WU00:FS00:Connecting to 18.218.241.186:80
21:35:31:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
21:35:31:ERROR:WU00:FS00:Exception: Could not get an assignment
21:35:32:WU01:FS00:Upload 95.01%
21:35:35:WU01:FS00:Upload complete
21:35:35:WU01:FS00:Server responded WORK_ACK (400)
21:35:35:WU01:FS00:Final credit estimate, 156113.00 points
21:35:35:WU01:FS00:Cleaning up
21:38:07:WU00:FS00:Connecting to 65.254.110.245:8080
21:38:08:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
21:38:08:WU00:FS00:Connecting to 18.218.241.186:80
21:38:08:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
21:38:08:ERROR:WU00:FS00:Exception: Could not get an assignment
and has been idling since:

02:53:17:WU00:FS00:Connecting to 65.254.110.245:8080
02:53:18:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
02:53:18:WU00:FS00:Connecting to 18.218.241.186:80
02:53:18:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
02:53:18:ERROR:WU00:FS00:Exception: Could not get an assignment
******************************* Date: 2020-04-15 *******************************
06:12:17:WU00:FS00:Connecting to 65.254.110.245:8080
06:12:18:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
06:12:18:WU00:FS00:Connecting to 18.218.241.186:80
06:12:19:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
06:12:19:ERROR:WU00:FS00:Exception: Could not get an assignment

Post by PantherX »

It should have exited if the correct option had been given. It seems that you might be using the wrong option:

  exit-when-done <boolean=false>
    Exit when all slots are paused.
Try this instead:

  --finish
      Finish all current work units, send the results, then exit.

Post by gw666 »

PantherX wrote: It should have exited if the correct option had been given. It seems that you might be using the wrong option:

  exit-when-done <boolean=false>
    Exit when all slots are paused.
Try this instead:

  --finish
      Finish all current work units, send the results, then exit.

I think --finish is for exiting an already running instance of Folding; if you start a new instance with --finish, it will never do anything. That's why I suspect that the --exit-when-done=true option doesn't work as intended. Maybe my idling slot is never paused?

Post by PantherX »

I had another look at the options and think that you have to use this one in addition to setting exit-when-done to true:

  max-units <integer=0>
    Process at most this number of units, then pause.
This might be how you run it:
/usr/bin/FAHClient --user=ANALY_MANC_GPU --team=38188 --gpu=true --cuda-index=0 --smp=false --max-units=1 --exit-when-done=true

Post by gw666 »

PantherX wrote: I had another look at the options and think that you have to use this one in addition to setting exit-when-done to true:

  max-units <integer=0>
    Process at most this number of units, then pause.
This might be how you run it:
/usr/bin/FAHClient --user=ANALY_MANC_GPU --team=38188 --gpu=true --cuda-index=0 --smp=false --max-units=1 --exit-when-done=true

I'll give that a try. Any idea how long F@H will keep trying to get that one unit?

Post by PantherX »

The client will try and, if it fails, will back off exponentially. Currently, on attempt 10, the wait time is about 1 hour.
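
To see how quickly the wait grows on a particular node, the gaps between failed attempts can be measured from the client log. The following is a minimal sketch, assuming every log line starts with an HH:MM:SS timestamp as in the logs posted in this thread; the function name retry_gaps is made up for this example and is not part of FAHClient.

```python
from datetime import datetime, timedelta


def retry_gaps(log_lines, marker="Could not get an assignment"):
    """Return the gaps, in seconds, between consecutive lines containing marker.

    Assumes each matching line starts with an HH:MM:SS timestamp and that a
    timestamp earlier than the previous one means the log crossed midnight.
    """
    stamps = []
    day_offset = timedelta(0)
    previous = None
    for line in log_lines:
        if marker not in line:
            continue  # skips other messages and the "Date:" separator lines
        clock = datetime.strptime(line[:8], "%H:%M:%S")
        if previous is not None and clock < previous:
            day_offset += timedelta(days=1)  # rolled over past midnight
        previous = clock
        stamps.append(clock + day_offset)
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# Lines copied from the first log in this thread.
sample = [
    "20:52:33:ERROR:WU00:FS00:Exception: Could not get an assignment",
    "20:59:24:ERROR:WU00:FS00:Exception: Could not get an assignment",
    "21:10:30:ERROR:WU00:FS00:Exception: Could not get an assignment",
    "21:28:26:ERROR:WU00:FS00:Exception: Could not get an assignment",
]
print(retry_gaps(sample))  # prints [411, 666, 1076]
```

In this excerpt each gap is roughly 1.6 times the previous one, which matches the exponential back-off described above.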

Post by gw666 »

PantherX wrote: I had another look at the options and think that you have to use this one in addition to setting exit-when-done to true:

  max-units <integer=0>
    Process at most this number of units, then pause.
This might be how you run it:
/usr/bin/FAHClient --user=ANALY_MANC_GPU --team=38188 --gpu=true --cuda-index=0 --smp=false --max-units=1 --exit-when-done=true

This command is working fine for me. A job takes between 1:00 and 2:15 hours to process one WU, which makes it perfect for backfilling.

Post by PantherX »

Glad to hear that it works as expected! You can always change the number from 1 to 2, or to whatever you think you can successfully fold within that time. Please note that the folding time for WUs varies from project to project, so you may need to keep an eye on it :)
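
As a rough illustration of picking that number, the sketch below divides a backfill window by a worst-case time per WU. It is only a sketch under stated assumptions: the 2.25 hour default is the upper bound reported earlier in this thread (2:15 hours), time spent waiting for an assignment is ignored, and the function name max_units_for_window is made up for this example.

```python
import math


def max_units_for_window(window_hours, worst_case_wu_hours=2.25):
    """Return how many WUs fit in the window if every WU takes the worst case.

    A result of 0 means the window is too short for even one WU, so the
    client should not be started at all. Do not pass 0 on to --max-units:
    0 is that option's default and presumably means "no limit".
    """
    if window_hours <= 0 or worst_case_wu_hours <= 0:
        raise ValueError("durations must be positive")
    return math.floor(window_hours / worst_case_wu_hours)


print(max_units_for_window(5))  # prints 2
```

Because folding time varies between projects, the worst-case value needs to be revisited from time to time.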

Post by gw666 »

I'm not happy yet: there have been several cases where the program sat idle for 4 hours without getting a WU. I would have preferred the program to exit in that case.
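
One possible workaround is an external wrapper that watches the client output and stops the client after a fixed number of failed assignment requests. The following is a minimal sketch, not a FAHClient feature. It assumes that the client prints its log to stdout or stderr when run in the foreground and that it shuts down cleanly when terminated; the names should_give_up and run_with_watchdog and the default limit of five failures are made up for this example.

```python
import subprocess

# Message text taken from the logs posted in this thread.
FAILURE_MARKER = "Could not get an assignment"


def should_give_up(lines, max_failures):
    """Return True as soon as max_failures lines contain FAILURE_MARKER.

    Returns False if the stream ends before the limit is reached.
    """
    failures = 0
    for line in lines:
        if FAILURE_MARKER in line:
            failures += 1
            if failures >= max_failures:
                return True
    return False


def run_with_watchdog(command, max_failures=5):
    """Run command and terminate it after max_failures failed assignments.

    Assumption: the client writes its log to stdout/stderr. If it only logs
    to a file, that file would have to be followed instead.
    Returns the exit code of the process.
    """
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    if should_give_up(proc.stdout, max_failures):
        proc.terminate()  # stop the idle client so the GPU is freed
    return proc.wait()
```

It could be called with the command from earlier in the thread as a list of arguments, for example run_with_watchdog(["/usr/bin/FAHClient", "--gpu=true", "--max-units=1", "--exit-when-done=true"]) plus the user and team options. Failures are counted over the whole run and never reset, so this fits the --max-units=1 case where the client needs only one assignment.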
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

Why are you setting --cuda-index=0?

The current folding slots don't really care which cuda-index is used because CUDA itself is never used.

Post by gw666 »

bruce wrote: Why are you setting --cuda-index=0?

The current folding slots don't really care which cuda-index is used because CUDA itself is never used.

Thanks for pointing it out, I've removed the option.