Folding Forum

gw666

It looks like the --memory option doesn't work for me. I set it to 4468709120, but the batch system killed my job with this error: Reason: job 60953324.264 exceeds job master hard limit "h_rss" (4791971840.00000 > limit:4509715660.80000) - initiate terminate method; So 4509715660 is t...
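To make the numbers in that error message easier to compare, here is a minimal Python sketch that parses the quoted `h_rss` message and reports the overrun in MiB. The function name `parse_rss_overrun` and the regular expression are illustrative assumptions, not part of FAHClient or the batch system. Since the help text describes --memory as the amount "reported to Folding@home", it may well not act as a cap on the process itself, which would be consistent with the job exceeding the limit anyway.

```python
import re

# Error text quoted in the post, as reported by the batch system.
MESSAGE = ('job 60953324.264 exceeds job master hard limit "h_rss" '
           '(4791971840.00000 > limit:4509715660.80000)')

MIB = 1024 * 1024


def parse_rss_overrun(message):
    """Return (used_bytes, limit_bytes) from an h_rss message, or None.

    Hypothetical helper: the pattern is only modelled on the one
    message quoted above.
    """
    match = re.search(r'\(([\d.]+)\s*>\s*limit:([\d.]+)\)', message)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


used, limit = parse_rss_overrun(MESSAGE)
requested = 4468709120  # value passed to --memory in the post

print(f"used exceeds limit by {(used - limit) / MIB:.1f} MiB")
print(f"--memory was set {(limit - requested) / MIB:.1f} MiB below the limit")
```

A wrapper script could run this over the batch system's job log to see how much headroom would have been needed.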

gw666

Hello, from the Help of FAHClient you can get: memory <string> Override memory, in bytes, reported to Folding@home. As there are also arguments left from older versions I don't know if this would work but you might give it a try. Regards Patrick

I had used that option, see my original message. It d...

gw666

I'm backfilling on a batch system that limits the job's RAM usage. Unfortunately, many folding jobs are terminated because they use too much RAM. Does the --memory option do any good?

*********************** Log Started 2020-05-21T13:02:06Z ***********************
13:02:06:**************************...

gw666

bruce wrote:Why are you setting --cuda_index=0?

The current folding slots don't really care what cuda-index is used because CUDA itself is never used.

Thanks for pointing it out, I've removed the option.

gw666

I'm not yet happy, there have been several cases where the program was idle for 4 hours without getting a WU. I would've preferred the program to exit in that case.

gw666

I had another look at the options and think that you have to use this one in addition to exit-when-done set to true: max-units <integer=0> Process at most this number of units, then pause. This might be how you run it: /usr/bin/FAHClient --user=ANALY_MANC_GPU --team=38188 --gpu=true --cuda-ind...

gw666

I had another look at the options and think that you have to use this one in addition to exit-when-done set to true: max-units <integer=0> Process at most this number of units, then pause. This might be how you run it: /usr/bin/FAHClient --user=ANALY_MANC_GPU --team=38188 --gpu=true --cuda-ind...

gw666

It should have exited if the correct command was given. It seems that you might be using the wrong command: exit-when-done <boolean=false> Exit when all slots are paused. Try this instead: --finish Finish all current work units, send the results, then exit.

I think --finish is for exiting an already r...

gw666

I had the same command on a different node. It has finished a few WUs:

21:35:20:WU01:FS00:Upload 70.73%
21:35:26:WU01:FS00:Upload 81.37%
21:35:30:WU00:FS00:Connecting to 65.254.110.245:8080
21:35:30:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this config...
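One way to detect the "idle for hours without a WU" situation from outside the client is to watch the log for the trailing run of "No WUs available" warnings. The following is a rough Python sketch under stated assumptions: the helper names are hypothetical, the sample lines after the first three are made up in the style of the excerpt above, and the timestamps carry no date, so the calculation only holds for gaps shorter than 24 hours.

```python
import re

# Log lines start with HH:MM:SS: as in the excerpt above.
STAMP = re.compile(r'^(\d\d):(\d\d):(\d\d):')


def stamp_seconds(line):
    """Seconds since midnight for a log line, or None if it has no stamp."""
    m = STAMP.match(line)
    if m is None:
        return None
    hours, minutes, seconds = (int(g) for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds


def idle_seconds(lines):
    """Time spanned by the trailing run of 'No WUs available' warnings.

    'Connecting to' lines inside the run are skipped; any other line
    ends the run. Returns 0 if the log does not end in such a run.
    """
    first = last = None
    for line in reversed(lines):
        if 'No WUs available' in line:
            t = stamp_seconds(line)
            if t is None:
                break
            if last is None:
                last = t
            first = t
        elif 'Connecting to' in line:
            continue
        else:
            break
    if first is None:
        return 0
    # Modulo handles a run that crosses midnight (assumes < 24 h).
    return (last - first) % 86400


# Sample log: the later timestamps are invented for illustration.
SAMPLE_LOG = [
    "21:35:26:WU01:FS00:Upload 81.37%",
    "21:35:30:WU00:FS00:Connecting to 65.254.110.245:8080",
    "21:35:30:WARNING:WU00:FS00:Failed to get assignment: No WUs available",
    "23:50:00:WU00:FS00:Connecting to 65.254.110.245:8080",
    "23:50:00:WARNING:WU00:FS00:Failed to get assignment: No WUs available",
    "01:40:30:WU00:FS00:Connecting to 65.254.110.245:8080",
    "01:40:30:WARNING:WU00:FS00:Failed to get assignment: No WUs available",
]

print(idle_seconds(SAMPLE_LOG) > 4 * 3600)
```

A batch wrapper could poll the log this way and stop the client once the idle time passes a chosen threshold; how best to stop it is left open here.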

gw666

Hi everyone, I'm trying to do some backfilling on a GPU farm, i.e. starting some GPU load if available and exiting if no work units are available. I am using FAHClient on Ubuntu 18.04. config.xml looks like this:

<config>
  <slot id='0' type='GPU'/>
</config>

The full command li...

gw666

There is no longer a /usr/bin/python on EL8, only /usr/bin/python2 and /usr/bin/python3, so the package deps cannot be resolved. The package must be adapted for EL8 or installed with --nodeps.

gw666

Thank you again for the notes. Bit of progress: jobs are running and completing ok but uploads fail with:

01:39:40:WU00:FS00:Connecting to 155.247.166.219:8080
01:39:40:All slots are done, exiting
01:39:40:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
01:39:41:...

gw666

This is a bug in FAHClient, it sees 6 GPUs in HW but only one OpenCL device 0. So it should only create one GPU slot. As workaround you need to edit the config.xml file manually and delete the other GPU slots.

How do these slots correspond? In this case, the automatically generated config.xml lo...
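The manual workaround (deleting the extra GPU slots from config.xml) could also be scripted. Below is a minimal Python sketch using the standard `xml.etree.ElementTree` module. The sample config with three GPU slots is an assumption, since the automatically generated file is not shown in full, and `keep_first_gpu_slot` is a hypothetical helper name. Which slot corresponds to which device is exactly the open question above, so keeping the first slot is only a placeholder choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical auto-generated config with several GPU slots; the real
# file is truncated in the post, so this layout is an assumption.
SAMPLE = """<config>
  <slot id='0' type='GPU'/>
  <slot id='1' type='GPU'/>
  <slot id='2' type='GPU'/>
</config>"""


def keep_first_gpu_slot(xml_text):
    """Return the config XML with every GPU slot after the first removed."""
    root = ET.fromstring(xml_text)
    seen_gpu = False
    # Iterate over a copy of the list so removal is safe during the loop.
    for slot in list(root.findall('slot')):
        if slot.get('type') == 'GPU':
            if seen_gpu:
                root.remove(slot)
            seen_gpu = True
    return ET.tostring(root, encoding='unicode')


print(keep_first_gpu_slot(SAMPLE))
```

It is probably safest to back up the original config.xml before writing any modified version over it.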

gw666

Hi everyone, I'm trying to do some backfilling on a farm machine, just like the friends at CERN are doing. My setup is Scientific Linux 7.7 on x86_64, the machines all have two Xeon CPUs and 6 or 8 Nvidia GPUs of several generations, in this example 6 Nvidia Tesla P4. I'm using the latest CUDA 10.2....

Search found 14 matches

Re: Linux: limit RAM usage

Re: Linux: limit RAM usage

Linux: limit RAM usage

Re: problem with exit-when-done

Re: problem with exit-when-done

Re: problem with exit-when-done

Re: problem with exit-when-done

Re: problem with exit-when-done

Re: problem with exit-when-done

problem with exit-when-done

Re: FAHControl won't install on CentOS 8

Re: High Throughput Resources

Re: Please use only one of the GPUs

Please use only one of the GPUs