Re: Why?

Posted: Tue Mar 18, 2014 1:33 pm
by 7im
StreetSam wrote:Hi All,

Silent Folder calling.

Probably been asked before, but why does the Pande Group allow the ASs to run out of WUs?
I have two 7970s sitting waiting to crunch numbers for them. Although the 660TIs still seem to get WUs.
One, the AS never runs out of WUs as it never has any. Only work servers have WUs. ;)

And the problem is usually just the opposite. The work server fills up with completed work, and it takes more time to shift those terabytes of data than it does to add new work units.

And since that rarely happens with any planned regularity or at a convenient time of day, it has to sit overnight until the next day when humans come back to work.

And normally there are other projects and work servers to take up the slack, as always happens with SMP work nowadays. But GPU folding is in a transition from older cores and old projects to newer cores and new projects, so there will be some occasional gaps.

Re: Unable to acquire new GPU work

Posted: Tue Mar 18, 2014 1:36 pm
by Nicolas_orleans
Same here, empty work server assignment, Linux rig, client-type advanced, client 7.3.6

Code: Select all

*********************** Log Started 2014-03-18T13:32:10Z ***********************
13:32:10:************************* Folding@home Client *************************
13:32:10:    Website: http://folding.stanford.edu/
13:32:10:  Copyright: (c) 2009-2013 Stanford University
13:32:10:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
13:32:10:       Args: --child --lifeline 959 /etc/fahclient/config.xml --run-as fahclient
13:32:10:             --pid-file=/var/run/fahclient.pid --daemon
13:32:10:     Config: /etc/fahclient/config.xml
13:32:10:******************************** Build ********************************
13:32:10:    Version: 7.3.6
13:32:10:       Date: Feb 18 2013
13:32:10:       Time: 07:24:08
13:32:10:    SVN Rev: 3923
13:32:10:     Branch: fah/trunk/client
13:32:10:   Compiler: GNU 4.4.7
13:32:10:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
13:32:10:             -fno-unsafe-math-optimizations -msse2
13:32:10:   Platform: linux2 3.2.0-1-amd64
13:32:10:       Bits: 64
13:32:10:       Mode: Release
13:32:10:******************************* System ********************************
13:32:10:        CPU: Intel(R) Celeron(R) CPU G1610 @ 2.60GHz
13:32:10:     CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
13:32:10:       CPUs: 2
13:32:10:     Memory: 3.81GiB
13:32:10:Free Memory: 3.38GiB
13:32:10:    Threads: POSIX_THREADS
13:32:10:Has Battery: false
13:32:10: On Battery: false
13:32:10: UTC offset: 1
13:32:10:        PID: 1030
13:32:10:        CWD: /var/lib/fahclient
13:32:10:         OS: Linux 3.8.0-27-generic x86_64
13:32:10:    OS Arch: AMD64
13:32:10:       GPUs: 2
13:32:10:      GPU 0: NVIDIA:3 GK104 [GeForce GTX 770]
13:32:10:      GPU 1: NVIDIA:3 GK104 [GeForce GTX 770]
13:32:10:       CUDA: 3.0
13:32:10:CUDA Driver: 5050
13:32:10:***********************************************************************
13:32:10:<config>
13:32:10:  <!-- Client Control -->
13:32:10:  <fold-anon v='true'/>
13:32:10:
13:32:10:  <!-- Folding Slot Configuration -->
13:32:10:  <power v='full'/>
13:32:10:
13:32:10:  <!-- HTTP Server -->
13:32:10:  <allow v='127.0.0.1,192.168.1.32'/>
13:32:10:
13:32:10:  <!-- Network -->
13:32:10:  <proxy v=':8080'/>
13:32:10:
13:32:10:  <!-- Remote Command Server -->
13:32:10:  <password v='******'/>
13:32:10:
13:32:10:  <!-- User Information -->
13:32:10:  <passkey v='********************************'/>
13:32:10:  <team v='33'/>
13:32:10:  <user v='Nicolas_orleans'/>
13:32:10:
13:32:10:  <!-- Folding Slots -->
13:32:10:  <slot id='1' type='GPU'>
13:32:10:    <client-type v='advanced'/>
13:32:10:    <gpu-index v='0'/>
13:32:10:    <next-unit-percentage v='100'/>
13:32:10:  </slot>
13:32:10:  <slot id='2' type='GPU'>
13:32:10:    <client-type v='advanced'/>
13:32:10:    <gpu-index v='1'/>
13:32:10:    <next-unit-percentage v='100'/>
13:32:10:  </slot>
13:32:10:</config>

Code: Select all

13:33:11:WU01:FS02:Connecting to assign-GPU.stanford.edu:80
13:33:12:WU00:FS01:News: Welcome to Folding@Home
13:33:12:WARNING:WU00:FS01:Failed to get assignment from 'assign-GPU.stanford.edu:80': Empty work server assignment
13:33:12:WU00:FS01:Connecting to assign-GPU.stanford.edu:8080
13:33:12:WU01:FS02:News: Welcome to Folding@Home
13:33:12:WARNING:WU01:FS02:Failed to get assignment from 'assign-GPU.stanford.edu:80': Empty work server assignment
13:33:12:WU01:FS02:Connecting to assign-GPU.stanford.edu:8080
13:33:13:WU00:FS01:News: Welcome to Folding@Home
13:33:13:WARNING:WU00:FS01:Failed to get assignment from 'assign-GPU.stanford.edu:8080': Empty work server assignment
13:33:13:ERROR:WU00:FS01:Exception: Could not get an assignment
13:33:13:WU01:FS02:News: Welcome to Folding@Home
13:33:13:WARNING:WU01:FS02:Failed to get assignment from 'assign-GPU.stanford.edu:8080': Empty work server assignment
13:33:13:ERROR:WU01:FS02:Exception: Could not get an assignment

Re: Unable to acquire new GPU work

Posted: Tue Mar 18, 2014 2:15 pm
by ChristianVirtual
Same here: no assignment from either GPU AS, with client 7.4.2 on GTX 780s under Linux.

Different error message on GPU2 though.

Code: Select all

13:46:39:WU01:FS02:Connecting to assign-GPU.stanford.edu:80
13:46:40:WARNING:WU01:FS02:Failed to get assignment from 'assign-GPU.stanford.edu:80': Empty work server assignment
13:46:40:WU01:FS02:Connecting to assign-GPU2.stanford.edu:80

13:46:40:WARNING:WU01:FS02:Failed to get assignment from 'assign-GPU2.stanford.edu:80': Failed to connect to assign-GPU2.stanford.edu:80: Connection refused

13:46:40:ERROR:WU01:FS02:Exception: Could not get an assignment
13:53:31:WU01:FS02:Connecting to assign-GPU.stanford.edu:80
13:53:31:WARNING:WU01:FS02:Failed to get assignment from 'assign-GPU.stanford.edu:80': Empty work server assignment
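
For anyone who wants to tell the two failure modes apart across a longer log, here is a minimal Python sketch. It assumes WARNING lines shaped like the ones in the excerpt above; the regular expression, the function names and the bucket labels ("no-work", "unreachable", "other") are my own illustration, not anything defined by FAHClient.

```python
import re
from collections import Counter

# Matches WARNING lines shaped like the excerpt above, e.g.
# 13:46:40:WARNING:WU01:FS02:Failed to get assignment from 'host:80': reason
# (pattern inferred from the posted logs, not from FAHClient documentation)
FAILED = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WARNING:"
    r"(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"Failed to get assignment from '(?P<server>[^']+)': (?P<reason>.*)$"
)


def classify(reason):
    """Bucket a failure reason. The labels are hypothetical, chosen for this sketch."""
    if "Empty work server assignment" in reason:
        return "no-work"        # the AS answered but had no work server to offer
    if "Connection refused" in reason or "Failed to connect" in reason:
        return "unreachable"    # the AS itself could not be reached
    return "other"


def summarize(lines):
    """Count failed assignment attempts per (server, bucket)."""
    counts = Counter()
    for line in lines:
        match = FAILED.match(line.strip())
        if match:
            key = (match.group("server"), classify(match.group("reason")))
            counts[key] += 1
    return counts


# Lines copied from the log excerpt above.
sample = [
    "13:46:39:WU01:FS02:Connecting to assign-GPU.stanford.edu:80",
    "13:46:40:WARNING:WU01:FS02:Failed to get assignment from "
    "'assign-GPU.stanford.edu:80': Empty work server assignment",
    "13:46:40:WARNING:WU01:FS02:Failed to get assignment from "
    "'assign-GPU2.stanford.edu:80': Failed to connect to "
    "assign-GPU2.stanford.edu:80: Connection refused",
    "13:46:40:ERROR:WU01:FS02:Exception: Could not get an assignment",
    "13:53:31:WARNING:WU01:FS02:Failed to get assignment from "
    "'assign-GPU.stanford.edu:80': Empty work server assignment",
]

for (server, kind), n in sorted(summarize(sample).items()):
    print(server, kind, n)
```

To run it against a real log, pass the open file to `summarize` instead of `sample`; the log path depends on your install, so it is deliberately left out here.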

Code: Select all

********************** Log Started 2014-01-26T14:53:16Z ***********************
14:53:16:************************* Folding@home Client *************************
14:53:16:    Website: http://folding.stanford.edu/
14:53:16:  Copyright: (c) 2009-2014 Stanford University
14:53:16:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
14:53:16:       Args: --child --lifeline 23881 /etc/fahclient/config.xml --run-as
14:53:16:             fahclient --pid-file=/var/run/fahclient.pid --daemon
14:53:16:     Config: /etc/fahclient/config.xml
14:53:16:******************************** Build ********************************
14:53:16:    Version: 7.4.2
14:53:16:       Date: Jan 24 2014
14:53:16:       Time: 06:33:47
14:53:16:    SVN Rev: 4112
14:53:16:     Branch: fah/trunk/client
14:53:16:   Compiler: GNU 4.4.7
14:53:16:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
14:53:16:             -fno-unsafe-math-optimizations -msse2
14:53:16:   Platform: linux2 3.2.0-1-amd64
14:53:16:       Bits: 64
14:53:16:       Mode: Release
14:53:16:******************************* System ********************************
14:53:16:        CPU: Intel(R) Core(TM) i7-2600S CPU @ 2.80GHz
14:53:16:     CPU ID: GenuineIntel Family 6 Model 42 Stepping 7
14:53:16:       CPUs: 8
14:53:16:     Memory: 7.74GiB
14:53:16:Free Memory: 6.06GiB
14:53:16:    Threads: POSIX_THREADS
14:53:16: OS Version: 3.8
14:53:16:Has Battery: false
14:53:16: On Battery: false
14:53:16: UTC Offset: 9
14:53:16:        PID: 8066
14:53:16:        CWD: /var/lib/fahclient
14:53:16:         OS: Linux 3.8.0-32-generic x86_64
14:53:16:    OS Arch: AMD64
14:53:16:       GPUs: 2
14:53:16:      GPU 0: NVIDIA:3 GK110 [GeForce GTX 780]
14:53:16:      GPU 1: NVIDIA:3 GK110 [GeForce GTX 780]
14:53:16:       CUDA: 3.5
14:53:16:CUDA Driver: 6000
14:53:16:***********************************************************************
14:53:16:<config>
14:53:16:
14:53:16:  <!-- Logging -->
14:53:16:  <log-rotate-max v='1024'/>
14:53:16:
14:53:16:  <!-- Network -->
14:53:16:  <proxy v=':8080'/>
14:53:16:
14:53:16:  <!-- Remote Command Server -->
14:53:16:
14:53:16:  <!-- Slot Control -->
14:53:16:  <power v='full'/>
14:53:16:
14:53:16:  <!-- User Information -->
14:53:16:  <team v='3446'/>
14:53:16:  <user v='ChristianFAH'/>
14:53:16:
14:53:16:  <!-- Work Unit Control -->
14:53:16:  <next-unit-percentage v='100'/>
14:53:16:
14:53:16:  <!-- Folding Slots -->
14:53:16:  <slot id='0' type='CPU'>
14:53:16:    <client-type v='beta'/>
14:53:16:    <cpus v='4'/>
14:53:16:    <pause-on-start v='true'/>
14:53:16:  </slot>
14:53:16:  <slot id='1' type='GPU'>
14:53:16:    <client-type v='beta'/>
14:53:16:    <pause-on-start v='true'/>
14:53:16:  </slot>
14:53:16:  <slot id='2' type='GPU'>
14:53:16:    <client-type v='beta'/>
14:53:16:    <pause-on-start v='true'/>
14:53:16:  </slot>
14:53:16:</config>

Re: Unable to acquire new GPU work

Posted: Tue Mar 18, 2014 4:09 pm
by JimF
I started seeing problems early this morning (east coast time) on one of my machines that had not been upgraded from 7.3.6 yet, so I upgraded to 7.4.4 in the hopes of getting the backup server.
That worked to get a Core_16 on one machine, but I am still out of luck on another. It looks like we are in for a long siege.

Re: Unable to acquire new GPU work

Posted: Tue Mar 18, 2014 4:37 pm
by Nicolas_orleans
bruce wrote:The Server Status page does show that several servers with GPU projects have a limited supply of WUs, particularly ones with projects for the Zeta core, where almost every WU has been assigned to someone. As previous WUs are completed, the server generates a new Gen, but it is quickly assigned, and the server reverts to the empty condition. I'll notify the Pande Group, but it's late in California, so I doubt conditions will change tonight.
Just a remark here: over the last few days I noticed a strong increase in the number of Fermi/Kepler GPUs reported as folding (see http://fah-web.stanford.edu/cgi-bin/mai ... e=osstats2): more than 44k Fermi/Kepler GPUs, when as far as I remember the usual figure was around 30-40k. Is there any chance that some anonymous corporate donor with rooms full of idle GPUs started folding recently, helping to dry up Core 17 assignments? :)

Re: Unable to acquire new GPU work

Posted: Tue Mar 18, 2014 4:58 pm
by bruce
That is a well-founded guess. Anonymous corporate donors do contribute to FAH from time to time. (Since they choose to remain anonymous, nobody can answer your question.) Moreover, like regular donors, they're going to contribute whatever hardware they have whenever they choose to, and if that causes unplanned distortions of the assignment process, the Pande Group will have to respond in whatever ways are necessary.

Restarting a downed server may take a few minutes unless the filesystem needs checking, and then it can take a few DAYS. Offloading completed work from a server to make room for new work often takes time. Developing new projects to replace ones which have been completed takes even more time. If new servers need to be added, it takes money, planning, and even more time. The bottom line: Since we don't know what critical resources are required, we can't predict whether solving a WU shortage may take a day or many days and all necessary steps must be done carefully.

The PG is generally MORE concerned about donor resources sitting idle than those donors (you) are yourselves.

Re: Unable to acquire new GPU work

Posted: Tue Mar 18, 2014 5:01 pm
by planetclown
For what it's worth I'm also seeing the same issue of empty work server assignment.

Code: Select all

*********************** Log Started 2014-03-18T15:57:17Z ***********************
15:57:17:************************* Folding@home Client *************************
15:57:17:      Website: http://folding.stanford.edu/
15:57:17:    Copyright: (c) 2009-2013 Stanford University
15:57:17:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:57:17:         Args: 
15:57:17:       Config: C:/Users/admin/AppData/Roaming/FAHClient/config.xml
15:57:17:******************************** Build ********************************
15:57:17:      Version: 7.3.6
15:57:17:         Date: Feb 18 2013
15:57:17:         Time: 15:25:17
15:57:17:      SVN Rev: 3923
15:57:17:       Branch: fah/trunk/client
15:57:17:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
15:57:17:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
15:57:17:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
15:57:17:     Platform: win32 XP
15:57:17:         Bits: 32
15:57:17:         Mode: Release
15:57:17:******************************* System ********************************
15:57:17:          CPU: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
15:57:17:       CPU ID: GenuineIntel Family 6 Model 42 Stepping 7
15:57:17:         CPUs: 8
15:57:17:       Memory: 7.89GiB
15:57:17:  Free Memory: 5.25GiB
15:57:17:      Threads: WINDOWS_THREADS
15:57:17:  Has Battery: false
15:57:17:   On Battery: false
15:57:17:   UTC offset: -4
15:57:17:          PID: 5528
15:57:17:          CWD: C:/Users/admin/AppData/Roaming/FAHClient
15:57:17:           OS: Windows 7 Home Premium
15:57:17:      OS Arch: AMD64
15:57:17:         GPUs: 1
15:57:17:        GPU 0: ATI:5 Tahiti PRO [Radeon HD 7950]
15:57:17:         CUDA: Not detected
15:57:17:Win32 Service: false
15:57:17:***********************************************************************
15:57:17:<config>
15:57:17:  <!-- Folding Core -->
15:57:17:  <checkpoint v='30'/>
15:57:17:
15:57:17:  <!-- Folding Slot Configuration -->
15:57:17:  <power v='full'/>
15:57:17:
15:57:17:  <!-- Logging -->
15:57:17:  <verbosity v='2'/>
15:57:17:
15:57:17:  <!-- Network -->
15:57:17:  <proxy v=':8080'/>
15:57:17:
15:57:17:  <!-- User Information -->
15:57:17:  <passkey v='********************************'/>
15:57:17:  <team v='111065'/>
15:57:17:  <user v='planetclown'/>
15:57:17:
15:57:17:  <!-- Folding Slots -->
15:57:17:  <slot id='0' type='GPU'>
15:57:17:    <next-unit-percentage v='100'/>
15:57:17:    <pause-on-start v='true'/>
15:57:17:  </slot>
15:57:17:  <slot id='1' type='CPU'>
15:57:17:    <cpus v='-1'/>
15:57:17:    <next-unit-percentage v='100'/>
15:57:17:    <pause-on-start v='true'/>
15:57:17:  </slot>
15:57:17:</config>
15:57:17:Trying to access database...
15:57:17:Enabled folding slot 00: PAUSED gpu:0:Tahiti PRO [Radeon HD 7950] (paused)
15:57:17:Enabled folding slot 01: PAUSED cpu:8 (paused)
15:57:30:FS00:Unpaused
15:57:31:WU00:FS00:News: Welcome to Folding@Home
15:57:31:WARNING:WU00:FS00:Failed to get assignment from 'assign-GPU.stanford.edu:80': Empty work server assignment
15:57:31:WU00:FS00:News: Welcome to Folding@Home
...
16:42:54:WU00:FS00:News: Welcome to Folding@Home
16:42:54:WARNING:WU00:FS00:Failed to get assignment from 'assign-GPU.stanford.edu:8080': Empty work server assignment
16:42:54:ERROR:WU00:FS00:Exception: Could not get an assignment

Re: Unable to acquire new GPU work

Posted: Tue Mar 18, 2014 6:26 pm
by Nicolas_orleans
bruce wrote:That is a well-founded guess. Anonymous corporate donors do contribute to FAH from time to time. (Since they choose to remain anonymous, nobody can answer your question.) Moreover, like regular donors, they're going to contribute whatever hardware they have whenever they choose to, and if that causes unplanned distortions of the assignment process, the Pande Group will have to respond in whatever ways are necessary.
I guess so. Dr Pande just confirmed a big donation from a corporate partner, see https://folding.stanford.edu/home/worki ... f-gpu-wus/

Re: Unable to acquire new GPU work

Posted: Tue Mar 18, 2014 8:37 pm
by thebluebumblebee
Help us reach 1,000,000 Today we are 227,848 computers strong outputting 38,796 teraflops of computing power and growing fast.
Really? Just a few thousand new GPUs brought F@H to its knees. AGAIN!

GPU on "WAIT" for several days now

Posted: Tue Mar 18, 2014 9:24 pm
by Viceroy
My GPU has not been folding for several days now. It just says "WAIT". What is all this about? I have not changed any settings whatsoever.

Re: GPU on "WAIT" for several days now

Posted: Tue Mar 18, 2014 9:36 pm
by Mstenholm
There are no WUs at the moment. See the other recent posts.

Re: Unable to acquire new GPU work

Posted: Tue Mar 18, 2014 10:15 pm
by billford
thebluebumblebee wrote:
Help us reach 1,000,000 Today we are 227,848 computers strong outputting 38,796 teraflops of computing power and growing fast.
Really? Just a few thousand new GPUs brought F@H to its knees. AGAIN!
I can entirely understand that PG cannot decline large donations of computing power, wherever they come from, but they need to find better ways of dealing with them.

A short-term benefit at the expense of hacking off numbers of long-term donors could very easily become counter-productive.

Dr Pande is clearly not unaware of the problem, but phrases like "online shortly" and "hopefully just briefly annoying" are not particularly helpful. How long is "shortly"? How prolonged is "brief"? Hours? Days?

At least let's have some useful information!!!!!

Re: Unable to acquire new GPU work

Posted: Tue Mar 18, 2014 10:22 pm
by VijayPande
Sorry, soon = later today. I'll work to be more precise in future posts. We're under high load and we had to take down 2 servers for maintenance, leaving fewer WUs available.

Moreover, as a rule we avoid "busy work," i.e. WUs that we send out just to keep donors getting points but that have no scientific value, so there will occasionally be times when we run out of WUs.

We are also rolling out more GPU WUs in general as we're expecting continued growth in GPUs, at least for a little bit.

Re: Unable to acquire new GPU work

Posted: Tue Mar 18, 2014 10:25 pm
by billford
VijayPande wrote:Sorry, soon = later today.
I have to translate that to UK time so I shall probably be fast asleep when my GPU wakes up, but thank you.

Re: Unable to acquire new GPU work

Posted: Tue Mar 18, 2014 10:34 pm
by VijayPande
PS It's also important to know that generating new WUs, even when the science is done, just takes a long time unfortunately, typically 4 to 8 hours depending on the number of WUs being built.