Re: New Assignment Server feedback/problem

Posted: Mon Oct 06, 2014 3:49 pm
by Gary480six
VijayPande wrote:ok, we'll set 9406, 13000 and 13001 to be beta only for Maxwell. That should keep them away from the Maxwell GPUs and still give us beta team feedback for what's going wrong w/those WUs.
Dr. Pande,

Can you also move the P10467 and P10469 core 17 work units back to beta only for Maxwell?

They are failing on my GTX 750Ti cards with the same 'Force RMSE error' that was crashing the P13000 and P13001 work.

Re: New Assignment Server feedback/problem

Posted: Mon Oct 06, 2014 3:53 pm
by Breach
Confirmed for 10467, 10468 and 10469.

Re: New Assignment Server feedback/problem

Posted: Mon Oct 06, 2014 4:41 pm
by Nathan_P
Kjetil wrote:Okay, running 78xx and 9202 now. Thanks.
Edit: Can you fix hfm.net too? It shows 0 points and core unknown on 78xx.
HFM is a 3rd party app and as such is unsupported by PG. Which version are you running? The latest version is 0.9.2. If you still have problems or are already upgraded, I would suggest contacting the developer, harlam357.

Re: New Assignment Server feedback/problem

Posted: Mon Oct 06, 2014 5:03 pm
by Kjetil
HFM.Net is not the problem; I'm already running 0.9.2. The problem is with http://fah-web.stanford.edu/psummaryC.html

Re: New Assignment Server feedback/problem

Posted: Mon Oct 06, 2014 6:00 pm
by Gary480six
Just a quick anecdotal update.

As of 1:30 PM Eastern, I'm now getting core 15 work on my GTX 750Ti cards, rather than the failed core 17 work I was getting.

Version 7.4.4 client on Windows 7 systems with the GPU slot set to normal and the 340.xx video drivers.

I was issued a P7621, a P7622 and a P7623 work unit - and all seem to be Folding normally.

Still wish I could get back to the successful core 17 Folding production I had two weeks ago, but at least now I can be of some help to the science.

Re: New Assignment Server feedback/problem

Posted: Mon Oct 06, 2014 6:10 pm
by Tim_H
Hi, I've been having trouble getting work since last night. I have restarted the machine and tried removing all flags, with no change.
I keep seeing "Empty work server assignment". I can ping the server, but I get no work.

Any help would be appreciated.

18:00:02:<config>
18:00:02:  <!-- Folding Core -->
18:00:02:  <core-priority v='low'/>
18:00:02:
18:00:02:  <!-- Logging -->
18:00:02:  <verbosity v='5'/>
18:00:02:
18:00:02:  <!-- Network -->
18:00:02:  <proxy v=':8080'/>
18:00:02:
18:00:02:  <!-- Remote Command Server -->
18:00:02:  <password v='*******'/>
18:00:02:
18:00:02:  <!-- User Information -->
18:00:02:  <passkey v='********************************'/>
18:00:02:  <team v='37412'/>
18:00:02:  <user v='Tim_H'/>
18:00:02:
18:00:02:  <!-- Folding Slots -->
18:00:02:  <slot id='0' type='CPU'>
18:00:02:    <client-type v='bigadv'/>
18:00:02:    <cpus v='48'/>
18:00:02:    <max-packet-size v='big'/>
18:00:02:  </slot>
18:00:02:</config>
18:01:19:WU00:FS00:Connecting to 171.67.108.200:8080
18:01:20:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.200:8080': Empty work server assignment
18:01:20:WU00:FS00:Connecting to 171.64.65.121:80
18:02:23:WARNING:WU00:FS00:Failed to get assignment from '171.64.65.121:80': Failed to connect to 171.64.65.121:80: Connection timed out
18:02:23:ERROR:WU00:FS00:Exception: Could not get an assignment
18:05:34:WU00:FS00:Connecting to 171.67.108.200:8080
18:05:34:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.200:8080': Empty work server assignment
18:05:34:WU00:FS00:Connecting to 171.64.65.121:80
18:06:37:WARNING:WU00:FS00:Failed to get assignment from '171.64.65.121:80': Failed to connect to 171.64.65.121:80: Connection timed out
18:06:37:ERROR:WU00:FS00:Exception: Could not get an assignment
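
For anyone who wants to tally failures like these from a longer log, here is a minimal Python sketch. It assumes the WARNING lines always follow the shape seen in the log above (timestamp, WU, slot, then "Failed to get assignment from 'host:port': reason"); the function name `summarize_failures` and the pattern are illustrative and not part of the FAH client.

```python
import re
from collections import Counter

# Pattern assumed from the WARNING lines in the log above:
# HH:MM:SS:WARNING:WUxx:FSxx:Failed to get assignment from 'host:port': reason
FAILED = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}:\d{2}):WARNING:(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"Failed to get assignment from '(?P<server>[^']+)': (?P<reason>.+)$"
)


def summarize_failures(log_text):
    """Count failed assignment attempts per (server, reason) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILED.match(line.strip())
        if match:
            counts[(match["server"], match["reason"])] += 1
    return counts


# A few lines copied from the log above.
SAMPLE_LOG = """\
18:01:19:WU00:FS00:Connecting to 171.67.108.200:8080
18:01:20:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.200:8080': Empty work server assignment
18:01:20:WU00:FS00:Connecting to 171.64.65.121:80
18:02:23:WARNING:WU00:FS00:Failed to get assignment from '171.64.65.121:80': Failed to connect to 171.64.65.121:80: Connection timed out
18:02:23:ERROR:WU00:FS00:Exception: Could not get an assignment
"""

if __name__ == "__main__":
    for (server, reason), n in summarize_failures(SAMPLE_LOG).items():
        print(f"{server}: {n} x {reason}")
```

Grouping by server and reason makes it easier to see that the two servers fail in different ways: one answers with an empty assignment, the other never accepts the connection.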

Re: New Assignment Server feedback/problem

Posted: Mon Oct 06, 2014 6:31 pm
by kimben777
VijayPande wrote:ok, we'll set 9406, 13000 and 13001 to be beta only for Maxwell. That should keep them away from the Maxwell GPUs and still give us beta team feedback for what's going wrong w/those WUs.
I have picked up core 15 on all four of my 780 Ti's with the advanced flag. I thought things were going to stay the same for Kepler. I'll switch to the beta flag now and see what happens.

Re: New Assignment Server feedback/problem

Posted: Mon Oct 06, 2014 6:51 pm
by johnim
Hi, thanks. Whatever you did, the 970s are folding on beta now. I'm using the 344.11 drivers.

Re: New Assignment Server feedback/problem

Posted: Mon Oct 06, 2014 6:55 pm
by Breach
@johnim that's good news, but if you run beta you may get one of the problematic WUs unless they have been completely blocked for Maxwell users.

FYI, 10466-10469 should no longer be assigned on Maxwell at least in fah and advanced:

viewtopic.php?f=66&t=26405&start=30

Let's hope that whatever c17 projects are left are OK. I guess it's normal to start getting c15 units too now that the selection pool has been reduced.

Re: New Assignment Server feedback/problem

Posted: Mon Oct 06, 2014 8:07 pm
by bruce
Gary480six wrote: Still wish I could get back to the successful core 17 Folding production I had two weeks ago, but at least now I can be of some help to the science.
We all wish that.

Core_17 uses some features which were not needed by Cores 15/16. Given a choice between WUs that do fold and WUs that do not, you're probably stuck with cores 15/16 for a while. There seem to be strong reasons to blame the drivers for not supporting the features they're supposed to support, and nobody has identified drivers which work. Even if somebody says "I'm successfully using drivers xxx.xx", results seem to depend on which class of GPU you're running, perhaps on which version of the OS you're running, and possibly on other factors.

At this point, there is no reason to believe that we're looking at a bug in the Assignment Server code. Rather, as changes were made to minimize the impact of the real AS bugs, attention was focused on problems that were already present but which had remained "under the radar."

Unfortunately, when dealing with bugs which might be in the drivers, which might be in the FahCore, or which might be in the latest hardware, it takes time to isolate each problem (since there may be several) and test the validity of the fix. If all of the bugs are in the FahCore (which I doubt) then Stanford can fix them. If some are in the drivers or the hardware (which I suspect), all Stanford can do is (A) wait for NV/ATI to create drivers which bypass the defective sections of the hardware, (B) wait for NV/ATI to fix their drivers, or (C) rewrite segments of the FahCore to bypass the errors encountered when the drivers fail to produce the correct results.

Note that "(A)" presumes that issuing a recall for defective hardware is not on the list, even if the automakers have sometimes used that option.

Re: New Assignment Server feedback/problem

Posted: Mon Oct 06, 2014 10:59 pm
by Biffa
VijayPande wrote:ok, we'll set 9406, 13000 and 13001 to be beta only for Maxwell. That should keep them away from the Maxwell GPUs and still give us beta team feedback for what's going wrong w/those WUs.
22:57:47:WU00:FS01:0x17:Project: 7814 (Run 57, Clone 0, Gen 14)

Running fine on Maxwell GTX970 now :)

Stock clocks (1076/1753), Win 8.1, Driver 344.16

Re: New Assignment Server feedback/problem

Posted: Tue Oct 07, 2014 1:41 pm
by Gary480six
More data for the diagnostic team:

My GTX 750Ti cards have all picked up P9201 core 17 work units. So some core 17 work is still available to my Maxwell cards - and the work does complete!


If it helps, in each case it is running on version 0.0.52 of core 17.


The configuration on all three systems is the same - version 7.4.4 client on Windows 7 systems with the GPU slot set to normal and the 340.52 video drivers. The GPUs are at stock clocks and I am not SMP Folding on these systems.

Also, two of the systems are Windows 7 Ultimate 64-bit and one is Windows 7 Professional 32-bit - yet they all seemed to be reacting exactly the same to the failed P13001 and P10467 series of core 17 work.


The only other factor I can think of, from the donor's side of the equation, is that perhaps a Windows Update fouled things up for some of the 'core 17 on Maxwell' Folding?

It's been about two weeks since I hit my peak Folding day. I looked at one of those boxes, and in that time I have installed about a dozen Windows updates and about 15 updates to Microsoft Security Essentials. (Yeah, I know, but it is just a Folding box.)

If the Pande Group wants to pursue updates as being the problem, I could probably cut and paste a list of those updates.

Re: New Assignment Server feedback/problem

Posted: Wed Oct 08, 2014 10:06 am
by Biffa
It's not updates. It was an issue with some projects on Maxwell.

I've managed to complete the following projects on my 970:

22:57:47:WU00:FS01:0x17:Project: 7814 (Run 57, Clone 0, Gen 14)
2:51:16:WU00:FS01:0x17:Project: 9202 (Run 110, Clone 2, Gen 215)
21:47:39:WU00:FS01:0x18:Project: 10473 (Run 0, Clone 194, Gen 44)

But I'm not getting any of the ones that were causing problems.
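
If it helps anyone collecting reports like this, here is a small Python sketch that pulls the core, project, run, clone and gen out of log lines in the format quoted above. It assumes the "0xNN:Project: P (Run R, Clone C, Gen G)" layout seen in those three lines; the `WorkUnit` type and `parse_work_unit` function are made-up names for illustration only.

```python
import re
from typing import NamedTuple, Optional


class WorkUnit(NamedTuple):
    core: str      # core identifier as printed in the log, e.g. "0x17"
    project: int
    run: int
    clone: int
    gen: int


# Layout assumed from the log lines quoted in the post above.
PROJECT_LINE = re.compile(
    r":(?P<core>0x[0-9a-fA-F]+):Project: (?P<project>\d+) "
    r"\(Run (?P<run>\d+), Clone (?P<clone>\d+), Gen (?P<gen>\d+)\)"
)


def parse_work_unit(line: str) -> Optional[WorkUnit]:
    """Return the work unit described by a log line, or None if it has none."""
    m = PROJECT_LINE.search(line)
    if m is None:
        return None
    return WorkUnit(
        core=m["core"],
        project=int(m["project"]),
        run=int(m["run"]),
        clone=int(m["clone"]),
        gen=int(m["gen"]),
    )


if __name__ == "__main__":
    line = "21:47:39:WU00:FS01:0x18:Project: 10473 (Run 0, Clone 194, Gen 44)"
    print(parse_work_unit(line))
```

Keeping the core string alongside the project number makes it easy to separate 0x17 results from 0x18 ones when comparing notes.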

Re: New Assignment Server feedback/problem

Posted: Wed Oct 08, 2014 9:30 pm
by runpaint
I'm running all gtx750 & 750ti cards, I haven't changed anything or updated the drivers, and 8 of them failed this week for the first time. I just re-started another one today (removed the gpu slot and added it back), I think it was working yesterday but it might have been sitting there doing nothing for 2 or 3 days. They all run 24 hours a day so I don't always remember to check each one daily. I went from 500,000 ppd to a little over 200,000, although that's partly because of all the core 15s.

Re: New Assignment Server feedback/problem

Posted: Wed Oct 08, 2014 9:36 pm
by Breach
runpaint wrote:I'm running all gtx750 & 750ti cards, I haven't changed anything or updated the drivers, and 8 of them failed this week for the first time. I just re-started another one today (removed the gpu slot and added it back), I think it was working yesterday but it might have been sitting there doing nothing for 2 or 3 days. They all run 24 hours a day so I don't always remember to check each one daily. I went from 500,000 ppd to a little over 200,000, although that's partly because of all the core 15s.
Which projects failed? If they were WUs from projects 13000, 13001, 10467, 10468 or 10469, please dump them and you'll pick up working ones. If they were from other projects, please report them.
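
As a rough illustration of that triage, the Python sketch below splits a list of failed project numbers into "dump" and "report" groups. The set contains only the five projects named in this post and is not an official list; the `triage` function and the example input are hypothetical.

```python
# Projects named in this post as failing on Maxwell; not an official list.
PROBLEM_PROJECTS = frozenset({13000, 13001, 10467, 10468, 10469})


def triage(failed_projects):
    """Split failed project numbers into (dump, report) lists.

    dump:   projects already known to fail, so the WU can simply be dumped
    report: projects not on the known list, worth reporting in the forum
    """
    unique = set(failed_projects)
    dump = sorted(p for p in unique if p in PROBLEM_PROJECTS)
    report = sorted(p for p in unique if p not in PROBLEM_PROJECTS)
    return dump, report


if __name__ == "__main__":
    # Hypothetical input: 12345 stands in for any project not on the list.
    dump, report = triage([13001, 12345, 10467, 13001])
    print("dump:", dump)
    print("report:", report)
```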