
Re: 13001 WU failure

Posted: Mon Oct 06, 2014 5:43 am
by snapshot
Here are the latest failures, from my test box this time: i3-3240, 16 GB RAM, GTX 750 Ti, Windows 7 64-bit, NVIDIA drivers 340.52.

05:19:07:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/ProgramData/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe -dir 00 -suffix 01 -version 704 -lifeline 2012 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
05:19:07:WU00:FS00:Started FahCore on PID 3516
05:19:07:WU00:FS00:Core PID:4016
05:19:07:WU00:FS00:FahCore 0x17 started
05:19:08:WU00:FS00:0x17:*********************** Log Started 2014-10-06T05:19:07Z ***********************
05:19:08:WU00:FS00:0x17:Project: 13000 (Run 778, Clone 9, Gen 7)
05:19:08:WU00:FS00:0x17:Unit: 0x00000012538b3db753107775eb8fc202
05:19:08:WU00:FS00:0x17:CPU: 0x00000000000000000000000000000000
05:19:08:WU00:FS00:0x17:Machine: 0
05:19:08:WU00:FS00:0x17:Reading tar file state.xml
05:19:09:WU00:FS00:0x17:Reading tar file system.xml
05:19:10:WU00:FS00:0x17:Reading tar file integrator.xml
05:19:10:WU00:FS00:0x17:Reading tar file core.xml
05:19:10:WU00:FS00:0x17:Digital signatures verified
05:19:10:WU00:FS00:0x17:Folding@home GPU core17
05:19:10:WU00:FS00:0x17:Version 0.0.52
05:19:13:WU02:FS00:Upload 39.45%
05:19:19:WU02:FS00:Upload 86.80%
05:19:22:WU02:FS00:Upload complete
05:19:22:WU02:FS00:Server responded WORK_ACK (400)
05:19:22:WU02:FS00:Final credit estimate, 14093.00 points
05:19:22:WU02:FS00:Cleaning up
05:21:02:WU01:FS01:0xa4:Completed 70000 out of 500000 steps  (14%)
05:22:17:WU00:FS00:0x17:ERROR:exception: Force RMSE error of 454.919 with threshold of 5
05:22:17:WU00:FS00:0x17:Saving result file logfile_01.txt
05:22:17:WU00:FS00:0x17:Saving result file log.txt
05:22:17:WU00:FS00:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
05:22:17:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
05:22:17:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:13000 run:778 clone:9 gen:7 core:0x17 unit:0x00000012538b3db753107775eb8fc202
05:22:17:WU00:FS00:Uploading 2.23KiB to 140.163.4.231
05:22:17:WU00:FS00:Connecting to 140.163.4.231:8080
05:22:18:WU02:FS00:Connecting to 171.67.108.201:80
05:22:18:WU00:FS00:Upload complete
05:22:18:WU00:FS00:Server responded WORK_ACK (400)
05:22:18:WU00:FS00:Cleaning up
05:22:19:WU02:FS00:Assigned to work server 140.163.4.231
05:22:19:WU02:FS00:Requesting new work unit for slot 00: READY gpu:0:GM107 [GeForce GTX 750 Ti] from 140.163.4.231
05:22:19:WU02:FS00:Connecting to 140.163.4.231:8080
05:22:19:WU02:FS00:Downloading 4.84MiB
05:22:23:WU02:FS00:Download complete
05:22:23:WU02:FS00:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:13001 run:348 clone:6 gen:13 core:0x17 unit:0x00000021538b3db75328b27afdc8abcd
05:22:23:WU02:FS00:Starting
05:22:23:WU02:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/ProgramData/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe -dir 02 -suffix 01 -version 704 -lifeline 2012 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
05:22:23:WU02:FS00:Started FahCore on PID 3280
05:22:23:WU02:FS00:Core PID:1404
05:22:23:WU02:FS00:FahCore 0x17 started
05:22:23:WU02:FS00:0x17:*********************** Log Started 2014-10-06T05:22:23Z ***********************
05:22:23:WU02:FS00:0x17:Project: 13001 (Run 348, Clone 6, Gen 13)
05:22:23:WU02:FS00:0x17:Unit: 0x00000021538b3db75328b27afdc8abcd
05:22:23:WU02:FS00:0x17:CPU: 0x00000000000000000000000000000000
05:22:23:WU02:FS00:0x17:Machine: 0
05:22:23:WU02:FS00:0x17:Reading tar file state.xml
05:22:24:WU02:FS00:0x17:Reading tar file system.xml
05:22:25:WU02:FS00:0x17:Reading tar file integrator.xml
05:22:25:WU02:FS00:0x17:Reading tar file core.xml
05:22:25:WU02:FS00:0x17:Digital signatures verified
05:22:25:WU02:FS00:0x17:Folding@home GPU core17
05:22:25:WU02:FS00:0x17:Version 0.0.52
05:25:26:WU02:FS00:0x17:ERROR:exception: Force RMSE error of 453.209 with threshold of 5
05:25:26:WU02:FS00:0x17:Saving result file logfile_01.txt
05:25:26:WU02:FS00:0x17:Saving result file log.txt
05:25:26:WU02:FS00:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
05:25:26:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
05:25:26:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:13001 run:348 clone:6 gen:13 core:0x17 unit:0x00000021538b3db75328b27afdc8abcd
05:25:26:WU02:FS00:Uploading 2.23KiB to 140.163.4.231
05:25:26:WU02:FS00:Connecting to 140.163.4.231:8080
05:25:27:WU02:FS00:Upload complete
05:25:27:WU02:FS00:Server responded WORK_ACK (400)
05:25:27:WU02:FS00:Cleaning up
05:25:27:WU00:FS00:Connecting to 171.67.108.201:80
05:25:27:WU00:FS00:Assigned to work server 140.163.4.231
05:25:27:WU00:FS00:Requesting new work unit for slot 00: READY gpu:0:GM107 [GeForce GTX 750 Ti] from 140.163.4.231
05:25:27:WU00:FS00:Connecting to 140.163.4.231:8080
05:25:28:WU00:FS00:Downloading 4.83MiB
05:25:31:WU00:FS00:Download complete
05:25:31:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13001 run:139 clone:9 gen:5 core:0x17 unit:0x0000000f538b3db75328774391aa6ead
05:25:31:WU00:FS00:Starting
05:25:31:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/ProgramData/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe -dir 00 -suffix 01 -version 704 -lifeline 2012 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
05:25:31:WU00:FS00:Started FahCore on PID 4068
05:25:31:WU00:FS00:Core PID:432
05:25:31:WU00:FS00:FahCore 0x17 started
05:25:32:WU00:FS00:0x17:*********************** Log Started 2014-10-06T05:25:31Z ***********************
05:25:32:WU00:FS00:0x17:Project: 13001 (Run 139, Clone 9, Gen 5)
05:25:32:WU00:FS00:0x17:Unit: 0x0000000f538b3db75328774391aa6ead
05:25:32:WU00:FS00:0x17:CPU: 0x00000000000000000000000000000000
05:25:32:WU00:FS00:0x17:Machine: 0
05:25:32:WU00:FS00:0x17:Reading tar file state.xml
05:25:33:WU00:FS00:0x17:Reading tar file system.xml
05:25:34:WU00:FS00:0x17:Reading tar file integrator.xml
05:25:34:WU00:FS00:0x17:Reading tar file core.xml
05:25:34:WU00:FS00:0x17:Digital signatures verified
05:25:34:WU00:FS00:0x17:Folding@home GPU core17
05:25:34:WU00:FS00:0x17:Version 0.0.52
05:26:42:FS00:Finishing
05:28:35:WU00:FS00:0x17:ERROR:exception: Force RMSE error of 450.841 with threshold of 5
05:28:35:WU00:FS00:0x17:Saving result file logfile_01.txt
05:28:35:WU00:FS00:0x17:Saving result file log.txt
05:28:35:WU00:FS00:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
05:28:36:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
05:28:36:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:13001 run:139 clone:9 gen:5 core:0x17 unit:0x0000000f538b3db75328774391aa6ead
05:28:36:WU00:FS00:Uploading 2.22KiB to 140.163.4.231
05:28:36:WU00:FS00:Connecting to 140.163.4.231:8080
05:28:36:WU00:FS00:Upload complete
05:28:36:WU00:FS00:Server responded WORK_ACK (400)
05:28:36:WU00:FS00:Cleaning up
05:29:33:WU01:FS01:0xa4:Completed 75000 out of 500000 steps  (15%)
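The three failures above follow the same pattern: each unit fails the force RMSE check roughly three minutes (183 to 190 seconds) after the core starts. A minimal Python sketch for pulling that pattern out of a longer log, assuming the FAHClient line format shown above; the regexes and the `summarize_failures` name are illustrative, not part of FAHClient:

```python
import re
from datetime import datetime

# Patterns assume the log format in the post above (HH:MM:SS:WUnn:FSnn:...).
START = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d\d):FS\d\d:FahCore 0x17 started")
PROJECT = re.compile(
    r"^\d\d:\d\d:\d\d:(WU\d\d):FS\d\d:0x17:Project: (\d+) "
    r"\(Run (\d+), Clone (\d+), Gen (\d+)\)"
)
RMSE = re.compile(
    r"^(\d\d:\d\d:\d\d):(WU\d\d):FS\d\d:0x17:ERROR:exception: "
    r"Force RMSE error of ([\d.]+) with threshold of ([\d.]+)"
)


def summarize_failures(lines):
    """Return one dict per work unit that failed the force RMSE check."""
    started = {}  # WU slot id -> start time of the most recent core run
    prcg = {}     # WU slot id -> (project, run, clone, gen)
    failures = []
    for line in lines:
        if m := START.match(line):
            started[m.group(2)] = datetime.strptime(m.group(1), "%H:%M:%S")
        elif m := PROJECT.match(line):
            prcg[m.group(1)] = tuple(int(g) for g in m.group(2, 3, 4, 5))
        elif m := RMSE.match(line):
            wu = m.group(2)
            t = datetime.strptime(m.group(1), "%H:%M:%S")
            elapsed = int((t - started[wu]).total_seconds()) if wu in started else None
            failures.append({
                "wu": wu,
                "prcg": prcg.get(wu),
                "rmse": float(m.group(3)),
                "threshold": float(m.group(4)),
                "seconds_to_failure": elapsed,
            })
    return failures
```

The function accepts any iterable of lines, so an open log file can be passed directly. The timestamps carry no date, so a run that crosses midnight would give a negative duration; the sketch does not handle that case.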

Re: 13001 WU failure

Posted: Mon Oct 06, 2014 7:15 am
by snapshot
I updated the NVIDIA drivers from 340.52 to 344.11 and it made no difference at all. Watching the log screen and the GPU-Z sensor screen, it's clear that the WU fails at the moment the GPU starts working. The CPU core works for a couple of minutes without any significant GPU usage; then GPU-Z shows a very narrow spike of GPU usage and the log screen shows the WU failing.
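The failing log line reports a force RMSE compared against a threshold of 5. A minimal sketch of what such a check computes, assuming it compares two sets of per-atom force vectors (for example, forces from the GPU against a reference calculation). The function names and the exact normalization are hypothetical and are not taken from Core 17:

```python
import math


def force_rmse(forces_a, forces_b):
    """Root-mean-square difference between two lists of per-atom (fx, fy, fz) forces.

    Hypothetical normalization: squared vector differences summed over all
    atoms, divided by the atom count, then square-rooted.
    """
    if len(forces_a) != len(forces_b) or not forces_a:
        raise ValueError("force lists must be non-empty and the same length")
    total = 0.0
    for fa, fb in zip(forces_a, forces_b):
        total += sum((a - b) ** 2 for a, b in zip(fa, fb))
    return math.sqrt(total / len(forces_a))


def check_forces(forces_a, forces_b, threshold=5.0):
    """Raise if the two force sets disagree by more than the threshold."""
    rmse = force_rmse(forces_a, forces_b)
    if rmse > threshold:
        # Message wording mirrors the log line; the real core's code is not shown here.
        raise RuntimeError(f"Force RMSE error of {rmse:.3f} with threshold of {threshold:g}")
    return rmse
```

Under this reading, an RMSE around 450 against a threshold of 5 means the two force calculations disagree badly, not marginally, which would fit a systematic problem rather than occasional noise.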

Please get someone to understand that 13000 and 13001 simply aren't working and need fixing yesterday. When my current two CPU WUs finish, that's it from me until I hear GPU folding is stable again.

Re: 13001 WU failure

Posted: Mon Oct 06, 2014 8:50 am
by Breach
People understand and have been on top of this; unfortunately, it was the weekend. See here:
viewtopic.php?f=18&t=26807&start=15

Re: 13001 WU failure

Posted: Mon Oct 06, 2014 2:06 pm
by snapshot
Excuses, excuses. :wink:

Re: 13001 WU failure

Posted: Mon Oct 06, 2014 4:12 pm
by 7im
P13000 has been moved back to beta, along with several other projects for Maxwell cards. To avoid these problems, move back to client type advanced.

Re: 13001 WU failure

Posted: Mon Oct 06, 2014 8:56 pm
by Razzaa
bruce wrote:
Razzaa wrote: I am having the exact same issues. I have tried numerous things to fix it, but now my GPU won't fold at all.
Please report which GPU you have and which drivers you are running.
Gigabyte GTX 970, driver 344.16

Re: 13001 WU failure

Posted: Mon Oct 06, 2014 9:02 pm
by Kjetil
It is not the driver that is the problem; I am running 344.11 on 3 x 750 Ti and 3 x 980.

Re: 13001 WU failure

Posted: Mon Oct 06, 2014 10:44 pm
by bfromcolo
7im wrote: P13000 has been moved back to beta along with several other projects for Maxwell cards. To avoid these problems move back to client type advanced.
I just retried it and got a 9201, so it looks like things are working again. I am not running any client-type flags at the moment. Should I add advanced, or are you just trying to avoid beta?

Linux Mint 17, GTX 750 Ti, NVIDIA 343.22

Re: 13001 WU failure

Posted: Mon Oct 06, 2014 11:29 pm
by bruce
Kjetil wrote: It is not the driver that is the problem; I am running 344.11 on 3 x 750 Ti and 3 x 980.
Consider the possibilities:
Project wwww uses a computational feature that exercises a bug in driver mmm.mm, a bug that is NOT present in driver nnn.nn.
Project xxxx does not use any computational feature that exercises a driver bug.
Project yyyy uses a computational feature that exercises a bug in driver nnn.nn, a bug that is NOT present in driver mmm.mm.
Project zzzz uses a computational feature that exercises a bug in all drivers.

Assuming, first, that any bugs will be fixed eventually but not soon, how can anyone gather enough information to send the right projects to the right people?

Clearly project xxxx should be assigned and project zzzz should not be assigned, even if that means some people run out of work to do.

If project xxxx doesn't have enough WUs for everyone, or if its WUs produce low PPD, then some of wwww and/or yyyy must be assigned. But here's the rub: Stanford does not know who is running nnn.nn or mmm.mm, only that people are reporting problems with incomplete information.
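The assignment rule above can be sketched as a small decision function. The project types and driver labels reuse the post's placeholders; the function name is hypothetical, and this is not how Stanford's assignment servers actually work. Where the donor's driver is unknown, the sketch takes the conservative option and withholds driver-specific projects, whereas the post notes that in practice some wwww/yyyy work may have to be assigned anyway:

```python
# Which drivers trigger a bug for each of the post's four project types.
# None is a sentinel meaning "every driver".
BUGGY_DRIVERS = {
    "wwww": {"mmm.mm"},
    "xxxx": set(),
    "yyyy": {"nnn.nn"},
    "zzzz": None,
}


def may_assign(project_type, donor_driver=None):
    """Decide whether a project type is safe to send to a donor.

    donor_driver is None when the server does not know which driver the
    donor runs, which is the situation the post describes.
    """
    buggy = BUGGY_DRIVERS[project_type]
    if buggy is None:
        return False                  # zzzz: never assign
    if not buggy:
        return True                   # xxxx: safe for everyone
    if donor_driver is None:
        return False                  # unknown driver: cannot rule out a bad one
    return donor_driver not in buggy  # wwww/yyyy: only to unaffected drivers
```

The third branch is where the "rub" shows up: without the donor's driver version, wwww and yyyy cannot be routed safely at all.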

Please summarize all problem reports and categorize all projects under the four possibilities wwww, xxxx, yyyy, and zzzz, noting under wwww and yyyy which drivers everybody is using. If one person says a driver fails with a project and another says they're having no problem, note that mixed results were reported for that combination, and then try to figure out whether they're reporting different types of GPUs.
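The summary requested above amounts to grouping reports by (project, driver), flagging mixed outcomes along with the GPUs involved, and then labelling each project. A minimal sketch, assuming each forum report has already been reduced by hand to a (project, driver, GPU, worked) tuple; the function names and labels are illustrative, and no real report data is included:

```python
from collections import defaultdict


def summarize(reports):
    """reports: iterable of (project, driver, gpu, ok) tuples.

    Returns {project: {driver: (status, sorted_gpus)}} where status is
    "works", "fails", or "mixed".
    """
    by_combo = defaultdict(list)
    for project, driver, gpu, ok in reports:
        by_combo[(project, driver)].append((gpu, ok))

    summary = {}
    for (project, driver), results in by_combo.items():
        outcomes = {ok for _, ok in results}
        if outcomes == {True}:
            status = "works"
        elif outcomes == {False}:
            status = "fails"
        else:
            status = "mixed"  # conflicting reports: compare the GPU types listed
        gpus = sorted({gpu for gpu, _ in results})
        summary.setdefault(project, {})[driver] = (status, gpus)
    return summary


def categorize(driver_status):
    """Map one project's {driver: (status, gpus)} to one of the post's labels."""
    statuses = {status for status, _ in driver_status.values()}
    if "mixed" in statuses:
        return "mixed"       # resolve by GPU type before categorizing
    if statuses == {"works"}:
        return "xxxx"        # no driver problems reported
    if statuses == {"fails"}:
        return "zzzz"        # fails on every driver reported
    return "wwww/yyyy"       # driver-specific: see which drivers fail
```

Keeping the GPU list next to each status makes the final step of the request, checking whether mixed reports come from different GPU types, a matter of reading the output.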