9634 (Run 0, Clone 9, Gen 5)

Moderators: Site Moderators, FAHC Science Team

toTOW
Site Moderator
Posts: 6349
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by toTOW »

Yes, core 21 has been optimized more than older cores (17 and 18), so it is pushing the hardware harder. The projects run on core 21 are also often bigger than we used to run, so it might be using the chip more efficiently and pushing it even harder.

As always in this situation, the advice for overclockers is to lower their clocks (or to increase their GPU voltage) if they're seeing a higher failure rate on their GPUs ...

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by bruce »

toTOW wrote: Yes, core 21 has been optimized more than older cores (17 and 18), so it is pushing the hardware harder. The projects run on core 21 are also often bigger than we used to run, so it might be using the chip more efficiently and pushing it even harder.

As always in this situation, the advice for overclockers is to lower their clocks (or to increase their GPU voltage) if they're seeing a higher failure rate on their GPUs ...
It's not just Core_21. NVidia is pushing their hardware harder. They are using what they call "boost clocks" to overclock their own equipment. It's a variable overclock that depends on what they perceive the GPU is doing. If they decide to overclock a GPU that you've already overclocked, there's a perfect opportunity for stability problems.
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III 17 970 4.3Ghz DDR3 2000 2-500GB Segate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by Grandpa_01 »

I am pretty sure it does not have anything to do with OC. This is what I came home to today. The first log is a 9205 running on Windows on a GTX 980 Classified at a fairly heavy OC of 1480 MHz with no slowdown.

The second is a 9206 on a Linux machine with a GTX 980 Classified at a very light OC of 1437 MHz, with the slowdown problem (2 hrs). This card runs all other WUs with an OC of 1497 MHz; all of the Core 21 10xxx and Core 18 work runs OK on this card at a high OC.

The third is a Linux machine running a 9626 on a brand new GTX 980 Classified that was running with a fairly heavy OC of 1500 MHz. It did get some bad states, but that card has no problem running other WUs (10xxx Core 21 and Core 18).

The last is a Linux machine running a 9628 on a GTX 970 SC at its default OC. It did get some bad states after the pause and restart; once again, it has no problem running other WUs.



The point is that the Windows machine has no problem running the newly released 9xxx Core 21 WUs with a heavy OC, while the Linux machines won't even run them at their default clocks. I have run the 355.11 and 352.86 drivers, and both give the same results. I will give 347.xx a shot when I get a chance, which may be a while. Perhaps these should be limited to Windows until the problem is figured out. I know I cannot leave the Linux boxes running F@H the way things are now. When I get a chance I will try to help figure it out, but due to other circumstances that may be a while.

Code:

11:17:57:</config>
18:06:18:WU01:FS00:FahCore 0x21 started
18:06:19:WU01:FS00:0x21:*********************** Log Started 2015-10-15T18:06:18Z ***********************
18:06:19:WU01:FS00:0x21:Project: 9205 (Run 10, Clone 39, Gen 0)
18:06:19:WU01:FS00:0x21:Unit: 0x00000009664f2dd055d4ccf256a1a88d
18:06:19:WU01:FS00:0x21:CPU: 0x00000000000000000000000000000000
18:06:19:WU01:FS00:0x21:Machine: 0
18:06:19:WU01:FS00:0x21:Reading tar file core.xml
18:06:19:WU01:FS00:0x21:Reading tar file system.xml
18:06:19:WU01:FS00:0x21:Reading tar file integrator.xml
18:06:19:WU01:FS00:0x21:Reading tar file state.xml
18:06:20:WU01:FS00:0x21:Digital signatures verified
18:06:20:WU01:FS00:0x21:Folding@home GPU Core21 Folding@home Core
18:06:20:WU01:FS00:0x21:Version 0.0.11
18:06:24:WU00:FS00:Upload 23.76%
18:06:30:WU00:FS00:Upload 49.50%
18:06:36:WU00:FS00:Upload 75.25%
18:06:47:WU00:FS00:Upload complete
18:06:47:WU00:FS00:Server responded WORK_ACK (400)
18:06:47:WU00:FS00:Final credit estimate, 42463.00 points
18:06:47:WU00:FS00:Cleaning up
18:07:11:WU01:FS00:0x21:Completed 0 out of 2500000 steps (0%)
18:07:11:WU01:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
18:11:43:WU01:FS00:0x21:Completed 25000 out of 2500000 steps (1%)
18:16:16:WU01:FS00:0x21:Completed 50000 out of 2500000 steps (2%)
18:21:01:WU01:FS00:0x21:Completed 75000 out of 2500000 steps (3%)
18:25:45:WU01:FS00:0x21:Completed 100000 out of 2500000 steps (4%)
18:30:49:WU01:FS00:0x21:Completed 125000 out of 2500000 steps (5%)
18:35:32:WU01:FS00:0x21:Completed 150000 out of 2500000 steps (6%)
18:40:16:WU01:FS00:0x21:Completed 175000 out of 2500000 steps (7%)
18:44:59:WU01:FS00:0x21:Completed 200000 out of 2500000 steps (8%)
18:50:03:WU01:FS00:0x21:Completed 225000 out of 2500000 steps (9%)
18:54:47:WU01:FS00:0x21:Completed 250000 out of 2500000 steps (10%)
18:59:31:WU01:FS00:0x21:Completed 275000 out of 2500000 steps (11%)
19:04:15:WU01:FS00:0x21:Completed 300000 out of 2500000 steps (12%)
19:09:20:WU01:FS00:0x21:Completed 325000 out of 2500000 steps (13%)
19:14:04:WU01:FS00:0x21:Completed 350000 out of 2500000 steps (14%)
19:18:48:WU01:FS00:0x21:Completed 375000 out of 2500000 steps (15%)
19:23:32:WU01:FS00:0x21:Completed 400000 out of 2500000 steps (16%)
19:28:37:WU01:FS00:0x21:Completed 425000 out of 2500000 steps (17%)
19:33:21:WU01:FS00:0x21:Completed 450000 out of 2500000 steps (18%)
19:38:04:WU01:FS00:0x21:Completed 475000 out of 2500000 steps (19%)
19:42:48:WU01:FS00:0x21:Completed 500000 out of 2500000 steps (20%)
19:47:52:WU01:FS00:0x21:Completed 525000 out of 2500000 steps (21%)
19:52:36:WU01:FS00:0x21:Completed 550000 out of 2500000 steps (22%)
19:57:20:WU01:FS00:0x21:Completed 575000 out of 2500000 steps (23%)
20:02:04:WU01:FS00:0x21:Completed 600000 out of 2500000 steps (24%)
20:07:09:WU01:FS00:0x21:Completed 625000 out of 2500000 steps (25%)
20:11:53:WU01:FS00:0x21:Completed 650000 out of 2500000 steps (26%)
20:16:37:WU01:FS00:0x21:Completed 675000 out of 2500000 steps (27%)
20:21:20:WU01:FS00:0x21:Completed 700000 out of 2500000 steps (28%)
20:26:26:WU01:FS00:0x21:Completed 725000 out of 2500000 steps (29%)
20:31:09:WU01:FS00:0x21:Completed 750000 out of 2500000 steps (30%)
20:35:54:WU01:FS00:0x21:Completed 775000 out of 2500000 steps (31%)
20:40:39:WU01:FS00:0x21:Completed 800000 out of 2500000 steps (32%)
20:45:44:WU01:FS00:0x21:Completed 825000 out of 2500000 steps (33%)
20:50:28:WU01:FS00:0x21:Completed 850000 out of 2500000 steps (34%)
20:55:13:WU01:FS00:0x21:Completed 875000 out of 2500000 steps (35%)
20:59:57:WU01:FS00:0x21:Completed 900000 out of 2500000 steps (36%)
21:05:02:WU01:FS00:0x21:Completed 925000 out of 2500000 steps (37%)
21:09:46:WU01:FS00:0x21:Completed 950000 out of 2500000 steps (38%)
21:14:31:WU01:FS00:0x21:Completed 975000 out of 2500000 steps (39%)
21:19:16:WU01:FS00:0x21:Completed 1000000 out of 2500000 steps (40%)
21:24:20:WU01:FS00:0x21:Completed 1025000 out of 2500000 steps (41%)
21:29:04:WU01:FS00:0x21:Completed 1050000 out of 2500000 steps (42%)
21:33:49:WU01:FS00:0x21:Completed 1075000 out of 2500000 steps (43%)
21:38:34:WU01:FS00:0x21:Completed 1100000 out of 2500000 steps (44%)
21:43:39:WU01:FS00:0x21:Completed 1125000 out of 2500000 steps (45%)
21:48:23:WU01:FS00:0x21:Completed 1150000 out of 2500000 steps (46%)
21:53:07:WU01:FS00:0x21:Completed 1175000 out of 2500000 steps (47%)
21:57:52:WU01:FS00:0x21:Completed 1200000 out of 2500000 steps (48%)
22:02:57:WU01:FS00:0x21:Completed 1225000 out of 2500000 steps (49%)
22:07:41:WU01:FS00:0x21:Completed 1250000 out of 2500000 steps (50%)
22:12:25:WU01:FS00:0x21:Completed 1275000 out of 2500000 steps (51%)
22:17:09:WU01:FS00:0x21:Completed 1300000 out of 2500000 steps (52%)
22:22:14:WU01:FS00:0x21:Completed 1325000 out of 2500000 steps (53%)
22:26:58:WU01:FS00:0x21:Completed 1350000 out of 2500000 steps (54%)
22:31:43:WU01:FS00:0x21:Completed 1375000 out of 2500000 steps (55%)
22:36:28:WU01:FS00:0x21:Completed 1400000 out of 2500000 steps (56%)
22:41:33:WU01:FS00:0x21:Completed 1425000 out of 2500000 steps (57%)
22:46:16:WU01:FS00:0x21:Completed 1450000 out of 2500000 steps (58%)
22:51:00:WU01:FS00:0x21:Completed 1475000 out of 2500000 steps (59%)
22:55:45:WU01:FS00:0x21:Completed 1500000 out of 2500000 steps (60%)
23:00:50:WU01:FS00:0x21:Completed 1525000 out of 2500000 steps (61%)
23:05:34:WU01:FS00:0x21:Completed 1550000 out of 2500000 steps (62%)
23:10:19:WU01:FS00:0x21:Completed 1575000 out of 2500000 steps (63%)
23:15:03:WU01:FS00:0x21:Completed 1600000 out of 2500000 steps (64%)
******************************* Date: 2015-10-15 *******************************
23:20:07:WU01:FS00:0x21:Completed 1625000 out of 2500000 steps (65%)
23:24:51:WU01:FS00:0x21:Completed 1650000 out of 2500000 steps (66%)
23:29:36:WU01:FS00:0x21:Completed 1675000 out of 2500000 steps (67%)
23:34:20:WU01:FS00:0x21:Completed 1700000 out of 2500000 steps (68%)
23:39:26:WU01:FS00:0x21:Completed 1725000 out of 2500000 steps (69%)
23:44:10:WU01:FS00:0x21:Completed 1750000 out of 2500000 steps (70%)
23:48:55:WU01:FS00:0x21:Completed 1775000 out of 2500000 steps (71%)
23:53:40:WU01:FS00:0x21:Completed 1800000 out of 2500000 steps (72%)
23:58:44:WU01:FS00:0x21:Completed 1825000 out of 2500000 steps (73%)
00:03:28:WU01:FS00:0x21:Completed 1850000 out of 2500000 steps (74%)
00:08:13:WU01:FS00:0x21:Completed 1875000 out of 2500000 steps (75%)
00:12:58:WU01:FS00:0x21:Completed 1900000 out of 2500000 steps (76%)
00:18:04:WU01:FS00:0x21:Completed 1925000 out of 2500000 steps (77%)
00:22:51:WU01:FS00:0x21:Completed 1950000 out of 2500000 steps (78%)
00:27:37:WU01:FS00:0x21:Completed 1975000 out of 2500000 steps (79%)
00:32:23:WU01:FS00:0x21:Completed 2000000 out of 2500000 steps (80%)
00:37:29:WU01:FS00:0x21:Completed 2025000 out of 2500000 steps (81%)
00:42:14:WU01:FS00:0x21:Completed 2050000 out of 2500000 steps (82%)
00:46:59:WU01:FS00:0x21:Completed 2075000 out of 2500000 steps (83%)
00:51:43:WU01:FS00:0x21:Completed 2100000 out of 2500000 steps (84%)
00:56:47:WU01:FS00:0x21:Completed 2125000 out of 2500000 steps (85%)
01:01:32:WU01:FS00:0x21:Completed 2150000 out of 2500000 steps (86%)

Code:

14:20:45:WU00:FS01:FahCore 0x21 started
14:20:45:WU00:FS01:0x21:*********************** Log Started 2015-10-15T14:20:45Z ***********************
14:20:45:WU00:FS01:0x21:Project: 9206 (Run 0, Clone 959, Gen 2)
14:20:45:WU00:FS01:0x21:Unit: 0x00000019664f2dd055db8d4af10ecf6c
14:20:45:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
14:20:45:WU00:FS01:0x21:Machine: 1
14:20:45:WU00:FS01:0x21:Reading tar file core.xml
14:20:45:WU00:FS01:0x21:Reading tar file system.xml
14:20:46:WU00:FS01:0x21:Reading tar file integrator.xml
14:20:46:WU00:FS01:0x21:Reading tar file state.xml
14:20:48:WU00:FS01:0x21:Digital signatures verified
14:20:48:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
14:20:48:WU00:FS01:0x21:Version 0.0.11
14:20:52:WU01:FS01:Upload 0.71%
14:20:58:WU01:FS01:Upload 59.95%
14:21:13:WU01:FS01:Upload complete
14:21:13:WU01:FS01:Server responded WORK_ACK (400)
14:21:13:WU01:FS01:Final credit estimate, 31949.00 points
14:21:13:WU01:FS01:Cleaning up
14:21:57:WU00:FS01:0x21:Completed 0 out of 2500000 steps (0%)
14:21:57:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
14:27:01:WU00:FS01:0x21:Completed 25000 out of 2500000 steps (1%)
14:31:48:WU00:FS01:0x21:Completed 50000 out of 2500000 steps (2%)
14:36:35:WU00:FS01:0x21:Completed 75000 out of 2500000 steps (3%)
14:41:21:WU00:FS01:0x21:Completed 100000 out of 2500000 steps (4%)
14:46:29:WU00:FS01:0x21:Completed 125000 out of 2500000 steps (5%)
14:51:17:WU00:FS01:0x21:Completed 150000 out of 2500000 steps (6%)
14:56:03:WU00:FS01:0x21:Completed 175000 out of 2500000 steps (7%)
15:00:49:WU00:FS01:0x21:Completed 200000 out of 2500000 steps (8%)
15:05:56:WU00:FS01:0x21:Completed 225000 out of 2500000 steps (9%)
15:10:42:WU00:FS01:0x21:Completed 250000 out of 2500000 steps (10%)
15:15:30:WU00:FS01:0x21:Completed 275000 out of 2500000 steps (11%)
15:20:17:WU00:FS01:0x21:Completed 300000 out of 2500000 steps (12%)
15:25:24:WU00:FS01:0x21:Completed 325000 out of 2500000 steps (13%)
15:30:09:WU00:FS01:0x21:Completed 350000 out of 2500000 steps (14%)
15:34:54:WU00:FS01:0x21:Completed 375000 out of 2500000 steps (15%)
15:39:40:WU00:FS01:0x21:Completed 400000 out of 2500000 steps (16%)
15:44:48:WU00:FS01:0x21:Completed 425000 out of 2500000 steps (17%)
15:49:35:WU00:FS01:0x21:Completed 450000 out of 2500000 steps (18%)
15:54:23:WU00:FS01:0x21:Completed 475000 out of 2500000 steps (19%)
15:59:11:WU00:FS01:0x21:Completed 500000 out of 2500000 steps (20%)
16:04:20:WU00:FS01:0x21:Completed 525000 out of 2500000 steps (21%)
16:09:07:WU00:FS01:0x21:Completed 550000 out of 2500000 steps (22%)
16:13:55:WU00:FS01:0x21:Completed 575000 out of 2500000 steps (23%)
16:18:42:WU00:FS01:0x21:Completed 600000 out of 2500000 steps (24%)
16:23:50:WU00:FS01:0x21:Completed 625000 out of 2500000 steps (25%)
16:28:38:WU00:FS01:0x21:Completed 650000 out of 2500000 steps (26%)
16:33:27:WU00:FS01:0x21:Completed 675000 out of 2500000 steps (27%)
16:38:14:WU00:FS01:0x21:Completed 700000 out of 2500000 steps (28%)
16:43:21:WU00:FS01:0x21:Completed 725000 out of 2500000 steps (29%)
16:48:09:WU00:FS01:0x21:Completed 750000 out of 2500000 steps (30%)
16:52:56:WU00:FS01:0x21:Completed 775000 out of 2500000 steps (31%)
******************************* Date: 2015-10-15 *******************************
16:57:44:WU00:FS01:0x21:Completed 800000 out of 2500000 steps (32%)
17:02:49:WU00:FS01:0x21:Completed 825000 out of 2500000 steps (33%)
17:07:35:WU00:FS01:0x21:Completed 850000 out of 2500000 steps (34%)
17:12:22:WU00:FS01:0x21:Completed 875000 out of 2500000 steps (35%)
17:17:10:WU00:FS01:0x21:Completed 900000 out of 2500000 steps (36%)
17:22:16:WU00:FS01:0x21:Completed 925000 out of 2500000 steps (37%)
17:27:02:WU00:FS01:0x21:Completed 950000 out of 2500000 steps (38%)
17:31:49:WU00:FS01:0x21:Completed 975000 out of 2500000 steps (39%)
17:36:37:WU00:FS01:0x21:Completed 1000000 out of 2500000 steps (40%)
17:41:44:WU00:FS01:0x21:Completed 1025000 out of 2500000 steps (41%)
17:46:31:WU00:FS01:0x21:Completed 1050000 out of 2500000 steps (42%)
17:51:18:WU00:FS01:0x21:Completed 1075000 out of 2500000 steps (43%)
17:56:05:WU00:FS01:0x21:Completed 1100000 out of 2500000 steps (44%)
18:01:12:WU00:FS01:0x21:Completed 1125000 out of 2500000 steps (45%)
18:06:00:WU00:FS01:0x21:Completed 1150000 out of 2500000 steps (46%)
18:10:46:WU00:FS01:0x21:Completed 1175000 out of 2500000 steps (47%)
18:15:34:WU00:FS01:0x21:Completed 1200000 out of 2500000 steps (48%)
21:00:32:WU00:FS01:0x21:Completed 1225000 out of 2500000 steps (49%)
******************************* Date: 2015-10-16 *******************************
00:10:33:WU00:FS01:0x21:Completed 1250000 out of 2500000 steps (50%)
00:18:38:FS01:Paused
00:18:38:FS01:Shutting core down
00:18:38:WU00:FS01:0x21:Caught signal SIGINT(2) on PID 4727
00:18:38:WU00:FS01:0x21:Exiting, please wait. . .
00:18:46:WU00:FS01:0x21:Folding@home Core Shutdown: INTERRUPTED
00:18:46:WU00:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
00:18:51:FS01:Unpaused
00:18:51:WU00:FS01:Starting
00:18:51:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 1749 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
00:18:51:WU00:FS01:Started FahCore on PID 6037
00:18:51:WU00:FS01:Core PID:6041
00:18:51:WU00:FS01:FahCore 0x21 started
00:18:52:WU00:FS01:0x21:*********************** Log Started 2015-10-16T00:18:51Z ***********************
00:18:52:WU00:FS01:0x21:Project: 9206 (Run 0, Clone 959, Gen 2)
00:18:52:WU00:FS01:0x21:Unit: 0x00000019664f2dd055db8d4af10ecf6c
00:18:52:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
00:18:52:WU00:FS01:0x21:Machine: 1
00:18:52:WU00:FS01:0x21:Digital signatures verified
00:18:52:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
00:18:52:WU00:FS01:0x21:Version 0.0.11
00:18:52:WU00:FS01:0x21:  Found a checkpoint file
00:19:55:WU00:FS01:0x21:Completed 1200000 out of 2500000 steps (48%)
00:19:55:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
00:24:57:WU00:FS01:0x21:Completed 1225000 out of 2500000 steps (49%)
00:29:44:WU00:FS01:0x21:Completed 1250000 out of 2500000 steps (50%)
00:34:32:WU00:FS01:0x21:Completed 1275000 out of 2500000 steps (51%)
00:39:20:WU00:FS01:0x21:Completed 1300000 out of 2500000 steps (52%)
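
The slowdown in the log above is easier to spot when the gap between consecutive progress lines is computed. The following is a minimal sketch, assuming the log line layout shown in this thread; the names `frame_times` and `slow_frames`, the `SAMPLE` excerpt, and the "3x the median" threshold are illustrative choices, not part of FAHClient.

```python
import re
from datetime import timedelta
from statistics import median

# Matches progress lines such as
# "18:15:34:WU00:FS01:0x21:Completed 1200000 out of 2500000 steps (48%)"
PROGRESS = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):WU\d+:FS\d+:0x[0-9a-fA-F]+:"
    r"Completed (\d+) out of (\d+) steps"
)

def frame_times(lines):
    """Yield (steps_completed, seconds) for each forward step in progress."""
    prev_t = prev_steps = None
    for line in lines:
        m = PROGRESS.match(line)
        if not m:
            continue
        h, mi, s, steps, _total = map(int, m.groups())
        t = timedelta(hours=h, minutes=mi, seconds=s)
        # Skip lines where progress went backwards (checkpoint resume).
        if prev_t is not None and steps > prev_steps:
            delta = t - prev_t
            if delta < timedelta(0):  # timestamps wrapped past midnight
                delta += timedelta(days=1)
            yield steps, delta.total_seconds()
        prev_t, prev_steps = t, steps

def slow_frames(lines, factor=3.0):
    """Return frames that took more than `factor` times the median frame time."""
    frames = list(frame_times(lines))
    if not frames:
        return []
    typical = median(sec for _, sec in frames)
    return [(steps, sec) for steps, sec in frames if sec > factor * typical]

# Excerpt copied from the 9206 log above.
SAMPLE = """\
17:56:05:WU00:FS01:0x21:Completed 1100000 out of 2500000 steps (44%)
18:01:12:WU00:FS01:0x21:Completed 1125000 out of 2500000 steps (45%)
18:06:00:WU00:FS01:0x21:Completed 1150000 out of 2500000 steps (46%)
18:10:46:WU00:FS01:0x21:Completed 1175000 out of 2500000 steps (47%)
18:15:34:WU00:FS01:0x21:Completed 1200000 out of 2500000 steps (48%)
21:00:32:WU00:FS01:0x21:Completed 1225000 out of 2500000 steps (49%)
00:10:33:WU00:FS01:0x21:Completed 1250000 out of 2500000 steps (50%)
"""

for steps, sec in slow_frames(SAMPLE.splitlines()):
    print(f"slow frame ending at step {steps}: {timedelta(seconds=sec)}")
```

On this excerpt the normal frames take a little under five minutes, while the two frames ending at 49% and 50% take roughly 2 h 45 min and 3 h 10 min. The median is only a fair baseline while fewer than half of the frames are slow.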

Code:

18:27:01:WU01:FS01:FahCore 0x21 started
18:27:01:WU01:FS01:0x21:*********************** Log Started 2015-10-15T18:27:01Z ***********************
18:27:01:WU01:FS01:0x21:Project: 9626 (Run 1, Clone 19, Gen 13)
18:27:01:WU01:FS01:0x21:Unit: 0x00000010ab436c9b5609bee1c595f56b
18:27:01:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
18:27:01:WU01:FS01:0x21:Machine: 1
18:27:01:WU01:FS01:0x21:Reading tar file core.xml
18:27:01:WU01:FS01:0x21:Reading tar file integrator.xml
18:27:01:WU01:FS01:0x21:Reading tar file state.xml
18:27:01:WU01:FS01:0x21:Reading tar file system.xml
18:27:02:WU01:FS01:0x21:Digital signatures verified
18:27:02:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
18:27:02:WU01:FS01:0x21:Version 0.0.11
18:27:07:WU00:FS01:Upload 35.99%
18:27:13:WU00:FS01:Upload 65.53%
18:27:19:WU00:FS01:Upload 98.29%
18:27:23:WU00:FS01:Upload complete
18:27:23:WU00:FS01:Server responded WORK_ACK (400)
18:27:23:WU00:FS01:Final credit estimate, 87175.00 points
18:27:23:WU00:FS01:Cleaning up
18:27:43:WU01:FS01:0x21:Completed 0 out of 2000000 steps (0%)
18:27:43:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
18:29:25:WU01:FS01:0x21:Completed 20000 out of 2000000 steps (1%)
18:30:58:WU01:FS01:0x21:Completed 40000 out of 2000000 steps (2%)
18:49:57:WU01:FS01:0x21:Completed 60000 out of 2000000 steps (3%)
******************************* Date: 2015-10-15 *******************************
19:42:22:WU01:FS01:0x21:Completed 80000 out of 2000000 steps (4%)
20:34:36:WU01:FS01:0x21:Completed 100000 out of 2000000 steps (5%)
20:34:36:WU01:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint
20:36:10:WU01:FS01:0x21:Completed 20000 out of 2000000 steps (1%)
20:37:43:WU01:FS01:0x21:Completed 40000 out of 2000000 steps (2%)
20:39:17:WU01:FS01:0x21:Completed 60000 out of 2000000 steps (3%)
20:40:50:WU01:FS01:0x21:Completed 80000 out of 2000000 steps (4%)
20:42:24:WU01:FS01:0x21:Completed 100000 out of 2000000 steps (5%)
20:44:07:WU01:FS01:0x21:Completed 120000 out of 2000000 steps (6%)
20:56:27:WU01:FS01:0x21:Completed 140000 out of 2000000 steps (7%)
21:48:51:WU01:FS01:0x21:Completed 160000 out of 2000000 steps (8%)
22:41:13:WU01:FS01:0x21:Completed 180000 out of 2000000 steps (9%)
23:33:07:WU01:FS01:0x21:Completed 200000 out of 2000000 steps (10%)
23:33:07:WU01:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint
23:34:41:WU01:FS01:0x21:Completed 120000 out of 2000000 steps (6%)
23:36:15:WU01:FS01:0x21:Completed 140000 out of 2000000 steps (7%)
23:37:48:WU01:FS01:0x21:Completed 160000 out of 2000000 steps (8%)
23:39:21:WU01:FS01:0x21:Completed 180000 out of 2000000 steps (9%)
23:40:55:WU01:FS01:0x21:Completed 200000 out of 2000000 steps (10%)
23:42:39:WU01:FS01:0x21:Completed 220000 out of 2000000 steps (11%)
23:44:12:WU01:FS01:0x21:Completed 240000 out of 2000000 steps (12%)
23:45:46:WU01:FS01:0x21:Completed 260000 out of 2000000 steps (13%)
23:47:19:WU01:FS01:0x21:Completed 280000 out of 2000000 steps (14%)
23:48:52:WU01:FS01:0x21:Completed 300000 out of 2000000 steps (15%)
00:13:27:WU01:FS01:0x21:Completed 320000 out of 2000000 steps (16%)
00:20:06:FS01:Paused
00:20:06:FS01:Shutting core down
00:20:06:WU01:FS01:0x21:Caught signal SIGINT(2) on PID 7024
00:20:06:WU01:FS01:0x21:Exiting, please wait. . .
00:20:08:WU01:FS01:0x21:Folding@home Core Shutdown: INTERRUPTED
00:20:08:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
00:20:13:FS01:Unpaused
00:20:13:WU01:FS01:Starting
00:20:13:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 704 -lifeline 1705 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
00:20:13:WU01:FS01:Started FahCore on PID 7786
00:20:13:WU01:FS01:Core PID:7790
00:20:13:WU01:FS01:FahCore 0x21 started
00:20:13:WU01:FS01:0x21:*********************** Log Started 2015-10-16T00:20:13Z ***********************
00:20:13:WU01:FS01:0x21:Project: 9626 (Run 1, Clone 19, Gen 13)
00:20:13:WU01:FS01:0x21:Unit: 0x00000010ab436c9b5609bee1c595f56b
00:20:13:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
00:20:13:WU01:FS01:0x21:Machine: 1
00:20:13:WU01:FS01:0x21:Digital signatures verified
00:20:13:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
00:20:13:WU01:FS01:0x21:Version 0.0.11
00:20:13:WU01:FS01:0x21:  Found a checkpoint file
00:20:53:WU01:FS01:0x21:Completed 300000 out of 2000000 steps (15%)
00:20:53:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
00:22:36:WU01:FS01:0x21:Completed 320000 out of 2000000 steps (16%)
00:24:12:WU01:FS01:0x21:Completed 340000 out of 2000000 steps (17%)
00:25:48:WU01:FS01:0x21:Completed 360000 out of 2000000 steps (18%)
00:27:24:WU01:FS01:0x21:Completed 380000 out of 2000000 steps (19%)
00:29:00:WU01:FS01:0x21:Completed 400000 out of 2000000 steps (20%)
00:30:47:WU01:FS01:0x21:Completed 420000 out of 2000000 steps (21%)
00:32:23:WU01:FS01:0x21:Completed 440000 out of 2000000 steps (22%)
00:33:59:WU01:FS01:0x21:Completed 460000 out of 2000000 steps (23%)
00:35:35:WU01:FS01:0x21:Completed 480000 out of 2000000 steps (24%)
00:37:11:WU01:FS01:0x21:Completed 500000 out of 2000000 steps (25%)
00:38:57:WU01:FS01:0x21:Completed 520000 out of 2000000 steps (26%)
00:40:33:WU01:FS01:0x21:Completed 540000 out of 2000000 steps (27%)
00:48:07:WU01:FS01:0x21:Completed 560000 out of 2000000 steps (28%)

Code:

21:54:13:WU01:FS00:0x21:*********************** Log Started 2015-10-15T21:54:13Z ***********************
21:54:13:WU01:FS00:0x21:Project: 9628 (Run 0, Clone 10, Gen 12)
21:54:13:WU01:FS00:0x21:Unit: 0x0000000cab436c9b5609bee10ada3eab
21:54:13:WU01:FS00:0x21:CPU: 0x00000000000000000000000000000000
21:54:13:WU01:FS00:0x21:Machine: 0
21:54:13:WU01:FS00:0x21:Reading tar file core.xml
21:54:13:WU01:FS00:0x21:Reading tar file integrator.xml
21:54:13:WU01:FS00:0x21:Reading tar file state.xml
21:54:13:WU01:FS00:0x21:Reading tar file system.xml
21:54:13:WU01:FS00:0x21:Digital signatures verified
21:54:13:WU01:FS00:0x21:Folding@home GPU Core21 Folding@home Core
21:54:13:WU01:FS00:0x21:Version 0.0.11
21:54:19:WU00:FS00:Upload 28.88%
21:54:25:WU00:FS00:Upload 52.42%
21:54:31:WU00:FS00:Upload 76.31%
21:54:53:WU01:FS00:0x21:Completed 0 out of 2000000 steps (0%)
21:54:53:WU01:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:54:56:WU00:FS00:Upload complete
21:54:56:WU00:FS00:Server responded WORK_ACK (400)
21:54:56:WU00:FS00:Final credit estimate, 100276.00 points
21:54:56:WU00:FS00:Cleaning up
21:56:59:WU01:FS00:0x21:Completed 20000 out of 2000000 steps (1%)
21:58:58:WU01:FS00:0x21:Completed 40000 out of 2000000 steps (2%)
22:00:57:WU01:FS00:0x21:Completed 60000 out of 2000000 steps (3%)
22:02:55:WU01:FS00:0x21:Completed 80000 out of 2000000 steps (4%)
22:04:53:WU01:FS00:0x21:Completed 100000 out of 2000000 steps (5%)
22:06:56:WU01:FS00:0x21:Completed 120000 out of 2000000 steps (6%)
22:08:54:WU01:FS00:0x21:Completed 140000 out of 2000000 steps (7%)
22:10:53:WU01:FS00:0x21:Completed 160000 out of 2000000 steps (8%)
22:12:51:WU01:FS00:0x21:Completed 180000 out of 2000000 steps (9%)
22:14:50:WU01:FS00:0x21:Completed 200000 out of 2000000 steps (10%)
22:16:58:WU01:FS00:0x21:Completed 220000 out of 2000000 steps (11%)
22:18:57:WU01:FS00:0x21:Completed 240000 out of 2000000 steps (12%)
22:20:55:WU01:FS00:0x21:Completed 260000 out of 2000000 steps (13%)
22:22:54:WU01:FS00:0x21:Completed 280000 out of 2000000 steps (14%)
22:24:52:WU01:FS00:0x21:Completed 300000 out of 2000000 steps (15%)
******************************* Date: 2015-10-15 *******************************
22:31:36:WU01:FS00:0x21:Completed 320000 out of 2000000 steps (16%)
23:26:22:WU01:FS00:0x21:Completed 340000 out of 2000000 steps (17%)
00:19:52:FS00:Paused
00:19:52:FS00:Shutting core down
00:19:53:WU01:FS00:0x21:Caught signal SIGINT(2) on PID 6442
00:19:53:WU01:FS00:0x21:Exiting, please wait. . .
00:19:53:WU01:FS00:0x21:Folding@home Core Shutdown: INTERRUPTED
00:19:54:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
00:20:01:FS00:Unpaused
00:20:01:WU01:FS00:Starting
00:20:01:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 704 -lifeline 1736 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
00:20:01:WU01:FS00:Started FahCore on PID 6839
00:20:01:WU01:FS00:Core PID:6843
00:20:01:WU01:FS00:FahCore 0x21 started
00:20:01:WU01:FS00:0x21:*********************** Log Started 2015-10-16T00:20:01Z ***********************
00:20:01:WU01:FS00:0x21:Project: 9628 (Run 0, Clone 10, Gen 12)
00:20:01:WU01:FS00:0x21:Unit: 0x0000000cab436c9b5609bee10ada3eab
00:20:01:WU01:FS00:0x21:CPU: 0x00000000000000000000000000000000
00:20:01:WU01:FS00:0x21:Machine: 0
00:20:01:WU01:FS00:0x21:Digital signatures verified
00:20:01:WU01:FS00:0x21:Folding@home GPU Core21 Folding@home Core
00:20:01:WU01:FS00:0x21:Version 0.0.11
00:20:01:WU01:FS00:0x21:  Found a checkpoint file
00:20:41:WU01:FS00:0x21:Completed 300000 out of 2000000 steps (15%)
00:20:41:WU01:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
00:22:47:WU01:FS00:0x21:Completed 320000 out of 2000000 steps (16%)
00:24:45:WU01:FS00:0x21:Completed 340000 out of 2000000 steps (17%)
00:26:44:WU01:FS00:0x21:Completed 360000 out of 2000000 steps (18%)
00:28:43:WU01:FS00:0x21:Completed 380000 out of 2000000 steps (19%)
00:30:41:WU01:FS00:0x21:Completed 400000 out of 2000000 steps (20%)
00:32:50:WU01:FS00:0x21:Completed 420000 out of 2000000 steps (21%)
00:34:49:WU01:FS00:0x21:Completed 440000 out of 2000000 steps (22%)
00:36:47:WU01:FS00:0x21:Completed 460000 out of 2000000 steps (23%)
00:38:46:WU01:FS00:0x21:Completed 480000 out of 2000000 steps (24%)
00:40:44:WU01:FS00:0x21:Completed 500000 out of 2000000 steps (25%)
00:42:53:WU01:FS00:0x21:Completed 520000 out of 2000000 steps (26%)
00:44:51:WU01:FS00:0x21:Completed 540000 out of 2000000 steps (27%)
00:46:50:WU01:FS00:0x21:Completed 560000 out of 2000000 steps (28%)
00:48:48:WU01:FS00:0x21:Completed 580000 out of 2000000 steps (29%)
00:50:46:WU01:FS00:0x21:Completed 600000 out of 2000000 steps (30%)
00:52:55:WU01:FS00:0x21:Completed 620000 out of 2000000 steps (31%)
00:54:54:WU01:FS00:0x21:Completed 640000 out of 2000000 steps (32%)
00:56:52:WU01:FS00:0x21:Completed 660000 out of 2000000 steps (33%)
00:58:51:WU01:FS00:0x21:Completed 680000 out of 2000000 steps (34%)
01:13:28:WU01:FS00:0x21:Completed 700000 out of 2000000 steps (35%)
01:13:28:WU01:FS00:0x21:Bad State detected... attempting to resume from last good checkpoint
01:15:27:WU01:FS00:0x21:Completed 620000 out of 2000000 steps (31%)
01:17:25:WU01:FS00:0x21:Completed 640000 out of 2000000 steps (32%)
01:19:24:WU01:FS00:0x21:Completed 660000 out of 2000000 steps (33%)
01:21:22:WU01:FS00:0x21:Completed 680000 out of 2000000 steps (34%)
01:23:21:WU01:FS00:0x21:Completed 700000 out of 2000000 steps (35%)
01:25:30:WU01:FS00:0x21:Completed 720000 out of 2000000 steps (36%)
01:27:29:WU01:FS00:0x21:Completed 740000 out of 2000000 steps (37%)
01:29:27:WU01:FS00:0x21:Completed 760000 out of 2000000 steps (38%)
01:31:25:WU01:FS00:0x21:Completed 780000 out of 2000000 steps (39%)
01:33:23:WU01:FS00:0x21:Completed 800000 out of 2000000 steps (40%)
01:35:32:WU01:FS00:0x21:Completed 820000 out of 2000000 steps (41%)
01:37:31:WU01:FS00:0x21:Completed 840000 out of 2000000 steps (42%)
01:39:29:WU01:FS00:0x21:Completed 860000 out of 2000000 steps (43%)
01:41:27:WU01:FS00:0x21:Completed 880000 out of 2000000 steps (44%)
01:43:26:WU01:FS00:0x21:Completed 900000 out of 2000000 steps (45%)
01:45:34:WU01:FS00:0x21:Completed 920000 out of 2000000 steps (46%)
01:47:32:WU01:FS00:0x21:Completed 940000 out of 2000000 steps (47%)
01:49:31:WU01:FS00:0x21:Completed 960000 out of 2000000 steps (48%)
01:51:29:WU01:FS00:0x21:Completed 980000 out of 2000000 steps (49%)
01:53:28:WU01:FS00:0x21:Completed 1000000 out of 2000000 steps (50%)
01:55:37:WU01:FS00:0x21:Completed 1020000 out of 2000000 steps (51%)
01:57:35:WU01:FS00:0x21:Completed 1040000 out of 2000000 steps (52%)
01:59:33:WU01:FS00:0x21:Completed 1060000 out of 2000000 steps (53%)
02:01:31:WU01:FS00:0x21:Completed 1080000 out of 2000000 steps (54%)
02:03:30:WU01:FS00:0x21:Completed 1100000 out of 2000000 steps (55%)
02:05:39:WU01:FS00:0x21:Completed 1120000 out of 2000000 steps (56%)
02:07:37:WU01:FS00:0x21:Completed 1140000 out of 2000000 steps (57%)
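
The "Bad State detected" rollbacks in the 9626 and 9628 logs can be tallied in the same way. This is a rough sketch, assuming the message text shown above; `rollbacks` and `SAMPLE` are hypothetical names, and the "redone" figure is only a lower bound, because the log reports the first progress line after the resume rather than the checkpoint itself.

```python
import re

COMPLETED = re.compile(r"Completed (\d+) out of (\d+) steps")
BAD_STATE = "Bad State detected"

def rollbacks(lines):
    """Return (steps_at_failure, first_report_after_resume, steps_redone) tuples."""
    events = []
    last_steps = None   # most recent "Completed N" value seen
    failed_at = None    # set when a bad state is seen, cleared on next progress
    for line in lines:
        if BAD_STATE in line:
            failed_at = last_steps
            continue
        m = COMPLETED.search(line)
        if not m:
            continue
        steps = int(m.group(1))
        if failed_at is not None:
            events.append((failed_at, steps, failed_at - steps))
            failed_at = None
        last_steps = steps
    return events

# Lines copied from the 9626 and 9628 logs above.
SAMPLE = """\
20:34:36:WU01:FS01:0x21:Completed 100000 out of 2000000 steps (5%)
20:34:36:WU01:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint
20:36:10:WU01:FS01:0x21:Completed 20000 out of 2000000 steps (1%)
01:13:28:WU01:FS00:0x21:Completed 700000 out of 2000000 steps (35%)
01:13:28:WU01:FS00:0x21:Bad State detected... attempting to resume from last good checkpoint
01:15:27:WU01:FS00:0x21:Completed 620000 out of 2000000 steps (31%)
"""

for failed, resumed, redone in rollbacks(SAMPLE.splitlines()):
    print(f"bad state at {failed}, next report {resumed}, at least {redone} steps redone")
```

In both excerpts the core loses at least 80000 steps per bad state, which is four 1% frames on these 2000000-step work units.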
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
Frodo The Hobbit
Posts: 8
Joined: Sun Dec 02, 2007 7:25 pm
Location: Bordeaux, France

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by Frodo The Hobbit »

The previous cores did not use the full power of a Maxwell GPU (970 or 980): the GPU was loaded at 85%, maybe 90%. With the latest core 21 you can expect a nearly full load on your GPU (about 97%), except in the sanity check phase. It also uses a different form of computation that warms up the GPU more. I killed a GTX 970 in 2 months with a BIOS mod and 1400 MHz. You also have to keep watching the temperature, which could make your GPU throttle more frequently than before.
Keep in mind that your previous settings have to be revised. My GTX 970 was overclocked to 1.5 GHz; right now it's more like 1350 to 1400 MHz, and I reach 1400 only on an excellent GPU. Bruce gives us good advice: we need to keep in mind what the "stock" frequencies of our GPUs are.
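
One way to keep watching temperature, load, and clock together is to poll `nvidia-smi`. The sketch below assumes `nvidia-smi` is on the PATH and that the card reports all three fields as plain numbers; `parse_row`, `read_gpus`, `too_hot`, and the 80 °C limit are illustrative choices, not recommendations from this thread.

```python
import subprocess

# Query fields understood by `nvidia-smi --query-gpu`.
QUERY = "temperature.gpu,utilization.gpu,clocks.sm"

def parse_row(row):
    """Parse one CSV row such as '71, 97, 1404' into a dict of ints."""
    temp, util, clock = (int(field.strip()) for field in row.split(","))
    return {"temp_c": temp, "util_pct": util, "sm_mhz": clock}

def read_gpus():
    """Query every NVIDIA GPU once and return one dict per GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_row(line) for line in out.splitlines() if line.strip()]

def too_hot(sample, limit_c=80):
    """True when the reported temperature reaches the (assumed) limit."""
    return sample["temp_c"] >= limit_c
```

Calling `read_gpus()` every few seconds and logging the result next to the FAHClient log makes it possible to see whether a drop in `sm_mhz` lines up with a slow frame. A field that the driver reports as unsupported would make `parse_row` raise, so real use would need to handle that case.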
Farmer 4 Monster Folding since 2008...
Reviewer 4 FAH-@ddict
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III 17 970 4.3Ghz DDR3 2000 2-500GB Segate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by Grandpa_01 »

The only problem is that Windows does not have the problem at a high OC, while Linux has it at the default OC. I have had and completed 5 of these on the Windows card, a GTX 980 Classified at between 1480 MHz and 1530 MHz, and I have had 14 of these on the other two 980 Classifieds and a 970 SC running on Linux. Looking back, every one of the WUs run on the Linux rigs has had the problem.
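
For a sense of how lopsided those counts are, here is a small sketch of Fisher's exact test on the 2x2 table they imply. It assumes all 5 Windows WUs were problem-free and all 14 Linux WUs showed the problem, and it treats the WUs as independent, which is generous since only four cards are involved. The function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Rows: Windows, Linux. Columns: WUs with the problem, WUs without it.
p = fisher_exact_two_sided(0, 5, 14, 0)
print(f"p = {p:.6f}")
```

With these counts the p-value is 1/11628, so a split this clean is very unlikely to be chance. It cannot say whether the OS or the particular cards are responsible, because each card has only been run under one OS so far.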

That tells me it is most likely not an OC or hardware problem, although there is still a slight possibility it could be. So what does that leave? To me it leaves drivers and code. I am working on eliminating drivers; code I cannot help with. I will swap the Windows 980 with a Linux 980 as soon as I get a chance. That should eliminate hardware if the Windows card has problems when run on Linux and the Linux card runs fine when overclocked on Windows. In reality, the Windows card should do the same on Linux as it does on Windows, and vice versa.
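
The logic of that swap test can be written out as a small decision table. This is only a sketch of the reasoning in this post; the function name and the wording of the outcomes are illustrative.

```python
def interpret_swap(windows_card_ok_on_linux: bool, linux_card_ok_on_windows: bool) -> str:
    """Interpret the result of swapping the Windows and Linux GTX 980 cards."""
    if not windows_card_ok_on_linux and linux_card_ok_on_windows:
        # The fault stays with the OS, not the card.
        return "follows the OS: suspect drivers or core code, not hardware or OC"
    if windows_card_ok_on_linux and not linux_card_ok_on_windows:
        # The fault moves with the card.
        return "follows the card: suspect that card's hardware or OC"
    if windows_card_ok_on_linux and linux_card_ok_on_windows:
        return "did not reproduce: something else changed, repeat the test"
    return "both fail: inconclusive, more than one cause may be involved"

print(interpret_swap(windows_card_ok_on_linux=False, linux_card_ok_on_windows=True))
```

Only the first outcome rules out hardware cleanly, which is the result this post expects.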

I think we may be barking up the wrong tree here when we say the OC is creating the problem and things do not get fixed if we look in the wrong place for the problem. :ewink:
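To put a rough number on the counts above, here is a small sketch of a one-sided Fisher exact test in Python. It assumes the figures mean 5 of 5 WUs clean on Windows and 14 of 14 WUs with the problem on Linux; the function name is made up for illustration. It only says the split is unlikely to be chance, not whether the OS or the individual cards are the cause, which is what the card swap is meant to sort out.

```python
from math import comb


def fisher_one_sided(fail_a, ok_a, fail_b, ok_b):
    """Probability that group A gets at least fail_a of the failures by chance.

    Uses the hypergeometric distribution with all row and column totals fixed.
    """
    n_a = fail_a + ok_a                # WUs run in group A
    total_fail = fail_a + fail_b       # failures across both groups
    total = n_a + fail_b + ok_b        # all WUs
    denom = comb(total, n_a)
    p = 0.0
    for k in range(fail_a, min(n_a, total_fail) + 1):
        p += comb(total_fail, k) * comb(total - total_fail, n_a - k) / denom
    return p


# Assumed reading of the post: Linux = 14 problem WUs, 0 clean;
# Windows = 0 problem WUs, 5 clean.
p = fisher_one_sided(fail_a=14, ok_a=0, fail_b=0, ok_b=5)
print(f"one-sided p-value: {p:.6f}")
```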
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
bigblock990
Posts: 20
Joined: Wed Sep 09, 2015 12:42 pm

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by bigblock990 »

Frodo The Hobbit wrote:Previous cores did not use the full power of Maxwell GPUs (970 or 980): the GPU load was 85%, maybe 90%. With the latest core 21 you can expect a full load on your GPU (about 97%), except during the sanity check phase. It also uses a different form of computation that heats the GPU up more. I killed a GTX 970 in 2 months with a BIOS mod at 1400 MHz. You also have to keep an eye on the temperature; it could make your GPU throttle more often than before.
Keep in mind that your previous settings have to be revised. My GTX 970 was overclocked to 1.5 GHz; right now it's more like 1350 to 1400 MHz, and I reach 1400 only on excellent GPUs. Bruce gave us good advice: we need to keep in mind what the "stock" frequencies of our GPUs are.
Are you folding in Windows? I fold in Linux with four Maxwell cards. All run at 98-99% usage for both core18 and core21.

To Grandpa_01: I have one rig running 346.59 and one with 346.96, and I have the same problems as you with the openmm_21 projects. I backed my clocks way down so I can complete them without issue now.

Also, regarding drivers: 352.30 produced ~45k PPD less than 346.59 on a GTX 970. I haven't tested anything newer to see if that improved. 346.96 (needed to recognize the 980 Ti) works as well as 346.59.
billford
Posts: 1003
Joined: Thu May 02, 2013 8:46 pm
Hardware configuration: Full Time:

2x NVidia GTX 980
1x NVidia GTX 780 Ti
2x 3GHz Core i5 PC (Linux)

Retired:

3.2GHz Core i5 PC (Linux)
3.2GHz Core i5 iMac
2.8GHz Core i5 iMac
2.16GHz Core 2 Duo iMac
2GHz Core 2 Duo MacBook
1.6GHz Core 2 Duo Acer laptop
Location: Near Oxford, United Kingdom
Contact:

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by billford »

I've got two 980s running on Linux, overclocked to ~1450 MHz including boost. They were fine on Core_17/18 and early Core_21 WUs, but not very stable on later Core_21 WUs with the old 346.35 drivers. I went straight to 355.11, which is a lot more stable but at the expense of ~5% PPD.

Still fails (too many bad states) on some Core_21 WUs, I'm trying to isolate which ones and whether it's overclock-related.

Gut feeling is that it isn't: the project that runs the cards hottest (and closest to the power cap) is P10495, but those WUs don't result in bad states... (Yet, anyway)
toTOW
Site Moderator
Posts: 6349
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by toTOW »

bigblock990 wrote:I backed my clocks way down so I can complete them without issue now.
I think this summarizes the situation perfectly.

We're back to the advice we've always given in these situations: if you get repeated failures on your GPU, reduce the overclock (or try nVidia reference clocks to be sure).

So basically, we need to treat all modern GPUs as factory overclocked, because of the Boost mechanism. :(

Of course, if the failures continue at reference clocks (or even with an underclock), then we could start to blame the core 21 and/or the projects.
billford
Posts: 1003
Joined: Thu May 02, 2013 8:46 pm

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by billford »

There's something odd about Core_21, whether it's the WU or the core I couldn't say.

It's a little off-topic, but I quite often get this on the Kepler (GTX 780 Ti):


13:38:05:WU01:FS01:0x21:Project: 9633 (Run 1, Clone 23, Gen 13)
.
.
13:38:05:WU01:FS01:0x21:Digital signatures verified
13:38:05:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
13:38:05:WU01:FS01:0x21:Version 0.0.11
13:38:25:WU01:FS01:0x21:Completed 0 out of 2000000 steps (0%)
13:38:25:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
13:38:29:WU01:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint
13:40:28:WU01:FS01:0x21:Completed 20000 out of 2000000 steps (1%)
13:42:28:WU01:FS01:0x21:Completed 40000 out of 2000000 steps (2%)
13:44:27:WU01:FS01:0x21:Completed 60000 out of 2000000 steps (3%)
Note that it only took 4 seconds to hit a bad state (barely time to get going); it then restarts and carries on to complete the WU without further error.

(Don't think I've seen it on the Maxwells, but not 100% sure)
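To spot this pattern in a longer log, something like the following Python sketch could be used. It assumes the log lines look exactly like the excerpt above (timestamp, WU, slot, `0x21`, message); the function names are hypothetical and the regexes would need adjusting for other cores or client versions.

```python
import re
from datetime import datetime

# Matches lines such as "13:38:29:WU01:FS01:0x21:Bad State detected..."
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(WU\d+):(FS\d+):0x21:(.*)$")
STEPS_RE = re.compile(r"Completed (\d+) out of (\d+) steps")


def seconds_between(start, end):
    """Seconds from start to end (HH:MM:SS), allowing for a midnight rollover."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() % 86400)


def find_bad_states(log_text):
    """Return (wu, slot, steps_done, seconds_since_last_progress) per bad state."""
    last_progress = {}  # (wu, slot) -> (timestamp, steps completed)
    events = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skips the "." elision lines and anything unrecognised
        stamp, wu, slot, msg = match.groups()
        key = (wu, slot)
        steps = STEPS_RE.search(msg)
        if steps:
            last_progress[key] = (stamp, int(steps.group(1)))
        elif msg.startswith("Bad State detected") and key in last_progress:
            prev_stamp, done = last_progress[key]
            events.append((wu, slot, done, seconds_between(prev_stamp, stamp)))
    return events


SAMPLE = """\
13:38:05:WU01:FS01:0x21:Project: 9633 (Run 1, Clone 23, Gen 13)
13:38:25:WU01:FS01:0x21:Completed 0 out of 2000000 steps (0%)
13:38:29:WU01:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint
13:40:28:WU01:FS01:0x21:Completed 20000 out of 2000000 steps (1%)
"""

for event in find_bad_states(SAMPLE):
    print(event)
```

A bad state reported at 0 steps only a few seconds after the first progress line is the case described here, as opposed to one that shows up deep into a run.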
toTOW
Site Moderator
Posts: 6349
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by toTOW »

I've seen it on Maxwell too, and the developers know about this issue. But I don't know if the cause has been identified or if a fix has been found.

Edit: acknowledgement of the issue from John: viewtopic.php?p=279802#p279802
billford
Posts: 1003
Joined: Thu May 02, 2013 8:46 pm

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by billford »

OK, thanks.
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by Grandpa_01 »

toTOW wrote:
bigblock990 wrote:I backed my clocks way down so I can complete them without issue now.
I think this summarizes the situation perfectly.

We're back to the advice we've always given in these situations: if you get repeated failures on your GPU, reduce the overclock (or try nVidia reference clocks to be sure).

So basically, we need to treat all modern GPUs as factory overclocked, because of the Boost mechanism. :(

Of course, if the failures continue at reference clocks (or even with an underclock), then we could start to blame the core 21 and/or the projects.

I agree with the temporary solution, but it shouldn't be considered a permanent one. There is definitely a problem and it needs to be addressed. I believe removing them from Linux would be a better choice at this point in time. I am not quite sure I understand the concept of forcing a person to run something that doesn't work at default settings when it can easily be removed at Stanford's end; they work great on Windows but really struggle on Linux.
toTOW
Site Moderator
Posts: 6349
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by toTOW »

If there is a problem with Boost clocks (or Boost management), it's a problem that nVidia will have to deal with, not us/Pande Group ...

In the meantime, we don't have enough evidence to draw a conclusion, and a new version of core 21 is planned to address this: it should return more debug data to the server when a bad state occurs.
billford
Posts: 1003
Joined: Thu May 02, 2013 8:46 pm

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by billford »

toTOW wrote:If there is a problem with Boost clocks (or Boost management), it's a problem that nVidia will have to deal with, not us/Pande Group ...
The problem is only with FAH software, running Core_21 under Linux, nothing to do with NVidia.

Your statement exemplifies PG's ivory-tower approach and lack of appreciation of the real world outside academia.
billford
Posts: 1003
Joined: Thu May 02, 2013 8:46 pm

Re: 9634 (Run 0, Clone 9, Gen 5)

Post by billford »

Grandpa_01 wrote:I believe removing them from Linux would be a better choice at this point in time
Or at least putting them back to adv so the donors have a choice.