P3906 Gen 1 all failing

Pette Broad
Posts: 128
Joined: Mon Dec 03, 2007 9:38 pm
Hardware configuration: CPU folding on only one machine a laptop

GPU Hardware..
3 x 460
1 X 260
4 X 250

+ 1 X 9800GT (3 days a week)
Location: Chester U.K

Post by Pette Broad »

Just had a whole rash of failures, all 3906s, on several WUs across different machines.


MACHINE 1

[10:15:26] Project: 3906 (Run 16, Clone 4, Gen 1)
[10:15:26] 
[10:15:26] Assembly optimizations on if available.
[10:15:26] Entering M.D.
[10:15:40] CoreStatus = C000000D (-1073741811)
[10:15:40] Client-core communications error: ERROR 0xc000000d
[10:15:40] Deleting current work unit & continuing...
[10:15:44] - Preparing to get new work unit...
[10:15:44] + Attempting to get work packet
[10:15:44] - Connecting to assignment server
[10:15:44] - Successful: assigned to (171.64.122.88).
[10:15:44] + News From Folding@Home: Welcome to Folding@Home
[10:15:45] Loaded queue successfully.
[10:15:48] + Closed connections
[10:15:53] 
[10:15:53] + Processing work unit
[10:15:53] Core required: FahCore_7b.exe
[10:15:53] Core found.
[10:15:53] Working on Unit 04 [January 17 10:15:53]
[10:15:53] + Working ...
[10:15:53] 
[10:15:53] *------------------------------*
[10:15:53] Folding@Home Double Gromacs Core B
[10:15:53] Version 1.04 (Fri Aug 10 16:46:39 PDT 2007)
[10:15:53] 
[10:15:53] Preparing to commence simulation
[10:15:53] - Files status OK
[10:15:53] - Expanded 341797 -> 1169393 (decompressed 342.1 percent)
[10:15:53] 
[10:15:53] Project: 3906 (Run 16, Clone 4, Gen 1)
[10:15:53] 
[10:15:54] Assembly optimizations on if available.
[10:15:54] Entering M.D.
[10:16:05] CoreStatus = C000000D (-1073741811)
[10:16:05] Client-core communications error: ERROR 0xc000000d
[10:16:05] Deleting current work unit & continuing...
[10:16:09] - Preparing to get new work unit...
[10:16:09] + Attempting to get work packet
[10:16:09] - Connecting to assignment server
[10:16:10] - Successful: assigned to (171.64.122.88).
[10:16:10] + News From Folding@Home: Welcome to Folding@Home
[10:16:10] Loaded queue successfully.
[10:16:15] + Closed connections
MACHINE 2

[09:13:51] Project: 3906 (Run 14, Clone 2, Gen 1)
[09:13:51] 
[09:13:51] Assembly optimizations on if available.
[09:13:51] Entering M.D.
[09:58:12] CoreStatus = C000000D (-1073741811)
[09:58:12] Client-core communications error: ERROR 0xc000000d
[09:58:12] Deleting current work unit & continuing...
[09:58:24] Trying to send all finished work units
[09:58:24] + No unsent completed units remaining.
[09:58:24] - Preparing to get new work unit...
[09:58:24] + Attempting to get work packet
[09:58:24] - Will indicate memory of 2047 MB.
[09:58:24] - Connecting to assignment server
[09:58:25] - Successful: assigned to (171.64.122.88).
Carried on like this for 6 or 7 attempts.


MACHINE 3

[09:58:22] Project: 3906 (Run 23, Clone 0, Gen 1)
[09:58:22] 
[09:58:22] Assembly optimizations on if available.
[09:58:22] Entering M.D.
[09:58:34] CoreStatus = C000000D (-1073741811)
[09:58:34] Client-core communications error: ERROR 0xc000000d
[09:58:34] Deleting current work unit & continuing...
[09:58:38] - Preparing to get new work unit...
[09:58:38] + Attempting to get work packet
[09:58:38] - Connecting to assignment server
[09:58:39] - Successful: assigned to (171.64.122.88).
[09:58:39] + News From Folding@Home: Welcome to Folding@Home
[09:58:39] Loaded queue successfully.
[09:58:43] + Closed connections
[09:58:48] 
[09:58:48] + Processing work unit
[09:58:48] Core required: FahCore_7b.exe
[09:58:48] Core found.
[09:58:48] Working on Unit 05 [January 17 09:58:48]
[09:58:48] + Working ...
[09:58:48] 
[09:58:48] *------------------------------*
[09:58:48] Folding@Home Double Gromacs Core B
[09:58:48] Version 1.04 (Fri Aug 10 16:46:39 PDT 2007)
[09:58:48] 
[09:58:48] Preparing to commence simulation
[09:58:48] - Files status OK
[09:58:48] - Expanded 330340 -> 1132781 (decompressed 342.9 percent)
[09:58:48] 
[09:58:48] Project: 3906 (Run 23, Clone 0, Gen 1)
[09:58:48] 
[09:58:49] Assembly optimizations on if available.
[09:58:49] Entering M.D.
[09:59:00] CoreStatus = C000000D (-1073741811)
[09:59:00] Client-core communications error: ERROR 0xc000000d
[09:59:00] - Attempting to download new core...
[09:59:00] + Downloading new core: FahCore_7b.exe
[09:59:01] + 10240 bytes downloaded
--------------------------------------------
[09:59:05] + 724218 bytes downloaded
[09:59:05] Verifying core Core_7b.fah...
[09:59:05] Signature is VALID
[09:59:05] 
[09:59:05] Trying to unzip core FahCore_7b.exe
[09:59:05] Decompressed FahCore_7b.exe (2101248 bytes) successfully
[09:59:05] + Core successfully engaged
[09:59:05] Deleting current work unit & continuing...
MACHINE 4

[11:29:49] Project: 3906 (Run 54, Clone 1, Gen 1)
[11:29:49] 
[11:29:50] Assembly optimizations on if available.
[11:29:50] Entering M.D.
[11:48:15] CoreStatus = C000000D (-1073741811)
[11:48:15] Client-core communications error: ERROR 0xc000000d
[11:48:15] Deleting current work unit & continuing...
The main thing is that these errors throw up a core error on the desktop, and the client then needs to be restarted manually :(

EDIT: Just checked and found that I have another seven P3906 units in progress. They are all Gen 0, have Run numbers in the 4xxx range, and are progressing OK. The 3 machines whose logs are posted above eventually got allocated other work.
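
For anyone who wants to check their own client log for the same pattern, here is a minimal sketch (not an official tool) that pairs each "Project:" line with the next "CoreStatus" line. The function name `failed_units` and the `FAILURE_CODES` set are hypothetical; the two codes are simply the ones reported in this thread, and other failure codes may exist.

```python
import re

# Patterns match the log lines quoted in this thread.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((-?\d+)\)")

# Assumption: only the two status values seen in this thread count as failures.
FAILURE_CODES = {"C000000D", "72"}


def failed_units(lines):
    """Return [((project, run, clone, gen), status_hex), ...] for failed units."""
    current = None
    failures = []
    for line in lines:
        project = PROJECT_RE.search(line)
        if project:
            current = tuple(int(g) for g in project.groups())
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            code = status.group(1).upper()
            if code in FAILURE_CODES:
                failures.append((current, code))
            current = None  # wait for the next Project line
    return failures
```

Feeding it the lines of a log (for example `open(path).read().splitlines()`) gives one entry per failed attempt, so a unit that is retried and fails again shows up twice.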

Pete
[Fight]Gor
Posts: 6
Joined: Thu Jan 17, 2008 12:44 pm

Post by [Fight]Gor »

I'll second that. The same issue here.

Post by Pette Broad »

2 More in the last 10 minutes :shock:

[13:07:21] Project: 3906 (Run 102, Clone 4, Gen 1)
[13:07:21] 
[13:07:22] Assembly optimizations on if available.
[13:07:22] Entering M.D.
[13:07:34] CoreStatus = C000000D (-1073741811)
----------------------------------------------------------
[13:10:56] Project: 3906 (Run 96, Clone 2, Gen 1)
[13:10:56] 
[13:10:56] Assembly optimizations on if available.
[13:10:56] Entering M.D.
[13:11:08] CoreStatus = C000000D (-1073741811)
Jeannie
Posts: 49
Joined: Sun Dec 02, 2007 3:07 am
Location: Central New Jersey

Post by Jeannie »

Same problem

[10:40:36] Project: 3906 (Run 45, Clone 2, Gen 1)
[10:40:36]
[10:40:36] Assembly optimizations on if available.
[10:40:36] Entering M.D.
[10:40:42] mdrun returned -1
[10:40:42] Going to send back what have done.
[10:40:42] logfile size: 858
[10:40:42] - Writing 1396 bytes of core data to disk...
[10:40:42] Done: 884 -> 584 (compressed to 66.0 percent)
[10:40:42] ... Done.
[10:40:42]
[10:40:42] Folding@home Core Shutdown: EARLY_UNIT_END
[10:40:46] CoreStatus = 72 (114)
<snip>
[10:41:02] Project: 3906 (Run 48, Clone 2, Gen 1)
[10:41:02]
[10:41:02] Assembly optimizations on if available.
[10:41:02] Entering M.D.
[10:41:08] mdrun returned -1
[10:41:08] Going to send back what have done.
[10:41:08] logfile size: 858
[10:41:08] - Writing 1396 bytes of core data to disk...
[10:41:08] Done: 884 -> 590 (compressed to 66.7 percent)
[10:41:08] ... Done.
[10:41:08]
[10:41:08] Folding@home Core Shutdown: EARLY_UNIT_END
[10:41:11] CoreStatus = 72 (114)
[10:41:11] Sending work to server

Post by Pette Broad »

Those have failed in a slightly different way... but I've had one like that too :)

[09:55:16] Project: 3906 (Run 9, Clone 3, Gen 1)
[09:55:16] 
[09:55:16] Assembly optimizations on if available.
[09:55:16] Entering M.D.
[09:55:22] mdrun returned -1
[09:55:22] Going to send back what have done.
[09:55:22] logfile size: 858
[09:55:22] - Writing 1396 bytes of core data to disk...
[09:55:22] Done: 884 -> 585 (compressed to 66.1 percent)
[09:55:22]   ... Done.
[09:55:22] 
[09:55:22] Folding@home Core Shutdown: EARLY_UNIT_END
[09:55:26] CoreStatus = 72 (114)
[09:55:26] Sending work to server
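
The two numbers on each CoreStatus line are the same value written twice: hex first, then decimal in brackets. A small sketch of the arithmetic, reading the hex value as a signed 32-bit integer (the helper name `to_signed32` is hypothetical):

```python
def to_signed32(hex_text: str) -> int:
    """Interpret a hex string as a signed 32-bit (two's complement) integer."""
    value = int(hex_text, 16) & 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


print(to_signed32("C000000D"))  # -1073741811, as in the crash logs
print(to_signed32("72"))        # 114, as in the EARLY_UNIT_END logs
```

On Windows, 0xC000000D is also the NTSTATUS value STATUS_INVALID_PARAMETER. Whether that points at the actual cause here is not established in this thread.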
daveb

Post by daveb »

I have had 3 of these in a row fail the same way. I get a Windows message box saying that core7b has been killed and then a client-core communication error. Everything was then deleted and the machine tried to get another unit. One thing I noticed is that the download size of the wudata_0x.dat file is ~340 k on all 3 of the failed units. The earlier units of p3906 and p3907 I have seen were all around 240 k.

3906 (Run 132, Clone 4, Gen 1) payload 342246
3906 (Run 171, Clone 1, Gen 1) payload 330471
3906 (Run 183, Clone 3, Gen 1) payload 329464

Actually, the last of these did not generate the Windows error box, just a message in the console window saying mdrun returned -1 followed by a standard EUE message.
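
A rough way to spot the larger payloads in a log, as a sketch only. It assumes the first number on the "Expanded" line is the compressed size of the downloaded work file, and the 240 k baseline and 25% tolerance are arbitrary illustration values, not anything the project defines. The names `payload_sizes` and `unusually_large` are hypothetical.

```python
import re

# Matches lines such as "- Expanded 341797 -> 1169393 (decompressed 342.1 percent)".
EXPANDED_RE = re.compile(r"Expanded (\d+) -> (\d+)")


def payload_sizes(lines):
    """Collect the compressed payload size (bytes) from each Expanded line."""
    return [int(m.group(1)) for line in lines if (m := EXPANDED_RE.search(line))]


def unusually_large(sizes, baseline=240_000, tolerance=0.25):
    """Return sizes more than `tolerance` above the assumed baseline."""
    limit = baseline * (1 + tolerance)
    return [size for size in sizes if size > limit]
```

With the three payloads listed above (342246, 330471, 329464), all three exceed the 300000-byte limit these defaults produce.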

Dave
nzcarrick
Posts: 1
Joined: Fri Dec 28, 2007 1:32 pm

Post by nzcarrick »

Yip Same Here

Tried deleting the 7b core, thinking it was corrupt; it downloaded again and then threw up the same fault.


Any ideas?


*------------------------------*
[17:38:51] Folding@Home Double Gromacs Core B
[17:38:51] Version 1.04 (Fri Aug 10 16:46:39 PDT 2007)
[17:38:51] 
[17:38:51] Preparing to commence simulation
[17:38:51] - Files status OK
[17:38:51] - Expanded 341674 -> 1169293 (decompressed 342.2 percent)
[17:38:51] 
[17:38:51] Project: 3906 (Run 98, Clone 4, Gen 1)
[17:38:51] 
[17:38:51] Assembly optimizations on if available.
[17:38:51] Entering M.D.
[17:39:05] CoreStatus = C000000D (-1073741811)
[17:39:05] Client-core communications error: ERROR 0xc000000d
[17:39:05] Deleting current work unit & continuing...
[17:39:09] Trying to send all finished work units
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Post by 7im »

PM sent to kasson.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Wonder Bread
Posts: 1
Joined: Fri Jan 18, 2008 12:29 am

Post by Wonder Bread »

Had the same problem myself, registered specifically to report it. First error I've had since I started folding a few years ago. Windows pops up and says 'FahCore_7b.exe' has crashed. I'm using the 6.0 beta 1 client if that matters.
[00:24:43] Core required: FahCore_7b.exe
[00:24:43] Core found.
[00:24:43] Working on Unit 08 [January 18 00:24:43]
[00:24:43] + Working ...
[00:24:43] - Calling 'FahCore_7b.exe -dir work/ -suffix 08 -checkpoint 15 -forceasm -verbose -lifeline 2804 -version 600'
[00:24:43] *------------------------------*
[00:24:43] Folding@Home Double Gromacs Core B
[00:24:43] Version 1.04 (Fri Aug 10 16:46:39 PDT 2007)
[00:24:43]
[00:24:43] Preparing to commence simulation
[00:24:43] - Ensuring status. Please wait.
[00:24:52] - Assembly optimizations manually forced on.
[00:24:52] - Not checking prior termination.
[00:24:52] - Expanded 329392 -> 1131813 (decompressed 343.6 percent)
[00:24:52]
[00:24:52] Project: 3906 (Run 230, Clone 3, Gen 1)
[00:24:52]
[00:24:53] Assembly optimizations on if available.
[00:24:53] Entering M.D.
[00:25:11] CoreStatus = C000000D (-1073741811)
[00:25:11] Client-core communications error: ERROR 0xc000000d

Post by 7im »

7im wrote: PM sent to kasson.
Kasson said he would take a look at the problem as soon as he gets back into the office. Unless everyone starts having problems with Gen 2 as well, I think we have enough reports. Thanks.
MDCRL
Posts: 41
Joined: Sun Dec 23, 2007 10:10 pm
Location: Delaware, USA

Post by MDCRL »

for what it's worth.... and maybe the info can help narrow down the problem.....

I've burned through 16 of those WU's in the last week or so... using 5.04 console, moving really fast on a xeon processor and an AMD 64.... 10-15 minutes per step - been lucky to get a bunch of good ones I guess....
- good points for the time spent

only had one early end unit recently on a 3903

Post by 7im »

MDCRL wrote:...
I've burned through 16 of those WU's in the last week or so...

only had one early end unit recently on a 3903
I think we have it narrowed down quite a bit: only work units from Project 3906, and only work units from Generation 1 (Run xx, Clone xx, Gen 01).

MDCRL, how many Gen 1 WUs in project 3906 were in those 16 WUs you worked on this last week or so?
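
To make the narrowing-down concrete, here is a small sketch that tallies the failures quoted in this thread so far by project and generation. The list is transcribed from the logs posted above; the function name `tally_by_project_and_gen` is hypothetical.

```python
from collections import Counter

# (project, run, clone, gen) for each failed unit quoted in this thread so far.
REPORTED_FAILURES = [
    (3906, 16, 4, 1), (3906, 14, 2, 1), (3906, 23, 0, 1), (3906, 54, 1, 1),
    (3906, 102, 4, 1), (3906, 96, 2, 1), (3906, 45, 2, 1), (3906, 48, 2, 1),
    (3906, 9, 3, 1), (3906, 132, 4, 1), (3906, 171, 1, 1), (3906, 183, 3, 1),
    (3906, 98, 4, 1), (3906, 230, 3, 1),
]


def tally_by_project_and_gen(units):
    """Count units per (project, gen) pair, ignoring run and clone."""
    return Counter((project, gen) for project, _run, _clone, gen in units)


print(tally_by_project_and_gen(REPORTED_FAILURES))
```

Every one of the 14 reports lands in the single bucket (3906, 1), which is consistent with the pattern described above, though it only reflects what people chose to post here.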

Post by MDCRL »

I'll have to check through the logfiles after work today - I should have some #'s for you tonight.
I have about 5 or 6 currently active 3906 WU's, all progressing well
- they are all Gen 0 though....

I am noticing something else while going through these active WU's..

the Actual % complete is not matching the reported % complete in FahMonitor.
I use FahMon 2.3.1 - it is usually accurate, but not on these it seems

- any relevance in that relating to "client-core communication error"?

Let me know if you want the info on these current active WU also.....
KWSN_Dagger
Posts: 2
Joined: Wed Dec 12, 2007 10:04 pm

Post by KWSN_Dagger »

MDCRL wrote: I am noticing something else while going through these active WU's..

the Actual % complete is not matching the reported % complete in FahMonitor.
I use FahMon 2.3.1 - it is usually accurate, but not on these it seems

- any relevance in that relating to "client-core communication error"?

Let me know if you want the info on these current active WU also.....
I use FahMon as well. These WUs are only 50 frames long, so FahMon will work up to 50%, then after that it reports it as % minus 50%. It's weird, I know, but maybe it will be looked after when the new version 2.3.2 comes out. Uncle Fungus will know more than I.
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Post by kasson »

Thanks for the reports. I stopped P3906 assigning last night until we can have a proper look at the problem. Hopefully we can iron this out rapidly and have them back running successfully in the near future.