repeated failure with project 100nn [Too many cores]


AtwaterFS
Posts: 30
Joined: Wed Jan 21, 2009 9:08 pm

Re: Project 10085 Failed (48 core system)

Post by AtwaterFS »

I also got the same error and a FahCore crash on a vanilla setup, 16-core system: Project 10085 (Run 2, Clone 31, Gen 7).

As someone else said, it's a constantly recurring problem that I don't have time to deal with repeatedly. I miss the past few months when I could set it and forget it.

Thanks,

Code:

*********************** Log Started 2014-12-03T00:06:25Z ***********************
00:06:25:************************* Folding@home Client *************************
00:06:25:      Website: http://folding.stanford.edu/
00:06:25:    Copyright: (c) 2009-2014 Stanford University
00:06:25:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:06:25:         Args: --open-web-control
00:06:25:       Config: C:/Users/Administrator/AppData/Roaming/FAHClient/config.xml
00:06:25:******************************** Build ********************************
00:06:25:      Version: 7.4.4
00:06:25:         Date: Mar 4 2014
00:06:25:         Time: 20:26:54
00:06:25:      SVN Rev: 4130
00:06:25:       Branch: fah/trunk/client
00:06:25:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
00:06:25:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
00:06:25:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
00:06:25:     Platform: win32 XP
00:06:25:         Bits: 32
00:06:25:         Mode: Release
00:06:25:******************************* System ********************************
00:06:25:          CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
00:06:25:       CPU ID: GenuineIntel Family 6 Model 44 Stepping 2
00:06:25:         CPUs: 16
00:06:25:       Memory: 7.99GiB
00:06:25:  Free Memory: 5.14GiB
00:06:25:      Threads: WINDOWS_THREADS
00:06:25:   OS Version: 6.1
00:06:25:  Has Battery: false
00:06:25:   On Battery: false
00:06:25:   UTC Offset: -5
00:06:25:          PID: 3436
00:06:25:          CWD: C:/Users/Administrator/AppData/Roaming/FAHClient
00:06:25:           OS: Windows Server 2008 R2 Standard
00:06:25:      OS Arch: AMD64
00:06:25:         GPUs: 0
00:06:25:         CUDA: Not detected
00:06:25:Win32 Service: false
00:06:25:***********************************************************************


06:19:53:WU00:FS00:0xa4:*------------------------------*
06:19:53:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
06:19:53:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
06:19:53:WU00:FS00:0xa4:
06:19:53:WU00:FS00:0xa4:Preparing to commence simulation
06:19:53:WU00:FS00:0xa4:- Looking at optimizations...
06:19:53:WU00:FS00:0xa4:- Created dyn
06:19:53:WU00:FS00:0xa4:- Files status OK
06:19:53:WU00:FS00:0xa4:- Expanded 54222 -> 201448 (decompressed 371.5 percent)
06:19:53:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=54222 data_size=201448, decompressed_data_size=201448 diff=0
06:19:53:WU00:FS00:0xa4:- Digital signature verified
06:19:53:WU00:FS00:0xa4:
06:19:53:WU00:FS00:0xa4:Project: 10085 (Run 2, Clone 31, Gen 7)
06:19:53:WU00:FS00:0xa4:
06:19:53:WU00:FS00:0xa4:Assembly optimizations on if available.
06:19:53:WU00:FS00:0xa4:Entering M.D.
06:19:59:WU01:FS00:Upload 72.75%
06:19:59:WU00:FS00:0xa4:Mapping NT from 15 to 15 
06:20:02:WU01:FS00:Upload complete
06:20:02:WU01:FS00:Server responded WORK_ACK (400)
06:20:02:WU01:FS00:Final credit estimate, 1887.00 points
06:20:02:WU01:FS00:Cleaning up
******************************* Date: 2014-12-07 *******************************
00:17:05:WARNING:WU00:FS00:FahCore returned an unknown error code which probably indicates that it crashed
00:17:05:WARNING:WU00:FS00:FahCore returned: UNKNOWN_ENUM (-1073741783 = 0xc0000029)
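For reference, the two numbers in that last log line are the same 32-bit exit code, printed once as a signed integer and once as unsigned hex. Values starting with 0xC fall in the range Windows uses for NTSTATUS error codes. A minimal Python sketch of the conversion (the helper name `to_unsigned_hex` is just for illustration):

```python
def to_unsigned_hex(code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as 8-digit hex."""
    # Masking with 0xFFFFFFFF maps negative values onto their two's-complement bit pattern.
    return f"0x{code & 0xFFFFFFFF:08x}"


print(to_unsigned_hex(-1073741783))  # 0xc0000029
```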
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: repeated failure with project 10090 (Run 98, Clone 23, G

Post by bruce »

I wonder if this is a very similar problem.
viewtopic.php?f=19&t=27099

It seems that certain projects do not work well with large numbers of cores. Does that description fit your situation?
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 RAID 0 Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: Project 10085 Failed (48 core system)

Post by Grandpa_01 »

AtwaterFS wrote:I also got the same error and a FahCore crash on a vanilla setup, 16-core system: Project 10085 (Run 2, Clone 31, Gen 7).

As someone else said, it's a constantly recurring problem that I don't have time to deal with repeatedly. I miss the past few months when I could set it and forget it.

Thanks,
There is an option that will most likely take care of this problem (one that, by the way, is not popular with those it does not affect). In the past, the Linux V6 F@H client did not get assigned to the server that distributes the WUs in question here. I have not run any SMP on V6 since the recent upgrade of the server software, but before that, Linux V6 on 48+ core machines only got A3 WUs. I would imagine it is still the same, so there is still a set-it-and-forget-it option out there. :wink:
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: Project 10085 Failed (48 core system)

Post by VijayPande »

We're on it.
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
Sailer
Posts: 40
Joined: Thu Jan 13, 2011 2:55 am

Re: repeated failure with project 10090 (Run 98, Clone 23, G

Post by Sailer »

bruce wrote:I wonder if this is a very similar problem.
viewtopic.php?f=19&t=27099

It seems that certain projects do not work well with large numbers of cores. Does that description fit your situation?
I'd guess that it is a very similar problem. I don't know which would be easier: changing the code so that computers with a large number of cores can run these WUs, or changing the server settings so that it doesn't assign these WUs to computers with a large number of cores. Probably the second option, but that's only a guess.
Gooders
Posts: 83
Joined: Sun Jan 12, 2014 8:17 pm
Hardware configuration: HP Z600, dual Xeon 5650 (6 cores at 2.67 GHz, x2), 32 GB RAM, GTX 780
Location: UK

Re: Project 10085 Failed (48 core system)

Post by Gooders »

I too have a load of log output I can add to this, from the 3rd, 4th, 5th and 6th of this month. I have saved the log in a Notepad file if it is needed.
007quick
Posts: 9
Joined: Fri Dec 05, 2014 12:37 am

Re: Project 10085 Failed (48 core system)

Post by 007quick »

Just as an update: I have since split my folding slot into four (24-core, 12-core, 8-core and 4-core) and have not received any 1008* WUs yet, so I can't tell whether they will fold or not.
Joe_H
Site Admin
Posts: 7927
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Project 10085 Failed (48 core system)

Post by Joe_H »

That may or may not be due to your settings. The server for this project appears to be out of WUs and is only accepting returns of WUs that were already assigned.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
007quick

Re: Project 10085 Failed (48 core system)

Post by 007quick »

Ah... Good to know!
Joe_H

Re: repeated failure with project 10090 (Run 98, Clone 23, G

Post by Joe_H »

I have heard from PG (the Pande Group), and the settings on this server have been adjusted so that it should no longer assign WUs to systems with a large number of cores. If you see this problem recur, please let us know.
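Conceptually, a server-side adjustment like this acts as an assignment filter on the core count a client requests work for. The sketch below is purely illustrative: the `Project` record, the `max_cpus` field, the `can_assign` function and the cap of 12 are all hypothetical, not the actual work server logic or the threshold PG chose.

```python
from dataclasses import dataclass


@dataclass
class Project:
    number: int
    max_cpus: int  # hypothetical per-project cap on the requesting slot's core count


def can_assign(project: Project, slot_cpus: int) -> bool:
    """Return True if a slot with slot_cpus cores may receive a WU from this project."""
    return slot_cpus <= project.max_cpus


# Hypothetical cap; the real value is not stated in this thread.
p10085 = Project(number=10085, max_cpus=12)
print(can_assign(p10085, 48))  # False: a 48-core slot would be skipped
print(can_assign(p10085, 8))   # True: a smaller slot could still get the WU
```

Assuming the server sees each slot's core count separately, this also explains why splitting one large slot into several smaller ones, as 007quick did, can change which projects a machine is offered.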