Project: 2684 (Run 2, Clone 5, Gen 10)

Moderators: Site Moderators, FAHC Science Team

DrSpalding
Posts: 136
Joined: Wed May 27, 2009 4:48 pm
Hardware configuration: Dell Studio 425 MTS-Core i7-920 c0 stock
evga SLI 3x o/c Core i7-920 d0 @ 3.9GHz + nVidia GTX275
Dell 5150 + nVidia 9800GT

Project: 2684 (Run 2, Clone 5, Gen 10)

Post by DrSpalding »

Hi, my -bigadv machine just turned in (or tried to turn in) early results on the above project at 44% complete. The CoreStatus was 0xC0000005 (which is a STATUS_ACCESS_VIOLATION fault), followed by a client-core communications error. I suspect that the core terminated without telling the client what was happening. Is there any info on this WU's stability, or should I look into the machine for hardware issues?

Here is the relevant log info:


[05:39:55] - Preparing to get new work unit...
[05:39:55] Cleaning up work directory
[05:39:55] + Attempting to get work packet
[05:39:55] Passkey found
[05:39:55] - Connecting to assignment server
[05:39:55] - Successful: assigned to (171.67.108.22).
[05:39:55] + News From Folding@Home: Welcome to Folding@Home
[05:39:56] Loaded queue successfully.
[05:40:54] + Closed connections
[05:40:54]
[05:40:54] + Processing work unit
[05:40:54] Core required: FahCore_a3.exe
[05:40:54] Core found.
[05:40:54] Working on queue slot 06 [July 18 05:40:54 UTC]
[05:40:54] + Working ...
[05:40:54]
[05:40:54] *------------------------------*
[05:40:54] Folding@Home Gromacs SMP Core
[05:40:54] Version 2.22 (Mar 12, 2010)
[05:40:54]
[05:40:54] Preparing to commence simulation
[05:40:54] - Looking at optimizations...
[05:40:54] - Created dyn
[05:40:54] - Files status OK
[05:40:58] - Expanded 24821153 -> 30791309 (decompressed 124.0 percent)
[05:40:58] Called DecompressByteArray: compressed_data_size=24821153 data_size=30791309, decompressed_data_size=30791309 diff=0
[05:40:59] - Digital signature verified
[05:40:59]
[05:40:59] Project: 2684 (Run 2, Clone 5, Gen 10)
[05:40:59]
[05:40:59] Assembly optimizations on if available.
[05:40:59] Entering M.D.
[05:41:09] Completed 0 out of 250000 steps  (0%)
[06:30:30] Completed 2500 out of 250000 steps  (1%)
[07:15:45] Completed 5000 out of 250000 steps  (2%)
...
[14:30:35] Completed 107500 out of 250000 steps  (43%)
[15:15:24] Completed 110000 out of 250000 steps  (44%)
[15:19:19] Gromacs cannot continue further.
[15:19:19] Going to send back what have done -- stepsTotalG=250000
[15:19:19] Work fraction=-1.#IND steps=250000.
[15:19:49] logfile size=97434 infoLength=97434 edr=0 trr=23
[15:19:49] logfile size: 97434 info=97434 bed=0 hdr=23
[15:19:49] - Writing 97970 bytes of core data to disk...
[15:19:52] CoreStatus = C0000005 (-1073741819)
[15:19:52] Client-core communications error: ERROR 0xc0000005
[15:19:52] Deleting current work unit & continuing...
[15:20:34] - Preparing to get new work unit...
[15:20:34] Cleaning up work directory
[15:20:34] + Attempting to get work packet
[15:20:34] Passkey found
[15:20:34] - Connecting to assignment server
[15:20:34] - Successful: assigned to (171.67.108.22).
[15:20:34] + News From Folding@Home: Welcome to Folding@Home
[15:20:35] Loaded queue successfully.
[15:21:05] + Closed connections
[15:21:10]
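
As a side note on the CoreStatus line in the log above: the two numbers are the same 32-bit value, with -1073741819 being 0xC0000005 read as a signed integer. A minimal Python sketch of that conversion (the function names are only for illustration and are not part of the F@H client):

```python
def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the signed reading is negative.
    return value - 0x100000000 if value & 0x80000000 else value


def to_unsigned32(value: int) -> int:
    """Map a signed 32-bit integer back to its unsigned representation."""
    return value & 0xFFFFFFFF


print(to_signed32(0xC0000005))          # the decimal form shown in the log
print(hex(to_unsigned32(-1073741819)))  # back to the hex status code
```

So the "ERROR 0xc0000005" line and the negative number in parentheses report one and the same status.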
Not a real doctor, I just play one on the 'net!
toTOW
Site Moderator
Posts: 6349
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: Project: 2684 (Run 2, Clone 5, Gen 10)

Post by toTOW »

No data for this WU in the DB yet ...

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

Re: Project: 2684 (Run 2, Clone 5, Gen 10)

Post by DrSpalding »

The assignment server gave the same WU back to the machine, so the results must not have gotten uploaded. We'll see in another 33 hours if it does the same thing again and if so, I'll have to get the machine to move on to another WU manually.
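
The 33-hour figure can be sanity-checked against the frame times in the log from the first post. This is a rough sketch only: the helper name is made up, the two sample lines are copied from that log, and it assumes the frame time stays roughly constant and that consecutive lines are less than a day apart (the timestamps carry no date).

```python
import re
from datetime import datetime

# Matches lines such as "[06:30:30] Completed 2500 out of 250000 steps  (1%)"
LINE = re.compile(r"\[(\d\d:\d\d:\d\d)\] Completed (\d+) out of (\d+) steps")


def frame_seconds(first: str, second: str) -> float:
    """Seconds per 1% of the WU, from two 'Completed' log lines."""
    (t1, s1, total), (t2, s2, _) = (LINE.search(x).groups() for x in (first, second))
    fmt = "%H:%M:%S"
    elapsed = (datetime.strptime(t2, fmt) - datetime.strptime(t1, fmt)).total_seconds()
    elapsed %= 24 * 3600  # no date in the log; assume the lines are < 24 h apart
    percent = (int(s2) - int(s1)) * 100 / int(total)
    return elapsed / percent


a = "[06:30:30] Completed 2500 out of 250000 steps  (1%)"
b = "[07:15:45] Completed 5000 out of 250000 steps  (2%)"
per_frame = frame_seconds(a, b)
hours_to_44 = 44 * per_frame / 3600
print(f"{per_frame / 60:.1f} min per frame, about {hours_to_44:.1f} h to reach 44%")
```

At roughly 45 minutes per frame, getting back to the 44% mark works out to a little over 33 hours.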
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2684 (Run 2, Clone 5, Gen 10)

Post by bruce »

Please do a thorough memory test on your system at the actual temperatures that your system sees when folding. There is no single cause for 0xC0000005 but the most common one is memory errors (including memory timing settings that are just a bit too tight for your memory as well as chipset errors that result in memory errors).
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: Project: 2684 (Run 2, Clone 5, Gen 10)

Post by Grandpa_01 »

bruce wrote:Please do a thorough memory test on your system at the actual temperatures that your system sees when folding. There is no single cause for 0xC0000005 but the most common one is memory errors (including memory timing settings that are just a bit too tight for your memory as well as chipset errors that result in memory errors).
Good advice, bruce, especially this part:
Please do a thorough memory test on your system at the actual temperatures that your system sees when folding.
which is very hard to do.

Do you know of a memory test that will use 100% of the CPU and create the heat that folding does? Or a memory test that can be run while folding?
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: Project: 2684 (Run 2, Clone 5, Gen 10)

Post by PantherX »

Grandpa_01 wrote:...Do you know of a memory test that will use 100% of the CPU and create the heat that folding does...
I use IntelBurnTest configured for Maximum RAM, and it generates more heat than F@H does. It stresses the RAM and CPU at the same time, so I really like it. After that, I run StressCPU to make sure that the stable system is also scientifically stable for F@H.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues

Re: Project: 2684 (Run 2, Clone 5, Gen 10)

Post by DrSpalding »

What do you consider a suitable memory test? Prime95 with a 10GB data set while running the GPU client on the GTX275 as well? The problem with doing a standalone memory test (a la memtest x86 or the Win7 memdiag) is that I can't saturate the machine further with the bandwidth and heat from the GPU as well.

The memory is 7-7-7-24 @ 1333 MHz, but it is running at a slower speed, at least according to the BIOS, at 7-7-7-24 @ ~1146MHz.
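
To put those two settings on the same scale, the CAS latency can be converted from cycles to nanoseconds. This is a back-of-the-envelope sketch that assumes the "MHz" figures above are DDR3 data rates (so the actual memory clock is half of that); if the BIOS is reporting the real clock instead, the numbers halve.

```python
def cas_latency_ns(cl_cycles: int, data_rate_mts: float) -> float:
    """CAS latency in nanoseconds for a given CL and DDR data rate (MT/s)."""
    clock_mhz = data_rate_mts / 2          # DDR transfers twice per clock cycle
    return cl_cycles / clock_mhz * 1000.0  # cycles / MHz = microseconds -> ns


for rate in (1333, 1146):
    print(f"CL7 @ {rate}: {cas_latency_ns(7, rate):.2f} ns")
```

Under that assumption, the same 7-7-7-24 cycle counts are looser in absolute time at ~1146 than at 1333, since each cycle lasts longer.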
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona
Contact:

Re: Project: 2684 (Run 2, Clone 5, Gen 10)

Post by 7im »

Prime is lame, it's not up to the task, hence your problem with not being able to saturate the system.

You could also use MemtestG80, which is a memory tester for NV cards. Then combine it with whichever of the tools recommended above you like.

IBT and OCCT (mixed test) are about as good as they get for maxing both CPU and Memory. Throw in the memtestG80 and you're all set.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.

Re: Project: 2684 (Run 2, Clone 5, Gen 10)

Post by PantherX »

@DrSpalding: (Based on my experience) Prime95 produces less stress than F@H and IntelBurnTest is more stressful than F@H if configured properly. I prefer IntelBurnTest with 10 iterations @ Maximum setting. What I do is first fire off IBT @ Maximum for 10 iterations. If passed successfully, I set it at 6 threads with 3 GB RAM and then run Furmark @ Maximum settings and run Hyper PI 0.99 Beta @ 32 Million on the last free thread. Thus I have stressed my CPU and GPU. I use HWMonitor and if any temperature passes 90C, I terminate everything and downclock and repeat until the Maximum temperature Value is =<90C.
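
The "downclock and repeat" loop described above can be sketched roughly as follows. This only models the decision logic: the function name, the clock steps, and the temperature readings are all made up for illustration, and in practice the peak temperature would come from watching a monitoring tool such as HWMonitor during a full stress run.

```python
from typing import Callable, Iterable, Optional

TEMP_LIMIT_C = 90.0  # the cut-off mentioned in the post


def highest_clock_within_limit(
    clocks_mhz: Iterable[int],
    peak_temp_for: Callable[[int], float],
    limit_c: float = TEMP_LIMIT_C,
) -> Optional[int]:
    """Try clocks from highest to lowest; return the first whose peak
    temperature under the stress run stays at or below the limit."""
    for clock in clocks_mhz:
        if peak_temp_for(clock) <= limit_c:
            return clock
    return None  # nothing in the list stayed under the limit


# Hypothetical peak temperatures observed at each clock step.
observed = {3900: 94.0, 3800: 91.0, 3700: 88.0}
print(highest_clock_within_limit([3900, 3800, 3700], observed.__getitem__))
```

With these invented readings the loop settles on the 3700 MHz step, the first one whose peak stays at or under 90C.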

Re: Project: 2684 (Run 2, Clone 5, Gen 10)

Post by DrSpalding »

PantherX wrote:@DrSpalding: (Based on my experience) Prime95 produces less stress than F@H and IntelBurnTest is more stressful than F@H if configured properly. I prefer IntelBurnTest with 10 iterations @ Maximum setting. What I do is first fire off IBT @ Maximum for 10 iterations. If passed successfully, I set it at 6 threads with 3 GB RAM and then run Furmark @ Maximum settings and run Hyper PI 0.99 Beta @ 32 Million on the last free thread. Thus I have stressed my CPU and GPU. I use HWMonitor and if any temperature passes 90C, I terminate everything and downclock and repeat until the Maximum temperature Value is =<90C.
1. Where do you get IBT and/or OCCT? I found downloads for both (IBT v2.3 and OCCT v3) on guru3d.com but don't know which versions are the up-to-date ones.
2. I have noted that Prime95 gets the cpu cores a couple of degrees hotter than F@H seems to. It seems to hold that high temperature more stably than F@H does too, FWIW.
3. Is running a GPU client sufficient to test the GPU + CPU at the same time when running IBT or OCCT?

Thanks,
Dan

Re: Project: 2684 (Run 2, Clone 5, Gen 10)

Post by PantherX »

1) You can check the tools list; in most cases it contains links to the latest software (FYI, IBT 2.4 is the latest).
2) YMMV, but on my system Prime95 took longer to reach the temperatures of F@H and never exceeded them. IBT, on the other hand, overshot the F@H temps in <5 minutes.
3) IBT is specific to the CPU only. I have heard that OCCT includes LinX (used in IBT) and also has its own GPU stress test. I haven't used OCCT, so I can't be specific.

Re: Project: 2684 (Run 2, Clone 5, Gen 10)

Post by bruce »

DrSpalding wrote:Is running a GPU client sufficient to test the GPU + CPU at the same time when running IBT or OCCT?
Probably not.

First, running a GPU client means different things to the CPU, depending on whether you have ATi, Fermi, or a G80.

Second (as noted earlier), it's almost impossible to test everything simultaneously. For simplicity's sake I'm going to divide a system into RAM, GPU, and CPU and further divide the CPU into ALU and FPU. A single test can maximize the use of any one of them but not all of them simultaneously. Picking a test that deals with each one separately is fairly easy, but finding something that comes close to maximizing all of them simultaneously is next to impossible. FAH will also be limited by the maximum of one of them but will use the others at somewhat less than maximum, so finding something that is close to the way your system runs FAH means you'll probably have to run more than one test. That's one reason why you always have to back down from whatever settings seem to be stable.

Prime probably maximizes the use of the ALU but doesn't maximize the FPU or RAM and certainly not the GPU.
Memtest86 probably maximizes RAM but doesn't use much ALU or FPU or GPU.
The GPU client probably comes close to maximizing the use of the GPU but doesn't saturate the ALU and uses virtually no FPU. (Then, too, the various GPU benchmark tests may maximize different aspects of the GPU, but let's not go into that.)
StressCPU2 probably maximizes the FPU similar to FAH's SMP client but may not catch errors in other components.

Integrated tests do a better job of balancing the use of ALU/FPU/RAM so adding a GPU client or benchmark helps find heating issues but we can then debate the relative priority of the two tasks.

No matter what tests you run, you'll probably need more than one and you'll still need to add additional margin.
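
One way to picture the breakdown above is as a small coverage table: which component each test probably maximizes, and what a given pick of tests leaves out. The sketch below only encodes the rough claims from this post (the "probably" applies to every entry), and the names are illustrative.

```python
# Component each test probably maximizes, per the list in this post.
MAXIMIZES = {
    "Prime": {"ALU"},
    "Memtest86": {"RAM"},
    "GPU client": {"GPU"},
    "StressCPU2": {"FPU"},
}
ALL_COMPONENTS = {"RAM", "GPU", "ALU", "FPU"}


def uncovered(tests):
    """Components that none of the chosen tests maximizes."""
    covered = set().union(*(MAXIMIZES[t] for t in tests))
    return ALL_COMPONENTS - covered


print(sorted(uncovered(["Prime", "Memtest86"])))
print(sorted(uncovered(MAXIMIZES)))
```

Even when every component is covered by some test, that still means running them one after another, which is not the same load as FAH using several of them at once; hence the extra margin.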

Re: Project: 2684 (Run 2, Clone 5, Gen 10)

Post by DrSpalding »

I ran IBT several times standalone, including a set of 10 runs with 10+ GB of memory allocated, and it worked flawlessly, finishing in about 3 hours. However, for the next set I also ran the GPU client, with its priority set to normal to make sure it would actually run. Within about 11 minutes, the machine bug-checked on me with the nebulous "MACHINE_CHECK_EXCEPTION" (0x9C). That one is a catch-all for various MCA exceptions from the CPU, and without a debugger attached to the machine it is hard to figure anything else out from it. I suspect memory or bus issues, since the nVidia GTX275 GPU running an F@H client really only intersects the machine at the bus and memory. If anyone has an idea what I should tweak first (vCore for CPU, memory timings, etc.), please feel free to drop me a message.

For now, I am running w/o the GPU client until I get it sorted out.

Re: Project: 2684 (Run 2, Clone 5, Gen 10)

Post by PantherX »

@DrSpalding: Have you overclocked the system? If yes, return everything to stock and see if the error still arises.
Have you changed any settings in the motherboard BIOS? If yes, change everything back to stock and then try again.
Have you tried moving the GPU to a different PCI-E slot and repeating the test? If yes, was the error the same or not?
Can you run MemtestG80 on the GPU without any problems? (more details in my guide; link in sig)
Is your PSU stable enough to provide enough power to both the CPU and GPU when both are at 100% load?
B2K24
Posts: 13
Joined: Wed May 19, 2010 3:44 pm

Re: Project: 2684 (Run 2, Clone 5, Gen 10)

Post by B2K24 »

I got errors with bigadv when I manually set the timings in the BIOS to 7-7-7-20, as the sticker on my Corsair Dominator C7s reads. When I set all timings to AUTO, the BIOS gives them 9-9-9-24, and I have had no stability issues running with auto timings.