Project: 2633 (Run 8, Clone 24, Gen 6) - UNSTABLE_MACHINE

Napoleon
Posts: 887
Joined: Wed May 26, 2010 2:31 pm
Hardware configuration: Atom330 (overclocked):
Windows 7 Ultimate 64bit
Intel Atom330 dualcore (4 HyperThreads)
NVidia GT430, core_15 work
2x2GB Kingston KVR1333D3N9K2/4G 1333MHz memory kit
Asus AT3IONT-I Deluxe motherboard
Location: Finland

Project: 2633 (Run 8, Clone 24, Gen 6) - UNSTABLE_MACHINE

Post by Napoleon »

Bad WU?

Code:

[21:44:48] Project: 2633 (Run 8, Clone 24, Gen 6)
[21:44:48] 
[21:44:48] Assembly optimizations on if available.
[21:44:48] Entering M.D.
[21:44:54] Completed 0 out of 625000 steps  (0%)
[21:56:55] Completed 6250 out of 625000 steps  (1%)
[21:59:24] - Autosending finished units... [August 29 21:59:24 UTC]
[21:59:24] Trying to send all finished work units
[21:59:24] + No unsent completed units remaining.
[21:59:24] - Autosend completed
[22:12:13] mdrun returned 255
[22:12:13] Going to send back what have done -- stepsTotalG=625000
[22:12:13] Work fraction=0.0188 steps=625000.
[22:12:17] logfile size=0 infoLength=0 edr=0 trr=25
[22:12:17] logfile size: 0 info=0 bed=0 hdr=25
[22:12:17] - Writing 642 bytes of core data to disk...
[22:12:17]   ... Done.
[22:12:17] 
[22:12:17] Folding@home Core Shutdown: UNSTABLE_MACHINE
[22:12:20] CoreStatus = 7A (122)
[22:12:20] Sending work to server
[22:12:20] Project: 2633 (Run 8, Clone 24, Gen 6)


[22:12:20] + Attempting to send results [August 29 22:12:20 UTC]
[22:12:20] - Reading file work/wuresults_08.dat from core
[22:12:20]   (Read 642 bytes from disk)
[22:12:20] Connecting to http://171.67.108.24:8080/
[22:12:21] Posted data.
[22:12:21] Initial: 0000; - Uploaded at ~1 kB/s
[22:12:21] - Averaged speed for that direction ~75 kB/s
[22:12:21] + Results successfully sent
[22:12:21] Thank you for your contribution to Folding@Home.
[22:12:25] Trying to send all finished work units
[22:12:25] + No unsent completed units remaining.
[22:12:25] - Preparing to get new work unit...
[22:12:25] Cleaning up work directory
[22:12:25] + Attempting to get work packet
[22:12:25] Passkey found
[22:12:25] - Will indicate memory of 1536 MB
[22:12:25] - Connecting to assignment server
[22:12:25] Connecting to http://assign.stanford.edu:8080/
[22:12:26] Posted data.
[22:12:26] Initial: 40AB; - Successful: assigned to (171.64.65.54).
[22:12:27] + News From Folding@Home: Welcome to Folding@Home
[22:12:27] Loaded queue successfully.
[22:12:27] Sent data
[22:12:27] Connecting to http://171.64.65.54:8080/
[22:12:28] Posted data.
[22:12:28] Initial: 0000; - Receiving payload (expected size: 1768007)
[22:12:33] - Downloaded at ~345 kB/s
[22:12:33] - Averaged speed for that direction ~608 kB/s
[22:12:33] + Received work.
[22:12:33] Trying to send all finished work units
[22:12:33] + No unsent completed units remaining.
[22:12:33] + Closed connections
[22:12:38] 
[22:12:38] + Processing work unit
[22:12:38] Core required: FahCore_a3.exe
[22:12:38] Core found.
[22:12:38] Working on queue slot 09 [August 29 22:12:38 UTC]
[22:12:38] + Working ...
[22:12:38] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 09 -np 2 -priority 96 -checkpoint 30 -forceasm -verbose -lifeline 1984 -version 630'

[22:12:38] 
[22:12:38] *------------------------------*
[22:12:38] Folding@Home Gromacs SMP Core
[22:12:38] Version 2.22 (Mar 12, 2010)
[22:12:38] 
[22:12:38] Preparing to commence simulation
[22:12:38] - Assembly optimizations manually forced on.
[22:12:38] - Not checking prior termination.
[22:12:38] - Expanded 1767495 -> 1971489 (decompressed 111.5 percent)
[22:12:38] Called DecompressByteArray: compressed_data_size=1767495 data_size=1971489, decompressed_data_size=1971489 diff=0
[22:12:38] - Digital signature verified
[22:12:38] 
[22:12:38] Project: 6020 (Run 0, Clone 114, Gen 278)
[22:12:38] 
[22:12:38] Assembly optimizations on if available.
[22:12:38] Entering M.D.
[22:12:46] Completed 0 out of 500000 steps  (0%)
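
The failure details in the log above can be pulled out with a short script. This is only a rough sketch: the `summarize` helper and the `LOG` excerpt are illustrative names, not part of the FAH client, and the regular expressions assume the line formats seen in this particular log.

```python
import re

# Excerpt of the client log lines quoted in the post above.
LOG = """\
[21:56:55] Completed 6250 out of 625000 steps  (1%)
[22:12:13] mdrun returned 255
[22:12:13] Work fraction=0.0188 steps=625000.
[22:12:17] Folding@home Core Shutdown: UNSTABLE_MACHINE
[22:12:20] CoreStatus = 7A (122)
"""

def summarize(log: str) -> dict:
    """Extract shutdown reason, core status and progress from a log excerpt."""
    frac = re.search(r"Work fraction=([\d.]+) steps=(\d+)", log)
    shutdown = re.search(r"Core Shutdown: (\w+)", log)
    status = re.search(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)", log)
    fraction, total = float(frac.group(1)), int(frac.group(2))
    return {
        "shutdown": shutdown.group(1),
        "status_hex": status.group(1),
        "status_dec": int(status.group(2)),
        # Work fraction times total steps gives the approximate step reached.
        "steps_done": round(fraction * total),
        "steps_total": total,
    }

if __name__ == "__main__":
    print(summarize(LOG))
```

With the values in this log, a work fraction of 0.0188 out of 625000 steps corresponds to roughly 11750 steps before the core shut down, and the hexadecimal status 7A is the same number as the decimal 122 printed next to it.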
Win7 64bit, FAH v7, OC'd
2C/4T Atom330 3x667MHz - GT430 2x832.5MHz - ION iGPU 3x466.7MHz
NaCl - Core_15 - display
sortofageek
Site Admin
Posts: 3110
Joined: Fri Nov 30, 2007 8:06 pm
Location: Team Helix

Re: Project: 2633 (Run 8, Clone 24, Gen 6) - UNSTABLE_MACHINE

Post by sortofageek »

It is too soon to tell so far, sorry. The only results I can see at this time are yours.

Hi Napoleon (team 191980),
Your WU (P2633 R8 C24 G6) was added to the stats database on 2010-08-29 15:13:02 for 4.42 points of credit.

Re: Project: 2633 (Run 8, Clone 24, Gen 6) - UNSTABLE_MACHINE

Post by sortofageek »

This work unit was a good one. Another folder was able to complete it for full credit.

Re: Project: 2633 (Run 8, Clone 24, Gen 6) - UNSTABLE_MACHINE

Post by Napoleon »

Interesting... I'll keep an eye on my rig, then. It's old, but still, a workstation board, not overclocked, has ECC memory, temps are well within specifications. Anyway, I tightened the memory timings manually a while back. Have to check if something has appeared in the ECC logs at next reboot. 2633 was the first one to give me trouble, though. And looks like a few other fellow folders have had trouble with 2633, too.
Last edited by Napoleon on Mon Aug 30, 2010 9:02 pm, edited 2 times in total.

Re: Project: 2633 (Run 8, Clone 24, Gen 6) - UNSTABLE_MACHINE

Post by sortofageek »

Maybe just keep an eye on that one. If this doesn't continue to happen, it may be a glitch unrelated to your equipment.

Re: Project: 2633 (Run 8, Clone 24, Gen 6) - UNSTABLE_MACHINE

Post by Napoleon »

Logs actually showed corrected memory errors happening quite frequently, but I didn't spot a single uncorrectable one, no BSODs either. Back to stock memory timings. At a glance, the constant error corrections are no longer taking place. Will still keep monitoring the memory just in case some mem chip is (going) bad. But certainly looks like the problem really was overly tight memory timings, not a bad WU. Sorry for the hassle, my bad.

I did run memtest86+ for quite a long time before deploying the timings to everyday use, but I guess synthetic tests aren't perfect at catching subtle real world problems. :|

Re: Project: 2633 (Run 8, Clone 24, Gen 6) - UNSTABLE_MACHINE

Post by sortofageek »

Excellent troubleshooting. Always suspect the last change made before trouble began. Hope this resolves your issues.
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: Project: 2633 (Run 8, Clone 24, Gen 6) - UNSTABLE_MACHINE

Post by PantherX »

Napoleon wrote: ...I did run memtest86+ for quite a long time before deploying the timings to everyday use, but I guess synthetic tests aren't perfect at catching subtle real world problems. :|
Recently I ran v4.10 for 2 passes (that's the minimum) and it finished without any errors. However, it would fail IBT @ 2048 MB. I did a little research and found that Memtest86+ is actually used for testing hardware faults like bad modules, slots, etc. I also read that just because RAM passes Memtest86+, it isn't necessarily stable under Windows. Once I found this out, I loosened the RAM timings and it passed IBT @ 2048 MB. However, under some RAM timings, I found that IBT @ 1024 MB and 2048 MB were stable for 5 or 10 runs, but when I went for Maximum (>3 GB), it would fail. Although I was able to fold a single bigadv WU while the machine was unstable at Maximum, I decided that the risk was too high and tweaked the timings so that my system is stable at IBT @ Maximum.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
bruce
Posts: 20822
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2633 (Run 8, Clone 24, Gen 6) - UNSTABLE_MACHINE

Post by bruce »

Napoleon wrote: Interesting... I'll keep an eye on my rig, then. It's old, but still, a workstation board, not overclocked, has ECC memory, temps are well within specifications.
According to your profile this is a dual Opteron. Which model?

It has full support for both SSE and SSE2, right?

Re: Project: 2633 (Run 8, Clone 24, Gen 6) - UNSTABLE_MACHINE

Post by Napoleon »

Here's what CPU-Z says; I have 2 of these. The Tyan Thunder K8W is a dual-socket motherboard.

[Image: CPU-Z screenshot]

EDIT: checked back on the logs; apparently I tightened the memory timings on 9th August 2010, and that's the earliest occurrence of memory error corrections happening. Haven't seen them anymore after reverting to stock memory timings. Anyway, this was the first obvious sign of trouble after running FAH for 3 weeks, so I've completed WUs other than 2633 successfully with the tighter timings. Of course, there shouldn't be any ECC events happening at all, or at least they should be extremely rare.
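
The before/after comparison described here can be sketched as a small tally. Everything in it is an assumption made for illustration: the `count_by_period` helper, the `(day, kind)` event format, the sample entries, and the revert date of 30 August 2010 are all invented; only the 9 August 2010 tightening date comes from the post.

```python
from datetime import date

# Hypothetical sample data: (day, kind) pairs as they might be transcribed
# by hand from an ECC log. These entries are invented, not real log output.
EVENTS = [
    (date(2010, 8, 9), "corrected"),
    (date(2010, 8, 15), "corrected"),
    (date(2010, 8, 29), "corrected"),
]

TIGHTENED = date(2010, 8, 9)   # from the post: timings tightened on this day
REVERTED = date(2010, 8, 30)   # assumed revert date, for illustration only

def count_by_period(events, tightened, reverted):
    """Count corrected-error events before, during and after the tight timings."""
    counts = {"before": 0, "tight": 0, "after": 0}
    for day, kind in events:
        if kind != "corrected":
            continue  # only corrected ECC events are tallied here
        if day < tightened:
            counts["before"] += 1
        elif day < reverted:
            counts["tight"] += 1
        else:
            counts["after"] += 1
    return counts

if __name__ == "__main__":
    print(count_by_period(EVENTS, TIGHTENED, REVERTED))
```

If every corrected event falls inside the "tight" window, as in this made-up sample, that points at the timing change rather than at a failing memory chip or a bad WU.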