bapriebe
Posts: 44 Joined: Sun Apr 20, 2008 8:33 am
Hardware configuration: HP xw4600 workstation (4GB)+Q9650+Sapphire Vapor-X HD4890, HP Z600 workstation (4GB)+2xXEON E5540+Sapphire HD5770, HP ML350 server (4GB)+2xXEON E5520+Diamond HD3850
Location: Ottawa, Ontario
by bapriebe » Thu Jul 30, 2009 7:05 pm
Five times in a row I got a "SHAKE violations on GPU" error precisely 3 seconds after the GUI server started up. (Some say this can be caused by overheating. The 4890's fan is set to 100% and temperatures don't seem to rise above 66°C.) After I restarted the client, it started right into another WU from project 5732 without incident.
[18:59:07] Folding@Home GPU Core - Beta
[18:59:07] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[18:59:07]
[18:59:07] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[18:59:07] Build host: amoeba
[18:59:07] Board Type: AMD
[18:59:07] Core :
[18:59:07] Preparing to commence simulation
[18:59:07] - Looking at optimizations...
[18:59:07] - Created dyn
[18:59:07] - Files status OK
[18:59:07] - Expanded 68495 -> 357580 (decompressed 522.0 percent)
[18:59:07] Called DecompressByteArray: compressed_data_size=68495 data_size=357580, decompressed_data_size=357580 diff=0
[18:59:07] - Digital signature verified
[18:59:07]
[18:59:07] Project: 5745 (Run 2, Clone 34, Gen 337)
[18:59:07]
[18:59:07] Assembly optimizations on if available.
[18:59:07] Entering M.D.
[18:59:13] Tpr hash work/wudata_03.tpr: 3292426534 3612594659 715314425 3296096959 2941335996
[18:59:13] Working on Protein
[18:59:14] Client config found, loading data.
[18:59:14] Starting GUI Server
[18:59:17] mdrun_gpu returned
[18:59:17] SHAKE violations on GPU
[18:59:17]
[18:59:17] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:59:19] CoreStatus = 7A (122)
[18:59:19] Sending work to server
[18:59:19] Project: 5745 (Run 2, Clone 34, Gen 337)
[18:59:19] - Read packet limit of 540015616... Set to 524286976.
[18:59:19] - Error: Could not get length of results file work/wuresults_03.dat
[18:59:19] - Error: Could not read unit 03 file. Removing from queue.
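For anyone comparing logs like the one above, here is a minimal Python sketch that pulls out the work unit identity and the delay between "Starting GUI Server" and the SHAKE error. The function name `summarize_failure` and the regular expressions are illustrative assumptions based only on the log lines quoted here, not part of any Folding@home tool, and the sketch does not handle timestamps that roll over midnight.

```python
import re
from datetime import datetime

# Patterns inferred from the log excerpt in this post (assumption, not an official format spec).
LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s?(.*)$")
PROJECT = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def summarize_failure(log_text):
    """Return (project, run, clone, gen, delay_seconds) for the first SHAKE error, or None."""
    wu = None
    gui_start = None
    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # skip anything that is not a timestamped log line
        stamp = datetime.strptime(m.group(1), "%H:%M:%S")
        msg = m.group(2)
        p = PROJECT.search(msg)
        if p:
            wu = tuple(int(x) for x in p.groups())
        elif msg.startswith("Starting GUI Server"):
            gui_start = stamp
        elif "SHAKE violations on GPU" in msg and wu and gui_start:
            delay = int((stamp - gui_start).total_seconds())
            return wu + (delay,)
    return None


sample = """\
[18:59:07] Project: 5745 (Run 2, Clone 34, Gen 337)
[18:59:14] Starting GUI Server
[18:59:17] mdrun_gpu returned
[18:59:17] SHAKE violations on GPU
"""
print(summarize_failure(sample))
```

Run against a saved copy of the client log, this makes it easy to check whether a failure hit the same Project/Run/Clone/Gen, and as quickly, as the ones reported by others.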
valleton
Posts: 13 Joined: Tue Mar 24, 2009 5:22 pm
Hardware configuration: Intel E8600 Club3D Radeon HD4870 1GB
Location: Estonia
by valleton » Sat Aug 01, 2009 12:11 am
Same here with 4870 1GB, but definitely not a temperature issue in my case.
EDIT: Vista SP1, Catalyst 9.5 (and no VPU recoveries on Vista)
[21:46:31] Folding@Home GPU Core - Beta
[21:46:31] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[21:46:31]
[21:46:31] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[21:46:31] Build host: amoeba
[21:46:31] Board Type: AMD
[21:46:31] Core :
[21:46:31] Preparing to commence simulation
[21:46:31] - Looking at optimizations...
[21:46:31] - Created dyn
[21:46:31] - Files status OK
[21:46:31] - Expanded 68495 -> 357580 (decompressed 522.0 percent)
[21:46:31] Called DecompressByteArray: compressed_data_size=68495 data_size=357580, decompressed_data_size=357580 diff=0
[21:46:31] - Digital signature verified
[21:46:31]
[21:46:31] Project: 5745 (Run 2, Clone 34, Gen 337)
[21:46:31]
[21:46:31] Assembly optimizations on if available.
[21:46:31] Entering M.D.
[21:46:37] Tpr hash work/wudata_04.tpr: 3292426534 3612594659 715314425 3296096959 2941335996
[21:46:37] Working on Protein
[21:46:37] Client config found, loading data.
[21:46:37] Starting GUI Server
[21:46:39] mdrun_gpu returned
[21:46:39] SHAKE violations on GPU
[21:46:39]
[21:46:39] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:46:43] CoreStatus = 7A (122)
[21:46:43] Sending work to server
[21:46:43] Project: 5745 (Run 2, Clone 34, Gen 337)
[21:46:43] - Read packet limit of 540015616... Set to 524286976.
[21:46:43] - Error: Could not get length of results file work/wuresults_04.dat
[21:46:43] - Error: Could not read unit 04 file. Removing from queue.
[21:46:43] - Preparing to get new work unit...
[21:46:43] + Attempting to get work packet
[21:46:43] - Connecting to assignment server
[21:46:44] - Successful: assigned to (171.64.65.102).
[21:46:44] + News From Folding@Home: Welcome to Folding@Home
[21:46:44] Loaded queue successfully.
[21:46:46] + Closed connections
[21:46:51]
[21:46:51] + Processing work unit
[21:46:51] Core required: FahCore_11.exe
[21:46:51] Core found.
[21:46:51] Working on queue slot 05 [July 31 21:46:51 UTC]
[21:46:51] + Working ...
[21:46:51]
[21:46:51] *------------------------------*
[21:46:51] Folding@Home GPU Core - Beta
[21:46:51] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[21:46:51]
[21:46:51] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[21:46:51] Build host: amoeba
[21:46:51] Board Type: AMD
[21:46:51] Core :
[21:46:51] Preparing to commence simulation
[21:46:51] - Looking at optimizations...
[21:46:51] - Created dyn
[21:46:51] - Files status OK
[21:46:51] - Expanded 68495 -> 357580 (decompressed 522.0 percent)
[21:46:51] Called DecompressByteArray: compressed_data_size=68495 data_size=357580, decompressed_data_size=357580 diff=0
[21:46:51] - Digital signature verified
[21:46:51]
[21:46:51] Project: 5745 (Run 2, Clone 34, Gen 337)
[21:46:51]
[21:46:51] Assembly optimizations on if available.
[21:46:51] Entering M.D.
[21:46:57] Tpr hash work/wudata_05.tpr: 3292426534 3612594659 715314425 3296096959 2941335996
[21:46:57] Working on Protein
[21:46:57] Client config found, loading data.
[21:46:57] Starting GUI Server
[21:46:59] mdrun_gpu returned
[21:46:59] SHAKE violations on GPU
[21:46:59]
[21:46:59] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:47:03] CoreStatus = 7A (122)
[21:47:03] Sending work to server
[21:47:03] Project: 5745 (Run 2, Clone 34, Gen 337)
[21:47:03] - Read packet limit of 540015616... Set to 524286976.
[21:47:03] - Error: Could not get length of results file work/wuresults_05.dat
[21:47:03] - Error: Could not read unit 05 file. Removing from queue.
[21:47:03] EUE limit exceeded. Pausing 24 hours.
LIVESTRONG
Posts: 14 Joined: Sat Jul 25, 2009 9:01 pm
Location: Papillion, NE
by LIVESTRONG » Sat Aug 01, 2009 3:23 am
I've seen a couple threads about 480x series running into this issue. Have you overclocked the cards?
You can experience an unstable machine with high overclocks even with low and stable temperatures (<80°C).
If clock speed is not the issue, perhaps check whether your voltages are adequate compared to other 480x cards.
Main: E8400 4.0GHz, 4GB DDR2, BFG GTX 275, WD Raptor 300GB 10k RPM, WD 500GB, SB X-Fi Fatal1ty Pro
2nd: P4 3.8GHz, 2GB DDR2, PNY GTX 260 (Core 216), 160GB 7.2k RPM
LIVESTRONG.
FOLDSTRONG.
susato
Site Moderator
Posts: 511 Joined: Fri Nov 30, 2007 4:57 am
Location: Team MacOSX
by susato » Sat Aug 01, 2009 6:01 am
Thanks for the reports, folks. Similar reports of early failures on otherwise stable rigs tend to point to the WU as the problem, rather than the hardware.
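The reasoning here (the same WU failing on several independent rigs suggests a bad WU) can be sketched in a few lines of Python. The `reports` list is filled in by hand from the two posts above, and the function name `wus_failing_on_multiple_rigs` and the `min_rigs` threshold are hypothetical choices for illustration, not anything the project servers are known to use.

```python
from collections import defaultdict

# (poster, GPU, (project, run, clone, gen)) copied by hand from the reports above.
reports = [
    ("bapriebe", "HD4890", (5745, 2, 34, 337)),
    ("valleton", "HD4870", (5745, 2, 34, 337)),
]


def wus_failing_on_multiple_rigs(reports, min_rigs=2):
    """Map each WU to the sorted list of posters who saw it fail, keeping WUs seen on >= min_rigs rigs."""
    posters_by_wu = defaultdict(set)
    for poster, _gpu, wu in reports:
        posters_by_wu[wu].add(poster)
    return {
        wu: sorted(posters)
        for wu, posters in posters_by_wu.items()
        if len(posters) >= min_rigs
    }


print(wus_failing_on_multiple_rigs(reports))
```

A WU that shows up in this result has failed for more than one donor, which is weak but useful evidence that the hardware is not at fault.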
DrSpalding
Posts: 136 Joined: Wed May 27, 2009 4:48 pm
Hardware configuration: Dell Studio 425 MTS-Core i7-920 c0 stock evga SLI 3x o/c Core i7-920 d0 @ 3.9GHz + nVidia GTX275 Dell 5150 + nVidia 9800GT
by DrSpalding » Fri Aug 07, 2009 3:46 pm
You can add my machine to the pile on this WU as well. Same error: SHAKE violations on GPU.
Running Vista 64-bit with an ATI/AMD 48xx of some sort. Definitely not overclocked, and the room temperature is 65°F this morning, so I don't suspect any overheating issues. Never had any stability issues thus far with this pretty new rig.
Not a real doctor, I just play one on the 'net!
vladh4x0r
Posts: 5 Joined: Tue Jul 28, 2009 5:04 am
Hardware configuration: 1) Core i7 860 @ 3.5 GHz, 6GB DDR3 GPUs: Radeon 4850 and 4830 (not folding) OS: Windows 7 64-bit SMP2 client 2) QX9650 @ 3.0 GHz, 4GB DDR2 GPU: GT240 OS: Vista 64-bit SMP2 client GPU2 client
Location: Folsom, CA, USA
by vladh4x0r » Thu Aug 20, 2009 4:03 am
Just got this work unit (P5745,R2,C34,G337), same experience as others - "SHAKE violations on GPU" 2-3 seconds after starting. This is on a secondary 4830 that has crunched hundreds of other WUs so far with few complaints.