Project: 5745 (Run 2, Clone 34, Gen 337)

Moderators: Site Moderators, FAHC Science Team

bapriebe
Posts: 44
Joined: Sun Apr 20, 2008 8:33 am
Hardware configuration: HP xw4600 workstation (4GB)+Q9650+Sapphire Vapor-X HD4890,
HP Z600 workstation (4GB)+2xXEON E5540+Sapphire HD5770,
HP ML350 server (4GB)+2xXEON E5520+Diamond HD3850
Location: Ottawa, Ontario

Project: 5745 (Run 2, Clone 34, Gen 337)

Post by bapriebe »

Five times in a row I got a "SHAKE violations on GPU" error precisely 3 seconds after the GUI server started up. (Some say this can be caused by overheating, but the 4890's fan is set to 100% and temperatures don't seem to rise above 66°C.) After I restarted the client, it started right into another WU from project 5732 without incident.

Code:

[18:59:07] Folding@Home GPU Core - Beta
[18:59:07] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[18:59:07] 
[18:59:07] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[18:59:07] Build host: amoeba
[18:59:07] Board Type: AMD
[18:59:07] Core      : 
[18:59:07] Preparing to commence simulation
[18:59:07] - Looking at optimizations...
[18:59:07] - Created dyn
[18:59:07] - Files status OK
[18:59:07] - Expanded 68495 -> 357580 (decompressed 522.0 percent)
[18:59:07] Called DecompressByteArray: compressed_data_size=68495 data_size=357580, decompressed_data_size=357580 diff=0
[18:59:07] - Digital signature verified
[18:59:07] 
[18:59:07] Project: 5745 (Run 2, Clone 34, Gen 337)
[18:59:07] 
[18:59:07] Assembly optimizations on if available.
[18:59:07] Entering M.D.
[18:59:13] Tpr hash work/wudata_03.tpr:  3292426534 3612594659 715314425 3296096959 2941335996
[18:59:13] Working on Protein
[18:59:14] Client config found, loading data.
[18:59:14] Starting GUI Server
[18:59:17] mdrun_gpu returned 
[18:59:17] SHAKE violations on GPU
[18:59:17] 
[18:59:17] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:59:19] CoreStatus = 7A (122)
[18:59:19] Sending work to server
[18:59:19] Project: 5745 (Run 2, Clone 34, Gen 337)
[18:59:19] - Read packet limit of 540015616... Set to 524286976.
[18:59:19] - Error: Could not get length of results file work/wuresults_03.dat
[18:59:19] - Error: Could not read unit 03 file. Removing from queue.
valleton
Posts: 13
Joined: Tue Mar 24, 2009 5:22 pm
Hardware configuration: Intel E8600
Club3D Radeon HD4870 1GB
Location: Estonia

Re: Project: 5745 (Run 2, Clone 34, Gen 337)

Post by valleton »

Same here with 4870 1GB, but definitely not a temperature issue in my case.

EDIT: Vista SP1, Catalyst 9.5 (and no VPU recoveries on Vista)

Code:

[21:46:31] Folding@Home GPU Core - Beta
[21:46:31] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[21:46:31] 
[21:46:31] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[21:46:31] Build host: amoeba
[21:46:31] Board Type: AMD
[21:46:31] Core      : 
[21:46:31] Preparing to commence simulation
[21:46:31] - Looking at optimizations...
[21:46:31] - Created dyn
[21:46:31] - Files status OK
[21:46:31] - Expanded 68495 -> 357580 (decompressed 522.0 percent)
[21:46:31] Called DecompressByteArray: compressed_data_size=68495 data_size=357580, decompressed_data_size=357580 diff=0
[21:46:31] - Digital signature verified
[21:46:31] 
[21:46:31] Project: 5745 (Run 2, Clone 34, Gen 337)
[21:46:31] 
[21:46:31] Assembly optimizations on if available.
[21:46:31] Entering M.D.
[21:46:37] Tpr hash work/wudata_04.tpr:  3292426534 3612594659 715314425 3296096959 2941335996
[21:46:37] Working on Protein
[21:46:37] Client config found, loading data.
[21:46:37] Starting GUI Server
[21:46:39] mdrun_gpu returned 
[21:46:39] SHAKE violations on GPU
[21:46:39] 
[21:46:39] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:46:43] CoreStatus = 7A (122)
[21:46:43] Sending work to server
[21:46:43] Project: 5745 (Run 2, Clone 34, Gen 337)
[21:46:43] - Read packet limit of 540015616... Set to 524286976.
[21:46:43] - Error: Could not get length of results file work/wuresults_04.dat
[21:46:43] - Error: Could not read unit 04 file. Removing from queue.
[21:46:43] - Preparing to get new work unit...
[21:46:43] + Attempting to get work packet
[21:46:43] - Connecting to assignment server
[21:46:44] - Successful: assigned to (171.64.65.102).
[21:46:44] + News From Folding@Home: Welcome to Folding@Home
[21:46:44] Loaded queue successfully.
[21:46:46] + Closed connections
[21:46:51] 
[21:46:51] + Processing work unit
[21:46:51] Core required: FahCore_11.exe
[21:46:51] Core found.
[21:46:51] Working on queue slot 05 [July 31 21:46:51 UTC]
[21:46:51] + Working ...
[21:46:51] 
[21:46:51] *------------------------------*
[21:46:51] Folding@Home GPU Core - Beta
[21:46:51] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[21:46:51] 
[21:46:51] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[21:46:51] Build host: amoeba
[21:46:51] Board Type: AMD
[21:46:51] Core      : 
[21:46:51] Preparing to commence simulation
[21:46:51] - Looking at optimizations...
[21:46:51] - Created dyn
[21:46:51] - Files status OK
[21:46:51] - Expanded 68495 -> 357580 (decompressed 522.0 percent)
[21:46:51] Called DecompressByteArray: compressed_data_size=68495 data_size=357580, decompressed_data_size=357580 diff=0
[21:46:51] - Digital signature verified
[21:46:51] 
[21:46:51] Project: 5745 (Run 2, Clone 34, Gen 337)
[21:46:51] 
[21:46:51] Assembly optimizations on if available.
[21:46:51] Entering M.D.
[21:46:57] Tpr hash work/wudata_05.tpr:  3292426534 3612594659 715314425 3296096959 2941335996
[21:46:57] Working on Protein
[21:46:57] Client config found, loading data.
[21:46:57] Starting GUI Server
[21:46:59] mdrun_gpu returned 
[21:46:59] SHAKE violations on GPU
[21:46:59] 
[21:46:59] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:47:03] CoreStatus = 7A (122)
[21:47:03] Sending work to server
[21:47:03] Project: 5745 (Run 2, Clone 34, Gen 337)
[21:47:03] - Read packet limit of 540015616... Set to 524286976.
[21:47:03] - Error: Could not get length of results file work/wuresults_05.dat
[21:47:03] - Error: Could not read unit 05 file. Removing from queue.
[21:47:03] EUE limit exceeded. Pausing 24 hours.
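
For anyone who wants to pull the failing work unit out of a long client log automatically, here is a minimal sketch. It assumes the log uses the same "Project: ... (Run ..., Clone ..., Gen ...)" and "Core Shutdown: UNSTABLE_MACHINE" lines shown above; the function name `failed_units` and the sample text are made up for illustration and are not part of the Folding@home client.

```python
import re

# Matches the "Project: P (Run R, Clone C, Gen G)" line as it appears in the logs above.
PRCG = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")

def failed_units(log_text):
    """Return a (project, run, clone, gen) tuple for each UNSTABLE_MACHINE shutdown.

    Hypothetical helper: each shutdown is attributed to the most recent
    "Project:" line seen before it in the log.
    """
    failures = []
    current = None
    for line in log_text.splitlines():
        match = PRCG.search(line)
        if match:
            current = tuple(int(g) for g in match.groups())
        elif "Core Shutdown: UNSTABLE_MACHINE" in line and current is not None:
            failures.append(current)
    return failures

# Shortened excerpt in the same format as the log above.
sample = """\
[21:46:51] Project: 5745 (Run 2, Clone 34, Gen 337)
[21:46:59] mdrun_gpu returned
[21:46:59] SHAKE violations on GPU
[21:46:59] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:47:03] Project: 5745 (Run 2, Clone 34, Gen 337)
"""

print(failed_units(sample))
```

The second "Project:" line in the sample comes from the "Sending work to server" step, so it does not count as another failure.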
LIVESTRONG
Posts: 14
Joined: Sat Jul 25, 2009 9:01 pm
Location: Papillion, NE

Re: Project: 5745 (Run 2, Clone 34, Gen 337)

Post by LIVESTRONG »

I've seen a couple of threads about 48xx-series cards running into this issue. Have you overclocked the cards?

High overclocks can make a machine unstable even when temperatures stay low and stable (under 80°C).

If clock speed is not the issue, perhaps check whether your voltages are adequate compared to other 48xx cards.
Main: E8400 4.0GHz, 4GB DDR2, BFG GTX 275, WD Raptor 300GB 10k RPM, WD 500GB, SB X-Fi Fatal1ty Pro

2nd: P4 3.8GHz, 2GB DDR2, PNY GTX 260 (Core 216), 160GB 7.2k RPM

LIVESTRONG. FOLDSTRONG.
susato
Site Moderator
Posts: 511
Joined: Fri Nov 30, 2007 4:57 am
Location: Team MacOSX

Re: Project: 5745 (Run 2, Clone 34, Gen 337)

Post by susato »

Thanks for the reports, folks. Similar reports of early failures on otherwise stable rigs tend to point to the WU as the problem, rather than the hardware.
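
The reasoning can be sketched roughly as follows: if the same work unit fails on several different machines, the WU is the more likely culprit. This is only an illustration using the reports in this thread; the `suspect_units` helper and the two-machine threshold are assumptions, not how the project's servers actually decide.

```python
from collections import defaultdict

# Observations taken from the posts above: (reporter, work unit, failed?)
reports = [
    ("bapriebe", "P5745 R2 C34 G337", True),
    ("bapriebe", "P5732 (run/clone/gen not given)", False),
    ("valleton", "P5745 R2 C34 G337", True),
]

def suspect_units(reports, min_hosts=2):
    """Return work units that failed on at least min_hosts different machines.

    Hypothetical helper; min_hosts=2 is an arbitrary threshold for illustration.
    """
    hosts_failed = defaultdict(set)
    for host, wu, failed in reports:
        if failed:
            hosts_failed[wu].add(host)
    return sorted(wu for wu, hosts in hosts_failed.items() if len(hosts) >= min_hosts)

print(suspect_units(reports))
```

Repeated failures on a single machine do not count more than once here, since those could still be a hardware problem on that one rig.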
DrSpalding
Posts: 136
Joined: Wed May 27, 2009 4:48 pm
Hardware configuration: Dell Studio 425 MTS-Core i7-920 c0 stock
evga SLI 3x o/c Core i7-920 d0 @ 3.9GHz + nVidia GTX275
Dell 5150 + nVidia 9800GT

Re: Project: 5745 (Run 2, Clone 34, Gen 337)

Post by DrSpalding »

You can add my machine to the pile on this WU as well. Same error: SHAKE violations on GPU.

Running Vista 64-bit with an ATI/AMD 48xx of some sort. Definitely not overclocked, and the room temperature is 65°F this morning, so I don't suspect any overheating issues. Never had any stability issues thus far with this pretty new rig.
Not a real doctor, I just play one on the 'net!
vladh4x0r
Posts: 5
Joined: Tue Jul 28, 2009 5:04 am
Hardware configuration: 1) Core i7 860 @ 3.5 GHz, 6GB DDR3
GPUs: Radeon 4850 and 4830 (not folding)
OS: Windows 7 64-bit
SMP2 client

2) QX9650 @ 3.0 GHz, 4GB DDR2
GPU: GT240
OS: Vista 64-bit
SMP2 client
GPU2 client
Location: Folsom, CA, USA

Re: Project: 5745 (Run 2, Clone 34, Gen 337)

Post by vladh4x0r »

Just got this work unit (P5745, R2, C34, G337) and had the same experience as others: "SHAKE violations on GPU" 2-3 seconds after starting. This is on a secondary 4830 that has crunched hundreds of other WUs so far with few complaints.