Project: 5745 (Run 2, Clone 34, Gen 337)

Posted: Thu Jul 30, 2009 7:05 pm
by bapriebe
Five times in a row I got a "SHAKE violations on GPU" error precisely 3 seconds after the GUI server started up. (Some say this can be caused by overheating. The 4890's fan is set to 100% and temperatures don't seem to rise above 66C.) After I restarted the client, it went straight into another WU from project 5732 without incident.

Code:

[18:59:07] Folding@Home GPU Core - Beta
[18:59:07] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[18:59:07] 
[18:59:07] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[18:59:07] Build host: amoeba
[18:59:07] Board Type: AMD
[18:59:07] Core      : 
[18:59:07] Preparing to commence simulation
[18:59:07] - Looking at optimizations...
[18:59:07] - Created dyn
[18:59:07] - Files status OK
[18:59:07] - Expanded 68495 -> 357580 (decompressed 522.0 percent)
[18:59:07] Called DecompressByteArray: compressed_data_size=68495 data_size=357580, decompressed_data_size=357580 diff=0
[18:59:07] - Digital signature verified
[18:59:07] 
[18:59:07] Project: 5745 (Run 2, Clone 34, Gen 337)
[18:59:07] 
[18:59:07] Assembly optimizations on if available.
[18:59:07] Entering M.D.
[18:59:13] Tpr hash work/wudata_03.tpr:  3292426534 3612594659 715314425 3296096959 2941335996
[18:59:13] Working on Protein
[18:59:14] Client config found, loading data.
[18:59:14] Starting GUI Server
[18:59:17] mdrun_gpu returned 
[18:59:17] SHAKE violations on GPU
[18:59:17] 
[18:59:17] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:59:19] CoreStatus = 7A (122)
[18:59:19] Sending work to server
[18:59:19] Project: 5745 (Run 2, Clone 34, Gen 337)
[18:59:19] - Read packet limit of 540015616... Set to 524286976.
[18:59:19] - Error: Could not get length of results file work/wuresults_03.dat
[18:59:19] - Error: Could not read unit 03 file. Removing from queue.

Re: Project: 5745 (Run 2, Clone 34, Gen 337)

Posted: Sat Aug 01, 2009 12:11 am
by valleton
Same here with 4870 1GB, but definitely not a temperature issue in my case.

EDIT: Vista SP1, Catalyst 9.5 (and no VPU recoveries on Vista)

Code:

[21:46:31] Folding@Home GPU Core - Beta
[21:46:31] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[21:46:31] 
[21:46:31] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[21:46:31] Build host: amoeba
[21:46:31] Board Type: AMD
[21:46:31] Core      : 
[21:46:31] Preparing to commence simulation
[21:46:31] - Looking at optimizations...
[21:46:31] - Created dyn
[21:46:31] - Files status OK
[21:46:31] - Expanded 68495 -> 357580 (decompressed 522.0 percent)
[21:46:31] Called DecompressByteArray: compressed_data_size=68495 data_size=357580, decompressed_data_size=357580 diff=0
[21:46:31] - Digital signature verified
[21:46:31] 
[21:46:31] Project: 5745 (Run 2, Clone 34, Gen 337)
[21:46:31] 
[21:46:31] Assembly optimizations on if available.
[21:46:31] Entering M.D.
[21:46:37] Tpr hash work/wudata_04.tpr:  3292426534 3612594659 715314425 3296096959 2941335996
[21:46:37] Working on Protein
[21:46:37] Client config found, loading data.
[21:46:37] Starting GUI Server
[21:46:39] mdrun_gpu returned 
[21:46:39] SHAKE violations on GPU
[21:46:39] 
[21:46:39] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:46:43] CoreStatus = 7A (122)
[21:46:43] Sending work to server
[21:46:43] Project: 5745 (Run 2, Clone 34, Gen 337)
[21:46:43] - Read packet limit of 540015616... Set to 524286976.
[21:46:43] - Error: Could not get length of results file work/wuresults_04.dat
[21:46:43] - Error: Could not read unit 04 file. Removing from queue.
[21:46:43] - Preparing to get new work unit...
[21:46:43] + Attempting to get work packet
[21:46:43] - Connecting to assignment server
[21:46:44] - Successful: assigned to (171.64.65.102).
[21:46:44] + News From Folding@Home: Welcome to Folding@Home
[21:46:44] Loaded queue successfully.
[21:46:46] + Closed connections
[21:46:51] 
[21:46:51] + Processing work unit
[21:46:51] Core required: FahCore_11.exe
[21:46:51] Core found.
[21:46:51] Working on queue slot 05 [July 31 21:46:51 UTC]
[21:46:51] + Working ...
[21:46:51] 
[21:46:51] *------------------------------*
[21:46:51] Folding@Home GPU Core - Beta
[21:46:51] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[21:46:51] 
[21:46:51] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[21:46:51] Build host: amoeba
[21:46:51] Board Type: AMD
[21:46:51] Core      : 
[21:46:51] Preparing to commence simulation
[21:46:51] - Looking at optimizations...
[21:46:51] - Created dyn
[21:46:51] - Files status OK
[21:46:51] - Expanded 68495 -> 357580 (decompressed 522.0 percent)
[21:46:51] Called DecompressByteArray: compressed_data_size=68495 data_size=357580, decompressed_data_size=357580 diff=0
[21:46:51] - Digital signature verified
[21:46:51] 
[21:46:51] Project: 5745 (Run 2, Clone 34, Gen 337)
[21:46:51] 
[21:46:51] Assembly optimizations on if available.
[21:46:51] Entering M.D.
[21:46:57] Tpr hash work/wudata_05.tpr:  3292426534 3612594659 715314425 3296096959 2941335996
[21:46:57] Working on Protein
[21:46:57] Client config found, loading data.
[21:46:57] Starting GUI Server
[21:46:59] mdrun_gpu returned 
[21:46:59] SHAKE violations on GPU
[21:46:59] 
[21:46:59] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:47:03] CoreStatus = 7A (122)
[21:47:03] Sending work to server
[21:47:03] Project: 5745 (Run 2, Clone 34, Gen 337)
[21:47:03] - Read packet limit of 540015616... Set to 524286976.
[21:47:03] - Error: Could not get length of results file work/wuresults_05.dat
[21:47:03] - Error: Could not read unit 05 file. Removing from queue.
[21:47:03] EUE limit exceeded. Pausing 24 hours.

Re: Project: 5745 (Run 2, Clone 34, Gen 337)

Posted: Sat Aug 01, 2009 3:23 am
by LIVESTRONG
I've seen a couple of threads about 48xx-series cards running into this issue. Have you overclocked the cards?

You can experience an unstable machine with high overclocks even with low and stable temperatures (<80C).

If clock speed is not the issue, perhaps check whether your voltages are adequate compared to other 48xx cards.

Re: Project: 5745 (Run 2, Clone 34, Gen 337)

Posted: Sat Aug 01, 2009 6:01 am
by susato
Thanks for the reports, folks. Similar reports of early failures on otherwise stable rigs tend to point to the WU as the problem, rather than the hardware.
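
For anyone who wants to check their own log the same way, here is a rough sketch (not an official tool) that tallies "SHAKE violations on GPU" errors per work unit. The function name `count_shake_failures` is made up for illustration, and the parsing assumes the log lines look like the ones posted above. If one Project/Run/Clone/Gen collects all the failures while other WUs run clean, that points at the WU rather than the card.

```python
import re
from collections import Counter

# Matches the work-unit identifier line as it appears in the logs above.
WU_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def count_shake_failures(log_text):
    """Return a Counter mapping (project, run, clone, gen) to SHAKE failure count.

    Hypothetical helper: it attributes each SHAKE error to the most recent
    "Project:" line seen before it in the log.
    """
    failures = Counter()
    current_wu = None
    for line in log_text.splitlines():
        match = WU_RE.search(line)
        if match:
            current_wu = tuple(int(group) for group in match.groups())
        elif "SHAKE violations on GPU" in line and current_wu is not None:
            failures[current_wu] += 1
    return failures


# A few lines excerpted from the second log in this thread.
sample = """\
[21:46:31] Project: 5745 (Run 2, Clone 34, Gen 337)
[21:46:39] SHAKE violations on GPU
[21:46:43] Project: 5745 (Run 2, Clone 34, Gen 337)
[21:46:51] Project: 5745 (Run 2, Clone 34, Gen 337)
[21:46:59] SHAKE violations on GPU
"""

print(count_shake_failures(sample))
```

To run it on a full log, read the file into a string and pass that in place of `sample`.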

Re: Project: 5745 (Run 2, Clone 34, Gen 337)

Posted: Fri Aug 07, 2009 3:46 pm
by DrSpalding
You can add my machine to the pile on this WU as well. Same error: SHAKE violations on GPU.

Running Vista 64-bit with an ATI/AMD 48xx of some sort. Definitely not overclocked, and the room temperature is 65°F this morning, so I don't suspect any overheating issues. I've never had any stability issues thus far with this pretty new rig.

Re: Project: 5745 (Run 2, Clone 34, Gen 337)

Posted: Thu Aug 20, 2009 4:03 am
by vladh4x0r
Just got this work unit (P5745,R2,C34,G337), same experience as others: "SHAKE violations on GPU" 2-3 seconds after starting. This is on a secondary 4830 that has crunched hundreds of other WUs so far with few complaints.
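
In case it helps compare reports, the "seconds after starting" figure can be pulled straight from the log timestamps. This is only a sketch: `seconds_to_shake` is a made-up name, and it assumes the `[HH:MM:SS]` prefix and the "Starting GUI Server" and "SHAKE violations on GPU" messages appear exactly as in the logs above.

```python
import re
from datetime import datetime

# Splits a log line into its [HH:MM:SS] timestamp and the message after it.
STAMP_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] (.*)$")


def seconds_to_shake(log_text):
    """Return the delay in seconds from each GUI server start to the SHAKE error.

    Hypothetical helper: one entry per failed attempt found in the log.
    """
    delays = []
    started = None
    for line in log_text.splitlines():
        match = STAMP_RE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        message = match.group(2)
        if message.startswith("Starting GUI Server"):
            started = stamp
        elif message.startswith("SHAKE violations on GPU") and started is not None:
            delta = (stamp - started).total_seconds()
            if delta < 0:
                delta += 24 * 60 * 60  # the log has no dates, so allow for midnight rollover
            delays.append(int(delta))
            started = None
    return delays


# Lines excerpted from the first two logs in this thread.
sample = """\
[18:59:14] Starting GUI Server
[18:59:17] mdrun_gpu returned 
[18:59:17] SHAKE violations on GPU
[21:46:37] Starting GUI Server
[21:46:39] mdrun_gpu returned 
[21:46:39] SHAKE violations on GPU
"""

print(seconds_to_shake(sample))
```

Consistently short delays across different machines would fit the pattern reported in this thread.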