GPU folding - random freezes

Breach
Posts: 204
Joined: Sat Mar 09, 2013 8:07 pm
Location: Brussels, Belgium

GPU folding - random freezes

Post by Breach »

Hi,

I recently switched to a GTX 670 card and have a new issue with random Windows freezes when folding on Full. All of a sudden the OS will freeze (literally everything freezes) for roughly 30 seconds to 1 minute. Then things continue as before... until it happens again about 5 minutes later. When this happens there are no crashes and no errors in the FAH log or the Windows event log. Note that I am talking about freezes, not 1-2 second lag...

I have been folding on Full on my GTX 295 before with the same driver without any such issues whatsoever (different core though).

Any thoughts? Log below:

*********************** Log Started 2013-04-05T12:21:12Z ***********************
12:21:12:************************* Folding@home Client *************************
12:21:12:      Website: http://folding.stanford.edu/
12:21:12:    Copyright: (c) 2009-2013 Stanford University
12:21:12:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
12:21:12:         Args: --open-web-control
12:21:12:       Config: C:/ProgramData/FAHClient/config.xml
12:21:12:******************************** Build ********************************
12:21:12:      Version: 7.3.6
12:21:12:         Date: Feb 18 2013
12:21:12:         Time: 15:25:17
12:21:12:      SVN Rev: 3923
12:21:12:       Branch: fah/trunk/client
12:21:12:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
12:21:12:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
12:21:12:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
12:21:12:     Platform: win32 XP
12:21:12:         Bits: 32
12:21:12:         Mode: Release
12:21:12:******************************* System ********************************
12:21:12:          CPU: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
12:21:12:       CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
12:21:12:         CPUs: 8
12:21:12:       Memory: 7.88GiB
12:21:12:  Free Memory: 6.49GiB
12:21:12:      Threads: WINDOWS_THREADS
12:21:12:  Has Battery: false
12:21:12:   On Battery: false
12:21:12:   UTC offset: 2
12:21:12:          PID: 3536
12:21:12:          CWD: C:/ProgramData/FAHClient
12:21:12:           OS: Windows 8 Pro with Media Center
12:21:12:      OS Arch: AMD64
12:21:12:         GPUs: 1
12:21:12:        GPU 0: NVIDIA:3 GK104 [GeForce GTX 670]
12:21:12:         CUDA: 3.0
12:21:12:  CUDA Driver: 5000
12:21:12:Win32 Service: false
12:21:12:***********************************************************************
12:21:12:<config>
12:21:12:  <!-- Folding Core -->
12:21:12:  <checkpoint v='10'/>
12:21:12:
12:21:12:  <!-- Folding Slot Configuration -->
12:21:12:  <power v='full'/>
12:21:12:
12:21:12:  <!-- Network -->
12:21:12:  <proxy v=':8080'/>
12:21:12:
12:21:12:  <!-- User Information -->
12:21:12:  <passkey v='********************************'/>
12:21:12:  <team v='845'/>
12:21:12:  <user v='Alexander_Ivanchev'/>
12:21:12:
12:21:12:  <!-- Folding Slots -->
12:21:12:  <slot id='0' type='GPU'/>
12:21:12:  <slot id='2' type='CPU'/>
12:21:12:</config>
12:21:12:Trying to access database...
12:21:12:Successfully acquired database lock
12:21:12:Enabled folding slot 00: READY gpu:0:GK104 [GeForce GTX 670]
12:21:12:Enabled folding slot 02: READY cpu:8
12:21:12:WU01:FS02:Starting
12:21:12:WU01:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 01 -suffix 01 -version 703 -lifeline 3536 -checkpoint 10 -np 8
12:21:12:WU01:FS02:Started FahCore on PID 4564
12:21:12:WU01:FS02:Core PID:320
12:21:12:WU01:FS02:FahCore 0xa4 started
12:21:12:WU00:FS00:Starting
12:21:12:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 00 -suffix 01 -version 703 -lifeline 3536 -checkpoint 10 -gpu 0 -gpu-vendor nvidia
12:21:12:WU00:FS00:Started FahCore on PID 5144
12:21:12:WU00:FS00:Core PID:1140
12:21:12:WU00:FS00:FahCore 0x15 started
12:21:13:WU01:FS02:0xa4:
12:21:13:WU01:FS02:0xa4:*------------------------------*
12:21:13:WU01:FS02:0xa4:Folding@Home Gromacs GB Core
12:21:13:WU01:FS02:0xa4:Version 2.27 (Dec. 15, 2010)
12:21:13:WU01:FS02:0xa4:
12:21:13:WU01:FS02:0xa4:Preparing to commence simulation
12:21:13:WU01:FS02:0xa4:- Looking at optimizations...
12:21:13:WU01:FS02:0xa4:- Files status OK
12:21:13:WU01:FS02:0xa4:- Expanded 108998 -> 443200 (decompressed 406.6 percent)
12:21:13:WU01:FS02:0xa4:Called DecompressByteArray: compressed_data_size=108998 data_size=443200, decompressed_data_size=443200 diff=0
12:21:13:WU01:FS02:0xa4:- Digital signature verified
12:21:13:WU01:FS02:0xa4:
12:21:13:WU01:FS02:0xa4:Project: 7085 (Run 0, Clone 891, Gen 9)
12:21:13:WU01:FS02:0xa4:
12:21:13:WU01:FS02:0xa4:Assembly optimizations on if available.
12:21:13:WU01:FS02:0xa4:Entering M.D.
12:21:13:WU00:FS00:0x15:
12:21:13:WU00:FS00:0x15:*------------------------------*
12:21:13:WU00:FS00:0x15:Folding@Home GPU Core
12:21:13:WU00:FS00:0x15:Version                2.25 (Wed May 9 17:03:01 EDT 2012)
12:21:13:WU00:FS00:0x15:Build host             AmoebaRemote
12:21:13:WU00:FS00:0x15:Board Type             NVIDIA/CUDA
12:21:13:WU00:FS00:0x15:Core                   15
12:21:13:WU00:FS00:0x15:
12:21:13:WU00:FS00:0x15:Window's signal control handler registered.
12:21:13:WU00:FS00:0x15:Preparing to commence simulation
12:21:13:WU00:FS00:0x15:- Looking at optimizations...
12:21:13:WU00:FS00:0x15:- Files status OK
12:21:13:WU00:FS00:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
12:21:13:WU00:FS00:0x15:- Expanded 58269 -> 257358 (decompressed 441.6 percent)
12:21:13:WU00:FS00:0x15:Called DecompressByteArray: compressed_data_size=58269 data_size=257358, decompressed_data_size=257358 diff=0
12:21:13:WU00:FS00:0x15:- Digital signature verified
12:21:13:WU00:FS00:0x15:
12:21:13:WU00:FS00:0x15:Project: 8070 (Run 149, Clone 5, Gen 25)
12:21:13:WU00:FS00:0x15:
12:21:13:WU00:FS00:0x15:Assembly optimizations on if available.
12:21:13:WU00:FS00:0x15:Entering M.D.
12:21:14:WU00:FS00:0x15:Will resume from checkpoint file 00/wudata_01.ckp
12:21:14:WU00:FS00:0x15:Tpr hash 00/wudata_01.tpr:  2829821335 2146261007 1216469112 4170169621 3443570522
12:21:14:WU00:FS00:0x15:GPU device id=0
12:21:14:WU00:FS00:0x15:Working on Gallium Rubidium Oxygen Manganese Argon Carbon Silicon t= 149.00000
12:21:14:WU00:FS00:0x15:Client config unavailable.
12:21:15:WU00:FS00:0x15:Starting GUI Server
12:21:17:3:127.0.0.1:New Web connection
12:21:18:WU01:FS02:0xa4:Mapping NT from 8 to 8 
12:21:18:WU01:FS02:0xa4:Completed 0 out of 10000000 steps  (0%)
12:22:16:WU00:FS00:0x15:Resuming from checkpoint
12:22:16:WU00:FS00:0x15:fcCheckPointResume: retreived and current tpr file hash:
12:22:16:WU00:FS00:0x15:   0   2829821335   2829821335
12:22:16:WU00:FS00:0x15:   1   2146261007   2146261007
12:22:16:WU00:FS00:0x15:   2   1216469112   1216469112
12:22:16:WU00:FS00:0x15:   3   4170169621   4170169621
12:22:16:WU00:FS00:0x15:   4   3443570522   3443570522
12:22:16:WU00:FS00:0x15:fcCheckPointResume: file hashes same.
12:22:16:WU00:FS00:0x15:fcCheckPointResume: state restored.
12:22:16:WU00:FS00:0x15:fcCheckPointResume: name 00/wudata_01.log Verified 00/wudata_01.log
12:22:16:WU00:FS00:0x15:fcCheckPointResume: name 00/wudata_01.trr Verified 00/wudata_01.trr
12:22:16:WU00:FS00:0x15:fcCheckPointResume: name 00/wudata_01.xtc Verified 00/wudata_01.xtc
12:22:16:WU00:FS00:0x15:fcCheckPointResume: name 00/wudata_01.edr Verified 00/wudata_01.edr
12:22:16:WU00:FS00:0x15:fcCheckPointResume: state restored 2
12:22:16:WU00:FS00:0x15:Resumed from checkpoint
12:22:16:WU00:FS00:0x15:Setting checkpoint frequency: 500000
12:22:16:WU00:FS00:0x15:Completed  11000001 out of 50000000 steps (22%).
12:22:17:WARNING:WU00:FS00:Detected clock skew (1 mins 05 secs), adjusting time estimates
12:24:12:WU00:FS00:0x15:Completed  11500000 out of 50000000 steps (23%).
12:26:06:WU00:FS00:0x15:Completed  12000000 out of 50000000 steps (24%).
12:28:00:WU00:FS00:0x15:Completed  12500000 out of 50000000 steps (25%).
12:29:55:WU00:FS00:0x15:Completed  13000000 out of 50000000 steps (26%).
12:31:49:WU00:FS00:0x15:Completed  13500000 out of 50000000 steps (27%).
12:32:08:WU01:FS02:0xa4:Completed 100000 out of 10000000 steps  (1%)
12:33:44:WU00:FS00:0x15:Completed  14000000 out of 50000000 steps (28%).
12:35:39:WU00:FS00:0x15:Completed  14500000 out of 50000000 steps (29%).
12:37:32:WU00:FS00:0x15:Completed  15000000 out of 50000000 steps (30%).
12:39:26:WU00:FS00:0x15:Completed  15500000 out of 50000000 steps (31%).
12:41:20:WU00:FS00:0x15:Completed  16000000 out of 50000000 steps (32%).
12:43:03:WU01:FS02:0xa4:Completed 200000 out of 10000000 steps  (2%)
12:43:14:WU00:FS00:0x15:Completed  16500000 out of 50000000 steps (33%).
12:45:07:WU00:FS00:0x15:Completed  17000000 out of 50000000 steps (34%).
12:47:01:WU00:FS00:0x15:Completed  17500000 out of 50000000 steps (35%).
12:48:54:WU00:FS00:0x15:Completed  18000000 out of 50000000 steps (36%).
12:50:48:WU00:FS00:0x15:Completed  18500000 out of 50000000 steps (37%).
12:52:42:WU00:FS00:0x15:Completed  19000000 out of 50000000 steps (38%).
12:54:35:WU00:FS00:0x15:Completed  19500000 out of 50000000 steps (39%).
12:56:28:WU00:FS00:0x15:Completed  20000000 out of 50000000 steps (40%).
12:58:21:WU00:FS00:0x15:Completed  20500000 out of 50000000 steps (41%).
13:00:15:WU00:FS00:0x15:Completed  21000000 out of 50000000 steps (42%).
13:02:08:WU00:FS00:0x15:Completed  21500000 out of 50000000 steps (43%).
13:04:01:WU00:FS00:0x15:Completed  22000000 out of 50000000 steps (44%).
13:05:55:WU00:FS00:0x15:Completed  22500000 out of 50000000 steps (45%).
13:07:48:WU00:FS00:0x15:Completed  23000000 out of 50000000 steps (46%).
13:09:41:WU00:FS00:0x15:Completed  23500000 out of 50000000 steps (47%).
13:11:35:WU00:FS00:0x15:Completed  24000000 out of 50000000 steps (48%).
13:13:28:WU00:FS00:0x15:Completed  24500000 out of 50000000 steps (49%).
13:15:21:WU00:FS00:0x15:Completed  25000000 out of 50000000 steps (50%).
13:17:15:WU00:FS00:0x15:Completed  25500000 out of 50000000 steps (51%).
13:19:08:WU00:FS00:0x15:Completed  26000000 out of 50000000 steps (52%).
13:21:01:WU00:FS00:0x15:Completed  26500000 out of 50000000 steps (53%).
13:22:55:WU00:FS00:0x15:Completed  27000000 out of 50000000 steps (54%).
13:24:48:WU00:FS00:0x15:Completed  27500000 out of 50000000 steps (55%).
13:26:41:WU00:FS00:0x15:Completed  28000000 out of 50000000 steps (56%).
13:28:35:WU00:FS00:0x15:Completed  28500000 out of 50000000 steps (57%).
13:30:28:WU00:FS00:0x15:Completed  29000000 out of 50000000 steps (58%).
13:32:21:WU00:FS00:0x15:Completed  29500000 out of 50000000 steps (59%).
13:34:15:WU00:FS00:0x15:Completed  30000000 out of 50000000 steps (60%).
13:36:08:WU00:FS00:0x15:Completed  30500000 out of 50000000 steps (61%).
13:38:01:WU00:FS00:0x15:Completed  31000000 out of 50000000 steps (62%).
13:38:41:WU01:FS02:0xa4:Completed 300000 out of 10000000 steps  (3%)
13:39:56:WU00:FS00:0x15:Completed  31500000 out of 50000000 steps (63%).
13:41:49:WU00:FS00:0x15:Completed  32000000 out of 50000000 steps (64%).
13:43:43:WU00:FS00:0x15:Completed  32500000 out of 50000000 steps (65%).
13:45:37:WU00:FS00:0x15:Completed  33000000 out of 50000000 steps (66%).
13:47:31:WU00:FS00:0x15:Completed  33500000 out of 50000000 steps (67%).
13:48:29:WU01:FS02:0xa4:Completed 400000 out of 10000000 steps  (4%)
13:49:25:WU00:FS00:0x15:Completed  34000000 out of 50000000 steps (68%).
13:51:19:WU00:FS00:0x15:Completed  34500000 out of 50000000 steps (69%).
13:53:13:WU00:FS00:0x15:Completed  35000000 out of 50000000 steps (70%).
13:55:07:WU00:FS00:0x15:Completed  35500000 out of 50000000 steps (71%).
13:57:01:WU00:FS00:0x15:Completed  36000000 out of 50000000 steps (72%).
13:58:05:WU01:FS02:0xa4:Completed 500000 out of 10000000 steps  (5%)
13:58:54:WU00:FS00:0x15:Completed  36500000 out of 50000000 steps (73%).
14:00:48:WU00:FS00:0x15:Completed  37000000 out of 50000000 steps (74%).
14:02:42:WU00:FS00:0x15:Completed  37500000 out of 50000000 steps (75%).
14:04:36:WU00:FS00:0x15:Completed  38000000 out of 50000000 steps (76%).
14:06:29:WU00:FS00:0x15:Completed  38500000 out of 50000000 steps (77%).
14:07:51:WU01:FS02:0xa4:Completed 600000 out of 10000000 steps  (6%)
14:08:23:WU00:FS00:0x15:Completed  39000000 out of 50000000 steps (78%).
14:10:17:WU00:FS00:0x15:Completed  39500000 out of 50000000 steps (79%).
14:12:11:WU00:FS00:0x15:Completed  40000000 out of 50000000 steps (80%).
14:14:04:WU00:FS00:0x15:Completed  40500000 out of 50000000 steps (81%).
14:15:58:WU00:FS00:0x15:Completed  41000000 out of 50000000 steps (82%).
14:17:41:WU01:FS02:0xa4:Completed 700000 out of 10000000 steps  (7%)
14:17:52:WU00:FS00:0x15:Completed  41500000 out of 50000000 steps (83%).
14:19:46:WU00:FS00:0x15:Completed  42000000 out of 50000000 steps (84%).
14:21:39:WU00:FS00:0x15:Completed  42500000 out of 50000000 steps (85%).
14:23:33:WU00:FS00:0x15:Completed  43000000 out of 50000000 steps (86%).
14:25:27:WU00:FS00:0x15:Completed  43500000 out of 50000000 steps (87%).
14:27:20:WU00:FS00:0x15:Completed  44000000 out of 50000000 steps (88%).
14:27:38:WU01:FS02:0xa4:Completed 800000 out of 10000000 steps  (8%)
14:29:14:WU00:FS00:0x15:Completed  44500000 out of 50000000 steps (89%).
14:31:08:WU00:FS00:0x15:Completed  45000000 out of 50000000 steps (90%).
14:33:02:WU00:FS00:0x15:Completed  45500000 out of 50000000 steps (91%).
14:34:56:WU00:FS00:0x15:Completed  46000000 out of 50000000 steps (92%).
14:36:50:WU00:FS00:0x15:Completed  46500000 out of 50000000 steps (93%).
14:38:18:WU01:FS02:0xa4:Completed 900000 out of 10000000 steps  (9%)
14:38:44:WU00:FS00:0x15:Completed  47000000 out of 50000000 steps (94%).
14:40:38:WU00:FS00:0x15:Completed  47500000 out of 50000000 steps (95%).
14:42:32:WU00:FS00:0x15:Completed  48000000 out of 50000000 steps (96%).
14:44:26:WU00:FS00:0x15:Completed  48500000 out of 50000000 steps (97%).
14:46:20:WU00:FS00:0x15:Completed  49000000 out of 50000000 steps (98%).
14:46:20:WU02:FS00:Connecting to assign-GPU.stanford.edu:80
14:46:20:WU02:FS00:News: Welcome to Folding@Home
14:46:20:WU02:FS00:Assigned to work server 171.67.108.36
14:46:20:WU02:FS00:Requesting new work unit for slot 00: RUNNING gpu:0:GK104 [GeForce GTX 670] from 171.67.108.36
14:46:20:WU02:FS00:Connecting to 171.67.108.36:8080
14:46:21:WU02:FS00:Downloading 57.46KiB
14:46:22:WU02:FS00:Download complete
14:46:22:WU02:FS00:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:8070 run:43 clone:8 gen:7 core:0x15 unit:0x0000000c6652edb45122df35bd7fcaed
14:48:13:WU00:FS00:0x15:Completed  49500000 out of 50000000 steps (99%).
14:48:14:WU01:FS02:0xa4:Completed 1000000 out of 10000000 steps  (10%)
14:50:07:WU00:FS00:0x15:Completed  50000000 out of 50000000 steps (100%).
14:50:07:WU00:FS00:0x15:Finished fah_main status=0
14:50:07:WU00:FS00:0x15:Successful run
14:50:07:WU00:FS00:0x15:DynamicWrapper: Finished Work Unit: sleep=10000
14:50:17:WU00:FS00:0x15:Reserved 322372 bytes for xtc file; Cosm status=0
14:50:17:WU00:FS00:0x15:Allocated 322372 bytes for xtc file
14:50:17:WU00:FS00:0x15:- Reading up to 322372 from "00/wudata_01.xtc": Read 322372
14:50:17:WU00:FS00:0x15:Read 322372 bytes from xtc file; available packet space=786108092
14:50:17:WU00:FS00:0x15:xtc file hash check passed.
14:50:17:WU00:FS00:0x15:Reserved 20112 20112 786108092 bytes for arc file=<00/wudata_01.trr> Cosm status=0
14:50:17:WU00:FS00:0x15:Allocated 20112 bytes for arc file
14:50:17:WU00:FS00:0x15:- Reading up to 20112 from "00/wudata_01.trr": Read 20112
14:50:17:WU00:FS00:0x15:Read 20112 bytes from arc file; available packet space=786087980
14:50:17:WU00:FS00:0x15:trr file hash check passed.
14:50:17:WU00:FS00:0x15:Allocated 544 bytes for edr file
14:50:17:WU00:FS00:0x15:Read bedfile
14:50:17:WU00:FS00:0x15:edr file hash check passed.
14:50:17:WU00:FS00:0x15:Allocated 37406 bytes for logfile
14:50:17:WU00:FS00:0x15:Read logfile
14:50:17:WU00:FS00:0x15:GuardedRun: success in DynamicWrapper
14:50:17:WU00:FS00:0x15:GuardedRun: done
14:50:17:WU00:FS00:0x15:Run: GuardedRun completed.
14:50:21:WU00:FS00:0x15:+ Opened results file
14:50:21:WU00:FS00:0x15:- Writing 380946 bytes of core data to disk...
14:50:21:WU00:FS00:0x15:Done: 380434 -> 349204 (compressed to 91.7 percent)
14:50:21:WU00:FS00:0x15:  ... Done.
14:50:21:WU00:FS00:0x15:DeleteFrameFiles: successfully deleted file=00/wudata_01.ckp
14:50:21:WU00:FS00:0x15:Shutting down core 
14:50:21:WU00:FS00:0x15:
14:50:21:WU00:FS00:0x15:Folding@home Core Shutdown: FINISHED_UNIT
14:50:21:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
14:50:21:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:8070 run:149 clone:5 gen:25 core:0x15 unit:0x0000001e6652edb45122dfcd0e1ed2e0
14:50:21:WU00:FS00:Uploading 341.52KiB to 171.67.108.36
14:50:21:WU00:FS00:Connecting to 171.67.108.36:8080
14:50:22:WU02:FS00:Starting
14:50:22:WU02:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 02 -suffix 01 -version 703 -lifeline 3536 -checkpoint 10 -gpu 0 -gpu-vendor nvidia
14:50:22:WU02:FS00:Started FahCore on PID 1116
14:50:22:WU02:FS00:Core PID:5844
14:50:22:WU02:FS00:FahCore 0x15 started
14:50:22:WU02:FS00:0x15:
14:50:22:WU02:FS00:0x15:*------------------------------*
14:50:22:WU02:FS00:0x15:Folding@Home GPU Core
14:50:22:WU02:FS00:0x15:Version                2.25 (Wed May 9 17:03:01 EDT 2012)
14:50:22:WU02:FS00:0x15:Build host             AmoebaRemote
14:50:22:WU02:FS00:0x15:Board Type             NVIDIA/CUDA
14:50:22:WU02:FS00:0x15:Core                   15
14:50:22:WU02:FS00:0x15:
14:50:22:WU02:FS00:0x15:Window's signal control handler registered.
14:50:22:WU02:FS00:0x15:Preparing to commence simulation
14:50:22:WU02:FS00:0x15:- Looking at optimizations...
14:50:22:WU02:FS00:0x15:DeleteFrameFiles: successfully deleted file=02/wudata_01.ckp
14:50:22:WU02:FS00:0x15:- Created dyn
14:50:22:WU02:FS00:0x15:- Files status OK
14:50:22:WU02:FS00:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
14:50:22:WU02:FS00:0x15:- Expanded 58324 -> 257358 (decompressed 441.2 percent)
14:50:22:WU02:FS00:0x15:Called DecompressByteArray: compressed_data_size=58324 data_size=257358, decompressed_data_size=257358 diff=0
14:50:22:WU02:FS00:0x15:- Digital signature verified
14:50:22:WU02:FS00:0x15:
14:50:22:WU02:FS00:0x15:Project: 8070 (Run 43, Clone 8, Gen 7)
14:50:22:WU02:FS00:0x15:
14:50:22:WU02:FS00:0x15:Assembly optimizations on if available.
14:50:22:WU02:FS00:0x15:Entering M.D.
14:50:23:WU00:FS00:Upload complete
14:50:23:WU00:FS00:Server responded WORK_ACK (400)
14:50:23:WU00:FS00:Final credit estimate, 3874.00 points
14:50:23:WU00:FS00:Cleaning up
14:50:24:WU02:FS00:0x15:Tpr hash 02/wudata_01.tpr:  2115211127 3947659710 2124516579 1129211837 175691432
14:50:24:WU02:FS00:0x15:GPU device id=0
14:50:24:WU02:FS00:0x15:Working on Gallium Rubidium Oxygen Manganese Argon Carbon Silicon t=  43.00000
14:50:24:WU02:FS00:0x15:Client config unavailable.
14:50:24:WU02:FS00:0x15:Starting GUI Server
14:51:25:WU02:FS00:0x15:Setting checkpoint frequency: 500000
14:51:25:WU02:FS00:0x15:Completed         3 out of 50000000 steps (0%).
14:53:19:WU02:FS00:0x15:Completed    500000 out of 50000000 steps (1%).
14:55:14:WU02:FS00:0x15:Completed   1000000 out of 50000000 steps (2%).
14:57:08:WU02:FS00:0x15:Completed   1500000 out of 50000000 steps (3%).
14:57:08:WARNING:WU01:FS02:Detected clock skew (1 mins 01 secs), adjusting time estimates
14:57:08:WARNING:WU02:FS00:Detected clock skew (1 mins 01 secs), adjusting time estimates
14:59:01:WARNING:WU01:FS02:Detected clock skew (1 mins 44 secs), adjusting time estimates
14:59:01:WARNING:WU02:FS00:Detected clock skew (1 mins 44 secs), adjusting time estimates
14:59:01:WU02:FS00:0x15:Completed   2000000 out of 50000000 steps (4%).
14:59:19:WU01:FS02:0xa4:Completed 1100000 out of 10000000 steps  (11%)
15:00:54:WARNING:WU01:FS02:Detected clock skew (1 mins 49 secs), adjusting time estimates
15:00:54:WARNING:WU02:FS00:Detected clock skew (1 mins 49 secs), adjusting time estimates
15:00:54:WU02:FS00:0x15:Completed   2500000 out of 50000000 steps (5%).
15:00:59:FS00:Shutting core down
15:00:59:FS02:Shutting core down
15:01:01:WU01:FS02:0xa4:Client no longer detected. Shutting down core 
15:01:01:WU01:FS02:0xa4:
15:01:01:WU01:FS02:0xa4:Folding@home Core Shutdown: CLIENT_DIED
15:01:02:WU01:FS02:FahCore returned: INTERRUPTED (102 = 0x66)
15:01:02:WU02:FS00:0x15:Client no longer detected. Shutting down core 
15:01:02:WU02:FS00:0x15:
15:01:02:WU02:FS00:0x15:Folding@home Core Shutdown: CLIENT_DIED
15:01:02:WU01:FS02:Starting
15:01:02:WARNING:WU01:FS02:Changed SMP threads from 8 to 7 this can cause some work units to fail
15:01:02:WU01:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 01 -suffix 01 -version 703 -lifeline 3536 -checkpoint 10 -np 7
15:01:02:WU01:FS02:Started FahCore on PID 3416
15:01:02:WU01:FS02:Core PID:1396
15:01:02:WU01:FS02:FahCore 0xa4 started
15:01:02:WU02:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
15:01:02:WU01:FS02:0xa4:
15:01:02:WU01:FS02:0xa4:*------------------------------*
15:01:02:WU01:FS02:0xa4:Folding@Home Gromacs GB Core
15:01:02:WU01:FS02:0xa4:Version 2.27 (Dec. 15, 2010)
15:01:02:WU01:FS02:0xa4:
15:01:02:WU01:FS02:0xa4:Preparing to commence simulation
15:01:02:WU01:FS02:0xa4:- Looking at optimizations...
15:01:02:WU01:FS02:0xa4:- Files status OK
15:01:02:WU01:FS02:0xa4:- Expanded 108998 -> 443200 (decompressed 406.6 percent)
15:01:02:WU01:FS02:0xa4:Called DecompressByteArray: compressed_data_size=108998 data_size=443200, decompressed_data_size=443200 diff=0
15:01:02:WU01:FS02:0xa4:- Digital signature verified
15:01:02:WU01:FS02:0xa4:
15:01:02:WU01:FS02:0xa4:Project: 7085 (Run 0, Clone 891, Gen 9)
15:01:02:WU01:FS02:0xa4:
15:01:02:WU01:FS02:0xa4:Assembly optimizations on if available.
15:01:02:WU01:FS02:0xa4:Entering M.D.
15:01:08:WU01:FS02:0xa4:Using Gromacs checkpoints
15:01:08:WU01:FS02:0xa4:Mapping NT from 7 to 7 
15:01:08:WU01:FS02:0xa4:Resuming from checkpoint
15:01:08:WU01:FS02:0xa4:Verified 01/wudata_01.log
15:01:08:WU01:FS02:0xa4:Verified 01/wudata_01.trr
15:01:08:WU01:FS02:0xa4:Verified 01/wudata_01.xtc
15:01:08:WU01:FS02:0xa4:Verified 01/wudata_01.edr
15:01:08:WU01:FS02:0xa4:Completed 1000001 out of 10000000 steps  (10%)
15:08:53:WU01:FS02:0xa4:Completed 1100000 out of 10000000 steps  (11%)
15:16:16:WU01:FS02:0xa4:Completed 1200000 out of 10000000 steps  (12%)
15:23:34:WU01:FS02:0xa4:Completed 1300000 out of 10000000 steps  (13%)
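
One way to check whether the freezes change the pace of a slot is to measure the wall-clock time between consecutive progress lines. The following is only a rough sketch, assuming the line layout seen in the log above (`HH:MM:SS:WUxx:FSxx:0xNN:Completed ... out of ... steps`); the function name `progress_intervals` is made up for illustration and is not part of FAHClient.

```python
import re

# Assumed layout, taken from the log above, e.g.
# 12:24:12:WU00:FS00:0x15:Completed  11500000 out of 50000000 steps (23%).
PROGRESS_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):(WU\d+):(FS\d+):0x[0-9a-fA-F]+:"
    r"Completed\s+(\d+) out of (\d+) steps"
)


def progress_intervals(lines, slot):
    """Return seconds elapsed between consecutive progress lines of one slot."""
    intervals = []
    previous = None
    for line in lines:
        match = PROGRESS_RE.match(line.strip())
        if not match or match.group(5) != slot:
            continue
        hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
        now = hours * 3600 + minutes * 60 + seconds
        if previous is not None:
            # The log only carries time of day, so wrap around midnight.
            intervals.append((now - previous) % 86400)
        previous = now
    return intervals


sample = [
    "12:24:12:WU00:FS00:0x15:Completed  11500000 out of 50000000 steps (23%).",
    "12:26:06:WU00:FS00:0x15:Completed  12000000 out of 50000000 steps (24%).",
    "12:28:00:WU00:FS00:0x15:Completed  12500000 out of 50000000 steps (25%).",
    "12:32:08:WU01:FS02:0xa4:Completed 100000 out of 10000000 steps  (1%)",
]
print(progress_intervals(sample, "FS00"))  # [114, 114]
```

In the log above the GPU slot reports progress roughly every 1 minute 54 seconds, including around the clock skew warnings, so the intervals alone may not reveal a freeze.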
Windows 11 x64 / 5800X@5Ghz / 32GB DDR4 3800 CL14 / 4090 FE / Creative Titanium HD / Sennheiser 650 / PSU Corsair AX1200i
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU folding - random freezes

Post by bruce »

Welcome to foldingforum.org, Breach.

Are you saying that FahCore_15 is being used now where FahCore_11 might have been used in the past? The stability of the two cores is not appreciably different. Both have had an occasional bad WU but that would show as an error in your log, which I don't see.

Most likely it's either a driver issue, in which case the long pauses you're talking about will show as an error in Windows Event Viewer, or it's a heat issue (which also might show in the Event Viewer). The 670 uses less power than the 295 but you may have selected a board manufacturer that dissipates heat less effectively. Have you tried adjusting your fan profile?
Breach
Posts: 204
Joined: Sat Mar 09, 2013 8:07 pm
Location: Brussels, Belgium

Re: GPU folding - random freezes

Post by Breach »

Yes, I'm using FahCore_15 now. Thanks. No, it's not heat related. I have a fan curve set up to keep load temps under 70C in order to prevent Kepler throttling. My power usage (TDP) is about 80% under FAH load.
Breach
Posts: 204
Joined: Sat Mar 09, 2013 8:07 pm
Location: Brussels, Belgium

Re: GPU folding - random freezes

Post by Breach »

Any other ideas? I reverted my CPU/GPU clocks to stock just in case it could be OC related and I'm still experiencing the exact same behaviour... When it happens FAH would afterwards log:
23:31:54:WARNING:WU00:FS00:Detected clock skew (1 mins 56 secs), adjusting time estimates

That probably reflects the time lost to the freeze. There are absolutely no errors logged - I only see warnings that Windows services were unable to connect to the Internet... it looks like the computer is completely unresponsive during that time (which I see myself). It could be either:

a) The FAH core - unlikely as there'd be other reports I'd guess
b) The nvidia driver - 314.22 - possible, but see above
c) My hardware - which is 100% stable in all other circumstances except for FAH GPU folding... the last BSOD I saw was probably 9 months ago...
d) Windows itself, driver, etc.

Really strange... not really sure how to proceed...
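
To see how often this happens, the clock skew warnings can be pulled out of the FAH log with a short script. This is just a sketch, assuming the warning format shown above; the function name `find_clock_skews` is hypothetical, and the handling of a message with seconds only (no "mins" part) is a guess, since no such line appears in the log.

```python
import re

# Assumed format, copied from the warning quoted above:
# 23:31:54:WARNING:WU00:FS00:Detected clock skew (1 mins 56 secs), adjusting time estimates
SKEW_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WARNING:(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"Detected clock skew \((?:(?P<mins>\d+) mins? )?(?P<secs>\d+) secs?\)"
)


def find_clock_skews(lines):
    """Return (log time, slot, skew in seconds) for each clock skew warning."""
    events = []
    for line in lines:
        match = SKEW_RE.match(line.strip())
        if match:
            # The minutes part is treated as optional (an assumption).
            seconds = int(match.group("mins") or 0) * 60 + int(match.group("secs"))
            events.append((match.group("time"), match.group("slot"), seconds))
    return events


sample = [
    "14:57:08:WU02:FS00:0x15:Completed   1500000 out of 50000000 steps (3%).",
    "14:57:08:WARNING:WU02:FS00:Detected clock skew (1 mins 01 secs), adjusting time estimates",
    "23:31:54:WARNING:WU00:FS00:Detected clock skew (1 mins 56 secs), adjusting time estimates",
]
for time, slot, seconds in find_clock_skews(sample):
    print(time, slot, seconds)
```

Feeding it the lines of the client log should give a list of when each warning was written and how large the reported skew was.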
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU folding - random freezes

Post by bruce »

You'll get the clock skew message any time the FahCore is stopped / resumed except if the actual pause in production is very short. It has nothing to do with the freeze ... except that the freeze forced you to pause/resume processing that WU.

Check the Windows Event Viewer. Some hardware and/or drivers detect GPU errors that used to result in a BSOD. Because the GPU can now be reset, these have become recoverable errors ... but they still get logged.
Breach
Posts: 204
Joined: Sat Mar 09, 2013 8:07 pm
Location: Brussels, Belgium

Re: GPU folding - random freezes

Post by Breach »

Yes, the WU was apparently paused in the sense of not being worked on for the duration of the freeze - otherwise the core does not shut down/resume.

No, no driver crashes - I know what a TDR crash looks like - with a TDR crash there's a screen redraw, a pop-up saying your driver has stopped responding and has recovered and there's a display warning entry in the event log - don't see that...
johnerz
Posts: 30
Joined: Thu Jun 19, 2008 3:21 pm
Hardware configuration: Intel 2600K @ stock
EVGA GTX 970 FTW+ @ stock
8gb Samsung green @ 1.55v 2040 9,10,10,1T
Asus P67 Sabertooth bios version 3209
Corsair hx 1000 psu
WD Black 500 GB
Win 7 64, updated,Microsoft Security Essentials - updated daily

SupermicroH8QGi+-F, 4 X AMD 6168 @ 1.9 no OC
Corsair HX 850 PSU 16 x 2GB HyperX 1600 ram
Ubuntu 12.04, using the musky/tear mods

Updated 03 Feb 2015

Re: GPU folding - random freezes

Post by johnerz »

There have been reports of a similar issue related to Windows Aero, and it is best to have it switched on. Other than that, have you changed the priority of the process?
johnerz

Intel 2600K @ stock
EVGA 670 FTW @ stock
12GB 1600
Asus P67 Sabertooth bios version 3209
Corsair hx 1000 psu
WD Black 500 GB

Win 7 64, updated
Microsoft Security Essentials - updated daily

Updated 4 Dec 2012
Breach
Posts: 204
Joined: Sat Mar 09, 2013 8:07 pm
Location: Brussels, Belgium

Re: GPU folding - random freezes

Post by Breach »

Thanks. I am using Windows 8 - hence no Aero ;-) No, I have not changed the priority of the process.
Breach
Posts: 204
Joined: Sat Mar 09, 2013 8:07 pm
Location: Brussels, Belgium

Re: GPU folding - random freezes

Post by Breach »

Just to give an update to any future potential victims - I have isolated the issue to an apparent incompatibility with ESET Smart Security 6's firewall module. I am following up on this on the ESET forum:

http://www.wilderssecurity.com/showthread.php?t=345564
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU folding - random freezes

Post by bruce »

I know nothing about your security software, but we have seen a few reports of problems with other AV products. Suggestions that might or might not help:
Adjust the virus scan to avoid the FAH work directory.
Authorize FAHClient to connect to the Internet.
Breach
Posts: 204
Joined: Sat Mar 09, 2013 8:07 pm
Location: Brussels, Belgium

Re: GPU folding - random freezes

Post by Breach »

Unfortunately this saga continued, and though I did extensive testing this turned out to be unrelated to ESS after all.

The issue seems fixed though (finally!). I moved to the latest nVidia beta drivers (320.00) and also flashed the latest BIOS for my EVGA GTX 670 FTW card (v80.04.5C) and it's gone for good now. Not sure whether it was the drivers or the BIOS, but I suspect the latter. Hopefully this will be useful to someone else.

Cheers.