CPU WU does not finish or progress
Posted: Thu Sep 10, 2015 5:36 pm
I have a CPU WU that has been running for a long time but is not making progress.
The FAHControl window shows some progress, but the log does not reflect it.
The job is listed at 50,500,000 steps (yes, I double-checked by counting the zeros).
I am running a Win 8.1 machine with an Intel i7 quad-core CPU and an NVIDIA GeForce GTX 980 Ti.
I just started folding with Folding@Home, having installed the software sometime after 1-Sep.
Below is a portion of my log file. I clicked Pause and then Fold to restart things, so the log file starts from the restart:
Mod edit: Please use Code tags around log file listings
Code:
17:11:42:WU00:FS00:Starting
17:11:42:WARNING:WU00:FS00:Changed SMP threads from 2 to 1 this can cause some work units to fail
17:11:42:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Chuck/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 704 -lifeline 5680 -checkpoint 5 -np 1
17:11:42:WU00:FS00:Started FahCore on PID 1480
17:11:42:WU00:FS00:Core PID:280
17:11:42:WU00:FS00:FahCore 0xa4 started
17:11:42:WU00:FS00:0xa4:
17:11:42:WU00:FS00:0xa4:*------------------------------*
17:11:42:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
17:11:42:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
17:11:42:WU00:FS00:0xa4:
17:11:42:WU00:FS00:0xa4:Preparing to commence simulation
17:11:42:WU00:FS00:0xa4:- Looking at optimizations...
17:11:42:WU00:FS00:0xa4:- Files status OK
17:11:42:WU00:FS00:0xa4:- Expanded 1866264 -> 3294644 (decompressed 176.5 percent)
17:11:42:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=1866264 data_size=3294644, decompressed_data_size=3294644 diff=0
17:11:42:WU00:FS00:0xa4:- Digital signature verified
17:11:42:WU00:FS00:0xa4:
17:11:42:WU00:FS00:0xa4:Project: 7520 (Run 19, Clone 27, Gen 99)
17:11:42:WU00:FS00:0xa4:
17:11:42:WU00:FS00:0xa4:Assembly optimizations on if available.
17:11:42:WU00:FS00:0xa4:Entering M.D.
17:11:48:WU00:FS00:0xa4:Using Gromacs checkpoints
17:11:48:WU00:FS00:0xa4:Mapping NT from 1 to 1
17:11:48:WU00:FS00:0xa4:Resuming from checkpoint
17:11:48:WU00:FS00:0xa4:Verified 00/wudata_01.log
17:11:48:WU00:FS00:0xa4:Verified 00/wudata_01.trr
17:11:48:WU00:FS00:0xa4:Verified 00/wudata_01.xtc
17:11:48:WU00:FS00:0xa4:Verified 00/wudata_01.edr
17:11:49:WU00:FS00:0xa4:Completed 606150 out of 50500000 steps (1%)
17:11:59:Removing old file 'configs/config-20150909-135939.xml'
17:11:59:Saving configuration to config.xml
17:11:59:<config>
17:11:59: <!-- Folding Core -->
17:11:59: <checkpoint v='5'/>
17:11:59:
17:11:59: <!-- Network -->
17:11:59: <proxy v=':8080'/>
17:11:59:
17:11:59: <!-- User Information -->
17:11:59: <passkey v='********************************'/>
17:11:59: <team v='40051'/>
17:11:59: <user v='ChuckSommer'/>
17:11:59:
17:11:59: <!-- Folding Slots -->
17:11:59: <slot id='0' type='CPU'>
17:11:59: <cpus v='1'/>
17:11:59: </slot>
17:11:59: <slot id='1' type='GPU'>
17:11:59: <client-type v='bigadv'/>
17:11:59: </slot>
17:11:59:</config>
17:13:48:108:127.0.0.1:New Web connection
17:14:28:WU02:FS01:0x18:Completed 7840000 out of 16000000 steps (49%)
17:18:35:WU02:FS01:0x18:Completed 8000000 out of 16000000 steps (50%)
17:20:20:FS00:Paused
17:20:20:FS01:Paused
17:20:20:FS00:Shutting core down
17:20:20:FS01:Shutting core down
17:20:20:WU02:FS01:0x18:WARNING:Console control signal 1 on PID 5196
17:20:20:WU02:FS01:0x18:Exiting, please wait. . .
17:20:21:WU02:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
17:20:22:WU00:FS00:0xa4:Client no longer detected. Shutting down core
17:20:22:WU00:FS00:0xa4:
17:20:22:WU00:FS00:0xa4:Folding@home Core Shutdown: CLIENT_DIED
17:20:23:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
17:20:39:FS00:Unpaused
17:20:39:FS01:Unpaused
17:20:39:WU00:FS00:Starting
17:20:39:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Chuck/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 704 -lifeline 5680 -checkpoint 5 -np 1
17:20:39:WU00:FS00:Started FahCore on PID 5288
17:20:39:WU00:FS00:Core PID:1052
17:20:39:WU00:FS00:FahCore 0xa4 started
17:20:39:WU02:FS01:Starting
17:20:39:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Chuck/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18.exe -dir 02 -suffix 01 -version 704 -lifeline 5680 -checkpoint 5 -gpu 0 -gpu-vendor nvidia
17:20:39:WU02:FS01:Started FahCore on PID 4360
17:20:39:WU02:FS01:Core PID:932
17:20:39:WU02:FS01:FahCore 0x18 started
17:20:40:WU00:FS00:0xa4:
17:20:40:WU00:FS00:0xa4:*------------------------------*
17:20:40:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
17:20:40:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
17:20:40:WU00:FS00:0xa4:
17:20:40:WU00:FS00:0xa4:Preparing to commence simulation
17:20:40:WU00:FS00:0xa4:- Looking at optimizations...
17:20:40:WU00:FS00:0xa4:- Files status OK
17:20:40:WU00:FS00:0xa4:- Expanded 1866264 -> 3294644 (decompressed 176.5 percent)
17:20:40:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=1866264 data_size=3294644, decompressed_data_size=3294644 diff=0
17:20:40:WU00:FS00:0xa4:- Digital signature verified
17:20:40:WU00:FS00:0xa4:
17:20:40:WU00:FS00:0xa4:Project: 7520 (Run 19, Clone 27, Gen 99)
17:20:40:WU00:FS00:0xa4:
17:20:40:WU00:FS00:0xa4:Assembly optimizations on if available.
17:20:40:WU00:FS00:0xa4:Entering M.D.
17:20:40:WU02:FS01:0x18:*********************** Log Started 2015-09-10T17:20:39Z ***********************
17:20:40:WU02:FS01:0x18:Project: 9430 (Run 30, Clone 1, Gen 115)
17:20:40:WU02:FS01:0x18:Unit: 0x00000087ab40413855474c1071a0c13e
17:20:40:WU02:FS01:0x18:CPU: 0x00000000000000000000000000000000
17:20:40:WU02:FS01:0x18:Machine: 1
17:20:40:WU02:FS01:0x18:Digital signatures verified
17:20:40:WU02:FS01:0x18:Folding@home GPU core18
17:20:40:WU02:FS01:0x18:Version 0.0.4
17:20:40:WU02:FS01:0x18: Found a checkpoint file
17:20:45:WU00:FS00:0xa4:Using Gromacs checkpoints
17:20:45:WU00:FS00:0xa4:Mapping NT from 1 to 1
17:20:46:WU00:FS00:0xa4:Resuming from checkpoint
17:20:46:WU00:FS00:0xa4:Verified 00/wudata_01.log
17:20:46:WU00:FS00:0xa4:Verified 00/wudata_01.trr
17:20:46:WU00:FS00:0xa4:Verified 00/wudata_01.xtc
17:20:46:WU00:FS00:0xa4:Verified 00/wudata_01.edr
17:20:46:WU00:FS00:0xa4:Completed 607130 out of 50500000 steps (1%)
17:20:48:WU02:FS01:0x18:Completed 8050000 out of 16000000 steps (50%)
17:20:48:WU02:FS01:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
17:23:43:WU02:FS01:0x18:Completed 8160000 out of 16000000 steps (51%)
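
To put a rough number on how slowly the CPU WU is moving, the "Completed N out of M steps" lines in a log like the one above can be parsed and turned into a steps-per-second estimate. This is only a sketch: the regular expression and the function names `parse_progress` and `estimate` are my own, modelled on the lines in this log, and are not part of FAHClient. It assumes all timestamps fall on the same day. Because the step count printed after a resume comes from the last checkpoint, and the interval includes pause and restart time, the result is a coarse lower bound on speed, not a measurement.

```python
import re
from datetime import datetime

# Hypothetical pattern, modelled on lines such as:
# 17:11:49:WU00:FS00:0xa4:Completed 606150 out of 50500000 steps (1%)
PROGRESS = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):(WU\d+):(FS\d+):0x[0-9a-fA-F]+:"
    r"Completed (\d+) out of (\d+) steps"
)

def parse_progress(lines):
    """Group progress samples by (work unit, slot).

    Each sample is (seconds since midnight, steps done, total steps).
    """
    samples = {}
    for line in lines:
        m = PROGRESS.match(line.strip())
        if not m:
            continue  # not a progress line
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        secs = t.hour * 3600 + t.minute * 60 + t.second
        key = (m.group(2), m.group(3))
        samples.setdefault(key, []).append(
            (secs, int(m.group(4)), int(m.group(5)))
        )
    return samples

def estimate(samples):
    """Return (steps per second, remaining days) from first and last sample.

    Returns None when there is no measurable progress between them.
    """
    t0, s0, total = samples[0]
    t1, s1, _ = samples[-1]
    if t1 <= t0 or s1 <= s0:
        return None
    rate = (s1 - s0) / (t1 - t0)
    remaining_days = (total - s1) / rate / 86400
    return rate, remaining_days

if __name__ == "__main__":
    # log.txt is a placeholder name for a saved copy of the client log.
    with open("log.txt") as fh:
        for key, pts in parse_progress(fh).items():
            print(key, estimate(pts))
```

Separately, 1% of 50,500,000 steps is 505,000 steps, so if the core only prints a line at each whole percent, a very slow WU could run for a long time between log updates even while FAHControl shows movement.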