CPU WU does not finish or progress

ChuckSommer
Posts: 10
Joined: Wed Sep 09, 2015 2:25 pm

CPU WU does not finish or progress

Post by ChuckSommer »

I have a CPU WU that has been running a long time but not making progress.

The FAHControl window shows some progress, but the log does not reflect it.

The job is listed at 50,500,000 steps (yes I double checked counting the zeros).

I am running a Win 8.1 machine with an Intel i7 quad core and an NVIDIA GeForce GTX 980 Ti.
I just started folding on Folding@home, having installed the software sometime after 1-Sep.

Below is a portion of my log file. I paused and then resumed folding to restart things, so the log starts from the restart:


17:11:42:WU00:FS00:Starting
17:11:42:WARNING:WU00:FS00:Changed SMP threads from 2 to 1 this can cause some work units to fail
17:11:42:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Chuck/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 704 -lifeline 5680 -checkpoint 5 -np 1
17:11:42:WU00:FS00:Started FahCore on PID 1480
17:11:42:WU00:FS00:Core PID:280
17:11:42:WU00:FS00:FahCore 0xa4 started
17:11:42:WU00:FS00:0xa4:
17:11:42:WU00:FS00:0xa4:*------------------------------*
17:11:42:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
17:11:42:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
17:11:42:WU00:FS00:0xa4:
17:11:42:WU00:FS00:0xa4:Preparing to commence simulation
17:11:42:WU00:FS00:0xa4:- Looking at optimizations...
17:11:42:WU00:FS00:0xa4:- Files status OK
17:11:42:WU00:FS00:0xa4:- Expanded 1866264 -> 3294644 (decompressed 176.5 percent)
17:11:42:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=1866264 data_size=3294644, decompressed_data_size=3294644 diff=0
17:11:42:WU00:FS00:0xa4:- Digital signature verified
17:11:42:WU00:FS00:0xa4:
17:11:42:WU00:FS00:0xa4:Project: 7520 (Run 19, Clone 27, Gen 99)
17:11:42:WU00:FS00:0xa4:
17:11:42:WU00:FS00:0xa4:Assembly optimizations on if available.
17:11:42:WU00:FS00:0xa4:Entering M.D.
17:11:48:WU00:FS00:0xa4:Using Gromacs checkpoints
17:11:48:WU00:FS00:0xa4:Mapping NT from 1 to 1 
17:11:48:WU00:FS00:0xa4:Resuming from checkpoint
17:11:48:WU00:FS00:0xa4:Verified 00/wudata_01.log
17:11:48:WU00:FS00:0xa4:Verified 00/wudata_01.trr
17:11:48:WU00:FS00:0xa4:Verified 00/wudata_01.xtc
17:11:48:WU00:FS00:0xa4:Verified 00/wudata_01.edr
17:11:49:WU00:FS00:0xa4:Completed 606150 out of 50500000 steps  (1%)
17:11:59:Removing old file 'configs/config-20150909-135939.xml'
17:11:59:Saving configuration to config.xml
17:11:59:<config>
17:11:59:  <!-- Folding Core -->
17:11:59:  <checkpoint v='5'/>
17:11:59:
17:11:59:  <!-- Network -->
17:11:59:  <proxy v=':8080'/>
17:11:59:
17:11:59:  <!-- User Information -->
17:11:59:  <passkey v='********************************'/>
17:11:59:  <team v='40051'/>
17:11:59:  <user v='ChuckSommer'/>
17:11:59:
17:11:59:  <!-- Folding Slots -->
17:11:59:  <slot id='0' type='CPU'>
17:11:59:    <cpus v='1'/>
17:11:59:  </slot>
17:11:59:  <slot id='1' type='GPU'>
17:11:59:    <client-type v='bigadv'/>
17:11:59:  </slot>
17:11:59:</config>
17:13:48:108:127.0.0.1:New Web connection
17:14:28:WU02:FS01:0x18:Completed 7840000 out of 16000000 steps (49%)
17:18:35:WU02:FS01:0x18:Completed 8000000 out of 16000000 steps (50%)
17:20:20:FS00:Paused
17:20:20:FS01:Paused
17:20:20:FS00:Shutting core down
17:20:20:FS01:Shutting core down
17:20:20:WU02:FS01:0x18:WARNING:Console control signal 1 on PID 5196
17:20:20:WU02:FS01:0x18:Exiting, please wait. . .
17:20:21:WU02:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
17:20:22:WU00:FS00:0xa4:Client no longer detected. Shutting down core 
17:20:22:WU00:FS00:0xa4:
17:20:22:WU00:FS00:0xa4:Folding@home Core Shutdown: CLIENT_DIED
17:20:23:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
17:20:39:FS00:Unpaused
17:20:39:FS01:Unpaused
17:20:39:WU00:FS00:Starting
17:20:39:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Chuck/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 704 -lifeline 5680 -checkpoint 5 -np 1
17:20:39:WU00:FS00:Started FahCore on PID 5288
17:20:39:WU00:FS00:Core PID:1052
17:20:39:WU00:FS00:FahCore 0xa4 started
17:20:39:WU02:FS01:Starting
17:20:39:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Chuck/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18.exe -dir 02 -suffix 01 -version 704 -lifeline 5680 -checkpoint 5 -gpu 0 -gpu-vendor nvidia
17:20:39:WU02:FS01:Started FahCore on PID 4360
17:20:39:WU02:FS01:Core PID:932
17:20:39:WU02:FS01:FahCore 0x18 started
17:20:40:WU00:FS00:0xa4:
17:20:40:WU00:FS00:0xa4:*------------------------------*
17:20:40:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
17:20:40:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
17:20:40:WU00:FS00:0xa4:
17:20:40:WU00:FS00:0xa4:Preparing to commence simulation
17:20:40:WU00:FS00:0xa4:- Looking at optimizations...
17:20:40:WU00:FS00:0xa4:- Files status OK
17:20:40:WU00:FS00:0xa4:- Expanded 1866264 -> 3294644 (decompressed 176.5 percent)
17:20:40:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=1866264 data_size=3294644, decompressed_data_size=3294644 diff=0
17:20:40:WU00:FS00:0xa4:- Digital signature verified
17:20:40:WU00:FS00:0xa4:
17:20:40:WU00:FS00:0xa4:Project: 7520 (Run 19, Clone 27, Gen 99)
17:20:40:WU00:FS00:0xa4:
17:20:40:WU00:FS00:0xa4:Assembly optimizations on if available.
17:20:40:WU00:FS00:0xa4:Entering M.D.
17:20:40:WU02:FS01:0x18:*********************** Log Started 2015-09-10T17:20:39Z ***********************
17:20:40:WU02:FS01:0x18:Project: 9430 (Run 30, Clone 1, Gen 115)
17:20:40:WU02:FS01:0x18:Unit: 0x00000087ab40413855474c1071a0c13e
17:20:40:WU02:FS01:0x18:CPU: 0x00000000000000000000000000000000
17:20:40:WU02:FS01:0x18:Machine: 1
17:20:40:WU02:FS01:0x18:Digital signatures verified
17:20:40:WU02:FS01:0x18:Folding@home GPU core18
17:20:40:WU02:FS01:0x18:Version 0.0.4
17:20:40:WU02:FS01:0x18:  Found a checkpoint file
17:20:45:WU00:FS00:0xa4:Using Gromacs checkpoints
17:20:45:WU00:FS00:0xa4:Mapping NT from 1 to 1 
17:20:46:WU00:FS00:0xa4:Resuming from checkpoint
17:20:46:WU00:FS00:0xa4:Verified 00/wudata_01.log
17:20:46:WU00:FS00:0xa4:Verified 00/wudata_01.trr
17:20:46:WU00:FS00:0xa4:Verified 00/wudata_01.xtc
17:20:46:WU00:FS00:0xa4:Verified 00/wudata_01.edr
17:20:46:WU00:FS00:0xa4:Completed 607130 out of 50500000 steps  (1%)
17:20:48:WU02:FS01:0x18:Completed 8050000 out of 16000000 steps (50%)
17:20:48:WU02:FS01:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
17:23:43:WU02:FS01:0x18:Completed 8160000 out of 16000000 steps (51%)
Mod edit: Please use Code tags around log file listings
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: CPU WU does not finish or progress

Post by bruce »

Please include the first couple of pages of your log (where the configuration is shown).

There is no "bigadv" for GPU projects, and for CPU projects you need a minimum of 12 threads, so it doesn't apply to your dual CPU. Apparently there's a new source circulating inaccurate recommendations.

With only a single CPU thread folding project 7520, it's going to take a very, very long time to complete such a big assignment.
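
One rough way to put a number on "a very, very long time" is to compare two progress lines from the log. The sketch below is illustrative only: `parse_progress` and `estimate_days_left` are made-up helper names, the two sample lines are copied from the log above, and the estimate ignores the short pause between them and assumes both samples fall on the same day.

```python
import re
from datetime import datetime

# Matches FAHClient progress lines such as
# "17:11:49:WU00:FS00:0xa4:Completed 606150 out of 50500000 steps  (1%)"
PROGRESS = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU(\d+):FS(\d+):0x\w+:Completed (\d+) out of (\d+) steps"
)

def parse_progress(lines):
    """Yield (slot, seconds_since_midnight, steps_done, steps_total)."""
    for line in lines:
        m = PROGRESS.match(line)
        if m:
            t = datetime.strptime(m.group(1), "%H:%M:%S")
            secs = t.hour * 3600 + t.minute * 60 + t.second
            yield int(m.group(3)), secs, int(m.group(4)), int(m.group(5))

def estimate_days_left(samples):
    """Estimate days remaining from the first and last (secs, done, total) samples."""
    (t0, d0, _), (t1, d1, total) = samples[0], samples[-1]
    rate = (d1 - d0) / (t1 - t0)          # steps per second, pauses included
    return (total - d1) / rate / 86400    # 86400 seconds per day

log = [
    "17:11:49:WU00:FS00:0xa4:Completed 606150 out of 50500000 steps  (1%)",
    "17:14:28:WU02:FS01:0x18:Completed 7840000 out of 16000000 steps (49%)",
    "17:20:46:WU00:FS00:0xa4:Completed 607130 out of 50500000 steps  (1%)",
]

# Keep only the CPU slot (FS00) and drop the slot number.
cpu = [(secs, done, total) for slot, secs, done, total in parse_progress(log) if slot == 0]
print(f"about {estimate_days_left(cpu):.0f} days left at this rate")
```

With those two CPU lines the estimate is on the order of 300 days, which fits the description above.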
ChuckSommer
Posts: 10
Joined: Wed Sep 09, 2015 2:25 pm

Re: CPU WU does not finish or progress

Post by ChuckSommer »

Hi Bruce,

Thanks for telling me what I am doing wrong.
I thought there was a 24 thread minimum for the 'bigadv' switch.
I had been using 6 CPUs for this project, but reduced it to 1 when I saw it was not making progress.
I did not make it clear, but I think this WU (or project) is broken and wish to flush it from my machine.
It looked suspicious that a CPU project would require 50.5 million steps when the largest GPU project I have seen took 16 million.

I will remove the 'bigadv' switch from my GPU as I have been told (another thread) that it does not apply to GPU projects.
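
For reference, here is a small sketch of what that change amounts to in the saved configuration. It is only an illustration: the `CONFIG` string is a trimmed copy of the config printed in the log, `strip_gpu_bigadv` is a made-up helper name, and in practice the option would normally be removed through the slot settings in FAHControl rather than by a script.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the configuration shown in the log (user details omitted).
CONFIG = """<config>
  <checkpoint v='5'/>
  <slot id='0' type='CPU'>
    <cpus v='1'/>
  </slot>
  <slot id='1' type='GPU'>
    <client-type v='bigadv'/>
  </slot>
</config>"""

def strip_gpu_bigadv(xml_text):
    """Return (new_xml, removed_count) with client-type=bigadv dropped from GPU slots."""
    root = ET.fromstring(xml_text)
    removed = 0
    for slot in root.findall("slot"):
        if slot.get("type") != "GPU":
            continue
        # findall returns a list, so removing children while looping is safe.
        for opt in slot.findall("client-type"):
            if opt.get("v") == "bigadv":
                slot.remove(opt)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed

new_xml, removed = strip_gpu_bigadv(CONFIG)
print(removed, "bigadv option(s) removed")
print(new_xml)
```

The CPU slot is left untouched; only the GPU slot loses its `client-type` entry.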
ChuckSommer
Posts: 10
Joined: Wed Sep 09, 2015 2:25 pm

Re: CPU WU does not finish or progress

Post by ChuckSommer »

Hi Bruce,

I have paused the folding, rebooted my system and restarted folding.

Below is the log:


*********************** Log Started 2015-09-10T18:17:49Z ***********************
18:17:49:************************* Folding@home Client *************************
18:17:49:      Website: http://folding.stanford.edu/
18:17:49:    Copyright: (c) 2009-2014 Stanford University
18:17:49:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:17:49:         Args: 
18:17:49:       Config: C:/Users/Chuck/AppData/Roaming/FAHClient/config.xml
18:17:49:******************************** Build ********************************
18:17:49:      Version: 7.4.4
18:17:49:         Date: Mar 4 2014
18:17:49:         Time: 20:26:54
18:17:49:      SVN Rev: 4130
18:17:49:       Branch: fah/trunk/client
18:17:49:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
18:17:49:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
18:17:49:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
18:17:49:     Platform: win32 XP
18:17:49:         Bits: 32
18:17:49:         Mode: Release
18:17:49:******************************* System ********************************
18:17:49:          CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
18:17:49:       CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
18:17:49:         CPUs: 8
18:17:49:       Memory: 31.95GiB
18:17:49:  Free Memory: 30.15GiB
18:17:49:      Threads: WINDOWS_THREADS
18:17:49:   OS Version: 6.2
18:17:49:  Has Battery: false
18:17:49:   On Battery: false
18:17:49:   UTC Offset: -4
18:17:49:          PID: 5408
18:17:49:          CWD: C:/Users/Chuck/AppData/Roaming/FAHClient
18:17:49:           OS: Windows 8.1 Pro
18:17:49:      OS Arch: AMD64
18:17:49:         GPUs: 1
18:17:49:        GPU 0: NVIDIA:5 GM200 [GeForce GTX 980 Ti]
18:17:49:         CUDA: 5.2
18:17:49:  CUDA Driver: 7050
18:17:49:Win32 Service: false
18:17:49:***********************************************************************
18:17:49:<config>
18:17:49:  <!-- Folding Core -->
18:17:49:  <checkpoint v='5'/>
18:17:49:
18:17:49:  <!-- Network -->
18:17:49:  <proxy v=':8080'/>
18:17:49:
18:17:49:  <!-- User Information -->
18:17:49:  <passkey v='********************************'/>
18:17:49:  <team v='40051'/>
18:17:49:  <user v='ChuckSommer'/>
18:17:49:
18:17:49:  <!-- Folding Slots -->
18:17:49:  <slot id='0' type='CPU'>
18:17:49:    <cpus v='1'/>
18:17:49:    <paused v='true'/>
18:17:49:  </slot>
18:17:49:  <slot id='1' type='GPU'>
18:17:49:    <client-type v='bigadv'/>
18:17:49:    <paused v='true'/>
18:17:49:  </slot>
18:17:49:</config>
18:17:49:Trying to access database...
18:17:49:Successfully acquired database lock
18:17:49:Enabled folding slot 00: PAUSED cpu:1 (by user)
18:17:49:Enabled folding slot 01: PAUSED gpu:0:GM200 [GeForce GTX 980 Ti] (by user)
18:23:27:FS00:Unpaused
18:23:27:FS01:Unpaused
18:23:27:WU02:FS01:Starting
18:23:27:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Chuck/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18.exe -dir 02 -suffix 01 -version 704 -lifeline 5408 -checkpoint 5 -gpu 0 -gpu-vendor nvidia
18:23:27:WU02:FS01:Started FahCore on PID 2704
18:23:27:WU02:FS01:Core PID:1296
18:23:27:WU02:FS01:FahCore 0x18 started
18:23:27:WU00:FS00:Starting
18:23:27:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Chuck/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 704 -lifeline 5408 -checkpoint 5 -np 1
18:23:27:WU00:FS00:Started FahCore on PID 3760
18:23:27:WU00:FS00:Core PID:668
18:23:27:WU00:FS00:FahCore 0xa4 started
18:23:28:WU02:FS01:0x18:*********************** Log Started 2015-09-10T18:23:27Z ***********************
18:23:28:WU02:FS01:0x18:Project: 9430 (Run 30, Clone 1, Gen 115)
18:23:28:WU02:FS01:0x18:Unit: 0x00000087ab40413855474c1071a0c13e
18:23:28:WU02:FS01:0x18:CPU: 0x00000000000000000000000000000000
18:23:28:WU02:FS01:0x18:Machine: 1
18:23:28:WU02:FS01:0x18:Digital signatures verified
18:23:28:WU02:FS01:0x18:Folding@home GPU core18
18:23:28:WU02:FS01:0x18:Version 0.0.4
18:23:28:WU02:FS01:0x18:  Found a checkpoint file
18:23:28:WU00:FS00:0xa4:
18:23:28:WU00:FS00:0xa4:*------------------------------*
18:23:28:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
18:23:28:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
18:23:28:WU00:FS00:0xa4:
18:23:28:WU00:FS00:0xa4:Preparing to commence simulation
18:23:28:WU00:FS00:0xa4:- Looking at optimizations...
18:23:28:WU00:FS00:0xa4:- Files status OK
18:23:28:WU00:FS00:0xa4:- Expanded 1866264 -> 3294644 (decompressed 176.5 percent)
18:23:28:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=1866264 data_size=3294644, decompressed_data_size=3294644 diff=0
18:23:28:WU00:FS00:0xa4:- Digital signature verified
18:23:28:WU00:FS00:0xa4:
18:23:28:WU00:FS00:0xa4:Project: 7520 (Run 19, Clone 27, Gen 99)
18:23:28:WU00:FS00:0xa4:
18:23:28:WU00:FS00:0xa4:Assembly optimizations on if available.
18:23:28:WU00:FS00:0xa4:Entering M.D.
18:23:34:WU00:FS00:0xa4:Using Gromacs checkpoints
18:23:34:WU00:FS00:0xa4:Mapping NT from 1 to 1 
18:23:34:WU00:FS00:0xa4:Resuming from checkpoint
18:23:34:WU00:FS00:0xa4:Verified 00/wudata_01.log
18:23:34:WU00:FS00:0xa4:Verified 00/wudata_01.trr
18:23:34:WU00:FS00:0xa4:Verified 00/wudata_01.xtc
18:23:34:WU00:FS00:0xa4:Verified 00/wudata_01.edr
18:23:35:WU00:FS00:0xa4:Completed 617870 out of 50500000 steps  (1%)
18:23:36:WU02:FS01:0x18:Completed 10150000 out of 16000000 steps (63%)
18:23:36:WU02:FS01:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
18:23:55:Removing old file 'configs/config-20150909-153240.xml'
18:23:55:Saving configuration to config.xml
18:23:55:<config>
18:23:55:  <!-- Folding Core -->
18:23:55:  <checkpoint v='5'/>
18:23:55:
18:23:55:  <!-- Network -->
18:23:55:  <proxy v=':8080'/>
18:23:55:
18:23:55:  <!-- User Information -->
18:23:55:  <passkey v='********************************'/>
18:23:55:  <team v='40051'/>
18:23:55:  <user v='ChuckSommer'/>
18:23:55:
18:23:55:  <!-- Folding Slots -->
18:23:55:  <slot id='0' type='CPU'>
18:23:55:    <cpus v='1'/>
18:23:55:  </slot>
18:23:55:  <slot id='1' type='GPU'>
18:23:55:    <client-type v='bigadv'/>
18:23:55:  </slot>
18:23:55:</config>
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: CPU WU does not finish or progress

Post by bruce »

I agree that a total of 50500000 steps must be wrong. I'll report it to the server's owner.

This may just be Project: 7520 (Run 19, Clone 27, Gen 99) but I think that's unlikely.

That server has been having troubles recently.

According to this post, the correct number of steps should be 500,000.
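
A quick sanity check of the reported total against that figure can be sketched as follows. This is only an illustration: `EXPECTED_STEPS` and `check_steps` are made-up names, the single table entry is the 500,000 figure quoted above, and the two log lines are copied from the log earlier in the thread.

```python
import re

# Expected total steps per project. The one entry is the figure quoted above;
# a real table would have to come from the project owners.
EXPECTED_STEPS = {7520: 500_000}

PROJECT = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
TOTAL = re.compile(r"Completed \d+ out of (\d+) steps")

def check_steps(lines):
    """Return (project, reported, expected, ratio), or None if nothing to compare."""
    project = reported = None
    for line in lines:
        m = PROJECT.search(line)
        if m:
            project = int(m.group(1))
        m = TOTAL.search(line)
        if m:
            reported = int(m.group(1))
    expected = EXPECTED_STEPS.get(project)
    if reported is None or expected is None:
        return None
    return project, reported, expected, reported / expected

log = [
    "17:11:42:WU00:FS00:0xa4:Project: 7520 (Run 19, Clone 27, Gen 99)",
    "17:11:49:WU00:FS00:0xa4:Completed 606150 out of 50500000 steps  (1%)",
]
print(check_steps(log))
```

For this WU the reported total is 101 times the expected one, so the step count rather than the machine looks like the problem.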
ChuckSommer
Posts: 10
Joined: Wed Sep 09, 2015 2:25 pm

Re: CPU WU does not finish or progress

Post by ChuckSommer »

Thanks for getting back to me.

So how do I flush it and move on to something useful?

Or should I donate to SETI@home instead?

Chuck
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: CPU WU does not finish or progress

Post by bruce »

The most direct method of dumping a WU is to delete (all of) the slot(s) that could process the WU and save the configuration. Wait briefly, then re-add the slot.
ChuckSommer
Posts: 10
Joined: Wed Sep 09, 2015 2:25 pm

Re: CPU WU does not finish or progress

Post by ChuckSommer »

Thanks Bruce, I have moved on to:

18:59:38:WU00:FS00:0xa4:Project: 9015 (Run 454, Clone 1, Gen 38)


Chuck