Project: 2665 (Run 2, Clone 430, Gen 46) stopping after 98%
Posted: Thu Oct 30, 2008 5:40 pm
Hi,
I found my SMP client hung on the WU mentioned in the title. The excerpts below show, in order, the log file at client startup, at unit startup, and at the point where the unit crashed.
Following the sticky post on returning partial results, I downloaded the qfix build listed after the log excerpts; its output when I ran it follows.
So I tried to send all units. As the "Folding@Home Client Shutdown." message appeared instantly, I then tried to send just unit 6.
So now I'm stuck. After more than a day's work the WU is at 98% and I'd hate to lose it.
Has anyone got any idea how I can return the almost finished unit?
Thanks,
Marc
Edit: any tips on how to complete it are welcome as well.
Code: Select all
--- Opening Log file [October 23 03:18:40 UTC]
# Windows SMP Console Edition #################################################
###############################################################################
Folding@Home Client Version 6.22 SMP Beta2
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: C:\Program Files\fah_smp_1
Executable: C:\Program Files\fah_smp_1\Folding@home-Win32-x86.exe
Arguments: -smp -config -forceasm -verbosity 9
Warning:
By using the -forceasm flag, you are overriding
safeguards in the program. If you did not intend to
do this, please restart the program without -forceasm.
If work units are not completing fully (and particularly
if your machine is overclocked), then please discontinue
use of the flag.
[03:18:40] - Ask before connecting: No
[03:18:40] - User name: Duffelcoat-minion-1480 (Team 53338)
[03:18:40] - User ID: 763D5F8B1A3ADB10
[03:18:40] - Machine ID: 1
[03:18:40]
[03:18:40] Configuring Folding@Home...
[03:18:46] - Ask before connecting: No
[03:18:46] - User name: Duffelcoat-minion-1480 (Team 53338)
[03:18:46] - User ID: 763D5F8B1A3ADB10
[03:18:46] - Machine ID: 1
[03:18:46]
[03:18:46] Loaded queue successfully.
[03:18:46]
[03:18:46] - Autosending finished units... [October 23 03:18:46 UTC]
[03:18:46] + Processing work unit
Code: Select all
[01:40:59] + Received work.
[01:40:59] Trying to send all finished work units
[01:40:59] + No unsent completed units remaining.
[01:40:59] + Closed connections
[01:40:59]
[01:40:59] + Processing work unit
[01:40:59] Work type a1 not eligible for variable processors
[01:40:59] Core required: FahCore_a1.exe
[01:40:59] Core found.
[01:40:59] Using generic mpiexec calls
[01:40:59] Working on queue slot 06 [October 29 01:40:59 UTC]
[01:40:59] + Working ...
[01:40:59] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 06 -checkpoint 10 -forceasm -verbose -lifeline 4504 -version 622'
[01:40:59]
[01:40:59] *------------------------------*
[01:40:59] Folding@Home Gromacs SMP Core
[01:40:59] Version 1.74 (March 10, 2007)
[01:40:59]
[01:40:59] Preparing to commence simulation
[01:40:59] - Ensuring status. Please wait.
[01:41:16] - Assembly optimizations manually forced on.
[01:41:16] - Not checking prior termination.
[01:41:36] - Expanded 4823966 -> 24810145 (decompressed 514.3 percent)
[01:41:36] - Starting from initial work packet
[01:41:36]
[01:41:36] Project: 2665 (Run 2, Clone 430, Gen 46)
[01:41:36]
[01:41:40] Assembly optimizations on if available.
[01:41:40] Entering M.D.
[01:41:46] Rejecting checkpoint
[01:41:48] cosylations
[01:41:48] Writing local files
[01:41:49]
[01:41:49] Writing local files
[01:42:01] Extra SSE boost OK.
[01:42:01] Writing local files
[01:42:02] Completed 0 out of 250000 steps (0 percent)
[01:52:02] Timered checkpoint triggered.
[02:02:03] Timered checkpoint triggered.
[02:02:55] Writing local files
[02:02:56] Completed 2500 out of 250000 steps (1 percent)
Code: Select all
[11:53:42] Completed 242500 out of 250000 steps (97 percent)
[12:03:43] Timered checkpoint triggered.
[12:13:43] Timered checkpoint triggered.
[12:14:47] Writing local files
[12:14:47] Completed 245000 out of 250000 steps (98 percent)
[12:24:48] Timered checkpoint triggered.
[12:30:23] Warning: long 1-4 interactions
[12:30:23] Gromacs cannot continue further.
[12:30:23] Going to send back what have done.
[12:30:23] logfile size: 193520
[12:30:23] - Writing 194056 bytes of core data to disk...
[12:30:23] ... Done.
[12:30:23] - Failed to delete work/wudata_06.sas
[12:30:23] - Failed to delete work/wudata_06.goe
[12:30:23] Warning: check for stray files
[12:32:23]
[12:32:23] Folding@home Core Shutdown: EARLY_UNIT_END
[12:32:23]
[12:32:23] Folding@home Core Shutdown: EARLY_UNIT_END
[12:32:28] CoreStatus = 7B (123)
[12:32:28] Client-core communications error: ERROR 0x7b
[12:32:28] This is a sign of more serious problems, shutting down.
[15:18:56] - Autosending finished units... [October 30 15:18:56 UTC]
[15:18:56] Trying to send all finished work units
[15:18:56] + No unsent completed units remaining.
[15:18:56] - Autosend completed
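For anyone who wants to see how far a unit got before the core shut down, here is a minimal sketch in Python. It assumes only the "Completed N out of M steps" line format visible in the log above; the function name `last_progress` is made up for illustration and is not part of the client.

```python
import re

# Matches the client's progress lines, e.g.
# "[12:14:47] Completed 245000 out of 250000 steps (98 percent)"
PROGRESS = re.compile(r"Completed\s+(\d+)\s+out of\s+(\d+)\s+steps")


def last_progress(log_text):
    """Return (done, total) from the last progress line, or None if absent."""
    matches = PROGRESS.findall(log_text)
    if not matches:
        return None
    done, total = matches[-1]
    return int(done), int(total)


# Sample lines copied from the crash excerpt above.
sample = """\
[11:53:42] Completed 242500 out of 250000 steps (97 percent)
[12:14:47] Completed 245000 out of 250000 steps (98 percent)
[12:30:23] Gromacs cannot continue further.
"""

progress = last_progress(sample)
if progress is not None:
    done, total = progress
    print(f"last checkpointed progress: {done}/{total} steps")
```

This only reads the text log; it says nothing about whether the results file on disk is actually returnable.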
Code: Select all
- Windows/x86 : qfix.exe (10.00 KB)
Compiled with : i586-mingw32msvc-gcc -Wall -DSYSTYPE=1 -s -O2 -o qfix.exe qfix.c
Compiled on : Debian GNU/Linux 4.0 "Etch" with gcc version 3.4.5 (mingw special)
Modified : Sat Nov 17 14:09:56 2007
Code: Select all
C:\Program Files\fah_smp_1>qfix
entry 7, status 0, address 171.64.65.64:8080
entry 8, status 0, address 171.64.65.64:8080
entry 9, status 0, address 171.64.65.64:8080
entry 0, status 0, address 171.64.65.64:8080
entry 1, status 0, address 171.64.65.64:8080
entry 2, status 0, address 171.64.65.64:8080
entry 3, status 0, address 171.64.65.63:8080
entry 4, status 0, address 171.64.65.64:8080
entry 5, status 0, address 171.64.65.64:8080
entry 6, status 1, address 171.64.65.64:8080
Found results <work\wuresults_06.dat>: proj 2665, run 2, clone 430, gen 46
-- queue entry: proj 2665, run 2, clone 430, gen 46
-- queue entry isn't empty
File is OK
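The qfix listing above can also be scanned with a short script. This is a rough sketch that assumes only the "entry N, status S, address A" line format shown in that output; what each status value means is not stated here, so the hypothetical helper `nonzero_entries` simply lists the entries whose status is not 0.

```python
import re

# Matches lines such as "entry 6, status 1, address 171.64.65.64:8080"
ENTRY = re.compile(r"entry\s+(\d+),\s+status\s+(\d+),\s+address\s+(\S+)")


def nonzero_entries(qfix_output):
    """Return [(slot, status, address)] for entries whose status is not 0."""
    found = []
    for line in qfix_output.splitlines():
        match = ENTRY.search(line)
        if match and int(match.group(2)) != 0:
            found.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return found


# Two lines copied from the qfix output above.
sample = """\
entry 5, status 0, address 171.64.65.64:8080
entry 6, status 1, address 171.64.65.64:8080
"""

for slot, status, address in nonzero_entries(sample):
    print(f"slot {slot:02d}: status {status}, server {address}")
```

On the full listing above, slot 6 is the only entry this would report.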
Code: Select all
Directory of C:\Program Files\fah_smp_1
28/07/2008 16:58 422.400 Folding@home-Win32-x86.exe
1 File(s) 422.400 bytes
0 Dir(s) 12.395.061.248 bytes free
C:\Program Files\fah_smp_1>Folding@home-Win32-x86.exe -send all
Note: Please read the license agreement (Folding@home-Win32-x86.exe -license). Further
use of this software requires that you have read and accepted this agreement.
--- Opening Log file [October 30 17:09:27 UTC]
# Windows CPU Console Edition #################################################
###############################################################################
Folding@Home Client Version 6.22 SMP Beta2
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: C:\Program Files\fah_smp_1
Executable: Folding@home-Win32-x86.exe
Arguments: -send all
[17:09:27] - Ask before connecting: No
[17:09:27] - User name: Duffelcoat-minion-1480 (Team 53338)
[17:09:27] - User ID: 763D5F8B1A3ADB10
[17:09:27] - Machine ID: 1
[17:09:27]
[17:09:27] Loaded queue successfully.
[17:09:27] Attempting to return result(s) to server...
Folding@Home Client Shutdown.
Code: Select all
C:\Program Files\fah_smp_1>Folding@home-Win32-x86.exe -send #6
Note: Please read the license agreement (Folding@home-Win32-x86.exe -license). Further
use of this software requires that you have read and accepted this agreement.
--- Opening Log file [October 30 17:09:50 UTC]
# Windows CPU Console Edition #################################################
###############################################################################
Folding@Home Client Version 6.22 SMP Beta2
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: C:\Program Files\fah_smp_1
Executable: Folding@home-Win32-x86.exe
Arguments: -send #6
[17:09:50] - Ask before connecting: No
[17:09:50] - User name: Duffelcoat-minion-1480 (Team 53338)
[17:09:50] - User ID: 763D5F8B1A3ADB10
[17:09:50] - Machine ID: 1
[17:09:50]
[17:09:50] Loaded queue successfully.
[17:09:50] Attempting to return result(s) to server...
[17:09:50] Project: 2665 (Run 2, Clone 562, Gen 60)
[17:09:50] - Failed to send unit 00 to server
Folding@Home Client Shutdown.