Project: 2665 (Run 2, Clone 430, Gen 46) stopping after 98%

Posted: Thu Oct 30, 2008 5:40 pm
by matte.2

I found my SMP client hung on the WU named in the title.

Part of the log file at client startup:

Code:

--- Opening Log file [October 23 03:18:40 UTC] 

# Windows SMP Console Edition #################################################

                       Folding@Home Client Version 6.22 SMP Beta2



Launch directory: C:\Program Files\fah_smp_1
Executable: C:\Program Files\fah_smp_1\Folding@home-Win32-x86.exe
Arguments: -smp -config -forceasm -verbosity 9 

 By using the -forceasm flag, you are overriding
 safeguards in the program. If you did not intend to
 do this, please restart the program without -forceasm.
 If work units are not completing fully (and particularly
 if your machine is overclocked), then please discontinue
 use of the flag.

[03:18:40] - Ask before connecting: No
[03:18:40] - User name: Duffelcoat-minion-1480 (Team 53338)
[03:18:40] - User ID: 763D5F8B1A3ADB10
[03:18:40] - Machine ID: 1
[03:18:40] Configuring Folding@Home...

[03:18:46] - Ask before connecting: No
[03:18:46] - User name: Duffelcoat-minion-1480 (Team 53338)
[03:18:46] - User ID: 763D5F8B1A3ADB10
[03:18:46] - Machine ID: 1
[03:18:46] Loaded queue successfully.
[03:18:46] - Autosending finished units... [October 23 03:18:46 UTC]
[03:18:46] + Processing work unit

Part of the log file at unit startup:

Code:

[01:40:59] + Received work.
[01:40:59] Trying to send all finished work units
[01:40:59] + No unsent completed units remaining.
[01:40:59] + Closed connections
[01:40:59] + Processing work unit
[01:40:59] Work type a1 not eligible for variable processors
[01:40:59] Core required: FahCore_a1.exe
[01:40:59] Core found.
[01:40:59] Using generic mpiexec calls
[01:40:59] Working on queue slot 06 [October 29 01:40:59 UTC]
[01:40:59] + Working ...
[01:40:59] - Calling 'mpiexec -np 4 -channel auto -host FahCore_a1.exe -dir work/ -suffix 06 -checkpoint 10 -forceasm -verbose -lifeline 4504 -version 622'

[01:40:59] *------------------------------*
[01:40:59] Folding@Home Gromacs SMP Core
[01:40:59] Version 1.74 (March 10, 2007)
[01:40:59] Preparing to commence simulation
[01:40:59] - Ensuring status. Please wait.
[01:41:16] - Assembly optimizations manually forced on.
[01:41:16] - Not checking prior termination.
[01:41:36] - Expanded 4823966 -> 24810145 (decompressed 514.3 percent)
[01:41:36] - Starting from initial work packet
[01:41:36] Project: 2665 (Run 2, Clone 430, Gen 46)
[01:41:40] Assembly optimizations on if available.
[01:41:40] Entering M.D.
[01:41:46] Rejecting checkpoint
[01:41:48] cosylations
[01:41:48] Writing local files
[01:41:49] Writing local files
[01:42:01] Extra SSE boost OK.
[01:42:01] Writing local files
[01:42:02] Completed 0 out of 250000 steps  (0 percent)
[01:52:02] Timered checkpoint triggered.
[02:02:03] Timered checkpoint triggered.
[02:02:55] Writing local files
[02:02:56] Completed 2500 out of 250000 steps  (1 percent)
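
For anyone who wants to estimate the pace of a unit from log lines like the ones above, here is a rough Python sketch. It is not part of the client; the function name `seconds_per_percent` is made up for illustration, and it assumes the log crosses midnight at most once between the first and last progress lines, since the timestamps carry no date.

```python
import re
from datetime import datetime

# Matches lines such as
# "[02:02:56] Completed 2500 out of 250000 steps  (1 percent)"
PROGRESS = re.compile(
    r"\[(\d{2}:\d{2}:\d{2})\] Completed (\d+) out of (\d+) steps\s+\((\d+) percent\)"
)


def seconds_per_percent(log_text):
    """Average seconds per percent between the first and last progress lines."""
    points = []
    for stamp, _done, _total, percent in PROGRESS.findall(log_text):
        points.append((datetime.strptime(stamp, "%H:%M:%S"), int(percent)))
    if len(points) < 2:
        return None  # not enough progress lines to measure anything
    (t0, p0), (t1, p1) = points[0], points[-1]
    if p1 == p0:
        return None  # no progress between the two lines
    elapsed = (t1 - t0).total_seconds()
    if elapsed < 0:
        # Assumption: a negative difference means one midnight crossing.
        elapsed += 24 * 3600
    return elapsed / (p1 - p0)


sample = """[01:42:02] Completed 0 out of 250000 steps  (0 percent)
[02:02:56] Completed 2500 out of 250000 steps  (1 percent)"""
rate = seconds_per_percent(sample)
print(rate)
```

On the two progress lines shown in the log this gives 1254 seconds, or about 21 minutes per percent, which works out to roughly 35 hours for the whole unit at that pace.
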

Part of the log file when the unit crashed:

Code:

[11:53:42] Completed 242500 out of 250000 steps  (97 percent)
[12:03:43] Timered checkpoint triggered.
[12:13:43] Timered checkpoint triggered.
[12:14:47] Writing local files
[12:14:47] Completed 245000 out of 250000 steps  (98 percent)
[12:24:48] Timered checkpoint triggered.
[12:30:23] Warning:  long 1-4 interactions
[12:30:23] Gromacs cannot continue further.
[12:30:23] Going to send back what have done.
[12:30:23] logfile size: 193520
[12:30:23] - Writing 194056 bytes of core data to disk...
[12:30:23]   ... Done.
[12:30:23] - Failed to delete work/
[12:30:23] - Failed to delete work/wudata_06.goe
[12:30:23] Warning:  check for stray files
[12:32:23] Folding@home Core Shutdown: EARLY_UNIT_END
[12:32:23] Folding@home Core Shutdown: EARLY_UNIT_END
[12:32:28] CoreStatus = 7B (123)
[12:32:28] Client-core communications error: ERROR 0x7b
[12:32:28] This is a sign of more serious problems, shutting down.
[15:18:56] - Autosending finished units... [October 30 15:18:56 UTC]
[15:18:56] Trying to send all finished work units
[15:18:56] + No unsent completed units remaining.
[15:18:56] - Autosend completed
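
To pull the shutdown reason and core status out of a log like this one, a small Python sketch may help. It only reads the text of the log; the function name `summarize_core_exit` is hypothetical, and the sketch assumes the status line always has the form "hex value (decimal value)" as it does above.

```python
import re

# Matches "[12:32:28] CoreStatus = 7B (123)": a hex code, then its decimal value
CORE_STATUS = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")
# Matches "Folding@home Core Shutdown: EARLY_UNIT_END"
SHUTDOWN = re.compile(r"Folding@home Core Shutdown: (\w+)")


def summarize_core_exit(log_text):
    """Return (sorted unique shutdown reasons, core status as an int or None)."""
    reasons = sorted(set(SHUTDOWN.findall(log_text)))
    status = None
    match = CORE_STATUS.search(log_text)
    if match:
        status = int(match.group(1), 16)
        # The log prints the same value twice; check that both forms agree.
        if status != int(match.group(2)):
            raise ValueError("hex and decimal core status disagree")
    return reasons, status


sample = """[12:32:23] Folding@home Core Shutdown: EARLY_UNIT_END
[12:32:23] Folding@home Core Shutdown: EARLY_UNIT_END
[12:32:28] CoreStatus = 7B (123)"""
print(summarize_core_exit(sample))
```

Note that 7B in hexadecimal is 123 in decimal, so the two numbers on the CoreStatus line are the same value written two ways.
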

Following the sticky post on returning partial results, I downloaded this build of qfix:

Code:

- Windows/x86 : qfix.exe (10.00 KB) 
  Compiled with : i586-mingw32msvc-gcc -Wall -DSYSTYPE=1 -s -O2 -o qfix.exe qfix.c 
  Compiled on : Debian GNU/Linux 4.0 "Etch" with gcc version 3.4.5 (mingw special) 
  Modified : Sat Nov 17 14:09:56 2007 

When I ran it, I got:

Code:

C:\Program Files\fah_smp_1>qfix
entry 7, status 0, address
entry 8, status 0, address
entry 9, status 0, address
entry 0, status 0, address
entry 1, status 0, address
entry 2, status 0, address
entry 3, status 0, address
entry 4, status 0, address
entry 5, status 0, address
entry 6, status 1, address
  Found results <work\wuresults_06.dat>: proj 2665, run 2, clone 430, gen 46
   -- queue entry: proj 2665, run 2, clone 430, gen 46
   -- queue entry isn't empty
File is OK
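
The qfix output above can be summarized with a short Python sketch. What the individual status values mean is not stated in this thread, so the code only reports which queue entries have a nonzero status and which results files qfix says it found; the function name `parse_qfix` is made up for illustration.

```python
import re

# Matches "entry 6, status 1, address"
ENTRY = re.compile(r"entry (\d+), status (\d+)")
# Matches "Found results <work\wuresults_06.dat>: proj 2665, run 2, clone 430, gen 46"
FOUND = re.compile(
    r"Found results <([^>]+)>: proj (\d+), run (\d+), clone (\d+), gen (\d+)"
)


def parse_qfix(output):
    """Return (entries with nonzero status, list of (path, proj, run, clone, gen))."""
    nonzero = [int(entry) for entry, status in ENTRY.findall(output) if int(status) != 0]
    found = [
        (path, int(proj), int(run), int(clone), int(gen))
        for path, proj, run, clone, gen in FOUND.findall(output)
    ]
    return nonzero, found


sample = r"""entry 5, status 0, address
entry 6, status 1, address
  Found results <work\wuresults_06.dat>: proj 2665, run 2, clone 430, gen 46
File is OK"""
print(parse_qfix(sample))
```
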

So I tried to send all:

Code:

 Directory of C:\Program Files\fah_smp_1

28/07/2008  16:58           422.400 Folding@home-Win32-x86.exe
               1 File(s)        422.400 bytes
               0 Dir(s)  12.395.061.248 bytes free

C:\Program Files\fah_smp_1>Folding@home-Win32-x86.exe -send all

Note: Please read the license agreement (Folding@home-Win32-x86.exe -license). F
use of this software requires that you have read and accepted this agreement.

--- Opening Log file [October 30 17:09:27 UTC]

# Windows CPU Console Edition #################################################

                       Folding@Home Client Version 6.22 SMP Beta2



Launch directory: C:\Program Files\fah_smp_1
Executable: Folding@home-Win32-x86.exe
Arguments: -send all

[17:09:27] - Ask before connecting: No
[17:09:27] - User name: Duffelcoat-minion-1480 (Team 53338)
[17:09:27] - User ID: 763D5F8B1A3ADB10
[17:09:27] - Machine ID: 1
[17:09:27] Loaded queue successfully.
[17:09:27] Attempting to return result(s) to server...

Folding@Home Client Shutdown.

As the "Folding@Home Client Shutdown." message appeared instantly, I then tried to send just unit 6:

Code:

C:\Program Files\fah_smp_1>Folding@home-Win32-x86.exe -send #6

Note: Please read the license agreement (Folding@home-Win32-x86.exe -license). F
use of this software requires that you have read and accepted this agreement.

--- Opening Log file [October 30 17:09:50 UTC]

# Windows CPU Console Edition #################################################

                       Folding@Home Client Version 6.22 SMP Beta2



Launch directory: C:\Program Files\fah_smp_1
Executable: Folding@home-Win32-x86.exe
Arguments: -send #6

[17:09:50] - Ask before connecting: No
[17:09:50] - User name: Duffelcoat-minion-1480 (Team 53338)
[17:09:50] - User ID: 763D5F8B1A3ADB10
[17:09:50] - Machine ID: 1
[17:09:50] Loaded queue successfully.
[17:09:50] Attempting to return result(s) to server...
[17:09:50] Project: 2665 (Run 2, Clone 562, Gen 60)
[17:09:50] - Failed to send unit 00 to server

Folding@Home Client Shutdown.
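
The unit named in that last attempt (Clone 562, Gen 60, reported as "unit 00") is not the one sitting in slot 06 (Clone 430, Gen 46). A quick way to compare the project/run/clone/gen values in two log lines is sketched below in Python; the helper name `prcg_tuples` is hypothetical.

```python
import re

# Matches "Project: 2665 (Run 2, Clone 430, Gen 46)"
PRCG = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def prcg_tuples(text):
    """Return every (project, run, clone, gen) found in the text as int tuples."""
    return [tuple(int(x) for x in groups) for groups in PRCG.findall(text)]


# The unit that stopped at 98 percent
stuck = prcg_tuples("[01:41:36] Project: 2665 (Run 2, Clone 430, Gen 46)")[0]
# The unit the client reported when run with "-send #6"
attempted = prcg_tuples("[17:09:50] Project: 2665 (Run 2, Clone 562, Gen 60)")[0]

same_unit = stuck == attempted
print(stuck, attempted, same_unit)
```
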

So now I'm stuck. After more than a day's work the WU is at 98%, and I'd hate to lose it.
Does anyone have any idea how I can return the almost finished unit?


Edit: any tips on how to complete it are welcome as well.

Re: Project: 2665 (Run 2, Clone 430, Gen 46) stopping after 98%

Posted: Thu Oct 30, 2008 8:53 pm
by toTOW
Update your client to 6.23

There are 3 other reports for partial credit, and someone was able to complete it.

Re: Project: 2665 (Run 2, Clone 430, Gen 46) stopping after 98%

Posted: Thu Oct 30, 2008 9:16 pm
by matte.2
Thx toTOW
toTOW wrote: Update your client to 6.23
I only learned that 6.23 existed when posting this problem. I will try it (but not tonight, as it is getting late).
toTOW wrote: There are 3 other reports for partial credit, and someone was able to complete it.
Why was the unit redistributed if someone has already completed it?


Edit: ...unless, of course, it was completed after the deadline?

Re: Project: 2665 (Run 2, Clone 430, Gen 46) stopping after 98%

Posted: Thu Oct 30, 2008 9:40 pm
by toTOW
I don't know ... he didn't miss the preferred deadline.

It might be a check, to get two successes to confirm that the results are valid.

Re: Project: 2665 (Run 2, Clone 430, Gen 46) stopping after 98%

Posted: Fri Oct 31, 2008 5:47 pm
by matte.2
Thx for the suggestion, toTOW.
I installed 6.23, and restarting gave me this:

Code:

--- Opening Log file [October 31 16:32:36 UTC] 

# Windows SMP Console Edition #################################################

                       Folding@Home Client Version 6.23 Beta R1



Launch directory: C:\Program Files\fah_smp_1
Executable: C:\Program Files\fah_smp_1\Folding@home-Win32-x86.exe
Arguments: -smp -config -forceasm -verbosity 9 

 By using the -forceasm flag, you are overriding
 safeguards in the program. If you did not intend to
 do this, please restart the program without -forceasm.
 If work units are not completing fully (and particularly
 if your machine is overclocked), then please discontinue
 use of the flag.

[16:32:36] - Ask before connecting: No
[16:32:36] - User name: Duffelcoat-minion-1480 (Team 53338)
[16:32:36] - User ID: 763D5F8B1A3ADB10
[16:32:36] - Machine ID: 1
[16:32:36] Configuring Folding@Home...

[16:32:44] - Ask before connecting: No
[16:32:44] - User name: Duffelcoat-minion-1480 (Team 53338)
[16:32:44] - User ID: 763D5F8B1A3ADB10
[16:32:44] - Machine ID: 1
[16:32:44] Loaded queue successfully.
[16:32:44] - Autosending finished units... [October 31 16:32:44 UTC]
[16:32:44] + Processing work unit
[16:32:44] Trying to send all finished work units
[16:32:44] Work type a1 not eligible for variable processors
[16:32:44] + No unsent completed units remaining.
[16:32:44] Core required: FahCore_a1.exe
[16:32:44] - Autosend completed
[16:32:44] Core found.
[16:32:44] Using generic mpiexec calls
[16:32:44] Working on queue slot 06 [October 31 16:32:44 UTC]
[16:32:44] + Working ...
[16:32:44] - Calling 'mpiexec -np 4 -channel auto -host FahCore_a1.exe -dir work/ -suffix 06 -checkpoint 10 -forceasm -verbose -lifeline 5736 -version 623'

[16:32:44] *------------------------------*
[16:32:44] Folding@Home Gromacs SMP Core
[16:32:44] Version 1.74 (March 10, 2007)
[16:32:44] Preparing to commence simulation
[16:32:44] - Ensuring status. Please wait.
[16:33:01] - Assembly optimizations manually forced on.
[16:33:01] - Not checking prior termination.
[16:33:01] Folding@home Core Shutdown: MISSING_WORK_FILES
[16:33:01] Finalizing output
[16:35:04] CoreStatus = 1 (1)
[16:35:04] Sending work to server
[16:35:04] Project: 2665 (Run 2, Clone 430, Gen 46)

[16:35:04] + Attempting to send results [October 31 16:35:04 UTC]
[16:35:04] - Reading file work/wuresults_06.dat from core
[16:35:04]   (Read 194056 bytes from disk)
[16:35:04] Connecting to
[16:35:10] Posted data.
[16:35:10] Initial: 0000; - Uploaded at ~31 kB/s
[16:35:10] - Averaged speed for that direction ~39 kB/s
[16:35:10] + Results successfully sent
[16:35:10] Thank you for your contribution to Folding@Home.
[16:35:30] - Warning: Could not delete all work unit files (6): Core returned invalid code
[16:35:30] Trying to send all finished work units
[16:35:30] + No unsent completed units remaining.
[16:35:30] - Preparing to get new work unit...
[16:35:30] + Attempting to get work packet
[16:35:30] - Will indicate memory of 3069 MB
[16:35:30] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 15, Stepping: 11
[16:35:30] - Connecting to assignment server
[16:35:30] Connecting to
[16:35:31] Posted data.
[16:35:31] Initial: 40AB; - Successful: assigned to (
[16:35:31] + News From Folding@Home: Welcome to Folding@Home
[16:35:31] Loaded queue successfully.
[16:35:31] Connecting to
[16:35:34] Posted data.
[16:35:34] Initial: 0000; - Receiving payload (expected size: 2439680)
[16:35:53] - Downloaded at ~125 kB/s
[16:35:53] - Averaged speed for that direction ~163 kB/s
[16:35:53] + Received work.
[16:35:53] Trying to send all finished work units
[16:35:53] + No unsent completed units remaining.
[16:35:53] + Closed connections
[16:35:58] + Processing work unit
[16:35:58] Work type a1 not eligible for variable processors
[16:35:58] Core required: FahCore_a1.exe
[16:35:58] Core found.
[16:35:58] Using generic mpiexec calls
[16:35:58] Working on queue slot 07 [October 31 16:35:58 UTC]
[16:35:58] + Working ...
[16:35:58] - Calling 'mpiexec -np 4 -channel auto -host FahCore_a1.exe -dir work/ -suffix 07 -checkpoint 10 -forceasm -verbose -lifeline 5736 -version 623'

[16:35:58] *------------------------------*
[16:35:58] Folding@Home Gromacs SMP Core
[16:35:58] Version 1.74 (March 10, 2007)
[16:35:58] Preparing to commence simulation
[16:35:58] - Ensuring status. Please wait.
[16:36:15] - Assembly optimizations manually forced on.
[16:36:15] - Not checking prior termination.
[16:36:21] - Expanded 2439168 -> 12879713 (decompressed 528.0 percent)
[16:36:21] - Starting from initial work packet
[16:36:21] Project: 2653 (Run 16, Clone 13, Gen 88)
[16:36:22] Assembly optimizations on if available.
[16:36:22] Entering M.D.

I don't understand the "MISSING_WORK_FILES" bit, but apparently something was sent, and I'm curious to see the credit.
Thanks again.