Project: 10000 (Run 88, Clone 0, Gen 35)

Posted: Thu Jan 07, 2010 1:39 am
by ron5000
One of my Folders has been stuck on this WU since last Saturday. Bad WU? Bad Config? How do I get past it?

--- Opening Log file [January 7 01:00:23 UTC] 


# Windows CPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Documents and Settings\All Users\Documents\F@H
Service: C:\Documents and Settings\All Users\Documents\F@H\FAH.exe
Arguments: -svcstart -d C:\Documents and Settings\All Users\Documents\F@H -verbosity 9 

Launched as a service.
Entered C:\Documents and Settings\All Users\Documents\F@H to do work.

[01:00:23] - Ask before connecting: No
[01:00:23] - User name: RonSeaman (Team 13051)
[01:00:23] - User ID: 5C2DA5D2BB1413A
[01:00:23] - Machine ID: 1
[01:00:23] 
[01:00:24] Loaded queue successfully.
[01:00:24] 
[01:00:24] + Processing work unit
[01:00:24] Core required: FahCore_b4.exe
[01:00:24] Core found.
[01:00:24] - Autosending finished units... [January 7 01:00:24 UTC]
[01:00:24] Trying to send all finished work units
[01:00:24] + No unsent completed units remaining.
[01:00:24] - Autosend completed
[01:00:24] Working on queue slot 01 [January 7 01:00:24 UTC]
[01:00:24] + Working ...
[01:00:24] - Calling '.\FahCore_b4.exe -dir work/ -suffix 01 -priority 96 -checkpoint 15 -service -verbose -lifeline 1408 -version 623'

[01:00:34] CoreStatus = 63 (99)
[01:00:34] + Error starting Folding@Home core.
[01:00:39] 
[01:00:39] + Processing work unit
[01:00:39] Core required: FahCore_b4.exe
[01:00:39] Core found.
[01:00:39] Working on queue slot 01 [January 7 01:00:39 UTC]
[01:00:39] + Working ...
[01:00:39] - Calling '.\FahCore_b4.exe -dir work/ -suffix 01 -priority 96 -checkpoint 15 -service -verbose -lifeline 1408 -version 623'

[01:00:41] *********************** Log Started 07/Jan/2010 01:00:41 ***********************
[01:00:41] ************************** ProtoMol Folding@Home Core **************************
[01:00:41]   Version: 21
[01:00:41]      Type: 180
[01:00:41]      Core: ProtoMol
[01:00:41]   Website: http://folding.stanford.edu/
[01:00:41] Copyright: (c) 2009 Stanford University
[01:00:41]    Author: Joseph Coffland <joseph@cauldrondevelopment.com>
[01:00:41]      Args: -dir work/ -suffix 01 -priority 96 -checkpoint 15 -service -verbose
[01:00:41]            -lifeline 1408 -version 623
[01:00:41] ************************************ Build *************************************
[01:00:41]      Date: Dec 24 2009
[01:00:41]      Time: 14:36:31
[01:00:41]  Revision: 1748
[01:00:41]  Compiler: Intel(R) C++ MSVC 1500 mode 1110
[01:00:41]   Options: /TP /nologo /EHsc /wd4297 /wd4103 /wd1786 /arch:IA32 /Ox
[01:00:41]            /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qrestrict /MT
[01:00:41]  Platform: Windows XP
[01:00:41]      Bits: 32
[01:00:41] ************************************ System ************************************
[01:00:41]        OS: Microsoft Windows XP Professional
[01:00:41]       CPU: Intel(R) Pentium(R) 4 CPU 2.80GHz
[01:00:41]    CPU ID: GenuineIntel Family 15 Model 2 Stepping 9
[01:00:41]      CPUs: 1 Logical, 1 Physical
[01:00:41]    Memory: 510 MB
[01:00:41] ********************************************************************************
[01:00:41] Project: 10000 (Run 88, Clone 0, Gen 35)
[01:00:41] Reading tar file par_all27_prot_lipid.inp
[01:00:41] Reading tar file scpismQuartic.inp
[01:00:41] Reading tar file ww_exteq_nowater1.pdb
[01:00:41] Reading tar file ww_exteq_nowater1.psf
[01:00:41] Reading tar file checkpt
[01:00:41] Reading tar file ww_exteq_nowater1.1510.pos
[01:00:41] ERROR: @ fah\tar\TarHeader.cpp:184:<unknown> 0: Error converting number '046482'
[01:00:41] Folding@home Core Shutdown: EARLY_UNIT_END
[01:00:45] CoreStatus = 79 (121)
[01:00:45] Client-core communications error: ERROR 0x79
[01:00:45] This is a sign of more serious problems, shutting down.
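The line that actually kills the unit is the tar-header number conversion. In the POSIX ustar format, numeric header fields such as file size and mode are stored as ASCII octal digits. A field reading '046482' therefore cannot be a valid octal value, because '8' is not an octal digit. The sketch below reproduces that kind of failure. It assumes the core parses these fields as standard ustar octal; the core's real TarHeader.cpp code is not shown in this thread, and the helper name `parse_octal_field` is hypothetical.

```python
def parse_octal_field(raw: bytes) -> int:
    """Parse a ustar numeric header field (ASCII octal, NUL/space padded).

    Hypothetical helper for illustration; not the ProtoMol core's code.
    """
    # Header fields are padded with NULs and/or spaces; strip the padding.
    text = raw.rstrip(b"\x00 ").lstrip(b" ").decode("ascii")
    if not text:
        return 0
    # Only the digits 0-7 are legal in an octal field.
    if any(ch not in "01234567" for ch in text):
        raise ValueError(f"Error converting number '{text}'")
    return int(text, 8)


# A well-formed 12-byte size field: 11 octal digits plus a NUL terminator.
print(parse_octal_field(b"00000046402\x00"))  # prints 19714

# The value from the log contains the digit 8, so conversion fails.
try:
    parse_octal_field(b"046482")
except ValueError as err:
    print(err)  # prints Error converting number '046482'
```

GNU tar also allows a base-256 encoding for very large values (signalled by the high bit of the first byte), which this sketch deliberately ignores.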

Re: Project: 10000 (Run 88, Clone 0, Gen 35)

Posted: Thu Jan 07, 2010 3:31 am
by ron5000
I forgot to mention I tried deleting the work folder, queue.dat, unitinfo.txt, the log & all the FahCore files. I got the same WU & same results.

Right now as a temporary workaround, I changed the machine ID to 2 & deleted the work folder, queue.dat, unitinfo.txt & the log so the server would give me a different WU.

Re: Project: 10000 (Run 88, Clone 0, Gen 35)

Posted: Thu Jan 07, 2010 8:50 am
by toTOW
Yes, it's a bad WU, and it produces a bad return code for the EUE...
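For reference, the client log prints each core status as a hex code followed by its decimal value in parentheses, so `CoreStatus = 79 (121)` and `ERROR 0x79` are the same number (7 × 16 + 9 = 121), and `63 (99)` is the earlier start failure. A small sketch of that decoding, assuming that reading of the log format; the regex and the `decode_core_status` helper are hypothetical and not part of the client:

```python
import re

# Matches log lines such as "CoreStatus = 79 (121)": hex code, then decimal.
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def decode_core_status(line: str) -> tuple[int, int]:
    """Return (value parsed as hex, value parsed as decimal) for a log line.

    Hypothetical helper for illustration only.
    """
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError(f"no CoreStatus in: {line!r}")
    return int(match.group(1), 16), int(match.group(2))


for line in ("[01:00:34] CoreStatus = 63 (99)", "[01:00:45] CoreStatus = 79 (121)"):
    hex_value, dec_value = decode_core_status(line)
    # Both halves of the line describe the same status code.
    print(f"0x{hex_value:X} = {dec_value}", hex_value == dec_value)
```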

Re: Project: 10000 (Run 88, Clone 0, Gen 35)

Posted: Fri Jan 08, 2010 3:16 am
by ron5000
Thanks! Do I need to do anything, or just leave my machine ID on 2 & press on?

Re: Project: 10000 (Run 88, Clone 0, Gen 35)

Posted: Fri Jan 08, 2010 6:13 am
by bruce
ron5000 wrote: Thanks! Do I need to do anything, or just leave my machine ID on 2 & press on?
Just press on.

The Pande Group knows that some WUs will be unstable but doesn't know which ones until somebody runs them. They should be a rather small percentage of any project.

IMHO, the client and servers don't handle the reassignment process the way I think they should. Even though it's on my personal wish list for a future version of FAH, I expect that we'll have to wait a while for that version, and there's no guarantee that I'll get my wish.

Re: Project: 10000 (Run 88, Clone 0, Gen 35)

Posted: Sat Jan 09, 2010 1:44 am
by jcoffland
From your report I was able to track down the bug and fix it in the next release of the ProtoMol core.

Thanks!