p10001 (Run 229, Clone 4, Gen 4) - UNKNOWN

Posted: Fri Dec 25, 2009 3:32 pm
by Tobit
Here's an odd one I haven't seen before on my system. These generally run quite well here. In fact, this is the first abnormal result I've seen since ProtoMol started.

[08:07:53] *********************** Log Started 25/Dec/2009 08:07:53 ***********************
[08:07:53] ************************** ProtoMol Folding@Home Core **************************
[08:07:53]   Version: 21
[08:07:53]      Type: 180
[08:07:53]      Core: ProtoMol
[08:07:53]   Website: http://folding.stanford.edu/
[08:07:53] Copyright: (c) 2009 Stanford University
[08:07:53]    Author: Joseph Coffland <joseph@cauldrondevelopment.com>
[08:07:53]      Args: -dir work/ -suffix 00 -checkpoint 15 -lifeline 5116 -version 623
[08:07:53] ************************************ Build *************************************
[08:07:53]      Date: Dec 24 2009
[08:07:53]      Time: 14:36:31
[08:07:53]  Revision: 1748
[08:07:53]  Compiler: Intel(R) C++ MSVC 1500 mode 1110
[08:07:53]   Options: /TP /nologo /EHsc /wd4297 /wd4103 /wd1786 /arch:IA32 /Ox
[08:07:53]            /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qrestrict /MT
[08:07:53]  Platform: Windows XP
[08:07:53]      Bits: 32
[08:07:53] ************************************ System ************************************
[08:07:53]        OS: Microsoft(R) Windows(R) XP Professional x64 Edition
[08:07:53]       CPU: Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz
[08:07:53]    CPU ID: GenuineIntel Family 6 Model 15 Stepping 6
[08:07:53]      CPUs: 2 Logical, 1 Physical
[08:07:53]    Memory: 4.00 GB
[08:07:53] ********************************************************************************
[08:07:53] Project: 10001 (Run 229, Clone 4, Gen 4)
[08:07:53] Reading tar file par_all27_prot_lipid.inp
[08:07:53] Reading tar file scpismQuartic.inp
[08:07:53] Reading tar file ww_exteq_nowater1.pdb
[08:07:53] Reading tar file ww_exteq_nowater1.psf
[08:07:53] Reading tar file checkpt
[08:07:53] Reading tar file ww_exteq_nowater1.208.pos
[08:07:53] Reading tar file ww_exteq_nowater1.208.vel
[08:07:53] Reading tar file protomol.conf
[08:07:53] Reading tar file core.xml
[08:07:53] Digital signatures verified
[08:07:53] Completed 0 out of 200000 steps (0%)
[08:10:07] WARNING: UnexpectedExitHandler triggered
[08:10:07] WARNING: Unexpected exit from science code
[08:10:07] Saving result file logfile_00.txt
[08:10:07] Saving result file checkpt
[08:10:07] Saving result file log.txt
[08:10:07] Saving result file protomol.conf
[08:10:07] Saving result file ww.dcd
[08:10:07] Saving result file ww_exteq_nowater1.208.pos
[08:10:07] Saving result file ww_exteq_nowater1.208.vel
[08:10:07] Folding@home Core Shutdown: UNKNOWN
[08:10:11] CoreStatus = 7B (123)
[08:10:11] Sending work to server
[08:10:11] Project: 10001 (Run 229, Clone 4, Gen 4)
[08:10:11] + Attempting to send results [December 25 08:10:11 UTC]
[08:10:13] + Results successfully sent

Re: p10001 (Run 229, Clone 4, Gen 4) - UNKNOWN

Posted: Fri Dec 25, 2009 3:42 pm
by Grandpa_01
It looks like it did what it was supposed to do. Did you get the same WU again?

Re: p10001 (Run 229, Clone 4, Gen 4) - UNKNOWN

Posted: Fri Dec 25, 2009 3:50 pm
by Tobit
Grandpa_01 wrote:It looks like it did what it was supposed to do. Did you get the same WU again?
No, it moved onto a different WU. Did you see these lines in the log?

[08:07:53] Completed 0 out of 200000 steps (0%)
[08:10:07] WARNING: UnexpectedExitHandler triggered
[08:10:07] WARNING: Unexpected exit from science code
Unexpected tells me it didn't do something it was supposed to do. :o

Re: p10001 (Run 229, Clone 4, Gen 4) - UNKNOWN

Posted: Fri Dec 25, 2009 6:25 pm
by Grandpa_01
Tobit wrote: No, it moved onto a different WU. Did you see these lines in the log?

[08:07:53] Completed 0 out of 200000 steps (0%)
[08:10:07] WARNING: UnexpectedExitHandler triggered
[08:10:07] WARNING: Unexpected exit from science code
Unexpected tells me it didn't do something it was supposed to do. :o
From what I understand, that is expected. When it happens with the new version (V21), the core is supposed to do what it did: send the WU back to the server so you get a different WU rather than getting the same one over and over again. It looks like they got that bug fixed in V21.

Re: p10001 (Run 229, Clone 4, Gen 4) - UNKNOWN

Posted: Fri Dec 25, 2009 6:29 pm
by Tobit
Grandpa_01 wrote:From what I understand that is expected
Ending early is expected with ProtoMol-based units, but ending early with errors like these is not. I have plenty that end early, but I've never seen one end early with the errors this one did. Finishing with a CoreStatus of 7B is not normal.

Re: p10001 (Run 229, Clone 4, Gen 4) - UNKNOWN

Posted: Fri Dec 25, 2009 8:32 pm
by Grandpa_01
I did not say it was. What I did say was that V21 is doing what it is supposed to do when a WU fails, which the other versions were not always doing.

Re: p10001 (Run 229, Clone 4, Gen 4) - UNKNOWN

Posted: Fri Dec 25, 2009 9:25 pm
by bruce
I agree with Grandpa_01 . . . to a point.

To understand it fully, we need to identify several different components that make up the FAH system. Most of the time people break things up into two pieces -- the servers and the software on your PC, or three pieces -- the servers, the client, and a FahCore. To understand what's going on here we need to look one level deeper and split the FahCore into two separate logical pieces that are integrally combined before you ever see it.

Any FahCore is made up of code written mostly by Stanford and code written mostly by someone else. The Stanford developers can find and fix bugs in the code they wrote rather quickly, but if the code that somebody else wrote has an error, it will probably take longer to get it fixed. In this case, the message "Unexpected exit from science code" says that there was some kind of error in that other code. The Stanford code responds by reporting a CoreStatus = 7B (123) to the client. The client responds by sending an error report to the server, as it should, and the server gives you a new assignment.
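For anyone who wants to spot this pattern in their own logs, here is a rough sketch in Python. It is not part of the client or any FAH tool; the function name `summarize_log` and the regular expression are my own assumptions, based only on the line formats shown in the log earlier in this thread. It looks for the "Unexpected exit from science code" warning and reads the CoreStatus value, checking that the hex form (7B) matches the decimal form in parentheses (123).

```python
import re

# Matches lines like "[08:10:11] CoreStatus = 7B (123)".
# The format is inferred from the log in this thread, not from any spec.
CORE_STATUS = re.compile(
    r"^\[\d{2}:\d{2}:\d{2}\] CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)"
)


def summarize_log(lines):
    """Return (unexpected_exit_seen, core_status) for an iterable of log lines.

    core_status is the decimal value of the last CoreStatus line found,
    or None if the log contains no such line.
    """
    unexpected = False
    status = None
    for line in lines:
        line = line.strip()
        if "Unexpected exit from science code" in line:
            unexpected = True
        match = CORE_STATUS.match(line)
        if match:
            status = int(match.group(1), 16)  # hex field, e.g. "7B"
            if status != int(match.group(2)):  # decimal field, e.g. "123"
                raise ValueError("hex and decimal CoreStatus disagree: " + line)
    return unexpected, status


sample = [
    "[08:10:07] WARNING: Unexpected exit from science code",
    "[08:10:07] Folding@home Core Shutdown: UNKNOWN",
    "[08:10:11] CoreStatus = 7B (123)",
]
print(summarize_log(sample))  # (True, 123)
```

A result of `(True, 123)` corresponds to the case described above: the science code exited unexpectedly and the core reported 7B to the client.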

Some of the other FAHcores respond differently to an error in the science code and this is the first example I've seen of doing it right. Other FAHcores make a different report to the client and the result (an undesirable one) is that you may have the same WU reassigned, producing the same error repeatedly.

Versions 19 and 20 of ProtoMol were important developmental steps toward this solution, and I commend jcoffland for promptly moving to what appears to be an excellent solution for those unexpected problems that come up in the non-Stanford code.

Re: p10001 (Run 229, Clone 4, Gen 4) - UNKNOWN

Posted: Fri Dec 25, 2009 10:30 pm
by Tobit
Thanks Bruce, that helps me to better understand what Grandpa_01 was trying to say. I would agree that error handling is greatly improved with v21. However, we should still report these unexpected errors as a possible bad WU, correct? This was clearly more than a unit simply ending early because no more computation was possible.