Project 2652 (Run 0, Clone 430, Gen 44)

Posted: Tue Jan 01, 2008 4:09 pm
by al2
Edit by -b: Split this topic off from one associated with a different project. (See 2nd post below.)



I've just had the same issue, and I thought posting my FAHlog (pasted from Fahmon, a great tool IMO) could be of use to others or to the project overall. However small the contribution, it's the small ones that add up to the whole, I guess.

I'm going to try restarting the WU from the latest backup copy I made.

EDIT: Just to add, my system is all stock and likely very stable with respect to the Windows SMP client, since I've never had any issues like this (that I can remember) since I started folding last summer. I occasionally get hanging clients associated with the net connection (I think), but this isn't a problem with regular monitoring.

Happy New Year to all, naturally.
[14:08:08] Completed 550000 out of 1000000 steps (55 percent)
[14:23:49] Writing local files
[14:23:49] Completed 560000 out of 1000000 steps (56 percent)
[14:39:29] Writing local files
[14:39:29] Completed 570000 out of 1000000 steps (57 percent)
[14:51:10] Warning: long 1-4 interactions
[14:51:10] Gromacs cannot continue further.
[14:51:10] Going to send back what have done.
[14:51:10] logfile size: 353037
[14:51:10] - Writing 353573 bytes of core data to disk...
[14:51:11] ... Done.
[14:51:11] - Failed to delete work/wudata_06.arc
[14:51:11] No C.P. to delete.
[14:51:11] - Failed to delete work/wudata_06.dyn
[14:51:11] - Failed to delete work/wudata_06.chk
[14:51:11] - Failed to delete work/wudata_06.sas
[14:51:11] - Failed to delete work/wudata_06.goe
[14:51:11] - Failed to delete work/wudata_06.xvg
[14:51:11] Warning: check for stray files
[14:51:11]
[14:51:11] Folding@home Core Shutdown: EARLY_UNIT_END
[14:51:11]
[14:51:11] Folding@home Core Shutdown: EARLY_UNIT_END
[14:51:17] CoreStatus = 7B (123)
[14:51:17] Client-core communications error: ERROR 0x7b
[14:51:17] Deleting current work unit & continuing...
[14:53:21] - Preparing to get new work unit...
[14:53:21] + Attempting to get work packet
[14:53:21] - Connecting to assignment server
[14:53:22] - Successful: assigned to (171.64.65.64).
[14:53:22] + News From Folding@Home: Welcome to Folding@Home
[14:53:22] Loaded queue successfully.
[14:53:27] + Closed connections
[14:53:32]
[14:53:32] + Processing work unit
[14:53:32] Core required: FahCore_a1.exe
[14:53:32] Core found.
[14:53:32] Working on Unit 07 [January 1 14:53:32]
[14:53:32] + Working ...
[14:53:33]
[14:53:33] *------------------------------*
[14:53:33] Folding@Home Gromacs SMP Core
[14:53:33] Version 1.74 (March 10, 2007)
[14:53:33]
[14:53:33] Preparing to commence simulation
[14:53:33] - Ensuring status. Please wait.
[14:53:33] Created dyn
[14:53:33] - Files status OK
[14:53:33] this execution.
[14:53:33] - Files status OK
[14:53:34] mpressed 507.5 percent)
[14:53:34] - Starting from initial work packet
[14:53:34]
[14:53:34] Project: 2652 (Run 0, Clone 430, Gen 44)
[14:53:34]
[14:53:34] : 2652 (Run 0, Clone 430, Gen 44)
[14:53:34]
[14:53:34] ble.
[14:53:34] Entering M.D.
[14:53:51] al work pa- Starting from initial work packet
[14:53:51]
[14:53:51] Project: 2652 (Run 0, Clone 430, Gen 44)
[14:53:51]
[14:53:51] Entering M.D.
[14:53:58] rotein
[14:53:58] Writing local files
[14:53:58] cal files
[14:53:58] boost OK.
[14:53:58] Writing local files
[14:53:58] Completed 0 out of 1000000 steps (0 percent)
[15:09:39] Writing local files
[15:09:39] Completed 10000 out of 1000000 steps (1 percent)
[15:28:27] Writing local files
[15:28:27] Completed 20000 out of 1000000 steps (2 percent)
[15:49:30] Writing local files

Re: Project: 3062 (Run 3, Clone 26, Gen 1)

Posted: Tue Jan 01, 2008 6:54 pm
by bruce
al2 wrote: I've just had the same issue, and I thought posting my FAHlog (pasted from Fahmon, a great tool IMO) could be of use to others or to the project overall. However small the contribution, it's the small ones that add up to the whole, I guess.

I'm going to try restarting the WU from the latest backup copy I made.

[14:53:51] Project: 2652 (Run 0, Clone 430, Gen 44)
It's not clear from your post what Project/Run/Clone/Gen was involved with the long 1-4 interactions error you had at 57 percent. Was it Project: 3062 (Run 3, Clone 26, Gen 1) or perhaps Project: 2652 (Run 0, Clone 430, Gen 44) ?

As a general rule, we don't encourage restarting from a backup because you were already assigned another WU. Unless you're very careful, you'll be overwriting either Project: 2652 (Run 0, Clone 430, Gen 44) or another WU that you will be assigned after that one. Of course, if you were reassigned the exact same WU as the one that had the error, then everything is fine.
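
For anyone trying to answer the "which Project/Run/Clone/Gen had the error?" question from a pasted log, here is a minimal sketch of one way to do it: scan the log top to bottom and pair each error line with the most recent "Project:" line seen before it. The function name and the list of error markers are assumptions chosen for illustration, based only on the line formats visible in the log in this topic; this is not part of the client or of any official tool.

```python
import re

# Matches lines such as "Project: 2652 (Run 0, Clone 430, Gen 44)".
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")

# Illustrative error markers taken from the log in this topic (not exhaustive).
ERROR_MARKERS = (
    "Gromacs cannot continue further",
    "Client-core communications error",
)


def errors_with_project(log_lines):
    """Pair each error line with the last Project line seen before it.

    Returns a list of (error_line, project) tuples, where project is a
    (project, run, clone, gen) tuple of ints, or None if the log excerpt
    contains no Project line ahead of the error.
    """
    current = None
    results = []
    for line in log_lines:
        match = PROJECT_RE.search(line)
        if match:
            current = tuple(int(g) for g in match.groups())
            continue
        if any(marker in line for marker in ERROR_MARKERS):
            results.append((line.strip(), current))
    return results


sample = """\
[14:39:29] Completed 570000 out of 1000000 steps (57 percent)
[14:51:10] Warning: long 1-4 interactions
[14:51:10] Gromacs cannot continue further.
[14:51:17] Client-core communications error: ERROR 0x7b
[14:53:34] Project: 2652 (Run 0, Clone 430, Gen 44)
""".splitlines()

for error_line, project in errors_with_project(sample):
    print(project, "<-", error_line)
```

On an excerpt like the one posted, both errors come back paired with None, because the only Project line appears after the failure. That is why the excerpt alone cannot settle which WU failed; the log would need to include the start of the failing unit.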

Posted: Tue Jan 01, 2008 10:01 pm
by al2
bruce wrote:
It's not clear from your post what Project/Run/Clone/Gen was involved with the long 1-4 interactions error you had at 57 percent. Was it Project: 3062 (Run 3, Clone 26, Gen 1) or perhaps Project: 2652 (Run 0, Clone 430, Gen 44) ?

As a general rule, we don't encourage restarting from a backup because you were already assigned another WU. Unless you're very careful, you'll be overwriting either Project: 2652 (Run 0, Clone 430, Gen 44) or another WU that you will be assigned after that one. Of course, if you were reassigned the exact same WU as the one that had the error, then everything is fine.
I've just come back to check something on this thread, and I realized that in my rush to report what I wrongly assumed to be a very new and unique error occurrence, I posted the info in the wrong place, completely missing the fact that the Project number (and so the type of SMP client) is different despite the similarity of the issues. Sorry. As you guessed, it was actually Project: 2652 (Run 0, Clone 430, Gen 44).

I understand what you say about NOT starting from a backup after being reassigned. Luckily it's not something I've ever done before (though I thought it might cause some sort of mix-up), and I now know it's something harmful to avoid. As I don't know whether it's the exact same WU as before, I'll restart from scratch.

Re: Project 2652 (Run 0, Clone 430, Gen 44)

Posted: Wed Jan 02, 2008 12:35 am
by bruce
OK. Topic split (see note in first post).

Please check your memory very carefully for errors. We've received some new information about error 0x7b and memory faults which has just been added to the WIKI: http://fahwiki.net/index.php/CoreStatus_codes#7B That may not be your problem, but it's something to look into.
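
As a side note on reading the log: the "CoreStatus = 7B (123)" line and the "ERROR 0x7b" line refer to the same code, once in hexadecimal and once in decimal. A minimal sketch of pulling the code out of such a line follows; the function name is hypothetical, and the line format is assumed from the log in this topic only.

```python
import re

# Matches lines such as "CoreStatus = 7B (123)": hex code, then decimal in parentheses.
CORESTATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def parse_core_status(line):
    """Return the core status code as an int, or None if the line has none."""
    match = CORESTATUS_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    # The decimal value in parentheses should agree with the hex value.
    if code != int(match.group(2)):
        raise ValueError("hex and decimal status values disagree: " + line)
    return code


code = parse_core_status("[14:51:17] CoreStatus = 7B (123)")
print(hex(code), code)  # 0x7b 123
```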