Re: Project: 2665 (Run 1, Clone 649, Gen 6)
Posted: Tue Aug 19, 2008 12:22 pm
by noorman
I'd suggest that such WUs be kept within Stanford's walls to be run and checked/finished over there instead of being sent out for months on end ...
That way, troublesome WUs would leave the loop earlier and their faults would be discovered sooner too!
Re: Project: 2665 (Run 1, Clone 649, Gen 6)
Posted: Fri Aug 22, 2008 12:29 am
by Shizuka
I got this work unit.
[23:01:53] Initial: 0000; - Receiving payload (expected size: 4659162)
[23:06:35] - Downloaded at ~16 kB/s
[23:06:35] - Averaged speed for that direction ~61 kB/s
[23:06:35] + Received work.
[23:06:35] Trying to send all finished work units
[23:06:35] + No unsent completed units remaining.
[23:06:35] + Closed connections
[23:06:35]
[23:06:35] + Processing work unit
[23:06:35] Work type a1 not eligible for variable processors
[23:06:35] Core required: FahCore_a1.exe
[23:06:35] Core found.
[23:06:35] Working on queue slot 04 [August 21 23:06:35 UTC]
[23:06:35] + Working ...
[23:06:35] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 04 -checkpoint 15 -verbose -lifeline 3764 -version 622'
[23:06:36]
[23:06:36] *------------------------------*
[23:06:36] Folding@Home Gromacs SMP Core
[23:06:36] Version 1.76 (February 23, 2008)
[23:06:36]
[23:06:36] Preparing to commence simulation
[23:06:36] - Ensuring status. Please wait- Created dyn
[23:06:36] - Files status OK
[23:06:36] 4.sas
[23:06:36] - Failed to- Created dyn
[23:06:36] - Files status OK
[23:06:36] ng: check for stray files
[23:06:36] - Created dyn
[23:06:36] - Files status OK
[23:06:48] 65 (Run 1, Clone 649, Gen 6)
[23:06:48]
[23:06:48] packet
[23:06:48]
[23:06:48] Project: 2665 (Run 1, - Starting from initial work packet
[23:06:48]
[23:06:48] Project: 2665 (Run 1, Clone 649, Gen 6)
[23:06:48]
[23:06:51] Assembly optimizations on if available.
[23:06:51] Entering M.D.
[23:07:07] files
[23:07:07] n: IBX in water
[23:07:07] Writing local files
[23:07:09] Extra SSE boost OK.
[23:07:17] Gromacs cannot continue further.
[23:07:17] Going to send back what have done.
[23:07:17] logfil- Failed to d- Writing 9958 bytes - FaNo C.P. to delete.
[23:07:17] - Failed to delete work/wudata_04.bed
[23:07:17] - Failed to delNo C.P. to delete.
[23:07:17] - Failed to delet- Failed to delete work/wudata_04.bed
[23:07:17] - Failed to delete
[23:07:17] Folding@home Core Shutdown: EARLY_UNIT_END
[23:07:17] Finalizing output
[23:07:17] ng: check for stray files
[23:09:17]
[23:09:17] Folding@home Core Shutdown: EARLY_UNIT_END
[23:09:17] Finalizing output
Fixed problem. Changed machine ID, and voilà! New WU!
Re: Project: 2665 (Run 1, Clone 649, Gen 6)
Posted: Fri Aug 22, 2008 2:19 pm
by VijayPande
Our server code will give a WU a certain # of tries and then stop it. We'll look to see if this code is not working in this case or if there's something else going on here.
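To make the "certain # of tries" idea concrete, here is a rough Python sketch of a per-WU failure cap. It is only an illustration and not the actual Folding@home server code: the class name `WorkUnitTracker`, its methods, and the threshold value are all hypothetical, since the real limit is not stated in this thread.

```python
from collections import defaultdict

# Assumed threshold for illustration; the real value is not given in this thread.
MAX_FAILURES = 5


class WorkUnitTracker:
    """Toy model of a per-WU failure cap (hypothetical, not the real server code)."""

    def __init__(self, max_failures=MAX_FAILURES):
        self.max_failures = max_failures
        self.failures = defaultdict(int)  # WU id -> number of failed attempts
        self.retired = set()              # WUs that will no longer be assigned

    def report_failure(self, wu_id):
        """Record one failed attempt; retire the WU once the cap is reached."""
        self.failures[wu_id] += 1
        if self.failures[wu_id] >= self.max_failures:
            self.retired.add(wu_id)

    def can_assign(self, wu_id):
        """A retired WU is never handed out again."""
        return wu_id not in self.retired


tracker = WorkUnitTracker(max_failures=3)
wu = (2665, 1, 649, 6)  # (project, run, clone, gen)
for _ in range(3):
    tracker.report_failure(wu)
print(tracker.can_assign(wu))  # False: the WU has hit the cap
```

In this sketch a WU that keeps failing drops out of circulation on its own once enough failures are reported, which is the behaviour described above.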
Re: Project: 2665 (Run 1, Clone 649, Gen 6)
Posted: Fri Aug 22, 2008 8:51 pm
by Shizuka
Problem is, the client sits there hung after the WU gets an EUE. It sat there for over two hours before I found it, then I couldn't get rid of it after deleting it several times.
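For anyone who would rather not find a hung client hours later, here is a rough Python sketch that scans client log lines for the EUE shutdown message seen in the log above. The function name `find_eue_times` is made up for illustration, and the line format is assumed to match the `[HH:MM:SS]` prefix shown in this thread.

```python
import re

# Matches lines like "[23:07:17] Folding@home Core Shutdown: EARLY_UNIT_END"
EUE_PATTERN = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\]\s+Folding@home Core Shutdown: EARLY_UNIT_END"
)


def find_eue_times(log_lines):
    """Return the timestamps of every EARLY_UNIT_END shutdown line."""
    times = []
    for line in log_lines:
        match = EUE_PATTERN.match(line)
        if match:
            times.append(match.group(1))
    return times


sample = [
    "[23:07:17] Gromacs cannot continue further.",
    "[23:07:17] Folding@home Core Shutdown: EARLY_UNIT_END",
    "[23:07:17] Finalizing output",
    "[23:09:17] Folding@home Core Shutdown: EARLY_UNIT_END",
]
print(find_eue_times(sample))  # ['23:07:17', '23:09:17']
```

A watchdog could compare the last EUE timestamp with the time of the most recent log line and flag the client if nothing new has been written for a while.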
Re: Project: 2665 (Run 1, Clone 649, Gen 6)
Posted: Sun Aug 24, 2008 2:39 am
by sick willie
VijayPande wrote: Our server code will give a WU a certain # of tries and then stop it. We'll look to see if this code is not working in this case or if there's something else going on here.
+1 more.
It has hit most of my Linux boxes and is now starting on my Windows machines. This WU stalls the client w/ a seg fault in Linux and results in no further activity (w/o a F@H restart) in Windows.
Re: Project: 2665 (Run 1, Clone 649, Gen 6)
Posted: Sun Aug 24, 2008 3:37 am
by ppetrone
Thanks for being patient, SW. Let me check again.
Paula
Re: Project: 2665 (Run 1, Clone 649, Gen 6)
Posted: Sun Aug 24, 2008 4:24 am
by VijayPande
I was hoping the code would take care of this automatically (since we can't kill WU's by hand too frequently), but that's not working. We're working on a server code update to handle that. For now, I've manually killed this WU.
Re: Project: 2665 (Run 1, Clone 649, Gen 6)
Posted: Sun Aug 24, 2008 5:23 am
by sick willie
Thank you.