Project: 2665 (Run 1, Clone 649, Gen 6)


noorman
Posts: 270
Joined: Sun Dec 02, 2007 2:26 pm
Hardware configuration: Folders: Intel C2D E6550 @ 3.150 GHz + GPU XFX 9800GTX+ @ 765 MHZ w. WinXP-GPU
AMD A2X64 3800+ @ stock + GPU XFX 9800GTX+ @ 775 MHZ w. WinXP-GPU
Main rig: an old Athlon Barton 2500+ @2.25 GHz & 2* 512 MB RAM Apacer, Radeon 9800Pro, WinXP SP3+
Location: Belgium, near the International Sea-Port of Antwerp

Re: Project: 2665 (Run 1, Clone 649, Gen 6)

Post by noorman »


I'd suggest that such WUs be kept within Stanford's walls, to be run and checked or finished over there, instead of being sent out for months on end ...

That way, troublesome WUs would leave the loop earlier and their faults would be discovered sooner too!


- stopped Linux SMP w. HT on i7-860@3.5 GHz
....................................
Folded since 10-06-04 till 09-2010
Shizuka
Posts: 5
Joined: Mon Jul 28, 2008 8:42 pm

Re: Project: 2665 (Run 1, Clone 649, Gen 6)

Post by Shizuka »

I got this work unit.


[23:01:53] Initial: 0000; - Receiving payload (expected size: 4659162)
[23:06:35] - Downloaded at ~16 kB/s
[23:06:35] - Averaged speed for that direction ~61 kB/s
[23:06:35] + Received work.
[23:06:35] Trying to send all finished work units
[23:06:35] + No unsent completed units remaining.
[23:06:35] + Closed connections
[23:06:35] 
[23:06:35] + Processing work unit
[23:06:35] Work type a1 not eligible for variable processors
[23:06:35] Core required: FahCore_a1.exe
[23:06:35] Core found.
[23:06:35] Working on queue slot 04 [August 21 23:06:35 UTC]
[23:06:35] + Working ...
[23:06:35] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 04 -checkpoint 15 -verbose -lifeline 3764 -version 622'

[23:06:36] 
[23:06:36] *------------------------------*
[23:06:36] Folding@Home Gromacs SMP Core
[23:06:36] Version 1.76 (February 23, 2008)
[23:06:36] 
[23:06:36] Preparing to commence simulation
[23:06:36] - Ensuring status. Please wait- Created dyn
[23:06:36] - Files status OK
[23:06:36] 4.sas
[23:06:36] - Failed to- Created dyn
[23:06:36] - Files status OK
[23:06:36] ng:  check for stray files
[23:06:36] - Created dyn
[23:06:36] - Files status OK
[23:06:48] 65 (Run 1, Clone 649, Gen 6)
[23:06:48] 
[23:06:48] packet
[23:06:48] 
[23:06:48] Project: 2665 (Run 1, - Starting from initial work packet
[23:06:48] 
[23:06:48] Project: 2665 (Run 1, Clone 649, Gen 6)
[23:06:48] 
[23:06:51] Assembly optimizations on if available.
[23:06:51] Entering M.D.
[23:07:07] files
[23:07:07] n: IBX in water
[23:07:07] Writing local files
[23:07:09] Extra SSE boost OK.
[23:07:17] Gromacs cannot continue further.
[23:07:17] Going to send back what have done.
[23:07:17] logfil- Failed to d- Writing 9958 bytes - FaNo C.P. to delete.
[23:07:17] - Failed to delete work/wudata_04.bed
[23:07:17] - Failed to delNo C.P. to delete.
[23:07:17] - Failed to delet- Failed to delete work/wudata_04.bed
[23:07:17] - Failed to delete 
[23:07:17] Folding@home Core Shutdown: EARLY_UNIT_END
[23:07:17] Finalizing output
[23:07:17] ng:  check for stray files
[23:09:17] 
[23:09:17] Folding@home Core Shutdown: EARLY_UNIT_END
[23:09:17] Finalizing output
Fixed the problem. I changed the machine ID, and voilà! New WU!
VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: Project: 2665 (Run 1, Clone 649, Gen 6)

Post by VijayPande »

Our server code will give a WU a certain # of tries and then stop it. We'll look to see if this code is not working in this case or if there's something else going on here.
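
For readers wondering what "a certain # of tries" could look like in practice, here is a minimal sketch of a retry cap. It is not the actual Folding@home server code: the class name `WorkUnitTracker`, the `max_failures` threshold, and the (project, run, clone, gen) tuple used as a key are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical threshold; the real server's limit is not stated in this thread.
MAX_FAILURES = 5


class WorkUnitTracker:
    """Counts failed returns per work unit and retires a WU after too many."""

    def __init__(self, max_failures=MAX_FAILURES):
        self.max_failures = max_failures
        self.failures = defaultdict(int)  # WU id -> number of failed attempts
        self.retired = set()              # WU ids that are no longer handed out

    def report_failure(self, wu_id):
        """Record one failed attempt; return True if the WU is now retired."""
        self.failures[wu_id] += 1
        if self.failures[wu_id] >= self.max_failures:
            self.retired.add(wu_id)
        return wu_id in self.retired

    def can_assign(self, wu_id):
        """A WU may be assigned only while it has not been retired."""
        return wu_id not in self.retired


# Example: the WU from this thread, keyed as (project, run, clone, gen).
tracker = WorkUnitTracker(max_failures=3)
wu = (2665, 1, 649, 6)
for _ in range(3):
    tracker.report_failure(wu)
print(tracker.can_assign(wu))
```

With a cap like this, a WU that keeps ending early would stop being reassigned once the threshold is reached, without anyone having to kill it by hand.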
Shizuka
Posts: 5
Joined: Mon Jul 28, 2008 8:42 pm

Re: Project: 2665 (Run 1, Clone 649, Gen 6)

Post by Shizuka »

Problem is, the client sits there hung after the WU gets an EUE. It sat there for over two hours before I found it, and then I couldn't get rid of the WU even after deleting it several times.
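
Since a hung client can go unnoticed for hours, one option is to scan the client log for core shutdown lines. The following is a rough sketch only: the function name `find_core_shutdowns` is made up for this example, and the pattern assumes the log lines look like the "Folding@home Core Shutdown: EARLY_UNIT_END" lines in the log posted above.

```python
import re

# Matches lines such as "[23:07:17] Folding@home Core Shutdown: EARLY_UNIT_END".
SHUTDOWN_PATTERN = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\] Folding@home Core Shutdown: (\w+)"
)


def find_core_shutdowns(log_text):
    """Return a list of (timestamp, shutdown_code) tuples found in the log text."""
    results = []
    for line in log_text.splitlines():
        match = SHUTDOWN_PATTERN.match(line)
        if match:
            results.append((match.group(1), match.group(2)))
    return results


sample = """[23:07:17] Gromacs cannot continue further.
[23:07:17] Folding@home Core Shutdown: EARLY_UNIT_END
[23:07:17] Finalizing output
[23:09:17] Folding@home Core Shutdown: EARLY_UNIT_END
"""
for timestamp, code in find_core_shutdowns(sample):
    print(timestamp, code)
```

A script like this could be run periodically against the log file to flag an EUE, so that a stalled client gets restarted sooner.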
sick willie
Posts: 33
Joined: Sun May 25, 2008 7:40 pm

Re: Project: 2665 (Run 1, Clone 649, Gen 6)

Post by sick willie »

VijayPande wrote:Our server code will give a WU a certain # of tries and then stop it. We'll look to see if this code is not working in this case or if there's something else going on here.
+1 more. :( It has hit most of my Linux boxes and is now starting on my Windows machines. This WU stalls the client with a seg fault in Linux and results in no further activity (without a F@H restart) in Windows.
ppetrone
Pande Group Member
Posts: 115
Joined: Wed Dec 12, 2007 6:20 pm
Location: Stanford

Re: Project: 2665 (Run 1, Clone 649, Gen 6)

Post by ppetrone »

Thanks for being patient, SW. Let me check again.

Paula
VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: Project: 2665 (Run 1, Clone 649, Gen 6)

Post by VijayPande »

I was hoping the code would take care of this automatically (since we can't kill WU's by hand too frequently), but that's not working. We're working on a server code update to handle that. For now, I've manually killed this WU.
sick willie
Posts: 33
Joined: Sun May 25, 2008 7:40 pm

Re: Project: 2665 (Run 1, Clone 649, Gen 6)

Post by sick willie »

Thank you. :)