FAH Core Interrupted Project 2683(Run 5, Clone 5, Gen 14)

Posted: Wed Jan 20, 2010 7:58 am
by Karamiekos
I don't know what's going on here. I finished one work unit and downloaded this one, but it can't even start.
I tried deleting the core and downloading a new one just to make sure it wasn't corrupted or anything, but that didn't help.

Note: Please read the license agreement (fah6 -license). Further 
use of this software requires that you have read and accepted this agreement.

Using local directory for work files
16 cores detected


--- Opening Log file [January 20 07:35:41 UTC] 


# Linux SMP Console Edition ###################################################
###############################################################################

                       Folding@Home Client Version 6.24R3

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /media/fah/fah
Executable: ./fah6
Arguments: -bigadv -smp 15 -verbosity 9 -local 

[07:35:41] - Ask before connecting: No
[07:35:41] - User name: Karamiekos (Team 36837)
[07:35:41] - User ID: 23E128773C713161
[07:35:41] - Machine ID: 1
[07:35:41] 
[07:35:41] Loaded queue successfully.
[07:35:41] 
[07:35:41] - Autosending finished units... [January 20 07:35:41 UTC]
[07:35:41] + Processing work unit
[07:35:41] Trying to send all finished work units
[07:35:41] Core required: FahCore_a2.exe
[07:35:41] + No unsent completed units remaining.
[07:35:41] Core found.
[07:35:41] - Autosend completed
[07:35:41] Working on queue slot 06 [January 20 07:35:41 UTC]
[07:35:41] + Working ...
[07:35:41] - Calling './mpiexec -np 15 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -nice 19 -suffix 06 -priority 96 -checkpoint 22 -verbose -lifeline 6102 -version 624'

[07:35:42] 
[07:35:42] *------------------------------*
[07:35:42] Folding@Home Gromacs SMP Core
[07:35:42] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[07:35:42] 
[07:35:42] Preparing to commence simulation
[07:35:42] - Ensuring status. Please wait.
[07:35:47] Called DecompressByteArray: compressed_data_size=30234593 data_size=159270593, decompressed_data_size=159270593 diff=0
[07:35:49] - Digital signature verified
[07:35:49] 
[07:35:49] Project: 2683 (Run 5, Clone 5, Gen 14)
[07:35:49] 
[07:35:49] Assembly optimizations on if available.
[07:35:49] Entering M.D.
[07:35:59]  (Run 5, Clone 5, Gen 14)
[07:35:59] 
[07:36:00] Entering M.D.
NNODES=15, MYRANK=1, HOSTNAME=new-host-2
NNODES=15, MYRANK=6, HOSTNAME=new-host-2
NNODES=15, MYRANK=7, HOSTNAME=new-host-2
NNODES=15, MYRANK=8, HOSTNAME=new-host-2
NNODES=15, MYRANK=12, HOSTNAME=new-host-2
NNODES=15, MYRANK=13, HOSTNAME=new-host-2
NNODES=15, MYRANK=14, HOSTNAME=new-host-2
NNODES=15, MYRANK=3, HOSTNAME=new-host-2
NNODES=15, MYRANK=4, HOSTNAME=new-host-2
NNODES=15, MYRANK=10, HOSTNAME=new-host-2
NNODES=15, MYRANK=2, HOSTNAME=new-host-2
NNODES=15, MYRANK=11, HOSTNAME=new-host-2
NNODES=15, MYRANK=5, HOSTNAME=new-host-2
NNODES=15, MYRANK=9, HOSTNAME=new-host-2
NNODES=15, MYRANK=0, HOSTNAME=new-host-2
NODEID=0 argc=20
Reading file work/wudata_06.tpr, VERSION 3.3.99_development_20070618 (single precision)
NODEID=2 argc=20
NODEID=3 argc=20
NODEID=4 argc=20
NODEID=6 argc=20
NODEID=7 argc=20
NODEID=8 argc=20
NODEID=10 argc=20
NODEID=12 argc=20
NODEID=13 argc=20
NODEID=14 argc=20
NODEID=5 argc=20
NODEID=11 argc=20
NODEID=9 argc=20
NODEID=1 argc=20
Note: tpx file_version 48, software version 68

Will use 10 particle-particle and 5 PME only nodes
This is a guess, check the performance at the end of the log file

NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 2D domain decomposition 5 x 1 x 2
starting mdrun 'SINGLE VESICLE in water'
3750000 steps,  15000.0 ps (continuing from step 3500000,  14000.0 ps).
[07:36:24] Completed 0 out of 250000 steps  (0%)

t = 14000.001 ps: Water molecule starting at atom 149430 can not be settled.
Check for bad contacts and/or reduce the timestep.
[07:36:26] 
[07:36:26] Folding@home Core Shutdown: INTERRUPTED
application called MPI_Abort(MPI_COMM_WORLD, 102) - process 0
[0]0:Return code = 102
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Quit
[0]4:Return code = 0, signaled with Quit
[0]5:Return code = 0, signaled with Quit
[0]6:Return code = 0, signaled with Quit
[0]7:Return code = 0, signaled with Quit
[0]8:Return code = 0, signaled with Quit
[0]9:Return code = 0, signaled with Quit
[0]10:Return code = 0, signaled with Quit
[0]11:Return code = 0, signaled with Quit
[0]12:Return code = 0, signaled with Quit
[0]13:Return code = 0, signaled with Quit
[0]14:Return code = 0, signaled with Quit
[07:36:41] CoreStatus = 66 (102)
[07:36:41] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[07:36:41] Killing all core threads

Folding@Home Client Shutdown.

Re: FAH Core Interrupted

Posted: Wed Jan 20, 2010 8:06 am
by Karamiekos
I tried to delete it, but I kept getting the exact same work unit back, always with the same result. I have had zero problems up to now, so I think there might be something wrong with this particular one. I finally got a different work unit after 2-3 attempts and it is working fine.

Just a heads up: it would be nice to verify with the next person who gets it whether they have problems too.

Re: FAH Core Interrupted Project 2683(Run 5, Clone 5, Gen 14)

Posted: Wed Jan 20, 2010 7:57 pm
by rickoic
The line:
t = 14000.001 ps: Water molecule starting at atom 149430 can not be settled.

indicates that the beginning parameters are so out of bounds that folding cannot be accomplished.

You need to post this in the forum about problems with a specific work unit so that pandegroup can remove the work unit.
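For anyone who wants to check their own log for this failure before reporting it, here is a minimal sketch in Python. It only relies on the two line formats visible in the logs in this thread (the "Project:" line and the "can not be settled" line); the function name `find_settle_failures` and the sample text are made up for illustration and are not part of the FAH client.

```python
import re

# Patterns taken from the log lines quoted in this thread.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SETTLE_RE = re.compile(
    r"t = ([\d.]+) ps: Water molecule starting at atom (\d+) can not be settled"
)

def find_settle_failures(log_text):
    """Return (project, run, clone, gen, time_ps, atom) for each settle error.

    Each error is attributed to the most recent 'Project:' line above it.
    """
    current = None
    failures = []
    for line in log_text.splitlines():
        m = PROJECT_RE.search(line)
        if m:
            current = tuple(int(g) for g in m.groups())
            continue
        m = SETTLE_RE.search(line)
        if m and current is not None:
            failures.append(current + (float(m.group(1)), int(m.group(2))))
    return failures

# Hypothetical excerpt, shortened from the kind of log posted in this thread.
sample = """\
[16:47:23] Project: 2683 (Run 5, Clone 5, Gen 14)
[16:47:55] Completed 0 out of 250000 steps  (0%)
t = 14000.001 ps: Water molecule starting at atom 859944 can not be settled.
t = 14000.001 ps: Water molecule starting at atom 597471 can not be settled.
"""

for failure in find_settle_failures(sample):
    print(failure)
```

If the same Project/Run/Clone/Gen shows up with this error on more than one machine, that points towards the work unit rather than the hardware.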

Until then you may have to remove the -bigadv flag and fold one of the 1920-point WUs to get your PC back up and running.
Erase this one, remove -bigadv, and download a WU. Then stop the client and put -bigadv back in if you want, or fold that way for a day or so.

That is probably the only way you're going to be able to continue folding without receiving this bad WU again.

Fold on

Rick

Re: FAH Core Interrupted Project 2683(Run 5, Clone 5, Gen 14)

Posted: Wed Jan 20, 2010 9:00 pm
by Karamiekos
I did get it up and running on another unit with no problems. I am definitely leaning towards a work unit problem. I posted here due to the unique nature of the -bigadv project, but if the mods see fit I hope they move the thread wherever it needs to be.

Thank you for the input though, Rick. If they don't seem to notice this thread I will try again to contact them and make sure they are aware. I know they are busy.

Re: FAH Core Interrupted Project 2683(Run 5, Clone 5, Gen 14)

Posted: Thu Jan 21, 2010 3:28 am
by bruce
There's certainly a possibility of a bad WU but has this particular client completed other BigWUs successfully? One possible cause for the error is insufficient (virtual?) memory.

Re: FAH Core Interrupted Project 2683(Run 5, Clone 5, Gen 14)

Posted: Thu Jan 21, 2010 3:56 am
by Karamiekos
This client has been running big work units fine for about a month now. The machine has 8 gigs of RAM and an 11 gig swap; it usually uses less than 6 gigs of RAM and doesn't dip into the swap at all.
I would be really interested to see what happens if someone else gets it.
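To put numbers on the memory question raised above, here is a rough sketch that reads the standard Linux /proc/meminfo format and reports how much swap is in use. The helper names `parse_meminfo` and `swap_used_kb` are hypothetical, and the sample values are illustrative only (roughly 8 GB of RAM and 11 GB of swap, with swap untouched); they are not taken from the actual machine.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values

def swap_used_kb(info):
    """Swap currently in use, in kB."""
    return info["SwapTotal"] - info["SwapFree"]

# Illustrative numbers only, not measured on the machine in this thread.
sample = """\
MemTotal:        8388608 kB
MemFree:         2097152 kB
SwapTotal:      11534336 kB
SwapFree:       11534336 kB
"""

info = parse_meminfo(sample)
print("swap used (kB):", swap_used_kb(info))
```

On a real Linux box you would pass in `open("/proc/meminfo").read()` instead of the sample string, ideally while the core is running.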

Project: 2683 (Run 5, Clone 5, Gen 14)

Posted: Thu Jan 21, 2010 5:02 am
by k1wi

[16:47:19] *------------------------------*
[16:47:19] Folding@Home Gromacs SMP Core
[16:47:19] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[16:47:19] 
[16:47:19] Preparing to commence simulation
[16:47:19] - Ensuring status. Please wait.
[16:47:19] Files status OK
[16:47:22] - Expanded 30234593 -> 159270593 (decompressed 100.6 percent)
[16:47:22] Called DecompressByteArray: compressed_data_size=30234593 data_size=159270593, decompressed_data_size=159270593 diff=0
[16:47:23] - Digital signature verified
[16:47:23] 
[16:47:23] Project: 2683 (Run 5, Clone 5, Gen 14)
[16:47:23] 
[16:47:23] Assembly optimizations on if available.
[16:47:23] Entering M.D.
[16:47:34]  (Run 5, Clone 5, Gen 14)
[16:47:34] 
[16:47:35] Entering M.D.
NNODES=8, MYRANK=0, HOSTNAME=FAH
NODEID=0 argc=20
NNODES=8, MYRANK=1, HOSTNAME=FAH
NODEID=1 argc=20
NNODES=8, MYRANK=2, HOSTNAME=FAH
NODEID=2 argc=20
NNODES=8, MYRANK=3, HOSTNAME=FAH
NODEID=3 argc=20
NNODES=8, MYRANK=4, HOSTNAME=FAH
NODEID=4 argc=20
NNODES=8, MYRANK=5, HOSTNAME=FAH
NODEID=5 argc=20
NNODES=8, MYRANK=6, HOSTNAME=FAH
NODEID=6 argc=20
NNODES=8, MYRANK=7, HOSTNAME=FAH
NODEID=7 argc=20
Reading file work/wudata_02.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 68

NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 1D domain decomposition 8 x 1 x 1
starting mdrun 'SINGLE VESICLE in water'
3750000 steps,  15000.0 ps (continuing from step 3500000,  14000.0 ps).
[16:47:55] Completed 0 out of 250000 steps  (0%)

t = 14000.001 ps: Water molecule starting at atom 859944 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 14000.001 ps: Water molecule starting at atom 597471 can not be settled.
Check for bad contacts and/or reduce the timestep.
[16:47:57] 
[16:47:57] Folding@home Core Shutdown: INTERRUPTED
application called MPI_Abort(MPI_COMM_WORLD, 102) - process 0
[0]0:Return code = 102
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[0]4:Return code = 0, signaled with Quit
[0]5:Return code = 0, signaled with Quit
[0]6:Return code = 0, signaled with Quit
[0]7:Return code = 0, signaled with Quit
[16:48:05] CoreStatus = 66 (102)
[16:48:05] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)

Folding@Home Client Shutdown.

I'm just about to go and look at how to get rid of this work unit, as it isn't starting a new one, so my computer's sitting idle at the moment.

Will this hurt my passkey ratio? I.e., will it affect whether or not I earn -bigadv bonuses?

Re: Project: 2683 (Run 5, Clone 5, Gen 14)

Posted: Thu Jan 21, 2010 6:38 am
by bruce
The ratio for bonuses is 80% so unless you've had other failures, it will not affect your bonus.
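As a rough illustration of the 80% figure, the arithmetic looks something like the sketch below. Exactly which units the server counts, and whether the comparison is "at least" or "more than" 80%, are assumptions here; the function names and the example counts are made up.

```python
BONUS_THRESHOLD = 0.80  # the 80% figure quoted above

def completion_ratio(completed, failed):
    """Fraction of recorded work units that completed successfully."""
    total = completed + failed
    if total == 0:
        raise ValueError("no work units recorded")
    return completed / total

def keeps_bonus(completed, failed, threshold=BONUS_THRESHOLD):
    """Assumption: the bonus is kept when the ratio is at least the threshold."""
    return completion_ratio(completed, failed) >= threshold

print(keeps_bonus(20, 1))  # True: 20/21 is about 95%
print(keeps_bonus(3, 1))   # False: 3/4 is 75%
```

So a single failed unit only matters if very few units have been returned under the passkey so far.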

I'll report this as a bad WU.

Re: Project: 2683 (Run 5, Clone 5, Gen 14)

Posted: Sat Jan 23, 2010 5:53 pm
by tear
FYI, it failed for me the same way on the 20th; I found it in the log just now.