Project: 2671 (Run 3, Clone 82, Gen 42) WMCNBS 100% reprod.

Moderators: Site Moderators, FAHC Science Team

tear
Posts: 254
Joined: Sun Dec 02, 2007 4:08 am
Hardware configuration: None
Location: Rocky Mountains

Project: 2671 (Run 3, Clone 82, Gen 42) WMCNBS 100% reprod.

Post by tear »

The "Water molecule can not be settled" error is 100% reproducible; see the two terminal
output snippets below.

Snippet A (side note: the client hung and did not continue until ^C):

Code:

Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.

8 cores detected


--- Opening Log file [June 6 11:58:42 UTC]


# Linux SMP Console Edition ###################################################
###############################################################################

                       Folding@Home Client Version 6.24beta

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /fah/clients/fah
Executable: ./fah6
Arguments: -oneunit -verbosity 9 -forceasm -smp

[11:58:42] - Ask before connecting: No
[11:58:42] - User name: tear (Team 100259)
[11:58:42] - User ID: 1FD229A605CD6A27
[11:58:42] - Machine ID: 3
[11:58:42]
[11:58:42] Loaded queue successfully.
[11:58:42] - Preparing to get new work unit...
[11:58:42] + Attempting to get work packet
[11:58:42] - Will indicate memory of 2013 MB
[11:58:42] - Connecting to assignment server
[11:58:42] Connecting to http://assign.stanford.edu:8080/
[11:58:42] - Autosending finished units... [11:58:42]
[11:58:42] Trying to send all finished work units
[11:58:42] + No unsent completed units remaining.
[11:58:42] - Autosend completed
[11:58:44] Posted data.
[11:58:44] Initial: 43AB; - Successful: assigned to (171.67.108.24).
[11:58:44] + News From Folding@Home: Welcome to Folding@Home
[11:58:44] Loaded queue successfully.
[11:58:44] Connecting to http://171.67.108.24:8080/
[11:58:50] Posted data.
[11:58:50] Initial: 0000; - Receiving payload (expected size: 4842125)
[11:59:05] - Downloaded at ~315 kB/s
[11:59:05] - Averaged speed for that direction ~1264 kB/s
[11:59:05] + Received work.
[11:59:05] + Closed connections
[11:59:05]
[11:59:05] + Processing work unit
[11:59:05] At least 4 processors must be requested.Core required: FahCore_a2.exe
[11:59:05] Core found.
[11:59:05] Working on queue slot 05 [June 6 11:59:05 UTC]
[11:59:05] + Working ...
[11:59:05] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 05 -nocpulock -checkpoint 15 -forceasm -verbose -lifeline 27519 -version 624'

Warning: Ignoring unknown arg
Warning: Ignoring unknown arg
Warning: Ignoring unknown arg
Warning: Ignoring unknown arg
[11:59:05]
[11:59:05] *------------------------------*
[11:59:05] Folding@Home Gromacs SMP Core
[11:59:05] Version 2.07 (Sun Apr 19 14:51:09 PDT 2009)
[11:59:05]
[11:59:05] Preparing to commence simulation
[11:59:05] - Ensuring status. Please wait.
[11:59:14] - Assembly optimizations manually forced on.
[11:59:14] - Not checking prior termination.
[11:59:15] - Expanded 4841613 -> 24004881 (decompressed 495.8 percent)
[11:59:15] Called DecompressByteArray: compressed_data_size=4841613 data_size=24004881, decompressed_data_size=24004881 diff=0
[11:59:16] - Digital signature verified
[11:59:16]
[11:59:16] Project: 2671 (Run 3, Clone 82, Gen 42)
[11:59:16]
[11:59:16] Assembly optimizations on if available.
[11:59:16] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=octopus
NNODES=4, MYRANK=1, HOSTNAME=octopus
NNODES=4, MYRANK=2, HOSTNAME=octopus
NNODES=4, MYRANK=3, HOSTNAME=octopus
NODEID=0 argc=20
NODEID=1 argc=20
NODEID=2 argc=20
NODEID=3 argc=20
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                 :-)  VERSION 4.0.99_development_20090307  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.


                                :-)  mdrun  (-:

Reading file work/wudata_05.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 64

NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 1D domain decomposition 1 x 1 x 4
starting mdrun '22878 system in water'
10750000 steps,  21500.0 ps (continuing from step 10500000,  21000.0 ps).
[11:59:25] Completed 0 out of 250000 steps  (0%)

t = 21000.005 ps: Water molecule starting at atom 95476 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 21000.007 ps: Water molecule starting at atom 46285 can not be settled.
Check for bad contacts and/or reduce the timestep.

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: nsgrid.c, line: 357

Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.

Variable ci has value -2147483503. It should have been within [ 0 .. 2312 ]

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 2, will try to stop all the nodes
Halting parallel pro
-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: nsgrid.c, line: 357

Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.

Variable ci has value -2147483519. It should have been within [ 0 .. 1800 ]

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 2 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_2]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: nsgrid.c, line: 357

Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.

Variable ci has value -2147483527. It should have been within [ 0 .. 1568 ]

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 3, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
gram mdrun on CPU 3 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_3]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3

Snippet B (the client did not hang):

Code:

Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.

8 cores detected


--- Opening Log file [June 6 16:27:21 UTC]


# Linux SMP Console Edition ###################################################
###############################################################################

                       Folding@Home Client Version 6.24beta

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /fah/clients/fah
Executable: ./fah6
Arguments: -oneunit -verbosity 9 -forceasm -smp

[16:27:21] - Ask before connecting: No
[16:27:21] - User name: tear (Team 100259)
[16:27:21] - User ID: 1FD229A605CD6A27
[16:27:21] - Machine ID: 3
[16:27:21]
[16:27:22] Loaded queue successfully.
[16:27:22]
[16:27:22] + Processing work unit
[16:27:22] At least 4 processors must be requested.Core required: FahCore_a2.exe
[16:27:22] Core found.
[16:27:22] - Autosending finished units... [June 6 16:27:22 UTC]
[16:27:22] Working on queue slot 05 [June 6 16:27:22 UTC]
[16:27:22] Trying to send all finished work units
[16:27:22] + Working ...
[16:27:22] + No unsent completed units remaining.
[16:27:22] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 05 -nocpulock -checkpoint 15 -forceasm -verbose -lifeline 5785 -version 624'

[16:27:22] - Autosend completed
Warning: Ignoring unknown arg
Warning: Ignoring unknown arg
Warning: Ignoring unknown arg
Warning: Ignoring unknown arg
[16:27:22]
[16:27:22] *------------------------------*
[16:27:22] Folding@Home Gromacs SMP Core
[16:27:22] Version 2.07 (Sun Apr 19 14:51:09 PDT 2009)
[16:27:22]
[16:27:22] Preparing to commence simulation
[16:27:22] - Ensuring status. Please wait.
[16:27:31] - Assembly optimizations manually forced on.
[16:27:31] - Not checking prior termination.
[16:27:32] - Expanded 4841613 -> 24004881 (decompressed 495.8 percent)
[16:27:32] Called DecompressByteArray: compressed_data_size=4841613 data_size=24004881, decompressed_data_size=24004881 diff=0
[16:27:33] - Digital signature verified
[16:27:33]
[16:27:33] Project: 2671 (Run 3, Clone 82, Gen 42)
[16:27:33]
[16:27:33] Assembly optimizations on if available.
[16:27:33] Entering M.D.
NNODES=4, MYRANK=1, HOSTNAME=octopus
NNODES=4, MYRANK=0, HOSTNAME=octopus
NNODES=4, MYRANK=2, HOSTNAME=octopus
NNODES=4, MYRANK=3, HOSTNAME=octopus
NODEID=2 argc=20
NODEID=0 argc=20
NODEID=1 argc=20
NODEID=3 argc=20
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                 :-)  VERSION 4.0.99_development_20090307  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.


                                :-)  mdrun  (-:

Reading file work/wudata_05.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 64

NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 1D domain decomposition 1 x 1 x 4
starting mdrun '22878 system in water'
10750000 steps,  21500.0 ps (continuing from step 10500000,  21000.0 ps).
[16:27:42] Completed 0 out of 250000 steps  (0%)

t = 21000.005 ps: Water molecule starting at atom 95476 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 21000.007 ps: Water molecule starting at atom 46285 can not be settled.
Check for bad contacts and/or reduce the timestep.

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: nsgrid.c, line: 357

Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.

Variable ci has value -2147483503. It should have been within [ 0 .. 2312 ]

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 2, will try to stop all the nodes
Halting parallel pro
-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: nsgrid.c, line: 357

Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.

Variable ci has value -2147483519. It should have been within [ 0 .. 1800 ]

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 2 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_2]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2
gram mdrun on CPU 0 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: nsgrid.c, line: 357

Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.

Variable ci has value -2147483527. It should have been within [ 0 .. 1568 ]

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 3, will try to stop all the nodes
Halting parallel program mdrun on CPU 3 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_3]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3
[0]0:Return code = 255
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 255
[0]3:Return code = 255
[16:27:48] CoreStatus = FF (255)
[16:27:48] Sending work to server
[16:27:48] Project: 2671 (Run 3, Clone 82, Gen 42)
[16:27:48] - Error: Could not get length of results file work/wuresults_05.dat
[16:27:48] - Error: Could not read unit 05 file. Removing from queue.
[16:27:48] Trying to send all finished work units
[16:27:48] + No unsent completed units remaining.
[16:27:48] + -oneunit flag given and have now finished a unit. Exiting.- Preparing to get new work unit...
[16:27:48] + Attempting to get work packet
[16:27:48] - Will indicate memory of 2013 MB
[16:27:48] - Connecting to assignment server
[16:27:48] ***** Got a SIGTERM signal (15)
[16:27:48] Connecting to http://assign.stanford.edu:8080/
[16:27:48] Killing all core threads

Folding@Home Client Shutdown.
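
Side note on the "Variable ci has value -21474835xx" lines above: those values sit within a couple hundred of INT_MIN (-2147483648), which is what you typically get when a coordinate has gone to NaN or +-Infinity and is then turned into a 32-bit grid index. A minimal illustration of that effect (my own sketch using numpy, not the actual nsgrid.c code):

Code:

# Minimal illustration (not the actual GROMACS nsgrid.c code) of why "ci"
# comes out near INT_MIN once a coordinate is non-finite: dividing a NaN/Inf
# single-precision coordinate by the cell size and casting to int32 yields a
# value around -2147483648 on typical x86 builds (numpy warns about the cast).
import numpy as np

coords = np.array([np.nan, np.inf, 1.5], dtype=np.float32)  # two broken, one sane
cell_size = np.float32(0.5)

grid_index = (coords / cell_size).astype(np.int32)
print(grid_index)  # the first two entries land near INT_MIN, the last is 3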

tear
One man's ceiling is another man's floor.
susato
Site Moderator
Posts: 511
Joined: Fri Nov 30, 2007 4:57 am
Location: Team MacOSX
Contact:

Re: Project: 2671 (Run 3, Clone 82, Gen 42) WMCNBS 100% reprod.

Post by susato »

Tear, obviously this one isn't going to succeed on your equipment. The problem certainly looks like something wrong with the WU. Feel free to trash your work folder and queue and try to get another WU.

We'll wait a few days to see if anyone else reports trouble with this one. Thank you for posting.
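
For reference, a minimal sketch of that cleanup, assuming the Linux SMP layout shown in the logs above (launch directory /fah/clients/fah with a work/ subdirectory and queue.dat); stop the client first and adjust the path for your own install:

Code:

# Minimal cleanup sketch: remove the work directory and the queue so the
# client fetches a fresh WU on the next start. Assumes the layout from the
# logs above (/fah/clients/fah); stop fah6 before running this.
import shutil
from pathlib import Path

client_dir = Path("/fah/clients/fah")  # adjust to your install
shutil.rmtree(client_dir / "work", ignore_errors=True)
(client_dir / "queue.dat").unlink(missing_ok=True)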
tear
Posts: 254
Joined: Sun Dec 02, 2007 4:08 am
Hardware configuration: None
Location: Rocky Mountains

Re: Project: 2671 (Run 3, Clone 82, Gen 42) WMCNBS 100% reprod.

Post by tear »

No problem.

I reproduced the problem to make sure the fault is not mine.

tear
One man's ceiling is another man's floor.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2671 (Run 3, Clone 82, Gen 42) WMCNBS 100% reprod.

Post by bruce »

tear wrote:No problem.

I reproduced the problem to make sure the fault is not mine.

tear
Reproduced the problem on a different computer?
tear
Posts: 254
Joined: Sun Dec 02, 2007 4:08 am
Hardware configuration: None
Location: Rocky Mountains

Re: Project: 2671 (Run 3, Clone 82, Gen 42) WMCNBS 100% reprod.

Post by tear »

"Another computer" bit is not relevant because problem occurs at the exactly same simulation step every single time.

tear
One man's ceiling is another man's floor.
tear
Posts: 254
Joined: Sun Dec 02, 2007 4:08 am
Hardware configuration: None
Location: Rocky Mountains

Re: Project: 2671 (Run 3, Clone 82, Gen 42) WMCNBS 100% reprod.

Post by tear »

Got it assigned to another machine. Same thing.
One man's ceiling is another man's floor.
klasseng
Posts: 126
Joined: Thu Dec 27, 2007 6:08 am
Hardware configuration: System 1: Mac Studio, M1 Max,
System 2: Mac Mini, M2
Location: Canada

Re: Project: 2671 (Run 3, Clone 82, Gen 42) WMCNBS 100% reprod.

Post by klasseng »

It got assigned to one of my machines... same thing.

Bad WU!?
peace,
Grant
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2671 (Run 3, Clone 82, Gen 42) WMCNBS 100% reprod.

Post by bruce »

Reported as a bad WU.
Foxbat
Posts: 94
Joined: Wed Dec 05, 2007 10:23 pm
Hardware configuration: Apple Mac Pro 1,1 2x2.66 GHz Dual-Core Xeon w/10 GB RAM | EVGA GTX 960, Zotac GTX 750 Ti | Ubuntu 14.04 LTS
Dell Precision T7400 2x3.0 GHz Quad-Core Xeon w/16 GB RAM | Zotac GTX 970 | Ubuntu 14.04 LTS
Apple iMac Retina 5K 4.00 GHz Core i7 w/8 GB RAM | OS X 10.11.3 (El Capitan)
Location: Michiana, USA

Re: Project: 2671 (Run 3, Clone 82, Gen 42) WMCNBS 100% reprod.

Post by Foxbat »

Bruce, I guess I was too fast... I picked it up on my 2.66 GHz Mac Pro Quad:

Code:

[22:07:46] Completed 242500 out of 250000 steps  (97%)
[22:15:03] Completed 245000 out of 250000 steps  (98%)
[22:22:20] Completed 247500 out of 250000 steps  (99%)
[22:29:38] Completed 250000 out of 250000 steps  (100%)
[22:29:39] DynamicWrapper: Finished Work Unit: sleep=10000
[22:29:49] 
[22:29:49] Finished Work Unit:
[22:29:49] - Reading up to 21220128 from "work/wudata_05.trr": Read 21220128
[22:29:50] trr file hash check passed.
[22:29:50] - Reading up to 4399256 from "work/wudata_05.xtc": Read 4399256
[22:29:50] xtc file hash check passed.
[22:29:50] edr file hash check passed.
[22:29:50] logfile size: 181807
[22:29:50] Leaving Run
[22:29:54] - Writing 25946287 bytes of core data to disk...
[22:29:54]   ... Done.
[22:29:58] - Shutting down core
[22:29:58] 
[22:29:58] Folding@home Core Shutdown: FINISHED_UNIT
[22:33:13] CoreStatus = 64 (100)
[22:33:13] Unit 5 finished with 83 percent of time to deadline remaining.
[22:33:13] Updated performance fraction: 0.829133
[22:33:13] Sending work to server
[22:33:13] Project: 2676 (Run 3, Clone 179, Gen 140)


[22:33:13] + Attempting to send results [June 10 22:33:13 UTC]
[22:33:13] - Reading file work/wuresults_05.dat from core
[22:33:13]   (Read 25946287 bytes from disk)
[22:33:13] Connecting to http://171.67.108.24:8080/
[22:38:47] Posted data.
[22:38:47] Initial: 0000; - Uploaded at ~74 kB/s
[22:38:51] - Averaged speed for that direction ~73 kB/s
[22:38:51] + Results successfully sent
[22:38:51] Thank you for your contribution to Folding@Home.
[22:38:51] + Number of Units Completed: 998

[22:38:52] - Warning: Could not delete all work unit files (5): Core file absent
[22:38:52] Trying to send all finished work units
[22:38:52] + No unsent completed units remaining.
[22:38:52] - Preparing to get new work unit...
[22:38:52] Cleaning up work directory
[22:38:53] + Attempting to get work packet
[22:38:53] - Will indicate memory of 4096 MB
[22:38:53] - Connecting to assignment server
[22:38:53] Connecting to http://assign.stanford.edu:8080/
[22:38:53] Posted data.
[22:38:53] Initial: 43AB; - Successful: assigned to (171.67.108.24).
[22:38:53] + News From Folding@Home: Welcome to Folding@Home
[22:38:53] Loaded queue successfully.
[22:38:53] Connecting to http://171.67.108.24:8080/
[22:38:59] Posted data.
[22:38:59] Initial: 0000; - Receiving payload (expected size: 4842125)
[22:39:08] - Downloaded at ~525 kB/s
[22:39:08] - Averaged speed for that direction ~424 kB/s
[22:39:08] + Received work.
[22:39:08] Trying to send all finished work units
[22:39:08] + No unsent completed units remaining.
[22:39:08] + Closed connections
[22:39:08] 
[22:39:08] + Processing work unit
[22:39:08] At least 4 processors must be requested; read 1.
[22:39:08] Core required: FahCore_a2.exe
[22:39:08] Core found.
[22:39:08] - Using generic ./mpiexec
[22:39:08] Working on queue slot 06 [June 10 22:39:08 UTC]
[22:39:08] + Working ...
[22:39:08] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 06 -priority 96 -checkpoint 8 -forceasm -verbose -lifeline 424 -version 624'

[22:39:09] 
[22:39:09] *------------------------------*
[22:39:09] Folding@Home Gromacs SMP Core
[22:39:09] Version 2.07 (Sun Apr 19 14:29:51 PDT 2009)
[22:39:09] 
[22:39:09] Preparing to commence simulation
[22:39:09] - Ensuring status. Please wait.
[22:39:18] - Assembly optimizations manually forced on.
[22:39:18] - Not checking prior termination.
[22:39:19] - Expanded 4841613 -> 24004881 (decompressed 495.8 percent)
[22:39:19] Called DecompressByteArray: compressed_data_size=4841613 data_size=24004881, decompressed_data_size=24004881 diff=0
[22:39:20] - Digital signature verified
[22:39:20] 
[22:39:20] Project: 2671 (Run 3, Clone 82, Gen 42)
[22:39:20] 
[22:39:20] Assembly optimizations on if available.
[22:39:20] Entering M.D.
[22:39:30] Completed 0 out of 250000 steps  (0%)
[22:39:32] 
[22:39:32] Folding@home Core Shutdown: INTERRUPTED
[22:39:36] CoreStatus = 66 (102)
[22:39:36] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[22:39:36] Killing all core threads

Folding@Home Client Shutdown.


--- Opening Log file [June 10 22:40:06 UTC] 


# Mac OS X SMP Console Edition ################################################
###############################################################################

                       Folding@Home Client Version 6.24R1

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /Users/Foxbat/Library/FAH-SMP-Term1
Executable: /Users/Foxbat/Library/FAH-SMP-Term1/fah6
Arguments: -local -advmethods -forceasm -verbosity 9 -smp 

[22:40:06] - Ask before connecting: No
[22:40:06] - User name: Foxbat (Team 55236)
[22:40:06] - User ID: 3DA6459B38FDAE1E
[22:40:06] - Machine ID: 1
[22:40:06] 
[22:40:06] Loaded queue successfully.
[22:40:06] - Autosending finished units... [June 10 22:40:06 UTC]
[22:40:06] 
[22:40:06] Trying to send all finished work units
[22:40:06] + Processing work unit
[22:40:06] + No unsent completed units remaining.
[22:40:06] At least 4 processors must be requested; read 1.
[22:40:06] - Autosend completed
[22:40:06] Core required: FahCore_a2.exe
[22:40:06] Core found.
[22:40:06] - Using generic ./mpiexec
[22:40:06] Working on queue slot 06 [June 10 22:40:06 UTC]
[22:40:06] + Working ...
[22:40:06] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 06 -priority 96 -checkpoint 8 -forceasm -verbose -lifeline 1553 -version 624'

[22:40:06] 
[22:40:06] *------------------------------*
[22:40:06] Folding@Home Gromacs SMP Core
[22:40:06] Version 2.07 (Sun Apr 19 14:29:51 PDT 2009)
[22:40:06] 
[22:40:06] Preparing to commence simulation
[22:40:06] - Ensuring status. Please wait.
[22:40:06] - Expanded 4841613 -> 24004881 (decompressed 495.8 percent)
[22:40:07] Called DecompressByteArray: compressed_data_size=4841613 data_size=24004881, decompressed_data_size=24004881 diff=0
[22:40:08] - Digital signature verified
[22:40:08] 
[22:40:08] Project: 2671 (Run 3, Clone 82, Gen 42)
[22:40:08] 
[22:40:08] Assembly optimizations on if available.
[22:40:08] Entering M.D.
[22:40:19]  on if available.
[22:40:19] Entering M.D.
[22:40:28]  (0%)
[22:40:30] 
[22:40:30] Folding@home Core Shutdown: INTERRUPTED
[22:40:34] CoreStatus = 66 (102)
[22:40:34] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[22:40:34] Killing all core threads

Folding@Home Client Shutdown.
It continues to fail to start. I'm going to clean out the Folding directory, apply Apple updates, and try again.

Update: Yep, that fixed it. Thanks, Bruce!
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Project: 2671 (Run 3, Clone 82, Gen 42) seg fault

Post by alpha754293 »

Seg fault on a different computer: dual AMD Opteron 2220s on a Tyan S2915WA2NRF board.

log:

Code:

[01:12:34] 
[01:12:34] *------------------------------*
[01:12:34] Folding@Home Gromacs SMP Core
[01:12:34] Version 2.07 (Sun Apr 19 14:51:09 PDT 2009)
[01:12:34] 
[01:12:34] Preparing to commence simulation
[01:12:34] - Ensuring status. Please wait.
[01:12:35] Called DecompressByteArray: compressed_data_size=4841613 data_size=24004881, decompressed_data_size=24004881 diff=0
[01:12:35] - Digital signature verified
[01:12:35] 
[01:12:35] Project: 2671 (Run 3, Clone 82, Gen 42)
[01:12:35] 
[01:12:35] Assembly optimizations on if available.
[01:12:35] Entering M.D.
[01:12:42] Multi-core optimizations on
[01:12:45] ntering M.D.
NNODES=4, MYRANK=0, HOSTNAME=opteron3
NNODES=4, MYRANK=2, HOSTNAME=opteron3
NNODES=4, MYRANK=3, HOSTNAME=opteron3
NNODES=4, MYRANK=1, HOSTNAME=opteron3
NODEID=0 argc=20
NODEID=1 argc=20
NODEID=2 argc=20
NODEID=3 argc=20
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                 :-)  VERSION 4.0.99_development_20090307  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.


                                :-)  mdrun  (-:

Reading file work/wudata_06.tpr, VERSION 3.3.99_development_20070618 (single precision)
[01:12:52] Multi-core optimizations on
Note: tpx file_version 48, software version 64

NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 1D domain decomposition 1 x 1 x 4
starting mdrun '22878 system in water'
10750000 steps,  21500.0 ps (continuing from step 10500000,  21000.0 ps).
[01:12:54] Completed 0 out of 250000 steps  (0%)

t = 21000.005 ps: Water molecule starting at atom 95476 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 21000.007 ps: Water molecule starting at atom 46285 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 21000.009 ps: Water molecule starting at atom 95476 can not be settled.
Check for bad contacts and/or reduce the timestep.
[01:12:55] 
[01:12:55] Folding@home Core Shutdown: INTERRUPTED
[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 102) - process 0
[cli_1]: aborting job:
Fatal error in MPI_Sendrecv: Error message texts are not available
[cli_2]: aborting job:
Fatal error in MPI_Sendrecv: Error message texts are not available
[0]0:Return code = 102
[0]1:Return code = 1
[0]2:Return code = 1
[0]3:Return code = 0, signaled with Segmentation fault
[01:12:59] CoreStatus = 66 (102)
[01:12:59] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[01:12:59] Killing all core threads

Folding@Home Client Shutdown.
restarting client...
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Re: Project: 2671 (Run 3, Clone 82, Gen 42) seg fault

Post by alpha754293 »

No matter how many times I restart the client, I get the same error.

Deleted queue.dat to move the client along.
Foxbat
Posts: 94
Joined: Wed Dec 05, 2007 10:23 pm
Hardware configuration: Apple Mac Pro 1,1 2x2.66 GHz Dual-Core Xeon w/10 GB RAM | EVGA GTX 960, Zotac GTX 750 Ti | Ubuntu 14.04 LTS
Dell Precision T7400 2x3.0 GHz Quad-Core Xeon w/16 GB RAM | Zotac GTX 970 | Ubuntu 14.04 LTS
Apple iMac Retina 5K 4.00 GHz Core i7 w/8 GB RAM | OS X 10.11.3 (El Capitan)
Location: Michiana, USA

Re: Project: 2671 (Run 3, Clone 82, Gen 42) WMCNBS 100% reprod.

Post by Foxbat »

bruce wrote:Reported as a bad WU.
:( It's still in the system. I got it again today and it idled my Mac for most of the day.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2671 (Run 3, Clone 82, Gen 42) WMCNBS 100% reprod.

Post by bruce »

Foxbat wrote:
bruce wrote:Reported as a bad WU.
:( It's still in the system. I got it again today and it idled my Mac for most of the day.
All I can do is report it. Somebody from the Pande Group has to remove it from circulation.
bollix47
Posts: 2959
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Project: 2671 (Run 3, Clone 82, Gen 42) WMCNBS 100% reprod.

Post by bollix47 »

FYI

Code:

[18:52:09] + Processing work unit
[18:52:09] At least 4 processors must be requested.Core required: FahCore_a2.exe
[18:52:09] Core found.
[18:52:09] Working on queue slot 00 [June 12 18:52:09 UTC]
[18:52:09] + Working ...
[18:52:09] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 00 -checkpoint 30 -verbose -lifeline 22373 -version 624'

[18:52:09] 
[18:52:09] *------------------------------*
[18:52:09] Folding@Home Gromacs SMP Core
[18:52:09] Version 2.07 (Sun Apr 19 14:51:09 PDT 2009)
[18:52:09] 
[18:52:09] Preparing to commence simulation
[18:52:09] - Ensuring status. Please wait.
[18:52:10] Called DecompressByteArray: compressed_data_size=4841613 data_size=24004881, decompressed_data_size=24004881 diff=0
[18:52:10] - Digital signature verified
[18:52:10] 
[18:52:10] Project: 2671 (Run 3, Clone 82, Gen 42)
[18:52:10] 
[18:52:10] Assembly optimizations on if available.
[18:52:10] Entering M.D.
[18:52:20] (Run 3, Clone 82, Gen 42)
[18:52:20] 
[18:52:20] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=challenger
NNODES=4, MYRANK=1, HOSTNAME=challenger
NNODES=4, MYRANK=2, HOSTNAME=challenger
NNODES=4, MYRANK=3, HOSTNAME=challenger
NODEID=0 argc=20
NODEID=1 argc=20
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                 :-)  VERSION 4.0.99_development_20090307  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.


                                :-)  mdrun  (-:

Reading file work/wudata_00.tpr, VERSION 3.3.99_development_20070618 (single precision)
NODEID=2 argc=20
NODEID=3 argc=20
Note: tpx file_version 48, software version 64

NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 1D domain decomposition 1 x 1 x 4
starting mdrun '22878 system in water'
10750000 steps,  21500.0 ps (continuing from step 10500000,  21000.0 ps).

t = 21000.005 ps: Water molecule starting at atom 95476 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 21000.007 ps: Water molecule starting at atom 46285 can not be settled.
Check for bad contacts and/or reduce the timestep.

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: nsgrid.c, line: 357

Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.

Variable ci has value -2147483519. It should have been within [ 0 .. 1800 ]

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: nsgrid.c, line: 357

Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.

Variable ci has value -2147483503. It should have been within [ 0 .. 2312 ]

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 2, will try to stop all the nodes
Halting parallel program mdrun on CPU 2 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_2]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: nsgrid.c, line: 357

Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.

Variable ci has value -2147483527. It should have been within [ 0 .. 1568 ]

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 3, will try to stop all the nodes
Halting parallel program mdrun on CPU 3 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_3]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3
[0]0:Return code = 255
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 255
[0]3:Return code = 255
[18:52:35] CoreStatus = FF (255)
[18:52:35] Sending work to server
[18:52:35] Project: 2671 (Run 3, Clone 82, Gen 42)
[18:52:35] - Error: Could not get length of results file work/wuresults_00.dat
[18:52:35] - Error: Could not read unit 00 file. Removing from queue.
[18:52:35] Trying to send all finished work units
[18:52:35] + No unsent completed units remaining.
The client did not abort or hang but carried on with a different WU and appears to be fine with the new one.
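
For anyone else who draws this unit, a rough sketch (not an official tool) for spotting it in a captured client/console log; it assumes the "Project: ..." and "can not be settled" line formats quoted in this thread, and the log path is just a placeholder for wherever your output was captured:

Code:

# Rough sketch: scan a captured FAH v6 console/client log for this thread's
# bad WU. Assumes the line formats quoted above; adjust the log path.
import re

WU_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SETTLE_RE = re.compile(r"Water molecule starting at atom \d+ can not be settled")
BAD_WU = (2671, 3, 82, 42)

current_wu = None
with open("fah_console.log", errors="replace") as log:  # placeholder path
    for line in log:
        match = WU_RE.search(line)
        if match:
            current_wu = tuple(int(g) for g in match.groups())
        elif current_wu == BAD_WU and SETTLE_RE.search(line):
            print("Known-bad WU 2671 (R3, C82, G42) hit:", line.strip())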
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: Project: 2671 (Run 3, Clone 82, Gen 42) WMCNBS 100% reprod.

Post by kasson »

We re-generated the work unit. Hopefully it will run successfully now. Thanks for the error reports.