Project: 2671 (Run 59, Clone 80, Gen 122)


BrokenWolf
Posts: 126
Joined: Sat Aug 02, 2008 3:08 am

Post by BrokenWolf »

This WU gave an error right at the start: "Water molecule starting at atom #### can not be settled." This is a known-good folding system.

Code:

[06:10:49] *------------------------------*
[06:10:49] Folding@Home Gromacs SMP Core
[06:10:49] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[06:10:49] 
[06:10:49] Preparing to commence simulation
[06:10:49] - Ensuring status. Please wait.
[06:10:49] Files status OK
[06:10:50] - Expanded 4835226 -> 24044373 (decompressed 497.2 percent)
[06:10:50] Called DecompressByteArray: compressed_data_size=4835226 data_size=24044373, decompressed_data_size=24044373 diff=0
[06:10:51] - Digital signature verified
[06:10:51] 
[06:10:51] Project: 2671 (Run 59, Clone 80, Gen 122)
[06:10:51] 
[06:10:51] Assembly optimizations on if available.
[06:10:51] Entering M.D.
[06:11:00] un 59, Clone 80, Gen 122)
[06:11:00] 
[06:11:00] Entering M.D.
NNODES=4, MYRANK=1, HOSTNAME=fold55
NNODES=4, MYRANK=0, HOSTNAME=fold55
NODEID=0 argc=20
NNODES=4, MYRANK=3, HOSTNAME=fold55
NODEID=1 argc=20
NNODES=4, MYRANK=2, HOSTNAME=fold55
NODEID=2 argc=20
NODEID=3 argc=20
Reading file work/wudata_00.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 68

NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 1D domain decomposition 1 x 1 x 4
starting mdrun '22887 system in water'
30750000 steps,  61500.0 ps (continuing from step 30500000,  61000.0 ps).

t = 61000.003 ps: Water molecule starting at atom 39907 can not be settled.
Check for bad contacts and/or reduce the timestep.
[06:11:10] Completed 0 out of 250000 steps  (0%)

t = 61000.005 ps: Water molecule starting at atom 137968 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 61000.007 ps: Water molecule starting at atom 39907 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 61000.009 ps: Water molecule starting at atom 137968 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 61000.011 ps: Water molecule starting at atom 38731 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 61000.013 ps: Water molecule starting at atom 137968 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 61000.015 ps: Water molecule starting at atom 38731 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 61000.017 ps: Water molecule starting at atom 137968 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 61000.019 ps: Water molecule starting at atom 38731 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 61000.021 ps: Water molecule starting at atom 137968 can not be settled.
Check for bad contacts and/or reduce the timestep.

Step 30500010:
The charge group starting at atom 38731 moved than the distance allowed by the domain decomposition (1.200000) in direction Z
distance out of cell 175.460800
Old coordinates:    9.513    0.233   11.192
New coordinates:  -73.201  151.582  186.917
Old cell boundaries in direction Z:    7.500   11.250
New cell boundaries in direction Z:    7.395   11.456

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090605
Source code file: domdec.c, line: 4038

Fatal error:
A charge group moved too far between two domain decomposition steps
This usually means that your system is not well equilibrated
For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 2, will try to stop all the nodes
Halting parallel program mdrun on CPU 2 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2
[06:11:12] 
[06:11:12] Folding@home Core Shutdown: INTERRUPTED
application called MPI_Abort(MPI_COMM_WORLD, 102) - process 0
[06:24:51] ***** Got an Activate signal (2)
[06:24:51] Killing all core threads

Folding@Home Client Shutdown.
Bastien
Posts: 3
Joined: Sat Jan 26, 2008 4:36 pm

Re: Project: 2671 (Run 59, Clone 80, Gen 122)

Post by Bastien »

I can confirm this. I also received this WU 2 hours ago and got the same errors.
Freddy_Frog
Posts: 9
Joined: Sat Aug 09, 2008 11:48 am

Re: Project: 2671 (Run 59, Clone 80, Gen 122)

Post by Freddy_Frog »

Yep, still bad....

Working on queue slot 04 [November 15 04:21:57 UTC]
[04:21:57] + Working ...
[04:21:57]
[04:21:57] *------------------------------*
[04:21:57] Folding@Home Gromacs SMP Core
[04:21:57] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[04:21:57]
[04:21:57] Preparing to commence simulation
[04:21:57] - Ensuring status. Please wait.
[04:22:07] - Assembly optimizations manually forced on.
[04:22:07] - Not checking prior termination.
[04:22:08] - Expanded 4835226 -> 24044373 (decompressed 497.2 percent)
[04:22:08] Called DecompressByteArray: compressed_data_size=4835226 data_size=24044373, decompressed_data_size=24044373 diff=0
[04:22:08] - Digital signature verified
[04:22:08]
[04:22:08] Project: 2671 (Run 59, Clone 80, Gen 122)
[04:22:08]
[04:22:08] Assembly optimizations on if available.
[04:22:08] Entering M.D.
[04:22:17] Completed 0 out of 250000 steps (0%)
[04:22:18]
[04:22:18] Folding@home Core Shutdown: INTERRUPTED
[04:22:23] CoreStatus = 66 (102)
[04:22:23] + Shutdown requested by user. Exiting.
Folding@Home Client Shutdown.
AllGold
Posts: 6
Joined: Tue Jan 22, 2008 6:36 pm

Re: Project: 2671 (Run 59, Clone 80, Gen 122)

Post by AllGold »

This same WU just got me as well.

It's difficult to move on to another WU because it doesn't get far enough to leave anything to send back.
Freddy_Frog
Posts: 9
Joined: Sat Aug 09, 2008 11:48 am

Re: Project: 2671 (Run 59, Clone 80, Gen 122)

Post by Freddy_Frog »

I deleted the work folder and queue.dat and received a different unit, but when it finished I ended up with this again. When someone is back at work can they please kill this one?

Working on queue slot 02 [November 15 15:25:37 UTC]
[15:25:37] + Working ...
[15:25:37]
[15:25:37] *------------------------------*
[15:25:37] Folding@Home Gromacs SMP Core
[15:25:37] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[15:25:37]
[15:25:37] Preparing to commence simulation
[15:25:37] - Ensuring status. Please wait.
[15:25:47] - Assembly optimizations manually forced on.
[15:25:47] - Not checking prior termination.
[15:25:48] - Expanded 4835226 -> 24044373 (decompressed 497.2 percent)
[15:25:48] Called DecompressByteArray: compressed_data_size=4835226 data_size=24044373, decompressed_data_size=24044373 diff=0
[15:25:48] - Digital signature verified
[15:25:48]
[15:25:48] Project: 2671 (Run 59, Clone 80, Gen 122)
[15:25:48]
[15:25:48] Assembly optimizations on if available.
[15:25:48] Entering M.D.
[15:25:57] Completed 0 out of 250000 steps (0%)
[15:25:58]
[15:25:58] Folding@home Core Shutdown: INTERRUPTED
[15:26:02] CoreStatus = 66 (102)
[15:26:02] + Shutdown requested by user. Exiting.
Folding@Home Client Shutdown.
_zzz_
Posts: 1
Joined: Sun Sep 13, 2009 9:23 am

Re: Project: 2671 (Run 59, Clone 80, Gen 122)

Post by _zzz_ »

I have the same problem with project 2671, running on a quad-core SMP Linux VM.

It occurs right after processing begins. I deleted queue.dat and the work folder, and the next project I received was 2662. With 2662 I have no problems.

I have already had this issue with 2671 more than once :(

Please fix it, because it completely blocks my 24h folding machine, and I do not have daily access to it to repair it.

Thanks
Phantom
Posts: 23
Joined: Mon Dec 03, 2007 2:14 am
Location: teammacosx.org

Re: Project: 2671 (Run 59, Clone 80, Gen 122)

Post by Phantom »

Ouch! This one got me, too...

Exact same behavior. Please flag this WU and remove it from the mix.
parkut
Posts: 363
Joined: Tue Feb 12, 2008 7:33 am
Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
Location: SE Michigan, USA

Re: Project: 2671 (Run 59, Clone 80, Gen 122)

Post by parkut »

I've been assigned this one also. Failed immediately.
GTron
Posts: 53
Joined: Wed Dec 05, 2007 3:47 pm
Location: Denver area, Colorado

Re: Project: 2671 (Run 59, Clone 80, Gen 122)

Post by GTron »

One of my dedicated folders got this WU on Friday. Restarting can produce a message saying the client is deleting the WU and continuing, but it really just hangs. After I flush this WU, the client gets a new WU, but upon completing that one it still gets this one again.

A second dedicated folder started this process this afternoon.

I hope Stanford can kill this WU server side soon!

Greg, Folding On
Freddy_Frog
Posts: 9
Joined: Sat Aug 09, 2008 11:48 am

Re: Project: 2671 (Run 59, Clone 80, Gen 122)

Post by Freddy_Frog »

It's back.....

[19:32:38] Working on queue slot 07 [November 17 19:32:38 UTC]
[19:32:38] + Working ...
[19:32:38]
[19:32:38] *------------------------------*
[19:32:38] Folding@Home Gromacs SMP Core
[19:32:38] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[19:32:38]
[19:32:38] Preparing to commence simulation
[19:32:38] - Ensuring status. Please wait.
[19:32:48] - Assembly optimizations manually forced on.
[19:32:48] - Not checking prior termination.
[19:32:49] - Expanded 4835226 -> 24044373 (decompressed 497.2 percent)
[19:32:49] Called DecompressByteArray: compressed_data_size=4835226 data_size=24044373, decompressed_data_size=24044373 diff=0
[19:32:49] - Digital signature verified
[19:32:49]
[19:32:49] Project: 2671 (Run 59, Clone 80, Gen 122)
[19:32:49]
[19:32:49] Assembly optimizations on if available.
[19:32:49] Entering M.D.
[19:32:57] Completed 0 out of 250000 steps (0%)
[19:32:59]
[19:32:59] Folding@home Core Shutdown: INTERRUPTED
[19:33:03] CoreStatus = 66 (102)
[19:33:03] + Shutdown requested by user. Exiting.
Folding@Home Client Shutdown.
GTron
Posts: 53
Joined: Wed Dec 05, 2007 3:47 pm
Location: Denver area, Colorado

Re: Project: 2671 (Run 59, Clone 80, Gen 122)

Post by GTron »

Two of my three dedicated folders are still battling this bad WU. It also doesn't have the decency to die gracefully -- I have had to kill the hung processes and flush it 4 times so far today.

Stanford, please, it's time to kill this one!

Greg
ikerekes
Posts: 94
Joined: Thu Nov 13, 2008 4:18 pm
Hardware configuration: q6600 @ 3.3Ghz windows xp-sp3 one SMP2 (2.15 core) + 1 9800GT native GPU2
Athlon x2 6000+ @ 3.0Ghz ubuntu 8.04 smp + asus 9600GSO gpu2 in wine wrapper
5600X2 @ 3.19Ghz ubuntu 8.04 smp + asus 9600GSO gpu2 in wine wrapper
E5200 @ 3.7Ghz ubuntu 8.04 smp2 + asus 9600GT silent gpu2 in wine wrapper
E5200 @ 3.65Ghz ubuntu 8.04 smp2 + asus 9600GSO gpu2 in wine wrapper
E6550 vmware ubuntu 8.4.1
q8400 @ 3.3Ghz windows xp-sp3 one SMP2 (2.15 core) + 1 9800GT native GPU2
Athlon II 620 @ 2.6 Ghz windows xp-sp3 one SMP2 (2.15 core) + 1 9800GT native GPU2
Location: Calgary, Canada

Re: Project: 2671 (Run 59, Clone 80, Gen 122)

Post by ikerekes »

I got this very same WU.
It failed 3 times before I got a new WU, which is executing happily now :)

Code:

[06:09:39]
[06:09:39] *------------------------------*
[06:09:39] Folding@Home Gromacs SMP Core
[06:09:39] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[06:09:39] 
[06:09:39] Preparing to commence simulation
[06:09:39] - Ensuring status. Please wait.
[06:09:40] Called DecompressByteArray: compressed_data_size=4835226 data_size=24044373, decompressed_data_size=24044373 diff=0
[06:09:40] - Digital signature verified
[06:09:40] 
[06:09:40] Project: 2671 (Run 59, Clone 80, Gen 122)
[06:09:40] 
[06:09:40] Assembly optimizations on if available.
[06:09:40] Entering M.D.
[06:09:50] un 59, Clone 80, Gen 122)
[06:09:50] 
[06:09:50] Entering M.D.
[06:10:03] CoreStatus = FF (255)
[06:10:03] Sending work to server
[06:10:03] Project: 2671 (Run 59, Clone 80, Gen 122)
[06:10:03] - Error: Could not get length of results file work/wuresults_04.dat
[06:10:03] - Error: Could not read unit 04 file. Removing from queue.
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: Project: 2671 (Run 59, Clone 80, Gen 122)

Post by kasson »

This one's stopped now. Work unit removed from circulation.
Phantom
Posts: 23
Joined: Mon Dec 03, 2007 2:14 am
Location: teammacosx.org

Re: Project: 2671 (Run 59, Clone 80, Gen 122)

Post by Phantom »

Like a bad penny, this one just came back and bit me again.
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: Project: 2671 (Run 59, Clone 80, Gen 122)

Post by kasson »

Please post a log. The server can't transmit this work unit to you because the data files are no longer there.
It is possible that there are stale files on your client, but you might double-check that you're running the WU you think you are.
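
One way to do that double-check is to pull the work unit identifiers out of the client log text and compare them with the unit reported in this thread. The following is only a minimal sketch: it assumes the log has already been read into a string and that it contains "Project: N (Run R, Clone C, Gen G)" lines like the ones pasted above. The names work_units_in_log and is_same_wu are hypothetical helpers made up for illustration, not part of any Folding@home tool.

```python
import re

# Matches the work-unit line seen in the logs in this thread, e.g.
# "[06:10:51] Project: 2671 (Run 59, Clone 80, Gen 122)".
# Truncated echoes such as "un 59, Clone 80, Gen 122)" are ignored
# because they lack the "Project:" prefix.
WU_LINE = re.compile(
    r"Project:\s*(\d+)\s*\(Run\s*(\d+),\s*Clone\s*(\d+),\s*Gen\s*(\d+)\)"
)


def work_units_in_log(log_text):
    """Return distinct (project, run, clone, gen) tuples in order of first appearance."""
    seen = []
    for match in WU_LINE.finditer(log_text):
        wu = tuple(int(group) for group in match.groups())
        if wu not in seen:
            seen.append(wu)
    return seen


def is_same_wu(log_text, project, run, clone, gen):
    """True if the log mentions exactly this project/run/clone/gen (hypothetical helper)."""
    return (project, run, clone, gen) in work_units_in_log(log_text)


# Sample text copied from the logs posted earlier in this thread.
sample = """\
[06:10:51] Project: 2671 (Run 59, Clone 80, Gen 122)
[06:11:00] un 59, Clone 80, Gen 122)
[06:10:03] Project: 2671 (Run 59, Clone 80, Gen 122)
"""

print(work_units_in_log(sample))
print(is_same_wu(sample, 2671, 59, 80, 122))
```

If the run, clone, or gen in your own log differs from 59/80/122, you have a different work unit from the same project, which is worth stating when you post the log.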