Project: 2662 (Run 2, Clone 111, Gen 28)

Posted: Sat Oct 03, 2009 11:32 am
by SKeptical_Thinker
This system is running Ubuntu 9.04.

[05:46:57] - Connecting to assignment server
[05:46:57] Connecting to http://assign.stanford.edu:8080/
[05:46:57] Posted data.
[05:46:57] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[05:46:57] + News From Folding@Home: Welcome to Folding@Home
[05:46:57] Loaded queue successfully.
[05:46:57] Connecting to http://171.64.65.56:8080/
[05:47:02] Posted data.
[05:47:02] Initial: 0000; - Receiving payload (expected size: 4922857)
[05:47:10] - Downloaded at ~600 kB/s
[05:47:10] - Averaged speed for that direction ~618 kB/s
[05:47:10] + Received work.
[05:47:10] Trying to send all finished work units
[05:47:10] + No unsent completed units remaining.
[05:47:10] + Closed connections
[05:47:10] 
[05:47:10] + Processing work unit
[05:47:10] Core required: FahCore_a2.exe
[05:47:10] Core found.
[05:47:10] Working on Unit 06 [October 2 05:47:10]
[05:47:10] + Working ...
[05:47:10] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 06 -checkpoint 15 -verbose -lifeline 3452 -version 602'

[05:47:10] 
[05:47:10] *------------------------------*
[05:47:10] Folding@Home Gromacs SMP Core
[05:47:10] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[05:47:10] 
[05:47:10] Preparing to commence simulation
[05:47:10] - Ensuring status. Please wait.
[05:47:11] Called DecompressByteArray: compressed_data_size=4922345 data_size=24360573, decompressed_data_size=24360573 diff=0
[05:47:11] - Digital signature verified
[05:47:11] 
[05:47:11] Project: 2662 (Run 2, Clone 111, Gen 28)
[05:47:11] 
[05:47:11] Assembly optimizations on if available.
[05:47:11] Entering M.D.
[05:47:21] Run 2, Clone 111, Gen 28)
[05:47:21] 
[05:47:21] Entering M.D.
[05:47:34] Completed 0 out of 250000 steps  (0%)
[06:05:12] Completed 2500 out of 250000 steps  (1%)
[06:22:59] Completed 5000 out of 250000 steps  (2%)
[06:40:47] Completed 7500 out of 250000 steps  (3%)
[06:42:27] - Autosending finished units...
[06:42:27] Trying to send all finished work units
[06:42:27] + No unsent completed units remaining.
[06:42:27] - Autosend completed
[06:58:42] Completed 10000 out of 250000 steps  (4%)
[07:15:13] Completed 12500 out of 250000 steps  (5%)
[07:32:39] Completed 15000 out of 250000 steps  (6%)
[07:49:48] Completed 17500 out of 250000 steps  (7%)
[08:06:46] Completed 20000 out of 250000 steps  (8%)
[08:23:43] Completed 22500 out of 250000 steps  (9%)
[08:41:10] Completed 25000 out of 250000 steps  (10%)
[08:59:00] Completed 27500 out of 250000 steps  (11%)
[09:16:01] Completed 30000 out of 250000 steps  (12%)
[09:34:00] Completed 32500 out of 250000 steps  (13%)
[09:51:59] Completed 35000 out of 250000 steps  (14%)
[10:09:50] Completed 37500 out of 250000 steps  (15%)
[10:27:08] Completed 40000 out of 250000 steps  (16%)
[10:45:03] Completed 42500 out of 250000 steps  (17%)
[11:02:38] Completed 45000 out of 250000 steps  (18%)
[11:19:22] Completed 47500 out of 250000 steps  (19%)
[11:37:10] Completed 50000 out of 250000 steps  (20%)
[11:54:40] Completed 52500 out of 250000 steps  (21%)
[12:12:17] Completed 55000 out of 250000 steps  (22%)
[12:29:22] Completed 57500 out of 250000 steps  (23%)
[12:42:27] - Autosending finished units...
[12:42:27] Trying to send all finished work units
[12:42:27] + No unsent completed units remaining.
[12:42:27] - Autosend completed
[12:46:50] Completed 60000 out of 250000 steps  (24%)
[13:03:49] Completed 62500 out of 250000 steps  (25%)
[13:21:12] Completed 65000 out of 250000 steps  (26%)
[13:38:48] Completed 67500 out of 250000 steps  (27%)
[13:55:56] Completed 70000 out of 250000 steps  (28%)
[14:13:21] Completed 72500 out of 250000 steps  (29%)
[14:30:16] Completed 75000 out of 250000 steps  (30%)
[14:47:33] Completed 77500 out of 250000 steps  (31%)
[15:04:43] Completed 80000 out of 250000 steps  (32%)
[15:21:28] Completed 82500 out of 250000 steps  (33%)
[15:38:26] Completed 85000 out of 250000 steps  (34%)
[15:56:07] Completed 87500 out of 250000 steps  (35%)
[16:13:45] Completed 90000 out of 250000 steps  (36%)
[16:31:15] Completed 92500 out of 250000 steps  (37%)
[16:48:50] Completed 95000 out of 250000 steps  (38%)
[17:06:05] Completed 97500 out of 250000 steps  (39%)
[17:23:35] Completed 100000 out of 250000 steps  (40%)
[17:40:48] Completed 102500 out of 250000 steps  (41%)
[17:58:31] Completed 105000 out of 250000 steps  (42%)
[18:15:20] Completed 107500 out of 250000 steps  (43%)
[18:32:45] Completed 110000 out of 250000 steps  (44%)
[18:42:27] - Autosending finished units...
[18:42:27] Trying to send all finished work units
[18:42:27] + No unsent completed units remaining.
[18:42:27] - Autosend completed
[18:49:46] Completed 112500 out of 250000 steps  (45%)
[19:07:26] Completed 115000 out of 250000 steps  (46%)
[19:23:56] Completed 117500 out of 250000 steps  (47%)
[19:41:02] Completed 120000 out of 250000 steps  (48%)
[19:58:55] Completed 122500 out of 250000 steps  (49%)
[20:16:38] Completed 125000 out of 250000 steps  (50%)
[20:33:12] Completed 127500 out of 250000 steps  (51%)
[20:50:20] Completed 130000 out of 250000 steps  (52%)
[21:07:25] Completed 132500 out of 250000 steps  (53%)
[21:24:00] Completed 135000 out of 250000 steps  (54%)
[21:40:49] Completed 137500 out of 250000 steps  (55%)
[21:57:49] Completed 140000 out of 250000 steps  (56%)
[22:15:21] Completed 142500 out of 250000 steps  (57%)
[22:32:25] Completed 145000 out of 250000 steps  (58%)
[22:49:34] Completed 147500 out of 250000 steps  (59%)
[23:06:32] Completed 150000 out of 250000 steps  (60%)
[23:24:02] Completed 152500 out of 250000 steps  (61%)
[23:41:21] Completed 155000 out of 250000 steps  (62%)
[23:58:14] Completed 157500 out of 250000 steps  (63%)
[00:15:34] Completed 160000 out of 250000 steps  (64%)
[00:32:34] Completed 162500 out of 250000 steps  (65%)
[00:42:27] - Autosending finished units...
[00:42:27] Trying to send all finished work units
[00:42:27] + No unsent completed units remaining.
[00:42:27] - Autosend completed
[00:49:17] Completed 165000 out of 250000 steps  (66%)
[01:06:32] Completed 167500 out of 250000 steps  (67%)
[01:23:16] Completed 170000 out of 250000 steps  (68%)
[01:39:59] Completed 172500 out of 250000 steps  (69%)
[01:57:35] Completed 175000 out of 250000 steps  (70%)
[02:14:58] Completed 177500 out of 250000 steps  (71%)
[02:32:01] Completed 180000 out of 250000 steps  (72%)
[02:49:38] Completed 182500 out of 250000 steps  (73%)
[03:06:47] Completed 185000 out of 250000 steps  (74%)
[03:24:00] Completed 187500 out of 250000 steps  (75%)
[03:40:45] Completed 190000 out of 250000 steps  (76%)
[03:57:24] Completed 192500 out of 250000 steps  (77%)
[04:14:00] Completed 195000 out of 250000 steps  (78%)
[04:20:42] CoreStatus = FF (255)
[04:20:42] Client-core communications error: ERROR 0xff
[04:20:42] Deleting current work unit & continuing...
[04:20:58] - Warning: Could not delete all work unit files (6): Core file absent
[04:20:58] Trying to send all finished work units
[04:20:58] + No unsent completed units remaining.
[04:20:58] - Preparing to get new work unit...
[04:20:58] + Attempting to get work packet

Project: 2662 (Run 2, Clone 111, Gen 28)

Posted: Sat Oct 03, 2009 12:09 pm
by SKeptical_Thinker
Some console messages that do not appear in the log file may be of concern.

[04:20:58] - Preparing to get new work unit...
[04:20:58] + Attempting to get work packet
[04:20:58] - Connecting to assignment server
[04:20:58] Connecting to http://assign.stanford.edu:8080/
[04:20:58] Posted data.
[04:20:58] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[04:20:58] + News From Folding@Home: Welcome to Folding@Home
[04:20:58] Loaded queue successfully.
[04:20:58] Connecting to http://171.64.65.56:8080/
[04:21:04] Posted data.
[04:21:04] Initial: 0000; - Receiving payload (expected size: 4922857)
[04:21:12] - Downloaded at ~600 kB/s
[04:21:12] - Averaged speed for that direction ~614 kB/s
[04:21:12] + Received work.
[04:21:12] + Closed connections
[04:21:17] 
[04:21:17] + Processing work unit
[04:21:17] Core required: FahCore_a2.exe
[04:21:17] Core found.
[04:21:17] Working on Unit 07 [October 3 04:21:17]
[04:21:17] + Working ...
[04:21:17] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 07 -checkpoint 15 -verbose -lifeline 3452 -version 602'

[04:21:17] 
[04:21:17] *------------------------------*
[04:21:17] Folding@Home Gromacs SMP Core
[04:21:17] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[04:21:17] 
[04:21:17] Preparing to commence simulation
[04:21:17] - Ensuring status. Please wait.
[04:21:26] - Looking at optimizations...
[04:21:26] - Working with standard loops on this execution.
[04:21:26] - Files status OK
[04:21:28] - Expanded 4922345 -> 24360573 (decompressed 494.8 percent)
[04:21:28] Called DecompressByteArray: compressed_data_size=4922345 data_size=24360573, decompressed_data_size=24360573 diff=0
[04:21:28] - Digital signature verified
[04:21:28] 
[04:21:28] Project: 2662 (Run 2, Clone 111, Gen 28)
[04:21:28] 
[04:21:28] Entering M.D.
NNODES=4, MYRANK=2, HOSTNAME=mlm-work
NNODES=4, MYRANK=3, HOSTNAME=mlm-work
NNODES=4, MYRANK=1, HOSTNAME=mlm-work
NNODES=4, MYRANK=0, HOSTNAME=mlm-work
NODEID=0 argc=20
NODEID=1 argc=20
NODEID=2 argc=20
NODEID=3 argc=20
Reading file work/wudata_07.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 68

NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 1D domain decomposition 1 x 1 x 4
starting mdrun 'HGG in water'
7250000 steps,  14500.0 ps (continuing from step 7000000,  14000.0 ps).
[04:21:41] Completed 0 out of 250000 steps  (0%)
[04:37:58] Completed 2500 out of 250000 steps  (1%)
[04:54:13] Completed 5000 out of 250000 steps  (2%)
[05:11:25] Completed 7500 out of 250000 steps  (3%)
[05:28:23] Completed 10000 out of 250000 steps  (4%)
[05:45:55] Completed 12500 out of 250000 steps  (5%)
[06:03:36] Completed 15000 out of 250000 steps  (6%)
[06:21:12] Completed 17500 out of 250000 steps  (7%)

t = 14039.547 ps: Water molecule starting at atom 63511 can not be settled.
Check for bad contacts and/or reduce the timestep.
[06:37:27] Completed 20000 out of 250000 steps  (8%)
[06:42:27] - Autosending finished units...
[06:42:27] Trying to send all finished work units
[06:42:27] + No unsent completed units remaining.
[06:42:27] - Autosend completed
[06:54:56] Completed 22500 out of 250000 steps  (9%)
[07:12:36] Completed 25000 out of 250000 steps  (10%)
[07:30:22] Completed 27500 out of 250000 steps  (11%)
[07:46:35] Completed 30000 out of 250000 steps  (12%)

t = 14064.443 ps: Water molecule starting at atom 69355 can not be settled.
Check for bad contacts and/or reduce the timestep.
[08:03:57] Completed 32500 out of 250000 steps  (13%)

t = 14065.603 ps: Water molecule starting at atom 89266 can not be settled.
Check for bad contacts and/or reduce the timestep.
[08:21:44] Completed 35000 out of 250000 steps  (14%)
[08:39:27] Completed 37500 out of 250000 steps  (15%)
[08:56:38] Completed 40000 out of 250000 steps  (16%)
[09:14:01] Completed 42500 out of 250000 steps  (17%)
[09:31:38] Completed 45000 out of 250000 steps  (18%)
[09:48:43] Completed 47500 out of 250000 steps  (19%)
[10:06:24] Completed 50000 out of 250000 steps  (20%)
[10:23:47] Completed 52500 out of 250000 steps  (21%)
[10:41:22] Completed 55000 out of 250000 steps  (22%)
[10:58:20] Completed 57500 out of 250000 steps  (23%)

t = 14117.915 ps: Water molecule starting at atom 129787 can not be settled.
Check for bad contacts and/or reduce the timestep.
[11:15:54] Completed 60000 out of 250000 steps  (24%)
[11:32:21] Completed 62500 out of 250000 steps  (25%)
[11:50:04] Completed 65000 out of 250000 steps  (26%)

Project: 2662 (Run 2, Clone 111, Gen 28)

Posted: Sun Oct 04, 2009 12:26 pm
by SKeptical_Thinker
This is the third WU in a row to end this way, two at 100% and one at 75%, on my standalone Ubuntu 9.04 machine. It has also happened at least once at 100% on my Ubuntu 9.04 virtual machine running in XP-32. Is there anything I can do to prevent this communication error? If not, I will have to find something else to do with these machines.

Executable: ./fah6
Arguments: -smp -verbosity 9 

[13:36:33] - Ask before connecting: No
[13:36:33] - User name: SKeptical_Thinker (Team 31574)
[13:36:33] - User ID: 40702AC168F0DD5B
[13:36:33] - Machine ID: 1
[13:36:33] 
[13:36:33] Loaded queue successfully.
[13:36:33] - Autosending finished units...
[13:36:33] Trying to send all finished work units
[13:36:33] + No unsent completed units remaining.
[13:36:33] - Autosend completed
[13:36:33] 
[13:36:33] + Processing work unit
[13:36:33] Core required: FahCore_a2.exe
[13:36:33] Core found.
[13:36:33] Working on Unit 07 [October 3 13:36:33]
[13:36:33] + Working ...
[13:36:33] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 07 -checkpoint 15 -verbose -lifeline 3452 -version 602'

[13:36:34] 
[13:36:34] *------------------------------*
[13:36:34] Folding@Home Gromacs SMP Core
[13:36:34] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[13:36:34] 
[13:36:34] Preparing to commence simulation
[13:36:34] - Ensuring status. Please wait.
[13:36:34] Files status OK
[13:36:35] - Expanded 4922345 -> 24360573 (decompressed 494.8 percent)
[13:36:35] Called DecompressByteArray: compressed_data_size=4922345 data_size=24360573, decompressed_data_size=24360573 diff=0
[13:36:35] - Digital signature verified
[13:36:35] 
[13:36:35] Project: 2662 (Run 2, Clone 111, Gen 28)
[13:36:35] 
[13:36:35] Assembly optimizations on if available.
[13:36:35] Entering M.D.
[13:36:41] Using Gromacs checkpoints
[13:36:45] 
[13:36:45] Entering M.D.
[13:36:51] Using Gromacs checkpoints
[13:36:57] Resuming from checkpoint
[13:36:57] Verified work/wudata_07.log
[13:36:58] Verified work/wudata_07.trr
[13:36:58] Verified work/wudata_07.xtc
[13:36:58] Verified work/wudata_07.edr
[13:36:59] Completed 67160 out of 250000 steps  (26%)
[13:39:20] Completed 67500 out of 250000 steps  (27%)
[13:57:25] Completed 70000 out of 250000 steps  (28%)
[14:15:22] Completed 72500 out of 250000 steps  (29%)
[14:33:08] Completed 75000 out of 250000 steps  (30%)
[14:50:34] Completed 77500 out of 250000 steps  (31%)
[15:08:27] Completed 80000 out of 250000 steps  (32%)
[15:25:48] Completed 82500 out of 250000 steps  (33%)
[15:43:01] Completed 85000 out of 250000 steps  (34%)
[16:00:56] Completed 87500 out of 250000 steps  (35%)
[16:18:54] Completed 90000 out of 250000 steps  (36%)
[16:36:48] Completed 92500 out of 250000 steps  (37%)
[16:54:27] Completed 95000 out of 250000 steps  (38%)
[17:12:38] Completed 97500 out of 250000 steps  (39%)
[17:30:31] Completed 100000 out of 250000 steps  (40%)
[17:47:38] Completed 102500 out of 250000 steps  (41%)
[18:04:44] Completed 105000 out of 250000 steps  (42%)
[18:22:32] Completed 107500 out of 250000 steps  (43%)
[18:40:22] Completed 110000 out of 250000 steps  (44%)
[18:58:50] Completed 112500 out of 250000 steps  (45%)
[19:17:17] Completed 115000 out of 250000 steps  (46%)
[19:35:04] Completed 117500 out of 250000 steps  (47%)
[19:36:33] - Autosending finished units...
[19:36:33] Trying to send all finished work units
[19:36:33] + No unsent completed units remaining.
[19:36:33] - Autosend completed
[19:52:09] Completed 120000 out of 250000 steps  (48%)
[20:09:42] Completed 122500 out of 250000 steps  (49%)
[20:27:30] Completed 125000 out of 250000 steps  (50%)
[20:46:02] Completed 127500 out of 250000 steps  (51%)
[21:03:18] Completed 130000 out of 250000 steps  (52%)
[21:20:32] Completed 132500 out of 250000 steps  (53%)
[21:38:16] Completed 135000 out of 250000 steps  (54%)
[21:56:44] Completed 137500 out of 250000 steps  (55%)
[22:13:55] Completed 140000 out of 250000 steps  (56%)
[22:30:53] Completed 142500 out of 250000 steps  (57%)
[22:48:23] Completed 145000 out of 250000 steps  (58%)
[23:06:01] Completed 147500 out of 250000 steps  (59%)
[23:23:45] Completed 150000 out of 250000 steps  (60%)
[23:41:47] Completed 152500 out of 250000 steps  (61%)
[23:58:42] Completed 155000 out of 250000 steps  (62%)
[00:16:09] Completed 157500 out of 250000 steps  (63%)
[00:34:24] Completed 160000 out of 250000 steps  (64%)
[00:52:40] Completed 162500 out of 250000 steps  (65%)
[01:10:26] Completed 165000 out of 250000 steps  (66%)
[01:28:00] Completed 167500 out of 250000 steps  (67%)
[01:36:33] - Autosending finished units...
[01:36:33] Trying to send all finished work units
[01:36:33] + No unsent completed units remaining.
[01:36:33] - Autosend completed
[01:45:51] Completed 170000 out of 250000 steps  (68%)
[02:04:16] Completed 172500 out of 250000 steps  (69%)
[02:22:03] Completed 175000 out of 250000 steps  (70%)
[02:40:24] Completed 177500 out of 250000 steps  (71%)
[02:58:48] Completed 180000 out of 250000 steps  (72%)
[03:17:10] Completed 182500 out of 250000 steps  (73%)
[03:34:27] Completed 185000 out of 250000 steps  (74%)
[03:52:07] Completed 187500 out of 250000 steps  (75%)
[04:10:25] Completed 190000 out of 250000 steps  (76%)
[04:28:20] Completed 192500 out of 250000 steps  (77%)
[04:46:37] Completed 195000 out of 250000 steps  (78%)
[05:04:36] Completed 197500 out of 250000 steps  (79%)
[05:22:38] Completed 200000 out of 250000 steps  (80%)
[05:40:48] Completed 202500 out of 250000 steps  (81%)
[05:58:55] Completed 205000 out of 250000 steps  (82%)
[06:17:04] Completed 207500 out of 250000 steps  (83%)
[06:35:00] Completed 210000 out of 250000 steps  (84%)
[06:53:09] Completed 212500 out of 250000 steps  (85%)
[07:10:58] Completed 215000 out of 250000 steps  (86%)
[07:29:10] Completed 217500 out of 250000 steps  (87%)
[07:36:33] - Autosending finished units...
[07:36:33] Trying to send all finished work units
[07:36:33] + No unsent completed units remaining.
[07:36:33] - Autosend completed
[07:47:13] Completed 220000 out of 250000 steps  (88%)
[08:05:02] Completed 222500 out of 250000 steps  (89%)
[08:23:12] Completed 225000 out of 250000 steps  (90%)
[08:41:12] Completed 227500 out of 250000 steps  (91%)
[08:59:17] Completed 230000 out of 250000 steps  (92%)
[09:16:57] Completed 232500 out of 250000 steps  (93%)
[09:34:47] Completed 235000 out of 250000 steps  (94%)
[09:52:51] Completed 237500 out of 250000 steps  (95%)
[10:10:45] Completed 240000 out of 250000 steps  (96%)
[10:28:54] Completed 242500 out of 250000 steps  (97%)
[10:46:48] Completed 245000 out of 250000 steps  (98%)
[11:04:54] Completed 247500 out of 250000 steps  (99%)
[11:22:58] Completed 250000 out of 250000 steps  (100%)
[11:23:05] CoreStatus = 0 (0)
[11:23:05] Client-core communications error: ERROR 0x0
[11:23:05] Deleting current work unit & continuing...
[11:23:22] - Warning: Could not delete all work unit files (7): Core file absent
[11:23:22] Trying to send all finished work units
[11:23:22] + No unsent completed units remaining.
[11:23:22] - Preparing to get new work unit...
[11:23:22] + Attempting to get work packet
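
To see how far each work unit got before the core exited, a log like the ones above can be scanned for its "Completed" and "CoreStatus" lines. The following is only a minimal sketch that assumes the line formats quoted in this thread; the function name `summarize_log` and the regular expressions are illustrative and not part of the Folding@Home client.

```python
import re

# Patterns based on the log lines quoted in this thread (an assumption,
# not an official description of the client's log format).
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps\s+\((\d+)%\)")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def summarize_log(text):
    """Return (last_percent, core_status) pairs, one per CoreStatus line.

    last_percent is the most recent "Completed ... (N%)" value seen before
    the CoreStatus line, or None if no progress line preceded it.
    """
    results = []
    last_percent = None
    for line in text.splitlines():
        step = STEP_RE.search(line)
        if step:
            last_percent = int(step.group(3))
            continue
        status = STATUS_RE.search(line)
        if status:
            results.append((last_percent, int(status.group(2))))
            last_percent = None  # start fresh for the next work unit
    return results


# Four lines copied from the logs in this thread.
sample = """\
[04:14:00] Completed 195000 out of 250000 steps  (78%)
[04:20:42] CoreStatus = FF (255)
[11:22:58] Completed 250000 out of 250000 steps  (100%)
[11:23:05] CoreStatus = 0 (0)
"""
print(summarize_log(sample))
```

Running this over a whole FAHlog file (read with `open(path).read()`) would list every core exit together with the last progress line logged before it, which makes it easier to compare failures across work units.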

Re: Project: 2662 (Run 2, Clone 111, Gen 28)

Posted: Sun Oct 04, 2009 5:16 pm
by susato
Threads merged.

If the same WU started over again in position 8, stop the client and run it once with the -delete 08 flag, which makes it delete the current work unit and then quit. Then run it again normally and it will pick up a different WU.

Also, before your final restart, make sure you have the latest version of the client, and delete your FahCore_a2.exe so that the client will download a fresh copy for future processing.

Good luck & let us know how it goes.
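
The recovery steps above could be scripted roughly as follows. This is a hedged sketch, not an official procedure: it assumes the client has already been stopped, that the script is run from the client directory, and that the client is started as `./fah6 -smp -verbosity 9` as in the log earlier in this thread. The helper names `recovery_commands` and `recover` are made up for illustration, and the script only prints what it would do unless `dry_run=False` is passed.

```python
import os
import subprocess


def recovery_commands(slot, client="./fah6", run_args=("-smp", "-verbosity", "9")):
    """Build the two client invocations described in the post above.

    slot is the queue position of the stuck work unit, e.g. "08".
    """
    return [
        [client, "-delete", slot],  # deletes that work unit, then quits
        [client, *run_args],        # normal run; should pick up a different WU
    ]


def recover(slot, core="FahCore_a2.exe", dry_run=True):
    """Carry out the steps; with dry_run=True only print them."""
    delete_cmd, run_cmd = recovery_commands(slot)
    if dry_run:
        print("would run:", " ".join(delete_cmd))
        print("would remove:", core)
        print("would run:", " ".join(run_cmd))
        return
    subprocess.run(delete_cmd, check=True)
    if os.path.exists(core):
        os.remove(core)  # the client downloads a fresh copy on the next start
    subprocess.run(run_cmd, check=True)


recover("08")
```

Keeping the default as a dry run is deliberate: deleting a work unit discards its progress, so it seems safer to review the printed commands before running them for real.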