Merged problems with projects 6903/6904, Part 1

Moderators: Site Moderators, FAHC Science Team

Leonardo
Posts: 260
Joined: Tue Dec 04, 2007 5:09 am
Hardware configuration: GPU slots on home-built, purpose-built PCs.
Location: Eagle River, Alaska

Re: project:6903 run:6 clone:2 gen:75

Post by Leonardo »

Willie, reports are coming in from others, including me, of unusual frame times with 6903 and 6904 work units.
Edit by Mod:
Topics merged.
sick willie
Posts: 33
Joined: Sun May 25, 2008 7:40 pm

Re: project:6903 run:6 clone:2 gen:75

Post by sick willie »

Leonardo, thanks for the heads up. I hope PG does something with this problem quickly. :(
toTOW
Site Moderator
Posts: 6349
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project: 6904 (Run 2, Clone 14, Gen 51)

Post by toTOW »

It would be great if you could reference the PRCG (Project, Run, Clone, Gen) of those WUs in this thread :)

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
Leonardo
Posts: 260
Joined: Tue Dec 04, 2007 5:09 am
Hardware configuration: GPU slots on home-built, purpose-built PCs.
Location: Eagle River, Alaska

Re: project:6903 run:6 clone:2 gen:75

Post by Leonardo »

At last count, there were three open threads on these 6903/6904 anomalies. Perhaps a mod will consolidate them.
KMac
Posts: 31
Joined: Thu Feb 17, 2011 6:50 pm

Re: Merged problems with projects 6903/6904

Post by KMac »

Project: 6903 (Run 0, Clone 1, Gen 62) has 500,000 steps with a TPF almost exactly double the norm.
I will complete uploading it in approx 2 hours (0500 UTC).
-alias-
Posts: 121
Joined: Sun Feb 22, 2009 1:20 pm

Re: Merged problems with projects 6903/6904

Post by -alias- »

Yes and here is another P6903 with steps of 5000 and a total of 500,000. It should have been 2500 and 250,000.

Code:

[04:41:52] Project: 6903 (Run 7, Clone 8, Gen 26)
[04:41:52] 
[04:41:52] Assembly optimizations on if available.
[04:41:52] Entering M.D.
[04:41:59] Mapping NT from 24 to 24 
[04:42:03] Completed 0 out of 500000 steps  (0%)
[04:46:01] - Autosending finished units... [February 6 04:46:01 UTC]
[04:46:01] Trying to send all finished work units
[04:46:01] + No unsent completed units remaining.
[04:46:01] - Autosend completed
[04:46:32] ng M.D.
[04:46:38] Using Gromacs checkpoints
[04:46:41] Mapping NT from 24 to 24 
[04:46:49] Resuming from checkpoint
[04:47:02] Verified work/wudata_08.log
[04:47:02] Verified work/wudata_08.trr
[04:47:02] Verified work/wudata_08.xtc
[04:47:02] Verified work/wudata_08.edr
[04:47:03] Completed 295 out of 500000 steps  (0%)
[05:35:09] Completed 5000 out of 500000 steps  (1%)
[06:24:24] ***** Got an Activate signal (2)

I stopped here, and started it again.

[06:26:39] Completed 9675 out of 500000 steps  (1%)
[06:29:58] Completed 10000 out of 500000 steps  (2%)

New update

[06:29:58] Completed 10000 out of 500000 steps  (2%)
[07:21:01] Completed 15000 out of 500000 steps  (3%)

TPF is 54:22 so I will stop the project here!
This one had a TPF of 53.06 before I stopped it, and HFM.NET reported 25.40 after the reset. I'll see what the real TPF is at the next percent, and if it is abnormally high I suppose I'll just delete the work folder and start a new project. I expect that SF will now provide an explanation of what is going on?
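For anyone who wants to check TPF directly from the client log rather than from a monitoring tool, here is a minimal Python sketch. It assumes the progress-line format shown in the logs in this thread and that one frame is 1% of the total steps. The function name `frame_times` is made up for illustration. Intervals that include a restart or a partial frame are skipped, since they would not be a true frame time.

```python
import re
from datetime import timedelta

# Matches lines such as: [07:21:01] Completed 15000 out of 500000 steps  (3%)
PROGRESS = re.compile(
    r"\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+) out of (\d+) steps\s+\((\d+)%\)"
)


def frame_times(log_lines):
    """Return (percent, elapsed) pairs for each full frame found in the log."""
    results = []
    prev = None  # (timestamp, steps completed) of the previous progress line
    for line in log_lines:
        m = PROGRESS.search(line)
        if not m:
            continue
        h, mi, s, done, total, pct = map(int, m.groups())
        stamp = timedelta(hours=h, minutes=mi, seconds=s)
        frame_size = total // 100  # assumption: one frame is 1% of the steps
        # Only count intervals that cover exactly one whole frame.
        if prev is not None and done - prev[1] == frame_size:
            elapsed = stamp - prev[0]
            if elapsed < timedelta(0):
                elapsed += timedelta(days=1)  # log clock wrapped past midnight
            results.append((pct, elapsed))
        prev = (stamp, done)
    return results


sample = [
    "[06:29:58] Completed 10000 out of 500000 steps  (2%)",
    "[07:21:01] Completed 15000 out of 500000 steps  (3%)",
]
for pct, elapsed in frame_times(sample):
    print(f"{pct}%: {elapsed}")
```

The log timestamps carry no date, so the sketch can only handle a single midnight wrap between two consecutive progress lines.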
Last edited by -alias- on Mon Feb 06, 2012 7:30 am, edited 1 time in total.
KMac
Posts: 31
Joined: Thu Feb 17, 2011 6:50 pm

Re: Merged problems with projects 6903/6904

Post by KMac »

Project: 6903 (Run 0, Clone 1, Gen 62) would not upload. The core reported "Compressed data size (378447008) exceeds limit." and then shut down with FILE_IO_ERROR, CoreStatus = 75 (117). The client then restarted the same unit.

Code:

--- Opening Log file [February 4 03:12:38 UTC] 


# Linux SMP Console Edition ###################################################
###############################################################################

                       Folding@Home Client Version 6.34

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/KMac/fah
Executable: ./fah6
Arguments: -smp -bigadv 

[03:12:38] - Ask before connecting: No
[03:12:38] - User name: KMac (Team 33)
[03:12:38] - User ID: ***************
[03:12:38] - Machine ID: 1
[03:12:38] 
[03:12:38] Loaded queue successfully.
[03:12:38] - Preparing to get new work unit...
[03:12:38] Cleaning up work directory
[03:12:38] + Attempting to get work packet
[03:12:38] Passkey found
[03:12:38] - Connecting to assignment server
[03:12:39] - Successful: assigned to (130.237.232.237).
[03:12:39] + News From Folding@Home: Welcome to Folding@Home
[03:12:39] Loaded queue successfully.
[03:13:26] + Closed connections
[03:13:26] 
[03:13:26] + Processing work unit
[03:13:26] Core required: FahCore_a5.exe
[03:13:26] Core found.
[03:13:26] Working on queue slot 05 [February 4 03:13:26 UTC]
[03:13:26] + Working ...
thekraken: The Kraken 0.6 (compiled Sat Jan 28 02:00:51 CST 2012 by kevin@SM-H8QGi-F)
thekraken: Processor affinity wrapper for Folding@Home
thekraken: The Kraken comes with ABSOLUTELY NO WARRANTY; licensed under GPLv2
thekraken: PID: 2797
thekraken: Logging to thekraken.log
[03:13:26] 
[03:13:26] *------------------------------*
[03:13:26] Folding@Home Gromacs SMP Core
[03:13:26] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[03:13:26] 
[03:13:26] Preparing to commence simulation
[03:13:26] - Looking at optimizations...
[03:13:26] - Created dyn
[03:13:26] - Files status OK
[03:13:33] - Expanded 57232293 -> 71846524 (decompressed 50.4 percent)
[03:13:33] Called DecompressByteArray: compressed_data_size=57232293 data_size=71846524, decompressed_data_size=71846524 diff=0
[03:13:34] - Digital signature verified
[03:13:34] 
[03:13:34] Project: 6903 (Run 0, Clone 1, Gen 62)
[03:13:34] 
[03:13:34] Assembly optimizations on if available.
[03:13:34] Entering M.D.
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                            :-)  VERSION 4.5.3  (-:

        Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
      Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra, 
        Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff, 
           Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz, 
                Michael Shirts, Alfons Sijbers, Peter Tieleman,

               Berk Hess, David van der Spoel, and Erik Lindahl.

       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
            Copyright (c) 2001-2010, The GROMACS development team at
        Uppsala University & The Royal Institute of Technology, Sweden.
            check out http://www.gromacs.org for more information.


                               :-)  Gromacs  (-:

Reading file work/wudata_05.tpr, VERSION 4.5.4-dev-20110530-cc815 (single precision)
[03:13:43] Mapping NT from 48 to 48 
Starting 48 threads
Making 2D domain decomposition 8 x 6 x 1
starting mdrun 'Overlay'
15750000 steps,  63000.0 ps (continuing from step 15250000,  61000.0 ps).
[03:13:48] Completed 0 out of 500000 steps  (0%)
[03:30:53] ng M.D.
[03:30:59] Using Gromacs checkpoints
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                            :-)  VERSION 4.5.3  (-:

        Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
      Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra, 
        Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff, 
           Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz, 
                Michael Shirts, Alfons Sijbers, Peter Tieleman,

               Berk Hess, David van der Spoel, and Erik Lindahl.

       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
            Copyright (c) 2001-2010, The GROMACS development team at
        Uppsala University & The Royal Institute of Technology, Sweden.
            check out http://www.gromacs.org for more information.


                               :-)  Gromacs  (-:

[03:31:08] Mapping NT from 48 to 48 
Reading file work/wudata_05.tpr, VERSION 4.5.4-dev-20110530-cc815 (single precision)
Starting 48 threads

Reading checkpoint file work/wudata_05.cpt generated: Fri Feb  3 21:28:49 2012


Making 2D domain decomposition 8 x 6 x 1
starting mdrun 'Overlay'
15750000 steps,  63000.0 ps (continuing from step 15252400,  61009.6 ps).
[03:31:30] Resuming from checkpoint
[03:32:36] Verified work/wudata_05.log
[03:32:37] Verified work/wudata_05.trr
[03:32:37] Verified work/wudata_05.xtc
[03:32:37] Verified work/wudata_05.edr
[03:32:38] Completed 2400 out of 500000 steps  (0%)
[04:17:29] Completed 5000 out of 500000 steps  (1%)
[04:48:37] Completed 10000 out of 500000 steps  (2%)
[05:19:46] Completed 15000 out of 500000 steps  (3%)

NOTE: Turning on dynamic load balancing

[05:49:32] Completed 20000 out of 500000 steps  (4%)
[06:18:44] Completed 25000 out of 500000 steps  (5%)
[06:47:58] Completed 30000 out of 500000 steps  (6%)
[07:17:11] Completed 35000 out of 500000 steps  (7%)
[07:46:19] Completed 40000 out of 500000 steps  (8%)
[08:15:31] Completed 45000 out of 500000 steps  (9%)
[08:44:44] Completed 50000 out of 500000 steps  (10%)
[09:13:58] Completed 55000 out of 500000 steps  (11%)
[09:43:11] Completed 60000 out of 500000 steps  (12%)
[10:12:25] Completed 65000 out of 500000 steps  (13%)
[10:41:38] Completed 70000 out of 500000 steps  (14%)
[11:10:51] Completed 75000 out of 500000 steps  (15%)
[11:40:04] Completed 80000 out of 500000 steps  (16%)
[12:09:16] Completed 85000 out of 500000 steps  (17%)
[12:38:30] Completed 90000 out of 500000 steps  (18%)
[13:07:44] Completed 95000 out of 500000 steps  (19%)
[13:36:59] Completed 100000 out of 500000 steps  (20%)
[14:06:17] Completed 105000 out of 500000 steps  (21%)
[14:35:32] Completed 110000 out of 500000 steps  (22%)
[15:04:45] Completed 115000 out of 500000 steps  (23%)
[15:33:59] Completed 120000 out of 500000 steps  (24%)
[16:03:13] Completed 125000 out of 500000 steps  (25%)
[16:32:27] Completed 130000 out of 500000 steps  (26%)
[17:01:41] Completed 135000 out of 500000 steps  (27%)
[17:30:50] Completed 140000 out of 500000 steps  (28%)
[18:00:06] Completed 145000 out of 500000 steps  (29%)
[18:29:21] Completed 150000 out of 500000 steps  (30%)
[18:58:38] Completed 155000 out of 500000 steps  (31%)
[19:27:56] Completed 160000 out of 500000 steps  (32%)
[19:57:12] Completed 165000 out of 500000 steps  (33%)
[20:26:29] Completed 170000 out of 500000 steps  (34%)
[20:55:43] Completed 175000 out of 500000 steps  (35%)
[21:24:59] Completed 180000 out of 500000 steps  (36%)
[21:54:13] Completed 185000 out of 500000 steps  (37%)
[22:23:28] Completed 190000 out of 500000 steps  (38%)
[22:52:46] Completed 195000 out of 500000 steps  (39%)
[23:22:04] Completed 200000 out of 500000 steps  (40%)
[23:51:22] Completed 205000 out of 500000 steps  (41%)
[00:20:38] Completed 210000 out of 500000 steps  (42%)
[00:49:55] Completed 215000 out of 500000 steps  (43%)
[01:19:13] Completed 220000 out of 500000 steps  (44%)
[01:48:30] Completed 225000 out of 500000 steps  (45%)
[02:17:48] Completed 230000 out of 500000 steps  (46%)
[02:47:05] Completed 235000 out of 500000 steps  (47%)
[03:16:16] Completed 240000 out of 500000 steps  (48%)
[03:45:33] Completed 245000 out of 500000 steps  (49%)
[04:14:51] Completed 250000 out of 500000 steps  (50%)
[04:44:13] Completed 255000 out of 500000 steps  (51%)
[05:13:31] Completed 260000 out of 500000 steps  (52%)
[05:42:53] Completed 265000 out of 500000 steps  (53%)
[06:12:11] Completed 270000 out of 500000 steps  (54%)
[06:41:29] Completed 275000 out of 500000 steps  (55%)
[07:10:46] Completed 280000 out of 500000 steps  (56%)
[07:40:02] Completed 285000 out of 500000 steps  (57%)
[08:09:19] Completed 290000 out of 500000 steps  (58%)
[08:38:37] Completed 295000 out of 500000 steps  (59%)
[09:07:54] Completed 300000 out of 500000 steps  (60%)
[09:37:11] Completed 305000 out of 500000 steps  (61%)
[10:06:28] Completed 310000 out of 500000 steps  (62%)
[10:35:44] Completed 315000 out of 500000 steps  (63%)
[11:05:02] Completed 320000 out of 500000 steps  (64%)
[11:34:18] Completed 325000 out of 500000 steps  (65%)
[12:03:34] Completed 330000 out of 500000 steps  (66%)
[12:32:52] Completed 335000 out of 500000 steps  (67%)
[13:02:09] Completed 340000 out of 500000 steps  (68%)
[13:31:20] Completed 345000 out of 500000 steps  (69%)
[14:00:37] Completed 350000 out of 500000 steps  (70%)
[14:29:53] Completed 355000 out of 500000 steps  (71%)
[14:59:09] Completed 360000 out of 500000 steps  (72%)
[15:28:25] Completed 365000 out of 500000 steps  (73%)
[15:57:41] Completed 370000 out of 500000 steps  (74%)
[16:26:58] Completed 375000 out of 500000 steps  (75%)
[16:56:15] Completed 380000 out of 500000 steps  (76%)
[17:25:31] Completed 385000 out of 500000 steps  (77%)
[17:54:46] Completed 390000 out of 500000 steps  (78%)
[18:24:02] Completed 395000 out of 500000 steps  (79%)
[18:53:18] Completed 400000 out of 500000 steps  (80%)
[19:22:34] Completed 405000 out of 500000 steps  (81%)
[19:51:50] Completed 410000 out of 500000 steps  (82%)
[20:21:07] Completed 415000 out of 500000 steps  (83%)
[20:50:36] Completed 420000 out of 500000 steps  (84%)
[21:19:51] Completed 425000 out of 500000 steps  (85%)
[21:49:07] Completed 430000 out of 500000 steps  (86%)
[22:18:23] Completed 435000 out of 500000 steps  (87%)
[22:47:38] Completed 440000 out of 500000 steps  (88%)
[23:16:53] Completed 445000 out of 500000 steps  (89%)
[23:46:02] Completed 450000 out of 500000 steps  (90%)
[00:15:18] Completed 455000 out of 500000 steps  (91%)
[00:44:34] Completed 460000 out of 500000 steps  (92%)
[01:13:48] Completed 465000 out of 500000 steps  (93%)
[01:43:05] Completed 470000 out of 500000 steps  (94%)
[02:12:22] Completed 475000 out of 500000 steps  (95%)
[02:41:39] Completed 480000 out of 500000 steps  (96%)
[03:11:47] Completed 485000 out of 500000 steps  (97%)
[03:41:05] Completed 490000 out of 500000 steps  (98%)
[04:10:24] Completed 495000 out of 500000 steps  (99%)
[04:40:00] Completed 500000 out of 500000 steps  (100%)

Writing final coordinates.

 Average load imbalance: 0.2 %
 Part of the total run time spent waiting due to load imbalance: 0.1 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 %


	Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time: 176934.061 176934.061    100.0
                       2d01h08:54
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:   2455.266    129.150      0.972     24.693

Thanx for Using GROMACS - Have a Nice Day

[04:40:24] DynamicWrapper: Finished Work Unit: sleep=10000
[04:40:34] 
[04:40:34] Finished Work Unit:
[04:40:34] - Reading up to 182433744 from "work/wudata_05.trr": Read 182433744
[04:40:36] trr file hash check passed.
[04:40:36] - Reading up to 207686544 from "work/wudata_05.xtc": Read 207686544
[04:40:37] xtc file hash check passed.
[04:40:37] edr file hash check passed.
[04:40:37] logfile size: 396661
[04:40:37] Leaving Run
[04:40:41] - Writing 390860941 bytes of core data to disk...
[04:42:46] Done: 390860429 -> 378447008 (compressed to 8.9 percent)
[04:42:46] -  Compressed data size (378447008) exceeds limit.
[04:42:46] - Error: Could not write out results to file
[04:42:46] - Shutting down core
[04:42:46] 
[04:42:46] Folding@home Core Shutdown: FILE_IO_ERROR
[04:42:47] CoreStatus = 75 (117)
[04:42:47] Error opening or reading from a file.
[04:42:47] Deleting current work unit & continuing...
thekraken: The Kraken 0.6 (compiled Sat Jan 28 02:00:51 CST 2012 by kevin@SM-H8QGi-F)
thekraken: Processor affinity wrapper for Folding@Home
thekraken: The Kraken comes with ABSOLUTELY NO WARRANTY; licensed under GPLv2
thekraken: PID: 3829
thekraken: Logging to thekraken.log
[04:43:12] - Preparing to get new work unit...
[04:43:12] Cleaning up work directory
[04:43:12] + Attempting to get work packet
[04:43:12] Passkey found
[04:43:12] - Connecting to assignment server
[04:43:13] - Successful: assigned to (130.237.232.237).
[04:43:13] + News From Folding@Home: Welcome to Folding@Home
[04:43:13] Loaded queue successfully.
[04:45:03] + Closed connections
[04:45:08] 
[04:45:08] + Processing work unit
[04:45:08] Core required: FahCore_a5.exe
[04:45:08] Core found.
[04:45:08] Working on queue slot 06 [February 6 04:45:08 UTC]
[04:45:08] + Working ...
thekraken: The Kraken 0.6 (compiled Sat Jan 28 02:00:51 CST 2012 by kevin@SM-H8QGi-F)
thekraken: Processor affinity wrapper for Folding@Home
thekraken: The Kraken comes with ABSOLUTELY NO WARRANTY; licensed under GPLv2
thekraken: PID: 3833
thekraken: Logging to thekraken.log
[04:45:08] 
[04:45:08] *------------------------------*
[04:45:08] Folding@Home Gromacs SMP Core
[04:45:08] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[04:45:08] 
[04:45:08] Preparing to commence simulation
[04:45:08] - Looking at optimizations...
[04:45:08] - Created dyn
[04:45:08] - Files status OK
[04:45:15] - Expanded 57232293 -> 71846524 (decompressed 50.4 percent)
[04:45:15] Called DecompressByteArray: compressed_data_size=57232293 data_size=71846524, decompressed_data_size=71846524 diff=0
[04:45:16] - Digital signature verified
[04:45:16] 
[04:45:16] Project: 6903 (Run 0, Clone 1, Gen 62)
[04:45:16] 
[04:45:16] Assembly optimizations on if available.
[04:45:16] Entering M.D.
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                            :-)  VERSION 4.5.3  (-:

        Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
      Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra, 
        Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff, 
           Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz, 
                Michael Shirts, Alfons Sijbers, Peter Tieleman,

               Berk Hess, David van der Spoel, and Erik Lindahl.

       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
            Copyright (c) 2001-2010, The GROMACS development team at
        Uppsala University & The Royal Institute of Technology, Sweden.
            check out http://www.gromacs.org for more information.


                               :-)  Gromacs  (-:

Reading file work/wudata_06.tpr, VERSION 4.5.4-dev-20110530-cc815 (single precision)
[04:45:25] Mapping NT from 48 to 48 
Starting 48 threads
Making 2D domain decomposition 8 x 6 x 1
starting mdrun 'Overlay'
15750000 steps,  63000.0 ps (continuing from step 15250000,  61000.0 ps).
[04:45:30] Completed 0 out of 500000 steps  (0%)
[05:02:36] ng M.D.
[05:02:42] Using Gromacs checkpoints
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                            :-)  VERSION 4.5.3  (-:

        Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
      Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra, 
        Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff, 
           Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz, 
                Michael Shirts, Alfons Sijbers, Peter Tieleman,

               Berk Hess, David van der Spoel, and Erik Lindahl.

       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
            Copyright (c) 2001-2010, The GROMACS development team at
        Uppsala University & The Royal Institute of Technology, Sweden.
            check out http://www.gromacs.org for more information.


                               :-)  Gromacs  (-:

[05:02:50] Mapping NT from 48 to 48 
Reading file work/wudata_06.tpr, VERSION 4.5.4-dev-20110530-cc815 (single precision)
Starting 48 threads

Reading checkpoint file work/wudata_06.cpt generated: Sun Feb  5 23:00:33 2012


Making 2D domain decomposition 8 x 6 x 1
starting mdrun 'Overlay'
15750000 steps,  63000.0 ps (continuing from step 15252405,  61009.6 ps).
[05:04:17] Resuming from checkpoint
[05:05:05] Verified work/wudata_06.log
[05:05:06] Verified work/wudata_06.trr
[05:05:06] Verified work/wudata_06.xtc
[05:05:06] Verified work/wudata_06.edr
[05:05:07] Completed 2405 out of 500000 steps  (0%)
[05:36:42] Completed 5000 out of 500000 steps  (1%)

NOTE: Turning on dynamic load balancing

[06:07:59] Completed 10000 out of 500000 steps  (2%)
[06:37:12] Completed 15000 out of 500000 steps  (3%)
[07:06:44] Completed 20000 out of 500000 steps  (4%)
KMac
Posts: 31
Joined: Thu Feb 17, 2011 6:50 pm

Re: Merged problems with projects 6903/6904

Post by KMac »

-alias- wrote: Yes and here is another P6903 with steps of 5000 and a total of 500,000. It should have been 2500 and 250,000.
I just received the same unit 6903(7,8,26) and can confirm this unit is 500000 steps.
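A quick way to spot these units before spending days on them is to compare the step total in the log against the normal count. The Python sketch below is only an illustration. It assumes 250,000 is the normal total for project 6903, as stated in the post quoted above. No normal total for 6904 is given in this thread, so it is left out. The names `oversized_units` and `EXPECTED_STEPS` are made up for the example.

```python
import re

# Matches lines such as: [04:41:52] Project: 6903 (Run 7, Clone 8, Gen 26)
PRCG = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
# Matches lines such as: [04:42:03] Completed 0 out of 500000 steps  (0%)
TOTAL = re.compile(r"Completed \d+ out of (\d+) steps")

# Assumption: normal step totals per project. Only 6903 is known from the thread.
EXPECTED_STEPS = {6903: 250_000}


def oversized_units(log_lines, expected=EXPECTED_STEPS):
    """Return sorted (project, run, clone, gen, total_steps) for oversized WUs."""
    flagged = set()
    current = None  # PRCG of the work unit currently being processed
    for line in log_lines:
        m = PRCG.search(line)
        if m:
            current = tuple(int(x) for x in m.groups())
            continue
        m = TOTAL.search(line)
        if m and current is not None:
            total = int(m.group(1))
            limit = expected.get(current[0])
            # Projects without a known normal total are never flagged.
            if limit is not None and total > limit:
                flagged.add(current + (total,))
    return sorted(flagged)


sample = [
    "[04:41:52] Project: 6903 (Run 7, Clone 8, Gen 26)",
    "[04:42:03] Completed 0 out of 500000 steps  (0%)",
]
print(oversized_units(sample))
```

Reporting the flagged PRCG in the thread gives the project owners what they need to track the unit down.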
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: Merged problems with projects 6903/6904

Post by kasson »

Thanks--we'll take a look.
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: Merged problems with projects 6903/6904

Post by kasson »

We have identified ~30 WU's that have too many steps. These appear related to an event on Jan 16 where some incoming returns were not written properly (the return was credited, but some of the data wasn't written). We are manually re-running those WU's and will re-generate the new ones as soon as we can.
-alias-
Posts: 121
Joined: Sun Feb 22, 2009 1:20 pm

Re: Merged problems with projects 6903/6904

Post by -alias- »

That sounds like my usual luck: I downloaded 2 of the roughly 30 WUs with errors, out of millions of computers that fold proteins, so maybe I should play the lottery this week. Thank you, Kasson, now we know what caused all these extra steps.
KMac
Posts: 31
Joined: Thu Feb 17, 2011 6:50 pm

Re: Merged problems with projects 6903/6904

Post by KMac »

The issue has not been resolved as these units are still being assigned. Project: 6904 (Run 0, Clone 31, Gen 39) has 10,000,000 steps and is the fourth unit of this type that I have been assigned.
I cannot give a TPF as I stopped the unit after 11 hours without a frame advancement on a 48 core machine.

Code:

--- Opening Log file [February 4 03:12:38 UTC] 


# Linux SMP Console Edition ###################################################
###############################################################################

[09:28:58] + Attempting to send results [February 8 09:28:58 UTC]
[09:44:33] + Results successfully sent
[09:44:33] Thank you for your contribution to Folding@Home.
[09:44:33] + Number of Units Completed: 8

thekraken: The Kraken 0.6 (compiled Sat Jan 28 02:00:51 CST 2012 by kevin@SM-H8QGi-F)
thekraken: Processor affinity wrapper for Folding@Home
thekraken: The Kraken comes with ABSOLUTELY NO WARRANTY; licensed under GPLv2
thekraken: PID: 3385
thekraken: Logging to thekraken.log
[09:44:40] - Preparing to get new work unit...
[09:44:40] Cleaning up work directory
[09:44:42] + Attempting to get work packet
[09:44:42] Passkey found
[09:44:42] - Connecting to assignment server
[09:44:42] - Successful: assigned to (130.237.232.237).
[09:44:42] + News From Folding@Home: Welcome to Folding@Home
[09:44:42] Loaded queue successfully.
[09:46:51] + Closed connections
[09:46:51] 
[09:46:51] + Processing work unit
[09:46:51] Core required: FahCore_a5.exe
[09:46:51] Core found.
[09:46:51] Working on queue slot 00 [February 8 09:46:51 UTC]
[09:46:51] + Working ...
thekraken: The Kraken 0.6 (compiled Sat Jan 28 02:00:51 CST 2012 by kevin@SM-H8QGi-F)
thekraken: Processor affinity wrapper for Folding@Home
thekraken: The Kraken comes with ABSOLUTELY NO WARRANTY; licensed under GPLv2
thekraken: PID: 3389
thekraken: Logging to thekraken.log
[09:46:51] 
[09:46:51] *------------------------------*
[09:46:51] Folding@Home Gromacs SMP Core
[09:46:51] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[09:46:51] 
[09:46:51] Preparing to commence simulation
[09:46:51] - Looking at optimizations...
[09:46:51] - Created dyn
[09:46:51] - Files status OK
[09:46:57] - Expanded 46502903 -> 71843392 (decompressed 62.1 percent)
[09:46:57] Called DecompressByteArray: compressed_data_size=46502903 data_size=71843392, decompressed_data_size=71843392 diff=0
[09:46:58] - Digital signature verified
[09:46:58] 
[09:46:58] Project: 6904 (Run 0, Clone 31, Gen 39)
[09:46:58] 
[09:46:58] Assembly optimizations on if available.
[09:46:58] Entering M.D.
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                            :-)  VERSION 4.5.3  (-:

        Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
      Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra, 
        Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff, 
           Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz, 
                Michael Shirts, Alfons Sijbers, Peter Tieleman,

               Berk Hess, David van der Spoel, and Erik Lindahl.

       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
            Copyright (c) 2001-2010, The GROMACS development team at
        Uppsala University & The Royal Institute of Technology, Sweden.
            check out http://www.gromacs.org for more information.


                               :-)  Gromacs  (-:

Reading file work/wudata_00.tpr, VERSION 4.5.4-dev-20110530-cc815 (single precision)
[09:47:07] Mapping NT from 48 to 48 
Starting 48 threads
Making 2D domain decomposition 8 x 6 x 1

WARNING: This run will generate roughly 7421 Mb of data

starting mdrun 'Overlay'
10000000 steps,  40000.0 ps.
[09:47:13] Completed 0 out of 10000000 steps  (0%)
[10:04:28]  M.D.
[10:04:34] Using Gromacs checkpoints
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                            :-)  VERSION 4.5.3  (-:

        Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
      Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra, 
        Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff, 
           Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz, 
                Michael Shirts, Alfons Sijbers, Peter Tieleman,

               Berk Hess, David van der Spoel, and Erik Lindahl.

       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
            Copyright (c) 2001-2010, The GROMACS development team at
        Uppsala University & The Royal Institute of Technology, Sweden.
            check out http://www.gromacs.org for more information.


                               :-)  Gromacs  (-:

[10:04:42] Mapping NT from 48 to 48 
Reading file work/wudata_00.tpr, VERSION 4.5.4-dev-20110530-cc815 (single precision)
Starting 48 threads

Reading checkpoint file work/wudata_00.cpt generated: Wed Feb  8 04:02:16 2012


Making 2D domain decomposition 8 x 6 x 1

WARNING: This run will generate roughly 7351 Mb of data

starting mdrun 'Overlay'
10000000 steps,  40000.0 ps (continuing from step 1790,      7.2 ps).
[10:07:05] Resuming from checkpoint
[10:07:11] Verified work/wudata_00.log
[10:07:11] Verified work/wudata_00.trr
[10:07:11] Verified work/wudata_00.xtc
[10:07:11] Verified work/wudata_00.edr
[10:07:12] Completed 1790 out of 10000000 steps  (0%)

NOTE: Turning on dynamic load balancing

Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: Merged problems with projects 6903/6904

Post by Grandpa_01 »

kasson wrote: We have identified ~30 WU's that have too many steps. These appear related to an event on Jan 16 where some incoming returns were not written properly (the return was credited, but some of the data wasn't written). We are manually re-running those WU's and will re-generate the new ones as soon as we can.
It looks like there may still be a couple of these out there. They were reported as showing up again yesterday over at the [H].
http://hardforum.com/showpost.php?p=103 ... stcount=22
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
Amaruk
Posts: 254
Joined: Fri Jun 20, 2008 3:57 am
Location: Watching from the Woods

Project: 6903 (Run 6, Clone 0, Gen 72)

Post by Amaruk »

Yep, they're definitely still being issued.

Code:

[07:04:04] Project: 6903 (Run 6, Clone 0, Gen 72)
[07:04:04] 
[07:04:04] Assembly optimizations on if available.
[07:04:04] Entering M.D.
[07:04:13] Mapping NT from 48 to 48 
[07:04:19] Completed 0 out of 500000 steps  (0%)
[07:14:00] - Autosending finished units... [February 9 07:14:00 UTC]
[07:14:00] Trying to send all finished work units
[07:14:00] + No unsent completed units remaining.
[07:14:00] - Autosend completed
[07:21:34] ng M.D.
[07:21:40] Using Gromacs checkpoints
[07:21:49] Mapping NT from 48 to 48 
[07:23:23] Resuming from checkpoint
[07:23:55] Verified work/wudata_03.log
[07:23:55] Verified work/wudata_03.trr
[07:23:55] Verified work/wudata_03.xtc
[07:23:55] Verified work/wudata_03.edr
[07:23:56] Completed 2360 out of 500000 steps  (0%)
What I want to know is, will this issue fix itself? That is, if I finish this WU, will the next one (P6903 R6 C0 G73) be normal sized?

If it is fixed then I have no problem finishing it, but if not I'd rather dump it now instead of passing it on.
KMac
Posts: 31
Joined: Thu Feb 17, 2011 6:50 pm

Re: Merged problems with projects 6903/6904

Post by KMac »

No, it will not fix itself. It will complete, realize the results file is too large to send, delete the result and restart the same unit.
You must delete the work folder and machinedata.dat to continue.
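The cleanup could be scripted along these lines. This is a hedged Python sketch, not an official procedure. It assumes the client has been stopped first, and that both the work folder (named `work`, as in the log paths above) and `machinedata.dat` sit directly in the client's launch directory. The function name `reset_client` and the `dry_run` switch are made up for the example. By default it only lists what it would remove.

```python
import shutil
from pathlib import Path


def reset_client(client_dir, dry_run=True):
    """Remove the work folder and machinedata.dat; return the paths affected.

    With dry_run=True (the default) nothing is deleted, the paths are only listed.
    """
    base = Path(client_dir)
    # Assumption: both items live directly under the client's launch directory.
    targets = [base / "work", base / "machinedata.dat"]
    affected = []
    for path in targets:
        if not path.exists():
            continue
        affected.append(path)
        if dry_run:
            continue
        if path.is_dir():
            shutil.rmtree(path)  # deletes the folder and everything in it
        else:
            path.unlink()
    return affected
```

Call `reset_client(your_client_dir)` first to see what would go, then repeat with `dry_run=False` once the list looks right. Any unfinished work in the folder is lost, so only do this for a unit you have already decided to dump.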