
Project: 3062 (Run 1, Clone 37, Gen 36) Bad?

Posted: Wed Apr 16, 2008 6:19 pm
by road-runner

Another bad one I guess?

[07:58:31] Project: 3062 (Run 1, Clone 37, Gen 36)
[07:58:31] 
[07:58:31] Assembly optimizations on if available.
[07:58:31] Entering M.D.
[07:58:48]  on if available.
[07:58:48] Entering M.D.
NNODES=4, MYRANK=2, HOSTNAME=david-desktop
NNODES=4, MYRANK=3, HOSTNAME=david-desktop
NNODES=4, MYRANK=0, HOSTNAME=david-desktop
NNODES=4, MYRANK=1, HOSTNAME=david-desktop
NODEID=3 argc=15
NODEID=2 argc=15
NODEID=1 argc=15
NODEID=0 argc=15
      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2004, The GROMACS development team,
            check out http://www.gromacs.org for more information.

        This inclusion of Gromacs code in the Folding@Home Core is under
        a special license (see http://folding.stanford.edu/gromacs.html)
         specially granted to Stanford by the copyright holders. If you
          are interested in using Gromacs, visit www.gromacs.org where
                you can download a free version of Gromacs under
         the terms of the GNU General Public License (GPL) as published
       by the Free Software Foundation; either version 2 of the License,
                     or (at your option) any later version.

starting mdrun 'p3062_lambda5_99sb'
5000000 steps,  10000.0 ps.

[07:58:55] 062_lambda5_99sb
[07:58:55] Writing local files
[07:58:55] Extra SSE bExtra SSEWriting local files
[07:58:55] Completed 0 out of 5000000 steps  (0 percent)
[08:10:55] Writing local files
[08:10:55] Completed 50000 out of 5000000 steps  (1 percent)
[08:22:58] Writing local files
[08:22:58] Completed 100000 out of 5000000 steps  (2 percent)
[08:35:03] Writing local files
[08:35:03] Completed 150000 out of 5000000 steps  (3 percent)
[08:47:07] Writing local files
[08:47:07] Completed 200000 out of 5000000 steps  (4 percent)
[08:59:07] Writing local files
[08:59:07] Completed 250000 out of 5000000 steps  (5 percent)
[09:11:10] Writing local files
[09:11:10] Completed 300000 out of 5000000 steps  (6 percent)
[09:23:13] Writing local files
[09:23:13] Completed 350000 out of 5000000 steps  (7 percent)
[09:35:17] Writing local files
[09:35:17] Completed 400000 out of 5000000 steps  (8 percent)
[09:47:21] Writing local files
[09:47:21] Completed 450000 out of 5000000 steps  (9 percent)
[09:59:21] Writing local files
[09:59:21] Completed 500000 out of 5000000 steps  (10 percent)
[10:11:25] Writing local files
[10:11:25] Completed 550000 out of 5000000 steps  (11 percent)
[10:23:28] Writing local files
[10:23:28] Completed 600000 out of 5000000 steps  (12 percent)
[10:35:32] Writing local files
[10:35:32] Completed 650000 out of 5000000 steps  (13 percent)
[10:47:35] Writing local files
[10:47:35] Completed 700000 out of 5000000 steps  (14 percent)
[10:59:38] Writing local files
[10:59:38] Completed 750000 out of 5000000 steps  (15 percent)
[11:11:42] Writing local files
[11:11:43] Completed 800000 out of 5000000 steps  (16 percent)
[11:23:42] Writing local files
[11:23:42] Completed 850000 out of 5000000 steps  (17 percent)
[11:35:48] Writing local files
[11:35:48] Completed 900000 out of 5000000 steps  (18 percent)
[11:47:51] Writing local files
[11:47:51] Completed 950000 out of 5000000 steps  (19 percent)
[11:55:42] Warning:  long 1-4 interactions
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[11:55:46] CoreStatus = 0 (0)
[11:55:46] Client-core communications error: ERROR 0x0
[11:55:46] Deleting current work unit & continuing...
[0]0:Return code = 18
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[12:00:07] - Preparing to get new work unit...
[12:00:07] + Attempting to get work packet
[12:00:07] - Connecting to assignment server
[12:00:13] - Successful: assigned to (171.64.65.63).
[12:00:13] + News From Folding@Home: Welcome to Folding@Home
[12:00:13] Loaded queue successfully.
[12:00:41] + Closed connections
[12:00:46] 
[12:00:46] + Processing work unit
[12:00:46] Core required: FahCore_a1.exe
[12:00:46] Core found.
[12:00:46] Working on Unit 04 [April 16 12:00:46]
[12:00:46] + Working ...
[12:00:46] 
[12:00:46] *------------------------------*
[12:00:46] Folding@Home Gromacs SMP Core
[12:00:46] Version 1.74 (November 27, 2006)
[12:00:46] 
[12:00:46] Preparing to commence simulation
[12:00:46] - Ensuring status. Please wait.
[12:01:03] - Assembly optimizations manually forced on.
[12:01:03] - Not checking prior termination.
[12:01:03] - Expanded 608619 -> 3266045 (decompressed 536.6 percent)
[12:01:03] - Starting from initial work packet
[12:01:03] 
[12:01:03] Project: 3062 (Run 1, Clone 37, Gen 36)
[12:01:03] 
[12:01:03] Assembly optimizations on if available.
[12:01:03] Entering M.D.
NNODES=4, MYRANK=2, HOSTNAME=david-desktop
NNODES=4, MYRANK=1, HOSTNAME=david-desktop
NNODES=4, MYRANK=3, HOSTNAME=david-desktop
NNODES=4, MYRANK=0, HOSTNAME=david-desktop
NODEID=3 argc=15
NODEID=1 argc=15
NODEID=2 argc=15
NODEID=0 argc=15
      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2004, The GROMACS development team,
            check out http://www.gromacs.org for more information.

        This inclusion of Gromacs code in the Folding@Home Core is under
        a special license (see http://folding.stanford.edu/gromacs.html)
         specially granted to Stanford by the copyright holders. If you
          are interested in using Gromacs, visit www.gromacs.org where
                you can download a free version of Gromacs under
         the terms of the GNU General Public License (GPL) as published
       by the Free Software Foundation; either version 2 of the License,
                     or (at your option) any later version.

[12:01:09] Rejecting checkpoint
starting mdrun 'p3062_lambda5_99sb'
5000000 steps,  10000.0 ps.

[12:01:10] Protein: p3062_lambda5_99sbExtra SSE boost OK.
[12:01:10] 
[12:01:10] Extra SSE boost OK.
[12:01:10] Writing local files
[12:01:10] Completed 0 out of 5000000 steps  (0 percent)
[12:13:12] Writing local files
[12:13:12] Completed 50000 out of 5000000 steps  (1 percent)
[12:25:13] Writing local files
[12:25:13] Completed 100000 out of 5000000 steps  (2 percent)
[12:37:15] Writing local files
[12:37:15] Completed 150000 out of 5000000 steps  (3 percent)
[12:49:19] Writing local files
[12:49:19] Completed 200000 out of 5000000 steps  (4 percent)
[13:01:21] Writing local files
[13:01:21] Completed 250000 out of 5000000 steps  (5 percent)
[13:13:24] Writing local files
[13:13:24] Completed 300000 out of 5000000 steps  (6 percent)
[13:25:26] Writing local files
[13:25:26] Completed 350000 out of 5000000 steps  (7 percent)
[13:37:31] Writing local files
[13:37:31] Completed 400000 out of 5000000 steps  (8 percent)
[13:49:37] Writing local files
[13:49:37] Completed 450000 out of 5000000 steps  (9 percent)
[14:01:40] Writing local files
[14:01:40] Completed 500000 out of 5000000 steps  (10 percent)
[14:13:42] Writing local files
[14:13:42] Completed 550000 out of 5000000 steps  (11 percent)
[14:25:40] Writing local files
[14:25:40] Completed 600000 out of 5000000 steps  (12 percent)
[14:37:45] Writing local files
[14:37:45] Completed 650000 out of 5000000 steps  (13 percent)
[14:49:50] Writing local files
[14:49:50] Completed 700000 out of 5000000 steps  (14 percent)
[15:01:55] Writing local files
[15:01:55] Completed 750000 out of 5000000 steps  (15 percent)
[15:13:58] Writing local files
[15:13:58] Completed 800000 out of 5000000 steps  (16 percent)
[15:26:02] Writing local files
[15:26:02] Completed 850000 out of 5000000 steps  (17 percent)
[15:38:09] Writing local files
[15:38:09] Completed 900000 out of 5000000 steps  (18 percent)
[15:50:12] Writing local files
[15:50:12] Completed 950000 out of 5000000 steps  (19 percent)
[15:58:05] Warning:  long 1-4 interactions
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[15:58:09] CoreStatus = 0 (0)
[15:58:09] Client-core communications error: ERROR 0x0
[15:58:09] Deleting current work unit & continuing...
[0]0:Return code = 18
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[16:02:30] - Preparing to get new work unit...
[16:02:30] + Attempting to get work packet
[16:02:30] - Connecting to assignment server
[16:02:31] - Successful: assigned to (171.64.65.63).
[16:02:31] + News From Folding@Home: Welcome to Folding@Home
[16:02:31] Loaded queue successfully.
[16:02:33] + Closed connections
[16:02:38] 
[16:02:38] + Processing work unit
[16:02:38] Core required: FahCore_a1.exe
[16:02:38] Core found.
[16:02:38] Working on Unit 05 [April 16 16:02:38]
[16:02:38] + Working ...
[16:02:38] 
[16:02:38] *------------------------------*
[16:02:38] Folding@Home Gromacs SMP Core
[16:02:38] Version 1.74 (November 27, 2006)
[16:02:38] 
[16:02:38] Preparing to commence simulation
[16:02:38] - Ensuring status. Please wait.
[16:02:55] - Assembly optimizations manually forced on.
[16:02:55] - Not checking prior termination.
[16:02:55] - Expanded 608619 -> 3266045 (decompressed 536.6 percent)
[16:02:55] - Starting from initial work packet
[16:02:55] 
[16:02:55] Project: 3062 (Run 1, Clone 37, Gen 36)
[16:02:55] 
[16:02:55] Assembly optimizations on if available.
[16:02:55] Entering M.D.
NNODES=4, MYRANK=2, HOSTNAME=david-desktop
NNODES=4, MYRANK=3, HOSTNAME=david-desktop
NNODES=4, MYRANK=0, HOSTNAME=david-desktop
NNODES=4, MYRANK=1, HOSTNAME=david-desktop
NODEID=3 argc=15
NODEID=2 argc=15
NODEID=1 argc=15
NODEID=0 argc=15
      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2004, The GROMACS development team,
            check out http://www.gromacs.org for more information.

        This inclusion of Gromacs code in the Folding@Home Core is under
        a special license (see http://folding.stanford.edu/gromacs.html)
         specially granted to Stanford by the copyright holders. If you
          are interested in using Gromacs, visit www.gromacs.org where
                you can download a free version of Gromacs under
         the terms of the GNU General Public License (GPL) as published
       by the Free Software Foundation; either version 2 of the License,
                     or (at your option) any later version.

[16:03:01] Rejecting checkpoint
starting mdrun 'p3062_lambda5_99sb'
5000000 steps,  10000.0 ps.

[16:03:02] Protein: p3062_lambda5_99sbExtra SSE boost OK.
[16:03:02] 
[16:03:02] Extra SSE boost OK.
[16:03:02] Writing local files
[16:03:02] Completed 0 out of 5000000 steps  (0 percent)
[16:14:59] Writing local files
[16:14:59] Completed 50000 out of 5000000 steps  (1 percent)
[16:27:02] Writing local files
[16:27:02] Completed 100000 out of 5000000 steps  (2 percent)
[16:39:06] Writing local files
[16:39:06] Completed 150000 out of 5000000 steps  (3 percent)
[16:51:07] Writing local files
[16:51:07] Completed 200000 out of 5000000 steps  (4 percent)
[17:03:10] Writing local files
[17:03:10] Completed 250000 out of 5000000 steps  (5 percent)
[17:15:11] Writing local files
[17:15:11] Completed 300000 out of 5000000 steps  (6 percent)
[17:27:11] Writing local files
[17:27:11] Completed 350000 out of 5000000 steps  (7 percent)
[17:39:11] Writing local files
[17:39:11] Completed 400000 out of 5000000 steps  (8 percent)
[17:51:11] Writing local files
[17:51:11] Completed 450000 out of 5000000 steps  (9 percent)
[18:03:12] Writing local files
[18:03:12] Completed 500000 out of 5000000 steps  (10 percent)
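
A log like the one above can be summarised with a short script. The following is only a rough sketch, assuming the "Completed N out of M steps" and "Deleting current work unit" lines keep the format shown in this post; the function names are made up for illustration and are not part of any FAH tool. It reports how far each attempt got before the work unit was deleted, and the time between progress lines within a single run.

```python
import re

from datetime import datetime

# Matches progress lines such as:
# [11:47:51] Completed 950000 out of 5000000 steps  (19 percent)
PROGRESS = re.compile(r"^\[(\d\d:\d\d:\d\d)\] Completed (\d+) out of (\d+) steps")
DELETED = "Deleting current work unit"


def steps_at_each_failure(log_lines):
    """Return the last completed step count seen before each deleted work unit."""
    failures = []
    last_step = None
    for line in log_lines:
        match = PROGRESS.match(line)
        if match:
            last_step = int(match.group(2))
        elif DELETED in line:
            failures.append(last_step)
            last_step = None  # the next attempt starts over
    return failures


def seconds_between_frames(log_lines):
    """Return the gaps in seconds between consecutive progress lines of one run."""
    stamps = [
        datetime.strptime(match.group(1), "%H:%M:%S")
        for match in map(PROGRESS.match, log_lines)
        if match
    ]
    # The log has no dates, so a negative gap is treated as a midnight rollover.
    return [
        (later - earlier).total_seconds() % 86400
        for earlier, later in zip(stamps, stamps[1:])
    ]


sample = [
    "[11:35:48] Completed 900000 out of 5000000 steps  (18 percent)",
    "[11:47:51] Completed 950000 out of 5000000 steps  (19 percent)",
    "[11:55:42] Warning:  long 1-4 interactions",
    "[11:55:46] Deleting current work unit & continuing...",
    "[12:01:10] Completed 0 out of 5000000 steps  (0 percent)",
    "[15:50:12] Completed 950000 out of 5000000 steps  (19 percent)",
    "[15:58:09] Deleting current work unit & continuing...",
]

print(steps_at_each_failure(sample))       # [950000, 950000]
print(seconds_between_frames(sample[:2]))  # [723.0]
```

Both failed attempts in this log stop after the 950000-step line, roughly eight minutes into the next frame. A failure that repeats at the same point is the kind of detail worth including in a bad-WU report.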

Re: Project: 3062 (Run 1, Clone 37, Gen 36) Bad?

Posted: Wed Apr 16, 2008 6:47 pm
by 7im
Most likely, nothing in the stats from anyone.

Re: Project: 3062 (Run 1, Clone 37, Gen 36) Bad?

Posted: Wed Apr 16, 2008 6:55 pm
by 7im
IMO, Prime is lame. Try StressCPU v2.0, it pushes much harder, and is based on the same Gromacs code used by the fah client. Nothing better for stability testing to run fah ;) http://www.gromacs.org/component/option ... Itemid,26/

Re: Project: 3062 (Run 1, Clone 37, Gen 36) Bad?

Posted: Wed Apr 16, 2008 9:59 pm
by road-runner
My machines are stable and I know about memtest, OCCT, and prime 95. I was just reporting it. I will not report any more bad ones; I will just delete them...

Re: Project: 3062 (Run 1, Clone 37, Gen 36) Bad?

Posted: Wed Apr 16, 2008 10:16 pm
by Flathead74
road-runner wrote: My machines are stable and I know about memtest, OCCT, and prime 95. I was just reporting it. I will not report any more bad ones; I will just delete them...
road-runner, don't stop reporting bad WUs, just put them over here:
viewforum.php?f=19

That way we can more easily see if other folks are also having problems with a specific WU. :)


Thread Moved. -7im

Re: Project: 3062 (Run 1, Clone 37, Gen 36) Bad?

Posted: Thu Apr 17, 2008 3:13 pm
by MDCRL
hey all...

DL 3062 @ 08:57 this morning, folding on Phenom 9600 @ 2.3GHz BE/1G DDR2... all is going well so far w/ this WU, but...

I checked it in FahMon and in the unitinfo file and things are conflicting - wonder if it has anything to do w/ the other problems people are having.

unitinfo:

Current Work Unit
-----------------
Name: p3062_lambda5_99sb
Tag: P3062R5C98G15
Download time: April 17 08:57:11
Due time: April 20 20:57:11
Progress: 24% [||________]


FahMon:

DL: 1d 08h 12mn ago
Pref Deadline: 8h 12mn ago
Final Deadline: In 1d 15h 47mn


This is what the script looks like: are the parts in red correct or typos?

[08:57:08] - Preparing to get new work unit...
[08:57:08] + Attempting to get work packet
[08:57:08] - Connecting to assignment server
[08:57:08] - Successful: assigned to (171.64.65.63).
[08:57:08] + News From Folding@Home: Welcome to Folding@Home
[08:57:08] Loaded queue successfully.
[08:57:11] + Closed connections
[08:57:11]
[08:57:11] + Processing work unit
[08:57:11] Core required: FahCore_a1.exe
[08:57:11] Core found.
[08:57:11] Working on Unit 05 [April 17 08:57:11]
[08:57:11] + Working ...
[08:57:11]
[08:57:11] *------------------------------*
[08:57:11] Folding@Home Gromacs SMP Core
[08:57:11] Version 1.74 (March 10, 2007)
[08:57:11]
[08:57:11] Preparing to commence simulation
[08:57:11] - Ensuring status. Please wait.
[08:57:12] - Starting from initial work packet
[08:57:12]
[08:57:12] Project: 3062 (Run 5, Clone 98, Gen 15)
[08:57:12]
[08:57:12] Assembly optimizations on if available.
[08:57:12] Entering M.D.
[08:57:29] ial work pa- Starting from initial work packet (should this be initial work packet)
[08:57:29]
[08:57:29] Project: 3Entering M.D.
[08:57:29] one 98, Gen 15)
[08:57:29]
[08:57:29] Entering M.D.
[08:57:35] Rejecting checkpoint
[08:57:35] a SSE boost OK.
[08:57:35] ambda5_99sbExtra SSE boost OK. (should this be lambda)

[08:57:35]
[08:57:35] Extra SSE boost OK.


What would cause FahMon to read the wrong info?
- I have had this happen on other WUs, however those are showing much more skewed times
- so I just figured that was an issue w/ daylight saving or something, or just a bug in FahMon even after the new update

I'll let you know if I have any problems w/ it today - it should be finished in about 17 hrs....

Re: Project: 3062 (Run 1, Clone 37, Gen 36) Bad?

Posted: Thu Apr 24, 2008 5:03 pm
by MDCRL
BTW.... ran two of these incl. the WU from my last post.... except for the FahMon issues - both went through w/out any problems :D

Re: Project: 3062 (Run 1, Clone 37, Gen 36) Bad?

Posted: Thu Apr 24, 2008 11:48 pm
by bruce
Well, what I see in FAHlog.txt
[08:57:08] - Successful: assigned to (171.64.65.63).
[08:57:08] + News From Folding@Home: Welcome to Folding@Home
[08:57:08] Loaded queue successfully.
[08:57:11] + Closed connections
[08:57:11]
[08:57:11] + Processing work unit
[08:57:11] Core required: FahCore_a1.exe
[08:57:11] Core found.
[08:57:11] Working on Unit 05 [April 17 08:57:11]

matches what is shown in unitinfo
Download time: April 17 08:57:11

Now check the local time/date on your computer. Convert from your local timezone to UTC. Subtract the download time from "now": do you get DL: 1d 08h 12mn ago for the moment when you actually captured this data?
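
For anyone who wants to do that arithmetic without a calculator, here is a minimal sketch. It assumes the timestamps in FAHlog.txt and unitinfo are UTC and that the year is 2008 (taken from the post dates); the local clock reading, the UTC-4 offset, and the function names are made-up values for illustration only, and the output format just imitates the FahMon line quoted earlier.

```python
from datetime import datetime, timedelta, timezone


def age_of_download(download_utc, now_local):
    """Elapsed time since the download; now_local must be timezone-aware."""
    return now_local.astimezone(timezone.utc) - download_utc


def format_like_fahmon(delta):
    """Render a timedelta in the 'Xd HHh MMmn' style seen in the FahMon output."""
    total_minutes = int(delta.total_seconds() // 60)
    days, remainder = divmod(total_minutes, 1440)
    hours, minutes = divmod(remainder, 60)
    return f"{days}d {hours:02d}h {minutes:02d}mn"


# Download time from unitinfo, assumed to be UTC.
download = datetime(2008, 4, 17, 8, 57, 11, tzinfo=timezone.utc)

# Hypothetical local clock reading in a UTC-4 timezone.
local_zone = timezone(timedelta(hours=-4))
correct_clock = datetime(2008, 4, 17, 11, 13, 0, tzinfo=local_zone)
wrong_date = datetime(2008, 4, 18, 11, 13, 0, tzinfo=local_zone)

print(format_like_fahmon(age_of_download(download, correct_clock)))  # 0d 06h 15mn
print(format_like_fahmon(age_of_download(download, wrong_date)))     # 1d 06h 15mn
```

If the figure computed from the real local clock matches what FahMon reports, FahMon is reading the clock correctly and the clock or timezone setting itself is the thing to check.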

If FahMon is running on a different computer than the WU is running on, you need to check that both clocks are synchronized and that both timezones are set correctly. In any case, if you find a problem with the local date/time, be VERY careful when you change your clock because you can create a condition where the WU expires. There's a setting in FAH's Advanced Config to ignore the local clock which will allow you to change the clock without the current WU expiring.

If that doesn't resolve it, check the settings on the Monitoring tab of the FahMon Preferences.

Re: Project: 3062 (Run 1, Clone 37, Gen 36) Bad?

Posted: Fri Apr 25, 2008 5:15 am
by 7im
MDCRL wrote: This is what the script looks like: are the parts in red correct or typos?
Not typos, no; a bug, yes. Jumbled text in the log is one of the Known Bugs. The list of bugs is in a Sticky Post at the top of the Windows SMP section of the forum.

Re: Project: 3062 (Run 1, Clone 37, Gen 36) Bad?

Posted: Mon Apr 28, 2008 6:23 pm
by MDCRL
Thanks 7im...

That was the issue - local time buggered for some reason.... All better now :mrgreen:

I usually do run client w/ ignore local time..... now I know why that is suggested

- & once again.....
We appreciate all the time spent helping to chase down petty gremlins,
despite all the larger issues already taking up much of your time

Have a great week and continued good luck with all those other gremlins you all are working on.