Project 2665 - Warning: pressure scaling more than 1%


DocJonz
Posts: 257
Joined: Thu Dec 06, 2007 6:31 pm
Hardware configuration: Folding with: 4x RTX 4070Ti, 1x RTX 4080 Super, 1x RTX 5070Ti
Location: United Kingdom

Post by DocJonz »

Two of my LinuxSMP clients stopped overnight (Project 2665 (Run 3, Clone 823, Gen 46) and Project 2665 (Run 3, Clone 986, Gen 44)). I restarted them, and the warning in the title was the resulting error message I received before each WU deleted itself and a new one started (see the logs below).

If such an error message is written into the code, would it not be a good idea to send the results back when this warning is detected and inform the WU's author, rather than just deleting the WU and starting a new one?

Code:

# SMP Client ##################################################################
###############################################################################

                       Folding@Home Client Version 6.02

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/williams/Folding@Home/LinuxSMP1
Executable: ./fah6
Arguments: -forceasm -smp -advmethods 

Warning:
 By using the -forceasm flag, you are overriding
 safeguards in the program. If you did not intend to
 do this, please restart the program without -forceasm.
 If work units are not completing fully (and particularly
 if your machine is overclocked), then please discontinue
 use of the flag.

[06:06:44] - Ask before connecting: No
[06:06:44] - User name: DocJonz (Team 35947)
[06:06:44] - User ID: 1380EE5D0603FD45
[06:06:44] - Machine ID: 1
[06:06:44] 
[06:06:44] Loaded queue successfully.
[06:06:44] 
[06:06:44] + Processing work unit
[06:06:44] Core required: FahCore_a1.exe
[06:06:44] Core found.
[06:06:44] Working on Unit 04 [September 2 06:06:44]
[06:06:44] + Working ...
[06:06:44] 
[06:06:44] *------------------------------*
[06:06:44] Folding@Home Gromacs SMP Core
[06:06:44] Version 1.74 (November 27, 2006)
[06:06:44] 
[06:06:44] Preparing to commence simulation
[06:06:44] - Ensuring status. Please wait.
[06:06:45] 
[06:06:46] Project: 2665 (Run 3, Clone 986, Gen 44)
[06:06:46] 
[06:06:46] Assembly optimizations on if available.
[06:06:46] Entering M.D.
[06:07:03]  on if available.
[06:07:03] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=PC3-AkasaLinux
NNODES=4, MYRANK=3, HOSTNAME=PC3-AkasaLinux
NNODES=4, MYRANK=2, HOSTNAME=PC3-AkasaLinux
NNODES=4, MYRANK=1, HOSTNAME=PC3-AkasaLinux
NODEID=2 argc=15
NODEID=3 argc=15
NODEID=0 argc=15
NODEID=1 argc=15
      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2004, The GROMACS development team,
            check out http://www.gromacs.org for more information.

        This inclusion of Gromacs code in the Folding@Home Core is under
        a special license (see http://folding.stanford.edu/gromacs.html)
         specially granted to Stanford by the copyright holders. If you
          are interested in using Gromacs, visit www.gromacs.org where
                you can download a free version of Gromacs under
         the terms of the GNU General Public License (GPL) as published
       by the Free Software Foundation; either version 2 of the License,
                     or (at your option) any later version.

(single precision)
[06:07:10] Protein: HGG in water
[06:07:10] Writing local files
starting mdrun 'HGG in water'
250000 steps,    500.0 ps.

[06:07:10] Completed 155864 out of 250000 steps  (62 CompletedExtra SSE boost OK.
[06:07:10] 0 steps  (62 percent)
[06:07:11] Extra SSE boost OK.

Step 155960  Warning: pressure scaling more than 1%, mu: 3572.44 3572.44 3572.44

Step 155960  Warning: pressure scaling more than 1%, mu: 3572.44 3572.44 3572.44

Step 155960  Warning: pressure scaling more than 1%, mu: 3572.44 3572.44 3572.44

Step 155960  Warning: pressure scaling more than 1%, mu: 3572.44 3572.44 3572.44
[06:07:45] Warning:  long 1-4 interactions
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[06:07:49] CoreStatus = 0 (0)
[06:07:49] Client-core communications error: ERROR 0x0
[06:07:49] Deleting current work unit & continuing...

Code:

# SMP Client ##################################################################
###############################################################################

                       Folding@Home Client Version 6.02

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/williams/Folding@Home/LinuxSMP1
Executable: ./fah6
Arguments: -local -forceasm -smp 

Warning:
 By using the -forceasm flag, you are overriding
 safeguards in the program. If you did not intend to
 do this, please restart the program without -forceasm.
 If work units are not completing fully (and particularly
 if your machine is overclocked), then please discontinue
 use of the flag.

[06:07:49] - Ask before connecting: No
[06:07:49] - User name: DocJonz (Team 35947)
[06:07:49] - User ID: 67F77E12465234F2
[06:07:49] - Machine ID: 1
[06:07:49] 
[06:07:49] Loaded queue successfully.
[06:07:49] 
[06:07:49] + Processing work unit
[06:07:49] Core required: FahCore_a1.exe
[06:07:49] Core found.
[06:07:49] Working on Unit 05 [September 2 06:07:49]
[06:07:49] + Working ...
[06:07:50] 
[06:07:50] *------------------------------*
[06:07:50] Folding@Home Gromacs SMP Core
[06:07:50] Version 1.74 (November 27, 2006)
[06:07:50] 
[06:07:50] Preparing to commence simulation
[06:07:50] - Ensuring status. Please wait.
[06:07:50] 
[06:07:51] Project: 2665 (Run 3, Clone 823, Gen 46)
[06:07:51] 
[06:07:51] Assembly optimizations on if available.
[06:07:51] Entering M.D.
[06:08:08]  on if available.
[06:08:08] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=PC5-AntecLinuxLR
NNODES=4, MYRANK=2, HOSTNAME=PC5-AntecLinuxLR
NNODES=4, MYRANK=1, HOSTNAME=PC5-AntecLinuxLR
NNODES=4, MYRANK=3, HOSTNAME=PC5-AntecLinuxLR
NODEID=1 argc=15
NODEID=0 argc=15
NODEID=2 argc=15
NODEID=3 argc=15
      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2004, The GROMACS development team,
            check out http://www.gromacs.org for more information.

        This inclusion of Gromacs code in the Folding@Home Core is under
        a special license (see http://folding.stanford.edu/gromacs.html)
         specially granted to Stanford by the copyright holders. If you
          are interested in using Gromacs, visit www.gromacs.org where
                you can download a free version of Gromacs under
         the terms of the GNU General Public License (GPL) as published
       by the Free Software Foundation; either version 2 of the License,
                     or (at your option) any later version.

(single precision)
[06:08:15] Protein: HGG in water
[06:08:15] Writing local files
[06:08:15] Protein: HGG in water
[06:08:15] Writing local files
starting mdrun 'HGG in water'
250000 steps,    500.0 ps.

[06:08:16] ercent)
[06:08:16] Extra SSE boost OK.
[06:08:16] 0 steps  (35 percent)
[06:08:16] Extra SSE boost OK.

Step 89580  Warning: pressure scaling more than 1%, mu: 975.497 975.497 975.497

Step 89580  Warning: pressure scaling more than 1%, mu: 975.497 975.497 975.497
[06:10:34] Warning:  long 1-4 interactions

Step 89580  Warning: pressure scaling more than 1%, mu: 975.497 975.497 975.497

Step 89580  Warning: pressure scaling more than 1%, mu: 975.497 975.497 975.497
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[06:10:39] CoreStatus = 0 (0)
[06:10:39] Client-core communications error: ERROR 0x0
[06:10:39] Deleting current work unit & continuing...
[0]0:Return code = 18
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[06:15:02] - Preparing to get new work unit...
[06:15:02] + Attempting to get work packet
[06:15:02] - Connecting to assignment server
[06:15:02] - Successful: assigned to (171.64.65.56).
[06:15:02] + News From Folding@Home: Welcome to Folding@Home
[06:15:02] Loaded queue successfully.
bruce
Posts: 20822
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 2665 - Warning: pressure scaling more than 1%

Post by bruce »

DocJonz wrote: If such an error message is written into the code, would it not be a good idea to send the results back on detection of this warning and inform the WU writer, rather than just deleting them and starting a new one?
Yes, it would. Error recovery in FahCore_a1 has always been poor, resulting in many WUs being deleted that should have been reported. That was noted soon after the SMP version entered beta and it's still a problem. In fact, that's one of the two top reasons why FahCore_a1 (and SMP) is still classified as beta. (Yes, it's a fine distinction where the client is considered Released, but the core that it's running isn't.)
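Until the core reports these failures itself, a donor can at least pull the relevant details out of the client log before posting. The following is only a rough sketch: it assumes the log lines look exactly like the ones quoted in this thread, and the function name `summarize_failure` is made up for illustration; it is not part of the client or the core.

```python
import re

# Patterns modelled on the log lines quoted in this thread (an assumption;
# other core versions may word these messages differently).
WARNING_RE = re.compile(
    r"Step\s+(\d+)\s+Warning: pressure scaling more than 1%, "
    r"mu:\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)"
)
SEGFAULT_RE = re.compile(
    r"^\[\d+\](\d+):Return code = \d+, signaled with Segmentation fault",
    re.MULTILINE,
)


def summarize_failure(log_text):
    """Collect the steps, largest mu value, and crashed ranks from a log."""
    steps = set()
    max_mu = 0.0
    for match in WARNING_RE.finditer(log_text):
        steps.add(int(match.group(1)))
        mu_values = [float(match.group(i)) for i in (2, 3, 4)]
        max_mu = max(max_mu, *mu_values)
    crashed_ranks = sorted(int(m.group(1)) for m in SEGFAULT_RE.finditer(log_text))
    return {
        "warning_steps": sorted(steps),
        "max_mu": max_mu,
        "crashed_ranks": crashed_ranks,
    }


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py path/to/logfile (path supplied by the user)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        print(summarize_failure(handle.read()))
```

Run against a saved copy of the log, this gives the step number and the crashed ranks in one place, which is the kind of information a WU report would need.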
VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: Project 2665 - Warning: pressure scaling more than 1%

Post by VijayPande »

I agree it would be important to get better reporting here. However, please note a couple of items. Our non-SMP clients do crash reporting. Here's the problem for SMP: SMP has to use MPI, which means that there's a program in between our client and the core (mpirun). When the core crashes, mpirun doesn't give any useful info and so our client can't know that something bad has happened.

We're looking into what we need to do to work around this, but that's the situation. This has been considered, and it is not trivial to fix. It's very much on our minds, and we hope to have it resolved ASAP.
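The general problem of a launcher hiding a crash from the process that started it can be reproduced on any POSIX system. The sketch below is an illustration only: the "launcher" is a made-up stand-in that always exits 0, not mpirun, and the "core" is just a Python child that kills itself with SIGSEGV. It makes no claim about how mpirun or the client actually handle exit statuses.

```python
import signal
import subprocess
import sys

# Stand-in for a crashing core: the child sends itself SIGSEGV.
CRASH = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

# Hypothetical launcher: runs the child passed as its first argument,
# ignores how the child ended, and always exits with status 0.
LAUNCHER = (
    "import subprocess, sys; "
    "subprocess.run([sys.executable, '-c', sys.argv[1]]); "
    "sys.exit(0)"
)


def run_direct():
    """Start the crashing child directly and return what the parent sees."""
    return subprocess.run([sys.executable, "-c", CRASH]).returncode


def run_via_launcher():
    """Start the same child through the launcher and return what the parent sees."""
    return subprocess.run([sys.executable, "-c", LAUNCHER, CRASH]).returncode


if __name__ == "__main__":
    # On POSIX, a negative return code means "killed by that signal number".
    print("direct:      ", run_direct())
    print("via launcher:", run_via_launcher())
```

Started directly, the parent sees a negative return code equal to the signal number, so it knows the child crashed. Started through the launcher, the parent sees only the launcher's own exit status, and the crash is invisible unless the launcher chooses to pass it on.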