P6900 R60, C4, G17

Posted: Thu Aug 18, 2011 8:57 pm
by Stewart1
Multiple failures. Immediately after the work unit starts, the console prints this message:

Code:

step 4250000: Water molecule starting at atom 821562 can not be settled.
Check for bad contacts and/or reduce the timestep if appropriate.
It then reports CoreStatus = 0 (0) and quits.

Code:

# Linux SMP Console Edition ###################################################
###############################################################################

                       Folding@Home Client Version 6.34

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /usr/local/fah
Executable: ./fah6
Arguments: -bigadv -smp 8 

[20:39:08] - Ask before connecting: No
[20:39:08] - User name: Stewart1 (Team 163049)
[20:39:08] - User ID: 230E7B3A3AD117E1
[20:39:08] - Machine ID: 1
[20:39:08] 
[20:39:08] Loaded queue successfully.
[20:39:08] 
[20:39:08] + Processing work unit
[20:39:08] Core required: FahCore_a5.exe
[20:39:08] Core found.
[20:39:08] Working on queue slot 02 [August 18 20:39:08 UTC]
[20:39:08] + Working ...
[20:39:08] 
[20:39:08] *------------------------------*
[20:39:08] Folding@Home Gromacs SMP Core
[20:39:08] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[20:39:08] 
[20:39:08] Preparing to commence simulation
[20:39:08] - Looking at optimizations...
[20:39:08] - Created dyn
[20:39:08] - Files status OK
[20:39:08] Error: Missing work file=<>
[20:39:08] 
[20:39:08] Folding@home Core Shutdown: MISSING_WORK_FILES
[20:39:08] CoreStatus = 74 (116)
[20:39:08] The core could not find the work files specified. Removing from queue
[20:39:08] Deleting current work unit & continuing...
[20:39:08] - Preparing to get new work unit...
[20:39:08] Cleaning up work directory
[20:39:08] + Attempting to get work packet
[20:39:08] Passkey found
[20:39:08] - Connecting to assignment server
[20:39:09] - Successful: assigned to (130.237.232.141).
[20:39:09] + News From Folding@Home: Welcome to Folding@Home
[20:39:10] Loaded queue successfully.
[20:42:04] + Closed connections
[20:42:09] 
[20:42:09] + Processing work unit
[20:42:09] Core required: FahCore_a5.exe
[20:42:09] Core found.
[20:42:09] Working on queue slot 03 [August 18 20:42:09 UTC]
[20:42:09] + Working ...
[20:42:09] 
[20:42:09] *------------------------------*
[20:42:09] Folding@Home Gromacs SMP Core
[20:42:09] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[20:42:09] 
[20:42:09] Preparing to commence simulation
[20:42:09] - Looking at optimizations...
[20:42:09] - Created dyn
[20:42:09] - Files status OK
[20:42:12] - Expanded 24863108 -> 30796292 (decompressed 123.8 percent)
[20:42:12] Called DecompressByteArray: compressed_data_size=24863108 data_size=30796292, decompressed_data_size=30796292 diff=0
[20:42:12] - Digital signature verified
[20:42:12] 
[20:42:12] Project: 6900 (Run 60, Clone 4, Gen 17)
[20:42:12] 
[20:42:12] Assembly optimizations on if available.
[20:42:12] Entering M.D.
[20:42:19] Mapping NT from 8 to 8 
[20:42:27] Completed 0 out of 250000 steps  (0%)
[20:42:31] CoreStatus = 0 (0)
[20:42:31] Sending work to server
[20:42:31] Project: 6900 (Run 60, Clone 4, Gen 17)
[20:42:31] - Error: Could not get length of results file work/wuresults_03.dat
[20:42:31] - Error: Could not read unit 03 file. Removing from queue.
[20:42:31] - Preparing to get new work unit...
[20:42:31] Cleaning up work directory
[20:42:31] + Attempting to get work packet
[20:42:31] Passkey found
[20:42:31] - Connecting to assignment server
[20:42:31] - Successful: assigned to (130.237.232.141).
[20:42:31] + News From Folding@Home: Welcome to Folding@Home
[20:42:32] Loaded queue successfully.
[20:45:45] + Closed connections
[20:45:50] 
[20:45:50] + Processing work unit
[20:45:50] Core required: FahCore_a5.exe
[20:45:50] Core found.
[20:45:50] Working on queue slot 04 [August 18 20:45:50 UTC]
[20:45:50] + Working ...
[20:45:50] 
[20:45:50] *------------------------------*
[20:45:50] Folding@Home Gromacs SMP Core
[20:45:50] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[20:45:50] 
[20:45:50] Preparing to commence simulation
[20:45:50] - Ensuring status. Please wait.
[20:45:59] - Looking at optimizations...
[20:45:59] - Working with standard loops on this execution.
[20:45:59] - Created dyn
[20:45:59] - Files status OK
[20:46:01] - Expanded 24863108 -> 30796292 (decompressed 123.8 percent)
[20:46:01] Called DecompressByteArray: compressed_data_size=24863108 data_size=30796292, decompressed_data_size=30796292 diff=0
[20:46:02] - Digital signature verified
[20:46:02] 
[20:46:02] Project: 6900 (Run 60, Clone 4, Gen 17)
[20:46:02] 
[20:46:02] Entering M.D.
[20:46:09] Mapping NT from 8 to 8 
[20:46:13] Completed 0 out of 250000 steps  (0%)
[20:46:16] CoreStatus = 0 (0)
[20:46:17] Sending work to server
[20:46:17] Project: 6900 (Run 60, Clone 4, Gen 17)
[20:46:17] - Error: Could not get length of results file work/wuresults_04.dat
[20:46:17] - Error: Could not read unit 04 file. Removing from queue.
[20:46:17] - Preparing to get new work unit...
[20:46:17] Cleaning up work directory
[20:46:17] + Attempting to get work packet
[20:46:17] Passkey found
[20:46:17] - Connecting to assignment server
[20:46:17] - Successful: assigned to (130.237.232.141).
[20:46:17] + News From Folding@Home: Welcome to Folding@Home
[20:46:17] Loaded queue successfully.
[20:49:30] + Closed connections
[20:49:35] 
[20:49:35] + Processing work unit
[20:49:35] Core required: FahCore_a5.exe
[20:49:35] Core found.
[20:49:35] Working on queue slot 05 [August 18 20:49:35 UTC]
[20:49:35] + Working ...
[20:49:35] 
[20:49:35] *------------------------------*
[20:49:35] Folding@Home Gromacs SMP Core
[20:49:35] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[20:49:35] 
[20:49:35] Preparing to commence simulation
[20:49:35] - Ensuring status. Please wait.
[20:49:45] - Looking at optimizations...
[20:49:45] - Working with standard loops on this execution.
[20:49:45] - Created dyn
[20:49:45] - Files status OK
[20:49:47] - Expanded 24863108 -> 30796292 (decompressed 123.8 percent)
[20:49:47] Called DecompressByteArray: compressed_data_size=24863108 data_size=30796292, decompressed_data_size=30796292 diff=0
[20:49:47] - Digital signature verified
[20:49:47] 
[20:49:47] Project: 6900 (Run 60, Clone 4, Gen 17)
[20:49:47] 
[20:49:47] Entering M.D.
[20:49:54] Mapping NT from 8 to 8 
[20:49:58] Completed 0 out of 250000 steps  (0%)
[20:50:01] CoreStatus = 0 (0)
[20:50:01] Sending work to server
[20:50:01] Project: 6900 (Run 60, Clone 4, Gen 17)
[20:50:01] - Error: Could not get length of results file work/wuresults_05.dat
[20:50:01] - Error: Could not read unit 05 file. Removing from queue.
[20:50:01] - Preparing to get new work unit...
[20:50:01] Cleaning up work directory
[20:50:01] + Attempting to get work packet
[20:50:01] Passkey found
[20:50:01] - Connecting to assignment server
[20:50:02] - Successful: assigned to (171.67.108.22).
[20:50:02] + News From Folding@Home: Welcome to Folding@Home
[20:50:02] Loaded queue successfully.

Folding@Home Client Shutdown.

Re: P6900 R60, C4, G17

Posted: Sat Aug 20, 2011 3:46 pm
by naapi
Exactly the same problem here. The client downloads the WU multiple times, then proceeds to download a new core (always A5). After five core downloads it prints a statement to the effect of "5 core download fails, going to sleep for 24 hours".

4 x Opteron 6128, 16 GB RAM, Ubuntu 11.04, Kraken, Langouste, 10% overclocking as per tear's SuperMicro G34 modification. No problems whatsoever with other WUs.

Code:

[15:10:47] Completed 247502 out of 250002 steps  (99%)
[15:20:53] Completed 250002 out of 250002 steps  (100%)

Writing final coordinates.

Average load imbalance: 8.4 %
Part of the total run time spent waiting due to load imbalance: 3.8 %


Parallel run - timing based on wallclock.

              NODE (s)  Real (s)      (%)
      Time:  11352.345  11352.345    100.0
                      3h09:12
              (Mnbf/s)  (GFlops)  (ns/day)  (hour/ns)
Performance:  1620.480    85.582      1.430    16.782

Thanx for Using GROMACS - Have a Nice Day

[15:21:06] DynamicWrapper: Finished Work Unit: sleep=10000
[15:21:16]
[15:21:16] Finished Work Unit:
[15:21:16] - Reading up to 56084448 from "work/wudata_05.trr": Read 56084448
[15:21:16] trr file hash check passed.
[15:21:16] - Reading up to 45565864 from "work/wudata_05.xtc": Read 45565864
[15:21:17] xtc file hash check passed.
[15:21:17] edr file hash check passed.
[15:21:17] logfile size: 196584
[15:21:17] Leaving Run
[15:21:22] - Writing 102017244 bytes of core data to disk...
[15:21:23]  ... Done.
[15:21:46] - Shutting down core
[15:21:46]
[15:21:46] Folding@home Core Shutdown: FINISHED_UNIT
[15:21:49] CoreStatus = 64 (100)
[15:21:49] Unit 5 finished with 87 percent of time to deadline remaining.
[15:21:49] Updated performance fraction: 0.915243
[15:21:49] Sending work to server
[15:21:49] Project: 2689 (Run 5, Clone 9, Gen 132)


(...) trying to send finished WU
(...) trying to send finished WU 2
(...) trying to send finished WU 3

[15:21:49] - Preparing to get new work unit...
[15:21:49] Cleaning up work directory
[15:21:49] + Attempting to get work packet
[15:21:49] Passkey found
[15:21:49] - Will indicate memory of 16075 MB
[15:21:49] - Connecting to assignment server
[15:21:49] Connecting to http://assign.stanford.edu:8080/
[15:21:50] Posted data.
[15:21:50] Initial: ED82; - Successful: assigned to (130.237.232.141).
[15:21:50] + News From Folding@Home: Welcome to Folding@Home
[15:21:50] Loaded queue successfully.
[15:21:50] Sent data
[15:21:50] Connecting to http://130.237.232.141:8080/
[15:21:56] Posted data.
[15:21:58] Initial: 0000; - Receiving payload (expected size: 24863620)
[15:22:23] - Downloaded at ~971 kB/s
[15:22:23] - Averaged speed for that direction ~927 kB/s
[15:22:23] + Received work.
[15:22:23] Trying to send all finished work units
[15:22:23] Project: 2689 (Run 5, Clone 9, Gen 132)

(...) trying to send finished WU 4
(...) trying to send finished WU 5

[15:22:24] + Processing work unit
[15:22:24] Core required: FahCore_a5.exe
[15:22:24] Core found.
[15:22:24] Working on queue slot 06 [August 20 15:22:24 UTC]
[15:22:24] + Working ...
[15:22:24] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 06 -np 32 -checkpoint 30 -verbose -lifeline 7251 -version 634'

thekraken: The Kraken 0.2
thekraken: Processor affinity wrapper for Folding@Home
thekraken: The Kraken comes with ABSOLUTELY NO WARRANTY; licensed under GPLv2
thekraken: Logging to thekraken.log
[15:22:24]
[15:22:24] *------------------------------*
[15:22:24] Folding@Home Gromacs SMP Core
[15:22:24] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[15:22:24]
[15:22:24] Preparing to commence simulation
[15:22:24] - Looking at optimizations...
[15:22:24] - Created dyn
[15:22:24] - Files status OK
[15:22:27] - Expanded 24863108 -> 30796292 (decompressed 123.8 percent)
[15:22:27] Called DecompressByteArray: compressed_data_size=24863108 data_size=30796292, decompressed_data_size=30796292 diff=0
[15:22:27] - Digital signature verified
[15:22:27]
[15:22:27] Project: 6900 (Run 60, Clone 4, Gen 17)
[15:22:27]
[15:22:27] Assembly optimizations on if available.
[15:22:27] Entering M.D.
                        :-)  G  R  O  M  A  C  S  (-:

                  Groningen Machine for Chemical Simulation

                            :-)  VERSION 4.5.3  (-:

        Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
      Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra,
        Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff,
          Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz,
                Michael Shirts, Alfons Sijbers, Peter Tieleman,

              Berk Hess, David van der Spoel, and Erik Lindahl.

      Copyright (c) 1991-2000, University of Groningen, The Netherlands.
            Copyright (c) 2001-2010, The GROMACS development team at
        Uppsala University & The Royal Institute of Technology, Sweden.
            check out http://www.gromacs.org for more information.


                              :-)  Gromacs  (-:

Reading file work/wudata_06.tpr, VERSION 4.0.99_development_20090605 (single precision)
[15:22:35] Mapping NT from 32 to 32
Note: tpx file_version 70, software version 73
Starting 32 threads
Making 2D domain decomposition 8 x 4 x 1
starting mdrun 'SINGLE VESICLE in water'
4500000 steps,  18000.0 ps (continuing from step 4250000,  17000.0 ps).

step 4250000: Water molecule starting at atom 821562 can not be settled.
Check for bad contacts and/or reduce the timestep if appropriate.
[15:22:38] Completed 0 out of 250000 steps  (0%)

step 4250001: Water molecule starting at atom 821562 can not be settled.
Check for bad contacts and/or reduce the timestep if appropriate.
Segmentation fault
[15:22:40] CoreStatus = 8B (139)
[15:22:40] Client-core communications error: ERROR 0x8b
[15:22:40] Deleting current work unit & continuing...
thekraken: The Kraken 0.2
thekraken: Processor affinity wrapper for Folding@Home
thekraken: The Kraken comes with ABSOLUTELY NO WARRANTY; licensed under GPLv2
thekraken: Logging to thekraken.log
[15:22:58] Trying to send all finished work units
[15:22:58] Project: 2689 (Run 5, Clone 9, Gen 132)
(...) trying to send finished WU 6
(...) trying to send finished WU 7

[15:22:58] - Preparing to get new work unit...
[15:22:58] Cleaning up work directory
[15:22:58] + Attempting to get work packet
[15:22:58] Passkey found
[15:22:58] - Will indicate memory of 16075 MB
[15:22:58] - Connecting to assignment server
[15:22:58] Connecting to http://assign.stanford.edu:8080/
[15:23:04] Posted data.
[15:23:04] Initial: ED82; - Successful: assigned to (130.237.232.141).
[15:23:04] + News From Folding@Home: Welcome to Folding@Home
[15:23:04] Loaded queue successfully.
[15:23:04] Sent data
[15:23:04] Connecting to http://130.237.232.141:8080/
[15:23:14] Posted data.
[15:23:14] Initial: 0000; - Receiving payload (expected size: 24863620)
[15:23:40] - Downloaded at ~933 kB/s
[15:23:40] - Averaged speed for that direction ~928 kB/s
[15:23:40] + Received work.
[15:23:40] + Closed connections
^C[15:23:43] ***** Got an Activate signal (2)
[15:23:44] Killing all core threads
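
For anyone who wants to check their own log for the same pattern (the same Project/Run/Clone/Gen being assigned and failing repeatedly), here is a rough Python sketch. It assumes the log lines look like the ones quoted in this thread. The function name `count_failures` is made up for illustration, and treating every CoreStatus other than 64 (100) as a failure is my own simplification, based only on that value being logged together with FINISHED_UNIT above.

```python
import re
from collections import Counter

# Patterns modelled on the v6 client log lines quoted in this thread.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")

# Assumption: "CoreStatus = 64 (100)" is the only successful exit,
# since it is logged next to FINISHED_UNIT in the log above.
FINISHED_UNIT = 0x64


def count_failures(lines):
    """Count core exits other than FINISHED_UNIT per (project, run, clone, gen)."""
    failures = Counter()
    current = None  # unit announced since the last "+ Processing work unit"
    for line in lines:
        if "+ Processing work unit" in line:
            # A new unit is starting; forget any unit named earlier.
            current = None
            continue
        match = PROJECT_RE.search(line)
        if match:
            current = tuple(int(group) for group in match.groups())
            continue
        match = STATUS_RE.search(line)
        if match and current is not None:
            # The first field of CoreStatus is hexadecimal, e.g. 8B == 139.
            if int(match.group(1), 16) != FINISHED_UNIT:
                failures[current] += 1
            current = None
    return failures
```

It can be run over a saved log with something like `count_failures(open("FAHlog.txt"))`, where the file name is whatever your client writes its log to. A unit that shows up with a count of 2 or more is probably a bad WU rather than a problem with your machine.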