Merged problems with projects 6903/6904, Part 2

Moderators: Site Moderators, FAHC Science Team

Nathan_P
Posts: 1164
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Merged problems with projects 6903/6904, Part 2

Post by Nathan_P »

Edit by Mod: Since this problem existed previously and was apparently fixed ... for a couple of months ... but it's back, I'm starting a new topic.

I have received another of those 512-byte -bigadv units. Fortunately, after 20 minutes the client deleted it and downloaded a new WU.

According to HFM, the affected WU is the one in the title.

Code: Select all

[15:37:59] + Results successfully sent
[15:37:59] Thank you for your contribution to Folding@Home.
[15:37:59] + Number of Units Completed: 50

[15:38:03] - Preparing to get new work unit...
[15:38:03] Cleaning up work directory
[15:38:03] + Attempting to get work packet
[15:38:03] Passkey found
[15:38:03] - Connecting to assignment server
[15:38:04] - Successful: assigned to (130.237.232.237).
[15:38:04] + News From Folding@Home: Welcome to Folding@Home
[15:38:04] Loaded queue successfully.
[15:38:04] + Closed connections
[15:38:04] 
[15:38:04] + Processing work unit
[15:38:04] Core required: FahCore_a5.exe
[15:38:04] Core found.
[15:38:04] Working on queue slot 02 [April 25 15:38:04 UTC]
[15:38:04] + Working ...
[15:38:05] 
[15:38:05] *------------------------------*
[15:38:05] Folding@Home Gromacs SMP Core
[15:38:05] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[15:38:05] 
[15:38:05] Preparing to commence simulation
[15:38:05] - Looking at optimizations...
[15:38:05] - Created dyn
[15:38:05] - Files status OK
[15:38:05] Couldn't Decompress
[15:38:05] Called DecompressByteArray: compressed_data_size=0 data_size=0, decompressed_data_size=0 diff=0
[15:38:05] -Error: Couldn't update checksum variables
[15:38:05] Error: Could not open work file
[15:38:05] 
[15:38:05] Folding@home Core Shutdown: FILE_IO_ERROR
[15:38:05] CoreStatus = 75 (117)
[15:38:05] Error opening or reading from a file.
[15:38:05] Deleting current work unit & continuing...
[15:38:05] - Preparing to get new work unit...
[15:38:05] Cleaning up work directory
[15:38:08] + Attempting to get work packet
[15:38:08] Passkey found
[15:38:08] - Connecting to assignment server
[15:38:08] - Successful: assigned to (130.237.232.237).
[15:38:08] + News From Folding@Home: Welcome to Folding@Home
[15:38:09] Loaded queue successfully.
[15:38:09] + Closed connections
[15:38:14] 
[15:38:14] + Processing work unit
[15:38:14] Core required: FahCore_a5.exe
[15:38:14] Core found.
[15:38:14] Working on queue slot 03 [April 25 15:38:14 UTC]
[15:38:14] + Working ...
[15:38:14] 
[15:38:14] *------------------------------*
[15:38:14] Folding@Home Gromacs SMP Core
[15:38:14] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[15:38:14] 
[15:38:14] Preparing to commence simulation
[15:38:14] - Looking at optimizations...
[15:38:14] - Created dyn
[15:38:14] - Files status OK
[15:38:14] Couldn't Decompress
[15:38:14] Called DecompressByteArray: compressed_data_size=0 data_size=0, decompressed_data_size=0 diff=0
[15:38:14] -Error: Couldn't update checksum variables
[15:38:14] Error: Could not open work file
[15:38:14] 
[15:38:14] Folding@home Core Shutdown: FILE_IO_ERROR
[15:38:14] CoreStatus = 75 (117)
[15:38:14] Error opening or reading from a file.
[15:38:14] Deleting current work unit & continuing...
[15:38:14] - Preparing to get new work unit...
[15:38:14] Cleaning up work directory
[15:38:17] + Attempting to get work packet
[15:38:17] Passkey found
[15:38:17] - Connecting to assignment server
[15:38:18] - Successful: assigned to (130.237.232.237).
[15:38:18] + News From Folding@Home: Welcome to Folding@Home
[15:38:18] Loaded queue successfully.
[15:38:18] + Closed connections
[15:38:23] 
[15:38:23] + Processing work unit
[15:38:23] Core required: FahCore_a5.exe
[15:38:23] Core found.
[15:38:23] Working on queue slot 04 [April 25 15:38:23 UTC]
[15:38:23] + Working ...
[15:38:24] 
[15:38:24] *------------------------------*
[15:38:24] Folding@Home Gromacs SMP Core
[15:38:24] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[15:38:24] 
[15:38:24] Preparing to commence simulation
[15:38:24] - Looking at optimizations...
[15:38:24] - Created dyn
[15:38:24] - Files status OK
[15:38:24] Couldn't Decompress
[15:38:24] Called DecompressByteArray: compressed_data_size=0 data_size=0, decompressed_data_size=0 diff=0
[15:38:24] -Error: Couldn't update checksum variables
[15:38:24] Error: Could not open work file
[15:38:24] 
[15:38:24] Folding@home Core Shutdown: FILE_IO_ERROR
[15:38:24] CoreStatus = 75 (117)
[15:38:24] Error opening or reading from a file.
[15:38:24] Deleting current work unit & continuing...
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: P6903 (R10,C3,G64)

Post by bruce »

There's no record in the Mod DB of Project: 6903 (Run 10,Clone 3,Gen 64) being returned yet, but that's what I'd expect for a WU that was deleted. We'll have to wait for somebody else to either complete it or report it.
Leonardo
Posts: 260
Joined: Tue Dec 04, 2007 5:09 am
Hardware configuration: GPU slots on home-built, purpose-built PCs.
Location: Eagle River, Alaska

Project: 6904 (Run 2, Clone 19, Gen 84) - will not Fold

Post by Leonardo »

Project: 6904 (Run 2, Clone 19, Gen 84)
Will not fold, as in will not process after downloading:

Code: Select all

[05:39:49] - Autosending finished units... [May 22 05:39:49 UTC]
[05:39:49] + Processing work unit
[05:39:49] Trying to send all finished work units
[05:39:49] Core required: FahCore_a5.exe
[05:39:49] + No unsent completed units remaining.
[05:39:49] - Autosend completed
[05:39:49] Core found.
[05:39:49] Working on queue slot 09 [May 22 05:39:49 UTC]
[05:39:49] + Working ...
[05:39:49] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 09 -np 32 -checkpoint 30 -forceasm -verbose -lifeline 2219 -version 634'

[05:39:49] 
[05:39:49] *------------------------------*
[05:39:49] Folding@Home Gromacs SMP Core
[05:39:49] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[05:39:49] 
[05:39:49] Preparing to commence simulation
[05:39:49] - Ensuring status. Please wait.
[05:39:59] - Assembly optimizations manually forced on.
[05:39:59] - Not checking prior termination.
[05:40:04] - Expanded 45658952 -> 70963200 (decompressed 61.3 percent)
[05:40:04] Called DecompressByteArray: compressed_data_size=45658952 data_size=70963200, decompressed_data_size=70963200 diff=0
[05:40:05] - Digital signature verified
[05:40:05] 
[05:40:05] Project: 6904 (Run 2, Clone 19, Gen 84)
[05:40:05] 
[05:40:05] Assembly optimizations on if available.
[05:40:05] Entering M.D.
[05:40:14] Mapping NT from 32 to 32 
The status was unchanged after an hour. This was the second attempt to engage this work unit. The first time it sat at "Mapping NT from 32 to 32" for half an hour. I stopped the client (properly), shut down the machine, and rebooted. The results are in the log excerpt above. I gave up and deleted the work unit, queue, and machinedependent.dat and changed the machine ID number. Upon restarting the client, it promptly downloaded a 6903 and is folding that presently without issue.

Bad work unit? This machine has sliced through everything thrown at it for nearly half a year. I've never had a WU fail to engage before.

System specifications, if that provides any usable context:
G34 4-socket, 32 cores
Ubuntu 10.10
all core temps very good
stable power supply on beefy UPS
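
The reset described above (deleting the work unit, the queue, and machinedependent.dat) can be sketched in Python. This is only a sketch under assumptions: the names `work`, `queue.dat`, and `machinedependent.dat` are my guess at a v6 client directory layout, and `reset_client_state` is a made-up helper name. Stop the client first and check your own install before running it for real. It does not change the machine ID, which still has to be done through the client's configuration.

```python
import shutil
from pathlib import Path

# Assumed names inside a v6 client directory; adjust to match your install.
STATE_ITEMS = ("work", "queue.dat", "machinedependent.dat")


def reset_client_state(client_dir, dry_run=True):
    """Remove the work directory, queue, and machinedependent.dat.

    Returns the paths that were (or, in a dry run, would be) removed.
    Stop the client before calling this with dry_run=False.
    """
    removed = []
    for name in STATE_ITEMS:
        path = Path(client_dir) / name
        if not path.exists():
            continue
        removed.append(path)
        if dry_run:
            continue
        if path.is_dir():
            shutil.rmtree(path)  # the work directory and everything in it
        else:
            path.unlink()
    return removed
```

Calling `reset_client_state("/path/to/client")` with the default `dry_run=True` only lists what would be deleted, which is a cheap way to check the path before committing.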
autogrog
Posts: 38
Joined: Mon Aug 18, 2008 3:38 pm
Location: Halifax, Nova Scotia

Re: Project: 6904 (Run 2, Clone 19, Gen 84) - will not Fold

Post by autogrog »

I too have received half a dozen, and all of them failed after taking an hour to download on a connection that runs at 1500 kB/sec. They all failed with a "can't get length of work file" message.
The server stats page shows blanks on the line for this server as well.
Leonardo
Posts: 260
Joined: Tue Dec 04, 2007 5:09 am
Hardware configuration: GPU slots on home-built, purpose-built PCs.
Location: Eagle River, Alaska

Re: Project: 6904 (Run 2, Clone 19, Gen 84) - will not Fold

Post by Leonardo »

Autogrog, "half a dozen" of the same work unit that I specified?
autogrog
Posts: 38
Joined: Mon Aug 18, 2008 3:38 pm
Location: Halifax, Nova Scotia

Re: Project: 6904 (Run 2, Clone 19, Gen 84) - will not Fold

Post by autogrog »

Yes, the exact same WU would download, run for a while (using only 1 of the 24 real cores), then fail and download itself again.
Now, on a different system (i7 980), I see the same repeating problem with a different 6904 WU:

Code: Select all

[08:14:31] Passkey found
[08:14:31] - Will indicate memory of 5977 MB
[08:14:31] - Connecting to assignment server
[08:14:31] Connecting to http://assign.stanford.edu:8080/
[08:14:32] Posted data.
[08:14:32] Initial: ED82; - Successful: assigned to (130.237.232.237).
[08:14:32] + News From Folding@Home: Welcome to Folding@Home
[08:14:32] Loaded queue successfully.
[08:14:32] Sent data
[08:14:32] Connecting to http://130.237.232.237:8080/
[08:14:35] Posted data.
[08:14:35] Initial: 0000; - Receiving payload (expected size: 10251999)
[08:15:08] - Downloaded at ~303 kB/s
[08:15:08] - Averaged speed for that direction ~1389 kB/s
[08:15:08] + Received work.
[08:15:08] Trying to send all finished work units
[08:15:08] + No unsent completed units remaining.
[08:15:08] + Closed connections
[08:15:13] 
[08:15:13] + Processing work unit
[08:15:13] Core required: FahCore_a5.exe
[08:15:13] Core found.
[08:15:13] Working on queue slot 01 [May 23 08:15:13 UTC]
[08:15:13] + Working ...
[08:15:13] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 01 -np 12 -checkpoint 30 -verbose -lifeline 1252 -version 634'

[08:15:13] 
[08:15:13] *------------------------------*
[08:15:13] Folding@Home Gromacs SMP Core
[08:15:13] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[08:15:13] 
[08:15:13] Preparing to commence simulation
[08:15:13] - Ensuring status. Please wait.
[08:15:23] - Looking at optimizations...
[08:15:23] - Working with standard loops on this execution.
[08:15:23] - Created dyn
[08:15:23] - Files status OK
[08:15:24] - Expanded 10251487 -> 28999680 (decompressed 282.8 percent)
[08:15:24] Called DecompressByteArray: compressed_data_size=10251487 data_size=28999680, decompressed_data_size=28999680 diff=0
[08:15:24] - Digital signature verified
[08:15:24] 
[08:15:24] Project: 6904 (Run 2, Clone 26, Gen 86)
[08:15:24] 
[08:15:24] Entering M.D.
[08:15:30] Mapping NT from 12 to 12 
[08:29:19] CoreStatus = 0 (0)
[08:29:19] Sending work to server
[08:29:19] Project: 6904 (Run 2, Clone 26, Gen 86)
[08:29:19] - Error: Could not get length of results file work/wuresults_01.dat
[08:29:19] - Error: Could not read unit 01 file. Removing from queue.
[08:29:19] Trying to send all finished work units
[08:29:19] + No unsent completed units remaining.
[08:29:19] - Preparing to get new work unit...
[08:29:19] Cleaning up work directory
[08:29:39] + Attempting to get work packet
[08:29:39] Passkey found
paba
Posts: 2
Joined: Wed May 23, 2012 3:50 pm

Re: Project: 6904 (Run 2, Clone 19, Gen 84) - will not Fold

Post by paba »

Same story here with 6904 (Run 2, Clone 26, Gen 86). The last message is "Mapping NT from 12 to 12". It uses only one core of 12 and breaks after 10 minutes. Then it downloads the very same WU again. Goto "Same story here". :roll:
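
A quick way to confirm that the client keeps fetching the same WU is to count how often each PRCG shows up in the log. A minimal Python sketch, assuming the core prints `Project: N (Run R, Clone C, Gen G)` lines as in the logs in this thread. `count_prcg` is a made-up helper name. Some logs print the line more than once per attempt, so a high count is only a hint and not proof of a loop.

```python
import re
from collections import Counter

# Matches e.g. "Project: 6904 (Run 2, Clone 26, Gen 86)"
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), ?Clone (\d+), ?Gen (\d+)\)")


def count_prcg(log_text):
    """Count occurrences of each (project, run, clone, gen) in a log."""
    counts = Counter()
    for match in PRCG_RE.finditer(log_text):
        counts[tuple(int(g) for g in match.groups())] += 1
    return counts


# Lines copied from the logs posted earlier in this thread.
sample = """\
[05:40:05] Project: 6904 (Run 2, Clone 19, Gen 84)
[08:15:24] Project: 6904 (Run 2, Clone 26, Gen 86)
[08:29:19] Project: 6904 (Run 2, Clone 26, Gen 86)
"""
for prcg, n in count_prcg(sample).most_common():
    print(prcg, n)
```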
paba
Posts: 2
Joined: Wed May 23, 2012 3:50 pm

Re: Project: 6904 (Run 2, Clone 19, Gen 84) - will not Fold

Post by paba »

Addendum to my previous post: the WU produces a memory leak. It allocates 2 MB per second. My vbox has 8 GB available, so the timeout of the WU (10 minutes) will be reached before the memory is exhausted.
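
The arithmetic behind that, as a small Python sketch. The 2 MB/s rate, 8 GB of memory, and 10-minute run time are the figures from this post; treating 1 GB as 1024 MB and assuming a constant leak rate are simplifications.

```python
LEAK_MB_PER_S = 2          # observed allocation rate
MEMORY_MB = 8 * 1024       # 8 GB available to the VM
RUN_TIME_S = 10 * 60       # the WU fails after about 10 minutes


def seconds_until_exhausted(memory_mb, leak_mb_per_s):
    """Time for a constant-rate leak to consume memory_mb."""
    return memory_mb / leak_mb_per_s


leaked_mb = LEAK_MB_PER_S * RUN_TIME_S
time_to_fill_s = seconds_until_exhausted(MEMORY_MB, LEAK_MB_PER_S)

print(f"Leaked before the WU fails: {leaked_mb} MB")
print(f"Time needed to fill 8 GB: {time_to_fill_s / 60:.1f} minutes")
```

So roughly 1.2 GB leaks in 10 minutes, while filling 8 GB at that rate would take a bit over an hour.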
gwildperson
Posts: 450
Joined: Tue Dec 04, 2007 8:36 pm

Re: Project: 6904 (Run 2, Clone 19, Gen 84) - will not Fold

Post by gwildperson »

CoreStatus = 0 (0) is an unknown error. Maybe you have a bad WU, but only a Mod can tell you that.

How about posting the other PRCG numbers?

At the very least, we know that 84 Gens of that WU have been finished by someone.
tear
Posts: 254
Joined: Sun Dec 02, 2007 4:08 am
Hardware configuration: None
Location: Rocky Mountains

Project: 6901 (Run 5, Clone 0, Gen 227): moar b0rk edition

Post by tear »

Mis-sized P6901 that, not surprisingly, doesn't simulate at all
and produces the usual memory leak instead.

Code: Select all

[04:53:51] Connecting to http://assign.stanford.edu:8080/
[04:53:51] Posted data.
[04:53:51] Initial: ED82; - Successful: assigned to (130.237.232.237).
[04:53:51] + News From Folding@Home: Welcome to Folding@Home
[04:53:51] Loaded queue successfully.
[04:53:51] Sent data
[04:53:51] Connecting to http://130.237.232.237:8080/
[04:53:52] Posted data.
[04:53:52] Initial: 0000; - Receiving payload (expected size: 9122)
[04:53:53] - Downloaded at ~8 kB/s
[04:53:53] - Averaged speed for that direction ~197 kB/s
[04:53:53] + Received work.
(...)
[04:53:53] + Processing work unit
[04:53:53] Core required: FahCore_a5.exe
[04:53:53] Core found.
[04:53:53] Working on queue slot 00 [May 25 04:53:53 UTC]
[04:53:53] + Working ...
[04:53:53] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 00 -np 64 -checkpoint 15 -forceasm -verbose -lifeline 11667 -version 634'
[04:53:53]
[04:53:53] *------------------------------*
[04:53:53] Folding@Home Gromacs SMP Core
[04:53:53] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[04:53:53]
[04:53:53] Preparing to commence simulation
[04:53:53] - Assembly optimizations manually forced on.
[04:53:53] - Not checking prior termination.
[04:53:53] - Expanded 8610 -> 4165632 (decompressed 48381.3 percent)
[04:53:53] Called DecompressByteArray: compressed_data_size=8610 data_size=4165632, decompressed_data_size=4165632 diff=0
[04:53:53] - Digital signature verified
[04:53:53]
[04:53:53] Project: 6901 (Run 5, Clone 0, Gen 227)
[04:53:53]
[04:53:53] Assembly optimizations on if available.
[04:53:53] Entering M.D.
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                            :-)  VERSION 4.5.3  (-:

        Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
      Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra,
        Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff,
           Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz,
                Michael Shirts, Alfons Sijbers, Peter Tieleman,

               Berk Hess, David van der Spoel, and Erik Lindahl.

       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
            Copyright (c) 2001-2010, The GROMACS development team at
        Uppsala University & The Royal Institute of Technology, Sweden.
            check out http://www.gromacs.org for more information.


                               :-)  Gromacs  (-:

Reading file work/wudata_00.tpr, VERSION 4.0.99_development_20090605 (single precision)
Note: tpx file_version 70, software version 73
[04:53:59] Mapping NT from 64 to 64
^C[05:36:15] ***** Got an Activate signal (2)
One man's ceiling is another man's floor.
Jesse_V
Site Moderator
Posts: 2850
Joined: Mon Jul 18, 2011 4:44 am
Hardware configuration: OS: Windows 10, Kubuntu 19.04
CPU: i7-6700k
GPU: GTX 970, GTX 1080 TI
RAM: 24 GB DDR4
Location: Western Washington

Re: Project: 6901 (Run 5, Clone 0, Gen 227): moar b0rk editi

Post by Jesse_V »

Well, the "Expanded 8610 -> 4165632" line seems a bit suspicious! Exactly how big was this memory leak? What exactly do you mean by "doesn't simulate at all"? You haven't been able to reach 1% in about 40 minutes?
F@h is now the top computing platform on the planet and nothing unites people like a dedicated fight against a common enemy. This virus affects all of us. Lets end it together.
tear
Posts: 254
Joined: Sun Dec 02, 2007 4:08 am
Hardware configuration: None
Location: Rocky Mountains

Re: Project: 6901 (Run 5, Clone 0, Gen 227): moar b0rk editi

Post by tear »

31.5GB when I stopped it.

Code: Select all

             total       used       free     shared    buffers     cached
Mem:      33014688   31501092    1513596          0         32    3851632
-/+ buffers/cache:   27649428    5365260
Swap:      2097148          0    2097148
Yes, as you can see, no progress was made in over 40 minutes.
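
To separate memory actually held by processes from buffers and page cache, the `free` figures above can be recomputed. A minimal Python sketch, assuming the columns are in kibibytes (the usual default for `free`, but not confirmed for this machine). `parse_mem_line` is a made-up helper name.

```python
MEM_LINE = "Mem:      33014688   31501092    1513596          0         32    3851632"
FIELDS = ("total", "used", "free", "shared", "buffers", "cached")


def parse_mem_line(line):
    """Turn the 'Mem:' row of old-style `free` output into a dict of ints."""
    values = line.split()[1:]
    return dict(zip(FIELDS, (int(v) for v in values)))


mem = parse_mem_line(MEM_LINE)
# Memory held by processes = used minus buffers and page cache.
app_used_kib = mem["used"] - mem["buffers"] - mem["cached"]

print(app_used_kib)  # same value as the "used" column of the "-/+ buffers/cache" row
print(f"{app_used_kib / 1024**2:.1f} GiB held by processes")
print(f"{mem['used'] / mem['total']:.0%} of RAM in use overall")
```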

Code: Select all

[04:53:52] Initial: 0000; - Receiving payload (expected size: 9122)
How many atoms can be described with 9 kilobytes?

We've seen cases like this before at [H]. Units consume all memory and FahCore eventually gets killed by the OOM handler.
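
One way to put a number on how odd this unit is: compare the expansion ratio from its `Expanded` line with the other units logged in this thread. A minimal Python sketch; `expansion_ratio` is a made-up helper name. Those other units also failed, so the ratio is at most a hint about the mis-sizing, not a test for a bad WU.

```python
import re

EXPANDED_RE = re.compile(r"Expanded (\d+) -> (\d+)")


def expansion_ratio(log_line):
    """Return decompressed/compressed size from an 'Expanded' log line, or None."""
    match = EXPANDED_RE.search(log_line)
    if match is None:
        return None
    compressed, expanded = (int(g) for g in match.groups())
    return expanded / compressed


# Lines copied from logs posted in this thread.
lines = [
    "[04:53:53] - Expanded 8610 -> 4165632 (decompressed 48381.3 percent)",      # P6901
    "[08:15:24] - Expanded 10251487 -> 28999680 (decompressed 282.8 percent)",   # P6904
    "10:13:03:WU01:FS00:0xa5:- Expanded 7947649 -> 24903680 (decompressed 313.3 percent)",  # P6903
]
for line in lines:
    print(f"{expansion_ratio(line):8.1f}x  {line}")
```

The P6901 payload expands by a factor of several hundred, against roughly three for the other two.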
One man's ceiling is another man's floor.
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: Project: 6901 (Run 5, Clone 0, Gen 227): moar b0rk editi

Post by Grandpa_01 »

Hey, I am lucky #2: Project: 6901 (Run 5, Clone 0, Gen 227) on a 980X. It was using 1 core out of 12 and, when I shut it down, 8 GB of the 12 GB of memory.
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
bollix47
Posts: 2958
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Project: 6903 (Run 3, Clone 8, Gen 117)

Post by bollix47 »

Code: Select all

*********************** Log Started 2012-05-15T18:33:48Z ***********************
18:33:48:************************* Folding@home Client *************************
18:33:48:    Website: http://folding.stanford.edu/
18:33:48:  Copyright: (c) 2009-2012 Stanford University
18:33:48:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:33:48:       Args: 
18:33:48:     Config: /home/bollix/config.xml
18:33:48:******************************** Build ********************************
18:33:48:    Version: 7.1.52
18:33:48:       Date: Mar 20 2012
18:33:48:       Time: 13:19:11
18:33:48:    SVN Rev: 3515
18:33:48:     Branch: fah/trunk/client
18:33:48:   Compiler: GNU 4.6.2
18:33:48:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
18:33:48:             -fno-unsafe-math-optimizations -msse2
18:33:48:   Platform: linux2 3.2.0-1-amd64
18:33:48:       Bits: 64
18:33:48:       Mode: Release
18:33:48:******************************* System ********************************
18:33:48:        CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
18:33:48:     CPU ID: GenuineIntel Family 6 Model 44 Stepping 2
18:33:48:       CPUs: 12
18:33:48:     Memory: 11.75GiB
18:33:48:Free Memory: 11.14GiB
18:33:48:    Threads: POSIX_THREADS
18:33:48: On Battery: false
18:33:48: UTC offset: -4
18:33:48:        PID: 1797
18:33:48:        CWD: /home/bollix
18:33:48:         OS: Linux 3.0.0-19-generic x86_64
18:33:48:    OS Arch: AMD64
18:33:48:       GPUs: 3
18:33:48:      GPU 0: UNSUPPORTED: Rage XL (Intel Corporation)
18:33:48:      GPU 1: UNSUPPORTED: Rage XL (Intel Corporation)
18:33:48:      GPU 2: NVIDIA:1 G92 [GeForce GTS 250]
18:33:48:       CUDA: 1.1
18:33:48:CUDA Driver: 4000
18:33:48:***********************************************************************
18:33:48:<config>
18:33:48:  <!-- FahCore Control -->
18:33:48:  <core-priority v='low'/>
18:33:48:
18:33:48:  <!-- Network -->
18:33:48:  <proxy v=':8080'/>
18:33:48:
18:33:48:  <!-- Remote Command Server -->
18:33:48:  <command-allow v='127.0.0.1,192.168.2.100-192.168.2.149'/>
18:33:48:  <command-allow-no-pass v='127.0.0.1,192.168.2.100-192.168.2.149'/>
18:33:48:
18:33:48:  <!-- User Information -->
18:33:48:  <passkey v='********************************'/>
18:33:48:  <team v='39340'/>
18:33:48:  <user v='bollix47'/>
18:33:48:
18:33:48:  <!-- Folding Slots -->
18:33:48:  <slot id='0' type='SMP'>
18:33:48:    <client-type v='bigadv'/>
18:33:48:    <cpus v='-1'/>
18:33:48:    <max-packet-size v='big'/>
18:33:48:    <next-unit-percentage v='99'/>
18:33:48:  </slot>
18:33:48:</config>
10:11:23:WU01:FS00:Connecting to assign3.stanford.edu:8080
10:11:23:WU01:FS00:News: Welcome to Folding@Home
10:11:23:WU01:FS00:Assigned to work server 130.237.232.237
10:11:23:WU01:FS00:Requesting new work unit for slot 00: RUNNING smp:12 from 130.237.232.237
10:11:23:WU01:FS00:Connecting to 130.237.232.237:8080
10:11:27:WU01:FS00:Downloading 7.58MiB 
10:11:33:WU01:FS00:Download 61.02%
10:11:38:WU00:FS00:0xa5:DynamicWrapper: Finished Work Unit: sleep=10000
10:11:39:WU01:FS00:Download 92.35%
10:11:40:WU01:FS00:Download complete
10:11:40:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:OK project:6903 run:3 clone:8 gen:117 core:0xa5 unit:0x000000a752be746d4de3a80d9eeff249
10:11:48:WU00:FS00:0xa5:
10:11:48:WU00:FS00:0xa5:Finished Work Unit:
10:11:48:WU00:FS00:0xa5:- Reading up to 121622496 from "00/wudata_01.trr": Read 121622496
10:11:48:WU00:FS00:0xa5:trr file hash check passed.
10:11:49:WU00:FS00:0xa5:- Reading up to 108808004 from "00/wudata_01.xtc": Read 108808004
10:11:49:WU00:FS00:0xa5:xtc file hash check passed.
10:11:49:WU00:FS00:0xa5:edr file hash check passed.
10:11:49:WU00:FS00:0xa5:logfile size: 211099
10:11:49:WU00:FS00:0xa5:Leaving Run
10:11:51:WU00:FS00:0xa5:- Writing 230814591 bytes of core data to disk...
10:12:33:WU00:FS00:0xa5:Done: 230814079 -> 222469092 (compressed to 3.3 percent)
10:12:34:WU00:FS00:0xa5:  ... Done.
10:12:59:WU00:FS00:0xa5:- Shutting down core
10:12:59:WU00:FS00:0xa5:
10:12:59:WU00:FS00:0xa5:Folding@home Core Shutdown: FINISHED_UNIT
10:13:02:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
10:13:02:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:6903 run:10 clone:22 gen:33 core:0xa5 unit:0x0000002c52be746d4de92b1520152a5b
10:13:02:WU00:FS00:Uploading 212.16MiB to 130.237.232.237
10:13:02:WU00:FS00:Connecting to 130.237.232.237:8080
10:13:02:WU01:FS00:Starting
10:13:02:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /home/bollix/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a5.fah/FahCore_a5 -dir 01 -suffix 01 -version 701 -lifeline 1797 -checkpoint 15 -np 12
10:13:02:WU01:FS00:Started FahCore on PID 9381
10:13:02:WU01:FS00:Core PID:9385
10:13:02:WU01:FS00:FahCore 0xa5 started
10:13:02:WU01:FS00:0xa5:
10:13:02:WU01:FS00:0xa5:*------------------------------*
10:13:02:WU01:FS00:0xa5:Folding@Home Gromacs SMP Core
10:13:02:WU01:FS00:0xa5:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
10:13:02:WU01:FS00:0xa5:
10:13:02:WU01:FS00:0xa5:Preparing to commence simulation
10:13:02:WU01:FS00:0xa5:- Looking at optimizations...
10:13:02:WU01:FS00:0xa5:- Created dyn
10:13:02:WU01:FS00:0xa5:- Files status OK
10:13:03:WU01:FS00:0xa5:- Expanded 7947649 -> 24903680 (decompressed 313.3 percent)
10:13:03:WU01:FS00:0xa5:Called DecompressByteArray: compressed_data_size=7947649 data_size=24903680, decompressed_data_size=24903680 diff=0
10:13:03:WU01:FS00:0xa5:- Digital signature verified
10:13:03:WU01:FS00:0xa5:
10:13:03:WU01:FS00:0xa5:Project: 6903 (Run 3, Clone 8, Gen 117)
10:13:03:WU01:FS00:0xa5:
10:13:03:WU01:FS00:0xa5:Assembly optimizations on if available.
10:13:03:WU01:FS00:0xa5:Entering M.D.
10:13:08:WU00:FS00:Upload 0.44%
10:13:09:WU01:FS00:0xa5:Mapping NT from 12 to 12 
10:13:14:WU00:FS00:Upload 0.77%
10:13:20:WU00:FS00:Upload 1.12%

SNIP


10:39:34:WU00:FS00:Upload 86.61%
10:39:45:WU00:FS00:Upload 87.02%
10:39:48:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
10:39:48:WU01:FS00:Starting
10:39:48:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /home/bollix/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a5.fah/FahCore_a5 -dir 01 -suffix 01 -version 701 -lifeline 1797 -checkpoint 15 -np 12
10:39:48:WU01:FS00:Started FahCore on PID 9406
10:39:49:WU01:FS00:Core PID:9410
10:39:49:WU01:FS00:FahCore 0xa5 started
10:39:49:WU01:FS00:0xa5:
10:39:49:WU01:FS00:0xa5:*------------------------------*
10:39:49:WU01:FS00:0xa5:Folding@Home Gromacs SMP Core
10:39:49:WU01:FS00:0xa5:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
10:39:49:WU01:FS00:0xa5:
10:39:49:WU01:FS00:0xa5:Preparing to commence simulation
10:39:49:WU01:FS00:0xa5:- Ensuring status. Please wait.
10:39:52:WU00:FS00:Upload 87.73%
10:39:59:WU01:FS00:0xa5:- Looking at optimizations...
10:39:59:WU01:FS00:0xa5:- Working with standard loops on this execution.
10:39:59:WU01:FS00:0xa5:- Previous termination of core was improper.
10:39:59:WU01:FS00:0xa5:- Files status OK
10:40:00:WU01:FS00:0xa5:- Expanded 7947649 -> 24903680 (decompressed 313.3 percent)
10:40:00:WU01:FS00:0xa5:Called DecompressByteArray: compressed_data_size=7947649 data_size=24903680, decompressed_data_size=24903680 diff=0
10:40:00:WU01:FS00:0xa5:- Digital signature verified
10:40:00:WU01:FS00:0xa5:
10:40:00:WU01:FS00:0xa5:Project: 6903 (Run 3, Clone 8, Gen 117)
10:40:00:WU01:FS00:0xa5:
10:40:00:WU01:FS00:0xa5:Entering M.D.
10:40:00:WU00:FS00:Upload 88.20%
10:40:06:WU00:FS00:Upload 88.40%
10:40:06:WU01:FS00:0xa5:Mapping NT from 12 to 12 
10:40:12:WU00:FS00:Upload 88.82%
10:40:21:WU00:FS00:Upload 89.23%

SNIP


10:43:31:WU00:FS00:Upload 99.51%
10:43:37:WU00:FS00:Upload 99.72%
10:44:10:WU00:FS00:Upload complete
10:44:10:WU00:FS00:Server responded WORK_ACK (400)
10:44:10:WU00:FS00:Final credit estimate, 306666.00 points
10:44:10:WU00:FS00:Cleaning up
11:06:37:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
11:06:37:WU01:FS00:Starting
11:06:37:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /home/bollix/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a5.fah/FahCore_a5 -dir 01 -suffix 01 -version 701 -lifeline 1797 -checkpoint 15 -np 12
11:06:37:WU01:FS00:Started FahCore on PID 9424
11:06:38:WU01:FS00:Core PID:9428
11:06:38:WU01:FS00:FahCore 0xa5 started
11:06:39:WU01:FS00:0xa5:
11:06:39:WU01:FS00:0xa5:*------------------------------*
11:06:39:WU01:FS00:0xa5:Folding@Home Gromacs SMP Core
11:06:39:WU01:FS00:0xa5:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
11:06:39:WU01:FS00:0xa5:
11:06:39:WU01:FS00:0xa5:Preparing to commence simulation
11:06:39:WU01:FS00:0xa5:- Ensuring status. Please wait.
11:06:48:WU01:FS00:0xa5:- Looking at optimizations...
11:06:48:WU01:FS00:0xa5:- Working with standard loops on this execution.
11:06:48:WU01:FS00:0xa5:- Previous termination of core was improper.
11:06:48:WU01:FS00:0xa5:- Going to use standard loops.
11:06:48:WU01:FS00:0xa5:- Files status OK
11:06:49:WU01:FS00:0xa5:- Expanded 7947649 -> 24903680 (decompressed 313.3 percent)
11:06:49:WU01:FS00:0xa5:Called DecompressByteArray: compressed_data_size=7947649 data_size=24903680, decompressed_data_size=24903680 diff=0
11:06:49:WU01:FS00:0xa5:- Digital signature verified
11:06:49:WU01:FS00:0xa5:
11:06:49:WU01:FS00:0xa5:Project: 6903 (Run 3, Clone 8, Gen 117)
11:06:49:WU01:FS00:0xa5:
11:06:49:WU01:FS00:0xa5:Entering M.D.
11:06:56:WU01:FS00:0xa5:Mapping NT from 12 to 12 
11:06:57:FS00:Paused
11:06:57:FS00:Shutting core down
11:06:59:WU01:FS00:0xa5:Client no longer detected. Shutting down core.
11:06:59:WU01:FS00:0xa5:
11:06:59:WU01:FS00:0xa5:Folding@home Core Shutdown: CLIENT_DIED
11:06:59:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
Note: 10:11:27:WU01:FS00:Downloading 7.58MiB

Normal download size for a P6903 is ~54MB
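
That comparison can be automated. A minimal Python sketch that reads the size from a v7 `Downloading ...` line and compares it with a typical size for the project. The ~54 MB figure for P6903 is the one quoted above; the unit suffixes handled, the 50% tolerance, and the names `parse_download_size` and `looks_undersized` are assumptions for illustration.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}
DOWNLOAD_RE = re.compile(r"Downloading ([\d.]+)\s*(B|KiB|MiB|GiB)\b")

# Typical download size per project in bytes (only P6903 is known from this thread).
TYPICAL_BYTES = {6903: 54 * 1024**2}


def parse_download_size(log_line):
    """Return the download size in bytes from a v7 log line, or None."""
    match = DOWNLOAD_RE.search(log_line)
    if match is None:
        return None
    return float(match.group(1)) * UNITS[match.group(2)]


def looks_undersized(project, size_bytes, tolerance=0.5):
    """True if the download is below tolerance * typical size for the project."""
    typical = TYPICAL_BYTES.get(project)
    return typical is not None and size_bytes < tolerance * typical


size = parse_download_size("10:11:27:WU01:FS00:Downloading 7.58MiB")
print(looks_undersized(6903, size))  # True: 7.58 MiB is far below ~54 MB
```

Projects without an entry in `TYPICAL_BYTES` are never flagged, so the table would need a typical size for each project you want to watch.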
Ripper36
Posts: 60
Joined: Sun Sep 18, 2011 8:55 am

6900 crashed and dumped (run:11 clone:22 gen:76 core:0xa5)

Post by Ripper36 »

P6900 (run:11 clone:22 gen:76 core:0xa5) downloaded only 512 B. Core a5 started but immediately returned FILE_IO_ERROR (117 = 0x75).

Code: Select all

08:41:59:WU03:FS00:Requesting new work unit for slot 00: RUNNING smp:8 from 130.237.232.141
08:41:59:WU03:FS00:Connecting to 130.237.232.141:8080
08:42:01:WU03:FS00:Downloading 512B
08:42:01:WU03:FS00:Download complete
08:42:01:WU03:FS00:Received Unit: id:03 state:DOWNLOAD error:OK project:6900 run:11 clone:22 gen:76 core:0xa5 unit:0x0000002952be740d4de95cfbe130fc70
08:43:06:WU01:FS00:0xa4:Completed 247500 out of 250000 steps  (99%)
08:44:38:WU01:FS00:0xa4:Completed 250000 out of 250000 steps  (100%)
08:44:38:WU01:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
08:44:48:WU01:FS00:0xa4:
08:44:48:WU01:FS00:0xa4:Finished Work Unit:
08:44:48:WU01:FS00:0xa4:- Reading up to 1294908 from "01/wudata_01.trr": Read 1294908
08:44:48:WU01:FS00:0xa4:trr file hash check passed.
08:44:48:WU01:FS00:0xa4:- Reading up to 784264 from "01/wudata_01.xtc": Read 784264
08:44:48:WU01:FS00:0xa4:xtc file hash check passed.
08:44:48:WU01:FS00:0xa4:edr file hash check passed.
08:44:48:WU01:FS00:0xa4:logfile size: 23383
08:44:48:WU01:FS00:0xa4:Leaving Run
08:44:49:WU01:FS00:0xa4:- Writing 2107959 bytes of core data to disk...
08:44:49:WU01:FS00:0xa4:Done: 2107447 -> 2021392 (compressed to 95.9 percent)
08:44:49:WU01:FS00:0xa4:  ... Done.
08:44:49:WU01:FS00:0xa4:- Shutting down core
08:44:49:WU01:FS00:0xa4:
08:44:49:WU01:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
08:44:50:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
08:44:50:WU01:FS00:Sending unit results: id:01 state:SEND error:OK project:8013 run:102 clone:0 gen:50 core:0xa4 unit:0x000000596652edcc4f72bc80b95ef226
08:44:50:WU01:FS00:Uploading 1.93MiB to 171.67.108.60
08:44:50:WU01:FS00:Connecting to 171.67.108.60:8080
08:44:50:WU03:FS00:Starting
08:44:50:WU03:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/John/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a5.fah/FahCore_a5.exe -dir 03 -suffix 01 -version 701 -lifeline 3056 -checkpoint 10 -np 8
08:44:50:WU03:FS00:Started FahCore on PID 7904
08:44:50:WU03:FS00:Core PID:5104
08:44:50:WU03:FS00:FahCore 0xa5 started
08:44:50:WU03:FS00:FahCore returned: FILE_IO_ERROR (117 = 0x75)
08:44:50:WARNING:WU03:FS00:Fatal error, dumping
08:44:50:WU03:FS00:Sending unit results: id:03 state:SEND error:DUMPED project:6900 run:11 clone:22 gen:76 core:0xa5 unit:0x0000002952be740d4de95cfbe130fc70
I had previously completed a P6900 WU successfully:

Code: Select all

06:21:09:WU00:FS00:0xa5:Completed 250000 out of 250000 steps  (100%)
06:21:22:WU00:FS00:0xa5:DynamicWrapper: Finished Work Unit: sleep=10000
06:21:32:WU00:FS00:0xa5:
06:21:32:WU00:FS00:0xa5:Finished Work Unit:
06:21:32:WU00:FS00:0xa5:- Reading up to 52713120 from "00/wudata_01.trr": Read 52713120
06:21:33:WU00:FS00:0xa5:trr file hash check passed.
06:21:33:WU00:FS00:0xa5:- Reading up to 46991916 from "00/wudata_01.xtc": Read 46991916
06:21:33:WU00:FS00:0xa5:xtc file hash check passed.
06:21:33:WU00:FS00:0xa5:edr file hash check passed.
06:21:33:WU00:FS00:0xa5:logfile size: 221498
06:21:33:WU00:FS00:0xa5:Leaving Run
06:21:35:WU00:FS00:0xa5:- Writing 100094474 bytes of core data to disk...
06:21:36:WU00:FS00:0xa5:  ... Done.
06:21:38:WU04:FS02:0x15:Completed  15000000 out of 50000000 steps (30%).
06:21:42:WU03:FS01:0x15:Completed  24500000 out of 50000000 steps (49%).
06:21:42:WU00:FS00:0xa5:- Shutting down core
06:21:42:WU00:FS00:0xa5:
06:21:42:WU00:FS00:0xa5:Folding@home Core Shutdown: FINISHED_UNIT
06:21:43:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
06:21:43:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:6900 run:42 clone:15 gen:140 core:0xa5 unit:0x0000009a52be740d4de96d806bb8f91e
06:21:43:WU00:FS00:Uploading 95.46MiB to 130.237.232.141
Was this just a bad unit or something at my end?
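
For reference, here are the return codes that appear in the logs in this thread, with their decimal and hex forms side by side. A minimal Python sketch; the table only contains the codes quoted in the thread and is not a complete list of FahCore return codes, and `describe_status` is a made-up helper name.

```python
# Return codes as they appear in the logs in this thread (decimal -> name).
CORE_STATUS = {
    100: "FINISHED_UNIT",   # 0x64
    102: "INTERRUPTED",     # 0x66
    117: "FILE_IO_ERROR",   # 0x75
}


def describe_status(code):
    """Format a return code the way the v7 client logs it."""
    name = CORE_STATUS.get(code, "not listed here")
    return f"{name} ({code} = 0x{code:x})"


print(describe_status(117))  # FILE_IO_ERROR (117 = 0x75)
# The v6 log prints the hex value first: "CoreStatus = 75 (117)".
print(describe_status(int("75", 16)))
```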