Two Project: 7809 WUs dumped

Moderators: Site Moderators, FAHC Science Team

SomeStones
Posts: 53
Joined: Wed Aug 26, 2009 12:10 am

Post by SomeStones »

Hi 7im and rewron,

I have had two 7809s dumped in the past 24 hours. Previously I had completed a 7809 ( project:7809 run:7 clone:325 gen:7 ) on this machine on Oct 30th.

I'm running an HP P6636F with Windows 7 Home Premium 64-bit. It has 8 GB of RAM. The processor is a Phenom II X4 840 running at 3.2 GHz, not overclocked. Video is a GeForce 9800 GTX (factory-overclocked EVGA 512-P3-N872-AR).

I'm using Folding@Home Client Control 7.1.24 with the CPU cores split into two SMP clients, plus a GPU client, all running together. The system has been remarkably stable except when I try to run the GPU faster.

As you can imagine, the log file is large and difficult to read. I have extracted and commented the parts pertaining to the loading, starting, and completion of the two WUs in question:

~~~~~~~~~~~~~~~~~~~~~~~~~~~First dump of 7809~~~~~~~~~~~~


Previous unit (Unit 01) reaches 99% and new unit ( project:7809 run:6 clone:300 gen:9 ) is fetched in anticipation of completion of Unit 01. New unit will be assigned as Unit 00.


15:48:49:Unit 01:Completed 1980000 out of 2000000 steps  (99%)
15:48:49:Connecting to assign3.stanford.edu:8080
15:48:50:News: Welcome to Folding@Home
15:48:50:Assigned to work server 171.64.65.99
15:48:50:Requesting new work unit for slot 01: RUNNING smp:2 from 171.64.65.99
15:48:50:Connecting to 171.64.65.99:8080
15:48:52:Slot 01: Downloading 1.98MiB
15:48:58:Slot 01: 19.01%
15:49:04:Slot 01: 36.95%
15:49:10:Slot 01: 54.15%
15:49:16:Slot 01: 73.65%
15:49:22:Slot 01: 93.14%
15:49:25:Slot 01: Download complete
15:49:25:Received Unit: id:00 state:DOWNLOAD project:7809 run:6 clone:300 gen:9 core:0xa4 unit:0x000000090a3b1e874e3112d79e56bf89
16:11:41:Unit 01:Completed 2000000 out of 2000000 steps  (100%)
16:11:41:Unit 01:DynamicWrapper: Finished Work Unit: sleep=10000
16:11:51:Unit 01:
16:11:51:Unit 01:Finished Work Unit:
16:11:51:Unit 01:- Reading up to 446424 from "01/wudata_01.trr": Read 446424
16:11:51:Unit 01:trr file hash check passed.
16:11:51:Unit 01:- Reading up to 261584 from "01/wudata_01.xtc": Read 261584
16:11:51:Unit 01:xtc file hash check passed.
16:11:51:Unit 01:edr file hash check passed.
16:11:51:Unit 01:logfile size: 34947
16:11:51:Unit 01:Leaving Run
16:11:55:Unit 01:- Writing 749831 bytes of core data to disk...
16:11:55:Unit 01:Done: 749319 -> 699677 (compressed to 93.3 percent)
16:11:55:Unit 01:  ... Done.
16:12:01:Unit 01:- Shutting down core
16:12:01:Unit 01:
16:12:01:Unit 01:Folding@home Core Shutdown: FINISHED_UNIT
16:12:02:FahCore, running Unit 01, returned: FINISHED_UNIT (100)
16:12:02:Sending unit results: id:01 state:SEND project:7600 run:30 clone:51 gen:14 core:0xa4 unit:0x0000001d664f2dcd4dee8a628f9ca7d4
16:12:02:Unit 01: Uploading 683.78KiB
16:12:02:Starting Unit 00
16:12:02:Connecting to 171.64.65.101:8080
16:12:02:Running core: C:/Users/Squinch/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -lifeline 3536 -version 701 -checkpoint 12 -np 2 -forceasm
16:12:02:Started core on PID 276
16:12:02:FahCore 0xa4 started
16:12:02:Unit 00:
16:12:02:Unit 00:*------------------------------*
16:12:02:Unit 00:Folding@Home Gromacs GB Core
16:12:02:Unit 00:Version 2.27 (Dec. 15, 2010)
16:12:02:Unit 00:
16:12:02:Unit 00:Preparing to commence simulation
16:12:02:Unit 00:- Assembly optimizations manually forced on.
16:12:02:Unit 00:- Not checking prior termination.
16:12:02:Unit 00:- Expanded 2079472 -> 5386224 (decompressed 259.0 percent)
16:12:02:Unit 00:Called DecompressByteArray: compressed_data_size=2079472 data_size=5386224, decompressed_data_size=5386224 diff=0
16:12:03:Unit 00:- Digital signature verified
16:12:03:Unit 00:
16:12:03:Unit 00:Project: 7809 (Run 6, Clone 300, Gen 9)
16:12:03:Unit 00:
16:12:03:Unit 00:Assembly optimizations on if available.
16:12:03:Unit 00:Entering M.D.
16:12:08:Unit 01: 36.85%
16:12:09:Unit 00:Mapping NT from 2 to 2 
16:12:09:Unit 00:Completed 0 out of 1500000 steps  (0%)
16:12:14:Unit 01: 76.63%
16:12:17:Unit 01: Upload complete
16:12:17:Server responded WORK_ACK (400)
16:12:17:Final credit estimate, 3055.00 points
16:12:17:Cleaning up Unit 01
16:51:50:Unit 00:Completed 15000 out of 1500000 steps  (1%)


etc., etc, until 99% complete on Unit 00 ( project:7809 run:6 clone:300 gen:9 )


09:50:27:Unit 00:Completed 1485000 out of 1500000 steps  (99%)
09:50:28:Connecting to assign3.stanford.edu:8080
09:50:28:News: Welcome to Folding@Home
09:50:28:Assigned to work server 129.74.85.15
09:50:28:Requesting new work unit for slot 01: RUNNING smp:2 from 129.74.85.15
09:50:28:Connecting to 129.74.85.15:8080
09:50:29:Slot 01: Downloading 52.87KiB
09:50:29:Slot 01: Download complete
09:50:29:Received Unit: id:03 state:DOWNLOAD project:7008 run:2 clone:58 gen:65 core:0xa4 unit:0x000001000001329c4dfb927a1c2f1bbe
10:32:07:Unit 00:Completed 1500000 out of 1500000 steps  (100%)
10:32:07:Unit 00:DynamicWrapper: Finished Work Unit: sleep=10000
10:32:17:Unit 00:
10:32:17:Unit 00:Finished Work Unit:
10:32:17:Unit 00:- Reading up to 2908800 from "00/wudata_01.trr": Read 2908800
10:32:17:Unit 00:trr file hash check passed.
10:32:17:Unit 00:- Reading up to 1554492 from "00/wudata_01.xtc": Read 1554492
10:32:17:Unit 00:xtc file hash check passed.
10:32:17:Unit 00:edr file hash check passed.
10:32:17:Unit 00:logfile size: 43443
10:32:17:Unit 00:Leaving Run
10:32:19:Unit 00:- Writing 4511747 bytes of core data to disk...
10:32:20:Unit 00:Done: 4511235 -> 4326360 (compressed to 95.9 percent)
10:32:20:Unit 00:  ... Done.
10:32:44:Unit 02:Completed 80%
10:32:53:Unit 00:- Shutting down core
10:32:53:Unit 00:
10:32:53:Unit 00:Folding@home Core Shutdown: FINISHED_UNIT
10:32:58:FahCore, running Unit 00, returned: FINISHED_UNIT (100)
10:32:58:Sending unit results: id:00 state:SEND project:7809 run:6 clone:300 gen:9 core:0xa4 unit:0x000000090a3b1e874e3112d79e56bf89
10:32:58:Unit 00: Uploading 4.13MiB
10:32:58:Starting Unit 03
10:32:58:Connecting to 171.64.65.99:8080
10:32:58:Running core: C:/Users/Squinch/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe -dir 03 -suffix 01 -lifeline 3536 -version 701 -checkpoint 12 -np 2 -forceasm
10:32:58:Started core on PID 5004
10:32:58:FahCore 0xa4 started
10:33:04:Unit 00: 5.68%
10:33:10:Unit 00: 12.02%
10:33:16:Unit 00: 18.36%
10:33:22:Unit 00: 24.61%
10:33:28:Unit 00: 30.96%
10:33:34:Unit 00: 37.20%
10:33:40:Unit 00: 43.36%
10:33:46:Unit 00: 49.60%
10:33:52:Unit 00: 55.66%
10:33:58:Unit 00: 61.91%
10:34:04:Unit 00: 68.06%
10:34:10:Unit 00: 74.31%
10:34:16:Unit 00: 80.65%
10:34:22:Unit 00: 86.62%
10:34:28:Unit 00: 92.96%
10:34:34:Unit 00: 99.21%
10:34:34:Unit 00: Upload complete


Dump of project:7809 run:6 clone:300 gen:9:


10:34:34:Server responded WORK_QUIT (404)
10:34:34:WARNING: Server did not like results, dumping
10:34:35:Cleaning up Unit 00


~~~~~~~~~~~~~~~~~~~~~~~~~~~Second dump of 7809~~~~~~~~~~~~


Previous unit (Unit 02) reaches 99% and new unit ( project:7809 run:5 clone:191 gen:7 ) is fetched in anticipation of completion of Unit 02. New Unit will be assigned as Unit 01.

07:17:23:Unit 02:Completed 1980000 out of 2000000 steps  (99%)
07:17:24:Connecting to assign3.stanford.edu:8080
07:17:24:News: Welcome to Folding@Home
07:17:24:Assigned to work server 171.64.65.99
07:17:24:Requesting new work unit for slot 00: RUNNING smp:2 from 171.64.65.99
07:17:24:Connecting to 171.64.65.99:8080
07:17:25:Slot 00: Downloading 1.98MiB
07:17:31:Slot 00: 16.54%
07:17:37:Slot 00: 53.13%
07:17:43:Slot 00: 77.77%
07:17:46:Slot 00: Download complete
07:17:46:Received Unit: id:01 state:DOWNLOAD project:7809 run:5 clone:191 gen:7 core:0xa4 unit:0x000000090a3b1e874e31102fbed63d6d
07:37:16:Unit 02:Completed 2000000 out of 2000000 steps  (100%)
07:37:16:Unit 02:DynamicWrapper: Finished Work Unit: sleep=10000
07:37:26:Unit 02:
07:37:26:Unit 02:Finished Work Unit:
07:37:26:Unit 02:- Reading up to 488544 from "02/wudata_01.trr": Read 488544
07:37:26:Unit 02:trr file hash check passed.
07:37:26:Unit 02:- Reading up to 57832 from "02/wudata_01.xtc": Read 57832
07:37:26:Unit 02:xtc file hash check passed.
07:37:26:Unit 02:edr file hash check passed.
07:37:26:Unit 02:logfile size: 45843
07:37:26:Unit 02:Leaving Run
07:37:29:Unit 02:- Writing 616135 bytes of core data to disk...
07:37:29:Unit 02:Done: 615623 -> 546248 (compressed to 88.7 percent)
07:37:29:Unit 02:  ... Done.
07:37:35:Unit 02:- Shutting down core
07:37:35:Unit 02:
07:37:35:Unit 02:Folding@home Core Shutdown: FINISHED_UNIT
07:37:36:FahCore, running Unit 02, returned: FINISHED_UNIT (100)
07:37:36:Sending unit results: id:02 state:SEND project:7610 run:332 clone:0 gen:40 core:0xa4 unit:0x00000034664f2dd04de6d3dcbdc79a5a
07:37:36:Unit 02: Uploading 533.95KiB
07:37:36:Starting Unit 01
07:37:36:Connecting to 171.64.65.104:8080
07:37:36:Running core: C:/Users/Squinch/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe -dir 01 -suffix 01 -lifeline 3536 -version 701 -checkpoint 12 -np 2 -forceasm
07:37:36:Started core on PID 1764
07:37:36:FahCore 0xa4 started
07:37:36:Unit 01:
07:37:36:Unit 01:*------------------------------*
07:37:36:Unit 01:Folding@Home Gromacs GB Core
07:37:36:Unit 01:Version 2.27 (Dec. 15, 2010)
07:37:36:Unit 01:
07:37:36:Unit 01:Preparing to commence simulation
07:37:36:Unit 01:- Assembly optimizations manually forced on.
07:37:36:Unit 01:- Not checking prior termination.
07:37:36:Unit 01:- Expanded 2079605 -> 5386224 (decompressed 259.0 percent)
07:37:36:Unit 01:Called DecompressByteArray: compressed_data_size=2079605 data_size=5386224, decompressed_data_size=5386224 diff=0
07:37:37:Unit 01:- Digital signature verified
07:37:37:Unit 01:
07:37:37:Unit 01:Project: 7809 (Run 5, Clone 191, Gen 7)
07:37:37:Unit 01:
07:37:37:Unit 01:Assembly optimizations on if available.
07:37:37:Unit 01:Entering M.D.
07:37:42:Unit 02: 44.20%
07:37:43:Unit 01:Mapping NT from 2 to 2 
07:37:43:Unit 01:Completed 0 out of 1500000 steps  (0%)
07:37:48:Unit 02: 95.89%
07:37:49:Unit 02: Upload complete
07:37:49:Server responded WORK_ACK (400)
07:37:49:Final credit estimate, 2642.00 points
07:37:49:Cleaning up Unit 02
08:20:31:Unit 01:Completed 15000 out of 1500000 steps  (1%)


etc., etc, until 99% complete on Unit 01 ( project:7809 run:5 clone:191 gen:7 )


02:52:45:Unit 01:Completed 1485000 out of 1500000 steps  (99%)
02:52:46:Connecting to assign3.stanford.edu:8080
02:52:46:News: Welcome to Folding@Home
02:52:46:Assigned to work server 129.74.85.15
02:52:46:Requesting new work unit for slot 00: RUNNING smp:2 from 129.74.85.15
02:52:46:Connecting to 129.74.85.15:8080
02:52:47:Slot 00: Downloading 53.90KiB
02:52:47:Slot 00: Download complete
02:52:47:Received Unit: id:00 state:DOWNLOAD project:7002 run:0 clone:0 gen:65 core:0xa4 unit:0x000000e80001329c4dfb8274b23c77b9
03:34:29:Unit 01:Completed 1500000 out of 1500000 steps  (100%)
03:34:30:Unit 01:DynamicWrapper: Finished Work Unit: sleep=10000
03:34:40:Unit 01:
03:34:40:Unit 01:Finished Work Unit:
03:34:40:Unit 01:- Reading up to 2908800 from "01/wudata_01.trr": Read 2908800
03:34:40:Unit 01:trr file hash check passed.
03:34:40:Unit 01:- Reading up to 1554516 from "01/wudata_01.xtc": Read 1554516
03:34:40:Unit 01:xtc file hash check passed.
03:34:40:Unit 01:edr file hash check passed.
03:34:40:Unit 01:logfile size: 44345
03:34:40:Unit 01:Leaving Run
03:34:45:Unit 01:- Writing 4512673 bytes of core data to disk...
03:34:46:Unit 01:Done: 4512161 -> 4327143 (compressed to 95.8 percent)
03:34:46:Unit 01:  ... Done.
03:35:19:Unit 01:- Shutting down core
03:35:19:Unit 01:
03:35:19:Unit 01:Folding@home Core Shutdown: FINISHED_UNIT
03:35:24:FahCore, running Unit 01, returned: FINISHED_UNIT (100)
03:35:24:Sending unit results: id:01 state:SEND project:7809 run:5 clone:191 gen:7 core:0xa4 unit:0x000000090a3b1e874e31102fbed63d6d
03:35:24:Unit 01: Uploading 4.13MiB
03:35:24:Starting Unit 00
03:35:24:Connecting to 171.64.65.99:8080
03:35:24:Running core: C:/Users/Squinch/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -lifeline 3536 -version 701 -checkpoint 12 -np 2 -forceasm
03:35:24:Started core on PID 4408
03:35:24:FahCore 0xa4 started
03:35:25:Unit 00:
03:35:25:Unit 00:*------------------------------*
03:35:25:Unit 00:Folding@Home Gromacs GB Core
03:35:25:Unit 00:Version 2.27 (Dec. 15, 2010)
03:35:25:Unit 00:
03:35:25:Unit 00:Preparing to commence simulation
03:35:25:Unit 00:- Assembly optimizations manually forced on.
03:35:25:Unit 00:- Not checking prior termination.
03:35:25:Unit 00:- Expanded 54678 -> 203368 (decompressed 371.9 percent)
03:35:25:Unit 00:Called DecompressByteArray: compressed_data_size=54678 data_size=203368, decompressed_data_size=203368 diff=0
03:35:25:Unit 00:- Digital signature verified
03:35:25:Unit 00:
03:35:25:Unit 00:Project: 7002 (Run 0, Clone 0, Gen 65)
03:35:25:Unit 00:
03:35:25:Unit 00:Assembly optimizations on if available.
03:35:25:Unit 00:Entering M.D.
03:35:30:Unit 01: 5.77%
03:35:30:Unit 00:Mapping NT from 2 to 2 
03:35:30:Unit 00:Completed 0 out of 10000000 steps  (0%)
03:35:36:Unit 01: 11.93%
03:35:42:Unit 01: 18.27%
03:35:48:Unit 01: 24.51%
03:35:54:Unit 01: 30.95%
03:36:00:Unit 01: 35.87%
03:36:06:Unit 01: 42.31%
03:36:12:Unit 01: 48.84%
03:36:18:Unit 01: 55.27%
03:36:24:Unit 01: 61.71%
03:36:30:Unit 01: 68.15%
03:36:36:Unit 01: 74.39%
03:36:42:Unit 01: 81.02%
03:36:48:Unit 01: 87.55%
03:36:54:Unit 01: 93.98%
03:36:59:Unit 01: Upload complete


Dump of project:7809 run:5 clone:191 gen:7:


03:36:59:Server responded WORK_QUIT (404)
03:36:59:WARNING: Server did not like results, dumping
03:37:00:Cleaning up Unit 01
Everything looked normal to me... right up until the end.

Thanks for your help!
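
For anyone who wants to pull the same kind of excerpt out of their own log, here is a minimal Python sketch. It assumes log lines shaped like the ones above, where a unit shows up either as "Unit NN" or as "id:NN". The function name lines_for_unit is made up for illustration and is not part of the FAH client. Unit ids get reused (Unit 00 above is first the 7809 and later a 7002), so the result still needs trimming by hand.

```python
import re


def lines_for_unit(log_lines, unit_id):
    """Return the log lines that mention the given two-digit unit id.

    Matches both the "Unit NN" form (core output, upload progress) and
    the "id:NN" form (Received Unit / Sending unit results lines).
    """
    pattern = re.compile(r"\b(?:Unit {0}|id:{0})\b".format(re.escape(unit_id)))
    return [line for line in log_lines if pattern.search(line)]


# A few lines copied from the excerpt above, used as sample input.
sample = [
    "15:49:25:Received Unit: id:00 state:DOWNLOAD project:7809 run:6 clone:300 gen:9 core:0xa4",
    "16:11:41:Unit 01:Completed 2000000 out of 2000000 steps  (100%)",
    "16:12:02:Starting Unit 00",
    "16:12:17:Server responded WORK_ACK (400)",
    "10:34:35:Cleaning up Unit 00",
]

for line in lines_for_unit(sample, "00"):
    print(line)
```

Lines such as "Server responded WORK_QUIT (404)" carry no unit id, so they are not picked up by this filter and have to be matched to a unit by their position in the log.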

Mod Edit: Changed Quote Tags To Code Tags - PantherX
sortofageek
Site Admin
Posts: 3110
Joined: Fri Nov 30, 2007 8:06 pm
Location: Team Helix

Re: Two Project: 7809 WUs dumped

Post by sortofageek »

From the database currently:


First dumped WU:
Project: 7809 (Run 6, Clone 300, Gen 9)
No data back from query

Second dumped WU:
Project: 7809 (Run 5, Clone 191, Gen 7)
No data back from query
SomeStones
Posts: 53
Joined: Wed Aug 26, 2009 12:10 am

Re: Two Project: 7809 WUs dumped

Post by SomeStones »

Hi sortofageek. I'm not sure what your reply means. Does it mean that there are no records available concerning these two dumped units?

I notice that three of the top four "Issues with a specific WU" are related back to 7809. It is a pretty large unit, and the letdown when it fails is just as large. Is there anything we can do to help find a solution?

Thanks!
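
To put a rough number on "pretty large": the first 7809 log shows the 0% line at 16:12:09 and the 1% line at 16:51:50. Below is a quick Python sketch of the extrapolation. It assumes every percent takes about as long as the first one, and that the two timestamps fall within 24 hours of each other (the log carries no dates). The helper name seconds_between is hypothetical.

```python
from datetime import datetime, timedelta


def seconds_between(start, end):
    """Seconds from start to end, both given as HH:MM:SS.

    Assumes end falls less than 24 hours after start, so a negative
    difference means the clock rolled over midnight exactly once.
    """
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return int(delta.total_seconds())


# 0% and 1% timestamps for project:7809 run:6 clone:300 gen:9
per_percent = seconds_between("16:12:09", "16:51:50")

# Extrapolate one percent to the whole work unit (a rough estimate only).
estimated_total = timedelta(seconds=per_percent * 100)

print("seconds per percent:", per_percent)
print("estimated total:", estimated_total)
```

That works out to roughly 40 minutes per percent, or on the order of 66 hours for the whole unit on two cores, so a dump at upload throws away close to three days of work on that slot.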
Tobit
Posts: 342
Joined: Thu Apr 17, 2008 2:35 pm
Location: Manchester, NH USA

Re: Two Project: 7809 WUs dumped

Post by Tobit »

On this forum, I've seen multiple reports of people having this exact problem with 7809.
sortofageek
Site Admin
Posts: 3110
Joined: Fri Nov 30, 2007 8:06 pm
Location: Team Helix

Re: Two Project: 7809 WUs dumped

Post by sortofageek »

I see three recent topics which may or may not be presenting a pattern. This message is often an indication of marginally stable overclocking.

As always, the FAH project never recommends overclocking. If you do overclock, you may find that specific projects may stress your computer differently than other projects. If the WU runs successfully on a non-overclocked computer, fixing it is completely your own responsibility.
Here is what I am tracking:

So far, possibly related:
viewtopic.php?f=19&t=19960&p=198451#p198451
viewtopic.php?f=19&t=19953
viewtopic.php?f=19&t=19954


Server reports problem with 6900 unit, but two others completed it:
viewtopic.php?f=19&t=19916


All I know at this point is that the mod database confirms that neither SomeStones nor anyone else has received credit for those two WUs.
sortofageek
Site Admin
Posts: 3110
Joined: Fri Nov 30, 2007 8:06 pm
Location: Team Helix

Re: Two Project: 7809 WUs dumped

Post by sortofageek »

SomeStones wrote:Hi sortofageek. I'm not sure what your reply means. Does it mean that there are no records available concerning these two dumped units?

I notice that three of the top four "Issues with a specific WU" are related back to 7809. It is a pretty large unit, and the letdown when it fails is just as large. Is there anything we can do to help find a solution?

Thanks!

Hi SomeStones. My apologies that my message was not clear. Yes, it means there is no record so far of anyone being able to send back results for those two WUs. I just checked again and that is still the case.

Yes, we now have three topics with similar complaints about Project 7809. They were in the same thread and I split them into separate topics. We're watching to see if a pattern emerges.

The only thing I know you can do is to continue to monitor your folding and report problems just as you did here.
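
One way to make that monitoring less tedious is to scan the log for dumped units automatically. The Python sketch here assumes the log format shown in the first post: a "Sending unit results" line carrying the project/run/clone/gen, then "Unit NN: Upload complete", then the "Server responded" line for that same upload. That ordering is an assumption taken from the excerpt and may not hold when several uploads overlap. The function name find_dumped_units is hypothetical, not a client feature.

```python
import re

SEND_RE = re.compile(
    r"Sending unit results: id:(\d+) .*?project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)
UPLOAD_DONE_RE = re.compile(r"Unit (\d+): Upload complete")
RESPONSE_RE = re.compile(r"Server responded (\w+) \((\d+)\)")


def find_dumped_units(log_lines):
    """Return (project, run, clone, gen) tuples for uploads answered with WORK_QUIT."""
    sent = {}             # unit id -> (project, run, clone, gen) being uploaded
    last_uploaded = None  # unit id of the most recent "Upload complete" line
    dumped = []
    for line in log_lines:
        match = SEND_RE.search(line)
        if match:
            sent[match.group(1)] = tuple(int(x) for x in match.groups()[1:])
            continue
        match = UPLOAD_DONE_RE.search(line)
        if match:
            last_uploaded = match.group(1)
            continue
        match = RESPONSE_RE.search(line)
        # Assumption: the response belongs to the upload that just completed.
        if match and match.group(1) == "WORK_QUIT" and last_uploaded in sent:
            dumped.append(sent[last_uploaded])
    return dumped


# Sample input copied from the first post's excerpt (unit hashes omitted).
sample_log = [
    "16:12:02:Sending unit results: id:01 state:SEND project:7600 run:30 clone:51 gen:14 core:0xa4",
    "16:12:17:Unit 01: Upload complete",
    "16:12:17:Server responded WORK_ACK (400)",
    "10:32:58:Sending unit results: id:00 state:SEND project:7809 run:6 clone:300 gen:9 core:0xa4",
    "10:34:34:Unit 00: Upload complete",
    "10:34:34:Server responded WORK_QUIT (404)",
]

for project, run, clone, gen in find_dumped_units(sample_log):
    print("Dumped: project:%d run:%d clone:%d gen:%d" % (project, run, clone, gen))
```

The project/run/clone/gen values it prints are the same details a moderator needs to look a WU up in the database.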
sortofageek
Site Admin
Posts: 3110
Joined: Fri Nov 30, 2007 8:06 pm
Location: Team Helix

Re: Two Project: 7809 WUs dumped

Post by sortofageek »

I am now wondering if the problems we have been seeing might be related to the server upgrade. Please see this post: viewtopic.php?f=18&t=19982
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Two Project: 7809 WUs dumped

Post by 7im »

How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
SomeStones
Posts: 53
Joined: Wed Aug 26, 2009 12:10 am

Re: Two Project: 7809 WUs dumped

Post by SomeStones »

Thanks 7im, sortofageek, and schwancr.