Project: 6904 (Run 2, Clone 36, Gen 89) missing credit?

-alias-
Posts: 121
Joined: Sun Feb 22, 2009 1:20 pm

Project: 6904 (Run 2, Clone 36, Gen 89) missing credit?

Post by -alias- »

I delivered a P6904 WU, but the client was told that the server had already received it, so I got no credit or bonus, which would have totaled approximately 500K points.

Code:

[02:07:54] Project: 6904 (Run 2, Clone 36, Gen 89)
[02:07:54] 
[02:07:54] Assembly optimizations on if available.
[02:07:54] Entering M.D.
[02:08:01] Mapping NT from 24 to 24 
[02:08:06] Completed 0 out of 250000 steps  (0%)
[02:12:34] ng M.D.
[02:12:40] Using Gromacs checkpoints
[02:12:43] Mapping NT from 24 to 24 
[02:13:12] Resuming from checkpoint
[02:13:18] Verified work/wudata_00.log
[02:13:19] Verified work/wudata_00.trr
[02:13:19] Verified work/wudata_00.xtc
[02:13:19] Verified work/wudata_00.edr
[02:13:20] Completed 205 out of 250000 steps  (0%)
[02:35:11] - Autosending finished units... [March 8 02:35:11 UTC]
[02:35:11] Trying to send all finished work units
[02:35:11] + No unsent completed units remaining.
[02:35:11] - Autosend completed
[02:47:47] Completed 2500 out of 250000 steps  (1%)
[03:25:23] Completed 5000 out of 250000 steps  (2%)
[04:02:53] Completed 7500 out of 250000 steps  (3%)
[04:40:27] Completed 10000 out of 250000 steps  (4%)
[05:17:56] Completed 12500 out of 250000 steps  (5%)
***
[15:15:44] Completed 240000 out of 250000 steps (96%)
[15:51:31] Completed 242500 out of 250000 steps (97%)
[16:27:17] Completed 245000 out of 250000 steps (98%)
[17:03:03] Completed 247500 out of 250000 steps (99%)
[17:38:51] Completed 250000 out of 250000 steps (100%)
[17:39:10] DynamicWrapper: Finished Work Unit: sleep=10000
[17:39:20]
[17:39:20] Finished Work Unit:
[17:39:20] - Reading up to 121544064 from "work/wudata_00.trr": Read 121544064
[17:39:21] trr file hash check passed.
[17:39:21] - Reading up to 108757368 from "work/wudata_00.xtc": Read 108757368
[17:39:21] xtc file hash check passed.
[17:39:21] edr file hash check passed.
[17:39:21] logfile size: 277731
[17:39:21] Leaving Run
[17:39:24] - Writing 230752155 bytes of core data to disk...
[17:39:55] Done: 230751643 -> 222365552 (compressed to 3.3 percent)
[17:39:55] ... Done.
[17:40:11] - Shutting down core
[17:40:11]
[17:40:11] Folding@home Core Shutdown: FINISHED_UNIT
[17:40:13] CoreStatus = 64 (100)
[17:40:13] Unit 0 finished with 84 percent of time to deadline remaining.
[17:40:13] Updated performance fraction: 0.856052
[17:40:13] Sending work to server
[17:40:13] Project: 6904 (Run 2, Clone 36, Gen 89)
[17:40:13] + Attempting to send results [March 10 17:40:13 UTC]
[17:40:13] - Reading file work/wuresults_00.dat from core
[17:40:13] (Read 222366064 bytes from disk)
[17:40:13] Connecting to http://130.237.232.237:8080/
[17:46:11] Posted data.
[17:46:11] Initial: 0000; - Uploaded at ~606 kB/s
[17:46:11] - Averaged speed for that direction ~585 kB/s
[17:46:11] - Server has already received unit. 
It was a long time ago, but this has happened to me before, so maybe someone can look at what happened? I would like to have those roughly 500K points.
bruce
Posts: 20822
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 6904 (Run 2, Clone 36, Gen 89) missing credit?

Post by bruce »

The WU was uploaded prior to 2012-03-09 17:03:48 UTC and contained an error. It can't be uploaded again, so your upload between 17:40 and 17:46 was rejected even though the WU had been completed a second time without error.

Entered into logs at: 2012-03-10 01:03:48 PST
Hi -alias- (team 37651),
Your WU (P6904 R2 C36 G89) was added to the stats database on 2012-03-10 01:08:09 for 0 points of credit.
-alias-
Posts: 121
Joined: Sun Feb 22, 2009 1:20 pm

Re: Project: 6904 (Run 2, Clone 36, Gen 89) missing credit?

Post by -alias- »

I have a backup system, in case of hardware errors, that runs after every percent completed. If an error occurs, the client is restarted from the last error-free percent. What is wrong with this system? The work unit that was uploaded first had an error because it was not finished. The last one was uploaded error-free, but it was rejected because the first had an error. Anyway, if your system only counts the aborted WU, then I give up, but I think that my backup system is better.
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: Project: 6904 (Run 2, Clone 36, Gen 89) missing credit?

Post by Grandpa_01 »

-alias- wrote:I have a backup system, in case of hardware errors, that runs after every percent completed. If an error occurs, the client is restarted from the last error-free percent. What is wrong with this system? The work unit that was uploaded first had an error because it was not finished. The last one was uploaded error-free, but it was rejected because the first had an error. Anyway, if your system only counts the aborted WU, then I give up, but I think that my backup system is better.
Just a guess, but I think you are running a backup system because you are running on OCed hardware and occasionally have problems. That is OK, I do it too, but I also disconnect my rigs from the Internet when I am using the backup system, so that if they error out they will not send the WU and receive another one. When you errored out and sent the WU, you received another one; perhaps it was the same one, or maybe it was not. Regardless, you had to do something with the WU you received. What did you do with it? And by the way, a WU will not be sent before it finishes unless it errors or crashes.

Obviously you restarted the WU from the backup you had, so that the WU and the time you had put into folding it to that point would not be lost. So how is the system you are using better if it is not being used properly? You need to take one more step and disconnect from the Internet if you are going to use the system as it was meant to be used. I only use the backup system when I am working on an overclock and trying to get it folding-stable. If I crash a WU, it is my fault, not the WU's fault, so I accept the risk, knowing that the only way a WU will fail is if I have an unstable machine, and I do everything within my power to prevent that. So just take that one more step and the system you are using will be better for what you are doing. And if you are not OCing, then you have a hardware problem that needs to be fixed. :wink:
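For anyone who wants to script the routine described above, here is a minimal sketch, not a tested tool. It assumes the whole client folder (work/ directory, queue and log files) can simply be copied while the client is stopped; the function names and the folder layout are made up for illustration. It cannot disconnect the network for you, so that step stays manual.

```python
import shutil
import time
from pathlib import Path


def snapshot(client_dir: Path, backup_root: Path, label: str) -> Path:
    """Copy the whole client folder to backup_root/label.

    Assumption: the client is stopped while copying, so the files
    on disk are consistent with each other.
    """
    dest = backup_root / label
    if dest.exists():
        shutil.rmtree(dest)  # replace an older snapshot with the same label
    shutil.copytree(client_dir, dest)
    return dest


def restore(client_dir: Path, snapshot_dir: Path, keep_root: Path) -> Path:
    """Restore a snapshot, but set the current state aside first.

    The crashed state (its log and whatever WU is in it) is moved to
    keep_root instead of being overwritten, so it can still be inspected.
    Disconnect from the Internet BEFORE restarting the client on the
    restored folder; this function does not do that.
    """
    keep_root.mkdir(parents=True, exist_ok=True)
    kept = keep_root / time.strftime("crashed-%Y%m%d-%H%M%S")
    shutil.move(str(client_dir), str(kept))
    shutil.copytree(snapshot_dir, client_dir)
    return kept
```

Typical use would be snapshot(Path("fah"), Path("backups"), "p85") after stopping the client at 85%, and later restore(Path("fah"), Path("backups/p85"), Path("kept")) if the machine crashes. The paths here are placeholders.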
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
-alias-
Posts: 121
Joined: Sun Feb 22, 2009 1:20 pm

Re: Project: 6904 (Run 2, Clone 36, Gen 89) missing credit?

Post by -alias- »

Here is the WU before the troubled one

Code:

[01:58:14] + Attempting to send results [March 8 01:58:14 UTC]
[01:58:14] - Reading file work/wuresults_09.dat from core
[01:58:15]   (Read 222321075 bytes from disk)
[01:58:15] Connecting to http://130.237.232.237:8080/
[02:04:31] Posted data.
[02:04:31] Initial: 0000; - Uploaded at ~577 kB/s
[02:04:31] - Averaged speed for that direction ~580 kB/s
[02:04:31] + Results successfully sent
[02:04:31] Thank you for your contribution to Folding@Home.
[02:04:31] + Number of Units Completed: 25
It shows that this machine has completed 25 WUs, and below is the start of the troubled one.

Code:

[02:07:54] Project: 6904 (Run 2, Clone 36, Gen 89)
[02:07:54] 
[02:07:54] Assembly optimizations on if available.
[02:07:54] Entering M.D.
[02:08:01] Mapping NT from 24 to 24 
[02:08:06] Completed 0 out of 250000 steps  (0%)
[02:12:34] ng M.D.
[02:12:40] Using Gromacs checkpoints
[02:12:43] Mapping NT from 24 to 24 
[02:13:12] Resuming from checkpoint
[02:13:18] Verified work/wudata_00.log
[02:13:19] Verified work/wudata_00.trr
[02:13:19] Verified work/wudata_00.xtc
[02:13:19] Verified work/wudata_00.edr
[02:13:20] Completed 205 out of 250000 steps  (0%)
[02:35:11] - Autosending finished units... [March 8 02:35:11 UTC]
[02:35:11] Trying to send all finished work units
[02:35:11] + No unsent completed units remaining.
[02:35:11] - Autosend completed
[02:47:47] Completed 2500 out of 250000 steps  (1%)
[03:25:23] Completed 5000 out of 250000 steps  (2%)
[04:02:53] Completed 7500 out of 250000 steps  (3%)
[04:40:27] Completed 10000 out of 250000 steps  (4%)
[05:17:56] Completed 12500 out of 250000 steps  (5%)
[05:55:28] Completed 15000 out of 250000 steps  (6%)
[06:32:58] Completed 17500 out of 250000 steps  (7%)
[07:10:38] Completed 20000 out of 250000 steps  (8%)
[07:48:09] Completed 22500 out of 250000 steps  (9%)
[08:25:46] Completed 25000 out of 250000 steps  (10%)
[08:35:11] - Autosending finished units... [March 8 08:35:11 UTC]
[08:35:11] Trying to send all finished work units
[08:35:11] + No unsent completed units remaining.
[08:35:11] - Autosend completed
[09:03:23] Completed 27500 out of 250000 steps  (11%)
[09:41:01] Completed 30000 out of 250000 steps  (12%)
[10:18:35] Completed 32500 out of 250000 steps  (13%)
[10:56:12] Completed 35000 out of 250000 steps  (14%)
[11:33:42] Completed 37500 out of 250000 steps  (15%)
[12:11:17] Completed 40000 out of 250000 steps  (16%)
[12:48:46] Completed 42500 out of 250000 steps  (17%)
[13:26:19] Completed 45000 out of 250000 steps  (18%)
[14:03:48] Completed 47500 out of 250000 steps  (19%)
[14:35:11] - Autosending finished units... [March 8 14:35:11 UTC]
[14:35:11] Trying to send all finished work units
[14:35:11] + No unsent completed units remaining.
[14:35:11] - Autosend completed
[14:41:23] Completed 50000 out of 250000 steps  (20%)
[15:18:54] Completed 52500 out of 250000 steps  (21%)
[15:56:26] Completed 55000 out of 250000 steps  (22%)
[16:33:58] Completed 57500 out of 250000 steps  (23%)
[17:11:35] Completed 60000 out of 250000 steps  (24%)
[17:49:09] Completed 62500 out of 250000 steps  (25%)
[18:26:40] Completed 65000 out of 250000 steps  (26%)
[19:04:14] Completed 67500 out of 250000 steps  (27%)
[19:41:48] Completed 70000 out of 250000 steps  (28%)
[20:19:29] Completed 72500 out of 250000 steps  (29%)
[20:35:11] - Autosending finished units... [March 8 20:35:11 UTC]
[20:35:11] Trying to send all finished work units
[20:35:11] + No unsent completed units remaining.
[20:35:11] - Autosend completed
[20:57:03] Completed 75000 out of 250000 steps  (30%)
[21:34:44] Completed 77500 out of 250000 steps  (31%)
[22:12:21] Completed 80000 out of 250000 steps  (32%)
[22:50:00] Completed 82500 out of 250000 steps  (33%)
[23:27:36] Completed 85000 out of 250000 steps  (34%)
[00:05:14] Completed 87500 out of 250000 steps  (35%)
[00:42:47] Completed 90000 out of 250000 steps  (36%)
[01:20:22] Completed 92500 out of 250000 steps  (37%)
[01:57:59] Completed 95000 out of 250000 steps  (38%)
[02:35:11] - Autosending finished units... [March 9 02:35:11 UTC]
[02:35:11] Trying to send all finished work units
[02:35:11] + No unsent completed units remaining.
[02:35:11] - Autosend completed
[02:35:38] Completed 97500 out of 250000 steps  (39%)
[03:13:17] Completed 100000 out of 250000 steps  (40%)
[03:50:51] Completed 102500 out of 250000 steps  (41%)
[04:28:31] Completed 105000 out of 250000 steps  (42%)
[05:06:05] Completed 107500 out of 250000 steps  (43%)
[05:43:44] Completed 110000 out of 250000 steps  (44%)
[06:21:20] Completed 112500 out of 250000 steps  (45%)
[06:59:31] Completed 115000 out of 250000 steps  (46%)
[07:37:08] Completed 117500 out of 250000 steps  (47%)
[08:14:38] Completed 120000 out of 250000 steps  (48%)
[08:35:11] - Autosending finished units... [March 9 08:35:11 UTC]
[08:35:11] Trying to send all finished work units
[08:35:11] + No unsent completed units remaining.
[08:35:11] - Autosend completed
[08:52:18] Completed 122500 out of 250000 steps  (49%)
[09:29:53] Completed 125000 out of 250000 steps  (50%)
[10:07:33] Completed 127500 out of 250000 steps  (51%)
[10:45:12] Completed 130000 out of 250000 steps  (52%)
[11:22:52] Completed 132500 out of 250000 steps  (53%)
[12:00:31] Completed 135000 out of 250000 steps  (54%)
[12:38:16] Completed 137500 out of 250000 steps  (55%)
[13:15:59] Completed 140000 out of 250000 steps  (56%)
[13:53:45] Completed 142500 out of 250000 steps  (57%)
[14:31:29] Completed 145000 out of 250000 steps  (58%)
[14:35:11] - Autosending finished units... [March 9 14:35:11 UTC]
[14:35:11] Trying to send all finished work units
[14:35:11] + No unsent completed units remaining.
[14:35:11] - Autosend completed
[15:09:10] Completed 147500 out of 250000 steps  (59%)
[15:46:56] Completed 150000 out of 250000 steps  (60%)
[16:24:35] Completed 152500 out of 250000 steps  (61%)
[17:02:20] Completed 155000 out of 250000 steps  (62%)
[17:39:57] Completed 157500 out of 250000 steps  (63%)
[18:17:39] Completed 160000 out of 250000 steps  (64%)
[18:55:21] Completed 162500 out of 250000 steps  (65%)
[19:33:02] Completed 165000 out of 250000 steps  (66%)
[20:10:48] Completed 167500 out of 250000 steps  (67%)
[20:35:11] - Autosending finished units... [March 9 20:35:11 UTC]
[20:35:11] Trying to send all finished work units
[20:35:11] + No unsent completed units remaining.
[20:35:11] - Autosend completed
[20:48:33] Completed 170000 out of 250000 steps  (68%)
[21:26:25] Completed 172500 out of 250000 steps  (69%)
[22:04:21] Completed 175000 out of 250000 steps  (70%)
[22:42:11] Completed 177500 out of 250000 steps  (71%)
[23:20:05] Completed 180000 out of 250000 steps  (72%)
[23:57:56] Completed 182500 out of 250000 steps  (73%)
[00:35:51] Completed 185000 out of 250000 steps  (74%)
[01:13:51] Completed 187500 out of 250000 steps  (75%)
[01:51:48] Completed 190000 out of 250000 steps  (76%)
[02:29:48] Completed 192500 out of 250000 steps  (77%)
[02:35:11] - Autosending finished units... [March 10 02:35:11 UTC]
[02:35:11] Trying to send all finished work units
[02:35:11] + No unsent completed units remaining.
[02:35:11] - Autosend completed
[03:07:47] Completed 195000 out of 250000 steps  (78%)
[03:45:41] Completed 197500 out of 250000 steps  (79%)
[04:23:36] Completed 200000 out of 250000 steps  (80%)
[05:01:31] Completed 202500 out of 250000 steps  (81%)
[05:39:23] Completed 205000 out of 250000 steps  (82%)
[06:17:13] Completed 207500 out of 250000 steps  (83%)
[06:55:36] Completed 210000 out of 250000 steps  (84%)
[07:33:24] Completed 212500 out of 250000 steps  (85%)
Here I stopped the machine to do some maintenance; the backup was taken when 85% was done.
Starting the machine again:

Code:

--- Opening Log file [March 10 08:55:58 UTC] 


# Linux SMP Console Edition ###################################################
###############################################################################

                       Folding@Home Client Version 6.34

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/vidar/fah
Executable: ./fah6
Arguments: -smp 24 -bigadv -verbosity 9 

[08:55:58] - Ask before connecting: No
[08:55:58] - User name: -alias- (Team 37651)
[08:55:58] - User ID: 6796771C5F8BE568
[08:55:58] - Machine ID: 2
[08:55:58] 
[08:55:58] Loaded queue successfully.
[08:55:58] 
[08:55:58] + Processing work unit
[08:55:58] Core required: FahCore_a5.exe
[08:55:58] Core found.
[08:55:58] - Autosending finished units... [March 10 08:55:58 UTC]
[08:55:58] Trying to send all finished work units
[08:55:58] + No unsent completed units remaining.
[08:55:58] - Autosend completed
[08:55:58] Working on queue slot 00 [March 10 08:55:58 UTC]
[08:55:58] + Working ...
[08:55:58] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 00 -np 24 -checkpoint 3 -verbose -lifeline 2130 -version 634'

[08:55:58] 
[08:55:58] *------------------------------*
[08:55:58] Folding@Home Gromacs SMP Core
[08:55:58] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[08:55:58] 
[08:55:58] Preparing to commence simulation
[08:55:58] - Ensuring status. Please wait.
[08:56:08] - Looking at optimizations...
[08:56:08] - Working with standard loops on this execution.
[08:56:08] - Previous termination of core was improper.
[08:56:08] - Going to use standard loops.
[08:56:08] - Files status OK
[08:56:11] - Expanded 57210974 -> 71843392 (decompressed 50.5 percent)
[08:56:11] Called DecompressByteArray: compressed_data_size=57210974 data_size=71843392, decompressed_data_size=71843392 diff=0
[08:56:12] - Digital signature verified
[08:56:12] 
[08:56:12] Project: 6904 (Run 2, Clone 36, Gen 89)
[08:56:12] 
[08:56:12] Entering M.D.
[08:56:18] Using Gromacs checkpoints
[08:56:21] Mapping NT from 24 to 24 
[08:56:28] Resuming from checkpoint
[08:56:41] Verified work/wudata_00.log
[08:56:42] Verified work/wudata_00.trr
[08:56:43] Verified work/wudata_00.xtc
[08:56:43] Verified work/wudata_00.edr
[08:56:44] Completed 213530 out of 250000 steps  (85%)
[09:17:45] Completed 215000 out of 250000 steps  (86%)
[09:53:30] Completed 217500 out of 250000 steps  (87%)
[10:29:17] Completed 220000 out of 250000 steps  (88%)
[11:05:07] Completed 222500 out of 250000 steps  (89%)
[11:40:56] Completed 225000 out of 250000 steps  (90%)
[12:16:46] Completed 227500 out of 250000 steps  (91%)
[12:52:34] Completed 230000 out of 250000 steps  (92%)
[13:28:22] Completed 232500 out of 250000 steps  (93%)
[14:04:08] Completed 235000 out of 250000 steps  (94%)
[14:39:56] Completed 237500 out of 250000 steps  (95%)
[14:55:58] - Autosending finished units... [March 10 14:55:58 UTC]
[14:55:58] Trying to send all finished work units
[14:55:58] + No unsent completed units remaining.
[14:55:58] - Autosend completed
[15:15:44] Completed 240000 out of 250000 steps  (96%)
[15:51:31] Completed 242500 out of 250000 steps  (97%)
[16:27:17] Completed 245000 out of 250000 steps  (98%)
[17:03:03] Completed 247500 out of 250000 steps  (99%)
[17:38:51] Completed 250000 out of 250000 steps  (100%)
[17:39:10] DynamicWrapper: Finished Work Unit: sleep=10000
[17:39:20] 
[17:39:20] Finished Work Unit:
[17:39:20] - Reading up to 121544064 from "work/wudata_00.trr": Read 121544064
[17:39:21] trr file hash check passed.
[17:39:21] - Reading up to 108757368 from "work/wudata_00.xtc": Read 108757368
[17:39:21] xtc file hash check passed.
[17:39:21] edr file hash check passed.
[17:39:21] logfile size: 277731
[17:39:21] Leaving Run
[17:39:24] - Writing 230752155 bytes of core data to disk...
[17:39:55] Done: 230751643 -> 222365552 (compressed to 3.3 percent)
[17:39:55]   ... Done.
[17:40:11] - Shutting down core
[17:40:11] 
[17:40:11] Folding@home Core Shutdown: FINISHED_UNIT
[17:40:13] CoreStatus = 64 (100)
[17:40:13] Unit 0 finished with 84 percent of time to deadline remaining.
[17:40:13] Updated performance fraction: 0.856052
[17:40:13] Sending work to server
[17:40:13] Project: 6904 (Run 2, Clone 36, Gen 89)


[17:40:13] + Attempting to send results [March 10 17:40:13 UTC]
[17:40:13] - Reading file work/wuresults_00.dat from core
[17:40:13]   (Read 222366064 bytes from disk)
[17:40:13] Connecting to http://130.237.232.237:8080/
[17:46:11] Posted data.
[17:46:11] Initial: 0000; - Uploaded at ~606 kB/s
[17:46:11] - Averaged speed for that direction ~585 kB/s
[17:46:11] - Server has already received unit.
[17:46:11] Trying to send all finished work units
[17:46:11] + No unsent completed units remaining.
[17:46:11] - Preparing to get new work unit...
[17:46:11] Cleaning up work directory
The machine was started between the stop and the start shown above, but it crashed for some reason. I then stopped it manually and restarted it from the first backup, and things went back to normal. Unfortunately I did not keep the log from the crash, so I do not know exactly what happened. The machine is based on the EVGA SR-2 with 2 x Intel 5645E, easily overclocked to 3.6 GHz, and it has been running stable since last January, when it was set to fold for F@h. Since I did not keep the log that contains the crash, I cannot argue about the details, so I'll just have to accept that the credit disappeared for some reason, but it does not feel good. :mrgreen:
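To show where the missing piece of log sits, here is a rough sketch that scans pasted log lines for unusually long silences between timestamps. It assumes only the [HH:MM:SS] prefix seen in the logs above; the function name and the 45-minute threshold (a bit more than the roughly 38 minutes per percent on this WU) are my own choices. Because the prefix has no date, the difference is taken modulo 24 hours, so a silence longer than a day would be misread.

```python
import re
from datetime import timedelta

# Matches the "[HH:MM:SS]" prefix used by the client log lines above.
STAMP_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]")


def find_gaps(lines, threshold=timedelta(minutes=45)):
    """Return (previous_line, next_line, gap) for every silence > threshold."""
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        m = STAMP_RE.match(line)
        if not m:
            continue  # banner lines and blank lines carry no timestamp
        h, mi, s = map(int, m.groups())
        now = timedelta(hours=h, minutes=mi, seconds=s)
        if prev_time is not None:
            # Modulo one day handles the rollover from 23:xx to 00:xx.
            gap = (now - prev_time) % timedelta(days=1)
            if gap > threshold:
                gaps.append((prev_line, line, gap))
        prev_time, prev_line = now, line
    return gaps
```

Run over the two segments posted above, the only long silence it should report is the one between the 85% line at 07:33:24 and the restart at 08:55:58, which is the window the crash log would have covered.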
bruce
Posts: 20822
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 6904 (Run 2, Clone 36, Gen 89) missing credit?

Post by bruce »

-alias- wrote:The machine was started between the stop and the start shown above, but it crashed for some reason. I then stopped it manually and restarted it from the first backup, and things went back to normal. Unfortunately I did not keep the log from the crash, so I do not know exactly what happened. The machine is based on the EVGA SR-2 with 2 x Intel 5645E, easily overclocked to 3.6 GHz, and it has been running stable since last January, when it was set to fold for F@h. Since I did not keep the log that contains the crash, I cannot argue about the details, so I'll just have to accept that the credit disappeared for some reason, but it does not feel good. :mrgreen:
The log of the time when the machine crashed (between the log segments that you did post) is exactly what Grandpa_01 is talking about. During that time, the WU had an error, the error report was uploaded, and a new WU was downloaded. He's asking what happened to the new WU that was downloaded before you stopped it manually. You can't erase events that actually happened, even if you didn't keep the log from the crash. Saying it "crashed for some reason" doesn't give anybody the ability to help figure out WHY it crashed and exactly what happened.

If it ever happens again, keep the other log, and the other WU too. This will also answer the questions about why it crashed and about what WU was downloaded and then overwritten when you restored the backup.

Think of it this way: From the FAH server perspective you returned a partial WU and then you dumped (overwrote) the next WU that was assigned to you. That's the WU that didn't get credit. That's the WU that the server had assigned to you when you restored the backup, not the older one. It will eventually expire but the server will wait that long before assigning it to someone else.
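If you do keep the logs next time, a small script can pull out how each upload ended, which makes this kind of report easier to sort out. The sketch below is only an illustration: it relies on three line patterns that appear in the logs posted in this thread (the "Project:" line, "+ Results successfully sent" and "- Server has already received unit."), and the function name and the "accepted"/"duplicate" labels are invented here. Other client versions may word these lines differently.

```python
import re

# "Project: 6904 (Run 2, Clone 36, Gen 89)" as printed by the client.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")

# Log text -> label. The labels are made up for this sketch.
OUTCOMES = {
    "+ Results successfully sent": "accepted",
    "- Server has already received unit.": "duplicate",
}


def upload_outcomes(log_text):
    """Return [((project, run, clone, gen) or None, label), ...] in log order.

    The WU identity is taken from the most recent "Project:" line seen
    before the outcome line; it is None if no such line came first.
    """
    current = None
    results = []
    for line in log_text.splitlines():
        m = PROJECT_RE.search(line)
        if m:
            current = tuple(int(g) for g in m.groups())
            continue
        for marker, label in OUTCOMES.items():
            if marker in line:
                results.append((current, label))
    return results
```

On the last segment posted above this would report P6904 R2 C36 G89 as "duplicate", which matches the server's view that it had already received that WU.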