Project: 2662 (Run 2, Clone 207, Gen 1) - Unable to send

Moderators: Site Moderators, FAHC Science Team

Ragnar Dan
Posts: 52
Joined: Fri Dec 07, 2007 3:21 am
Location: U.S. (TechReport.com's Team 2630)

Project: 2662 (Run 2, Clone 207, Gen 1) - Unable to send

Post by Ragnar Dan »

I've gotten that error for two 2662 WU's, in fact. I've been trying to upload the first one since last Friday, August 1:

Code:

[23:30:16] - Connecting to assignment server
[23:30:16] Connecting to http://assign.stanford.edu:8080/
[23:30:16] Posted data.
[23:30:16] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[23:30:16] + News From Folding@Home: Welcome to Folding@Home
[23:30:16] Loaded queue successfully.
[23:30:16] Connecting to http://171.64.65.56:8080/
[23:30:21] Posted data.
[23:30:21] Initial: 0000; - Receiving payload (expected size: 4917681)
[23:30:41] - Downloaded at ~240 kB/s
[23:30:41] - Averaged speed for that direction ~427 kB/s
[23:30:41] + Received work.
[23:30:41] Trying to send all finished work units
[23:30:41] + No unsent completed units remaining.
[23:30:41] + Closed connections
[23:30:41] 
[23:30:41] + Processing work unit
[23:30:41] Core required: FahCore_a2.exe
[23:30:41] Core found.
[23:30:41] Working on Unit 03 [July 30 23:30:41]
[23:30:41] + Working ...
[23:30:41] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 03 -priority 96 -checkpoint 5 -forceasm -verbose -lifeline 5406 -version 602'

[23:30:41] 
[23:30:41] *------------------------------*
[23:30:41] Folding@Home Gromacs SMP Core
[23:30:41] Version 1.91 (2007)
[23:30:41] 
[23:30:41] Preparing to commence simulation
[23:30:41] - Ensuring status. Please wait.
[23:30:58] - Assembly optimizations manually forced on.
[23:30:58] - Not checking prior termination.
[23:30:58] Error: Work unit read from disk is invalid
[23:30:58] Finalizing output
[23:31:02] - Expanded 4917169 -> 24360573 (decompressed 495.4 percent)
[23:31:02] 
[23:31:02] Project: 2662 (Run 2, Clone 207, Gen 1)
[23:31:02] 
[23:31:03] Assembly optimizations on if available.
[23:31:03] Entering M.D.
[23:31:13] Completed 0 out of 250000 steps  (0%)

[...]

[20:53:53] Completed 248750 out of 250000 steps  (100%)
[20:59:00] Timer requesting checkpoint
[21:04:02] Timer requesting checkpoint
[21:09:05] Timer requesting checkpoint
[21:09:15] 
[21:09:15] Finished Work Unit:
[21:09:15] - Reading up to 21310708 from "work/wudata_03.trr": Read 21310708
[21:09:16] - Reading up to 4722492 from "work/wudata_03.xtc": Read 4722492
[21:09:17] logfile size: 160672
[21:09:17] Leaving Run
[21:09:18] - Writing 26375780 bytes of core data to disk...
[21:09:18]   ... Done.
[21:09:26] - Shutting down core
[21:11:26] 
[21:11:26] Folding@home Core Shutdown: FINISHED_UNIT
[21:14:40] CoreStatus = 64 (100)
[21:14:40] Unit 3 finished with 36 percent of time to deadline remaining.
[21:14:40] Updated performance fraction: 0.545557
[21:14:40] Sending work to server


[21:14:40] + Attempting to send results
[21:14:40] - Reading file work/wuresults_03.dat from core
[21:14:40]   (Read 26375780 bytes from disk)
[21:14:40] Connecting to http://171.64.65.56:8080/
[21:14:55] - Couldn't send HTTP request to server
[21:14:55] + Could not connect to Work Server (results)
[21:14:55]     (171.64.65.56:8080)
[21:14:55] - Error: Could not transmit unit 03 (completed August 1) to work server.
[21:14:55] - 1 failed uploads of this unit.
[21:14:55]   Keeping unit 03 in queue.
[21:14:55] Trying to send all finished work units


[21:14:55] + Attempting to send results
[21:14:55] - Reading file work/wuresults_03.dat from core
[21:14:55]   (Read 26375780 bytes from disk)
[21:14:55] Connecting to http://171.64.65.56:8080/
[21:15:11] - Couldn't send HTTP request to server
[21:15:11] + Could not connect to Work Server (results)
[21:15:11]     (171.64.65.56:8080)
[21:15:11] - Error: Could not transmit unit 03 (completed August 1) to work server.
[21:15:11] - 2 failed uploads of this unit.


[21:15:11] + Attempting to send results
[21:15:11] - Reading file work/wuresults_03.dat from core
[21:15:11]   (Read 26375780 bytes from disk)
[21:15:11] Connecting to http://171.64.122.86:8080/
[21:18:21] Posted data.
[21:18:21] Initial: 0000; - Uploaded at ~135 kB/s
[21:18:21] - Averaged speed for that direction ~159 kB/s
[21:18:21] - Server does not have record of this unit. Will try again later.
[21:18:21]   Could not transmit unit 03 to Collection server; keeping in queue.
[21:18:21] + Sent 0 of 1 completed units to the server
[21:18:21] - Preparing to get new work unit...
[21:18:21] + Attempting to get work packet
[21:18:21] - Will indicate memory of 250 MB
[21:18:21] - Connecting to assignment server
[21:18:21] Connecting to http://assign.stanford.edu:8080/
[21:18:22] Posted data.
[21:18:22] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[21:18:22] + News From Folding@Home: Welcome to Folding@Home
[21:18:22] Loaded queue successfully.
[21:18:22] Connecting to http://171.64.65.56:8080/
[21:18:27] Posted data.
[21:18:27] Initial: 0000; - Receiving payload (expected size: 4922291)
[21:18:41] - Downloaded at ~343 kB/s
[21:18:41] - Averaged speed for that direction ~410 kB/s
[21:18:41] + Received work.
[21:18:41] Trying to send all finished work units


[21:18:41] + Attempting to send results
[21:18:41] - Reading file work/wuresults_03.dat from core
[21:18:41]   (Read 26375780 bytes from disk)
[21:18:41] Connecting to http://171.64.65.56:8080/
[21:18:56] - Couldn't send HTTP request to server
[21:18:56] + Could not connect to Work Server (results)
[21:18:56]     (171.64.65.56:8080)
[21:18:56] - Error: Could not transmit unit 03 (completed August 1) to work server.
[21:18:56] - 3 failed uploads of this unit.
That WU says it expired Saturday night. If the server-side problems hadn't existed, and if the client didn't wait an overly long 6 hours between upload attempts, I tend to think it might not have expired without value.
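The "36 percent of time to deadline remaining" figure can be checked by hand against the timestamps in the logs. The sketch below is only an illustration: it assumes the year is 2008 (the log lines carry no year), treats every time as UTC, and assumes the client measures the window from the moment it started working on the unit, which the log does not confirm.

```python
from datetime import datetime

# Timestamps copied from the log lines; the year 2008 is an assumption.
started = datetime(2008, 7, 30, 23, 30, 41)   # "Working on Unit 03 [July 30 23:30:41]"
finished = datetime(2008, 8, 1, 21, 14, 40)   # "Unit 3 finished with 36 percent ..."
deadline = datetime(2008, 8, 2, 23, 30, 0)    # "Unit 3's deadline (August 2 23:30)"


def percent_remaining(start, end, limit):
    """Share of the start-to-limit window still left at time `end`, in percent."""
    left = (limit - end).total_seconds()
    window = (limit - start).total_seconds()
    return 100.0 * left / window


def hours_between(a, b):
    """Elapsed time from a to b in hours."""
    return (b - a).total_seconds() / 3600.0


remaining_pct = percent_remaining(started, finished, deadline)
slack_hours = hours_between(finished, deadline)
print(f"{remaining_pct:.1f}% of the window left, {slack_hours:.1f} h of slack")
```

Under those assumptions the result is about 36.5%, in line with the client's "36 percent", and leaves roughly 26 hours of slack. With about six hours between autosend runs, that is room for only four scheduled retries before the deadline.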

The next WU the client downloaded was also a 2662 and had similar results. It was a...
Project: 2662 (Run 2, Clone 11, Gen 16)

It is about to expire, too, and I see no evidence that the powers that be are aware of this problem, beyond talk about deleting FahCore_a2.exe and watching how things improve. I appear to be the only person on my team who has even seen any posts here about that. I would have expected a large public announcement, at minimum on the Announcements subforum and on Dr. Pande's TypePad blog, and moderators who are members of various teams' forums might have wanted to mention it there as well.

Here's the next WU's edited log, which starts immediately after the previous one:

Code:

[21:18:56] + Attempting to send results
[21:18:56] - Reading file work/wuresults_03.dat from core
[21:18:56]   (Read 26375780 bytes from disk)
[21:18:56] Connecting to http://171.64.122.86:8080/
[21:22:53] Posted data.
[21:22:53] Initial: 0000; - Uploaded at ~108 kB/s
[21:22:53] - Averaged speed for that direction ~149 kB/s
[21:22:53] - Server does not have record of this unit. Will try again later.
[21:22:53]   Could not transmit unit 03 to Collection server; keeping in queue.
[21:22:53] + Sent 0 of 1 completed units to the server
[21:22:53] + Closed connections
[21:22:53] 
[21:22:53] + Processing work unit
[21:22:53] Core required: FahCore_a2.exe
[21:22:53] Core found.
[21:22:53] Working on Unit 04 [August 1 21:22:53]
[21:22:53] + Working ...
[21:22:53] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 04 -priority 96 -checkpoint 5 -forceasm -verbose -lifeline 5406 -version 602'

[21:22:54] 
[21:22:54] *------------------------------*
[21:22:54] Folding@Home Gromacs SMP Core
[21:22:54] Version 1.91 (2007)
[21:22:54] 
[21:22:54] Preparing to commence simulation
[21:22:54] - Ensuring status. Please wait.
[21:22:55] 
[21:22:56] Project: 2662 (Run 2, Clone 11, Gen 16)
[21:22:56] 
[21:22:56] Assembly optimizations on if available.
[21:22:56] Entering M.D.
[21:23:13]  on if available.
[21:23:13] Entering M.D.
[21:23:23] Completed 0 out of 250000 steps  (0%)
Here's more of the upload problem, followed by another 2662 WU that the client downloaded. That WU eventually got deleted after I downloaded the new 2.00 FahCore_a2.exe, when the client couldn't figure out what to do with the data, queue, or something else apparently left over from the older version:

Code:

[02:20:00] - Autosending finished units...
[02:20:00] Trying to send all finished work units


[02:20:00] + Attempting to send results
[02:20:00] - Reading file work/wuresults_03.dat from core
[02:20:02]   (Read 26375780 bytes from disk)
[02:20:02] Connecting to http://171.64.65.56:8080/
[02:20:17] - Couldn't send HTTP request to server
[02:20:17] + Could not connect to Work Server (results)
[02:20:17]     (171.64.65.56:8080)
[02:20:17] - Error: Could not transmit unit 03 (completed August 1) to work server.
[02:20:17] - 4 failed uploads of this unit.


[02:20:17] + Attempting to send results
[02:20:17] - Reading file work/wuresults_03.dat from core
[02:20:18]   (Read 26375780 bytes from disk)
[02:20:18] Connecting to http://171.64.122.86:8080/
[02:20:18] - Couldn't send HTTP request to server
[02:20:18] + Could not connect to Work Server (results)
[02:20:18]     (171.64.122.86:8080)
[02:20:18]   Could not transmit unit 03 to Collection server; keeping in queue.
[02:20:19] + Sent 0 of 1 completed units to the server
[02:20:19] - Autosend completed

[...]

[08:24:25] - Autosending finished units...
[08:24:25] Trying to send all finished work units


[08:24:25] + Attempting to send results
[08:24:25] - Reading file work/wuresults_03.dat from core
[08:24:26]   (Read 26375780 bytes from disk)
[08:24:26] Connecting to http://171.64.65.56:8080/
[08:24:38] - Couldn't send HTTP request to server
[08:24:38] + Could not connect to Work Server (results)
[08:24:38]     (171.64.65.56:8080)
[08:24:38] - Error: Could not transmit unit 03 (completed August 1) to work server.
[08:24:38] - 5 failed uploads of this unit.


[08:24:38] + Attempting to send results
[08:24:38] - Reading file work/wuresults_03.dat from core
[08:24:38]   (Read 26375780 bytes from disk)
[08:24:38] Connecting to http://171.64.122.86:8080/
[08:24:39] - Couldn't send HTTP request to server
[08:24:39] + Could not connect to Work Server (results)
[08:24:39]     (171.64.122.86:8080)
[08:24:39]   Could not transmit unit 03 to Collection server; keeping in queue.
[08:24:39] + Sent 0 of 1 completed units to the server
[08:24:39] - Autosend completed

[...]

[14:24:39] - Autosending finished units...
[14:24:39] Trying to send all finished work units


[14:24:39] + Attempting to send results
[14:24:39] - Reading file work/wuresults_03.dat from core
[14:24:41]   (Read 26375780 bytes from disk)
[14:24:41] Connecting to http://171.64.65.56:8080/
[14:24:51] Timer requesting checkpoint
[14:24:53] - Couldn't send HTTP request to server
[14:24:53] + Could not connect to Work Server (results)
[14:24:53]     (171.64.65.56:8080)
[14:24:53] - Error: Could not transmit unit 03 (completed August 1) to work server.
[14:24:53] - 6 failed uploads of this unit.


[14:24:53] + Attempting to send results
[14:24:53] - Reading file work/wuresults_03.dat from core
[14:24:53]   (Read 26375780 bytes from disk)
[14:24:53] Connecting to http://171.64.122.86:8080/
[14:27:37] Completed 88750 out of 250000 steps  (36%)
[14:28:02] Posted data.
[14:28:02] Initial: 0000; - Uploaded at ~136 kB/s
[14:28:02] - Averaged speed for that direction ~147 kB/s
[14:28:02] - Server does not have record of this unit. Will try again later.
[14:28:02]   Could not transmit unit 03 to Collection server; keeping in queue.
[14:28:02] + Sent 0 of 1 completed units to the server
[14:28:02] - Autosend completed

[...]

[20:28:02] - Autosending finished units...
[20:28:02] Trying to send all finished work units


[20:28:02] + Attempting to send results
[20:28:02] - Reading file work/wuresults_03.dat from core
[20:28:02]   (Read 26375780 bytes from disk)
[20:28:02] Connecting to http://171.64.65.56:8080/
[20:28:18] - Couldn't send HTTP request to server
[20:28:18] + Could not connect to Work Server (results)
[20:28:18]     (171.64.65.56:8080)
[20:28:18] - Error: Could not transmit unit 03 (completed August 1) to work server.
[20:28:18] - 7 failed uploads of this unit.


[20:28:18] + Attempting to send results
[20:28:18] - Reading file work/wuresults_03.dat from core
[20:28:18]   (Read 26375780 bytes from disk)
[20:28:18] Connecting to http://171.64.122.86:8080/
[20:28:20] - Couldn't send HTTP request to server
[20:28:20] + Could not connect to Work Server (results)
[20:28:20]     (171.64.122.86:8080)
[20:28:20]   Could not transmit unit 03 to Collection server; keeping in queue.
[20:28:20] + Sent 0 of 1 completed units to the server
[20:28:20] - Autosend completed

[...]

[23:23:24] Timer requesting checkpoint
[23:28:26] Timer requesting checkpoint
[23:33:27] Timer requesting checkpoint
[23:38:30] Timer requesting checkpoint
[23:43:32] Timer requesting checkpoint

[...]

[02:26:06] Timer requesting checkpoint
[02:28:20] Unit 3's deadline (August 2 23:30) has passed.
[02:28:20] - Autosending finished units...
[02:28:20] Trying to send all finished work units


[02:28:20] + Attempting to send results
[02:28:20] - Reading file work/wuresults_03.dat from core
[02:28:20]   (Read 26375780 bytes from disk)
[02:28:20] Connecting to http://171.64.65.56:8080/
[02:28:32] - Couldn't send HTTP request to server
[02:28:32] + Could not connect to Work Server (results)
[02:28:32]     (171.64.65.56:8080)
[02:28:32] - Error: Could not transmit unit 03 (completed August 1) to work server.
[02:28:32] - 8 failed uploads of this unit.


[02:28:32] + Attempting to send results
[02:28:32] - Reading file work/wuresults_03.dat from core
[02:28:32]   (Read 26375780 bytes from disk)
[02:28:32] Connecting to http://171.64.122.86:8080/
[02:28:33] - Couldn't send HTTP request to server
[02:28:33] + Could not connect to Work Server (results)
[02:28:33]     (171.64.122.86:8080)
[02:28:33]   Could not transmit unit 03 to Collection server; keeping in queue.
[02:28:33] + Sent 0 of 1 completed units to the server
[02:28:33] - Autosend completed

[...]

[08:28:33] Unit 3's deadline (August 2 23:30) has passed.
[08:28:33] - Autosending finished units...
[08:28:33] Trying to send all finished work units


[08:28:33] + Attempting to send results
[08:28:33] - Reading file work/wuresults_03.dat from core
[08:28:33]   (Read 26375780 bytes from disk)
[08:28:33] Connecting to http://171.64.65.56:8080/
[08:28:44] - Couldn't send HTTP request to server
[08:28:44] + Could not connect to Work Server (results)
[08:28:44]     (171.64.65.56:8080)
[08:28:44] - Error: Could not transmit unit 03 (completed August 1) to work server.
[08:28:44] - 9 failed uploads of this unit.


[08:28:44] + Attempting to send results
[08:28:44] - Reading file work/wuresults_03.dat from core
[08:28:45]   (Read 26375780 bytes from disk)
[08:28:45] Connecting to http://171.64.122.86:8080/
[08:28:46] - Couldn't send HTTP request to server
[08:28:46] + Could not connect to Work Server (results)
[08:28:46]     (171.64.122.86:8080)
[08:28:46]   Could not transmit unit 03 to Collection server; keeping in queue.
[08:28:46] + Sent 0 of 1 completed units to the server
[08:28:46] - Autosend completed

[...]

[14:28:46] Unit 3's deadline (August 2 23:30) has passed.
[14:28:46] - Autosending finished units...
[14:28:46] Trying to send all finished work units


[14:28:46] + Attempting to send results
[14:28:46] - Reading file work/wuresults_03.dat from core
[14:28:48]   (Read 26375780 bytes from disk)
[14:28:48] Connecting to http://171.64.65.56:8080/
[14:29:01] - Couldn't send HTTP request to server
[14:29:01] + Could not connect to Work Server (results)
[14:29:01]     (171.64.65.56:8080)
[14:29:01] - Error: Could not transmit unit 03 (completed August 1) to work server.
[14:29:01] - 10 failed uploads of this unit.


[14:29:01] + Attempting to send results
[14:29:01] - Reading file work/wuresults_03.dat from core
[14:29:01]   (Read 26375780 bytes from disk)
[14:29:01] Connecting to http://171.64.122.86:8080/
[14:29:02] - Couldn't send HTTP request to server
[14:29:02] + Could not connect to Work Server (results)
[14:29:02]     (171.64.122.86:8080)
[14:29:02]   Could not transmit unit 03 to Collection server; keeping in queue.
[14:29:02] + Sent 0 of 1 completed units to the server
[14:29:02] - Autosend completed

[...]

[20:13:16] Timer requesting checkpoint
[20:15:59] Completed 248750 out of 250000 steps  (100%)
[20:21:05] Timer requesting checkpoint
[20:26:05] Timer requesting checkpoint
[20:29:02] Unit 3's deadline (August 2 23:30) has passed.
[20:29:02] - Autosending finished units...
[20:29:02] Trying to send all finished work units


[20:29:02] + Attempting to send results
[20:29:02] - Reading file work/wuresults_03.dat from core
[20:29:02]   (Read 26375780 bytes from disk)
[20:29:02] Connecting to http://171.64.65.56:8080/
[20:29:16] - Couldn't send HTTP request to server
[20:29:16] + Could not connect to Work Server (results)
[20:29:16]     (171.64.65.56:8080)
[20:29:16] - Error: Could not transmit unit 03 (completed August 1) to work server.
[20:29:16] - 11 failed uploads of this unit.


[20:29:16] + Attempting to send results
[20:29:16] - Reading file work/wuresults_03.dat from core
[20:29:16]   (Read 26375780 bytes from disk)
[20:29:16] Connecting to http://171.64.122.86:8080/
[20:29:17] - Couldn't send HTTP request to server
[20:29:17] + Could not connect to Work Server (results)
[20:29:17]     (171.64.122.86:8080)
[20:29:17]   Could not transmit unit 03 to Collection server; keeping in queue.
[20:29:17] + Sent 0 of 1 completed units to the server
[20:29:17] - Autosend completed
[20:31:00] 
[20:31:00] Finished Work Unit:
[20:31:00] - Reading up to 21310708 from "work/wudata_04.trr": Read 21310708
[20:31:01] - Reading up to 4724320 from "work/wudata_04.xtc": Read 4724320
[20:31:02] logfile size: 160673
[20:31:02] Leaving Run
[20:31:05] - Writing 26377609 bytes of core data to disk...
[20:31:05]   ... Done.
[20:31:08] Timer requesting checkpoint
[20:31:10] - Shutting down core
[20:33:10] 
[20:33:10] Folding@home Core Shutdown: FINISHED_UNIT
[20:36:24] CoreStatus = 64 (100)
[20:36:24] Unit 4 finished with 34 percent of time to deadline remaining.
[20:36:24] Updated performance fraction: 0.505069
[20:36:24] Sending work to server


[20:36:24] + Attempting to send results
[20:36:24] - Reading file work/wuresults_04.dat from core
[20:36:24]   (Read 26377609 bytes from disk)
[20:36:24] Connecting to http://171.64.65.56:8080/
[20:36:38] - Couldn't send HTTP request to server
[20:36:38] + Could not connect to Work Server (results)
[20:36:38]     (171.64.65.56:8080)
[20:36:38] - Error: Could not transmit unit 04 (completed August 3) to work server.
[20:36:38] - 1 failed uploads of this unit.
[20:36:38]   Keeping unit 04 in queue.
[20:36:38] Trying to send all finished work units


[20:36:38] + Attempting to send results
[20:36:38] - Reading file work/wuresults_03.dat from core
[20:36:39]   (Read 26375780 bytes from disk)
[20:36:39] Connecting to http://171.64.65.56:8080/
[20:36:53] - Couldn't send HTTP request to server
[20:36:53] + Could not connect to Work Server (results)
[20:36:53]     (171.64.65.56:8080)
[20:36:53] - Error: Could not transmit unit 03 (completed August 1) to work server.
[20:36:53] - 12 failed uploads of this unit.


[20:36:53] + Attempting to send results
[20:36:53] - Reading file work/wuresults_03.dat from core
[20:36:53]   (Read 26375780 bytes from disk)
[20:36:53] Connecting to http://171.64.122.86:8080/
[20:36:55] - Couldn't send HTTP request to server
[20:36:55] + Could not connect to Work Server (results)
[20:36:55]     (171.64.122.86:8080)
[20:36:55]   Could not transmit unit 03 to Collection server; keeping in queue.


[20:36:55] + Attempting to send results
[20:36:55] - Reading file work/wuresults_04.dat from core
[20:36:55]   (Read 26377609 bytes from disk)
[20:36:55] Connecting to http://171.64.65.56:8080/
[20:37:08] - Couldn't send HTTP request to server
[20:37:08] + Could not connect to Work Server (results)
[20:37:08]     (171.64.65.56:8080)
[20:37:08] - Error: Could not transmit unit 04 (completed August 3) to work server.
[20:37:08] - 2 failed uploads of this unit.


[20:37:08] + Attempting to send results
[20:37:08] - Reading file work/wuresults_04.dat from core
[20:37:08]   (Read 26377609 bytes from disk)
[20:37:08] Connecting to http://171.64.122.86:8080/
[20:37:10] - Couldn't send HTTP request to server
[20:37:10] + Could not connect to Work Server (results)
[20:37:10]     (171.64.122.86:8080)
[20:37:10]   Could not transmit unit 04 to Collection server; keeping in queue.
[20:37:10] + Sent 0 of 2 completed units to the server
[20:37:10] - Preparing to get new work unit...
[20:37:10] + Attempting to get work packet
[20:37:10] - Will indicate memory of 250 MB
[20:37:10] - Connecting to assignment server
[20:37:10] Connecting to http://assign.stanford.edu:8080/
[20:37:10] Posted data.
[20:37:10] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[20:37:10] + News From Folding@Home: Welcome to Folding@Home
[20:37:11] Loaded queue successfully.
[20:37:11] Connecting to http://171.64.65.56:8080/
[20:37:16] Posted data.
[20:37:16] Initial: 0000; - Receiving payload (expected size: 4920218)
[20:37:27] - Downloaded at ~436 kB/s
[20:37:27] - Averaged speed for that direction ~415 kB/s
[20:37:27] + Received work.
[20:37:27] Trying to send all finished work units


[20:37:27] + Attempting to send results
[20:37:27] - Reading file work/wuresults_03.dat from core
[20:37:27]   (Read 26375780 bytes from disk)
[20:37:27] Connecting to http://171.64.65.56:8080/
[20:37:40] - Couldn't send HTTP request to server
[20:37:40] + Could not connect to Work Server (results)
[20:37:40]     (171.64.65.56:8080)
[20:37:40] - Error: Could not transmit unit 03 (completed August 1) to work server.
[20:37:40] - 13 failed uploads of this unit.


[20:37:40] + Attempting to send results
[20:37:40] - Reading file work/wuresults_03.dat from core
[20:37:40]   (Read 26375780 bytes from disk)
[20:37:40] Connecting to http://171.64.122.86:8080/
[20:40:49] Posted data.
[20:40:50] Initial: 0000; - Uploaded at ~135 kB/s
[20:40:50] - Averaged speed for that direction ~144 kB/s
[20:40:50] - Server does not have record of this unit. Will try again later.
[20:40:50]   Could not transmit unit 03 to Collection server; keeping in queue.


[20:40:50] + Attempting to send results
[20:40:50] - Reading file work/wuresults_04.dat from core
[20:40:50]   (Read 26377609 bytes from disk)
[20:40:50] Connecting to http://171.64.65.56:8080/
[20:41:07] - Couldn't send HTTP request to server
[20:41:07] + Could not connect to Work Server (results)
[20:41:07]     (171.64.65.56:8080)
[20:41:07] - Error: Could not transmit unit 04 (completed August 3) to work server.
[20:41:07] - 3 failed uploads of this unit.


[20:41:07] + Attempting to send results
[20:41:07] - Reading file work/wuresults_04.dat from core
[20:41:07]   (Read 26377609 bytes from disk)
[20:41:07] Connecting to http://171.64.122.86:8080/
[20:44:49] Posted data.
[20:44:49] Initial: 0000; - Uploaded at ~116 kB/s
[20:44:49] - Averaged speed for that direction ~138 kB/s
[20:44:49] - Server does not have record of this unit. Will try again later.
[20:44:49]   Could not transmit unit 04 to Collection server; keeping in queue.
[20:44:49] + Sent 0 of 2 completed units to the server
[20:44:49] + Closed connections
[20:44:49] 
[20:44:49] + Processing work unit
[20:44:49] Core required: FahCore_a2.exe
[20:44:49] Core found.
[20:44:50] Working on Unit 05 [August 3 20:44:50]
[20:44:50] + Working ...
[20:44:50] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 05 -priority 96 -checkpoint 5 -forceasm -verbose -lifeline 5406 -version 602'

[20:44:50] 
[20:44:50] *------------------------------*
[20:44:50] Folding@Home Gromacs SMP Core
[20:44:50] Version 1.91 (2007)
[20:44:50] 
[20:44:50] Preparing to commence simulation
[20:44:50] - Ensuring status. Please wait.
[20:44:51] 
[20:44:51] Project: 2662 (Run 2, Clone 200, Gen 2)
[20:44:51] 
[20:44:52] Assembly optimizations on if available.
[20:44:52] Entering M.D.
[20:45:09]  on if available.
[20:45:09] Entering M.D.
[20:45:18] Completed 0 out of 250000 steps  (0%)
Having complained that way, I will say that the new a2 core is running now with another 2662 WU, and it's more than 35% faster, which is nice. Hopefully it will upload successfully.
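The roughly six-hour spacing between autosend runs can be read straight off the log timestamps. A minimal sketch follows; `autosend_gaps` is a hypothetical helper written for this post, not part of the client, and it assumes at most one midnight passes between two consecutive runs because the log prints only HH:MM:SS.

```python
import re
from datetime import timedelta

# Matches the client's "[HH:MM:SS] - Autosending finished units..." lines.
AUTOSEND = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] - Autosending finished units")


def autosend_gaps(lines):
    """Return the gaps between consecutive autosend runs as timedeltas.

    A time of day earlier than the previous one is taken to mean that
    exactly one midnight passed in between (an assumption).
    """
    gaps = []
    previous = None
    for line in lines:
        match = AUTOSEND.match(line)
        if not match:
            continue
        hours, minutes, seconds = map(int, match.groups())
        current = timedelta(hours=hours, minutes=minutes, seconds=seconds)
        if previous is not None:
            gap = current - previous
            if gap < timedelta(0):
                gap += timedelta(days=1)
            gaps.append(gap)
        previous = current
    return gaps


# The first five autosend lines from the log above.
sample = [
    "[02:20:00] - Autosending finished units...",
    "[08:24:25] - Autosending finished units...",
    "[14:24:39] - Autosending finished units...",
    "[20:28:02] - Autosending finished units...",
    "[02:28:20] - Autosending finished units...",
]
for gap in autosend_gaps(sample):
    print(gap)
```

On those five lines the gaps come out as 6:04:25, 6:00:14, 6:03:23 and 6:00:18, so the client appears to wait six hours after the previous autosend finishes rather than firing on a fixed clock schedule.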
GTron
Posts: 53
Joined: Wed Dec 05, 2007 3:47 pm
Location: Denver area, Colorado

Re: Project: 2662 (Run 2, Clone 207, Gen 1) - Unable to send

Post by GTron »

http://foldingforum.org/viewtopic.php?f ... =15#p44070
kasson wrote: One very important thing to note--if you are running on advanced methods, please see the announcement of the A2 core for advanced methods.

The current versions of the A2 core are 2.00 for Linux, 2.01 for OSX. As stated in the announcement, if you have an *old* version of the A2 core (before 1.95), it may not auto-upgrade. Old versions of the core had both poorer performance and some bugs. The servers may not properly accept work done with the old core. It is extremely important to delete versions of the A2 core prior to 1.95 to force an upgrade.

PS we normally do this automatically, but old versions of the core had a bug in the auto-upgrade code.
PPS We're also checking server configuration options and may perform some recredits.
Your logs show Version 1.91 for your A2 core, so I think you ran into the issue kasson highlights in red in the quote above. By downloading the v2.00 core, your future WUs will upload OK, but your completed v1.91 WUs will remain in your queue until they pass their deadline.
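For anyone who would rather not eyeball the log, here is a rough sketch of the same check. It looks for the core banner's "Version X.YY" line and flags anything before 1.95, the threshold kasson quotes. `old_core_versions` is a hypothetical helper, and it assumes the only "Version" lines in the log come from the core banner and that the minor number always has two digits.

```python
import re

# Matches banner lines such as "[23:30:41] Version 1.91 (2007)".
VERSION_LINE = re.compile(r"^\[\d{2}:\d{2}:\d{2}\] Version (\d+)\.(\d+)")
MIN_VERSION = (1, 95)  # threshold taken from kasson's post


def old_core_versions(lines):
    """Return the distinct core versions below MIN_VERSION found in a log."""
    found = []
    for line in lines:
        match = VERSION_LINE.match(line)
        if not match:
            continue
        version = (int(match.group(1)), int(match.group(2)))
        if version < MIN_VERSION and version not in found:
            found.append(version)
    return found


# Two banner lines copied from the log in the first post.
sample = [
    "[23:30:41] Folding@Home Gromacs SMP Core",
    "[23:30:41] Version 1.91 (2007)",
]
for major, minor in old_core_versions(sample):
    print(f"Old A2 core in use: {major}.{minor:02d}")
```

If this reports anything, the advice in the quote applies: delete the old FahCore_a2.exe so the client fetches the current one.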

Greg
Ragnar Dan
Posts: 52
Joined: Fri Dec 07, 2007 3:21 am
Location: U.S. (TechReport.com's Team 2630)

Re: Project: 2662 (Run 2, Clone 207, Gen 1) - Unable to send

Post by Ragnar Dan »

It does not seem justifiable to me that Stanford would allow WU's to be downloaded to be folded by pre-2.00 A2 cores when they knew in advance they would reject the results.

I only read a page in this forum about the new A2 core by chance, and decided to download it. Whoever is in charge of what WU's are assigned to what machines should have stopped any WU's whose results wouldn't be accepted from being assigned in the first place. It's behavior like that, wasting contributors' time and electricity, that irritates quite a few people. Enough for them to quit helping, indeed.

Thanks for the information, GTron.
toTOW
Site Moderator
Posts: 6394
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project: 2662 (Run 2, Clone 207, Gen 1) - Unable to send

Post by toTOW »

This was not expected before v2.0 release ... it's a bug with the servers :(

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
Ragnar Dan
Posts: 52
Joined: Fri Dec 07, 2007 3:21 am
Location: U.S. (TechReport.com's Team 2630)

Re: Project: 2662 (Run 2, Clone 207, Gen 1) - Unable to send

Post by Ragnar Dan »

A bug with the servers? They assigned the WU's, machines all over the world folded them, and those machines attempted to upload them. What could the problem be? I don't see how a new project number would cause a server to refuse an upload, but be that as it may, why don't they just quickly modify the servers so that those WU's will be accepted? At the least, do as I mentioned earlier and stop earlier cores from being assigned those WU's.

I can't imagine either one of those code changes taking more than a few hours at most from assignment to completion and implementation. Even if they normally go through an extended debugging/proving period, making an exception for this case so that some value could be gained rather than letting it slide with only a thread in this forum (which the vast majority of Folding contributors do not know exists) makes sense.

When did Stanford realize there was this problem? I downloaded my 1st 2662 Project on a v. 1.91 A2 core on July 30, 23:30:41 UTC, and my 2nd 2662 Project on August 1, 21:18:41 UTC. If they knew by then that it existed, they should have quickly implemented a change or at minimum made a big public announcement, in the above-mentioned places and also on the http://folding.stanford.edu/ web site (or anywhere you can download a Linux SMP client, at least), so that all comers to any place related to the effort would learn of the issue and quickly correct the problem themselves.
nwkelley
Pande Group Member
Posts: 57
Joined: Wed May 14, 2008 9:43 pm

Re: Project: 2662 (Run 2, Clone 207, Gen 1) - Unable to send

Post by nwkelley »

While I can't comment directly on your concerns right now, I can make sure those involved are aware of them (which I would assume they are). I will try to get the details on some of these issues in the meantime, if possible...

-nick
Ragnar Dan
Posts: 52
Joined: Fri Dec 07, 2007 3:21 am
Location: U.S. (TechReport.com's Team 2630)

Re: Project: 2662 (Run 2, Clone 207, Gen 1) - Unable to send

Post by Ragnar Dan »

I appreciate the reply.

Meanwhile, I assume I should just kill my client and force deletion of the 2 expired WU's?
micro
Posts: 4
Joined: Fri Aug 29, 2008 10:53 am

Re: Project: 2662 (Run 2, Clone 207, Gen 1) - Unable to send

Post by micro »

I too have wasted three fully completed 2662's on a machine I built specifically to fold 24/7. I used VMware Player and Notfred Linux. The 2662's would finish, then just keep retrying, until I got tired of seeing it after 12 hours of failing to send and not getting another work assignment.

I thought it might be my Player and machine, or that blasted server, so I just shut it down, installed the 6.22 mpich client, and went back to folding A1 cores for the weekend. At least something will get done and uploaded.

However, when one purchases a top-of-the-line Core 2 Duo to fold, one is not looking to do A1 cores or use 6.22 mpich.

Needless to say, I was within an eyelash of quitting folding with all three of my dual cores.

So now, I am supposed to look at the version of future 2662's and if it is 1.95 or older delete it from the Player, and start over and PRAY and HOPE we get one that will run because we have nothing better to do than sit in front of a computer and make sure we get a work unit that should function correctly.

Sounds kind of asinine to me. I have a business to run, so I think Stanford should get its act together.

Just my two cents.

micro...
team 328
toTOW
Site Moderator
Posts: 6394
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project: 2662 (Run 2, Clone 207, Gen 1) - Unable to send

Post by toTOW »

Finally, it showed up in the stats:

Hi Ragnar_Dan (team 2630),
Your WU (P2662 R2 C207 G1) was added to the stats database on 2008-08-01 14:33:41 for 1920 points of credit.
Baowoulf
Posts: 208
Joined: Wed Dec 12, 2007 8:44 pm
Hardware configuration: Pentium 4 2.8 GHz, 512MB DDR Ram, 128MB Radeon 9800, Creative Soundblaster Audigy 4 Pro
Location: Jupiter 6

Re: Project: 2662 (Run 2, Clone 207, Gen 1) - Unable to send

Post by Baowoulf »

micro wrote: So now, I am supposed to look at the version of future 2662's and if it is 1.95 or older delete it from the Player, and start over and PRAY and HOPE we get one that will run because we have nothing better to do than sit in front of a computer and make sure we get a work unit that should function correctly.
From the link GTron posted, it sounds like it's a one-time thing: once you do it, you shouldn't have to do it again on that client. Or maybe you can check ahead of time, in case you don't have time to wait until your current WU finishes. I can't run the SMP client on my computer, so I have no way of knowing.