One of my teammates reported this problem. He completed a WU and the collection server dumped it, then he successfully uploaded the next one, and then the server dumped a third WU. All three are P7903.
14:26:30:WU01:FS00:0xa4:Completed 2400000 out of 2500000 steps (96%)
14:30:35:WU01:FS00:0xa4:Completed 2425000 out of 2500000 steps (97%)
14:34:40:WU01:FS00:0xa4:Completed 2450000 out of 2500000 steps (98%)
14:38:46:WU01:FS00:0xa4:Completed 2475000 out of 2500000 steps (99%)
14:38:47:WU00:FS00:Connecting to assign3.stanford.edu:8080
14:38:47:WU00:FS00:News: Welcome to Folding@Home
14:38:47:WU00:FS00:Assigned to work server 128.113.12.161
14:38:47:WU00:FS00:Requesting new work unit for slot 00: RUNNING smp:8 from 128.113.12.161
14:38:47:WU00:FS00:Connecting to 128.113.12.161:8080
14:38:48:WU00:FS00:Downloading 646.38KiB
14:38:48:WU00:FS00:Download complete
14:38:48:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:OK project:7903 run:210 clone:9 gen:24 core:0xa4 unit:0x0000001900ac9c214eca68d81525fe45
14:42:51:WU01:FS00:0xa4:Completed 2500000 out of 2500000 steps (100%)
14:42:52:WU01:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
14:43:02:WU01:FS00:0xa4:
14:43:02:WU01:FS00:0xa4:Finished Work Unit:
14:43:02:WU01:FS00:0xa4:- Reading up to 35910936 from "01/wudata_01.trr": Read 35910936
14:43:02:WU01:FS00:0xa4:trr file hash check passed.
14:43:02:WU01:FS00:0xa4:edr file hash check passed.
14:43:02:WU01:FS00:0xa4:logfile size: 56875
14:43:02:WU01:FS00:0xa4:Leaving Run
14:43:05:WU01:FS00:0xa4:- Writing 35997727 bytes of core data to disk...
14:43:10:WU01:FS00:0xa4:Done: 35997215 -> 30222790 (compressed to 83.9 percent)
14:43:11:WU01:FS00:0xa4: ... Done.
14:43:14:WU01:FS00:0xa4:- Shutting down core
14:43:14:WU01:FS00:0xa4:
14:43:14:WU01:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
14:43:14:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
14:43:14:WU01:FS00:Sending unit results: id:01 state:SEND error:OK project:7903 run:216 clone:8 gen:16 core:0xa4 unit:0x0000001400ac9c214eca68e095e47aac
14:43:14:WU01:FS00:Uploading 28.82MiB to 128.113.12.161
14:43:14:WU01:FS00:Connecting to 128.113.12.161:8080
14:43:14:WU00:FS00:Starting
14:43:14:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Users/User/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe" -dir 00 -suffix 01 -version 701 -lifeline 1312 -checkpoint 15 -np 8
14:43:14:WU00:FS00:Started FahCore on PID 1340
14:43:14:WU00:FS00:Core PID:3120
14:43:14:WU00:FS00:FahCore 0xa4 started
14:43:15:WU00:FS00:0xa4:
14:43:15:WU00:FS00:0xa4:*------------------------------*
14:43:15:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
14:43:15:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
14:43:15:WU00:FS00:0xa4:
14:43:15:WU00:FS00:0xa4:Preparing to commence simulation
14:43:15:WU00:FS00:0xa4:- Looking at optimizations...
14:43:15:WU00:FS00:0xa4:- Created dyn
14:43:15:WU00:FS00:0xa4:- Files status OK
14:43:15:WU00:FS00:0xa4:- Expanded 661380 -> 1008860 (decompressed 152.5 percent)
14:43:15:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=661380 data_size=1008860, decompressed_data_size=1008860 diff=0
14:43:15:WU00:FS00:0xa4:- Digital signature verified
14:43:15:WU00:FS00:0xa4:
14:43:15:WU00:FS00:0xa4:Project: 7903 (Run 210, Clone 9, Gen 24)
14:43:15:WU00:FS00:0xa4:
14:43:15:WU00:FS00:0xa4:Assembly optimizations on if available.
14:43:15:WU00:FS00:0xa4:Entering M.D.
14:43:21:WU00:FS00:0xa4:Mapping NT from 8 to 8
14:43:21:WU00:FS00:0xa4:Completed 0 out of 2500000 steps (0%)
14:43:57:WU01:FS00:Upload 76.98%
14:43:57:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
14:43:57:WU01:FS00:Trying to send results to collection server
14:43:57:WU01:FS00:Uploading 28.82MiB to 129.74.85.16
14:43:57:WU01:FS00:Connecting to 129.74.85.16:8080
14:44:03:WU01:FS00:Upload 30.79%
14:44:09:WU01:FS00:Upload 62.45%
14:44:15:WU01:FS00:Upload 94.54%
14:44:16:WU01:FS00:Upload complete
14:44:16:WU01:FS00:Server responded WORK_QUIT (404)
14:44:16:WARNING:WU01:FS00:Server did not like results, dumping
14:44:16:WU01:FS00:Cleaning up
As a general rule, it's a good idea to report the first error, not the second one. The Work Server failed before the Collection Server did. I would guess that the WU is corrupt.
Why else would it say:
14:43:57:WU01:FS00:Upload 76.98%
14:43:57:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
I'm still running v6, so the wording didn't strike me as significant. I've often had WUs fail to go to the work server and end up going to the collection server.
Would you see the same wording if there was a problem with the WS and you had to go to the CS, or does this definitely indicate a problem with the WU/data/upload?
A bad result can be rejected by both a WS and CS and that looks like what happened here, though I'm certainly not sure.
A good WU can fail to go to a WS because the WS is down and then successfully go to the CS. The message about 76.98% does prove the WS accepted part of the WU, which implies the WS was not down.
It's clearer if you look only at WU01 and ignore the messages about WU00.
14:26:30:WU01:FS00:0xa4:Completed 2400000 out of 2500000 steps (96%)
14:30:35:WU01:FS00:0xa4:Completed 2425000 out of 2500000 steps (97%)
14:34:40:WU01:FS00:0xa4:Completed 2450000 out of 2500000 steps (98%)
14:38:46:WU01:FS00:0xa4:Completed 2475000 out of 2500000 steps (99%)
14:42:51:WU01:FS00:0xa4:Completed 2500000 out of 2500000 steps (100%)
14:42:52:WU01:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
14:43:02:WU01:FS00:0xa4:
14:43:02:WU01:FS00:0xa4:Finished Work Unit:
14:43:02:WU01:FS00:0xa4:- Reading up to 35910936 from "01/wudata_01.trr": Read 35910936
14:43:02:WU01:FS00:0xa4:trr file hash check passed.
14:43:02:WU01:FS00:0xa4:edr file hash check passed.
14:43:02:WU01:FS00:0xa4:logfile size: 56875
14:43:02:WU01:FS00:0xa4:Leaving Run
14:43:05:WU01:FS00:0xa4:- Writing 35997727 bytes of core data to disk...
14:43:10:WU01:FS00:0xa4:Done: 35997215 -> 30222790 (compressed to 83.9 percent)
14:43:11:WU01:FS00:0xa4: ... Done.
14:43:14:WU01:FS00:0xa4:- Shutting down core
14:43:14:WU01:FS00:0xa4:
14:43:14:WU01:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
14:43:14:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
14:43:14:WU01:FS00:Sending unit results: id:01 state:SEND error:OK project:7903 run:216 clone:8 gen:16 core:0xa4 unit:0x0000001400ac9c214eca68e095e47aac
14:43:14:WU01:FS00:Uploading 28.82MiB to 128.113.12.161
14:43:14:WU01:FS00:Connecting to 128.113.12.161:8080
14:43:57:WU01:FS00:Upload 76.98%
14:43:57:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
14:43:57:WU01:FS00:Trying to send results to collection server
14:43:57:WU01:FS00:Uploading 28.82MiB to 129.74.85.16
14:43:57:WU01:FS00:Connecting to 129.74.85.16:8080
14:44:03:WU01:FS00:Upload 30.79%
14:44:09:WU01:FS00:Upload 62.45%
14:44:15:WU01:FS00:Upload 94.54%
14:44:16:WU01:FS00:Upload complete
14:44:16:WU01:FS00:Server responded WORK_QUIT (404)
14:44:16:WARNING:WU01:FS00:Server did not like results, dumping
14:44:16:WU01:FS00:Cleaning up
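If you'd rather not pick the lines out by hand, something like the following can do the filtering. It's only a rough sketch: it assumes the v7 log layout seen above (timestamp, an optional WARNING:/ERROR: level, then the WUnn tag), and `filter_wu` is just a name I made up, not part of the client.

```python
import re

# Assumed layout, taken from the log lines in this thread:
#   HH:MM:SS:[LEVEL:]WUnn:FSnn:message
WU_TAG = re.compile(r"^\d{2}:\d{2}:\d{2}:(?:[A-Z]+:)?(WU\d+):")

def filter_wu(lines, wu):
    """Return only the log lines tagged with the given work unit, e.g. 'WU01'."""
    kept = []
    for line in lines:
        m = WU_TAG.match(line)
        if m and m.group(1) == wu:
            kept.append(line)
    return kept

sample = [
    "14:38:46:WU01:FS00:0xa4:Completed 2475000 out of 2500000 steps (99%)",
    "14:38:47:WU00:FS00:Connecting to assign3.stanford.edu:8080",
    "14:43:57:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed",
]

for line in filter_wu(sample, "WU01"):
    print(line)
```

Lines that don't carry a WU tag at all (client startup messages, for instance) are simply dropped by this sketch.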
06:55:37:WU00:FS00:0xa4:Completed 9600000 out of 10000000 steps (96%)
07:02:12:WU00:FS00:0xa4:Completed 9700000 out of 10000000 steps (97%)
07:08:47:WU00:FS00:0xa4:Completed 9800000 out of 10000000 steps (98%)
07:15:23:WU00:FS00:0xa4:Completed 9900000 out of 10000000 steps (99%)
07:21:57:WU00:FS00:0xa4:Completed 10000000 out of 10000000 steps (100%)
07:21:58:WU00:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
07:22:08:WU00:FS00:0xa4:
07:22:08:WU00:FS00:0xa4:Finished Work Unit:
07:22:08:WU00:FS00:0xa4:- Reading up to 2128272 from "00/wudata_01.trr": Read 2128272
07:22:08:WU00:FS00:0xa4:trr file hash check passed.
07:22:08:WU00:FS00:0xa4:- Reading up to 221796 from "00/wudata_01.xtc": Read 221796
07:22:08:WU00:FS00:0xa4:xtc file hash check passed.
07:22:08:WU00:FS00:0xa4:edr file hash check passed.
07:22:08:WU00:FS00:0xa4:logfile size: 81409
07:22:08:WU00:FS00:0xa4:Leaving Run
07:22:09:WU00:FS00:0xa4:- Writing 2455949 bytes of core data to disk...
07:22:10:WU00:FS00:0xa4:Done: 2455437 -> 1862197 (compressed to 75.8 percent)
07:22:10:WU00:FS00:0xa4: ... Done.
07:22:10:WU00:FS00:0xa4:- Shutting down core
07:22:10:WU00:FS00:0xa4:
07:22:10:WU00:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
07:22:11:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
07:22:11:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:7000 run:2 clone:4 gen:94 core:0xa4 unit:0x000000d00001329c4dfb826e99a01e6a
07:22:11:WU00:FS00:Uploading 1.78MiB to 129.74.85.15
07:22:11:WU00:FS00:Connecting to 129.74.85.15:8080
07:22:51:WU00:FS00:Upload 21.11%
07:22:51:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
07:22:51:WU00:FS00:Trying to send results to collection server
07:22:51:WU00:FS00:Uploading 1.78MiB to 129.74.85.16
07:22:51:WU00:FS00:Connecting to 129.74.85.16:8080
07:22:56:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:22:56:WU00:FS00:Connecting to 129.74.85.16:80
07:23:03:ERROR:WU00:FS00:Exception: Failed to connect to 129.74.85.16:80: No connection could be made because the target machine actively refused it.
07:23:03:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:7000 run:2 clone:4 gen:94 core:0xa4 unit:0x000000d00001329c4dfb826e99a01e6a
07:23:03:WU00:FS00:Uploading 1.78MiB to 129.74.85.15
07:23:03:WU00:FS00:Connecting to 129.74.85.15:8080
07:23:09:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:23:09:WU00:FS00:Connecting to 129.74.85.15:80
07:23:16:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 129.74.85.15:80: No connection could be made because the target machine actively refused it.
07:23:16:WU00:FS00:Trying to send results to collection server
07:23:16:WU00:FS00:Uploading 1.78MiB to 129.74.85.16
07:23:16:WU00:FS00:Connecting to 129.74.85.16:8080
07:23:23:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:23:23:WU00:FS00:Connecting to 129.74.85.16:80
07:23:29:ERROR:WU00:FS00:Exception: Failed to connect to 129.74.85.16:80: No connection could be made because the target machine actively refused it.
07:24:03:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:7000 run:2 clone:4 gen:94 core:0xa4 unit:0x000000d00001329c4dfb826e99a01e6a
07:24:03:WU00:FS00:Uploading 1.78MiB to 129.74.85.15
07:24:03:WU00:FS00:Connecting to 129.74.85.15:8080
07:24:09:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:24:09:WU00:FS00:Connecting to 129.74.85.15:80
07:24:15:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 129.74.85.15:80: No connection could be made because the target machine actively refused it.
07:24:15:WU00:FS00:Trying to send results to collection server
07:24:15:WU00:FS00:Uploading 1.78MiB to 129.74.85.16
07:24:15:WU00:FS00:Connecting to 129.74.85.16:8080
07:24:22:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:24:22:WU00:FS00:Connecting to 129.74.85.16:80
07:24:29:ERROR:WU00:FS00:Exception: Failed to connect to 129.74.85.16:80: No connection could be made because the target machine actively refused it.
07:25:41:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:7000 run:2 clone:4 gen:94 core:0xa4 unit:0x000000d00001329c4dfb826e99a01e6a
07:25:41:WU00:FS00:Uploading 1.78MiB to 129.74.85.15
07:25:41:WU00:FS00:Connecting to 129.74.85.15:8080
07:25:47:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:25:47:WU00:FS00:Connecting to 129.74.85.15:80
07:25:53:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 129.74.85.15:80: No connection could be made because the target machine actively refused it.
07:25:53:WU00:FS00:Trying to send results to collection server
07:25:53:WU00:FS00:Uploading 1.78MiB to 129.74.85.16
07:25:53:WU00:FS00:Connecting to 129.74.85.16:8080
07:26:00:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:26:00:WU00:FS00:Connecting to 129.74.85.16:80
07:26:07:ERROR:WU00:FS00:Exception: Failed to connect to 129.74.85.16:80: No connection could be made because the target machine actively refused it.
07:28:18:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:7000 run:2 clone:4 gen:94 core:0xa4 unit:0x000000d00001329c4dfb826e99a01e6a
07:28:18:WU00:FS00:Uploading 1.78MiB to 129.74.85.15
07:28:18:WU00:FS00:Connecting to 129.74.85.15:8080
07:28:24:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:28:24:WU00:FS00:Connecting to 129.74.85.15:80
07:28:31:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 129.74.85.15:80: No connection could be made because the target machine actively refused it.
07:28:31:WU00:FS00:Trying to send results to collection server
07:28:31:WU00:FS00:Uploading 1.78MiB to 129.74.85.16
07:28:31:WU00:FS00:Connecting to 129.74.85.16:8080
07:28:37:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:28:37:WU00:FS00:Connecting to 129.74.85.16:80
07:28:44:ERROR:WU00:FS00:Exception: Failed to connect to 129.74.85.16:80: No connection could be made because the target machine actively refused it.
07:32:32:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:7000 run:2 clone:4 gen:94 core:0xa4 unit:0x000000d00001329c4dfb826e99a01e6a
07:32:32:WU00:FS00:Uploading 1.78MiB to 129.74.85.15
07:32:32:WU00:FS00:Connecting to 129.74.85.15:8080
07:32:38:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:32:38:WU00:FS00:Connecting to 129.74.85.15:80
07:33:19:WU00:FS00:Upload 3.52%
07:33:19:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
07:33:19:WU00:FS00:Trying to send results to collection server
07:33:19:WU00:FS00:Uploading 1.78MiB to 129.74.85.16
07:33:19:WU00:FS00:Connecting to 129.74.85.16:8080
07:33:25:WU00:FS00:Upload 84.44%
07:33:26:WU00:FS00:Upload complete
07:33:26:WU00:FS00:Server responded WORK_QUIT (404)
07:33:26:WARNING:WU00:FS00:Server did not like results, dumping
07:33:26:WU00:FS00:Cleaning up
The number of refused connections to both the work server and the collection server seems to imply a network connectivity issue. It's hard to believe they're entirely coincidental and that the real problem is a corrupted WU.
[edit] Oh, by the way - SMP core, Windows XP, Client v7.1.52, Q6600, 3 GB RAM, no overclock
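To put numbers on that, one could tally what happened on each server. The snippet below is a sketch under my own assumptions: it attributes each failure to the IP in the most recent "Connecting to" line and matches on the message text seen in the log above. `summarize` and the category names are hypothetical, not anything the client provides.

```python
import re

# IP and port from lines such as "...:Connecting to 129.74.85.16:8080"
CONNECT = re.compile(r"Connecting to (\d+\.\d+\.\d+\.\d+):(\d+)")

def summarize(lines):
    """Count upload outcomes per server IP, based on message substrings."""
    stats = {}
    current = None  # IP of the most recent "Connecting to" line
    for line in lines:
        m = CONNECT.search(line)
        if m:
            current = m.group(1)
            stats.setdefault(current, {
                "fallback_to_80": 0,   # port 8080 failed, client retried on 80
                "refused": 0,          # connection actively refused
                "transfer_failed": 0,  # upload started, then broke off
                "rejected": 0,         # upload finished, server said WORK_QUIT
            })
            continue
        if current is None:
            continue
        if "connection failed on port 8080" in line:
            stats[current]["fallback_to_80"] += 1
        elif "actively refused" in line:
            stats[current]["refused"] += 1
        elif "Transfer failed" in line:
            stats[current]["transfer_failed"] += 1
        elif "WORK_QUIT" in line:
            stats[current]["rejected"] += 1
    return stats

sample = [
    "07:22:11:WU00:FS00:Connecting to 129.74.85.15:8080",
    "07:22:51:WU00:FS00:Upload 21.11%",
    "07:22:51:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed",
    "07:22:51:WU00:FS00:Connecting to 129.74.85.16:8080",
    "07:22:56:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80",
    "07:22:56:WU00:FS00:Connecting to 129.74.85.16:80",
    "07:23:03:ERROR:WU00:FS00:Exception: Failed to connect to 129.74.85.16:80: No connection could be made because the target machine actively refused it.",
    "07:33:19:WU00:FS00:Connecting to 129.74.85.16:8080",
    "07:33:26:WU00:FS00:Server responded WORK_QUIT (404)",
]

for ip, counts in summarize(sample).items():
    print(ip, counts)
```

A server that mostly shows refused connections and broken transfers points toward connectivity or server load, while a clean upload followed by WORK_QUIT is the server rejecting the result itself. The counts can't tell you why it was rejected, only which pattern dominates.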