GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Post by **bruce** » Tue Feb 16, 2010 11:36 pm

noorman wrote:
Nathan_P wrote: 171.67.108.26 is in reject again; CPU load is 10+ and the net load is 828. Can someone please give it a nudge?

It's accepting again...

They are looking into this.

Right . . . I was about to say that.

------------------------------------

I've never seen that sequence of client messages before. My first guess is that the critical message is one of these two:

ikerekes wrote:
[21:01:25] DynamicWrapper: Finished Work Unit: sleep=10000
[21:01:35] Reserved 101284 bytes for xtc file; Cosm status=0
[21:01:35] Allocated 101284 bytes for xtc file
[21:01:35] - Reading up to 101284 from "work/wudata_05.xtc": Read 101284
[21:01:35] Read 101284 bytes from xtc file; available packet space=786329180
[21:01:35] xtc file hash check passed.
[21:01:35] Reserved 30216 30216 786329180 bytes for arc file=<work/wudata_05.trr> Cosm status=0
[21:01:35] Allocated 30216 bytes for arc file
[21:01:35] - Reading up to 30216 from "work/wudata_05.trr": Read 30216
[21:01:35] Read 30216 bytes from arc file; available packet space=786298964
[21:01:35] trr file hash check passed.
[21:01:35] Allocated 560 bytes for edr file
[21:01:35] Read bedfile
[21:01:35] edr file hash check passed.
[21:01:35] Logfile not read.
[21:01:35] GuardedRun: success in DynamicWrapper
[21:01:35] GuardedRun: done
[21:01:35] Run: GuardedRun completed.
[21:01:40] + Opened results file
[21:01:40] - Writing 132572 bytes of core data to disk...
[21:01:40] Done: 132060 -> 131598 (compressed to 99.6 percent)
[21:01:40] ... Done.
[21:01:40] DeleteFrameFiles: successfully deleted file=work/wudata_05.ckp
[21:01:40] Shutting down core
[21:01:40]
[21:01:40] Folding@home Core Shutdown: FINISHED_UNIT
[21:01:43] CoreStatus = 64 (100)
[21:01:43] Sending work to server
[21:01:43] - Preparing to get new work unit...

I'll bet you're running on a Linux filesystem with delayed allocation. Any chance of moving that client to a partition that syncs when the client expects it to?
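For readers unfamiliar with the issue: with delayed allocation, data a program writes can stay in memory for a while before the filesystem actually commits it to disk. The usual application-level remedy is to flush and call `os.fsync` before treating a file as written. The Python sketch below only illustrates that technique; it is not the Folding@home client's code, and the function name `write_durably` and the file name are hypothetical.

```python
import os
import tempfile

def write_durably(path: str, data: bytes) -> None:
    """Write data to path and ask the OS to commit it to disk before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write the file's data to disk
    # On POSIX systems, also fsync the directory so the new entry is durable.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(os.path.dirname(os.path.abspath(path)),
                         os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)

if __name__ == "__main__":
    # Hypothetical file name modeled on the log's work/wuresults_NN.dat
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "wuresults_05.dat")
        write_durably(p, b"\x00" * 131598)
        print(os.path.getsize(p))  # 131598
```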

Post by **Teddy** » Wed Feb 17, 2010 1:21 am

Things don't seem to have improved, Bruce.

Code:

[23:31:50] Completed 99%
[23:32:53] Completed 100%
[23:32:53] Successful run
[23:32:53] DynamicWrapper: Finished Work Unit: sleep=10000
[23:33:03] Reserved 11384 bytes for xtc file; Cosm status=0
[23:33:03] Allocated 11384 bytes for xtc file
[23:33:03] - Reading up to 11384 from "work/wudata_04.xtc": Read 11384
[23:33:03] Read 11384 bytes from xtc file; available packet space=786419080
[23:33:03] xtc file hash check passed.
[23:33:03] Reserved 23472 23472 786419080 bytes for arc file=<work/wudata_04.trr> Cosm status=0
[23:33:03] Allocated 23472 bytes for arc file
[23:33:03] - Reading up to 23472 from "work/wudata_04.trr": Read 23472
[23:33:03] Read 23472 bytes from arc file; available packet space=786395608
[23:33:03] trr file hash check passed.
[23:33:03] Allocated 560 bytes for edr file
[23:33:03] Read bedfile
[23:33:03] edr file hash check passed.
[23:33:03] Allocated 64951 bytes for logfile
[23:33:03] Read logfile
[23:33:03] GuardedRun: success in DynamicWrapper
[23:33:03] GuardedRun: done
[23:33:03] Run: GuardedRun completed.
[23:33:08] - Writing 100879 bytes of core data to disk...
[23:33:08] Done: 100367 -> 44235 (compressed to 44.0 percent)
[23:33:08]   ... Done.
[23:33:08] - Shutting down core 
[23:33:08] 
[23:33:08] Folding@home Core Shutdown: FINISHED_UNIT
[23:33:12] CoreStatus = 64 (100)
[23:33:12] Unit 4 finished with 98 percent of time to deadline remaining.
[23:33:12] Updated performance fraction: 0.979654
[23:33:12] Sending work to server
[23:33:12] Project: 5906 (Run 11, Clone 946, Gen 47)


[23:33:12] + Attempting to send results [February 16 23:33:12 UTC]
[23:33:12] - Reading file work/wuresults_04.dat from core
[23:33:12]   (Read 44747 bytes from disk)
[23:33:12] Connecting to http://171.64.122.70:8080/
[23:33:13] Posted data.
[23:33:14] Initial: 0000; - Uploaded at ~8 kB/s
[23:33:17] - Averaged speed for that direction ~12 kB/s
[23:33:17] + Results successfully sent
[23:33:17] Thank you for your contribution to Folding@Home.
[23:33:17] + Number of Units Completed: 3024

[23:33:21] Trying to send all finished work units
[23:33:21] Project: 10103 (Run 859, Clone 7, Gen 5)


[23:33:21] + Attempting to send results [February 16 23:33:21 UTC]
[23:33:21] - Reading file work/wuresults_02.dat from core
[23:33:21]   (Read 132076 bytes from disk)
[23:33:21] Connecting to http://171.64.65.71:8080/
[23:53:21] Posted data.
[23:57:54] Initial: 48AA; + Could not connect to Work Server (results)
[23:57:54]     (171.64.65.71:8080)
[23:57:54] + Retrying using alternative port
[23:57:54] Connecting to http://171.64.65.71:80/
[23:57:58] - Couldn't send HTTP request to server
[23:57:58] + Could not connect to Work Server (results)
[23:57:58]     (171.64.65.71:80)
[23:57:58] - Error: Could not transmit unit 02 (completed February 16) to work server.
[23:57:58] - 24 failed uploads of this unit.


[23:57:58] + Attempting to send results [February 16 23:57:58 UTC]
[23:57:58] - Reading file work/wuresults_02.dat from core
[23:57:58]   (Read 132076 bytes from disk)
[23:57:58] Connecting to http://171.67.108.26:8080/
[00:17:58] Posted data.
[00:37:58] Initial: 00AA; + Could not connect to Work Server (results)
[00:57:54]     (171.67.108.26:8080)
[00:57:54] + Retrying using alternative port
[00:57:54] Connecting to http://171.67.108.26:80/
[00:57:55] - Couldn't send HTTP request to server
[00:57:55]   (Got status 503)
[00:57:55] + Could not connect to Work Server (results)
[00:57:55]     (171.67.108.26:80)
[00:57:55]   Could not transmit unit 02 to Collection server; keeping in queue.
[00:57:55] + Sent 0 of 1 completed units to the server
[00:57:55] - Preparing to get new work unit...
[00:57:55] + Attempting to get work packet
[00:57:55] - Will indicate memory of 2046 MB
[00:57:55] - Connecting to assignment server
[00:57:55] Connecting to http://assign-GPU.stanford.edu:8080/
[00:57:56] Posted data.
[00:57:56] Initial: 40AB; - Successful: assigned to (171.64.65.71).
[00:57:56] + News From Folding@Home: Welcome to Folding@Home
[00:57:56] Loaded queue successfully.
[00:57:56] Connecting to http://171.64.65.71:8080/

It just hangs there, doing nothing, until I restart the client.

Teddy

Post by **DavidMudkips** » Wed Feb 17, 2010 1:50 am

I'm getting the ACK error on p1010x; all other projects are fine.

Code:

[23:21:43] + Attempting to send results [February 16 23:21:43 UTC]
[23:21:43] - Reading file work/wuresults_02.dat from core
[23:21:43]   (Read 132407 bytes from disk)
[23:21:43] Connecting to http://171.64.65.71:8080/
[23:30:10] Posted data.
[23:40:40] - Autosending finished units... [February 16 23:40:40 UTC]
[23:40:40] Trying to send all finished work units
[23:40:40] - Already sending work
[23:40:40] + Sent 0 of 1 completed units to the server
[23:40:40] - Autosend completed
[23:50:10] Initial: 00FA; - Uploaded at ~0 kB/s
[23:57:28] - Averaged speed for that direction ~29 kB/s
[23:57:28] - Unknown packet returned from server, expected ACK for results
[23:57:28] - Error: Could not transmit unit 02 (completed February 16) to work server.
[23:57:28] - 1 failed uploads of this unit.
[23:57:28]   Keeping unit 02 in queue.
[23:57:28] Trying to send all finished work units
[23:57:28] Project: 10102 (Run 87, Clone 9, Gen 10)
[23:57:28] - Read packet limit of 540015616... Set to 524286976.


[23:57:28] + Attempting to send results [February 16 23:57:28 UTC]
[23:57:28] - Reading file work/wuresults_02.dat from core
[23:57:28]   (Read 132407 bytes from disk)
[23:57:28] Connecting to http://171.64.65.71:8080/
[23:57:51] Posted data.
[23:57:51] Initial: 0000; - Uploaded at ~5 kB/s
[23:57:51] - Averaged speed for that direction ~24 kB/s
[23:57:51] - Server has already received unit.
[23:57:51] + Sent 0 of 1 completed units to the server
[23:57:51] - Preparing to get new work unit...

On a side note, have the recredits from the 2/14 "Server has already received unit" mess all been processed? I'm still missing credit for four 783-pointers.

Post by **bruce** » Wed Feb 17, 2010 2:06 am

DavidMudkips wrote:On a side note, are the recredits from 2/14's "Server has already received unit" mess all processed now? Because I'm still missing credit for 4 783-pointers.

I doubt it.

The first order of business will always be to stop the recredit list from growing.

Then be sure that the data is in a safe place.

Then make sure that real-time uploading, downloading, and crediting are running smoothly, without delays.

Then re-gather your sanity.

Then gather the data for the recredit, have the data rechecked several times by several different people, and only then apply the credits.

As long as servers are having serious problems like they have been over the past several days, a recredit will continue to be delayed. Dr. Pande has been very explicit about wanting to be sure re-crediting is done correctly. A number of different people are involved and if any one of them is busy fixing real-time problems, the recredit will be delayed.

Post by **Teddy** » Wed Feb 17, 2010 4:19 am

I'm going to stop GPU folding until this mess is sorted out. I have folders full of unsent units across many machines, and I have clients that appear to have sent units and then do NOT download a new protein, so the client just sits in limbo until the user (i.e. me) intervenes. I am sick and tired of the constant babysitting over the last few weeks; it really is a mess, PG. And what is up with server 20? It does not want results now either, so you can add that to 71 as well...

Send me an email when the mess gets sorted.

Ta Teddy

Post by **noorman** » Wed Feb 17, 2010 1:11 pm

If you have a few machines/Clients that are misbehaving like that, you can try stopping the Client, deleting the Work folder and the queue.dat file, and restarting the Client.

I did so yesterday (around noon, GMT+1), and my Client has been running and Folding without interruption ever since, using other GPU servers and not using a CS (CS6).

For machines that have lots of Results in the queue even though the log says they were sent (successfully), Bruce gave a possible reason:
If the server has accepted the Results but the Client didn't get the signal to clear that queue slot, the Client thinks the unit hasn't been uploaded...
Check the log(s) for successful uploads.
If you see the combination of 'successful uploads' and a 'full or nearly full queue', you could also try the above suggestion to get things Folding 'normally' again...
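Checking the logs for successful uploads can be partly automated. The minimal Python sketch below tallies upload outcomes in a client log, assuming the log is plain text containing the message strings seen in the logs quoted in this thread (`+ Results successfully sent`, `Could not transmit unit NN`, `Server has already received unit`). The function name `summarize_uploads` is hypothetical, and the wording may differ between client versions.

```python
import re
from collections import Counter

# Message strings copied from the client logs quoted in this thread (assumed stable).
SENT = "+ Results successfully sent"
ALREADY = "Server has already received unit"
FAILED = re.compile(r"Could not transmit unit (\d+)")

def summarize_uploads(log_text: str) -> dict:
    """Tally upload outcomes found in a client log's text."""
    summary = {"sent": 0, "already_received": 0, "failed_by_slot": Counter()}
    for line in log_text.splitlines():
        if SENT in line:
            summary["sent"] += 1
        elif ALREADY in line:
            summary["already_received"] += 1
        else:
            m = FAILED.search(line)
            if m:
                # Counts both Work Server and Collection server failures per queue slot
                summary["failed_by_slot"][m.group(1)] += 1
    return summary
```

Run it on the text of the client's log file: many `sent` entries alongside a full or nearly full queue would match the situation described above, while a slot that keeps piling up failures is one whose unit never got through.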

.

Post by **VijayPande** » Wed Feb 17, 2010 3:16 pm

noorman wrote: I did so yesterday (noon time GMT+1) and my Client has been running and Folding ever since, using other GPU servers and not using a CS (CS6), without interruptions

That's good to hear. Is that the case for new WUs for others too?

For old ones, this is reasonable:

For the machines that have lots of Results in Queue, Bruce gave a possible reason why that is when the log says that they have been sent (successfully).
If the server has accepted the Results but the Client didn't get the signal to clear that Queue-slot, the Client thinks it hasn't been uploaded ...
Check the log(s) for successful uploads.
If you have the combination of 'successful uploads' and a 'full or nearly full queue', you could also try the above suggestion to get things Folding again 'normally' ...

We are working to see what we can do about this as well, but as I think everyone would agree, the first job was to get the basics working for all new WUs. The WSs have been pretty stable so far, so I think/hope we're OK for new WUs.

Post by **noorman** » Wed Feb 17, 2010 3:36 pm

VijayPande wrote:
noorman wrote: I did so yesterday (noon time GMT+1) and my Client has been running and Folding ever since, using other GPU servers and not using a CS (CS6), without interruptions

That's good to hear. Is that the case for new WUs for others too?

I've had Projects 10105, 5766, 5767, 10103, 10102, 5771, 5781, 5765, 5770 and a 3470 since I restarted.

So that's work from 171.67.108.11, 171.67.108.21, and 171.64.65.71.

Over half were P1010x, though...

.

Post by **leexgx** » Wed Feb 17, 2010 5:54 pm

Just let them time out on their own.

Post by **goben_2003** » Wed Feb 17, 2010 6:13 pm

VijayPande wrote:ok, thanks. Joe and I have been working to track this down.

Could people confirm the following:
- the problems are only seen in one server: 171.67.108.21
- the problems are only seen with configs with multiple GPUs in a single box

If either of the above isn't true for you, could you please post here?

I have a single GPU (9800 GT) in my computer.
It appears to be both 171.67.108.26 and 171.64.65.71.
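To answer Vijay's question from your own log, you can count which server addresses the failed uploads went to. In the logs in this thread, the client prints `+ Could not connect to Work Server (results)` followed by a line holding the `(ip:port)` it tried. The minimal Python sketch below relies on that two-line pattern (an assumption based only on these logs); the function name `failed_servers` is hypothetical.

```python
import re
from collections import Counter

# Address line printed right after "Could not connect to Work Server (results)",
# e.g. "[16:55:04]     (171.64.65.71:8080)"
ADDR = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3}):(\d+)\)")

def failed_servers(log_text: str) -> Counter:
    """Count failed result uploads per (ip, port) as reported in a client log."""
    counts = Counter()
    expect_addr = False
    for line in log_text.splitlines():
        if "Could not connect to Work Server" in line:
            expect_addr = True  # the next line should carry the address
            continue
        if expect_addr:
            m = ADDR.search(line)
            if m:
                counts[(m.group(1), int(m.group(2)))] += 1
            expect_addr = False
    return counts
```

Each upload attempt in these logs tries port 8080 and then port 80, so one failed attempt usually shows up as two entries for the same IP.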

Code:

[16:54:59] Shutting down core 
[16:54:59] 
[16:54:59] Folding@home Core Shutdown: FINISHED_UNIT
[16:55:03] CoreStatus = 64 (100)
[16:55:03] Sending work to server
[16:55:03] Project: 10102 (Run 420, Clone 2, Gen 9)


[16:55:03] + Attempting to send results [February 17 16:55:03 UTC]
[16:55:04] - Couldn't send HTTP request to server
[16:55:04] + Could not connect to Work Server (results)
[16:55:04]     (171.64.65.71:8080)
[16:55:04] + Retrying using alternative port
[16:55:05] - Couldn't send HTTP request to server
[16:55:05] + Could not connect to Work Server (results)
[16:55:05]     (171.64.65.71:80)
[16:55:05] - Error: Could not transmit unit 00 (completed February 17) to work server.
[16:55:05]   Keeping unit 00 in queue.
[16:55:05] Project: 10102 (Run 420, Clone 2, Gen 9)


[16:55:05] + Attempting to send results [February 17 16:55:05 UTC]
[16:55:06] - Couldn't send HTTP request to server
[16:55:06] + Could not connect to Work Server (results)
[16:55:06]     (171.64.65.71:8080)
[16:55:06] + Retrying using alternative port
[16:55:07] - Couldn't send HTTP request to server
[16:55:07] + Could not connect to Work Server (results)
[16:55:07]     (171.64.65.71:80)
[16:55:07] - Error: Could not transmit unit 00 (completed February 17) to work server.


[16:55:07] + Attempting to send results [February 17 16:55:07 UTC]
[17:06:49] + Could not connect to Work Server (results)
[17:06:49]     (171.67.108.26:8080)
[17:06:49] + Retrying using alternative port
[17:06:49] - Couldn't send HTTP request to server
[17:06:49]   (Got status 503)
[17:06:49] + Could not connect to Work Server (results)
[17:06:49]     (171.67.108.26:80)
[17:06:49]   Could not transmit unit 00 to Collection server; keeping in queue.
[17:06:49] Project: 10102 (Run 420, Clone 2, Gen 9)


[17:06:49] + Attempting to send results [February 17 17:06:49 UTC]
[17:06:50] - Couldn't send HTTP request to server
[17:06:50] + Could not connect to Work Server (results)
[17:06:50]     (171.64.65.71:8080)
[17:06:50] + Retrying using alternative port
[17:06:51] - Couldn't send HTTP request to server
[17:06:51] + Could not connect to Work Server (results)
[17:06:51]     (171.64.65.71:80)
[17:06:51] - Error: Could not transmit unit 00 (completed February 17) to work server.


[17:06:51] + Attempting to send results [February 17 17:06:51 UTC]
[17:43:23] - Unknown packet returned from server, expected ACK for results
[17:43:23]   Could not transmit unit 00 to Collection server; keeping in queue.
[17:43:23] - Preparing to get new work unit...
[17:43:23] + Attempting to get work packet
[17:43:23] - Connecting to assignment server
[17:43:24] - Successful: assigned to (171.67.108.11).
[17:43:24] + News From Folding@Home: Welcome to Folding@Home
[17:43:24] Loaded queue successfully.
[17:43:25] Project: 10102 (Run 420, Clone 2, Gen 9)


[17:43:25] + Attempting to send results [February 17 17:43:25 UTC]
[17:43:26] - Couldn't send HTTP request to server
[17:43:26] + Could not connect to Work Server (results)
[17:43:26]     (171.64.65.71:8080)
[17:43:26] + Retrying using alternative port
[17:43:27] - Couldn't send HTTP request to server
[17:43:27] + Could not connect to Work Server (results)
[17:43:27]     (171.64.65.71:80)
[17:43:27] - Error: Could not transmit unit 00 (completed February 17) to work server.


[17:43:27] + Attempting to send results [February 17 17:43:27 UTC]
[17:46:38] - Couldn't send HTTP request to server
[17:46:38] + Could not connect to Work Server (results)
[17:46:38]     (171.67.108.26:8080)
[17:46:38] + Retrying using alternative port
[17:46:38] - Couldn't send HTTP request to server
[17:46:38]   (Got status 503)
[17:46:38] + Could not connect to Work Server (results)
[17:46:38]     (171.67.108.26:80)
[17:46:38]   Could not transmit unit 00 to Collection server; keeping in queue.
[17:46:38] Project: 10102 (Run 420, Clone 2, Gen 9)


[17:46:38] + Attempting to send results [February 17 17:46:38 UTC]
[17:46:39] - Couldn't send HTTP request to server
[17:46:39] + Could not connect to Work Server (results)
[17:46:39]     (171.64.65.71:8080)
[17:46:39] + Retrying using alternative port
[17:46:40] - Couldn't send HTTP request to server
[17:46:40] + Could not connect to Work Server (results)
[17:46:40]     (171.64.65.71:80)
[17:46:40] - Error: Could not transmit unit 00 (completed February 17) to work server.


[17:46:40] + Attempting to send results [February 17 17:46:40 UTC]
[17:47:20] - Server does not have record of this unit. Will try again later.
[17:47:20]   Could not transmit unit 00 to Collection server; keeping in queue.
[17:47:20] + Closed connections

Edit (~2:30 pm EST): This just happened again, the second unit in a row. Once again it tried those two IPs.

Post by **[Inpact]Terminou** » Wed Feb 17, 2010 7:02 pm

For me it is only on .21.

[18:54:26] Tpr hash work/wudata_02.tpr: 1468495927 2366622910 1018924041 1449731411 1711493363
[18:54:26]
[18:54:26] Calling fah_main args: 14 usage=100
[18:54:26]
[18:54:26] Working on p10105_lambda_370K
[18:54:29] Client config found, loading data.
[18:54:29] Starting GUI Server
[18:54:40] - Couldn't send HTTP request to server
[18:54:40] + Could not connect to Work Server (results)
[18:54:40] (171.67.108.21:80)
[18:54:40] - Error: Could not transmit unit 04 (completed February 16) to work server.
[18:54:40] - Read packet limit of 540015616... Set to 524286976.

Same on the other GPU.

Post by **tobor** » Wed Feb 17, 2010 9:09 pm

It just looked like I was gonna get back to normal... but nooooo.

Won't connect to .21 or .26.

Code:

###############################################################################

Launch directory: C:\Documents and Settings\steve\Application Data\Folding@home-gpu2
Arguments: -gpu 1 -forcegpu nvidia_g80" 

[19:51:14] - Ask before connecting: No
[19:51:14] - User name: stv911 (Team 4)
[19:51:14] - User ID: BE4069840F022A4
[19:51:14] - Machine ID: 3
[19:51:14] 
[19:51:14] Loaded queue successfully.
[19:51:14] Initialization complete
[19:51:14] 
[19:51:14] + Processing work unit
[19:51:14] Core required: FahCore_11.exe
[19:51:14] Core found.
[19:51:14] Working on queue slot 02 [February 17 19:51:14 UTC]
[19:51:14] + Working ...
[19:51:14] 
[19:51:14] *------------------------------*
[19:51:14] Folding@Home GPU Core
[19:51:14] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[19:51:14] 
[19:51:14] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[19:51:14] Build host: amoeba
[19:51:14] Board Type: Nvidia
[19:51:14] Core      : 
[19:51:14] Preparing to commence simulation
[19:51:14] - Looking at optimizations...
[19:51:14] - Files status OK
[19:51:14] - Expanded 19325 -> 137900 (decompressed 713.5 percent)
[19:51:14] Called DecompressByteArray: compressed_data_size=19325 data_size=137900, decompressed_data_size=137900 diff=0
[19:51:14] - Digital signature verified
[19:51:14] 
[19:51:14] Project: 3469 (Run 20, Clone 164, Gen 1)
[19:51:14] 
[19:51:14] Assembly optimizations on if available.
[19:51:14] Entering M.D.
[19:51:20] Will resume from checkpoint file
[19:51:20] Tpr hash work/wudata_02.tpr:  3838073550 3846720041 1443531160 1137903302 2290942673
[19:51:20] 
[19:51:20] Calling fah_main args: 14 usage=100
[19:51:20] 
[19:51:20] Working on Fs-peptide-GBSA
[19:51:20] Client config found, loading data.
[19:51:20] Resuming from checkpoint
[19:51:20] fcCheckPointResume: retreived and current tpr file hash:
[19:51:20]    0   3838073550   3838073550
[19:51:20]    1   3846720041   3846720041
[19:51:20]    2   1443531160   1443531160
[19:51:20]    3   1137903302   1137903302
[19:51:20]    4   2290942673   2290942673
[19:51:20] fcCheckPointResume: file hashes same.
[19:51:20] fcCheckPointResume: state restored.
[19:51:20] Verified work/wudata_02.log
[19:51:20] Verified work/wudata_02.edr
[19:51:20] Verified work/wudata_02.xtc
[19:51:20] Completed 21%
[19:51:20] Starting GUI Server
[19:52:04] Completed 22%
[19:52:47] Completed 23%
[19:53:31] Completed 24%
[19:54:14] Completed 25%
[19:54:58] Completed 26%
[19:55:42] Completed 27%
[19:56:25] Completed 28%
[19:57:09] Completed 29%
[19:57:52] Completed 30%
[19:58:36] Completed 31%
[19:59:20] Completed 32%
[20:00:03] Completed 33%
[20:00:47] Completed 34%
[20:01:30] Completed 35%
[20:02:14] Completed 36%
[20:02:57] Completed 37%
[20:03:41] Completed 38%
[20:04:25] Completed 39%
[20:05:08] Completed 40%
[20:05:52] Completed 41%
[20:06:36] Completed 42%
[20:07:20] Completed 43%
[20:08:03] Completed 44%
[20:08:47] Completed 45%
[20:09:30] Completed 46%
[20:10:14] Completed 47%
[20:10:58] Completed 48%
[20:11:41] Completed 49%
[20:12:25] Completed 50%
[20:13:09] Completed 51%
[20:13:52] Completed 52%
[20:14:36] Completed 53%
[20:15:19] Completed 54%
[20:16:05] Completed 55%
[20:16:48] Completed 56%
[20:17:32] Completed 57%
[20:18:16] Completed 58%
[20:18:59] Completed 59%
[20:19:43] Completed 60%
[20:20:26] Completed 61%
[20:21:10] Completed 62%
[20:21:54] Completed 63%
[20:22:37] Completed 64%
[20:23:21] Completed 65%
[20:24:05] Completed 66%
[20:24:49] Completed 67%
[20:25:35] Completed 68%
[20:26:19] Completed 69%
[20:27:03] Completed 70%
[20:27:46] Completed 71%
[20:28:30] Completed 72%
[20:29:13] Completed 73%
[20:29:57] Completed 74%
[20:30:41] Completed 75%
[20:31:24] Completed 76%
[20:32:08] Completed 77%
[20:32:52] Completed 78%
[20:33:35] Completed 79%
[20:34:19] Completed 80%
[20:35:02] Completed 81%
[20:35:46] Completed 82%
[20:36:30] Completed 83%
[20:37:13] Completed 84%
[20:37:57] Completed 85%
[20:38:41] Completed 86%
[20:39:24] Completed 87%
[20:40:08] Completed 88%
[20:40:51] Completed 89%
[20:41:35] Completed 90%
[20:42:19] Completed 91%
[20:43:02] Completed 92%
[20:43:46] Completed 93%
[20:44:30] Completed 94%
[20:45:13] Completed 95%
[20:45:57] Completed 96%
[20:46:40] Completed 97%
[20:47:24] Completed 98%
[20:48:08] Completed 99%
[20:48:51] Completed 100%
[20:48:51] Successful run
[20:48:51] DynamicWrapper: Finished Work Unit: sleep=10000
[20:49:01] Reserved 65788 bytes for xtc file; Cosm status=0
[20:49:01] Allocated 65788 bytes for xtc file
[20:49:01] - Reading up to 65788 from "work/wudata_02.xtc": Read 65788
[20:49:01] Read 65788 bytes from xtc file; available packet space=786364676
[20:49:01] xtc file hash check passed.
[20:49:01] Reserved 6456 6456 786364676 bytes for arc file=<work/wudata_02.trr> Cosm status=0
[20:49:01] Allocated 6456 bytes for arc file
[20:49:01] - Reading up to 6456 from "work/wudata_02.trr": Read 6456
[20:49:01] Read 6456 bytes from arc file; available packet space=786358220
[20:49:01] trr file hash check passed.
[20:49:01] Allocated 560 bytes for edr file
[20:49:01] Read bedfile
[20:49:01] edr file hash check passed.
[20:49:01] Logfile not read.
[20:49:01] GuardedRun: success in DynamicWrapper
[20:49:01] GuardedRun: done
[20:49:01] Run: GuardedRun completed.
[20:49:04] + Opened results file
[20:49:04] - Writing 73316 bytes of core data to disk...
[20:49:04] Done: 72804 -> 69335 (compressed to 95.2 percent)
[20:49:04]   ... Done.
[20:49:04] DeleteFrameFiles: successfully deleted file=work/wudata_02.ckp
[20:49:04] Shutting down core 
[20:49:04] 
[20:49:04] Folding@home Core Shutdown: FINISHED_UNIT
[20:49:08] CoreStatus = 64 (100)
[20:49:08] Sending work to server
[20:49:08] Project: 3469 (Run 20, Clone 164, Gen 1)


[20:49:08] + Attempting to send results [February 17 20:49:08 UTC]
[20:49:09] - Couldn't send HTTP request to server
[20:49:09] + Could not connect to Work Server (results)
[20:49:09]     (171.67.108.21:8080)
[20:49:09] + Retrying using alternative port
[20:49:30] - Couldn't send HTTP request to server
[20:49:30] + Could not connect to Work Server (results)
[20:49:30]     (171.67.108.21:80)
[20:49:30] - Error: Could not transmit unit 02 (completed February 17) to work server.
[20:49:30]   Keeping unit 02 in queue.
[20:49:30] Project: 3469 (Run 20, Clone 164, Gen 1)


[20:49:30] + Attempting to send results [February 17 20:49:30 UTC]
[20:49:31] - Couldn't send HTTP request to server
[20:49:31] + Could not connect to Work Server (results)
[20:49:31]     (171.67.108.21:8080)
[20:49:31] + Retrying using alternative port
[20:49:51] - Couldn't send HTTP request to server
[20:49:51] + Could not connect to Work Server (results)
[20:49:51]     (171.67.108.21:80)
[20:49:51] - Error: Could not transmit unit 02 (completed February 17) to work server.


[20:49:51] + Attempting to send results [February 17 20:49:51 UTC]
[20:50:12] - Couldn't send HTTP request to server
[20:50:12] + Could not connect to Work Server (results)
[20:50:12]     (171.67.108.26:8080)
[20:50:12] + Retrying using alternative port
[20:50:13] - Couldn't send HTTP request to server
[20:50:13]   (Got status 503)
[20:50:13] + Could not connect to Work Server (results)
[20:50:13]     (171.67.108.26:80)
[20:50:13]   Could not transmit unit 02 to Collection server; keeping in queue.
[20:50:13] - Preparing to get new work unit...
[20:50:13] + Attempting to get work packet
[20:50:13] - Connecting to assignment server
[20:50:13] - Successful: assigned to (171.67.108.11).
[20:50:13] + News From Folding@Home: Welcome to Folding@Home
[20:50:13] Loaded queue successfully.
[20:50:14] Project: 3469 (Run 20, Clone 164, Gen 1)


[20:50:14] + Attempting to send results [February 17 20:50:14 UTC]
[20:50:15] - Couldn't send HTTP request to server
[20:50:15] + Could not connect to Work Server (results)
[20:50:15]     (171.67.108.21:8080)
[20:50:15] + Retrying using alternative port
[20:50:36] - Couldn't send HTTP request to server
[20:50:36] + Could not connect to Work Server (results)
[20:50:36]     (171.67.108.21:80)
[20:50:36] - Error: Could not transmit unit 02 (completed February 17) to work server.


[20:50:36] + Attempting to send results [February 17 20:50:36 UTC]
[20:50:57] - Couldn't send HTTP request to server
[20:50:57] + Could not connect to Work Server (results)
[20:50:57]     (171.67.108.26:8080)
[20:50:57] + Retrying using alternative port
[20:50:57] - Couldn't send HTTP request to server
[20:50:57]   (Got status 503)
[20:50:57] + Could not connect to Work Server (results)
[20:50:57]     (171.67.108.26:80)
[20:50:57]   Could not transmit unit 02 to Collection server; keeping in queue.
[20:50:57] + Closed connections
[20:50:57] 
[20:50:57] + Processing work unit
[20:50:57] Core required: FahCore_11.exe
[20:50:57] Core found.
[20:50:57] Working on queue slot 03 [February 17 20:50:57 UTC]
[20:50:57] + Working ...
[20:50:57] 
[20:50:57] *------------------------------*
[20:50:57] Folding@Home GPU Core
[20:50:57] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[20:50:57] 
[20:50:57] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[20:50:57] Build host: amoeba
[20:50:57] Board Type: Nvidia
[20:50:57] Core      : 
[20:50:57] Preparing to commence simulation
[20:50:57] - Looking at optimizations...
[20:50:57] DeleteFrameFiles: successfully deleted file=work/wudata_03.ckp
[20:50:57] - Created dyn
[20:50:57] - Files status OK
[20:50:57] - Expanded 45390 -> 251112 (decompressed 553.2 percent)
[20:50:57] Called DecompressByteArray: compressed_data_size=45390 data_size=251112, decompressed_data_size=251112 diff=0
[20:50:57] - Digital signature verified
[20:50:57] 
[20:50:57] Project: 5769 (Run 10, Clone 2, Gen 1994)
[20:50:57] 
[20:50:57] Assembly optimizations on if available.
[20:50:57] Entering M.D.
[20:51:03] Tpr hash work/wudata_03.tpr:  2317276294 598111567 4243079128 1397114829 611891929
[20:51:03] 
[20:51:03] Calling fah_main args: 14 usage=100
[20:51:03] 
[20:51:04] Working on Protein
[20:51:04] Client config found, loading data.
[20:51:04] Starting GUI Server
[20:51:47] Completed 1%
[20:52:30] Completed 2%

Folding@Home Client Shutdown.


--- Opening Log file [February 17 20:53:09 UTC] 


# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Documents and Settings\steve\Application Data\Folding@home-gpu2
Arguments: -gpu 1 -forcegpu nvidia_g80" 

[20:53:09] - Ask before connecting: No
[20:53:09] - User name: stv911 (Team 4)
[20:53:09] - User ID: BE4069840F022A4
[20:53:09] - Machine ID: 3
[20:53:09] 
[20:53:09] Loaded queue successfully.
[20:53:09] Initialization complete
[20:53:09] 
[20:53:09] + Processing work unit
[20:53:09] Project: 3469 (Run 20, Clone 164, Gen 1)


[20:53:09] + Attempting to send results [February 17 20:53:09 UTC]
[20:53:09] Core required: FahCore_11.exe
[20:53:09] Core found.
[20:53:09] Working on queue slot 03 [February 17 20:53:09 UTC]
[20:53:09] + Working ...
[20:53:09] 
[20:53:09] *------------------------------*
[20:53:09] Folding@Home GPU Core
[20:53:09] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[20:53:09] 
[20:53:09] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[20:53:09] Build host: amoeba
[20:53:09] Board Type: Nvidia
[20:53:09] Core      : 
[20:53:09] Preparing to commence simulation
[20:53:09] - Looking at optimizations...
[20:53:09] - Files status OK
[20:53:09] - Expanded 45390 -> 251112 (decompressed 553.2 percent)
[20:53:09] Called DecompressByteArray: compressed_data_size=45390 data_size=251112, decompressed_data_size=251112 diff=0
[20:53:09] - Digital signature verified
[20:53:09] 
[20:53:09] Project: 5769 (Run 10, Clone 2, Gen 1994)
[20:53:09] 
[20:53:09] Assembly optimizations on if available.
[20:53:09] Entering M.D.
[20:53:10] - Couldn't send HTTP request to server
[20:53:10] + Could not connect to Work Server (results)
[20:53:10]     (171.67.108.21:8080)
[20:53:10] + Retrying using alternative port
[20:53:15] Will resume from checkpoint file
[20:53:15] Tpr hash work/wudata_03.tpr:  2317276294 598111567 4243079128 1397114829 611891929
[20:53:15] 
[20:53:15] Calling fah_main args: 14 usage=100
[20:53:15] 
[20:53:15] Working on Protein
[20:53:16] Client config found, loading data.
[20:53:16] Resuming from checkpoint
[20:53:16] fcCheckPointResume: retreived and current tpr file hash:
[20:53:16]    0   2317276294   2317276294
[20:53:16]    1    598111567    598111567
[20:53:16]    2   4243079128   4243079128
[20:53:16]    3   1397114829   1397114829
[20:53:16]    4    611891929    611891929
[20:53:16] fcCheckPointResume: file hashes same.
[20:53:16] fcCheckPointResume: state restored.
[20:53:16] Verified work/wudata_03.log
[20:53:16] Verified work/wudata_03.edr
[20:53:16] Verified work/wudata_03.xtc
[20:53:16] Completed 2%
[20:53:16] Starting GUI Server
[20:53:31] - Couldn't send HTTP request to server
[20:53:31] + Could not connect to Work Server (results)
[20:53:31]     (171.67.108.21:80)
[20:53:31] - Error: Could not transmit unit 02 (completed February 17) to work server.


[20:53:31] + Attempting to send results [February 17 20:53:31 UTC]
[20:53:52] - Couldn't send HTTP request to server
[20:53:52] + Could not connect to Work Server (results)
[20:53:52]     (171.67.108.26:8080)
[20:53:52] + Retrying using alternative port
[20:53:55] - Couldn't send HTTP request to server
[20:53:55] + Could not connect to Work Server (results)
[20:53:55]     (171.67.108.26:80)
[20:53:55]   Could not transmit unit 02 to Collection server; keeping in queue.
[20:54:00] Completed 3%
[20:54:45] Completed 4%
[20:55:29] Completed 5%
[20:56:12] Completed 6%
[20:56:56] Completed 7%
[20:57:39] Completed 8%
[20:58:22] Completed 9%
[20:59:06] Completed 10%
[20:59:49] Completed 11%

Post by **noorman** » Wed Feb 17, 2010 9:14 pm

.

For the moment, .26 is in REJECT ...

Try stopping the Client, deleting the Work folder and the queue.dat file, and restarting the Client.

You might get redirected to other GPU servers, as my system did. I haven't been without work since I did that, and I have no Results in the queue either!

EDIT: You should, of course, always check first whether your Work folder contains any finished or unfinished WUs before deleting it!

Edit by Mod:
CAUTION: If you have WUs in queue which you still hope will be uploaded, this is a bad idea, because it discards them.
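Before following the delete-and-restart advice, it is worth checking what is actually in the Work folder. The minimal Python sketch below only lists files and deletes nothing. It assumes, from the file names in this thread's logs (`work/wuresults_02.dat`, `work/wudata_02.xtc`, ...), that finished results are `wuresults_*.dat` and in-progress unit data is `wudata_*`. The function names are hypothetical, and an empty folder by this test does not guarantee that queue.dat holds nothing you care about.

```python
from __future__ import annotations

from pathlib import Path

def work_folder_contents(work_dir: str) -> dict[str, list[str]]:
    """List finished-result and in-progress unit files in a client work folder.

    File-name patterns are assumptions based on the names seen in client logs.
    """
    work = Path(work_dir)
    if not work.is_dir():
        return {"finished": [], "in_progress": []}
    finished = sorted(p.name for p in work.glob("wuresults_*.dat"))
    in_progress = sorted(p.name for p in work.glob("wudata_*"))
    return {"finished": finished, "in_progress": in_progress}

def looks_safe_to_delete(work_dir: str) -> bool:
    """True only if no finished results or unit data files were found."""
    contents = work_folder_contents(work_dir)
    return not contents["finished"] and not contents["in_progress"]

if __name__ == "__main__":
    import sys
    # Hypothetical usage: python check_work.py "<client folder>/work"
    target = sys.argv[1] if len(sys.argv) > 1 else "work"
    print(work_folder_contents(target))
    print("looks safe to delete:", looks_safe_to_delete(target))
```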

Post by **tobor** » Wed Feb 17, 2010 9:43 pm

I've done that before, but if I keep doing it I wouldn't send anything... LOL

Post by **noorman** » Wed Feb 17, 2010 9:53 pm

tobor wrote: I've done that before, but if I keep doing it I wouldn't send anything... LOL

.

So you didn't get redirected to a server that does its own collecting...

I must have been lucky, then.

.
