Page 19 of 28

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Fri Feb 19, 2010 9:57 pm
by noorman
Good news (so far) / fingers crossed ... :D

Posted: Fri Feb 19, 2010 10:21 pm
by dschief
The process is still iffy. I saw one go up a few minutes ago, then got this from a different box.

[22:07:07] Trying to send all finished work units
[22:07:07] Project: 5786 (Run 2, Clone 73, Gen 46)
[22:07:07] - Read packet limit of 540015616... Set to 524286976.


[22:07:07] + Attempting to send results [February 19 22:07:07 UTC]
[22:07:07] - Reading file work/wuresults_02.dat from core
[22:07:07]   (Read 166309 bytes from disk)
[22:07:07] Connecting to http://171.67.108.21:8080/
[22:07:11] - Couldn't send HTTP request to server
[22:07:11] + Could not connect to Work Server (results)
[22:07:11]     (171.67.108.21:8080)
[22:07:11] + Retrying using alternative port
[22:07:11] Connecting to http://171.67.108.21:80/
[22:10:20] - Couldn't send HTTP request to server
[22:10:20] + Could not connect to Work Server (results)
[22:10:20]     (171.67.108.21:80)
[22:10:20] - Error: Could not transmit unit 02 (completed February 18) to work server.
[22:10:20] - 23 failed uploads of this unit.
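The client's retry sequence in that log is regular: it tries the Work Server on port 8080, retries on port 80, then reports how many uploads of the unit have failed. As a rough aid for reading logs like these, here is a minimal Python sketch, assuming the line formats are exactly as in the excerpt above (the function name `summarize` and the regular expressions are illustrative, not part of the Folding@home client), that tallies connection failures per host:port and extracts the client's failed-upload count:

```python
import re
from collections import Counter

# Excerpt copied from the log above; line formats are assumed, not guaranteed
# to be the same in other client versions.
LOG = """\
[22:07:11] + Could not connect to Work Server (results)
[22:07:11]     (171.67.108.21:8080)
[22:07:11] + Retrying using alternative port
[22:10:20] + Could not connect to Work Server (results)
[22:10:20]     (171.67.108.21:80)
[22:10:20] - Error: Could not transmit unit 02 (completed February 18) to work server.
[22:10:20] - 23 failed uploads of this unit.
"""

# Matches the "(host:port)" continuation line printed after a connection failure.
ENDPOINT = re.compile(r"^\[\d\d:\d\d:\d\d\]\s+\((\d+(?:\.\d+){3}):(\d+)\)\s*$")
# Matches the client's running count of failed uploads for the current unit.
FAILED = re.compile(r"- (\d+) failed uploads of this unit")


def summarize(log: str):
    """Return (failures per (host, port), last reported failed-upload count)."""
    failures = Counter()
    last_count = None
    lines = log.splitlines()
    for i, line in enumerate(lines):
        # The endpoint is on the line right after the "Could not connect" message.
        if "Could not connect to Work Server" in line and i + 1 < len(lines):
            m = ENDPOINT.match(lines[i + 1])
            if m:
                failures[(m.group(1), int(m.group(2)))] += 1
        m = FAILED.search(line)
        if m:
            last_count = int(m.group(1))
    return failures, last_count


if __name__ == "__main__":
    failures, count = summarize(LOG)
    for (host, port), n in sorted(failures.items()):
        print(f"{host}:{port} -> {n} connection failure(s)")
    print("failed uploads reported by client:", count)
```

Pointed at a longer log, the same tally shows at a glance whether both ports are failing or only one.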

Posted: Fri Feb 19, 2010 11:00 pm
by MichaelB
Whatever PG and company are doing, it is not working: the return rate, and the hangs on "WU not found", are worse today than yesterday. It seems I have a 50-50 chance of a finished WU uploading. If it doesn't upload, the client just hangs before it acquires a new WU. If I intervene and restart it, I will get a new WU, but who knows if it will upload when finished.

Posted: Sat Feb 20, 2010 1:18 am
by tobor
What's worse ... It's almost The WEEEK END !!!! WE'RE ALL DOOOMED!!!!

Posted: Sat Feb 20, 2010 2:15 am
by VijayPande
Well, Joe and I worked through last weekend on this, why should this one be any different!

Posted: Sat Feb 20, 2010 3:30 am
by derrickmcc
I would like to thank Joe and Vijay for the efforts they are putting in to resolve this.

Over the last 24 hours I have had 38 WUs return to the results server, with 23 of these being over the last 6 hours. A number of these were WUs that failed to be sent back earlier in the day.

However, there are still issues: on my 4 GPU system one GPU hung for over 20 minutes attempting to send results (this was an hour ago) until I reset the wireless network connection.
(Note that my internet connection was still working fine, the reset was simply to force the GPU client to try to connect to the results server again.)

I still have some WUs which have failed to return to the results server, but will wait and see what tomorrow brings.

Posted: Sat Feb 20, 2010 3:53 am
by Mark620
This is a single-GPU machine; all of my 2-GPU machines have the same problem.

[23:13:27] + Attempting to send results [February 19 23:13:27 UTC]
[23:13:31] - Couldn't send HTTP request to server
[23:13:31] + Could not connect to Work Server (results)
[23:13:31]     (171.67.108.21:8080)
[23:13:31] + Retrying using alternative port
[23:13:52] - Couldn't send HTTP request to server
[23:13:52] + Could not connect to Work Server (results)
[23:13:52]     (171.67.108.21:80)
[23:13:52] - Error: Could not transmit unit 08 (completed February 19) to work server.
[23:13:52] - Read packet limit of 540015616... Set to 524286976.


[23:13:52] + Attempting to send results [February 19 23:13:52 UTC]
[23:13:56] - Server does not have record of this unit. Will try again later.
[23:13:56]   Could not transmit unit 08 to Collection server; keeping in queue.
[23:13:56] - Preparing to get new work unit...
[23:13:56] + Attempting to get work packet
[23:13:56] - Connecting to assignment server
[23:13:57] - Successful: assigned to (171.64.65.20).
[23:13:57] + News From Folding@Home: Welcome to Folding@Home
[23:13:57] Loaded queue successfully.
[23:13:58] Project: 5786 (Run 8, Clone 4, Gen 19)
[23:13:58] - Read packet limit of 540015616... Set to 524286976.


[23:13:58] + Attempting to send results [February 19 23:13:58 UTC]
[23:14:02] - Couldn't send HTTP request to server
[23:14:02] + Could not connect to Work Server (results)
[23:14:02]     (171.67.108.21:8080)
[23:14:02] + Retrying using alternative port
[23:14:23] - Couldn't send HTTP request to server
[23:14:23] + Could not connect to Work Server (results)
[23:14:23]     (171.67.108.21:80)
[23:14:23] - Error: Could not transmit unit 08 (completed February 19) to work server.
[23:14:23] - Read packet limit of 540015616... Set to 524286976.


[23:14:23] + Attempting to send results [February 19 23:14:23 UTC]
[23:14:27] - Server does not have record of this unit. Will try again later.
[23:14:27]   Could not transmit unit 08 to Collection server; keeping in queue.
[23:14:27] + Closed connections
[23:14:27] 
[23:14:27] + Processing work unit
[23:14:27] Core required: FahCore_14.exe
[23:14:27] Core found.
[23:14:27] Working on queue slot 05 [February 19 23:14:27 UTC]
[23:14:27] + Working ...
[23:14:28] 
[23:14:28] *------------------------------*
[23:14:28] Folding@Home GPU Core - Beta
[23:14:28] Version 1.26 (Wed Oct 14 13:09:26 PDT 2009)
[23:14:28] 
[23:14:28] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[23:14:28] Build host: vspm46
[23:14:28] Board Type: Nvidia
[23:14:28] Core      : 
[23:14:28] Preparing to commence simulation
[23:14:28] - Looking at optimizations...
[23:14:28] - Created dyn
[23:14:28] - Files status OK
[23:14:28] - Expanded 66529 -> 360060 (decompressed 541.2 percent)
[23:14:28] Called DecompressByteArray: compressed_data_size=66529 data_size=360060, decompressed_data_size=360060 diff=0
[23:14:28] - Digital signature verified
[23:14:28] 
[23:14:28] Project: 5910 (Run 14, Clone 218, Gen 0)
[23:14:28] 
[23:14:28] Assembly optimizations on if available.
[23:14:28] Entering M.D.
[23:14:34] Tpr hash work/wudata_05.tpr:  2648018779 2358690084 2468980589 348568324 1229894467
[23:14:34] Working on Protein
[23:14:35] Client config found, loading data.
[23:14:35] Starting GUI Server
[23:15:55] Completed 1%
[23:17:45] Completed 2%
[23:19:31] Completed 3%
[23:21:28] Completed 4%
[23:23:10] Completed 5%
[23:25:03] Completed 6%
[23:26:46] Completed 7%
[23:28:28] Completed 8%
[23:30:17] Completed 9%
[23:32:03] Completed 10%
[23:34:00] Completed 11%
[23:35:43] Completed 12%
[23:37:29] Completed 13%
[23:39:18] Completed 14%
[23:41:00] Completed 15%
[23:42:54] Completed 16%
[23:44:40] Completed 17%
[23:46:29] Completed 18%
[23:48:15] Completed 19%
[23:50:12] Completed 20%
[23:52:13] Completed 21%
[23:53:59] Completed 22%
[23:55:55] Completed 23%
[23:57:45] Completed 24%
[23:59:35] Completed 25%
[00:01:28] Completed 26%
[00:03:10] Completed 27%
[00:05:00] Completed 28%
[00:06:49] Completed 29%
[00:08:35] Completed 30%
[00:10:29] Completed 31%
[00:12:22] Completed 32%
[00:14:11] Completed 33%
[00:15:57] Completed 34%
[00:17:47] Completed 35%
[00:19:33] Completed 36%
[00:21:19] Completed 37%
[00:23:08] Completed 38%
[00:24:54] Completed 39%
[00:26:48] Completed 40%
[00:28:34] Completed 41%
[00:30:23] Completed 42%
[00:32:09] Completed 43%
[00:34:02] Completed 44%
[00:35:48] Completed 45%
[00:37:42] Completed 46%
[00:39:28] Completed 47%
[00:41:10] Completed 48%
[00:42:56] Completed 49%
[00:44:42] Completed 50%
[00:46:24] Completed 51%
[00:48:10] Completed 52%
[00:49:52] Completed 53%
[00:51:49] Completed 54%
[00:53:35] Completed 55%
[00:55:17] Completed 56%
[00:57:22] Completed 57%
[00:59:07] Completed 58%
[01:00:53] Completed 59%
[01:02:47] Completed 60%
[01:04:33] Completed 61%
[01:06:22] Completed 62%
[01:08:12] Completed 63%
[01:10:05] Completed 64%
[01:11:55] Completed 65%
[01:13:41] Completed 66%
[01:15:31] Completed 67%
[01:17:13] Completed 68%
[01:19:02] Completed 69%
[01:20:48] Completed 70%
[01:22:34] Completed 71%
[01:24:24] Completed 72%
[01:26:25] Completed 73%
[01:28:18] Completed 74%
[01:30:04] Completed 75%
[01:31:53] Completed 76%
[01:33:39] Completed 77%
[01:35:21] Completed 78%
[01:37:07] Completed 79%
[01:39:01] Completed 80%
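Progress in the log above can be turned into a speed figure: the gap between successive "Completed N%" lines is what folders commonly call the time per frame (TPF). A minimal Python sketch, assuming HH:MM:SS UTC timestamps with no gap longer than a day, so a timestamp that goes backwards is treated as a midnight rollover (as between 23:59:35 and 00:01:28 above); the helper names `frame_times` and `average_tpf` are hypothetical:

```python
import re
from datetime import timedelta

# A few "Completed N%" lines copied from the log above, spanning midnight.
LOG = """\
[23:55:55] Completed 23%
[23:57:45] Completed 24%
[23:59:35] Completed 25%
[00:01:28] Completed 26%
[00:03:10] Completed 27%
"""

PROGRESS = re.compile(r"^\[(\d\d):(\d\d):(\d\d)\] Completed (\d+)%")


def frame_times(log: str) -> list[timedelta]:
    """Return the interval between each pair of consecutive progress lines."""
    stamps = []
    for line in log.splitlines():
        m = PROGRESS.match(line)
        if m:
            h, mnt, s = (int(g) for g in m.groups()[:3])
            stamps.append(timedelta(hours=h, minutes=mnt, seconds=s))
    deltas = []
    for prev, cur in zip(stamps, stamps[1:]):
        d = cur - prev
        if d < timedelta(0):  # clock wrapped past midnight
            d += timedelta(days=1)
        deltas.append(d)
    return deltas


def average_tpf(log: str) -> timedelta:
    """Mean time per frame over all progress lines found in the log."""
    deltas = frame_times(log)
    if not deltas:
        raise ValueError("need at least two 'Completed N%' lines")
    return sum(deltas, timedelta(0)) / len(deltas)


if __name__ == "__main__":
    tpf = average_tpf(LOG)
    print(f"average TPF: {tpf}")
    print(f"projected time for 100 frames: {tpf * 100}")
```

With these five lines the average comes to about 1 min 49 s per percent, so a full 100-frame unit would take roughly three hours at that pace.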

Posted: Sat Feb 20, 2010 5:13 am
by bruce
derrickmcc wrote:I would like to thank Joe and Vijay for the efforts they are putting in to resolve this.

Over the last 24 hours I have had 38 WUs return to the results server, with 23 of these being over the last 6 hours. A number of these were WUs that failed to be sent back earlier in the day.

However, there are still issues: on my 4 GPU system one GPU hung for over 20 minutes attempting to send results (this was an hour ago) until I reset the wireless network connection.
(Note that my internet connection was still working fine, the reset was simply to force the GPU client to try to connect to the results server again.)

I still have some WUs which have failed to return to the results server, but will wait and see what tomorrow brings.

I consider this a very positive report. Having WUs upload that previously would not upload is a very good sign that SOMETHING is going right that wasn't going right before.

The fact that not everything can be uploaded isn't necessarily a bad sign. We are on page 19 of this topic, and there are several other topics on the same subject. If I had to respond to each of these posts, it would take me a long time. The server faces a similar problem: a huge number of clients are all trying to upload from their relatively long queues of WUs. Even if a perfect fix has been installed on the server, it will take quite a while before the server can accept that backlog. Waiting to see what tomorrow brings is the right attitude, though it may take longer than one day. Moreover, if there are several problems, Joe/Vijay won't be able to tell until the load drops enough to see what's actually going up, what's being rejected because of the overload, and whether some are still being rejected for some other, unexplained reason.
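The backlog argument can be made concrete with a deliberately simple queue model. This is a sketch under assumed, hypothetical numbers (none of them come from the Folding@home servers): if B finished WUs are already waiting, clients finish r new WUs per hour, and the fixed server can accept c uploads per hour, the backlog only shrinks at c - r per hour, so it takes B / (c - r) hours to clear, and never clears if c <= r.

```python
import math


def hours_to_drain(backlog: float, new_per_hour: float, capacity_per_hour: float) -> float:
    """Hours until a queued backlog is fully accepted, in a simple fluid model.

    backlog: finished WUs already waiting to upload (hypothetical)
    new_per_hour: WUs that clients finish each hour (hypothetical)
    capacity_per_hour: uploads the server can accept each hour (hypothetical)
    Returns math.inf when the server cannot even keep up with new work.
    """
    spare = capacity_per_hour - new_per_hour
    if spare <= 0:
        return math.inf
    return backlog / spare


if __name__ == "__main__":
    # Purely illustrative numbers, not measurements from the real servers.
    print(hours_to_drain(backlog=20_000, new_per_hour=2_000, capacity_per_hour=2_500))  # 40.0
    print(hours_to_drain(backlog=20_000, new_per_hour=2_000, capacity_per_hour=2_000))  # inf
```

Even in this toy version, a fix that only slightly raises capacity leaves the queue draining for days, which fits the caution that it may take longer than one day.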

Posted: Sat Feb 20, 2010 5:28 am
by *hondo*
Hello Bruce, can you let me know if this one got back to the server, please?
*hondo* wrote:Hello there folks :) Is there any way someone in this forum can tell me whether this WU has reached the collection server, please? [18:28:01] Project: 10105 (Run 26, Clone 9, Gen 19)

If it has, I may have stumbled on a rough-and-ready solution.


[18:28:01] + Attempting to send results [February 19 18:28:01 UTC]
[18:28:05] - Server does not have record of this unit. Will try again later.
[18:28:05] - Error: Could not transmit unit 06 (completed February 19) to work server.
[18:28:05] Keeping unit 06 in queue.
[18:28:05] Project: 10105 (Run 26, Clone 9, Gen 19)


[18:28:05] + Attempting to send results [February 19 18:28:05 UTC]
[18:28:08] - Server does not have record of this unit. Will try again later.
[18:28:08] - Error: Could not transmit unit 06 (completed February 19) to work server.



[18:28:08] + Attempting to send results [February 19 18:28:08 UTC]
[18:30:40] - Server does not have record of this unit. Will try again later.
[18:30:40] Could not transmit unit 06 to Collection server; keeping in queue.
[18:30:40] - Preparing to get new work unit...

Posted: Sat Feb 20, 2010 6:18 am
by bruce
*hondo* wrote:Hello Bruce, can you let me know if this one got back to the server, please?
*hondo* wrote:Hello there folks :) Is there any way someone in this forum can tell me whether this WU has reached the collection server, please? [18:28:01] Project: 10105 (Run 26, Clone 9, Gen 19)

If it has, I may have stumbled on a rough-and-ready solution.

It depends on who is asking. The WU has been completed, but not by anybody named Hondo, so it's reasonable to assume that it was reassigned and somebody else completed it while you still have a second copy waiting to upload.

If you can give me the name to whom it should have been credited and/or the upload/download times associated with the copy you're talking about, I'll provide you the information.

Posted: Sat Feb 20, 2010 7:19 am
by VijayPande
derrickmcc wrote:I would like to thank Joe and Vijay for the efforts they are putting in to resolve this.

Over the last 24 hours I have had 38 WUs return to the results server, with 23 of these being over the last 6 hours. A number of these were WUs that failed to be sent back earlier in the day.

However, there are still issues: on my 4 GPU system one GPU hung for over 20 minutes attempting to send results (this was an hour ago) until I reset the wireless network connection.
(Note that my internet connection was still working fine, the reset was simply to force the GPU client to try to connect to the results server again.)

I still have some WUs which have failed to return to the results server, but will wait and see what tomorrow brings.

That's great to hear. I'd like to think this is now fixed, but having felt that way several times over the last week, I think it's still too early to know for sure. I think at least in this case, Joe's WS fix makes sense. We'll know more tomorrow morning. As of right now, it looks good.

Posted: Sat Feb 20, 2010 7:29 am
by seanego
These WUs had been sitting here for several days, unsuccessfully trying to upload... and this morning: WHOA!!!

[07:13:22] + Processing work unit
[07:13:22] - Autosending finished units... [February 20 07:13:22 UTC]
[07:13:22] Trying to send all finished work units
[07:13:22] + Attempting to send results [February 20 07:13:22 UTC]
[07:13:22] - Reading file work/wuresults_07.dat from core
[07:13:24] Posted data.
[07:13:24] Initial: 0000; - Uploaded at ~65 kB/s
[07:13:24] - Averaged speed for that direction ~30 kB/s
[07:13:24] + Results successfully sent
[07:13:24] Thank you for your contribution to Folding@Home.

[07:12:58] - Autosending finished units... [February 20 07:12:58 UTC]
[07:12:58] Trying to send all finished work units
[07:12:58] + Attempting to send results [February 20 07:12:58 UTC]
[07:12:58] - Reading file work/wuresults_01.dat from core
[07:13:00] Posted data.
[07:13:00] Initial: 0000; - Uploaded at ~32 kB/s
[07:13:00] - Averaged speed for that direction ~48 kB/s
[07:13:00] + Results successfully sent
[07:13:00] Thank you for your contribution to Folding@Home

Posted: Sat Feb 20, 2010 7:39 am
by *hondo*
bruce wrote:It depends on who is asking. The WU has been completed, but not by anybody named Hondo, so it's reasonable to assume that it was reassigned and somebody else completed it while you still have a second copy waiting to upload.

If you can give me the name to whom it should have been credited and/or the upload/download times associated with the copy you're talking about, I'll provide you the information.

Hello Bruce, yes, it was completed and dispatched by me, *hondo*. Time and date now: 07:33 GMT on 20/2/10.

--- Opening Log file [February 19 20:53:27 UTC]



Launch directory: C:\Documents and Settings\hondo\Application Data\Folding@home-gpu


[20:53:27] - Ask before connecting: No
[20:53:27] - User name: *hondo* (Team 51078)
[20:53:27] - User ID: 466BBE2697F90E6
[20:53:27] - Machine ID: 2
[20:53:27]
[20:53:27] Loaded queue successfully.
[20:53:27] Initialization complete
[20:53:27]
[20:53:27] + Processing work unit
[20:53:27] Project: 10105 (Run 26, Clone 9, Gen 19)


[20:53:27] + Attempting to send results [February 19 20:53:27 UTC]
[20:53:27] Core required: FahCore_11.exe
[20:53:27] Core found.
[20:53:27] Working on queue slot 08 [February 19 20:53:27 UTC]
[20:53:27] + Working ...
[20:53:27]
[20:53:27] *------------------------------*
[20:53:27] Folding@Home GPU Core
[20:53:27] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[20:53:27]
[20:53:27] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[20:53:27] Build host: amoeba
[20:53:27] Board Type: Nvidia
[20:53:27] Core :
[20:53:27] Preparing to commence simulation
[20:53:27] - Looking at optimizations...
[20:53:27] - Files status OK
[20:53:27] - Expanded 46700 -> 252912 (decompressed 541.5 percent)
[20:53:27] Called DecompressByteArray: compressed_data_size=46700 data_size=252912, decompressed_data_size=252912 diff=0
[20:53:28] - Digital signature verified
[20:53:28]
[20:53:28] Project: 5768 (Run 10, Clone 49, Gen 1015)
[20:53:28]
[20:53:28] Assembly optimizations on if available.
[20:53:28] Entering M.D.
[20:53:30] + Results successfully sent
[20:53:30] Thank you for your contribution to Folding@Home.
[20:53:30] + Number of Units Completed: 27

[20:53:34] Will resume from checkpoint file
[20:53:34] Tpr hash work/wudata_08.tpr: 1731069168 1455893831 2628285029 1746427847 2101809831
[20:53:34]

Posted: Sat Feb 20, 2010 7:47 am
by PantherX
F@H has been successfully uploading WUs, so I guess that's great news for all of us. I did notice that it takes at least 6 attempts to get a new WU, and I am hoping that it isn't a new bug or something like that.

Posted: Sat Feb 20, 2010 9:50 am
by heikosch
heikosch wrote:More than 25 unsuccessful attempts to upload a WU to 171.67.108.26. :-(

Heiko

Attempt 51 was successful. :D

Heiko