
Re: 171.64.65.64 overloaded

Posted: Sat May 21, 2011 5:25 pm
by jclu52
Oh, well, it's rejecting again.

Code: Select all

Sat May 21 08:45:10 PDT 2011	171.64.65.64	GPU	vspg2v	lin5	full	Reject	2.25	0	0	2	17883	3101	5	0	113254	113254	113254	-	-	10	-	0	0	-	-	 1	171.64.122.86
171.67.108.25
-	0	 0	W;	100	6.119	-	49	64	-	-	; , 3	F	8080G	-	-	-	-	0	lin5	1	vspg2v	
Sat May 21 09:20:10 PDT 2011	171.64.65.64	GPU	vspg2v	lin5	full	Reject	2.16	0	0	2	17883	3101	5	0	113254	113254	113254	-	-	-	-	0	0	-	-	 1	171.64.122.86
171.67.108.25
-	0	 0	W;	100	6.119	-	49	64	-	-	; , 3	F	8080G	-	-	-	-	0	lin5	1	vspg2v	
Sat May 21 09:55:10 PDT 2011	171.64.65.64	GPU	vspg2v	lin5	full	Reject	2.31	0	0	2	17883	3100	4	0	113254	113254	113254	-	-	-	-	0	0	-	-	 1	171.64.122.86
171.67.108.25
-	0	 0	W;	100	6.119	-	49	64	-	-	; , 3	F	8080G	-	-	-	-	0	lin5	1	vspg2v

Re: 171.64.65.64 overloaded

Posted: Sat May 21, 2011 6:17 pm
by Xavier Zepherious
It seems I can get a new WU if I shut down the client and restart it, but it hangs again on resending when the unit completes,
so I have to monitor the client manually, shutting it down and restarting it after each completed unit.

I don't want to kill the completed WU (the one that's failing to send),
but in order to leave the client unmonitored I have to remove or kill it.

PS: will someone respond to the users posting on this issue... we would like to be kept up on what is going on

Re: Project 10720: Unable to send results

Posted: Sat May 21, 2011 6:58 pm
by icspotz
I have a similar problem with a different project, #6801, which has had 16 failed upload attempts.

Fah log file:

Code: Select all

[15:31:34] Completed 49499999 out of 50000000 steps (99%).
[15:33:33] Completed 49999999 out of 50000000 steps (100%).
[15:33:34] Finished fah_main
[15:33:34]
[15:33:34] Successful run
[15:33:34] DynamicWrapper: Finished Work Unit: sleep=10000
[15:33:43] Reserved 2471344 bytes for xtc file; Cosm status=0
[15:33:43] Allocated 2471344 bytes for xtc file
[15:33:43] - Reading up to 2471344 from "work/wudata_01.xtc": Read 2471344
[15:33:43] Read 2471344 bytes from xtc file; available packet space=783959120
[15:33:43] xtc file hash check passed.
[15:33:43] Reserved 76680 76680 783959120 bytes for arc file=<work/wudata_01.trr> Cosm status=0
[15:33:43] Allocated 76680 bytes for arc file
[15:33:43] - Reading up to 76680 from "work/wudata_01.trr": Read 76680
[15:33:43] Read 76680 bytes from arc file; available packet space=783882440
[15:33:43] trr file hash check passed.
[15:33:43] Allocated 544 bytes for edr file
[15:33:43] Read bedfile
[15:33:43] edr file hash check passed.
[15:33:43] Allocated 120324 bytes for logfile
[15:33:43] Read logfile
[15:33:43] GuardedRun: success in DynamicWrapper
[15:33:43] GuardedRun: done
[15:33:43] Run: GuardedRun completed.
[15:33:45] + Opened results file
[15:33:45] - Writing 2669404 bytes of core data to disk...
[15:33:46] Done: 2668892 -> 2511208 (compressed to 94.0 percent)
[15:33:46] ... Done.
[15:33:46] DeleteFrameFiles: successfully deleted file=work/wudata_01.ckp
[15:33:47] Shutting down core
[15:33:47]
[15:33:47] Folding@home Core Shutdown: FINISHED_UNIT
[15:33:51] CoreStatus = 64 (100)
[15:33:51] Unit 1 finished with 97 percent of time to deadline remaining.
[15:33:51] Updated performance fraction: 0.971493
[15:33:51] Sending work to server
[15:33:51] Project: 6801 (Run 8739, Clone 1, Gen 13)
[15:33:51] - Read packet limit of 540015616... Set to 524286976.


[15:33:51] + Attempting to send results [May 21 15:33:51 UTC]
[15:33:51] - Reading file work/wuresults_01.dat from core
[15:33:51] (Read 2511720 bytes from disk)
[15:33:51] Gpu type=3 species=30.
[15:33:51] Connecting to http://171.64.65.64:8080/
[15:33:52] - Couldn't send HTTP request to server
[15:33:52] + Could not connect to Work Server (results)
[15:33:52] (171.64.65.64:8080)
[15:33:52] + Retrying using alternative port
[15:33:52] Connecting to http://171.64.65.64:80/
[15:33:53] - Couldn't send HTTP request to server
[15:33:53] + Could not connect to Work Server (results)
[15:33:53] (171.64.65.64:80)
[15:33:53] - Error: Could not transmit unit 01 (completed May 21) to work server.
[15:33:53] - 1 failed uploads of this unit.
[15:33:53] Keeping unit 01 in queue.
[15:33:53] Trying to send all finished work units
[15:33:53] Project: 6801 (Run 8739, Clone 1, Gen 13)
[15:33:53] - Read packet limit of 540015616... Set to 524286976.
Mod Edit: Added Code Tags - PantherX

Re: 171.64.65.64 overloaded

Posted: Sat May 21, 2011 7:21 pm
by 7im
While the IT department does work weekends, the PR department doesn't. We may not see updates posted until Monday morning, although I would expect the server to be working again before then, if possible. ;)

Re: 171.64.65.64 overloaded

Posted: Sat May 21, 2011 8:09 pm
by jclu52
Xavier Zepherious wrote: PS: will someone respond to the users posting on this issue... we would like to be kept up on what is going on
The Server Status Page

Server log for 171.64.65.64
Xavier Zepherious wrote: It seems I can get a new WU if I shut down the client and restart it, but it hangs again on resending when the unit completes,
so I have to monitor the client manually, shutting it down and restarting it after each completed unit.

I don't want to kill the completed WU (the one that's failing to send),
but in order to leave the client unmonitored I have to remove or kill it.
Like you said, you don't have to kill the completed WU. Just shut down and then restart the GPU client / SMP client; the WU will be kept in the queue to be submitted later. Like 7im said:
7im wrote:While the IT department does work weekends, the PR department doesn't. We may not see updates posted until Monday morning, although I would expect the server to be working again before then, if possible. ;)
We just have to wait a bit. :D

Re: 171.64.65.64 overloaded

Posted: Sat May 21, 2011 8:49 pm
by yslin
Hi,

I've been working on this server, but it might take more time to fix. Sorry for the inconvenience!


yslin

Re: 171.64.65.64 overloaded

Posted: Sat May 21, 2011 9:25 pm
by GreyWhiskers
jclu52 wrote: Like you said, you don't have to kill the completed WU. Just shut down and then restart the GPU client / SMP client; the WU will be kept in the queue to be submitted later.
However, comma, if you are running v6, there is a round-robin or circular queue 10 WUs long where results are kept. In my own case, I would process a p6801 WU from 171.64.65.64 every 2.2 hours, giving ~22 hours before the circular queue got back to an item.

I'm observing that since WUs aren't available from 171.64.65.64 for the Fermi cards, the assignment server is sending my GPU client to 171.67.108.32, which is serving up a series of 109xx and 112xx WUs. My GPU processes these in about 1.2 hours, giving the v6 circular queue about 12 hours to wrap around.

When 171.64.65.64 went down, I was processing a p6801 in queue slot 9 that couldn't be uploaded. Every time the client completes a new WU, it also tries to get rid of the old p6801. I'm now processing p109xx or p112xx from queue slot 2, so there are still 7 hours or so until the circular queue wraps around and this WU is overwritten (I think). The bad news, if 171.64.65.64 isn't accepting uploads by then, is that Stanford may lose the results from my computations, and I may lose the 1,348 points for the p6801 WU. The good news for my particular GTX 560 Ti card is that the smaller WUs seem to be more productive on my hardware (~19,000 PPD vs ~14,378 PPD).

v7 does better than v6 in that it will keep a pending WU upload indefinitely until it can upload. The reports that I and many others have made about stuck v7 uploads were related to uploading partial results from EUEs, not uploading full results from successful runs.
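The retry behavior described here also shows up in the logs above: try the work server's primary port, fall back to the alternative port, and keep the unit queued when both fail. A minimal, hypothetical sketch of that logic (not the actual Folding@Home client code; the host, ports, and payload below are placeholders):

```python
# Hypothetical sketch of the upload-retry logic visible in the logs:
# try the work server's primary port, fall back to the alternative
# port, and report failure so the unit stays queued for the next cycle.
# Not the actual Folding@Home client code.
import socket

def try_send(host, port, payload, timeout=5.0):
    """One upload attempt; True on success, False on any socket error."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(payload)
        return True
    except OSError:
        return False

def send_results(host, payload, ports=(8080, 80)):
    """Primary port first, then the alternative port, as in the log."""
    for port in ports:
        if try_send(host, port, payload):
            return True
    # Both attempts failed: "Keeping unit 01 in queue."
    return False

# With nothing listening, both attempts fail and the WU would stay queued:
ok = send_results("127.0.0.1", b"wuresults", ports=(1, 1))
```

The key design point, which both v6 and v7 share, is that a failed upload never discards the result; the client only considers a unit done once the server acknowledges receipt.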

Re: 171.64.65.64 overloaded

Posted: Sat May 21, 2011 10:12 pm
by jclu52
GreyWhiskers wrote:However, comma, if you are running v6, there is a round-robin or circular queue 10 WUs long where results are kept. In my own case, I would process a p6801 WU from 171.64.65.64 every 2.2 hours, giving ~22 hours before the circular queue got back to an item.
Thanks for explaining it in great detail. It really helps to understand how it works and how to deal with problems when they happen. :ewink:

I have not spent as much time learning about F@H as I wanted. In fact, that's one thing about F@H I am having trouble with. There is so much information, but I don't know where to look to get a better grasp of F@H. When using tools like HFM.NET and FahSpy, I am not sure what I am looking at when viewing the logs, the benchmarks, etc. All I was able to do was install the GPU systray client and the SMP client (running in service mode) and configure them properly.

I am interested in learning more about F@H but can't seem to locate a centralized / authoritative source of information. :(

Re: 171.64.65.64 overloaded

Posted: Sat May 21, 2011 10:28 pm
by l67swap
So I guess we just keep watching the server status page to see when the server will be live again?

Re: 171.64.65.64 overloaded

Posted: Sat May 21, 2011 10:56 pm
by VijayPande
It's still having problems, so we're doing a hard reboot. The machine will likely fsck for a while. We'll give you an update when we know more.

Re: Project 10720: Unable to send results

Posted: Sat May 21, 2011 11:57 pm
by ChrisM101
Same issue here today... 6801 not uploading has me backed up and not earning points.

Code: Select all

--- Opening Log file [May 21 23:46:49 UTC] 


# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.30r1

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: D:\Downloads\FAH_GPU_Tracker_V2\FAH GPU Tracker V2\GPU0
Executable: D:\Downloads\FAH_GPU_Tracker_V2\FAH GPU Tracker V2\FAH_GPU3.exe
Arguments: -oneunit -forcegpu nvidia_fermi -advmethods -verbosity 9 -gpu 0 

[23:46:49] - Ask before connecting: No
[23:46:49] - User name: ChrisM101 (Team 111065)
[23:46:49] - User ID: 21BE43D7336DE836
[23:46:49] - Machine ID: 3
[23:46:49] 
[23:46:49] Gpu type=3 species=30.
[23:46:49] Loaded queue successfully.
[23:46:49] - Preparing to get new work unit...
[23:46:49] Cleaning up work directory
[23:46:49] - Autosending finished units... [May 21 23:46:49 UTC]
[23:46:49] Trying to send all finished work units
[23:46:49] Project: 6801 (Run 2181, Clone 4, Gen 14)
[23:46:49] - Read packet limit of 540015616... [23:46:49] + Attempting to get work packet
Set to 524286976.
[23:46:49] Passkey found
[23:46:49] - Will indicate memory of 6135 MB


[23:46:49] Gpu type=3 species=30.
[23:46:49] + Attempting to send results [May 21 23:46:49 UTC]
[23:46:49] - Detect CPU.[23:46:49] - Reading file work/wuresults_01.dat from core
 Vendor: GenuineIntel, Family: 6, Model: 10, Stepping: 5
[23:46:49] - Connecting to assignment server
[23:46:49] Connecting to http://assign-GPU.stanford.edu:8080/
[23:46:49]   (Read 2509828 bytes from disk)
[23:46:49] Gpu type=3 species=30.
[23:46:49] Connecting to http://171.64.65.64:8080/
[23:46:50] Posted data.
[23:46:50] Initial: 43AB; - Successful: assigned to (171.67.108.32).
[23:46:50] + News From Folding@Home: Welcome to Folding@Home
[23:46:50] Loaded queue successfully.
[23:46:50] Gpu type=3 species=30.
[23:46:50] Sent data
[23:46:50] Connecting to http://171.67.108.32:8080/
[23:46:50] Posted data.
[23:46:50] Initial: 0000; - Receiving payload (expected size: 20648)
[23:46:50] Conversation time very short, giving reduced weight in bandwidth avg
[23:46:50] - Downloaded at ~40 kB/s
[23:46:50] - Averaged speed for that direction ~41 kB/s
[23:46:50] + Received work.
[23:46:50] + Closed connections
[23:46:50] 
[23:46:50] + Processing work unit
[23:46:50] Core required: FahCore_15.exe
[23:46:50] Core found.
[23:46:50] Working on queue slot 03 [May 21 23:46:50 UTC]
[23:46:50] + Working ...
[23:46:50] - Calling '.\FahCore_15.exe -dir work/ -suffix 03 -nice 19 -priority 96 -nocpulock -checkpoint 3 -verbose -lifeline 5992 -version 630'

[23:46:50] 
[23:46:50] *------------------------------*
[23:46:50] Folding@Home GPU Core
[23:46:50] Version 2.15 (Tue Nov 16 09:05:18 PST 2010)
[23:46:50] 
[23:46:50] Build host: SimbiosNvdWin7
[23:46:50] Board Type: NVIDIA/CUDA
[23:46:50] Core      : x=15
[23:46:50]  Window's signal control handler registered.
[23:46:50] Preparing to commence simulation
[23:46:50] - Looking at optimizations...
[23:46:50] DeleteFrameFiles: successfully deleted file=work/wudata_03.ckp
[23:46:51] - Couldn't send HTTP request to server
[23:46:51] + Could not connect to Work Server (results)
[23:46:51]     (171.64.65.64:8080)
[23:46:51] + Retrying using alternative port
[23:46:51] Connecting to http://171.64.65.64:80/
[23:46:51] - Created dyn
[23:46:51] - Files status OK
[23:46:51] sizeof(CORE_PACKET_HDR) = 512 file=<>
[23:46:51] - Expanded 20136 -> 77539 (decompressed 385.0 percent)
[23:46:51] Called DecompressByteArray: compressed_data_size=20136 data_size=77539, decompressed_data_size=77539 diff=0
[23:46:51] - Digital signature verified
[23:46:51] 
[23:46:51] Project: 10950 (Run 0, Clone 68, Gen 18)
[23:46:51] 
[23:46:51] Assembly optimizations on if available.
[23:46:51] Entering M.D.
[23:46:52] - Couldn't send HTTP request to server
[23:46:52] + Could not connect to Work Server (results)
[23:46:52]     (171.64.65.64:80)
[23:46:52] - Error: Could not transmit unit 01 (completed May 21) to work server.
[23:46:52] - 21 failed uploads of this unit.
[23:46:52] - Read packet limit of 540015616... Set to 524286976.

Re: Project 10720: Unable to send results

Posted: Sun May 22, 2011 12:07 am
by k1wi
ChrisM101 and icspotz - please refer to this thread:

http://foldingforum.org/viewtopic.php?f=18&t=18681

Edit by Mod: Posts moved to the correct topic.

Re: 171.64.65.64 overloaded

Posted: Sun May 22, 2011 2:00 am
by bruce
GreyWhiskers wrote:However, comma, if you are running v6, there is a round-robin or circular queue 10 WUs long where results are kept. In my own case, I would process a p6801 WU from 171.64.65.64 every 2.2 hours, giving ~22 hours before the circular queue got back to an item.
However, comma, ;) that doesn't matter. The queue will get full if you have a total of 10 WUs either processing or waiting to upload. The V6 client is perfectly happy looping through the 9 open positions when one WU is stuck uploading. It will skip a queue position that happens to be in use and does not reassign those positions unless you accumulate 10 WUs that are all waiting to upload (and that's not going to happen).

Re: 171.64.65.64 overloaded

Posted: Sun May 22, 2011 2:07 am
by GreyWhiskers
Thanks for the update. I wasn't aware that "used" queue positions were skipped. :oops: :oops:

Re: 171.64.65.64 overloaded

Posted: Sun May 22, 2011 2:17 am
by jclu52
bruce wrote:However, comma, ;) that doesn't matter. The queue will get full if you have a total of 10 WUs either processing or waiting to upload. The V6 client is perfectly happy looping through the 9 open positions when one WU is stuck uploading. It will skip a queue position that happens to be in use and does not reassign those positions unless you accumulate 10 WUs that are all waiting to upload (and that's not going to happen).
Great info!! Thanks, bruce! :biggrin: