Completed WU (Apparently) Hung In "Send" Mode
Posted: Fri Feb 22, 2013 1:22 am
by Alan C. Lawhon
I have a WU (Project 10501 (182, 0, 1016)) that completed execution over an hour ago, but it appears to be stuck in "Send" mode. Here are the details:
Send ID (slot?) = 01 (There is another WU in "Download" status in slot 00, but it has not begun executing. It appears to be waiting for slot 01 to finish before it begins running.)
Progress = 100.00%
ETA = "Unknown"
Credit = "Unknown"
I also have a work unit with a 2.39-day ETA (Project 7809 (9, 307, 43)) running in slot 02. Its progress currently shows 81.56% completion.
It appears that WU 10501 (182, 0, 1016) is hung and is not transmitting back to the server. What is the procedure for killing or deleting a WU that appears to be stuck in a particular slot?
Re: Completed WU (Apparently) Hung In "Send" Mode
Posted: Fri Feb 22, 2013 1:27 am
by Joe_H
Please post your log. Include the beginning that shows the system configuration and the section that shows the end of processing on the WU. The WU might be hung, or it could just be slow in wrapping up its work files to send back. There is also a known bug in some versions of the client where it does not recover from a network error and retry sending a WU. But we need more information to do more than guess.
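For readers wondering what "recover from a network error and retry sending" would look like, here is a minimal sketch of a retry loop with exponential backoff. It is not the Folding@home client's code: `send_with_retry`, `max_attempts`, and `base_delay` are hypothetical names, network failures are assumed to surface as `OSError`, and the real client's retry schedule may differ.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def send_with_retry(send: Callable[[], T], max_attempts: int = 5,
                    base_delay: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send() until it succeeds, doubling the wait after each failure.

    Hypothetical helper: assumes a failed upload raises OSError.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; the caller should keep the WU queued
            sleep(base_delay * 2 ** attempt)  # 60 s, 120 s, 240 s, ...
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.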
Re: Completed WU (Apparently) Hung In "Send" Mode
Posted: Fri Feb 22, 2013 1:29 am
by Napoleon
Just wait a while. Some servers are down at the moment; see viewtopic.php?f=18&t=23759.
Re: Completed WU (Apparently) Hung In "Send" Mode
Posted: Fri Feb 22, 2013 1:55 am
by Alan C. Lawhon
Napoleon:
Yep, that appears to be the hangup - a downed server. Thanks for the info.
01:11:37:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:10501 run:182 clone:0 gen:1016 core:0x11 unit:0x000008266652eda54b6ea7d300004719
01:11:37:WU01:FS00:Uploading 128.13KiB to 171.67.108.21
01:11:37:WU01:FS00:Connecting to 171.67.108.21:8080
01:11:58:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
01:11:58:WU01:FS00:Connecting to 171.67.108.21:80
01:12:19:WARNING:WU01:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
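The log shows the client trying port 8080 first and then falling back to port 80, with each attempt failing after about 21 seconds. A minimal sketch of that fallback pattern, assuming a plain TCP connect; `connect_with_fallback` is a hypothetical name and this is not the client's actual implementation.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

# Anything that takes ((host, port), timeout) and returns a connection.
Connector = Callable[[Tuple[str, int], float], socket.socket]


def connect_with_fallback(host: str, ports: Sequence[int] = (8080, 80),
                          timeout: float = 20.0,
                          connect: Connector = socket.create_connection) -> socket.socket:
    """Try each port in order and return the first connection that succeeds.

    Hypothetical helper mirroring the log: port 8080 first, then port 80.
    If every port fails, the last error is re-raised so the caller can
    keep the result queued and try again later.
    """
    last_error: Optional[OSError] = None
    for port in ports:
        try:
            return connect((host, port), timeout)
        except OSError as err:  # refused, unreachable, or timed out
            last_error = err
    if last_error is None:
        raise ValueError("no ports given")
    raise last_error
```

A fallback like this only helps when one port is blocked on the client side. When the server itself is down, as in this thread, every port fails and the finished WU has to wait until the server comes back.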
Re: Completed WU (Apparently) Hung In "Send" Mode
Posted: Fri Feb 22, 2013 3:40 am
by Ripper36
I'm getting the same problem on 171.67.108.11.
Server status shows both 171.67.108.11 and 171.67.108.21 down (vsp07v and vsp07b).
Re: Completed WU (Apparently) Hung In "Send" Mode
Posted: Fri Feb 22, 2013 5:03 am
by bruce
All of the vsp07* servers are down. The Pande Group was notified much earlier.
I'm not sure of the nature of the problem, but they'll fix it whenever they can.
Re: Completed WU (Apparently) Hung In "Send" Mode
Posted: Fri Feb 22, 2013 4:13 pm
by Alan C. Lawhon
I usually have two slots simultaneously executing WUs on my machine - one is a GPU slot and the other is an SMP slot. It's now been close to 18 hours with my machine "hung" and getting no production out of one of the slots. Can someone please tell me how to dump a non-executing slot so that I can (hopefully) get back to crunching work units on both slots?
Thanks!
Re: Completed WU (Apparently) Hung In "Send" Mode
Posted: Fri Feb 22, 2013 4:18 pm
by Joe_H
You are misunderstanding the situation. The WU waiting to be uploaded is not keeping the GPU slot from processing work. The servers that have suitable WUs for your model of GPU are down, so your client cannot download a new WU to process in that slot. PG has been notified; follow the other topic Napoleon linked to see if there are any changes.