Client stalls while downloading a WU

Posted: Tue Apr 14, 2009 4:39 am
by anko1
I posted a while ago about my GPU client stalling while downloading a WU. I couldn't find the old SMP occurrence I alluded to in that post, but I just got a new one. This is mostly just to let "people" know that it's still occurring occasionally.

Link to my original thread (with anandhanju's link to another report):

http://foldingforum.org/viewtopic.php?f=50&t=6874

Log of current incident. [I'm running Folding@Home Client Version 6.23 Beta R1; client proceeded normally after restart. This instance was on the Lenovo laptop w/ Windows XP Service Pack 3.]

Code:

[02:22:00] Writing final coordinates.
[02:22:01] Past main M.D. loop
[02:22:01] Will end MPI now
[02:23:01] 
[02:23:01] Finished Work Unit:
[02:23:01] - Reading up to 3722928 from "work/wudata_06.arc": Read 3722928
[02:23:01] - Reading up to 1780128 from "work/wudata_06.xtc": Read 1780128
[02:23:01] goefile size: 0
[02:23:01] logfile size: 18095
[02:23:01] Leaving Run
[02:23:04] - Writing 5525551 bytes of core data to disk...
[02:23:04]   ... Done.
[02:23:04] - Failed to delete work/wudata_06.sas
[02:23:04] - Failed to delete work/wudata_06.goe
[02:23:04] Warning:  check for stray files
[02:23:04] - Shutting down core
[02:25:04] 
[02:25:04] Folding@home Core Shutdown: FINISHED_UNIT
[02:25:04] 
[02:25:04] Folding@home Core Shutdown: FINISHED_UNIT
[02:25:07] CoreStatus = 64 (100)
[02:25:07] Unit 6 finished with 71 percent of time to deadline remaining.
[02:25:07] Updated performance fraction: 0.739862
[02:25:07] Sending work to server
[02:25:07] Project: 2653 (Run 20, Clone 143, Gen 101)


[02:25:07] + Attempting to send results [April 10 02:25:07 UTC]
[02:25:07] - Reading file work/wuresults_06.dat from core
[02:25:07]   (Read 5525551 bytes from disk)
[02:25:07] Connecting to http://171.64.65.64:8080/
[02:25:13] Posted data.
[02:25:14] Initial: 0000; - Uploaded at ~599 kB/s
[02:25:16] - Averaged speed for that direction ~377 kB/s
[02:25:16] + Results successfully sent
[02:25:16] Thank you for your contribution to Folding@Home.
[02:25:16] + Number of Units Completed: 117

[02:25:20] - Warning: Could not delete all work unit files (6): Core returned invalid code
[02:25:20] Trying to send all finished work units
[02:25:20] + No unsent completed units remaining.
[02:25:20] - Preparing to get new work unit...
[02:25:20] + Attempting to get work packet
[02:25:20] - Will indicate memory of 2553 MB
[02:25:20] - Connecting to assignment server
[02:25:20] Connecting to http://assign.stanford.edu:8080/
[02:25:21] Posted data.
[02:25:21] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[02:25:21] + News From Folding@Home: Welcome to Folding@Home
[02:25:21] Loaded queue successfully.
[02:25:21] Connecting to http://171.64.65.64:8080/
[02:25:26] Posted data.
[02:25:26] Initial: 0000; - Receiving payload (expected size: 4756061)
[05:19:30] Killing all core threads
[05:19:30] Killing 3 cores
[05:19:30] Killing core 0
[05:19:30] Killing core 1
[05:19:30] Killing core 2

Folding@Home Client Shutdown at user request.
[05:19:30] ***** Got a SIGTERM signal (2)
[05:19:30] Killing all core threads
[05:19:30] Killing 3 cores
[05:19:30] Killing core 0
[05:19:30] Killing core 1
[05:19:30] Killing core 2

Folding@Home Client Shutdown.
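
The stall in the log above shows up as a jump in the timestamps between "Receiving payload" and the manual shutdown. A minimal sketch for spotting such jumps in a saved log, assuming the `[HH:MM:SS]` prefix seen above; `find_gaps` and its 30-minute default threshold are hypothetical choices for illustration, not part of the client:

```python
import re
from datetime import timedelta

# Matches the "[HH:MM:SS]" prefix used by the log lines above.
STAMP = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]")

def find_gaps(lines, threshold=timedelta(minutes=30)):
    """Return (previous_line, next_line, gap) for every pair of
    consecutive timestamped lines at least `threshold` apart."""
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # blank lines and untimestamped messages
        hours, minutes, seconds = map(int, match.groups())
        current = timedelta(hours=hours, minutes=minutes, seconds=seconds)
        if prev_time is not None:
            delta = current - prev_time
            if delta < timedelta(0):
                # Timestamps carry no date, so assume a midnight rollover.
                delta += timedelta(days=1)
            if delta >= threshold:
                gaps.append((prev_line.rstrip(), line.rstrip(), delta))
        prev_time, prev_line = current, line
    return gaps
```

Run over this log, it would flag the pair of lines at 02:25:26 and 05:19:30, a gap of roughly 2 hours 54 minutes.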

Re: Client stalls while downloading a WU

Posted: Tue Apr 14, 2009 11:05 am
by MtM
I think the download failed and the client hung because of that. You could check with qd to see the status of all queue entries. The slot used by the WU that completed should have a status of 0 and an upload status of 1. The slot after that should be either 4 or 1 (1 is active, 4 is fetching). If the status there is 4, the client will retry the download on restart; if it's 1 and there are no work files, it will discard the slot and move on to the next one.
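
The restart behaviour described above can be sketched as a small decision function. This is only an illustration of the rule as stated in this post, not the client's actual code; `action_on_restart` and the returned strings are hypothetical, and only the status values 1 (active) and 4 (fetching) come from the post:

```python
# Status values as described in the post above.
ACTIVE = 1
FETCHING = 4

def action_on_restart(status, has_work_files):
    """Return what the client is expected to do with a queue slot on restart."""
    if status == FETCHING:
        # The download never completed, so it is attempted again.
        return "retry download"
    if status == ACTIVE and not has_work_files:
        # Marked active but nothing on disk to work on.
        return "discard slot and move to next"
    # Any other combination is not covered by the description above.
    return "unspecified"
```

For the stalled slot in this thread, a status of 4 would therefore mean a plain restart is enough to get a fresh download attempt.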

Edit: Maybe the reason it didn't time out is that the network outage happened exactly between the connect and the start of the download. Maybe the client only checks for timeouts during the initial connect and during the download itself, but not in between? That would also explain why it's not a common issue: it would take quite a coincidence for this to occur more than once in x attempts to download work.
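
To illustrate the guess above: a receive loop that applies a timeout to every read cannot hang indefinitely, no matter when the connection goes quiet. This is a generic sketch using Python's standard `socket` module, not how the client is implemented; `receive_payload` and its `read_timeout` parameter are hypothetical names:

```python
import socket

def receive_payload(sock, expected_size, read_timeout=30.0):
    """Read exactly expected_size bytes, failing if any single read stalls."""
    # The timeout applies to each recv() call, so a silent connection
    # raises socket.timeout instead of blocking forever.
    sock.settimeout(read_timeout)
    chunks = []
    received = 0
    while received < expected_size:
        chunk = sock.recv(min(65536, expected_size - received))
        if not chunk:
            raise ConnectionError(
                "connection closed after %d of %d bytes" % (received, expected_size)
            )
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)
```

With a check like this, a stall right after "Receiving payload" would surface as an error the caller can retry, rather than a client that sits idle until someone restarts it.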