WU Download Process When Queue Slot Filled
Posted: Mon Mar 09, 2009 1:20 pm
by DrBB1
I wasn't quite sure how to word the subject line, but here's the question. Due to the current server issues with 128.59.74.4, I have had a WU in queue, slot #2, for almost two weeks. Soon, the 10th WU after that one will be downloaded into slot #1. When that WU is complete, and the next WU is downloaded, will it write over the WU waiting for upload in slot #2, or will it skip over and go to slot #3? If the former, is there anything I can do to save the work?
Thanks in advance.
Re: WU Download Process When Queue Slot Filled
Posted: Mon Mar 09, 2009 1:34 pm
by Grandpa_01
Good question, I would like to know the answer to this also. I have a couple that have never uploaded to 128.59.171.4 from back in February.
Re: WU Download Process When Queue Slot Filled
Posted: Mon Mar 09, 2009 1:51 pm
by ChelseaOilman
It's supposed to skip over a slot that holds a finished WU waiting to upload.
What's the PRCG of that WU? How close to the final deadline are you for that WU?
Re: WU Download Process When Queue Slot Filled
Posted: Mon Mar 09, 2009 1:53 pm
by MtM
It will skip the slot if it's still occupied. If the WU has passed its deadline, the WU will be deleted and the slot will be flagged as available again.
Grandpa_01, I can't find that server IP on psummary or on the server status page.
Edit: Normally the forum says 'at least one other post has been made since you began your reply', but I used quick reply so I didn't notice. Sorry for repeating your answer, ChelseaOilman.
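For anyone who wants the slot behaviour spelled out, here is a minimal sketch of the rule described above: skip an occupied slot, and free a slot whose WU is past its deadline. It is only an illustration, not the client's actual code. The `Slot` class, the status strings, and `next_free_slot` are all made-up names, and the 10-slot size is inferred from the slot numbers mentioned in this thread.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

QUEUE_SLOTS = 10  # assumption: slots 00-09, as the thread suggests


@dataclass
class Slot:
    # Hypothetical states: "empty", "folding", or "awaiting_upload"
    status: str = "empty"
    deadline: Optional[datetime] = None


def next_free_slot(queue: List[Slot], current: int, now: datetime) -> Optional[int]:
    """Return the index of the next slot a new WU could use, or None if all are busy."""
    for step in range(1, len(queue) + 1):
        idx = (current + step) % len(queue)
        slot = queue[idx]
        expired = (
            slot.status == "awaiting_upload"
            and slot.deadline is not None
            and now > slot.deadline
        )
        if expired:
            # Past the deadline: the WU is dropped and the slot is reusable.
            queue[idx] = Slot()
            slot = queue[idx]
        if slot.status == "empty":
            return idx
        # Otherwise the slot is still occupied, so skip it.
    return None
```

With a finished WU stuck in slot 2 and slot 1 just completed, this returns slot 3 as long as the stuck WU is inside its deadline, and slot 2 once the deadline has passed.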
Re: WU Download Process When Queue Slot Filled
Posted: Mon Mar 09, 2009 2:08 pm
by Grandpa_01
Well if I typed it right you might be able to.
[02:51:06] + Attempting to get work packet
[02:51:06] - Connecting to assignment server
[02:51:06] - Successful: assigned to (171.67.108.13).
[02:51:06] + News From Folding@Home: Welcome to Folding@Home
[02:51:07] Loaded queue successfully.
[02:51:08] Project: 3856 (Run 304, Clone 5, Gen 42)
[02:51:08] + Attempting to send results [March 9 02:51:08 UTC]
[02:51:29] - Couldn't send HTTP request to server
[02:51:29] + Could not connect to Work Server (results)
[02:51:29] (128.59.74.4:8080)
[02:51:29] + Retrying using alternative port
[02:51:50] - Couldn't send HTTP request to server
[02:51:50] + Could not connect to Work Server (results)
[02:51:50] (128.59.74.4:80)
[02:51:50] - Error: Could not transmit unit 03 (completed February 23) to work server.
[02:51:50] + Attempting to send results [March 9 02:51:50 UTC]
[02:51:50] - Couldn't send HTTP request to server
[02:51:50] + Could not connect to Work Server (results)
[02:51:50] (171.65.103.100:8080)
[02:51:50] + Retrying using alternative port
[02:51:50] - Couldn't send HTTP request to server
[02:51:50] (Got status 503)
[02:51:50] + Could not connect to Work Server (results)
[02:51:50] (171.65.103.100:80)
[02:51:50] Could not transmit unit 03 to Collection server; keeping in queue.
[02:51:50] + Closed connections
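If you want to pull the failing server addresses out of a log like the one above without retyping them, something along these lines might help. It is a rough sketch that assumes the log keeps the layout shown here, with a "Could not connect to ..." line followed by an "(ip:port)" line. The function name `failed_endpoints` is made up for this example.

```python
import re

# Matches "(128.59.74.4:8080)" style endpoints; an address without a port is ignored.
ENDPOINT = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3}):(\d+)\)")
# Leading "[hh:mm:ss]" timestamp, stripped before inspecting the message.
TIMESTAMP = re.compile(r"^\[\d{2}:\d{2}:\d{2}\]\s*")


def failed_endpoints(log_text):
    """Return (ip, port) pairs that directly follow a 'Could not connect to' line."""
    failures = []
    expect_endpoint = False
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw.strip())
        if "Could not connect to" in line:
            expect_endpoint = True
            continue
        match = ENDPOINT.search(line)
        if expect_endpoint and match:
            failures.append((match.group(1), int(match.group(2))))
        expect_endpoint = False
    return failures
```

Run on the log above, this would list each address and port the client tried, which makes a mistyped IP easy to spot.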
Re: WU Download Process When Queue Slot Filled
Posted: Mon Mar 09, 2009 2:14 pm
by MtM
Still can't find the work server (128.59.74.4), but judging from the new WU you got assigned (if it has the same deadline as most Gromacs projects), you're cutting it close (66 days).
The CS 171.65.103.100 is accepting though, maybe restart the client?
Re: WU Download Process When Queue Slot Filled
Posted: Mon Mar 09, 2009 2:48 pm
by Grandpa_01
I have restarted it a couple of times. I have pretty much given up on sending them. There were 2 WUs, #3 and #4, from Feb 23 and 24.
Re: WU Download Process When Queue Slot Filled
Posted: Mon Mar 09, 2009 9:54 pm
by ChelseaOilman
MtM wrote: Still can't find the work server (128.59.74.4)
viewtopic.php?p=86874#p86874
mrshirts wrote:Update on 128.59.74.4:
The good news, all the data (2 TB) is safe. I was able to rebuild and mount the raid. The bad news is, the server won't boot normally. Since it's actually at Columbia (where I don't work anymore), I'm a bit at the mercy of the IT support staff there in terms of getting it up and running again. The current plan is therefore to copy the data off that is needed to continue the projects, and try to relay the IP to a different machine, putting it into accept only mode. Time line is probably going to be about a week, unfortunately.
Re: WU Download Process When Queue Slot Filled
Posted: Mon Mar 09, 2009 9:58 pm
by MtM
That would explain why it's not on serverstat, but I didn't expect it to be removed from psummary, so I didn't search for an announcement.
He should still be able to upload since the collection server was at the time of my last post online and accepting.
Re: WU Download Process When Queue Slot Filled
Posted: Mon Mar 09, 2009 10:49 pm
by kelliegang
Ya... I've got a total of 5 work units on the 4 computers awaiting upload to that server [74.4], and the CSs have been so choked that I haven't been able to upload to them either. 104 failed uploads per client now...
Mrshirts mentioned it should be up Monday, but no luck yet. Hope they can get uploaded soon.
Re: WU Download Process When Queue Slot Filled
Posted: Fri Mar 13, 2009 2:44 am
by DrBB1
Sorry to have asked the original question and then disappeared; I've been preoccupied with more critical issues. If I understand where the thread has gone, it sounds like the queue slot is not going to be an issue, but the WU's deadline may become one at some point (for me the deadline is May 1).
I did notice that the collection server for the WU (171.65.103.100) seems to have cleared up today, at least if I am reading the server status report correctly....
That said, I am still unable to upload the WU. So now the question is: 1) Am I reading the SS report correctly? and 2) If I am, why is the WU still not able to upload?
[02:18:08] Project: 3855 (Run 628, Clone 8, Gen 27)
[02:18:08] + Attempting to send results [March 13 02:18:08 UTC]
[02:18:08] Working on queue slot 09 [March 13 02:18:08 UTC]
[02:18:08] + Working ...
[02:18:09]
[02:18:09] *------------------------------*
[02:18:09] Folding@Home Gromacs 3.3 Core
[02:18:09] Version 1.93 (July 23, 2008)
[02:18:09]
[02:18:09] Preparing to commence simulation
[02:18:09] - Looking at optimizations...
[02:18:09] - Files status OK
[02:18:10] - Expanded 1162827 -> 6173133 (decompressed 530.8 percent)
[02:18:10]
[02:18:10] Project: 5113 (Run 98, Clone 18, Gen 24)
[02:18:10]
[02:18:11] Assembly optimizations on if available.
[02:18:11] Entering M.D.
[02:18:17] FAH Init
[02:18:17] Checkpoint file:
[02:18:22] (Starting from checkpoint)
[02:18:22] Read checkpoint
[02:18:22] Protein: Calmodulin in water
[02:18:22] Writing local files
[02:18:24] Completed 31867 out of 500000 steps (6 percent)
[02:18:25] Extra SSE boost OK.
[02:18:29] - Couldn't send HTTP request to server
[02:18:29] + Could not connect to Work Server (results)
[02:18:29] (128.59.74.4:8080)
[02:18:29] + Retrying using alternative port
[02:18:50] - Couldn't send HTTP request to server
[02:18:50] + Could not connect to Work Server (results)
[02:18:50] (128.59.74.4:80)
[02:18:50] - Error: Could not transmit unit 02 (completed February 24) to work server.
[02:18:50] + Attempting to send results [March 13 02:18:50 UTC]
[02:21:59] - Couldn't send HTTP request to server
[02:21:59] + Could not connect to Work Server (results)
[02:21:59] (171.65.103.100:8080)
[02:21:59] + Retrying using alternative port
[02:25:09] - Couldn't send HTTP request to server
[02:25:09] + Could not connect to Work Server (results)
[02:25:09] (171.65.103.100:80)
[02:25:09] Could not transmit unit 02 to Collection server; keeping in queue.
[02:42:12] Writing local files
[02:42:12] Completed 35000 out of 500000 steps (7 percent)
Re: WU Download Process When Queue Slot Filled
Posted: Fri Mar 13, 2009 1:02 pm
by ChelseaOilman
MtM wrote:He should still be able to upload since the collection server was at the time of my last post online and accepting.
The CS will only accept WUs it knows about. If the WS went down before sending the WU info to the CS, the CS won't accept the WU. If this is the case, nothing can be done until the WS is back online and accepting.
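As a rough illustration of the order of events described here, the sketch below models the decision as a plain function: try the work server first, fall back to the collection server, and only succeed there if the CS already has a record of the unit. This is an assumption-laden model of the description in this post, not how the client or servers are actually implemented. `send_results` and its parameters are hypothetical names.

```python
def send_results(unit, ws_online, cs_online, cs_known_units):
    """Model where a finished unit ends up, per the description in this thread.

    unit           -- hypothetical identifier for the finished WU
    ws_online      -- whether the work server is reachable and accepting
    cs_online      -- whether the collection server is reachable and accepting
    cs_known_units -- set of units the work server told the CS about before going down
    """
    if ws_online:
        return "sent to work server"
    if not cs_online:
        # Nothing reachable: the client would log a connection failure.
        return "kept in queue: no server reachable"
    if unit not in cs_known_units:
        # CS is up but was never told about this unit.
        return "kept in queue: collection server has no record of unit"
    return "sent to collection server"
```

The two "kept in queue" cases look the same in the queue but should show up differently in the log: one is a connection failure, the other is a rejection by a server that did answer.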
Re: WU Download Process When Queue Slot Filled
Posted: Tue Mar 17, 2009 7:23 pm
by MtM
ChelseaOilman wrote:MtM wrote:He should still be able to upload since the collection server was at the time of my last post online and accepting.
The CS will only accept WUs it knows about. If the WS went down before sending the WU info to the CS, the CS won't accept the WU. If this is the case, nothing can be done until the WS is back online and accepting.
But then the server would return 'unknown work unit' or 'the server has no record of this unit', and I didn't see that in his log, just 'unable to connect to ...'.
That doesn't mean the WU would have gotten through if it could have connected; that would have depended on the case you just mentioned. (Sorry for the thread revival.)