Re: R/C/G
Posted: Sat May 03, 2008 10:56 pm
by bruce
John Naylor wrote:7im wrote:And the work Stanford did over the last year on the PS3 client has added 30,000 active clients, each capable of doing 2 WUs a day, at 20x the speed of PCs. How many more WUs does that add, vs. how many were not folded by upload/download delays?
Your point about PS3's is entirely valid -- but it is for each client individually. The PS3 servers are for PS3-only, so congestion there doesn't interact with the PC clients. The PS3 congestion problem from a couple months back was not the same issue that you're talking about, either.
Have you read the posts in our PS3 forum recently? Many PS3 WUs have been worth 100 to 300 points. Recently some WUs worth 1250 points have been released, which take 5x to 10x as long to run. In a sense, that's the same thing as the original request at the top of this thread: download a single WU and work on it for a longer period of time.
How long a WU should run is a value that needs to be kept in balance. Too short, and the servers get busy for no reason; too long, and the points don't update very often, and if a WU crashes, you can lose a lot. Then, too, different proteins have different requirements, so the same solution doesn't work for everything.
Re: R/C/G
Posted: Mon May 05, 2008 8:09 pm
by Soriak
Maybe it'd be helpful for people with slower upload connections if the UL/DL order was changed?
Changing:
WU finished
Upload
Download
Start Processing
To:
WU finished
Download
Start Processing
Upload
I don't know what would be involved in doing this though... but I think it happens already if the server for the finished WU can't be reached.
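A rough way to compare the two orderings above is to count how long the CPU sits idle between WUs. This is only a sketch: the function name and the order labels are made up for illustration, and it assumes an upload running in the background does not slow down processing, which nobody in this thread has confirmed.

```python
def idle_seconds(order, upload_s, download_s):
    """Seconds the CPU waits between finishing one WU and starting the next.

    order is a hypothetical label for one of the two sequences in the post:
      "upload_first"   -> WU finished, Upload, Download, Start Processing
      "download_first" -> WU finished, Download, Start Processing, Upload
    """
    if order == "upload_first":
        # Processing cannot start until both transfers are done.
        return upload_s + download_s
    if order == "download_first":
        # Only the download blocks; the upload overlaps with processing
        # (assumption: it costs no folding time).
        return download_s
    raise ValueError(f"unknown order: {order}")


# Example: a 15-minute upload and a 1-minute download.
current = idle_seconds("upload_first", 900, 60)
proposed = idle_seconds("download_first", 900, 60)
print(current, proposed)
```

Under these assumptions the saving per WU is simply the upload time, which is why it matters most on slow upload links.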
Re: R/C/G
Posted: Mon May 05, 2008 8:37 pm
by bruce
Soriak wrote:I don't know what would be involved in doing this though... but I think it happens already if the server for the finished WU can't be reached.
Yes, it does, but it still tries the primary server first. It's more like this:
WU finished
Upload
Download
Start Processing and Upload again if work is still on disk.
Wait, then upload again if work is still on disk.
The first couple of upload attempts go only to the primary Work Server. If it still doesn't upload, then it starts trying both the Work Server and the Collection Server.
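The retry behaviour described here could be sketched like this. The function name and server labels are hypothetical, and "the first couple" is assumed to mean two attempts; the real client may use a different count.

```python
def servers_to_try(attempt, primary_only_attempts=2):
    """Return which servers an upload attempt would contact.

    attempt is 1-based. primary_only_attempts=2 is an assumption standing in
    for "the first couple of upload attempts".
    """
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt <= primary_only_attempts:
        # Early attempts go only to the primary Work Server.
        return ["work_server"]
    # Later attempts try both the Work Server and the Collection Server.
    return ["work_server", "collection_server"]


for n in range(1, 5):
    print(n, servers_to_try(n))
```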
There was a long discussion on the old forum about changing the order, but nobody ever suggested it should depend on connection speed.
How about the following suggestion:
WU finished
If (Upload_size/upload_speed < xx), Upload
Download
Start Processing and Upload again if work is still on disk.
Wait, then upload again if work is still on disk.
where xx is a fixed time. What should xx be?
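To make the test in that suggestion concrete, here is a minimal sketch. The function name, the units, and the 10-minute value used for xx are all assumptions for illustration; the thread leaves xx open.

```python
def should_upload_first(upload_size_bytes, upload_speed_bytes_per_s, xx_seconds):
    """True if the estimated upload time is under the threshold xx."""
    if upload_speed_bytes_per_s <= 0:
        # No usable speed estimate: don't block on the upload.
        return False
    estimated_s = upload_size_bytes / upload_speed_bytes_per_s
    return estimated_s < xx_seconds


XX = 600  # hypothetical threshold: 10 minutes

# A 45 MB result on a 128 kbit/s uplink (16,000 bytes/s) takes about 2812 s.
slow = should_upload_first(45_000_000, 16_000, XX)
# The same result on a 10 Mbit/s uplink (1,250,000 bytes/s) takes 36 s.
fast = should_upload_first(45_000_000, 1_250_000, XX)
print(slow, fast)
```

With these numbers the slow link would download and start folding first, while the fast link would upload right away as it does today.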
Re: R/C/G
Posted: Mon May 05, 2008 8:47 pm
by 7im
It's a good suggestion, but a very old one. If Pande Group thought this was a productive fix, they would have added this to the client back in the days of dial-up in the v3 client, where the wait times were much longer. Not sure why this wasn't worked into a new client as v4 and v5 came out, but this is my guess...
As I posted on the previous page, a lot of people are upgrading their connection speeds all the time, and over time this problem slowly goes away without PG needing to spend any development time on it. And those still on a slow connection can either request small WUs, or deal with the longer turnaround times, or upgrade to broadband. There are always choices.
Re: R/C/G
Posted: Mon May 05, 2008 9:25 pm
by bruce
7im wrote:...those still on a slow connection can either request small WUs, or deal with the longer turnaround times, or upgrade to broadband. There are always choices.
When I take my laptop on the road, I have to switch it from SMP to Uniprocessor clients with the "small" setting, and it still irks me that uploading the results takes a long time when the machine could (should) be folding.
Re: R/C/G
Posted: Mon May 05, 2008 9:40 pm
by 7im
Set the CPU client to prompt for a connection. When it does prompt, exit the client, and fire up a second copy of a CPU client. Once the second CPU client is folding, start the first one with the -oneunit flag. Rinse, Repeat.
Same results.
Re: R/C/G
Posted: Mon May 05, 2008 9:47 pm
by Soriak
bruce wrote:
There was a long discussion on the old forum about changing the order, but nobody ever suggested it should depend on connection speed.
Oh, I didn't mean to suggest that either; that'd take more effort than changing it for everyone. I just meant to say that this would be to the benefit of those with slower connections. Though "slow" is relative - if you have to upload a 45 MB result, it can take a while even on broadband.
7im wrote:It's a good suggestion, but a very old one. If Pande Group thought this was a productive fix, they would have added this to the client back in the days of dial-up in the v3 client, where the wait times were much longer. Not sure why this wasn't worked into a new client as v4 and v5 came out, but this is my guess...
Good point. At first I thought the client might not be able to queue up an additional WU, but that doesn't seem to be the issue as it does so already if the server can't be reached. Maybe it's just more work than I thought. The improvement, considered as a percentage of processed WUs, has to be insignificantly tiny... but who knows what happens to the WU size (or broadband connections) in the future.
Re: R/C/G
Posted: Mon May 05, 2008 9:59 pm
by RAH
Well, it's still in discussion. The line I was thinking along:
At a given time there are around 20000 SMP work units to start.
I am sure there are over 20000 SMP folders. Maybe not. But the need/ability of the servers to
keep them updated has been suspect of late. Knowing that these things can happen.
Now, if the WUs are sent out, and as long as a 100-frame return is received within the time deadline, the server
just waits. The next generation is being done by the same machine. If a generation is not returned within the
time deadline, the server puts out a new WU, starting at the place the first one left off. That machine must
now get a new WU starting at gen 0, or pick up a non-returned one.
This cannot waste time, since the next gen cannot be sent out until the last one is done.
If you have ever looked at the files in the work folder, you can see the information that is applied to that particular WU.
The work files returned should have the information for the next gen in line. It should take a rather simple program
to transfer this info to the next WU.
WU starts - P=A R=B C=D G=E Temp=F Trag-a=G Trag-b=H is returned P=A R=B C=D G=E Temp=F Trag-a=I Trag-b=J
Program takes info and P=A R=B C=D G=E+1 Temp=F Trag-a=G Trag-b=H you now have Gen 1
I know it's not that easy, but this part of the folding is most likely the easiest of them all.
After all, it's just an idea.
Re: R/C/G
Posted: Mon May 05, 2008 10:07 pm
by bruce
Soriak wrote:The improvement, considered as a percentage of processed WUs, has to be insignificantly tiny...
I think that's the key issue here. The Pande Group doesn't spend time on changes that produce an insignificantly tiny improvement in total throughput -- even if it's an excellent idea.
Re: R/C/G
Posted: Mon May 05, 2008 10:58 pm
by 7im
RAH wrote:Well, it's still in discussion. The line I was thinking along:
At a given time there are around 20000 SMP work units to start.
I am sure there are over 20000 SMP folders. Maybe not. But the need/ability of the servers to
keep them updated has been suspect of late. Knowing that these things can happen.
Now, if the WUs are sent out, and as long as a 100-frame return is received within the time deadline, the server
just waits. The next generation is being done by the same machine. If a generation is not returned within the
time deadline, the server puts out a new WU, starting at the place the first one left off. That machine must
now get a new WU starting at gen 0, or pick up a non-returned one.
This cannot waste time, since the next gen cannot be sent out until the last one is done.
If you have ever looked at the files in the work folder, you can see the information that is applied to that particular WU.
The work files returned should have the information for the next gen in line. It should take a rather simple program
to transfer this info to the next WU.
WU starts - P=A R=B C=D G=E Temp=F Trag-a=G Trag-b=H is returned P=A R=B C=D G=E Temp=F Trag-a=I Trag-b=J
Program takes info and P=A R=B C=D G=E+1 Temp=F Trag-a=G Trag-b=H you now have Gen 1
I know it's not that easy, but this part of the folding is most likely the easiest of them all.
After all, it's just an idea.
Let's assume this part to be true for the sake of discussion. Let's assume the next gen of a WU could be done at the client level. How do you overcome all of the other issues that Bruce and I have posted over the last 2-3 pages?
Re: R/C/G
Posted: Mon May 05, 2008 11:52 pm
by RAH
What makes you so sure that there will be those issues?
And one of its main benefits is that if the servers get locked up (though that doesn't happen often), the finished WUs are held in
queue, and the computer or video card keeps crunching.
All issues are big IFs.
And they can only be confirmed by PG.
Whether they do something like this or not, I think if it is possible, it will be a big help.
And the one thing you and bruce have said, if it's a lot of work to do, then they should not do it. I AGREE!!
Re: R/C/G
Posted: Tue May 06, 2008 12:09 am
by bruce
I said it before and I'll say it again. If you want a WU that is 10x as long as the present ones, that's easy to do. It also reduces the server overhead by skipping nine of the ten upload requests. There's a range around the optimum size of a WU that's still reasonable, depending on many factors. I have not (yet) heard any gripes about the BigWUs being TOO big, probably because people can easily turn them off.
Re: R/C/G
Posted: Tue May 06, 2008 12:41 am
by 7im
RAH wrote:What makes you so sure that there will be those issues?
6 years of empirical observations while running various FAH clients, plus 2 years of modding this forum.
I was willing to give a little, and assume your first part was true. Please be so kind as to repay my assumption and assume there will be problems (and there are ALWAYS problems making a change to a system this complex), and suggest a solution or two. Thanks.
Re: R/C/G
Posted: Tue May 06, 2008 12:57 am
by RAH
OK. I guess the changes done to the servers and clients are what is causing all the problems now.
OK! No changes.
Re: R/C/G
Posted: Tue May 06, 2008 1:00 am
by Tobit
Rule #1 of the client/server world - The servers in a client/server project can never be fast enough or live on a backbone with enough throughput.