Re: 128.143.48.226 : server reports problem with unit
Posted: Tue Oct 20, 2009 5:02 pm
by toTOW
Again in reject mode :
128.143.48.226 classic fold1 michael.shirts full Reject
Re: 128.143.48.226 : server reports problem with unit
Posted: Sat Oct 31, 2009 10:25 pm
by farmpuma
bruce wrote: What do you mean by "the duplicate time frame"?
Projects 3860-3864 have a preferred deadline of either 6 or 8 days and a final deadline of 60 days.
Please let us know the PRCG of each WU with a problem. Also, when was each WU downloaded, and when did each one start getting the error message?
I apologize for my delayed response. When it hits the fan around here it seems to arrive in bucket loads.
I assume there is a duplicate WU time frame, which is the amount of time between the first and second download of a particular WU by the same machine. While this situation should normally be rare, it is possible, since there are legitimate duplicates to ensure the speedy return of a complete data set. I imagine this time frame to be greater than an hour or so to prevent people from gaming the system by running the same WU over and over or by downloading the same WU back to back.
I believe I have solved my particular problem during the last run of these double gromacs by keeping a log of what WU each machine was running or had finished. Since I am still stuck with a half speed dial-up internet connection I often need to download another WU to keep the machine from finishing its current WU and then sitting idle during my personal sleep cycle. The logs revealed that the server gave me the exact same WU as the nearly finished WU which I had moved to a hold folder. This happened every time across five different machines.
I am a bit surprised that no one noticed or bothered to tell me I was returning duplicate WUs to this server with its new and "improved" security system. Perhaps a less cryptic error message is needed?
Re: 128.143.48.226 : server reports problem with unit
Posted: Sun Nov 01, 2009 7:38 pm
by bruce
OK, you bring up several issues.
First, it is not possible to "game the system by running the same WU over and over or by downloading the same WU back to back." If you download the same WU more than once, you'll only get credit for it once . . . so your assumption isn't correct.
Second, FAH is not designed to allow you to queue extra WUs as you are doing. In fact, it is specifically designed to PROHIBIT such activity whenever possible. Improving the speed with which you return a WU is important to FAH's progress, often more important than the total number of WUs that you complete. Downloading an extra WU which you do not process immediately is detrimental to the project.
Nevertheless, others do the same thing and I understand the reasons you choose to do it. Apparently you're manipulating the files in order to download a second WU. That rarely works correctly. When a client requests a new WU without returning the previous assignment, the server logic is designed to reassign the same WU to the same client, assuming that the previous download failed. This is NOT based on a time-out.
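As a rough illustration of that reassignment behavior, here is a minimal sketch. It is not the actual assignment server code: the class, the method names, and the idea of keying a client by (user, MachineID) are all assumptions made for the example.

```python
# Hypothetical sketch only; not actual FAH server code.
# Assumption: a "client" is identified by a (user, machine_id) pair.
class AssignmentServer:
    def __init__(self, work_units):
        self._available = list(work_units)
        self._outstanding = {}  # client key -> WU assigned but not yet returned

    def request_work(self, user, machine_id):
        key = (user, machine_id)
        # An unreturned assignment is treated as a failed download, so the
        # same WU is handed out again. No time-out is involved.
        if key in self._outstanding:
            return self._outstanding[key]
        if not self._available:
            return None  # nothing left to assign
        wu = self._available.pop(0)
        self._outstanding[key] = wu
        return wu

    def return_work(self, user, machine_id):
        # Returning the result clears the outstanding assignment.
        return self._outstanding.pop((user, machine_id), None)
```

Under these assumptions, a client that asks twice without returning anything gets the same WU both times, while a client with a different MachineID gets a different one.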
To accomplish what you're trying to do, install a second client and change the MachineID to a different value. Then run your clients with the -oneunit flag. This will stop either client when it finishes the current WU, minimizing the time that you're hogging two WUs, but will allow you to start a second client whenever the current WU is "nearly finished." The WU assigned to MachineID=1 will always be returned from MachineID=1 and the WU assigned to MachineID=2 will always be returned from MachineID=2. No moving of files will be required and no "hold folder" will be needed.
Do your best to minimize the time that both clients are active on the same CPU.
(Of course, if you're running Linux, you should consider viewtopic.php?t=11615 )
Re: 128.143.48.226 : server reports problem with unit
Posted: Thu Nov 05, 2009 11:24 am
by farmpuma
Oops, another assumption bites the dust.
Yeah, when I put a nearly finished WU on hold I use a procedure similar to sneaker netting: the work folder, queue.dat, and unitinfo.txt go into a hold folder. I still have no problems with any server other than this "new and improved" system. However, I am now on alert for further "upgrades."
Thank you for suggesting an alternate option for keeping my system from going idle during sleep and away times. Since I always run the console clients I will simply stop the nearly finished one at an appropriate point and start the other unique client until it is convenient to switch back. Thus returning the finished WUs in a timely fashion.
Thanks also for the Linux link as I plan to start learning to use the free OS with and without VMware.
Re: 128.143.48.226 : server reports problem with unit
Posted: Fri Nov 06, 2009 10:01 am
by bruce
farmpuma wrote: Thank you for suggesting an alternate option for keeping my system from going idle during sleep and away times. Since I always run the console clients I will simply stop the nearly finished one at an appropriate point and start the other unique client until it is convenient to switch back. Thus returning the finished WUs in a timely fashion.
Stopping the almost finished WU is not the way to get it returned in a timely fashion. That's why I suggested the -oneunit flag.
If you must download an extra WU, recognize that you're already slowing down one extra WU, no matter which one you're working on. You might as well let them fight over the CPU and process both of them at half speed. At least then when the almost finished one is completed, it will upload immediately (rather than waiting for you to remember to restart it) and it will shut itself down -- and the new WU will resume processing at its natural rate.
For maximum efficiency (but contrary to the Pande Group's recommendations), check on the current WU.
Is it projected to still be processing when you next plan to check?
* If so,
*** do nothing.
* If current WU is expected to finish soon,
*** Start the other client.
*** Wait while it downloads and starts a new WU.
*** Restart the almost finished WU with the -oneunit flag.
If two clients are still processing the next time you check, your projection was wrong and you're impeding FAH's progress more than "just a little."
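The checklist above can be written out as a small decision function. This is only a sketch: the function name and the two hour estimates are hypothetical inputs that you would have to supply yourself, for example from the client's log.

```python
# Hypothetical helper; the names and the hour-based inputs are assumptions.
def next_action(hours_until_wu_finishes, hours_until_next_check):
    """Return the steps to take now, following the checklist above."""
    if hours_until_wu_finishes >= hours_until_next_check:
        # The current WU will still be processing at the next check.
        return ["do nothing"]
    # The current WU is expected to finish before the next check.
    return [
        "start the other client",
        "wait while it downloads and starts a new WU",
        "restart the almost finished WU with the -oneunit flag",
    ]
```

For instance, with 10 hours of work left and a check planned in 8 hours, the function returns only "do nothing"; with 3 hours left it returns the three steps for bringing up the second client.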
Re: 128.143.48.226 : server reports problem with unit
Posted: Wed Nov 11, 2009 12:12 am
by farmpuma
bruce wrote: it will upload immediately
I wish I could forget my dial-up connection as well. My clients always finish and then ask me to connect.
On a very positive note the use of two clients does allow me to keep the system crunching a WU while the finished WU uploads. With an average connection speed of about six minutes per MB this can be a considerable gain, particularly with uncompressed SMP uploads.
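To put a rough number on that gain, here is a small sketch of the arithmetic. The six minutes per MB comes from the figure above; the function name and the 5 MB upload size are made up for the example.

```python
# Hypothetical helper; only the 6 minutes/MB rate comes from the post above.
def upload_minutes(size_mb, minutes_per_mb=6.0):
    """Estimate how long an upload takes at a given minutes-per-MB rate."""
    return size_mb * minutes_per_mb

# Assumed example size: a 5 MB result file.
estimate = upload_minutes(5)
print(f"A 5 MB upload takes about {estimate:.0f} minutes")
```

With a second client running, that upload time is spent crunching the next WU instead of sitting idle.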