128.143.48.226 classic fold1 michael.shirts full Reject
128.143.48.226 : server reports problem with unit
Re: 128.143.48.226 : server reports problem with unit
Again in reject mode:
Re: 128.143.48.226 : server reports problem with unit
bruce wrote: What do you mean by "the duplicate time frame"?
Projects 3860-3864 have a preferred deadline of either 6 or 8 days and a final deadline of 60 days.
Please let us know the PRCG of each WU with a problem. Also, when was each WU downloaded, and when did each one start getting the error message?
I apologize for my delayed response. When it hits the fan around here it seems to arrive in bucket loads.
I assume there is a duplicate WU time frame, which is the amount of time between the first and second download of a particular WU by the same machine. While this situation should normally be rare, it is possible, since there are legitimate duplicates to ensure the speedy return of a complete data set. I imagine this time frame to be greater than an hour or so, to prevent people from gaming the system by running the same WU over and over or by downloading the same WU back to back.
I believe I have solved my particular problem during the last run of these double gromacs by keeping a log of what WU each machine was running or had finished. Since I am still stuck with a half speed dial-up internet connection I often need to download another WU to keep the machine from finishing its current WU and then sitting idle during my personal sleep cycle. The logs revealed that the server gave me the exact same WU as the nearly finished WU which I had moved to a hold folder. This happened every time across five different machines.
I am a bit surprised that no one noticed or bothered to tell me I was returning duplicate WUs to this server with its new and "improved" security system. Perhaps a less cryptic error message is needed?
I'm the same farmpuma from years gone by, but it appears my account went away when the passwords changed to six characters minimum.
Re: 128.143.48.226 : server reports problem with unit
OK, you bring up several issues.
First, it is not possible to "game the system by running the same WU over and over or by downloading the same WU back to back." If you download the same WU more than once, you'll only get credit for it once . . . so your assumption isn't correct.
Second, FAH is not designed to allow you to queue extra WUs as you are doing. In fact, it is specifically designed to PROHIBIT such activity whenever possible. Improving the speed with which you return a WU is important to FAH's progress, often more important than the total number of WUs that you complete. Downloading an extra WU which you do not process immediately is detrimental to the project.
Nevertheless, others do the same thing and I understand the reasons you choose to do it. Apparently you're manipulating the files in order to download a second WU. That rarely works correctly. When a client requests a new WU without returning the previous assignment, the server logic is designed to reassign the same WU to the same client, assuming that the previous download failed. This is NOT based on a time-out.
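The reassignment behaviour described above can be pictured with a small sketch. This is only an illustration, not the actual server code: the class name, the method names, and the use of a (user, MachineID) pair as the client key are all assumptions made for the example.

```python
# Hypothetical model of the reassignment behaviour described in this post.
# None of these names come from the real FAH server; they are assumptions.

class AssignmentServer:
    def __init__(self, work_units):
        self._pending = list(work_units)   # WUs not yet handed out
        self._outstanding = {}             # client key -> WU not yet returned

    def request_work(self, user, machine_id):
        key = (user, machine_id)           # assumed way of identifying a client
        # An unreturned assignment is treated as a failed download, so the
        # same WU is handed out again, no matter how much time has passed.
        if key in self._outstanding:
            return self._outstanding[key]
        wu = self._pending.pop(0)
        self._outstanding[key] = wu
        return wu

    def return_work(self, user, machine_id):
        # Returning the result clears the assignment for that client.
        return self._outstanding.pop((user, machine_id), None)


server = AssignmentServer(["WU-A", "WU-B", "WU-C"])
first = server.request_work("farmpuma", 1)
again = server.request_work("farmpuma", 1)   # previous WU not returned yet
other = server.request_work("farmpuma", 2)   # a different MachineID
print(first, again, other)
```

In this model the second request from the same client gets the same WU back, while a client with a different MachineID gets a fresh one, which matches the behaviour farmpuma's logs showed.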
To accomplish what you're trying to do, install a second client and change the MachineID to a different value. Then run your clients with the -oneunit flag. This will stop either client when it finishes the current WU, minimizing the time that you're hogging two WUs, but will allow you to start a second client whenever the current WU is "nearly finished." The WU assigned to MachineID=1 will always be returned from MachineID=1 and the WU assigned to MachineID=2 will always be returned from MachineID=2. No moving of files will be required and no "hold folder" will be needed.
Do your best to minimize the time that both clients are active on the same CPU.
(Of course, if you're running Linux, you should consider viewtopic.php?t=11615 )
Posting FAH's log:
How to provide enough info to get helpful support.
Re: 128.143.48.226 : server reports problem with unit
Oops, another assumption bites the dust.
Yeah, when I put a nearly finished WU on hold I use a procedure similar to sneaker netting. Work folder, queue.dat, and unitinfo.txt into a hold folder. I still have no problems with any other server than this "new and improved" system. However, I am now on alert for further "upgrades."
Thank you for suggesting an alternate option for keeping my system from going idle during sleep and away times. Since I always run the console clients I will simply stop the nearly finished one at an appropriate point and start the other unique client until it is convenient to switch back. Thus returning the finished WUs in a timely fashion.
Thanks also for the Linux link as I plan to start learning to use the free OS with and without VMware.
Re: 128.143.48.226 : server reports problem with unit
farmpuma wrote: Thank you for suggesting an alternate option for keeping my system from going idle during sleep and away times. Since I always run the console clients I will simply stop the nearly finished one at an appropriate point and start the other unique client until it is convenient to switch back. Thus returning the finished WUs in a timely fashion.
Stopping the almost finished WU is not the way to get it returned in a timely fashion. That's why I suggested the -oneunit flag.
If you must download an extra WU, recognize that you're already slowing down one extra WU, no matter which one you're working on. You might as well let them fight over the CPU and process both of them at half speed. At least then when the almost finished one is completed, it will upload immediately (rather than waiting for you to remember to restart it) and it will shut itself down -- and the new WU will resume processing at its natural rate.
For maximum efficiency (but contrary to the Pande Group's recommendations), check on the current WU.
Is it projected to still be processing until you plan to check again?
* If so,
*** do nothing.
* If current WU is expected to finish soon,
*** Start the other client.
*** Wait while it downloads and starts a new WU.
*** Restart the almost finished WU with the -oneunit flag.
If two clients are still processing next time you check, your projection was wrong and you're impeding FAH's progress more than "just a little."
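The checklist above boils down to one comparison. A minimal sketch, assuming you can estimate the hours left on the current WU and the hours until you next look at the machine; the function and parameter names are made up for this example and are not part of the FAH client:

```python
def next_action(hours_left_on_current_wu, hours_until_next_check):
    """Return the steps to take, following the checklist in this post.

    Both arguments are your own estimates (hypothetical inputs).
    """
    # The current WU will still be processing when you next check.
    if hours_left_on_current_wu >= hours_until_next_check:
        return ["do nothing"]
    # The current WU is expected to finish before you check again.
    return [
        "start the other client",
        "wait while it downloads and starts a new WU",
        "restart the almost finished WU with the -oneunit flag",
    ]


# Example: 3 hours of work left, but you will be away for 8 hours.
for step in next_action(3, 8):
    print(step)
```

If the estimate is off and both clients are still running at the next check, that is the "projection was wrong" case described above.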
Re: 128.143.48.226 : server reports problem with unit
bruce wrote: it will upload immediately
I wish I could forget my dial-up connection as well. My clients always finish and then ask me to connect.
On a very positive note the use of two clients does allow me to keep the system crunching a WU while the finished WU uploads. With an average connection speed of about six minutes per MB this can be a considerable gain, particularly with uncompressed SMP uploads.
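To put a rough number on that gain, here is a back-of-the-envelope sketch. The six minutes per MB figure is the one quoted above; the 5 MB result size is just an assumed example, not a measured upload.

```python
def upload_minutes(size_mb, minutes_per_mb=6.0):
    """Estimated upload time at the dial-up rate quoted in this post."""
    return size_mb * minutes_per_mb


# Hypothetical 5 MB result file: time the CPU would otherwise sit idle.
print(upload_minutes(5))  # 30.0
```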