Project 7809 (7, 192, 16) sudden slow down, "about" error

Posted: Mon Feb 04, 2013 6:06 am
by miranda822
I have been folding the above WU since January 30th with only a few breaks due to a storm in my area, and the TPF has been less than 2 hours at worst. I was expecting it to be done tonight or tomorrow at the pace it had been on. But somewhere along the line while I was at work, the TPF jumped to 5 hours, 38 minutes.

Also, the "About Project" pane text changed to this:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator,
root@localhost and inform them of the time the error occurred,
and anything you might have done that may have
caused the error.</p>
<p>More information about this error may be available
in the server log.</p>
<hr>
<address>Apache/2.0.52 (CentOS) Server at fah-web.stanford.edu Port 80</address>
</body></html>
I have tried quitting the program and rebooting, but the problem returned once I started the program again post-reboot. It's still progressing, but I'd rather it go back to its speedier pace from before.

What went wrong? Please advise.

Re: P7809 (7, 192, 16) sudden slow down, "about" error

Posted: Mon Feb 04, 2013 6:55 am
by art_l_j_PlanetAMD64
miranda822 wrote:I have been folding the above WU since January 30th with only a few breaks due to a storm in my area, and the TPF has been less than 2 hours at worst. I was expecting it to be done tonight or tomorrow at the pace it had been on. But somewhere along the line while I was at work, the TPF jumped to 5 hours, 38 minutes.

I have tried quitting the program and rebooting, but the problem returned once I started the program again post-reboot. It's still progressing, but I'd rather it go back to its speedier pace from before.

What went wrong? Please advise.
The FahCore_a4.exe process, which does the calculations for P7809, may be getting disrupted by some other process using up CPU time. It takes only a few percent of CPU going elsewhere to disrupt the SMP FahCore calculations and slow them down.

To observe the %CPU Usage for each 'Image', in much finer detail than the 'Processes' tab in Task Manager, please do this:
  • Press Ctrl+Shift+Esc, or right-click on an empty area of the taskbar and left-click on 'Start Task Manager'.
  • Select the 'Performance' tab.
  • Click on 'Resource Monitor' (near the bottom left), and select the 'Overview' tab.
Now you can see the 'Average CPU %' usage for each Image. FahCore_a4.exe should be getting 98-100% of the CPU time. If it is less, you may need to use (for example) smp:6 instead of smp:8 on the SMP WUs, especially if the GPU slots are folding P807x WUs. Please see my tests that I ran here.

Re: P7809 (7, 192, 16) sudden slow down, "about" error

Posted: Mon Feb 04, 2013 6:58 am
by art_l_j_PlanetAMD64
Also, for the '500 Internal Server Error', please see this topic:
129.74.85.15 giving 500 server error for project description

Re: Project 7809 (7, 192, 16) sudden slow down, "about" error

Posted: Mon Feb 04, 2013 7:26 am
by miranda822
Thanks for your help. I went through the steps, and it's not getting quite that much on average (mid 80s to early 90s), but it's using the majority of the CPU time. I had the SMP set to -1 by default, and the Resource Monitor says it's using 6 threads. I tried manually setting it to thus, but it wouldn't let me because the smp in the WU's description is smp:2.

I didn't know if the error was a related issue or not, let alone that there was a program upgrade available. Should I quit the program and upgrade, or will I lose my current WU?

Re: Project 7809 (7, 192, 16) sudden slow down, "about" error

Posted: Mon Feb 04, 2013 7:46 am
by miranda822
UPDATE:

Now that I look at the progress bar again, I'm beginning to think that there was a glitch regarding the TPF/time left. It was at roughly 71% when I first noticed the problem, but now it's at 73.11% and the pace is up again. I had closed a few tabs in Firefox a few minutes ago, but they had been open prior to the problem.
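
To sanity-check a "time left" figure like the one described above, here is a rough sketch of the arithmetic in Python. It assumes one frame equals 1% of the WU (so 100 frames per WU) and that the TPF stays constant for the rest of the run. The function name `remaining_time` is made up for illustration and is not part of the F@H client.

```python
from datetime import timedelta

# Assumption: one frame is 1% of the work unit, so a WU has 100 frames.
FRAMES_PER_WU = 100


def remaining_time(percent_done, tpf):
    """Estimate time left from percent complete and time per frame (TPF).

    With 100 frames per WU, the number of frames left is simply
    100 minus the percent already done.
    """
    if not 0 <= percent_done <= 100:
        raise ValueError("percent_done must be between 0 and 100")
    frames_left = FRAMES_PER_WU - percent_done
    return tpf * frames_left


# The slow TPF reported in the first post, at roughly 71% complete.
slow_tpf = timedelta(hours=5, minutes=38)
print(remaining_time(71, slow_tpf))
```

If the estimate this gives is wildly different from what the WU had been doing for days, a temporary glitch in the displayed TPF is a reasonable suspicion.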

Re: Project 7809 (7, 192, 16) sudden slow down, "about" error

Posted: Mon Feb 04, 2013 8:00 am
by art_l_j_PlanetAMD64
miranda822 wrote:Thanks for your help. I went through the steps, and it's not getting quite that much on average (mid 80s to early 90s), but it's using the majority of the CPU time. I had the SMP set to -1 by default, and the Resource Monitor says it's using 6 threads. I tried manually setting it to thus, but it wouldn't let me because the smp in the WU's description is smp:2.

I didn't know if the error was a related issue or not, let alone that there was a program upgrade available. Should I quit the program and upgrade, or will I lose my current WU?
OK, I would say:

1) Keep on folding, do not quit the program.

2) OK, getting only the mid 80s to low 90s is enough of a shortfall to severely slow down your SMP folding slot. So, set your SMP slot to use two (2) fewer than the actual number of cores in your CPU. The number of cores currently being used shows up in your log file when a new 0xa4 WU is started:

07:18:14:WU01:FS01:0xa4:Project: 8028 (Run 2164, Clone 1, Gen 39)
07:18:14:WU01:FS01:0xa4:
07:18:14:WU01:FS01:0xa4:Assembly optimizations on if available.
07:18:14:WU01:FS01:0xa4:Entering M.D.
07:18:19:WU00:FS01:Upload complete
07:18:19:WU00:FS01:Server responded WORK_ACK (400)
07:18:19:WU00:FS01:Final credit estimate, 1311.00 points
07:18:19:WU00:FS01:Cleaning up
07:18:19:WU01:FS01:0xa4:Mapping NT from 4 to 4 
07:18:20:WU01:FS01:0xa4:Completed 0 out of 500000 steps  (0%)
07:20:52:WU01:FS01:0xa4:Completed 5000 out of 500000 steps  (1%)
The 'Mapping NT from 4 to 4' means 4 cores, so you would set your SMP slot to use 2 cores:
  • In FAHControl, you must be in either the 'Advanced' or 'Expert' mode, selected from the dropdown menu at the upper right
  • Click 'Pause' in FAHControl, and wait until all slots show Paused
  • Click on 'Configure', and select the 'Slots' tab
  • Click on the 'smp' slot to highlight it, then click on 'Edit'
  • In the 'SMP' part of the 'Configure folding slot' window, edit the 'CPUs' to be 2 cores
  • Click on 'OK', then click on 'Save'
  • Click on 'Fold', to get the folding started again
That should improve the folding performance for the SMP slot. Please let me know if this makes any improvement. Thanks!
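
If you would rather not hunt through the log by eye, the 'Mapping NT' line can be picked out with a few lines of Python. This is only a sketch: it assumes the second number on that line is the thread count actually in use, and the helper names (`threads_in_use`, `suggested_smp`) are made up for illustration. The "minus two" reserve is just the rule of thumb from this post.

```python
import re

# Matches the FahCore_a4 thread-mapping line, for example:
# "07:18:19:WU01:FS01:0xa4:Mapping NT from 4 to 4"
MAPPING_RE = re.compile(r"Mapping NT from (\d+) to (\d+)")


def threads_in_use(log_text):
    """Return the thread count from the last 'Mapping NT' line, or None."""
    matches = MAPPING_RE.findall(log_text)
    if not matches:
        return None
    _requested, actual = matches[-1]
    return int(actual)


def suggested_smp(cores_in_use, reserve=2):
    """Leave `reserve` cores free for other tasks, but never go below 1."""
    return max(1, cores_in_use - reserve)


sample_log = """\
07:18:14:WU01:FS01:0xa4:Entering M.D.
07:18:19:WU00:FS01:Upload complete
07:18:19:WU01:FS01:0xa4:Mapping NT from 4 to 4
07:18:20:WU01:FS01:0xa4:Completed 0 out of 500000 steps  (0%)
"""

nt = threads_in_use(sample_log)
print(nt, suggested_smp(nt))  # prints "4 2"
```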

Re: Project 7809 (7, 192, 16) sudden slow down, "about" error

Posted: Mon Feb 04, 2013 8:04 am
by art_l_j_PlanetAMD64
miranda822 wrote:UPDATE:

Now that I look at the progress bar again, I'm beginning to think that there was a glitch regarding the TPF/time left. It was at roughly 71% when I first noticed the problem, but now it's at 73.11% and the pace is up again. I had closed a few tabs in Firefox a few minutes ago, but they had been open prior to the problem.
OK, but I would still say to try the change to the number of CPU cores being used. This should prevent the SMP folding slot from being slowed down by other tasks (like Firefox tabs or GPU P807x folding). Please try it; it's always easy to change it back if the change does not result in any improvement.

EDIT:
If you use 2 cores from a 4-core CPU, then the 'target' CPU % in the 'Resource Monitor' display will be 50%, not 100%. It is whatever the ratio is:
( ( # of cores used ) / ( total number of cores ) ) * 100%
So using 6 cores out of 8, like I am in one computer, results in a target of 75% in the 'Resource Monitor' display.
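
The same ratio as a small Python sketch, in case it helps when comparing against the Resource Monitor numbers. The `looks_starved` helper and its 2-point tolerance are assumptions for illustration, loosely based on the "98-100%" figure mentioned earlier in this thread.

```python
def target_cpu_percent(cores_used, total_cores):
    """Expected 'Average CPU %' for a FahCore using some of the cores."""
    if total_cores <= 0 or not 0 <= cores_used <= total_cores:
        raise ValueError("need 0 <= cores_used <= total_cores and total_cores > 0")
    return cores_used / total_cores * 100.0


def looks_starved(observed_percent, cores_used, total_cores, tolerance=2.0):
    """True if the observed CPU % is more than `tolerance` points below target.

    The tolerance is a hypothetical threshold, not an official value.
    """
    return observed_percent < target_cpu_percent(cores_used, total_cores) - tolerance


print(target_cpu_percent(2, 4))  # 50.0
print(target_cpu_percent(6, 8))  # 75.0
print(looks_starved(85.0, 4, 4))  # True: 85% is well under the 100% target
```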

Re: Project 7809 (7, 192, 16) sudden slow down, "about" error

Posted: Mon Feb 04, 2013 8:31 am
by art_l_j_PlanetAMD64
Also, an odd number of cores (5, 7) usually does not work, but 3 cores is usually OK. So, if 2 cores works OK for you, then using 3 cores should also work OK. Plus, smp:3 should (of course) reduce the TPF compared to using smp:2.

Re: Project 7809 (7, 192, 16) sudden slow down, "about" error

Posted: Mon Feb 04, 2013 8:54 am
by art_l_j_PlanetAMD64
miranda822 wrote:I had the SMP set to -1 by default, and the Resource Monitor says it's using 6 threads. I tried manually setting it to thus, but it wouldn't let me because the smp in the WU's description is smp:2.
The number of threads displayed by Resource Monitor for the FahCore_a4.exe process is greater than the actual number of CPU cores. Please see my post above, to determine the actual number of CPU cores.

Re: Project 7809 (7, 192, 16) sudden slow down, "about" error

Posted: Mon Feb 04, 2013 11:30 am
by art_l_j_PlanetAMD64
miranda822 wrote:UPDATE:

Now that I look at the progress bar again, I'm beginning to think that there was a glitch regarding the TPF/time left. It was at roughly 71% when I first noticed the problem, but now it's at 73.11% and the pace is up again. I had closed a few tabs in Firefox a few minutes ago, but they had been open prior to the problem.
Also, there are some WUs that will 'jump' between two different TPF values. Please look here:
art_l_j_PlanetAMD64 wrote:I just got a P8049 WU on my #6 computer (AMD FX-8150 smp:8). I have watched it for about 10 minutes, and it will jump between these two values:
8049 (264, 14, 2), Estimated TPF 1:33, Estimated PPD 13227.48
8049 (264, 14, 2), Estimated TPF 2:15, Estimated PPD 8740.98

It will spend a minute or so at each value, then immediately jump to the other value.
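
One way to tell a real slowdown from a jumpy estimate is to measure the frame times directly from the log instead of relying on the displayed TPF. The Python sketch below does that for 'Completed ... steps (N%)' lines in the format shown earlier in this thread. It is an illustration only: it ignores slot and WU IDs, so it assumes the text passed in holds lines from a single WU, and the function name `frame_times` is made up.

```python
import re
from datetime import timedelta

# Matches lines such as:
# "07:20:52:WU01:FS01:0xa4:Completed 5000 out of 500000 steps  (1%)"
COMPLETED_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):.*Completed \d+ out of \d+ steps\s+\((\d+)%\)",
    re.MULTILINE,
)


def frame_times(log_text):
    """Return the seconds between consecutive 1% 'Completed' lines."""
    stamps = []
    for hours, minutes, seconds, percent in COMPLETED_RE.findall(log_text):
        second_of_day = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        stamps.append((int(percent), second_of_day))
    deltas = []
    for (p0, t0), (p1, t1) in zip(stamps, stamps[1:]):
        if p1 == p0 + 1:
            # Modulo one day handles a frame that crosses midnight.
            deltas.append((t1 - t0) % 86400)
    return deltas


sample_log = """\
07:18:19:WU01:FS01:0xa4:Mapping NT from 4 to 4
07:18:20:WU01:FS01:0xa4:Completed 0 out of 500000 steps  (0%)
07:20:52:WU01:FS01:0xa4:Completed 5000 out of 500000 steps  (1%)
"""

times = frame_times(sample_log)
print(times)  # prints [152]
print([str(timedelta(seconds=t)) for t in times])  # prints ['0:02:32']
```

If the measured frame times stay steady while the displayed TPF jumps around, the jump is most likely in the estimate rather than in the folding itself.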

Re: Project 7809 (7, 192, 16) sudden slow down, "about" error

Posted: Mon Feb 04, 2013 6:28 pm
by PantherX
art_l_j_PlanetAMD64 wrote:Also, an odd number of cores (5, 7) usually does not work, but 3 cores is usually OK. So, if 2 cores works OK for you, then using 3 cores should also work OK. Plus, smp:3 should (of course) reduce the TPF compared to using smp:2.
Odd/Prime numbers like 5 and 7 can normally be used for folding and rarely cause errors. The only time it does cause errors is when a very small project (one with very few atoms) is being folded. Other than that, 5 and 7 should be usable for the majority of projects. I recall having this issue on only 2 projects in 4 years, so it's safe to say that they are usable. Moreover, larger prime/odd numbers which are known to cause issues are handled automatically by the FahCore, which rounds down to the next lower good number.

miranda822 -> Can you please paste the initial section of your log, which contains the system configuration and your F@H configuration as described here (viewtopic.php?f=61&t=16206), so we can better help you? Furthermore, I have marked Project: 7809 (Run 7, Clone 192, Gen 16) for a follow-up in case it is a bad WU.

Re: Project 7809 (7, 192, 16) sudden slow down, "about" error

Posted: Mon Feb 04, 2013 7:45 pm
by art_l_j_PlanetAMD64
PantherX wrote:Odd/Prime numbers like 5 and 7 can normally be used for folding and rarely cause errors. The only time it does cause errors is when a very small project (one with very few atoms) is being folded. Other than that, 5 and 7 should be usable for the majority of projects.
OK, I thought I had read somewhere that odd numbers higher than 3 were problematic, but it's good if that is no longer true.

Is this no longer true?
Re: Radeon 7950 not folding. No WUs or problem?
Joe_H wrote:One other issue to go over is that the folding core for ATI GPU's uses up to a full core of your CPU to move data in and out of the GPU. So if you do keep folding on your 7950, you should adjust the SMP setting for that slot. Change the SMP setting from -1, the default which uses all cores available, to 3 or 2. 3 is one exception to the message in FAHControl of setting the number to an even number.

Re: Project 7809 (7, 192, 16) sudden slow down, "about" error

Posted: Mon Feb 04, 2013 7:54 pm
by Jesse_V
art_l_j_PlanetAMD64 wrote:
PantherX wrote:Odd/Prime numbers like 5 and 7 can normally be used for folding and rarely cause errors. The only time it does cause errors is when a very small project (one with very few atoms) is being folded. Other than that, 5 and 7 should be usable for the majority of projects.
OK, I thought I had read somewhere that odd numbers higher than 3 were problematic, but it's good if that is no longer true.
See:
http://folding.stanford.edu/English/FAQ-SMP#ntoc5
viewtopic.php?f=16&t=23060&p=230790#p230790
viewtopic.php?f=19&t=21920&p=218801#p218801

Re: Project 7809 (7, 192, 16) sudden slow down, "about" error

Posted: Mon Feb 04, 2013 8:14 pm
by art_l_j_PlanetAMD64
Thanks, Jesse, for those links. From the information in those links, the answer for 5 is "should usually be OK but not 100% guaranteed", and for 7 it is "there are quite a few problems reported, so this is not recommended".

Am I reading that correctly, or did I misunderstand what I read there?

Re: Project 7809 (7, 192, 16) sudden slow down, "about" error

Posted: Mon Feb 04, 2013 8:44 pm
by bruce
The original definition of poor choices of SMP values was to avoid "large primes" but "large" was never defined. A lot of WUs have been folded since then and the Pande Group has gradually established what works and what does not work. As PantherX has said, the FahCores will no longer let you run 31 cores on your 32-core bigadv machine or 23 on your 24-core machine. [Most certainly "large" primes.] They will let you run 3, 5, or 7, but they should prevent the assignment of proteins with very few atoms to those machines. In other words, the Pande Group has apparently established suitable criteria that can minimize those problems. It's still a statistical thing, though, based on the shape of those "few atoms" in 1/7th of the protein. If you could actually run with 11 of your 12 cores, you would have a higher failure rate than somebody with 7, but you would still have quite a few successes compared to 23 or 31.

The Pande Group doesn't want failed WUs either. We do still see some "bad WUs" but the frequency is going down. Unfortunately, a WU can error out due to the number of cores together with the number of atoms, to overclocking, to hardware faults, and to several other causes. They continue to collect data, so if they need to make additional assignment tweaks, I'm sure they will.
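
For anyone wondering which SMP values count as primes on a given machine, here is a small Python sketch that lists them. Since "large" was never defined, the `larger_than` cutoff is purely a hypothetical parameter, and this says nothing about what the FahCore itself checks or which value it rounds down to.

```python
def is_prime(n):
    """Trial-division primality test, fine for small thread counts."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def large_primes(max_threads, larger_than=7):
    """Primes up to max_threads that exceed a hypothetical 'large' cutoff."""
    return [n for n in range(2, max_threads + 1)
            if n > larger_than and is_prime(n)]


# Thread counts on a 32-core machine that are primes above 7.
print(large_primes(32))  # prints [11, 13, 17, 19, 23, 29, 31]
```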