Project 7600 (19,0,53)
I received a rather low score for this unit. While it did complete successfully, the TPF averaged 35 minutes. It took over two and a half days to finish, and netted me an underwhelming 2162 points.
My computer is an i7 2600k overclocked to 4.4 GHz with 4 GB of 1333 RAM. Hyperthreading is enabled, and the computer was not running any other programs, nor was it folding on the GPU. All 8 threads were dedicated to this WU and were at 100% load throughout. My passkey is correct and I have folded many, many WUs before, so I should be receiving a bonus, and judging by the log I did in fact get one.
All signs point to the WU... the TPF, the large deadlines... I should have received more points for two and a half days worth of folding than that.
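For a rough sense of the numbers in this report, here is a minimal Python sketch. It assumes a WU is split into 100 frames (one per percent of progress), which the post does not state, and the function names are hypothetical, not part of any FAH tool.

```python
FRAMES_PER_WU = 100  # assumption: one frame per 1% of progress


def wu_days(tpf_minutes, frames=FRAMES_PER_WU):
    """Total run time in days for a WU at a constant time per frame (TPF)."""
    return tpf_minutes * frames / (60 * 24)


def points_per_day(points, days):
    """Average points per day (PPD) for a completed WU."""
    return points / days


days = wu_days(35)                         # TPF reported in the post
print(round(days, 2))                      # 2.43
print(round(points_per_day(2162, days)))   # 890
```

At a steady 35-minute TPF the compute time alone comes to roughly two and a half days, which works out to under 900 PPD for the credit reported.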
Re: Project 7600 (19,0,53)
There are a number of reports of occasional WUs from the 76xx SMP projects that have unusually long processing times. Currently those are 7600, 7610 and 7611. The project leader has posted that, from reports on such WUs in the 7611 project, they were able to identify a problem group and remove them from processing. They are looking into the issue to see if they can identify a cause for the small percentage that take an abnormally long time to process. There are a number of threads posted here on this; the project leader's post is at viewtopic.php?f=19&t=20976#p210709.
As an example of a more normal processing time for a Project 7600 WU, I get TPF figures of 6:30-7:00 minutes on my iMac that has an i7 860. So the base points and other settings for these units are okay for the vast majority that are sent out; it would be very difficult to set them differently for the small number of exceptions. Hopefully you won't get any more. Do report it if you do.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Re: Project 7600 (19,0,53)
Joe_H wrote: . . . the base points and other settings for these units are okay for the vast majority that are sent out, it would be very difficult to set them differently for the small number of exceptions. Hopefully you won't get anymore, do report it if you do.
You probably want to change "difficult" to "impossible".
If some WUs have been improperly generated, and they can detect it, they'll fix that problem. If it happens randomly, then it's impossible to fix because they'd have to run the WU before they could assign points to it -- and then they wouldn't need for you to process it at all.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: Project 7600 (19,0,53)
In an attempt to remain as objective as possible: instead of attempting to weed out each bad WU, would there be a way to include a point bonus for those who work through the long-winded units? I don't mind folding 24/7, and my motto has always been the cure before the accolades, but a little over 2000 points for 2.5 days of folding is a bit underwhelming. Perhaps a system where people could send a validation through their log file to receive additional credit for a completed unit?
Re: Project 7600 (19,0,53)
I had one just like that, and it was also a 7600. Seems to be quite common.
Quote: Perhaps a system where people could send a validation through their log file to receive additional credit for a completed unit?
That would be too complicated I think, and not very practical.
Re: Project 7600 (19,0,53)
Hi FTBIG,
Thanks for the report - I'm sorry that you got stuck with a very large WU. Unfortunately, in the course of molecular simulation, sometimes things go wrong and the system can become unstable (in <0.1% of cases). This usually causes an immediate exit, but for the A4 core, it appears that sometimes the core proceeds as if nothing has happened - just much more slowly than before. Unfortunately, we can't detect and stop those WUs once they're out, but can only see that something is wrong when they come back to Stanford. I have been (automatically) shutting them down when I do see them come back, so at least we limit them as much as possible.
Hopefully the next WU you get is normal! Let us know if otherwise.
TJ
Re: Project 7600 (19,0,53)
iceman1992 wrote: I had one just like that, and it was also a 7600. Seems to be quite common.
Quote: Perhaps a system where people could send a validation through their log file to receive additional credit for a completed unit?
That would be too complicated I think, and not very practical.
Well, after reading up on some of the other 76XX series WUs, it does seem to affect only a small minority of people. I don't think it would be too complicated given how rarely these WUs come up.
tjlane wrote: Hi FTBIG,
Thanks for the report - I'm sorry that you got stuck with a very large WU. Unfortunately, in the course of molecular simulation, sometimes things go wrong and the system can become unstable (in <0.1% of cases). This usually causes an immediate exit, but for the A4 core, it appears that sometimes the core proceeds as if nothing has happened - just much more slowly than before. Unfortunately, we can't detect and stop those WUs once they're out, but can only see that something is wrong when they come back to Stanford. I have been (automatically) shutting them down when I do see them come back, so at least we limit them as much as possible.
Hopefully the next WU you get is normal! Let us know if otherwise.
TJ
Thanks TJ. I'll know next time to just ditch the bad WU. The only reason why I kept on with it was to see if there would be a bonus and, of course, the spirit of folding. I did get a regular unit this go around, a 7809 with a TPF of 7 min 10 seconds, looking a bit more healthy there.
Re: Project 7600 (19,0,53)
FTBIG wrote: Well, after reading up on some of the other 76XX series WU's it does seem to be a small minority of people. I don't think it would be too complicated given how rarely these WU's come up.
Ah yes but if you read your own sentence from another perspective, why should they give additional complexity to the system for rare WUs? Doesn't really seem worthwhile, since it's so rare, why bother?
Re: Project 7600 (19,0,53)
tjlane wrote: Unfortunately, in the course of molecular simulation, sometimes things go wrong and the system can become unstable (in <0.1% of cases). This usually causes an immediate exit, but for the A4 core, it appears that sometimes the core proceeds as if nothing has happened - just much more slowly than before.
What do we know about the "something" that can go wrong?
As with all mathematical simulations, the equations used contain approximations and have limitations. Though it might not be a real situation for Gromacs, perhaps an equation is only valid when each pair of atoms is farther apart than some very small distance. A closer approach (almost) never happens, and when it does, it almost always creates an identifiable error. It would be very difficult to predict that sort of situation.
Do we know if all A4 semi-failures that continue to run are caused by the equations or the data, or is it also possible that the same condition can be caused by a hardware error, including things such as overclocking or overheating? If that's also a possibility, FAH most certainly would not want to award any kind of bonus for conditions that are within the control of the computer owner.
Re: Project 7600 (19,0,53)
iceman1992 wrote: Ah yes but if you read your own sentence from another perspective, why should they give additional complexity to the system for rare WUs? Doesn't really seem worthwhile, since it's so rare, why bother?
FTBIG wrote: Well, after reading up on some of the other 76XX series WU's it does seem to be a small minority of people. I don't think it would be too complicated given how rarely these WU's come up.
Or from another, that I folded for 2.5 days and only received 2162 points. What's done is done, and while I know in the future to dump the problematic WU, others might not feel as compassionate to the cause.
Quote: Do we know if all A4 semi-failures that continue to run are caused by the equations or the data or is it also possible that the same condition can be caused by a hardware error, including things such as overclocking or overheating? If that's also a possibility, FAH most certainly would not want to award any kind of bonus for conditions that are within the control of the computer owner.
Judging from this WU, and from others who have run into flawed 7600s, the TPF has varied depending on the CPU and overclock. A stock i7 920 has seen 80+ minute TPFs, where other overclocked systems have been anywhere from 30-60. I find it hard to believe, unless the 7600 does not like HyperThreading, that there is a hardware issue for this WU. That is not to say that ALL A4 WUs have issues, nor do all of the 76XX series.
Re: Project 7600 (19,0,53)
FTBIG wrote: I find it hard to believe that, unless the 7600 does not like HyperThreading, that there is a hardware issue for this WU. It is not to say that ALL A4 WU's have issues, nor do all the 76XX series.
Let me put the "FAH does not like HyperThreading" question in perspective.
As a general rule, Stanford's FahCores treat a pair of HyperThreaded processors as about 1.2 real processors. In other words, a dedicated i7 with 4 real cores/8 threads is worth about 20% more than an equivalent i5 without HyperThreading.
Those numbers are relatively old estimates from Core_78 (uniprocessor) and FahCore_a3 (smp) projects, but I'm pretty confident that they also apply to FahCore_a4 and FahCore_a5 projects without an active GPU project. (Adding a GPU complicates it, depending on whether it's an NV GPU or an AMD GPU and which model you have.)
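The 1.2 rule of thumb above can be written out as a small Python sketch. The factor is the rough estimate from this post, not a measured or official value, and `effective_cores` is a hypothetical helper name.

```python
HT_PAIR_FACTOR = 1.2  # rule of thumb from the post: one HT pair ~ 1.2 real cores


def effective_cores(physical_cores, hyperthreading):
    """Approximate 'real core' equivalent of a dedicated CPU."""
    factor = HT_PAIR_FACTOR if hyperthreading else 1.0
    return physical_cores * factor


i7 = effective_cores(4, hyperthreading=True)    # 4 cores / 8 threads
i5 = effective_cores(4, hyperthreading=False)   # 4 cores / 4 threads
print(f"{i7 / i5 - 1:.0%} more than the i5")    # about 20%
```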
The biggest thing that an SMP project like 78xx does not like is when it does not have dedicated access to the cores that you have given it when it starts processing. With even a moderate amount of resource conflict from some other tasks (if it's a long-term interruption) you'll find that -smp 6 can outperform -smp 8, even though Windows will tell you that 24% of your CPUs are idle . . . and in real terms, you're "wasting" as much as 10% of your system by not using those two extra virtual cores.
Re: Project 7600 (19,0,53)
FTBIG wrote: Or from another, that I folded for 2.5 days and only received 2162 points. What's done is done, and while I know in the future to dump the problematic WU, others might not feel as compassionate to the cause.
I understand your feeling; I myself folded for around 50 hours and received about 1500 or so points. Yes, we both fold for the cause, while others may fold only for the points, I know. They base the bonus points on a benchmark system, I think, so how would they know whether you folded for 2.5 days because of a troublesome WU or because of slow hardware?
Re: Project 7600 (19,0,53)
Excluding the QRB, if the benchmark hardware takes 5 days, the WU gets 5x as many points as if the benchmark system takes 1 day. If your system is twice as fast as the benchmark system, it can finish them in 2.5 days or 12 hours, respectively, and it will earn twice the PPD.
If your system isn't exactly twice as fast ... say it has a slightly faster CPU but slower RAM or smaller cache ... some WUs will be a bit faster than 2x as fast and others will be a little slower than 2x as fast.
They have no idea whether you folded in 2.5 days because you have slow hardware or because it was a troublesome WU or because you turned off the machine for a while or your home had a power failure and the machine wasn't folding the whole time. It's a bit like the classic "the dog ate my homework" story kids try to use on their teacher. You get credit for what you turn in and whether it was late or not, not for how easy or hard you actually worked.
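The benchmark scaling described in this post can be sketched in a few lines of Python. The QRB is left out, as in the post, and the rate of 100 points per benchmark day is a made-up number for illustration only; the function names are hypothetical as well.

```python
POINTS_PER_BENCHMARK_DAY = 100.0  # hypothetical rate, for illustration only


def base_points(benchmark_days):
    """Base credit scales with how long the benchmark machine takes."""
    return benchmark_days * POINTS_PER_BENCHMARK_DAY


def ppd(benchmark_days, speedup):
    """Points per day for a machine `speedup` times as fast as the benchmark."""
    days_taken = benchmark_days / speedup
    return base_points(benchmark_days) / days_taken


print(base_points(5) / base_points(1))  # 5.0: the 5-day WU is worth 5x the 1-day WU
print(ppd(5, speedup=2.0))              # 200.0, finished in 2.5 days
print(ppd(1, speedup=2.0))              # 200.0, finished in 12 hours
```

Either way the machine earns twice the benchmark's 100 PPD. Note that the time actually taken never enters `base_points`: a WU that runs abnormally slowly still returns the same base credit, so its PPD simply drops.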
Re: Project 7600 (19,0,53)
I'm running project 7610 (105,0,85) and the TPF is 1 hr 16 min. ETA is now 3.56 days and I'm at 33% complete. I guess I got a bad WU? Normally these SMP ones go pretty fast.
Re: Project 7600 (19,0,53)
I have just suffered through two of these units: p7611 (4,38,128) and 7610 (142,0,96).
My rig is an i7-2600k @ 4.5 GHz with 8 GB of 1600 MHz RAM, on Win7, running the V7 client (latest release).
Both were OK TPF-wise (around 5 mins), but the PPD!! Down at 12-14k. Very, very bad!
I don't mind running these, but the PPD needs to be fixed.