Project 7600 (19,0,53)

Moderators: Site Moderators, FAHC Science Team

FTBIG
Posts: 4
Joined: Tue Apr 24, 2012 2:42 am

Project 7600 (19,0,53)

Post by FTBIG »

I received a rather low score for this unit. While it did complete successfully, the TPF averaged 35 minutes. It took over two and a half days to finish, and netted me an underwhelming 2162 points.

My computer is an i7 2600K overclocked to 4.4 GHz with 4 GB of 1333 RAM. Hyperthreading is enabled, and the computer was not running any other programs, nor was it folding on the GPU. All 8 threads were dedicated to this WU and were at 100% load throughout. My passkey is correct and I have folded many, many WUs before this one, so I should be receiving a bonus, and judging by the log I did in fact get one.

All signs point to the WU... the TPF, the large deadlines... I should have received more points for two and a half days worth of folding than that.
Joe_H
Site Admin
Posts: 7927
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Project 7600 (19,0,53)

Post by Joe_H »

There are a number of reports of occasional WUs from the SMP projects 76xx that have unusually long processing times. Currently those are 7600, 7610 and 7611. The project leader has posted that, from reports on such WUs in the 7611 project, they were able to identify a problem group and remove them from processing. They are looking into the issue to see if they can identify a cause for the small percentage that take an abnormally long time to process. There are a number of threads posted here on this; the project leader posted here: viewtopic.php?f=19&t=20976#p210709.

As an example of a more normal processing time for a Project 7600 WU, I get TPF figures of 6:30-7:00 minutes on my iMac that has an i7 860. So the base points and other settings for these units are okay for the vast majority that are sent out; it would be very difficult to set them differently for the small number of exceptions. Hopefully you won't get any more; do report it if you do.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 7600 (19,0,53)

Post by bruce »

Joe_H wrote:. . . the base points and other settings for these units are okay for the vast majority that are sent out; it would be very difficult to set them differently for the small number of exceptions. Hopefully you won't get any more; do report it if you do.
You probably want to change "difficult" to "impossible".

If some WUs have been improperly generated, and they can detect it, they'll fix that problem. If it happens randomly, then it's impossible to fix because they'd have to run the WU before they could assign points to it -- and then they wouldn't need for you to process it at all.
FTBIG
Posts: 4
Joined: Tue Apr 24, 2012 2:42 am

Re: Project 7600 (19,0,53)

Post by FTBIG »

In an attempt to remain as objective as possible: instead of attempting to weed out each bad WU, would there be a way to include a point bonus for those who work through the long-winded units? I don't mind folding 24/7, and my motto has always been the cure before the accolades, but a little over 2000 points for 2.5 days of folding is a bit underwhelming. Perhaps a system where people could send a validation through their log file to receive additional credit for a completed unit?
iceman1992
Posts: 523
Joined: Fri Mar 23, 2012 5:16 pm

Re: Project 7600 (19,0,53)

Post by iceman1992 »

I had one just like that, and it was also a 7600. Seems to be quite common.
FTBIG wrote:Perhaps a system where people could send a validation through their log file to receive additional credit for a completed unit?
That would be too complicated I think, and not very practical.
tjlane
Pande Group Member
Posts: 161
Joined: Wed Jun 01, 2011 11:19 pm
Location: Stanford, CA

Re: Project 7600 (19,0,53)

Post by tjlane »

Hi FTBIG,

Thanks for the report - I'm sorry that you got stuck with a very large WU. Unfortunately, in the course of molecular simulation, sometimes things go wrong and the system can become unstable (in <0.1% of cases). This usually causes an immediate exit, but for the A4 core, it appears that sometimes the core proceeds as if nothing has happened - just much more slowly than before. Unfortunately, we can't detect and stop those WUs once they're out, but can only see that something is wrong when they come back to Stanford. I have been (automatically) shutting them down when I do see them come back, so at least we limit them as much as possible.

Hopefully the next WU you get is normal! Let us know if otherwise.

TJ
FTBIG
Posts: 4
Joined: Tue Apr 24, 2012 2:42 am

Re: Project 7600 (19,0,53)

Post by FTBIG »

iceman1992 wrote:I had one just like that, and it was also a 7600. Seems to be quite common.
FTBIG wrote:Perhaps a system where people could send a validation through their log file to receive additional credit for a completed unit?
That would be too complicated I think, and not very practical.
Well, after reading up on some of the other 76XX series WU's it does seem to be a small minority of people. I don't think it would be too complicated given how rarely these WU's come up.
tjlane wrote:Hi FTBIG,

Thanks for the report - I'm sorry that you got stuck with a very large WU. Unfortunately, in the course of molecular simulation, sometimes things go wrong and the system can become unstable (in <0.1% of cases). This usually causes an immediate exit, but for the A4 core, it appears that sometimes the core proceeds as if nothing has happened - just much more slowly than before. Unfortunately, we can't detect and stop those WUs once they're out, but can only see that something is wrong when they come back to Stanford. I have been (automatically) shutting them down when I do see them come back, so at least we limit them as much as possible.

Hopefully the next WU you get is normal! Let us know if otherwise.

TJ
Thanks TJ. I'll know next time to just ditch the bad WU. The only reason why I kept on with it was to see if there would be a bonus and, of course, the spirit of folding. I did get a regular unit this go around, a 7809 with a TPF of 7 min 10 seconds, looking a bit more healthy there :)
iceman1992
Posts: 523
Joined: Fri Mar 23, 2012 5:16 pm

Re: Project 7600 (19,0,53)

Post by iceman1992 »

FTBIG wrote:Well, after reading up on some of the other 76XX series WU's it does seem to be a small minority of people. I don't think it would be too complicated given how rarely these WU's come up.
Ah yes but if you read your own sentence from another perspective, why should they give additional complexity to the system for rare WUs? :wink: Doesn't really seem worthwhile, since it's so rare, why bother? :D
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 7600 (19,0,53)

Post by bruce »

tjlane wrote:Unfortunately, in the course of molecular simulation, sometimes things go wrong and the system can become unstable (in <0.1% of cases). This usually causes an immediate exit, but for the A4 core, it appears that sometimes the core proceeds as if nothing has happened - just much more slowly than before.
What do we know about the "something" that can go wrong?

As with all mathematical simulations, the equations used contain approximations and have limitations. Though it might not be a real situation for Gromacs, perhaps an equation is only valid when each pair of atoms is farther apart than some very small distance. That condition is (almost) never violated, and when it is, it almost always creates an identifiable error. It would be very difficult to predict that sort of situation.

Do we know if all A4 semi-failures that continue to run are caused by the equations or the data or is it also possible that the same condition can be caused by a hardware error, including things such as overclocking or overheating? If that's also a possibility, FAH most certainly would not want to award any kind of bonus for conditions that are within the control of the computer owner.
FTBIG
Posts: 4
Joined: Tue Apr 24, 2012 2:42 am

Re: Project 7600 (19,0,53)

Post by FTBIG »

iceman1992 wrote:
FTBIG wrote:Well, after reading up on some of the other 76XX series WU's it does seem to be a small minority of people. I don't think it would be too complicated given how rarely these WU's come up.
Ah yes but if you read your own sentence from another perspective, why should they give additional complexity to the system for rare WUs? :wink: Doesn't really seem worthwhile, since it's so rare, why bother? :D
Or from another, that I folded for 2.5 days and only received 2162 points. What's done is done, and while I know in the future to dump the problematic WU, others might not feel as compassionate to the cause.
bruce wrote:Do we know if all A4 semi-failures that continue to run are caused by the equations or the data or is it also possible that the same condition can be caused by a hardware error, including things such as overclocking or overheating? If that's also a possibility, FAH most certainly would not want to award any kind of bonus for conditions that are within the control of the computer owner.
Judging from this WU, and from others who have run into flawed 7600s, the TPF has varied depending on the CPU and overclock. A stock i7 920 has seen 80+ minute TPFs, where other overclocked systems have been anywhere from 30-60. I find it hard to believe, unless the 7600 does not like HyperThreading, that there is a hardware issue with this WU. This is not to say that ALL A4 WUs have issues, nor do all of the 76XX series.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 7600 (19,0,53)

Post by bruce »

FTBIG wrote:I find it hard to believe, unless the 7600 does not like HyperThreading, that there is a hardware issue with this WU. This is not to say that ALL A4 WUs have issues, nor do all of the 76XX series.
Let me put the "FAH does not like HyperThreading" question in perspective.
As a general rule, Stanford's FahCores treat a pair of HyperThreaded processors as about 1.2 real processors. In other words, a dedicated i7 with 4 real cores/8 threads is worth about 20% more than an equivalent i5 without HyperThreading.

Those numbers are relatively old estimates from Core_78 (uniprocessor) and FahCore_a3 (smp) projects, but I'm pretty confident that they also apply to FahCore_a4 and FahCore_a5 projects without an active GPU project. (Adding a GPU complicates it, depending on whether it's an NV GPU or an AMD GPU and which model you have.)

The biggest thing that a SMP project like 78xx does not like is when it does not have dedicated access to the cores that you have given it when it starts processing. With even a moderate amount of resource conflict from some other tasks (if it's a long-term interruption) you'll find that -smp 6 can outperform -smp 8 even though Windows will tell you that 24% of your CPUs are idle . . . and in real terms, you're "wasting" as much as 10% of your system by not using those two extra virtual cores.
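As a rough illustration of the 1.2x rule of thumb above, here is a minimal Python sketch. The function name `effective_cores` and the way threads are assumed to spread across cores (one per physical core first, then doubling up) are assumptions made for this example only; nothing here is taken from the FahCore itself.

```python
def effective_cores(physical_cores, threads_in_use, ht_pair_factor=1.2):
    """Estimate throughput in units of 'real' cores.

    Assumptions (illustrative only):
    - each physical core offers two hardware threads;
    - a core running one thread counts as 1.0;
    - a core running two threads counts as ht_pair_factor (~1.2);
    - threads are spread one per core before any core is doubled up.
    """
    if not 0 <= threads_in_use <= 2 * physical_cores:
        raise ValueError("threads_in_use must fit on the hardware threads")
    doubled = max(0, threads_in_use - physical_cores)        # cores running two threads
    single = min(threads_in_use, physical_cores) - doubled   # cores running one thread
    return single * 1.0 + doubled * ht_pair_factor


i5_like = effective_cores(4, 4)   # 4 cores, no HyperThreading in use
i7_smp8 = effective_cores(4, 8)   # 4 cores, all 8 threads folding
i7_smp6 = effective_cores(4, 6)   # 4 cores, 6 threads folding

print("smp 8 vs. no HT:", i7_smp8 / i5_like)
print("fraction given up by smp 6:", 1 - i7_smp6 / i7_smp8)
```

Under these assumptions the 8-thread case comes out about 20% ahead of the 4-thread case, and dropping to 6 threads gives up a bit under 10% of the machine, which is in line with the figures quoted above. It says nothing about the resource-conflict case, where -smp 6 can still win in practice.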
iceman1992
Posts: 523
Joined: Fri Mar 23, 2012 5:16 pm

Re: Project 7600 (19,0,53)

Post by iceman1992 »

FTBIG wrote:Or from another, that I folded for 2.5 days and only received 2162 points. What's done is done, and while I know in the future to dump the problematic WU, others might not feel as compassionate to the cause.
I understand your feeling, I myself folded for around 50 hours and received about 1500 or so points. Yes we both fold for the cause, others may fold only for the points, I know. They base the bonus points in reference to a benchmark system I think, so how would they know whether you folded for 2.5 days because of a troublesome WU or of slow hardware?
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 7600 (19,0,53)

Post by bruce »

Excluding the QRB, if the benchmark hardware takes 5 days, the WU gets 5x as many points as if the benchmark system takes 1 day. If your system is twice as fast as the benchmark system, it can finish them in 2.5 days or 12 hours and it will earn twice the PPD.

If your system isn't exactly twice as fast ... say it has a slightly faster CPU but slower RAM or smaller cache ... some WUs will be a bit faster than 2x as fast and others will be a little slower than 2x as fast.

They have no idea whether you folded in 2.5 days because you have slow hardware or because it was a troublesome WU or because you turned off the machine for a while or your home had a power failure and the machine wasn't folding the whole time. It's a bit like the classic "the dog ate my homework" story kids try to use on their teacher. You get credit for what you turn in and whether it was late or not, not for how easy or hard you actually worked.
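The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the proportionality being described: the QRB is left out, as in the explanation, and `points_per_benchmark_day` is a made-up rate, not a real project setting.

```python
def base_points(benchmark_days, points_per_benchmark_day):
    """Base credit grows linearly with the time the benchmark machine needs."""
    return benchmark_days * points_per_benchmark_day


def ppd(points, days_taken):
    """Points per day earned on the donor's machine."""
    return points / days_taken


RATE = 100  # hypothetical points per benchmark day, for illustration only

long_wu = base_points(5, RATE)    # benchmark machine needs 5 days
short_wu = base_points(1, RATE)   # benchmark machine needs 1 day

# A machine twice as fast as the benchmark finishes in half the time,
# so both WUs yield the same PPD: twice the benchmark machine's rate.
print(ppd(long_wu, 2.5), ppd(short_wu, 0.5))

# The WU that started this thread: 2162 points over roughly 2.5 days.
print(ppd(2162, 2.5))
```

Because credit is fixed per WU, any extra time spent on a unit, whatever the cause, only shows up as a lower PPD; with the numbers from the first post that works out to under 900 points per day.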
Brazos
Posts: 27
Joined: Tue Feb 24, 2009 2:02 pm

Re: Project 7600 (19,0,53)

Post by Brazos »

I'm running project 7610 (105,0,85) and the TPF is 1 hr 16 min. ETA is now 3.56 days and I'm at 33% complete. I guess I got a bad WU? Normally these SMP ones go pretty fast.
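For anyone wanting to sanity-check an ETA like this by hand, here is a small Python sketch. It assumes the WU is split into 100 frames (one per 1% of progress) and that the TPF stays constant; the function name and both assumptions are illustrative and are not taken from the client.

```python
def eta_days(tpf_minutes, percent_done, frames_per_wu=100):
    """Remaining wall-clock time in days, assuming a constant TPF.

    frames_per_wu=100 is an assumption: one frame per 1% of progress.
    """
    if not 0 <= percent_done <= 100:
        raise ValueError("percent_done must be between 0 and 100")
    frames_left = frames_per_wu * (100 - percent_done) / 100
    return frames_left * tpf_minutes / (60 * 24)


# TPF of 1 hr 16 min = 76 minutes, 33% complete
print(round(eta_days(76, 33), 2))
```

With a TPF of 76 minutes and 33% complete this gives roughly 3.5 days remaining, close to the 3.56-day ETA reported above; the small gap is expected, since the TPF is an average and is not perfectly constant.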
Simba123
Posts: 47
Joined: Mon Aug 01, 2011 12:46 pm

Re: Project 7600 (19,0,53)

Post by Simba123 »

I have just suffered through two of these units p7611 (4,38,128) and 7610 (142,0,96)
my rig is an i7-2600K @4.5 GHz with 8 GB of 1600 MHz RAM, Win7, running the V7 client (latest release)
both were ok tpf wise (around 5 mins) but the ppd!! down at 12-14k. very very bad!
I don't mind running these, but the ppd needs to be fixed.