Selective Folding (branched from Nvidia issues forum)

LuarAzul · Post by **LuarAzul** » Thu Jun 04, 2009 7:27 am

I'm sorry to intrude on this discussion, but sometimes we forget the obvious:

1. Heating is a hardware problem. CPUs, GPUs, and other chipsets are designed throughout the industry to run at maximum speed; if Nvidia's GPUs can't, then it's a hardware problem.

2. Donors should have the obvious right to choose which WUs they are folding. Especially when these problems emerge. This right to choose is good for everyone, and indeed for the science to go on as swiftly as possible. If Stanford finds out that some projects are getting behind, they can simply increase their point value, so as to achieve the right balance. Surely this must be easy to implement!

It is really simple, guys. The further question of «is there a bug in the client, drivers or WU?» is an interesting and important one, and it will certainly be addressed in due time. But right now the important thing is to give donors the ability to choose their WUs, and it would also be nice if people here accepted the general premise that CPUs and GPUs are intended to work correctly no matter what kind of software you throw at them. We are really entering a dangerous path for the industry if we allow software developers to be blamed for badly designed hardware.

Good luck to you all!

X1900AIW · Post by **X1900AIW** » Thu Jun 04, 2009 8:07 am

LuarAzul wrote: 2. Donors should have the obvious right ...

There is no "obvious right". We are donors. If we were paying to take part in the project, a discussion about services in return could make sense. Does FAH have the right to use (y)our hardware?

You can switch to other (existing) projects that use CUDA; this year some projects will introduce CUDA applications. The total TFLOPS has dropped from a higher level, but FAH will (surely) always keep loyal members.

John Naylor · Post by **John Naylor** » Thu Jun 04, 2009 10:18 am

LuarAzul wrote: 2. Donors should have the obvious right to choose which WUs they are folding. Especially when these problems emerge. This right to choose is good for everyone, and indeed for the science to go on as swiftly as possible. If Stanford finds out that some projects are getting behind, they can simply increase their point value, so as to achieve the right balance. Surely this must be easy to implement!

I'm sorry to rain on your parade, but this feature never has been and never will be implemented, because the Pande Group knows which research needs doing quickest... Sometimes that research could be units that are slower and/or disliked by donors (e.g. the 511 point GPU units), and donors choosing not to fold such units would be slowing down the science that needs to be done. All that allowing donors to select units will do is slow down some of the science, and at the end of the day it all still needs to be done.

shdbcamping · Post by **shdbcamping** » Thu Jun 04, 2009 1:18 pm

Hi again all,
Again we are getting to the "I'm not having a problem with these work units" or "I'm fine with the temps" replies. This is not helping me or others with hot 9800GX2 cards. Also, I continued to buy GX2s (8 of them) because the projects ran cool, in the 70s C. Pande broke stride and went to flame-throwing projects. I can only Bin-3 with EVGA Precision. Two 511s or 430s on one card shoot over 102C. Other projects, without down-binning, run in the mid 70s C.

Let's get back to ideas for implementing user interface choices, in the WU or the CFG, to choose WU classes and let the user throttle them to the DONOR's comfort level regarding GPU core heat. That way I can make up the difference by letting the non-heat-affected WUs go racecourse mode. I have begun stopping some instances of the HOT core11 WUs and deleting until I get a non-burner WU. If Pande will not give me a way to ensure my system is used in a manner that I am comfortable with, I will send them back and draw again until I get one that accommodates. For me, optimum is 75C. I can do 75C massively shader OC'd, with mem and core stock, on all my GX2s. It is a software design issue with these overheated WU classes. All who have no problems could let the defaults run and max them if it worked. I'm not saying slow everyone down... Just looking for an opportunity to be able to run them where I can be comfortable. I hate having to shut down half of some GX2s because I want to run for the long haul and don't have the money to replace 8 of them if they get burnt up prematurely.

If what I keep hearing is true, that any contribution is equal to any other... then why can't I downsize my contributions on certain projects and yet leave the option for WUs that fly on my 3X GX2 systems to get the lead out? Currently, ALL WUs are crippled for the flamethrowers because there are no options to preserve my equipment. Please let's not turn this into a blame thing because someone is running a single 9800GTX with no heat problems. My post is about my 9800GX2 cards. Any other card is irrelevant.

I'm still trying to manage my system with max contribution. I am only trying to find a way to protect my investment and continue to "fold on" for a very long time. PS... I just had another 9800GX2 delivered to fill out the last slot in my last system. I find it a waste to close 4-5 clients because I am declined the opportunity to keep heat specs to my liking. That's more science not getting done than if I were able to throttle to my heat specs.

Can someone tell me what happens with Pande servers when they get a bunch of aborted flamethrowers back after they have been previously assigned? If this is what I need to do to contribute to science and protect my system in a way I wish to... I'm sorry, I just don't have another $10,000.

Edited for tone - susato

Edit: What does editing for tone mean? And please PM me so that I can talk and understand the problem. I am seriously trying to get a thread discussion going somewhere without all the sidetrack diversions. PM sent.

John Naylor · Post by **John Naylor** » Thu Jun 04, 2009 1:33 pm

shdbcamping wrote: Can someone tell me what happens with Pande servers when they get a bunch of aborted flamethrowers back after they have been previously assigned? If this is what I need to do to contribute to science and protect my system in a way I wish to... I'm sorry, I just don't have another $10,000.

The servers are unaware that the units have been aborted; when the preferred deadline expires the unit is simply re-added to the queue of work to be distributed, until that unit is completed or an as-yet unspecified number of people EUE it at the same point (the natural end of the run).

Insidious · Post by **Insidious** » Thu Jun 04, 2009 1:56 pm

I agree with shdbcamping that we, as a group, would not cripple science if we were allowed to selectively choose which core was going to run on our instances of F@H.

Frankly, if the Pande Group noticed a significant number of specific cores/units not getting selected for processing by their user base (as it seems to be assumed would happen), well... maybe they should consider a sea change in their thinking about the "equality" of WUs, and their present methods of balancing points, temperature or stability.

If ego were put aside for a moment, and the Pande Group scientists walked the talk of objective science, that would be an invaluable source of feedback to the students developing these projects, and the growth that could ensue from incorporating some consideration for your donor base could be very significant in the long run. I doubt anyone would dispute that if your donor base doubled... more science would get done.

My 'temporary' solution has been to remove the 2nd video card from each of my PCs, and run with a system that has not been artificially crippled by underclocking (I bought powerful video cards for a reason) or removing the side of the case (it's there for a reason).

-Sid

susato · Post by **susato** » Thu Jun 04, 2009 1:59 pm

If someone "dumps" a unit it is reassigned at the end of the preferred deadline. This means a 2 day delay (for the 5514-5519 and 5732-5739 projects) or a 4 day delay (for the 5773-5780 projects) before another donor has a chance to try the unit. Each dumped unit sets the science back by delaying that generation and all future generations of its (run,clone) by that interval. When you reckon that a fast GPU will finish 4-10 WU/day depending on size, each dumped unit sets the (run,clone) back by 10-15 generations. This is terrible for the science - if people chose to complete only the work units they "liked" then projects with a reputation (fair or not) for low points or problems wouldn't get done at all.
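The arithmetic behind that estimate can be sketched roughly as below. The delays and the 4-10 WU/day range are the figures quoted above; the function names, and the assumption that the whole throughput range applies to every project, are illustrative only and not anything the Pande Group has published.

```python
def generations_lost(delay_days: float, wu_per_day: float) -> float:
    """Generations of one (run, clone) that could have finished during the delay."""
    return delay_days * wu_per_day

# Preferred-deadline delays quoted in the post, in days.
DELAYS = {"5514-5519 and 5732-5739": 2, "5773-5780": 4}
# Throughput range quoted in the post for a fast GPU, in WU/day.
WU_PER_DAY = (4, 10)

def loss_range(delay_days: float) -> tuple:
    """Lower and upper bound on generations lost for one dumped unit."""
    low, high = WU_PER_DAY
    return generations_lost(delay_days, low), generations_lost(delay_days, high)

for projects, delay in DELAYS.items():
    low, high = loss_range(delay)
    print(f"projects {projects}: {low}-{high} generations behind")
```

Under these assumptions the 10-15 figure sits inside the band for a 2 day delay; the band for a 4 day delay is wider only because the sketch pairs the longest delay with the fastest throughput, which the post does not claim.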

Earlier in the project, in the old forum, there were massive, resentful discussions of work unit dumping, as some donors defended their right to cherry-pick high point value WUs and others weighed in against the practice. I'm glad those days are over - but if you look at the .sigs of folders who have been with the project a long time, many of them contain the phrase "folding whatever they send me" as a pledge to put science before points.

I recognize that your issue is hardware temperature, not PPD, but dumping work units does the same serious damage in either case - it sets back the project and is bad for the science.

Hoping this discussion of work unit dumping does not derail the thread - I look forward to more useful discussion on thermal issues with the 9800 GX2 cards that shdbcamping and many other folders are now using.

Insidious · Post by **Insidious** » Thu Jun 04, 2009 2:04 pm

I don't see the harm you do in dumping some work units, when that behavior is compared to not folding at all.

And yes, I do believe my idea would certainly be a departure from the attitudes that presently exist.

That was pretty clearly implied when I wrote:

...maybe they should consider a sea change in their thinking ...

-Sid

susato · Post by **susato** » Thu Jun 04, 2009 2:29 pm

Insidious wrote: I don't see the harm you do in dumping some work units, when that behavior is compared to not folding at all.

Actually, dumping work units is worse than not folding at all. Sounds like a paradox, but here's why:

The work unit dumped has to wait 2-4 days before anyone else starts it, thus delaying its project. If you just turned off your rig, that unit would be assigned to someone else and would complete on schedule, keeping the project on schedule.
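A toy comparison of the two cases, for anyone who wants to see the numbers. This is only a sketch: the function name and the 0.25 day fold time (a fast GPU doing 4 WU/day) are assumptions for illustration, and the real assignment servers are certainly more involved.

```python
def days_until_result(fold_days: float, preferred_deadline_days: float, dumped: bool) -> float:
    """Days from first assignment until some donor returns the unit.

    Hypothetical model: a dumped unit sits idle until its preferred
    deadline expires and only then goes to another donor; a unit that the
    switched-off rig never took goes to another donor right away.
    """
    idle_days = preferred_deadline_days if dumped else 0.0
    return idle_days + fold_days

FOLD_DAYS = 0.25  # assumed: a fast GPU finishing 4 WU/day

rig_off = days_until_result(FOLD_DAYS, 2, dumped=False)
unit_dumped = days_until_result(FOLD_DAYS, 2, dumped=True)
print(f"rig turned off: {rig_off} days, unit dumped: {unit_dumped} days")
```

In this model the whole difference between the two cases is the preferred deadline, which is why the dumped unit comes out behind no matter how fast the next donor's card is.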

Insidious · Post by **Insidious** » Thu Jun 04, 2009 3:07 pm

It wasn't me that brought it up. I'll take a pass on the flame threads.... thanks!

shdbcamping · Post by **shdbcamping** » Fri Jun 05, 2009 9:21 am

LuarAzul wrote: I'm sorry to intrude on this discussion, but sometimes we forget the obvious:

1. Heating is a hardware problem. CPUs, GPUs, and other chipsets are designed throughout the industry to run at maximum speed; if Nvidia's GPUs can't, then it's a hardware problem. **Then how or why are we overclocking them if they are designed to run at maximum speed?**

2. Donors should have the obvious right to choose which WUs they are folding. Especially when these problems emerge. This right to choose is good for everyone, and indeed for the science to go on as swiftly as possible. If Stanford finds out that some projects are getting behind, they can simply increase their point value, so as to achieve the right balance. Surely this must be easy to implement!
**+1**

It is really simple, guys. The further question of «is there a bug in the client, drivers or WU?» is an interesting and important one, and it will certainly be addressed in due time. But right now the important thing is to give donors the ability to choose their WUs, and it would also be nice if people here accepted the general premise that CPUs and GPUs are intended to work correctly no matter what kind of software you throw at them. We are really entering a dangerous path for the industry if we allow software developers to be blamed for badly designed hardware.
**Except for Fit, Form and Function of the hardware design. Both AMD and Nvidia have hardware designed specifically for this type of application. Video graphics cards are only CUDA-capable and not "Designed for CUDA", and therefore do not need the extra cost of the "Designed for" stuff.**

Good luck to you all!

Agreed for the most part. Added comments in bold.
Sean

gwildperson · Post by **gwildperson** » Fri Jun 05, 2009 2:45 pm

I think you're forgetting that Fit, Form and Function also applies to Software design. CUDA (and CAL) are "Designed for" GPUs, and therefore have a responsibility not to damage GPUs.

Could a mod put this part of the discussion back into the discussion on heat? It seems that shdbcamping wants to continue arguing about that here. It doesn't belong in a discussion of Selective Folding.

PM sent - thread split/move/merge under review - susato

Bill1024 · Post by **Bill1024** » Fri Jun 05, 2009 3:04 pm

shdbcamping wrote: Hi again all,
Let's get back to ideas for implementing user interface choices, in the WU or the CFG, to choose WU classes and let the user throttle them to the DONOR's comfort level regarding GPU core heat. That way I can make up the difference by letting the non-heat-affected WUs go racecourse mode. I have begun stopping some instances of the HOT core11 WUs and deleting until I get a non-burner WU. If Pande will not give me a way to ensure my system is used in a manner that I am comfortable with, I will send them back and draw again until I get one that accommodates. For me, optimum is 75C. I can do 75C massively shader OC'd, with mem and core stock, on all my GX2s. It is a software design issue with these overheated WU classes. All who have no problems could let the defaults run and max them if it worked. I'm not saying slow everyone down... Just looking for an opportunity to be able to run them where I can be comfortable. I hate having to shut down half of some GX2s because I want to run for the long haul and don't have the money to replace 8 of them if they get burnt up prematurely.

Can someone tell me what happens with Pande servers when they get a bunch of aborted flamethrowers back after they have been previously assigned? If this is what I need to do to contribute to science and protect my system in a way I wish to... I'm sorry, I just don't have another $10,000.

Looks like cherry picking to me.
Is Stanford going to let that stand?
What action has been taken in the past? And is this going to be done now?
Cherry picking is not right no matter the reason.

v00d00 · Post by **v00d00** » Fri Jun 05, 2009 3:25 pm

I'm sure if the Pande Group sees a lot of this happening, the accounts for those donors will end up zeroed, as they have been in the past for other and similar cases. They have made a point of reminding people of this many times before, but I'm guessing an example will have to be made soon, once again.

Might be better to make it very public this time to hammer it in. People who do this are hurting the project, and that isn't acceptable.

Edited, pm sent. -susato

kiore · Post by **kiore** » Fri Jun 05, 2009 3:26 pm

Better you reduce your hardware than purposefully crash units, thus delaying the project and risking results.
Let the researchers decide the priorities; having priorities set by donors is problematic, not just on this project.
Yes I'd prefer to get only sweet ones too, but I'll trust the scientists to run the science on this.
kiore.

Folding Forum