future core 17 WU?

NookieBandit · Post by **NookieBandit** » Tue Apr 01, 2014 11:26 pm

Joe_H, we are in rabid agreement on the time it would take to implement changes in the assignment server allocation logic to properly assign WUs to those GPUs able to run them most efficiently. I've been folding since 2009 and recognize the time and effort needed to get changes like this implemented with a small development team. I certainly respect, and more importantly understand, the challenges FAH has with a small development staff. Over the years, I've run both large development teams (over 150 engineers) and small teams and know the issues inherent in both, but one thing they have in common is the need for efficient allocation of resources to address critical product requirements.

With that backdrop, I want to address your contention that there is an opportunity cost of having a GPU not fold. In the simplest terms, future WUs are issued based upon the results delivered from prior WU simulation runs. The faster those foundational WUs can be processed and delivered back to Stanford, the faster the next batch of WUs can be issued, and so on. Assuming the FAH software engineering team spent resources on developing assignment server logic that forced specific WUs to be allocated to the most efficient hardware able to run them, then I contend the overall efficiency of FAH would increase dramatically. That assumption should be fairly easy to test: compare the average TPF across all returned Core_17 WUs with the average for a 780/780 Ti/Titan or HD 7970/R9 290X GPU; the difference in TPF is a measure of the inefficiency of the current WU allocation logic. That inefficiency can be reduced either by intelligently allocating WUs to hardware able to run them efficiently or, absent improved allocation logic, by simply removing low-end GPUs that can't efficiently run Core_17 WUs from the FAH network. When WUs are allocated to hardware not optimized for that WU (think a GT 430 running Core_17), delays are incurred in processing subsequent generations of WUs. Those delays negatively impact the ability to immediately engage high-end GPUs optimized for that next batch of WUs, resulting in a massive opportunity cost of having a low-end GPU folding a complex WU. Therefore, removing the bottleneck of a low-end GPU grinding its way through a Core_17 WU will speed up the entire FAH system for Core_17 projects. The opportunity cost is masked by having 8018 WUs allocated to high-end Kepler GPUs, but it still exists nonetheless.
Having a large number of low-performing GPUs working on complex Core_17 WUs in a given generation, and taking substantially longer to deliver results than they would under optimized allocation, reduces the value of the science, both by definition and as represented by the QRB.
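The TPF comparison proposed above could be sketched roughly as follows. This is only an illustration: the function name `tpf_gap` and all sample values are made up for the example, and it assumes TPF samples (in seconds) could be grouped by GPU class, which the thread does not confirm is available from the server logs.

```python
from statistics import mean

def tpf_gap(all_returned_tpf, high_end_tpf):
    """Compare the average TPF of every returned WU with the high-end average.

    Both arguments are lists of time-per-frame samples in seconds.
    Returns (fleet_average, high_end_average, ratio), where a ratio above 1.0
    means the fleet as a whole is slower per frame than the high-end cards.
    """
    fleet_avg = mean(all_returned_tpf)
    high_end_avg = mean(high_end_tpf)
    return fleet_avg, high_end_avg, fleet_avg / high_end_avg

# Hypothetical samples, NOT real FAH data.
high_end = [90.0, 100.0, 110.0]   # e.g. 780-class cards
low_end = [600.0, 900.0]          # e.g. low-end cards

fleet_avg, high_end_avg, ratio = tpf_gap(high_end + low_end, high_end)
print(f"fleet avg {fleet_avg:.0f}s, high-end avg {high_end_avg:.0f}s, ratio {ratio:.1f}x")
```

A ratio like this only measures how much slower the average returned WU is per frame; it says nothing about whether a faster card would actually have been available at the moment of assignment.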

It's just my opinion, but allocating development resources to optimize WU distribution to given hardware platforms would be an enormously valuable investment. The effective throughput of the FAH network would increase dramatically, the scientists would get their results faster, and donors would be far more satisfied.

7im · Post by **7im** » Wed Apr 02, 2014 1:04 am

Running the WU on the most efficient hardware is not the most efficient way to get work done. There is no optimal time to wait for a more efficient machine if that machine gets turned off or upgraded tomorrow, or the next day. Waiting days for a more optimal machine is a waste, when a slower machine can still do the work in less than a day. Or a less than optimal machine like yours can do it even faster.

And designing that optimal prediction AI would take a whole new DC project, it's so complex. You can never predict network outages, buyer preferences for upgrading, vacations, seasonality, server outages, projects ending, starting, etc. well enough.

PG determines the order the work goes out to best move the science of the project forward. Those assignments are not always optimal for you, but they are optimal for FAH at the moment you get that work unit on that hardware.

You are saying the same thing as, "...in a perfect world" and that just doesn't happen. In a perfect world, we would get perfect WUs. Those don't exist, neither does the perfect world. Would you rather the GPU sit idle instead of fold core_15 WUs when there is a shortage of core_17 WUs?

When Core_17 WUs are temporarily unavailable, you're comfortable with the cost of not folding at all but reducing that cost by folding with Core_15 is unacceptable? How is that logical?

Post by **bruce** » Wed Apr 02, 2014 1:39 am

Actually, a great deal of what NookieBandit is asking for is already functional. (I have no doubt it could be improved, but not without a significant cost, as has already been said.) 7im has explained a lot of it already but let me approach it from a slightly different perspective.

The Assignment Server does know the difference between the GT 430 (etc.) and the 780 (etc.). NOTE: The GT 430 cannot run the OpenCL-based Core_17 so it is forced to run the CUDA-based Core_15. Core_15 projects were designed before the 780 was invented, and those projects are based on the expected rate of progress of lower-end GPUs (or better), so when an occasional WU is processed by your 780, it's always good for that project.

Note that the priority of Core_17 projects is very, very much higher than the priority of Core_15 projects. Thus the 780 (etc.) will be assigned Core_17 whenever possible, and those projects will never be slowed down by being assigned to low-end GPUs like the 430.

All servers run into difficulties from time to time and the AS attempts to make the best of those situations. Recently there have been situations where no Core_17 WUs have been available to be distributed. A couple of weeks ago, they were in very short supply and it took some time while more WUs were being generated (e.g., by creating 13001). Servers do go off-line and currently there are some serious hardware failures which are being worked on. Other things can happen such as projects finishing, but that's not important to this discussion. For whatever reason, the AS may be unable to give you a Core_17 assignment, so it's faced with a choice between giving you a lower-priority WU, which happens to be Core_15, and telling your 780 that no work is available.

That is a challenging choice for the AS to make: helping speed up a Core_15 project or doing nothing for any project by refusing to make any assignment. In fact, the scientists who run Core_15 projects are happy when your 780 can be given a Core_15 assignment because it speeds up their science at times when there are no more advantageous assignments. Core_17 projects are already being slowed down because their WUs are unavailable. Under conditions like this, it's certain that donors will gripe more, of course, but that's only because they falsely assume that there are always desirable assignments to be given. If they could see the same choices that the AS has to make based on an instantaneous snapshot of dynamic conditions, they MIGHT be more understanding.
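The choice described above (highest-priority work the GPU can run, falling back to lower-priority work rather than idling) can be sketched like this. It is a toy model, not the actual Assignment Server logic: the `Project` fields, the priority numbers, and the project names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

@dataclass
class Project:
    name: str
    core: str            # e.g. "Core_15" or "Core_17"
    priority: int        # higher means preferred (hypothetical scale)
    wus_available: int   # WUs currently ready to hand out

def pick_assignment(projects: Iterable[Project],
                    supported_cores: Set[str]) -> Optional[Project]:
    """Return the highest-priority project the GPU can run that has work,
    or None when nothing suitable is available (the GPU would sit idle)."""
    candidates = [p for p in projects
                  if p.core in supported_cores and p.wus_available > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.priority)

# Hypothetical snapshot: the Core_17 project has temporarily run dry.
snapshot = [
    Project("example_core17", "Core_17", priority=10, wus_available=0),
    Project("example_core15", "Core_15", priority=5, wus_available=40),
]
high_end_gpu = {"Core_15", "Core_17"}

choice = pick_assignment(snapshot, high_end_gpu)
print(choice.name if choice else "no work available")
```

With Core_17 work available the same call would return the Core_17 project; with it exhausted, the card gets Core_15 work instead of nothing, which is the trade-off being described.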

sco01 · Post by **sco01** » Fri Apr 04, 2014 5:34 pm

bruce,
seeing the shortages of Core_17 WUs (with no new projects) and remembering the news that there will be no more new Core_15 jobs... I.e., once we complete everything that is available, folding on the GPU stops. Is that what will happen?

Post by **bruce** » Fri Apr 04, 2014 6:31 pm

The current goals include replacing Core_15 and Core_16 with newer code. Follow-on projects will be started using new cores.

The future of Core_17 WUs is bright, in spite of a few hiccups along the way. The number of projects for Core_17 is increasing but it takes time to develop new projects and to phase out older ones. For the most part, there have been plenty of WUs. (Stanford's goal is for that to always be true!)

Science is never static. If some follow-on to the current selection of Cores can be developed that will increase production, it will be (although nothing has been announced except NaCl).

Flaschie · Post by **Flaschie** » Fri Apr 04, 2014 9:52 pm

bruce wrote:Note that the priority of Core_17 projects is very, very much higher than the priority of Core_15 projects.

Could you please explain this one a bit more for me? As far as I can see, servers 171.64.65.105 (P76XX) and 171.67.108.142 (P8018), which are core15 servers, both have 10 000 as "Weight" in the server stats, which is actually more than the only currently running core17 server, 140.163.4.231 (P13000/P13001), which has "only" 5 000/9 000 (advanced/full). Hence, I read this as core17 being less prioritized than core15. When I look at other servers which host EOL cores, say 171.67.108.11 (P57XX @ GPU2), they have a weight of 1.

Anything I'm missing here, or misunderstanding?

Post by **bruce** » Fri Apr 04, 2014 10:14 pm

Hmm. I hadn't noticed those exceptions. (I'm the one that was missing something.)

I'm not sure about p7627. It's reportedly a follow-on to p7625 so there may be some justification for extending the life of FahCore_15. Decisions like that are way above my pay-grade. The announcement says it's restricted to Fermi. My NV GPUs are all on Linux and my ATI GPUs are on Windows, so I have not seen that project yet.

P8018 seems to be in a similar situation. The last time I received one of them was in September. The only answer I know was given by Dr. Pande:
viewtopic.php?f=74&t=26067&p=261458#p261458

Have you received other projects? ... and on what GPU?

Flaschie · Post by **Flaschie** » Sat Apr 05, 2014 8:48 am

I only have AMD GPUs, so I only receive P13000/P13001 for the time being. I just know that a lot of people running Nvidia cards (like 780Ti/Titan) are more or less constantly getting core15/P8018, and not core17 unless P9401 was up and running.

But then it seems at least that P8018 is still a highly valued project.

bcavnaugh · Post by **bcavnaugh** » Sat Apr 05, 2014 3:31 pm

What are the Project Server addresses for all the Core 17 Projects?

Project Server for 13000 and 13001
http://140.163.4.231/projects

Also from: http://fah-web.stanford.edu/cgi-bin/fah ... llbebanned

Folding@home project descriptions
Project 9401
Error: A description for this project does not yet exist.

Is this due to the server being down, or was it never added to the project descriptions?

Post by **Joe_H** » Sat Apr 05, 2014 3:56 pm

Project 9401 was served from 171.67.108.31 until it went down - viewtopic.php?f=18&t=26065&p=261981#p261971. As for other servers, look for projects using the Zeta core on the Project Summary page. That will show which ones are active and what server they come from.

bcavnaugh · Post by **bcavnaugh** » Sat Apr 05, 2014 4:14 pm

Joe_H wrote:Project 9401 was served from 171.67.108.31 until it went down - viewtopic.php?f=18&t=26065&p=261981#p261971. As for other servers, look for projects using the Zeta core on the Project Summary page. That will show which ones are active and what server they come from.

Thanks, I found the link at http://fah-web.stanford.edu/pybeta/serverstat.html

Calcii · Post by **Calcii** » Sun Apr 06, 2014 2:57 am

I hate P8018. Give that project a bonus on the new cards, or don't assign it to them anymore, only Core_17.

P8018 wasted my video card's time.

Kurtis200200 · Post by **Kurtis200200** » Sun Apr 06, 2014 11:42 am

Calcii wrote:I hate P8018. Give that project a bonus on the new cards, or don't assign it to them anymore, only Core_17.
P8018 wasted my video card's time.

Keep in mind, the data gained from any project has the potential to lead to a major scientific breakthrough.

Mactin · Post by **Mactin** » Sun Apr 06, 2014 1:17 pm

Calcii wrote:I hate P8018. Give that project a bonus on the new cards, or don't assign it to them anymore, only Core_17.
P8018 wasted my video card's time.

Every WU has to be folded.
The fact it has no "bonus" does not mean it has less scientific value.
Points are just points; they have no real value. What is all this fuss about?

Calcii · Post by **Calcii** » Sun Apr 06, 2014 2:46 pm

For me, points are important; otherwise, remove points altogether. Competition is in people's nature.
