Jesse_V wrote:
bruce wrote: In the early days of SMP, a quad processor could reduce the elapsed time from N days to ~0.23*N
I'm pretty sure Bruce said 0.23 for a reason. At first I too was a bit confused, but I thought about it a bit more and realized that it might have something to do with overhead and the architecture. In fact, I'll bet that if you ran 4 uniprocessors you wouldn't be exactly 4x as productive. The 0.23 is reasonable considering all the data transfers and core synchronizations that Bruce discussed, and those can slow down production. In that sense, it's difficult to get the strong scaling needed to get to 0.25*N. With a quad-core and considering Bruce's discussion, I'm not sure how you could get over 0.25.

He said the elapsed time was N days. Using my example, let's say 4 days if you run 4 instances of the single-threaded application, which would be the common thing to do without SMP.
In the unlikely case that SMP adds no extra overhead, meaning nothing to synchronize between threads and so on, a quad will take 1/4 as long, meaning 1 day. If you've got some extra SMP overhead, you'll take more than 1 day, meaning 4 days * 0.3 (using my example) = 1.2 days.
A number like 0.23, on the other hand, indicates the SMP client uses only 0.92 days per WU, and I find it unlikely that with SMP you'll manage to finish 4 WUs in 3.68 days if the quad-core takes 4 days to run the same 4 WUs with a single-threaded application...
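To put numbers on this, here's a minimal Python sketch of the same arithmetic. The 4-day WU and the 0.3 overhead factor are just my example numbers, and the script is only an illustration, not anything from the client:

Code:
# Throughput comparison: 4 single-threaded clients vs. one 4-thread SMP client.
# N_DAYS and the scaling factors are illustrative assumptions, not measured values.

N_DAYS = 4.0  # days for one WU on a single core (my example)

def smp_days_per_wu(scaling_factor):
    """Elapsed days for one WU when the SMP client finishes in N * scaling_factor."""
    return N_DAYS * scaling_factor

# 4 uniprocessor clients in parallel: 4 WUs finish every N = 4 days.
# The SMP client runs WUs one after another, so 4 WUs take 4 * (per-WU time).
for factor in (0.25, 0.30, 0.23):
    per_wu = smp_days_per_wu(factor)
    total = 4 * per_wu
    speedup = N_DAYS / per_wu
    print(f"factor {factor:.2f}: {per_wu:.2f} days/WU, "
          f"4 WUs in {total:.2f} days, speedup {speedup:.2f}x")

# factor 0.25 -> 1.00 days/WU, 4 WUs in 4.00 days (perfect 4.00x scaling)
# factor 0.30 -> 1.20 days/WU, 4 WUs in 4.80 days (realistic overhead, 3.33x)
# factor 0.23 -> 0.92 days/WU, 4 WUs in 3.68 days (4.35x, better than perfect)

The last line is exactly why 0.23 looks suspicious: it implies the SMP client beats perfect linear scaling on 4 cores.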
Jesse_V wrote:
Hmm. Well, each DC project has to balance the development and support of an SMP client against its benefits. In F@h's case, where there are serial calculations involved, getting things done quickly is an advantage. As Bruce mentioned, things like SETI@home don't need this, and 4 uniprocessors are probably fine. And by "performance hit" I'm assuming you mean to the DC project, which is true. I don't know about you, but when I browse the web I use like 5% of my quad-core's power, which means that I can watch FahCore_a4.exe drop down to 95% usage. Even when I run some CPU-intensive program, F@h drops down to only 75%. The point of these run-in-the-background clients is that people don't use that much CPU all the time, and even when they do there's usually plenty left over.

FAH does drop down to 75% according to Task Manager, yes. But if you look at the FAH log, you'll see that the "5 minutes between 1%" has suddenly increased to "20 minutes between 1%" or even worse. Losing one core should mean the 5 minutes increases to 6.67 minutes, while losing 2 cores should mean 10 minutes. An increase from 5 minutes to 20 minutes on a quad-core means you're effectively only using a single core for crunching, but you're using electricity for all of them (see the sketch after the log below).
An example of this is below, not from a quad but from an i7-920, meaning quad+HT. I don't remember exactly what I was doing at the time, but it was likely something a little more demanding than web browsing...
Code:
22:29:41:Unit 01:Project: 7905 (Run 13, Clone 47, Gen 0)
22:29:41:Unit 01:
22:29:41:Unit 01:Entering M.D.
22:29:47:Unit 01:Using Gromacs checkpoints
22:29:47:Unit 01:Mapping NT from 8 to 8
22:29:48:Unit 01:Resuming from checkpoint
22:29:48:Unit 01:Verified 01/wudata_01.log
22:29:48:Unit 01:Verified 01/wudata_01.trr
22:29:48:Unit 01:Verified 01/wudata_01.edr
22:29:48:Unit 01:Completed 133400 out of 500000 steps (26%)
22:31:22:Unit 01:Completed 135000 out of 500000 steps (27%)
22:36:11:Unit 01:Completed 140000 out of 500000 steps (28%)
22:41:00:Unit 01:Completed 145000 out of 500000 steps (29%)
22:46:41:Unit 01:Completed 150000 out of 500000 steps (30%)
22:52:23:Unit 01:Completed 155000 out of 500000 steps (31%)
23:11:22:Unit 01:Completed 160000 out of 500000 steps (32%)
23:28:53:Unit 01:Completed 165000 out of 500000 steps (33%)
23:35:14:Unit 01:Completed 170000 out of 500000 steps (34%)
23:41:39:Unit 01:Completed 175000 out of 500000 steps (35%)
23:46:30:Unit 01:Completed 180000 out of 500000 steps (36%)
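For anyone who'd rather compute this than eyeball the timestamps, below is a rough Python sketch that parses "Completed ... steps (X%)" lines like those above and estimates minutes per 1% plus the implied number of effective cores. The 5-minute baseline, the core count, and the log filename are assumptions for illustration, not part of the client:

Code:
import re
from datetime import datetime, timedelta

# Minimal sketch, assuming log lines formatted like the excerpt above:
# "HH:MM:SS:Unit 01:Completed 135000 out of 500000 steps (27%)"
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):Unit \d+:Completed \d+ out of \d+ steps \((\d+)%\)")

BASELINE_MIN = 5.0   # normal minutes per 1% on this machine (from the post)
CORES = 4            # physical cores on the i7-920

def frame_times(log_lines):
    """Print minutes per 1% frame and the implied effective core count."""
    prev = None
    for line in log_lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        pct = int(m.group(2))
        if prev is not None:
            dt = t - prev
            if dt < timedelta(0):       # timestamps rolled past midnight
                dt += timedelta(days=1)
            minutes = dt.total_seconds() / 60.0
            # Effective cores = baseline minutes * cores / observed minutes
            eff = BASELINE_MIN * CORES / minutes
            print(f"{pct}%: {minutes:.1f} min (~{eff:.1f} effective cores)")
        prev = t

with open("log.txt") as f:   # path is just an example
    frame_times(f)

Run against the excerpt above, the 22:52 -> 23:11 frame comes out at ~19 minutes, which is roughly 1 effective core out of 4, matching the point about electricity being spent on cores that aren't crunching.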