Scheduling single work units on a cluster
Posted: Mon Feb 07, 2011 8:39 am
by mangler
Hi,
I know this is a bit of a tough question, and probably deserves a wiki page or something other than a post, but I'll start here.
OS: Linux
Version of FAH: Latest
I have been asked to look into the potential of setting up a Folding@home cluster, and I have an existing grid management tool that I need to integrate it with.
What is the current length of time needed to process a work unit on, say, 2 cores of a Xeon X5500-series processor? I need to keep the number of cores in use per system down, so I may need to run multiple independent copies of the client to allow work sharing with other cluster jobs.
Is there anything I should be aware of when setting up a large supercomputer-type job?
What scheduling systems have you used, and what worked and what didn't?
Thanks in advance!
Re: Scheduling single work units on a cluster
Posted: Mon Feb 07, 2011 5:48 pm
by bruce
Welcome to foldingforum.org, mangler.
Yes, it's a tough question. FAH is not designed to run on clusters but inasmuch as each node can be used independently, each node can be considered an independent computer as long as there's some scheduling plan. Stanford does not provide support for scheduling scripts but you might find other forum members who have some experience in that sort of thing, so it's an excellent question.
I suggest you start by reading the other topics mentioning the word "cluster", but I don't remember whether there is any specific help with that question.
Does your Linux have both 64- and 32-bit libraries?
The duration of a job is a very general question, though. Fundamentally, there are several classifications of WUs, and most run on Linux. The basic question is how many CPUs/threads are on a typical node. Nodes with a uniprocessor are treated quite differently from nodes with SMP-capable CPUs. FAH makes excellent use of an i7/Xeon that can run 8 or more threads locally, but it does not support inter-node communication. Nodes with dual or quad processors are in an intermediate class. If your cluster is non-uniform, it can get pretty tricky.
Re: Scheduling single work units on a cluster
Posted: Mon Feb 07, 2011 6:28 pm
by mangler
Hi Bruce:
Yes, 64- and 32-bit libraries are available.
Unfortunately, due to the other jobs on the cluster, we schedule each core independently if at all possible to keep our task churn rate as high as possible. I am currently using the -smp 2 flag to allow this.
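As a rough sketch of that kind of launch (assuming the v6 Linux console client binary is named fah6 and that each copy runs from its own working directory; the paths here are made up):

  # Sketch: two independent 2-core client instances on one node.
  # Assumes the console binary is named fah6 and each copy has its
  # own working directory (own config and queue).
  ( cd /opt/fah/instance1 && ./fah6 -smp 2 ) &
  ( cd /opt/fah/instance2 && ./fah6 -smp 2 ) &
  wait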
Currently I am using the bigmem work units, but if the smaller ones finish sooner, I would prefer shorter, quicker job sets: if I have to stop folding on a node for some reason, the job may not get back to that node for several weeks, resulting in wasted scientific effort.
Thanks for any advice.
Re: Scheduling single work units on a cluster
Posted: Mon Feb 07, 2011 7:05 pm
by bruce
"Big packets" (in the client configuration) is different from -bigadv in the parameter list, and I don't know which one you mean by bigmem. Specifying -smp 2 is reasonable if all your nodes are dual processors, but it can be a problem if some are hyperthreaded single-core/dual-thread chips.
Most of the deadlines for -smp are around 4 to 6 days, and anything that exceeds the deadline will be discarded by the server. The actual processing speed depends mostly on the GFLOPS capability allocated to the FahCore. If the node is a quad, for example, limiting it to -smp 2 will use only half of its resources.
The use of the -oneunit flag will end the client whenever a WU is finished rather than downloading a new assignment.
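As a sketch, a per-job wrapper for a grid scheduler could combine those two flags like this (the fah6 binary name and the /opt/fah path are assumptions, not anything official):

  #!/bin/sh
  # Hypothetical grid-job wrapper: process exactly one WU, then exit
  # so the scheduler can reclaim the slot. -smp 2 and -oneunit are the
  # client flags discussed above; binary name and path are assumed.
  cd /opt/fah/instance1 || exit 1
  ./fah6 -smp 2 -oneunit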
Re: Scheduling single work units on a cluster
Posted: Mon Feb 07, 2011 8:49 pm
by P5-133XL
You may find that running the uniprocessor client is a better fit. Those WUs tend to be much smaller than the SMP WUs, allowing for a higher task churn rate. Also, the deadlines tend to be on the order of a month or more, so if you can't get back to a specific WU for a significant time, it is much less likely to have passed the deadline when you do get to it. You can also run multiple uniprocessor clients to fill up a node. The drawbacks are that you are unlikely to make as much PPD as with the SMP WUs, and there is a limit of 16 unique MachineIDs per machine. I'm just saying that it may be worth considering.
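A rough sketch of filling a 4-core node that way (directory names and the fah6 binary name are assumptions; each copy needs its own working directory and its own MachineID):

  # Sketch: four independent uniprocessor clients on a 4-core node,
  # one per working directory so configs, queues and MachineIDs don't
  # collide. Assumes the uniprocessor console client is installed as
  # ./fah6 in each directory.
  for i in 1 2 3 4; do
    ( cd /opt/fah/uni$i && ./fah6 ) &
  done
  wait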
Re: Scheduling single work units on a cluster
Posted: Mon Feb 07, 2011 11:17 pm
by 7im
I agree with P5; the CPU client is probably better for non-dedicated nodes. And I doubt the points are the biggest concern.
P5-133XL wrote:...and there is a limit of 16 unique machineID's per machine...
No limit on Linux clients.

Re: Scheduling single work units on a cluster
Posted: Tue Feb 08, 2011 8:11 am
by bruce
You might want to read this topic on the same subject (including a comment from VijayPande):
viewtopic.php?f=55&t=17373