Well, right, and the other interesting thing is that while points are 'fun', I don't really care about them. The points cliff at 2.43 days vs 2.4 days is not at all indicative of the value of the calculations to the project - obviously the scientific value of calcs finished in 2.39 vs 2.43 days differs by the simple percent difference, i.e., something like 1% less valuable because they come in around 1% later. I'm not a point fanatic at all; I just want to help the overall process along.
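Just to put a rough number on that percent-difference point (the day figures are the ones above; the "value scales with how quickly the result comes back" model is only my own framing, not any official formula):

```python
# Back-of-the-envelope: how much less valuable is a result that
# arrives in 2.43 days instead of 2.39 days, if value simply
# scales with how quickly the science comes back?
on_time, late = 2.39, 2.43  # days to complete the work unit

relative_delay = (late - on_time) / on_time
print(f"Result arrives {relative_delay:.1%} later")              # ~1.7% later
print(f"So roughly {relative_delay:.1%} less valuable - no cliff")
```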
That said, I do look at points to see what the group wants from me. So as soon as I stop getting 6903s I'll turn off bigadv because what I'm 'hearing' from the points system is that they don't want a single 2687W processor running 8101s.
As to your question, and the question for the poll, I would say yeah, if a server sees a machine miss a deadline X times (where X is maybe 3? Once isn't enough - maybe the person paused folding for some reason, or something else happened), it shouldn't resend a project of the same magnitude to that computer. But then I would also suggest that the Control app have a button that roughly means 'reset what you think of me', in case someone swaps out a processor or RAM or whatever and wants to get re-graded.
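A minimal sketch of what I mean on the server side (purely hypothetical logic, not how the actual assignment servers work; the names and the threshold are made up):

```python
# Hypothetical per-machine bookkeeping on the assignment server.
# After MAX_MISSES missed deadlines, stop handing out big work units
# until the donor explicitly asks to be re-graded.
MAX_MISSES = 3  # the "X" above - a guess, not a known value

class DonorRecord:
    def __init__(self):
        self.missed_deadlines = 0

    def report_result(self, made_deadline: bool):
        # Design choice: a made deadline could also clear the count.
        if not made_deadline:
            self.missed_deadlines += 1

    def eligible_for_bigadv(self) -> bool:
        return self.missed_deadlines < MAX_MISSES

    def reset(self):
        """The 'reset what you think of me' button after a hardware swap."""
        self.missed_deadlines = 0
```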
So, I do things along these lines for work, and really this is a simple situation. You want the right hardware for the right job. Supercomputer facilities typically have several supercomputers, each with different advantages and disadvantages that appeal to slightly different problems, and the researchers need to match problems to hardware as much as possible. Here the issue is how coarse that matching is. We have: 1) single-thread problems, 2) SMP, 3) bigadv (I'm ignoring beta to keep it simple). If you bin everything into 3 cases, then that is all the resolution you will get in terms of efficiency - modelling a sine wave with only 3 bins is too coarse. Maybe as time goes on there will be 10 gradations, just called 1, 2, 3, up to 10? How about 100? Maybe when someone starts folding there will be an app that runs for 3-5 minutes and decides what your number is? People at the borders of categories will always be annoyed if they look into the detail and discover they are at the border.

Forget Folding@home for a minute: just think about how the largest distributed computing system in the world should work, and then slowly plan a course to get there. I would think it should be able to make efficient use of every computer between some minimum config and the largest systems that could conceivably be used.
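To make the resolution point concrete, here is a toy sketch (entirely made up - the benchmark score, thresholds, and bin counts are placeholders) of how coarse 3-way binning compares to a finer 1-100 scale:

```python
def coarse_class(score: float) -> str:
    """Today's ~3-way split: uniprocessor / SMP / bigadv (toy thresholds)."""
    if score < 30:
        return "uniprocessor"
    elif score < 70:
        return "smp"
    return "bigadv"

def fine_grade(score: float, levels: int = 100) -> int:
    """A finer 1..levels grade from the same 0-100 benchmark score."""
    return max(1, min(levels, round(score / 100 * levels)))

# Two machines near a border look very different to the coarse scheme
# but are nearly identical on the fine grade.
for score in (69.0, 71.0):
    print(score, coarse_class(score), fine_grade(score))
```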
I think the specific question you mention above is no good - too simple - but in the short term we still have to live, eat, and fold. So perhaps we have to assume for now that points are aligned with scientific value? So, for example, my week of running 8101s at 0.03 days past the deadline is of extremely low value - almost totally worthless. So, short term, yes, you should have the server block these from people who don't make the deadline.
However, that isn't interesting to me. Long term we have to have a more fluid system. Frankly, my 3930K 6-core system at 4.5 GHz beats the 2687W at 8 cores, but the 6-core can't even try an 8101 (from what testing I've done comparing the processors, it would come in under the deadline), and *all* of this core-counting stuff and the other measures are very basic, inexact ways of classifying a system, as has been discussed to death on these forums.
I think the longer-term solution is a simple 5-minute GROMACS run and a grade from 1-10 or 1-100 or something. Then, as the science guys submit jobs, they choose the minimum number required to run that job at a speed that gets everything done reasonably. Also, you might internally say, ooh, this particular thing is super-important (new bird flu or something ravaging the world?), so let's run it only on machines rated 95+. You would be able to really throttle priority up or down within the work queues you have - lots of control there over what is going on.
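Something like this, roughly (grading against a reference machine and a per-job minimum grade are just my own sketch of the idea, not anything the project actually does; the numbers are invented):

```python
# Toy version of "run a short benchmark, get a 1-100 grade,
# and only hand a job to machines at or above its minimum grade".
REFERENCE_NS_PER_DAY = 50.0   # made-up throughput of a top-end reference box

def grade_machine(benchmark_ns_per_day: float) -> int:
    """Scale a short benchmark result to a 1-100 grade."""
    grade = round(100 * benchmark_ns_per_day / REFERENCE_NS_PER_DAY)
    return max(1, min(100, grade))

def can_assign(job_min_grade: int, machine_grade: int) -> bool:
    return machine_grade >= job_min_grade

# A super-important project could simply be submitted with minimum grade 95.
print(grade_machine(48.0))                   # -> 96
print(can_assign(95, grade_machine(48.0)))   # -> True
```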
If you have that granularity, you can also run things much harder than 8101. And if you had a flexible interface where you could internally assign problems to minimum configs, then if some particular job got slowed down you could just flip a switch, give it to the 95+ machines, and catch it back up. You could use the 90+ machines either for the super-hard problems or for the ones where the queue is getting too long. It would be a bit like a cloud interface adding on-demand elasticity to the system. I think we will have a NUMA-like HP DL980 in here in a few months with 8 2690s and tons of RAM. As machines like that come on board, you look at the range of available computers in the upper tiers, and that tells you how hard the problems can be - perhaps much harder than 8101. Having a rating on machines from 1-10 or 1-100 should be a very powerful tool for all kinds of reasons.
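And the 'flip a switch' part could be as dumb as bumping a job's minimum grade when its queue falls behind (again, just a sketch of the idea, with made-up names):

```python
# If a job's queue is falling behind schedule, raise its minimum grade
# so only the fastest tier of machines picks it up until it catches up.
def escalate_if_behind(job_min_grade: int, days_behind: float,
                       fast_tier: int = 95) -> int:
    if days_behind > 0:
        return max(job_min_grade, fast_tier)
    return job_min_grade

print(escalate_if_behind(70, days_behind=2.0))  # -> 95
print(escalate_if_behind(70, days_behind=0.0))  # -> 70
```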
And the best part is that threads like the ones on EVGA or wherever it was (haven't seen them in a while) about spoofing cores and VMware and all that will magically go away, and everyone will stop talking about them.