Re: 9401 fails on 750ti
Posted: Sat Mar 15, 2014 2:48 am
by tofuwombat
I think this proves that there is a need for a CUDA flag, and/or assignment server tweaks for this card.
7im wrote:tofuwombat wrote:Seems the CUDA core was written well enough to keep working with new CUDA hardware.
OpenCL in Fahcore_17 does not appear to be the same.
. . . If that is unclear there, ask here!
My lack of clarity is the point to my babbling today.
Everyone's help and patience is MUCH appreciated.
Re: 9401 fails on 750ti
Posted: Mon Mar 17, 2014 11:59 pm
by Freightanimal
tofuwombat wrote:I think this proves that there is a need for a CUDA flag, and/or assignment server tweaks for this card.
7im wrote:tofuwombat wrote:Seems the CUDA core was written well enough to keep working with new CUDA hardware.
OpenCL in Fahcore_17 does not appear to be the same.
. . . If that is unclear there, ask here!
My lack of clarity is the point to my babbling today.
Everyone's help and patience is MUCH appreciated.
I second that thought (or some other measure that can keep these cards folding). I am not sure how many of us have the new card. Most of us are probably not folding with it because of the core 17 issues. I keep deleting the GPU in the configuration and adding it back so that it keeps trying for core 15 work units. I am seeing 15k PPD on core 15 units (I can usually do 3 per day) when I can get them. So far none today. I don't care about the points; I care about the project and the help it can give science and medicine. I don't game at all. Folding is the biggest reason I bought this card instead of one for 1/3 its cost (I was going to get a GT 520).
Re: 9401 fails on 750ti
Posted: Tue Mar 18, 2014 12:59 am
by bruce
One of my machines has a GPU that is normally happy folding with Core_17. Recently there was a server problem which made it impossible to download WUs for Core_17. During that time, WUs were available for Core_15 or Core_16. Like you, I have no control over the assignment process. Frankly, I'm glad to be able to fold rather than have my GPU sitting idle, waiting for an assignment from a particular group of projects.
The most important job of the assignment process is to give everyone's hardware something it can do, even in extenuating circumstances when a preferred choice is not available.
Re: 9401 fails on 750ti
Posted: Tue Mar 18, 2014 1:30 am
by Zagen30
Anyone with a 750 (Ti) could temporarily install the v6 GPU client, as it cannot get core 17 projects and would therefore only get core 15.
Re: 9401 fails on 750ti
Posted: Wed Mar 19, 2014 4:22 pm
by Freightanimal
Zagen30 wrote:Anyone with a 750 (Ti) could temporarily install the v6 GPU client, as it cannot get core 17 projects and would therefore only get core 15.
Thanks for the info. I personally wouldn't want to do that. Unfortunately I kind of doubt most people will either. Hopefully they can fix core 17 soon.
Re: 9401 fails on 750ti
Posted: Wed Mar 19, 2014 4:33 pm
by bruce
Freightanimal wrote:Hopefully they can fix core 17 soon.
A specific bug in Core_17 has not been identified, so without more information, they're not going to fix anything except to provide better support for Maxwell.
Re: 9401 fails on 750ti
Posted: Wed Mar 19, 2014 7:21 pm
by uddarts
3 weeks, and we find out they don't have enough info.
Win7 64-bit, 3770 running 6 cores, and 334.89 drivers.
All core 17 WUs fail without engaging the GPU.
ud
Re: 9401 fails on 750ti
Posted: Wed Mar 19, 2014 7:37 pm
by bfromcolo
My system with the 750ti is Ubuntu 12.04 64-bit and a 1045T (6 cores). If there is any information I can provide tell me what you need.
Note: my Win 7 system has an 8320 (8 cores) and a 7850, and it has completed 9401s without a problem.
Re: 9401 fails on 750ti
Posted: Wed Mar 19, 2014 7:46 pm
by rwh202
bruce wrote:
A specific bug in Core_17 has not been identified, so without more information, they're not going to fix anything.
Can we at least state that there is a bug, though? Whether it is in Core_17 or the driver, 750 Tis do not fold core_17 as it stands. This is trivial to reproduce. Users reeling off their setups here isn't going to help; it just needs debugging by Stanford and, if necessary, tickets raised with NVIDIA.
Re: 9401 fails on 750ti
Posted: Wed Mar 19, 2014 8:07 pm
by folding_hoomer
bruce wrote:Freightanimal wrote:Hopefully they can fix core 17 soon.
A specific bug in Core_17 has not been identified, so without more information, they're not going to fix anything.
Do you happen to be folding with 7 CPUs? I'm beginning to suspect that project 9401 fails with 7 cores and its assignments need to be restricted, but I don't have enough information to propose such a change. My 640ti seems to work just fine and it's unlikely that the problem is JUST the 750ti.
Bruce, your suggestion that -smp7 and WU 9401 could be the cause of the issue might be wrong.
I'm folding under Ubuntu 13.04 with -smp7 (no issue for half a year), and my GTX670 is ATM folding one 9401 after another, also without any issue.
IMO it has something to do with the changed structure of the Maxwell GPU, or rather with the (different) way Core17 handles it.
Re: 9401 fails on 750ti
Posted: Wed Mar 19, 2014 8:47 pm
by bruce
Yes, it might or might not be smp 7. I could also suggest that the problem is overclocking, defective hardware, or a bad set of drivers. I do not have any way to know for sure, nor do I have a system which fails, so for me, my statement that I have no problem is just as valid as your statement that you do have a problem. What's different?
Maxwell is a high probability reason. Development is already working on some issues associated with Maxwell and there's nothing you can do until they finish. If there is anything else that you might change to get your system into production, I think those reasons should be explored, but that's not required if you're certain you know what makes you unique.
Re: 9401 fails on 750ti [Maxwell]
Posted: Thu Mar 20, 2014 3:45 am
by Sam-I-Am
Apparently FahBench (w/ OpenMM 5.1) ran successfully on GTX 750 Ti, with both implicit and explicit solvent.
Does anyone know what's different between FahBench and FahCore17 regarding setting up tasks for the GPU before calling OpenMM 5.1?
The issue with GTX 750 Ti might be due to the new "Unified Virtual Memory" feature in Maxwell.
I believe this feature allows CPU code and GPU code to reside in the same virtual memory space.
This is not possible with NVIDIA GPU architecture prior to Maxwell.
Re: 9401 fails on 750ti [Maxwell]
Posted: Thu Mar 20, 2014 6:02 am
by bruce
The FahCore that works on Fermi/Kepler doesn't use that new feature. You used the word "allows", which implies that software is not required to use unified memory. How could that prevent the existing code from working?
Re: 9401 fails on 750ti [Maxwell]
Posted: Thu Mar 20, 2014 7:55 am
by rwh202
bruce wrote:I do not have any way to know for sure, nor do I have a system which fails, so for me, my statement that I have no problem is just as valid as your statement that you do have a problem.
It might be valid, but is it relevant? "I have two goldfish" is valid, but hardly relevant.
The topic here is that core_17 does not fold on Maxwell. No one has got it working. Maybe every Maxwell chip is defective and overclocked to the hilt, but that seems unlikely. Does someone at Stanford want a stock 750/Ti to test? I'm sure Newegg/Amazon can get one to them by Monday if you give me an address.
Re: 9401 fails on 750ti [Maxwell]
Posted: Thu Mar 20, 2014 8:20 pm
by Sam-I-Am
bruce wrote:The FahCore that works on Fermi/Kepler doesn't use that new feature. You used the word "allows", which implies that software is not required to use unified memory. How could that prevent the existing code from working?
From the published GTX 750 Ti benchmark results from Tom's and AnandTech, I think the problem is probably unrelated to OpenMM or OpenCL, and is more likely related to CPU <-> GPU communication. Specifically, memory barriers and memory coherence come to mind.
To quote Dr. Pande, "... because of how our old core 15 and 16 was written, it was in fact easier for us to write the core (17) from scratch." Perhaps the new FahCore17 code makes certain assumptions about how future GPUs will communicate with the CPU, and these assumptions are no longer valid on Maxwell. However, the comparable assumptions made in FahCore15 are still valid. Hence FahCore15 still runs fine, though with degraded performance (because FahCore15 does not support OpenMM 5.1).