In short, you don't know either?
Technically, if the same code works on certain cards but not on others, we can look at the driver or hardware level. However, the core is partly responsible for this as well, so it takes work on both sides to find out what is wrong (NVIDIA with the CUDA code and PG with the core). This is what makes debugging this issue very hard.
Project 5801 issues. [Should be Offline]
-
- Posts: 1579
- Joined: Fri Jun 27, 2008 2:20 pm
- Hardware configuration: Q6600 - 8gb - p5q deluxe - gtx275 - hd4350 ( not folding ) win7 x64 - smp:4 - gpu slot
E6600 - 4gb - p5wdh deluxe - 9600gt - 9600gso - win7 x64 - smp:2 - 2 gpu slots
E2160 - 2gb - ?? - onboard gpu - win7 x32 - 2 uniprocessor slots
T5450 - 4gb - ?? - 8600M GT 512 ( DDR2 ) - win7 x64 - smp:2 - gpu slot - Location: The Netherlands
- Contact:
Re: Project 5801 issues. [Should be Offline]
Lol @ car analogy
Re: Project 5801 issues. [Should be Offline]
Nope, I'm a programmer by trade myself and I'm usually good at narrowing down the possible causes, but the current issue is very hard to narrow down. I tried to find trends, but the reports are so varied that I'm really stumped. The only thing I'm sure of is that the 8600 series seems to have more problems than the others and the GTX 2xx series has zero problems. Besides this, it's hit or miss between these two series.
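For anyone who wants to look for trends in their own logs, here is a minimal sketch of how EUE reports could be tallied per card model. The report records are made-up placeholders for illustration, not data from this thread, and the function name `eue_rate_by_card` is just a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical report records: (card model, work units attempted, work units ended in EUE).
# These numbers are placeholders for illustration only, not data from this thread.
reports = [
    ("8600GT", 10, 6),
    ("8600GT", 8, 5),
    ("9800GTX+", 12, 7),
    ("GTX280", 15, 0),
]


def eue_rate_by_card(records):
    """Return {card model: fraction of attempted work units that ended in an EUE}."""
    attempted = defaultdict(int)
    failed = defaultdict(int)
    for card, total, eue in records:
        attempted[card] += total
        failed[card] += eue
    # Skip cards with no attempted work units to avoid dividing by zero.
    return {card: failed[card] / attempted[card] for card in attempted if attempted[card] > 0}


# Print the cards from highest to lowest EUE rate.
for card, rate in sorted(eue_rate_by_card(reports).items(), key=lambda kv: -kv[1]):
    print(f"{card:10s} {rate:.0%}")
```

Adding more fields to each record (driver version, memory type, clocks) would let the same tally be grouped by those instead of by card model.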
Re: Project 5801 issues. [Should be Offline]
Xilikon wrote: Nope, I'm a programmer by trade myself and I'm usually good at narrowing down the possible causes, but the current issue is very hard to narrow down. I tried to find trends, but the reports are so varied that I'm really stumped. The only thing I'm sure of is that the 8600 series seems to have more problems than the others and the GTX 2xx series has zero problems. Besides this, it's hit or miss between these two series.
What struck me was that with the 1.15 core the efficiency was greatly improved, but the EUE rate was also over the top. I speculated that it was a local cache problem, seeing how cards with slower RAM seem to be affected more (thinking: the cache on the SIMD units isn't filled fast enough, so the next instruction tries to fetch data which isn't there yet).
I'm a hobbyist programmer, certainly not into low-level languages, so there is a lot of wet-finger work in that assumption.
Re: Project 5801 issues. [Should be Offline]
Your guess is as good as mine, but I doubt that's really the case, since the 9800GTX+ also has problems (at almost the same rate as the 8600GT) and it should be using faster memory. Right now, there are not enough debug messages to pinpoint the exact cause.
-
- Posts: 87
- Joined: Tue Jul 08, 2008 2:27 pm
- Hardware configuration: 1x Q6600 @ 3.2GHz, 4GB DDR3-1333
1x Phenom X4 9950 @ 2.6GHz, 4GB DDR2-1066
3x GeForce 9800GX2
1x GeForce 8800GT
CentOS 5 x86-64, WINE 1.x with CUDA wrappers
Re: Project 5801 issues.
shatteredsilicon wrote: Two words: regression testing
This makes it all the more shocking just how broken the nVidia core 1.15 is.
VijayPande wrote: 1.15 passed all of the regression testing on machines at Stanford and NVIDIA and then passed FAH beta testing. There's not much more we can do than that before releasing it. Keep in mind that we now know that for many people (some boards), 1.15 is perfectly fine and stable, whereas for others, it doesn't work at all. If that's the case, my guess is that this is a CUDA or hardware issue. If the code in 1.15 were really broken, it would not work on any hardware, which is definitely not the case. We're working with NVIDIA on this one. The first step is to get the problem reproducible in their labs.
The bottom line here is that it is becoming clear that what works on some CUDA hardware platforms does not universally work on all. We have since gotten a few of the boards that cause problems and have included them in our recent testing.
Which hardware does it work on perfectly? I don't remember any one particular model not being listed as experiencing problems. Pretty much the entire G8x/G9x line-up appears to have been listed by users as affected at stock clock speeds, which indicates a systematic rather than a random failure. I'm quite curious to know which boards you were originally doing the testing with, if you are saying the error is not reproducible on them.
Re: Project 5801 issues. [Should be Offline]
Xilikon wrote: Nope, I'm a programmer by trade myself and I'm usually good at narrowing down the possible causes, but the current issue is very hard to narrow down. I tried to find trends, but the reports are so varied that I'm really stumped. The only thing I'm sure of is that the 8600 series seems to have more problems than the others and the GTX 2xx series has zero problems. Besides this, it's hit or miss between these two series.
This is the part where one has to ask: how big is the diff between 1.09 and 1.15? Has the compiler been changed or updated between the two? There is only going to be so much code in there that could be causing the problem to manifest itself.
Re: Project 5801 issues. [Should be Offline]
Xilikon wrote: Nope, I'm a programmer by trade myself and I'm usually good at narrowing down the possible causes, but the current issue is very hard to narrow down. I tried to find trends, but the reports are so varied that I'm really stumped. The only thing I'm sure of is that the 8600 series seems to have more problems than the others and the GTX 2xx series has zero problems. Besides this, it's hit or miss between these two series.
shatteredsilicon wrote: This is the part where one has to ask: how big is the diff between 1.09 and 1.15? Has the compiler been changed or updated between the two? There is only going to be so much code in there that could be causing the problem to manifest itself.
That's a very, very good question...
The reason the PG is so eager to kick 1.09 to the curb is that they are putting a lot of time into the new Lambda units, which is where the real science is done. Those units unfortunately fail a lot with 1.09, so they are stuck working on smaller and less useful units. However, the newer core breaks a lot of small units, so I'm sure there is something caused by the compiler or some code change.
Re: Project 5801 issues. [Should be Offline]
Xilikon wrote: Your guess is as good as mine, but I doubt that's really the case, since the 9800GTX+ also has problems (at almost the same rate as the 8600GT) and it should be using faster memory. Right now, there are not enough debug messages to pinpoint the exact cause.
Could still be memory related; speed is only one aspect, latency is another. Been waiting to hear something about that. I had a discussion about the 9600GSOs: there is a 512 MB version with GDDR2 which only gets <4K PPD, while the other variants all get over 5K easily. Same number of stream processors, and even when clocked the same there is still that much difference, while everyone always said memory does not matter for folding.
Wet-finger work, I know; don't read too much into it.
-
- Posts: 74
- Joined: Thu Jul 03, 2008 12:43 pm
- Hardware configuration: Home Network:
ADSL 12Mbps - 807 / 14.439
USRobotics 8port 10/100/1000
All computers connected with TP CAT5
HC1:
E8400@4GHz(8*500)
4GB PC2-8000
9800GX2@stock
removed - (8800GTS-512MB@724/1810/972)
Mist 600W rev2
WinXP Pro 32bit SP3
FW180.43
1xGPU2 v6.20 R1 Core 1.18/1.19
1xSMP v6.23 Beta R1
HC2:
AM2+ x4 Phenom 9950@3GHz
2GB Crucial Ballistix PC2-5300
3x8800GS-384MB
Corsair TX 750W
WinXP Pro 32bit SP3
FW178.24
3xGPU2 v6.20 R1 Core 1.18
1xSMP v6.23 Beta R1
HC3: (not folding atm and outdated)
X2 6000+@Stock
HD3850OC-512MB@783MHz
2GB PC6400
PSU 420W
WinXP Pro 32bit SP3
CCC8.6
1*GPU2 v6.12 Beta8 Core 1.04
1*CPU 5.04
Office Network:
SDSL 20Mbps
100/1000
All computers connected with TP CAT5
OC1:
E6750@stock
8800GT-256MB@702/1755/900
4GB PC6400
Tagan 480W
Vista Ultimate 32bit SP1
FW 178.24
1xGPU2 v6.20 R1 Core 1.15
1xSMP v6.23 Beta R1
OC2:
E8400@stock
2x8800GT-512MB@stock
8GB PC3-8500
Corsair HX520
Vista Business 64bit
FW178.24
2xGPU2 v6.20 R1 Core 1.15
1xSMP v6.23 Beta R1 - Location: Norway
Re: Project 5801 issues. [Should be Offline]
VijayPande wrote: 5801 was just a copy of another project, which did go all the way through QA. Nevertheless, I will have a talk with the responsible parties about this.
Thanks again
-
- Pande Group Member
- Posts: 2058
- Joined: Fri Nov 30, 2007 6:25 am
- Location: Stanford
Re: Project 5801 issues.
shatteredsilicon wrote: Which hardware does it work on perfectly? I don't remember any one particular model not being listed as experiencing problems. Pretty much the entire G8x/G9x line-up appears to have been listed by users as affected at stock clock speeds, which indicates a systematic rather than a random failure. I'm quite curious to know which boards you were originally doing the testing with, if you are saying the error is not reproducible on them.
GTX260 and GTX280 seem to be running fine. Also, it looks like certain boards in the previous generations do work for certain people. I can't tell what the difference is between these boards, or whether this is hardware or drivers. What is clear is that the same CUDA code works perfectly fine (no EUEs, running reliably) on some boards and not at all on others. This is very different from what we'd be dealing with on the CPU side.
-
- Posts: 34
- Joined: Wed Oct 29, 2008 1:00 am
- Location: UK
Re: Project 5801 issues. [Should be Offline]
Xilikon wrote: Your guess is as good as mine, but I doubt that's really the case, since the 9800GTX+ also has problems (at almost the same rate as the 8600GT) and it should be using faster memory. Right now, there are not enough debug messages to pinpoint the exact cause.
To throw my penny in the ring: out of all the cards I fold with (2x 8800GT, 9600GT, 9500GT and a 9800GTX+), it's only the 9800GTX+ that has given any errors. Since many other 9800GTX+ users have had the same problems, I think it's a good candidate to be a test card.
I have the feeling that if the core can be made stable on that and the 8600GT/S, then it'll be stable everywhere! Something is a bit odd, for sure.
Re: Project 5801 issues. [Should be Offline]
Xilikon wrote: Your guess is as good as mine, but I doubt that's really the case, since the 9800GTX+ also has problems (at almost the same rate as the 8600GT) and it should be using faster memory. Right now, there are not enough debug messages to pinpoint the exact cause.
powerarmour wrote: To throw my penny in the ring: out of all the cards I fold with (2x 8800GT, 9600GT, 9500GT and a 9800GTX+), it's only the 9800GTX+ that has given any errors. Since many other 9800GTX+ users have had the same problems, I think it's a good candidate to be a test card.
I have the feeling that if the core can be made stable on that and the 8600GT/S, then it'll be stable everywhere! Something is a bit odd, for sure.
Yes, I agree that the PG should get the 8600GT and the 9800GTX+ cards into the labs and work to get the core stable on them. I bet that when they succeed at this, everything else should work fine.
Re: Project 5801 issues. [Should be Offline]
Xilikon wrote: Your guess is as good as mine, but I doubt that's really the case, since the 9800GTX+ also has problems (at almost the same rate as the 8600GT) and it should be using faster memory. Right now, there are not enough debug messages to pinpoint the exact cause.
MtM wrote: Could still be memory related; speed is only one aspect, latency is another. Been waiting to hear something about that. I had a discussion about the 9600GSOs: there is a 512 MB version with GDDR2 which only gets <4K PPD, while the other variants all get over 5K easily. Same number of stream processors, and even when clocked the same there is still that much difference, while everyone always said memory does not matter for folding.
Wet-finger work, I know; don't read too much into it.
The 512 MB 9600GSO from Asus has 128-bit memory. All other 9600GSO cards that I have seen, 384 MB or 768 MB, have 192-bit memory.
That is a pretty significant difference and probably accounts for the PPD difference. But memory speed variances on cards with 192-bit memory do not make a big difference. They are all at least in the same league.
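To put a rough number on the bus-width point, here is a small sketch of the usual peak-bandwidth arithmetic (bus width in bytes times effective transfer rate). The transfer rate used below is an assumed placeholder, not the real spec of any 9600GSO variant, and `peak_bandwidth_gb_s` is just an illustrative helper name.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, effective_mt_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mt_s * 1e6 / 1e9


# Assumed effective memory transfer rate in MT/s; a placeholder, not a real card spec.
ASSUMED_MT_S = 1600.0

narrow = peak_bandwidth_gb_s(128, ASSUMED_MT_S)
wide = peak_bandwidth_gb_s(192, ASSUMED_MT_S)

print(f"128-bit: {narrow:.1f} GB/s")
print(f"192-bit: {wide:.1f} GB/s")
print(f"128-bit / 192-bit ratio: {narrow / wide:.2f}")
```

At the same memory clock the 128-bit card has two thirds of the peak bandwidth of a 192-bit card, whatever the clock is. A GDDR2 card will normally also run its memory slower than a GDDR3 one, so the real gap is likely larger than the bus width alone suggests.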
Re: Project 5801 issues. [Should be Offline]
Xilikon wrote: Your guess is as good as mine, but I doubt that's really the case, since the 9800GTX+ also has problems (at almost the same rate as the 8600GT) and it should be using faster memory. Right now, there are not enough debug messages to pinpoint the exact cause.
MtM wrote: Could still be memory related; speed is only one aspect, latency is another. Been waiting to hear something about that. I had a discussion about the 9600GSOs: there is a 512 MB version with GDDR2 which only gets <4K PPD, while the other variants all get over 5K easily. Same number of stream processors, and even when clocked the same there is still that much difference, while everyone always said memory does not matter for folding.
Wet-finger work, I know; don't read too much into it.
gaster wrote: The 512 MB 9600GSO from Asus has 128-bit memory. All other 9600GSO cards that I have seen, 384 MB or 768 MB, have 192-bit memory.
That is a pretty significant difference and probably accounts for the PPD difference. But memory speed variances on cards with 192-bit memory do not make a big difference. They are all at least in the same league.
But it's still very weird to see such a difference when the memory isn't supposed to have an influence.
Re: Project 5801 issues. [Should be Offline]
Xilikon wrote: Yes, I agree that the PG should get the 8600GT and the 9800GTX+ cards into the labs and work to get the core stable on them. I bet that when they succeed at this, everything else should work fine.
It is not about the cards, mind you.
What the exact changes were that went into 1.15 is not known to us. However, I think Prof. Pande pointed out that it is all about getting larger proteins to fold.
If it turns out that some older cards cannot reliably fold larger proteins, and over several hours, then the consequence has to be to stop folding on these cards.
I think this needs to be discussed, because if it happens many people will be disappointed, and the Pande Group cannot be expected to keep folding only small proteins.