Project 5801 issues. [Should be Offline]
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 74
- Joined: Thu Jul 03, 2008 12:43 pm
- Hardware configuration: Home Network:
ADSL 12Mbps - 807 / 14.439
USRobotics 8port 10/100/1000
All computers connected with TP CAT5
HC1:
E8400@4GHz(8*500)
4GB PC2-8000
9800GX2@stock
removed - (8800GTS-512MB@724/1810/972)
Mist 600W rev2
WinXP Pro 32bit SP3
FW180.43
1xGPU2 v6.20 R1 Core 1.18/1.19
1xSMP v6.23 Beta R1
HC2:
AM2+ x4 Phenom 9950@3GHz
2GB Crucial Ballistix PC2-5300
3x8800GS-384MB
Corsair TX 750W
WinXP Pro 32bit SP3
FW178.24
3xGPU2 v6.20 R1 Core 1.18
1xSMP v6.23 Beta R1
HC3: (not folding atm and outdated)
X2 6000+@Stock
HD3850OC-512MB@783MHz
2GB PC6400
PSU 420W
WinXP Pro 32bit SP3
CCC8.6
1*GPU2 v6.12 Beta8 Core 1.04
1*CPU 5.04
Office Network:
SDSL 20Mbps
100/1000
All computers connected with TP CAT5
OC1:
E6750@stock
8800GT-256MB@702/1755/900
4GB PC6400
Tagan 480W
Vista Ultimate 32bit SP1
FW 178.24
1xGPU2 v6.20 R1 Core 1.15
1xSMP v6.23 Beta R1
OC2:
E8400@stock
2x8800GT-512MB@stock
8GB PC3-8500
Corsair HX520
Vista Business 64bit
FW178.24
2xGPU2 v6.20 R1 Core 1.15
1xSMP v6.23 Beta R1 - Location: Norway
Re: Project 5801 issues. [Should be Offline]
And this is understandable Mr. Pande. Thanks for your feedback
Re: Project 5801 issues. [Should be Offline]
Let's hope the QA process sees some vast improvements
Having issues with 5 GPUs here.
Re: Project 5801 issues. [Should be Offline]
And implement that you always reQA a Project on the latest forced core, before you distribute the project. Record what core the Project was QAed on so you know if you have to reQA it before release.
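A minimal sketch of the bookkeeping suggested above, assuming each project carries a record of the core version it was last QA'd on. The names `QARecord`, `parse_version` and `needs_reqa` are hypothetical and the version numbers are purely illustrative; nothing here reflects the actual Pande Group tooling.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class QARecord:
    """Hypothetical record of the core version a project was last QA'd on."""
    project: int
    qa_core: str  # illustrative value such as "1.14"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "1.15" into (1, 15) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def needs_reqa(record: QARecord, forced_core: str) -> bool:
    """Return True when the project was QA'd on a core older than the forced one."""
    return parse_version(record.qa_core) < parse_version(forced_core)


if __name__ == "__main__":
    # Illustrative data only: the core versions below are made up for the example.
    records = [QARecord(5800, "1.15"), QARecord(5801, "1.14")]
    for rec in records:
        status = "reQA before release" if needs_reqa(rec, "1.15") else "OK to release"
        print(f"p{rec.project}: {status}")
```

Comparing parsed tuples rather than raw strings matters here, since a plain string comparison would rank "1.9" above "1.15".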
Re: Project 5801 issues. [Should be Offline]
theo343 wrote: And implement that you always reQA a Project on the latest forced core, before you distribute the project. Record what core the Project was QAed on so you know if you have to reQA it before release.

Precisely... nothing revolutionary... even if just a couple WUs were run, this problem would have been evident and halted before it ever became a problem.
Re: Project 5801 issues.
VijayPande wrote: We keep an eye on the forum, but the first post was just a few hours ago. Due to staff having other responsibilities, our response will typically be on the hours time scale not minutes time scales for issues like this. I wish it could be faster, but that's what we're staffed to do at the moment.

Mr. Pande has the patience of a saint.
I do have this issue with two GPU (Nvidia) machines. After about 6 hours a project 5506 unit was finally sent out successfully. The UNSTABLE_MACHINE issue with the project 5801 unit persists. Any recommendations?
Re: Project 5801 issues. [Should be Offline]
I've got 15 Nvidia GPUs that I have to restart periodically to dump this WU. I'm ready for things to get back to normal (whatever that is).
-
- Posts: 179
- Joined: Sun Dec 02, 2007 6:40 am
- Location: Team_XPS ..... OC, S. Calif
Re: Project 5801 issues.
...
To V.P. aka Dr. Pande aka Vijay Pande ... much obliged sir, and God Bless.
Peace
MoneyGuyBK wrote: I am surprised that: 1) F@H released this WU in such a bad state
VijayPande wrote: PS In case you're curious: This was beta tested before (this was a project # change due to a move onto a new server -- which was done to try to keep work around while the CS servers were down).
MoneyGuyBK wrote: However, more stumped that: 2) F@H has not chimed in here officially after 7 Pages of comments
VijayPande wrote: We keep an eye on the forum, but the first post was just a few hours ago. Due to staff having other responsibilities, our response will typically be on the hours time scale not minutes time scales for issues like this. I wish it could be faster, but that's what we're staffed to do at the moment.
T.E.A.M. “Together Everyone Accomplishes Miracles!”
OC, S. California ... God Bless All
-
- Posts: 87
- Joined: Tue Jul 08, 2008 2:27 pm
- Hardware configuration: 1x Q6600 @ 3.2GHz, 4GB DDR3-1333
1x Phenom X4 9950 @ 2.6GHz, 4GB DDR2-1066
3x GeForce 9800GX2
1x GeForce 8800GT
CentOS 5 x86-64, WINE 1.x with CUDA wrappers
Re: Project 5801 issues.
MoneyGuyBK wrote: I am surprised that: 1) F@H released this WU in such a bad state
VijayPande wrote: PS In case you're curious: This was beta tested before (this was a project # change due to a move onto a new server -- which was done to try to keep work around while the CS servers were down).

Two words: regression testing
This makes it all the more shocking just how broken the nVidia core 1.15 is.
Re: Project 5801 issues. [Should be Offline]
Well I come back home from work & see the p5801's have been pulled, switch on machines, flush all bad work units & we are up & running again.
I have NO server issues at the moment, all units have been returned safely to their servers & I see nothing but green ink in Fahspy.
Congratulations Vijay & co... I can rest easy for now that all my Linux SMP, Windows SMP, my standard clients, my ATI clients & especially my Nvidia clients are happy for now.
Cheers Teddy
-
- Site Moderator
- Posts: 6359
- Joined: Sun Dec 02, 2007 10:38 am
- Location: Bordeaux, France
Re: Project 5801 issues. [Should be Offline]
VijayPande wrote: Sorry about the really nasty problem on this one. It was definitely strange since these WU's were QA'd before. I think this may be an issue where they were QA'd on an earlier core and 1.15 is causing issues.

Well ... I think you missed at least one of the QA steps ...
p5800 was fully tested through the whole QA process ... but not the p5801
Re: Project 5801 issues. [Should be Offline]
The sad thing about this is that half of my GPU folders (3 of 7 cards in total) will be dead in the water for 24 hours or more, as I cannot reach them until tomorrow (too much chaos on the roads today, so I'm working from the home office).
Those 3 cards are also the most powerful. This P5801 thing was extremely bad timing for me, as I've been working my arse off with the clients the last couple of weeks to be competitive with a couple of guys on my team. I was just knifing and was ready to pass. I can now say goodbye to that, as my PPD will plummet to half for more than 24 hours, while the other guys have access to all their folding machines and have lost minimal PPD during these problems.
EDIT:
I also wonder how many Nvidia GPUs in total will lie dead in the water for 24 hours or more because of the P5801 distribution.
I truly hope the QA procedure will get some improvements after this blunder.
-
- Pande Group Member
- Posts: 2058
- Joined: Fri Nov 30, 2007 6:25 am
- Location: Stanford
Re: Project 5801 issues. [Should be Offline]
VijayPande wrote: Sorry about the really nasty problem on this one. It was definitely strange since these WU's were QA'd before. I think this may be an issue where they were QA'd on an earlier core and 1.15 is causing issues.
toTOW wrote: Well ... I think you missed at least one of the QA steps ... p5800 was fully tested through the whole QA process ... but not the p5801

5801 was just a copy of another project, which did go all the way through QA. Nevertheless, I will have a talk with the responsible parties about this.
-
- Pande Group Member
- Posts: 2058
- Joined: Fri Nov 30, 2007 6:25 am
- Location: Stanford
Re: Project 5801 issues.
shatteredsilicon wrote: Two words: regression testing. This makes it all the more shocking just how broken the nVidia core 1.15 is.

1.15 passed all of the regression testing on machines at Stanford and NVIDIA and then passed FAH beta testing. There's not much more we can do than that before releasing it. Keep in mind that we now know that for many people (some boards), 1.15 is perfectly fine and stable, whereas for others, it doesn't work at all. If that's the case, my guess is that this is a CUDA or hardware issue. If the code in 1.15 were really broken, it would not work on any hardware, which is definitely not the case. We're working with NVIDIA on this one. The first step is to get the problem reproducible in their labs.
The bottom line here is that it is becoming clear that what works on some CUDA hardware platforms does not universally work on all. We have since gotten a few of the boards that cause problems and have included them in our recent testing.
-
- Posts: 1579
- Joined: Fri Jun 27, 2008 2:20 pm
- Hardware configuration: Q6600 - 8gb - p5q deluxe - gtx275 - hd4350 ( not folding ) win7 x64 - smp:4 - gpu slot
E6600 - 4gb - p5wdh deluxe - 9600gt - 9600gso - win7 x64 - smp:2 - 2 gpu slots
E2160 - 2gb - ?? - onboard gpu - win7 x32 - 2 uniprocessor slots
T5450 - 4gb - ?? - 8600M GT 512 ( DDR2 ) - win7 x64 - smp:2 - gpu slot - Location: The Netherlands
Re: Project 5801 issues.
VijayPande wrote: The bottom line here is that it is becoming clear that what works on some CUDA hardware platforms does not universally work on all. We have since gotten a few of the boards that cause problems and have included them in our recent testing.

So does this mean CUDA isn't compatible with all the hardware that is supposed to be compatible with it, or does it point to the clients' implementation of CUDA not being compatible with all hardware? Or is it too soon to tell? I would hope it's the last option, as in the first case I'm afraid you won't be able to get it sorted as quickly.
Re: Project 5801 issues.
VijayPande wrote: The bottom line here is that it is becoming clear that what works on some CUDA hardware platforms does not universally work on all. We have since gotten a few of the boards that cause problems and have included them in our recent testing.
MtM wrote: So does this mean CUDA isn't compatible with all the hardware that is supposed to be compatible with it, or does it point to the clients' implementation of CUDA not being compatible with all hardware? Or is it too soon to tell? I would hope it's the last option, as in the first case I'm afraid you won't be able to get it sorted as quickly.

Technically, if the same code works on certain cards but not on others, we can look at the driver or hardware level. However, the core is partly responsible as well, so it takes work on both sides to find out what's wrong (NVIDIA with the CUDA code and PG with the core). This is what makes debugging this issue very hard.
Think of a car engine choking under load. There are many possible causes: fuel quality, air quality, timing adjustment, ECU programming, a mechanical problem, or something else, so it takes a lot of diagnostics to find out what went wrong.