This may not be the WU fault- 66xx's [NVidia-8400GS]

new08 · Post by **new08** » Tue Aug 24, 2010 3:38 pm

I've installed a new card, an NV 8400GS, which seems to be working OK. NB: it's the PCI [not PCIe] version!
When running, the GPU client sets up OK, then fails with an EUE, 5 in a row.
Can anyone say what the problem is? I don't think my m/c is unstable, as reported, but this card is at the lower end for GPU folding capabilities.
[PS: If there's a better place for this post, please move it.]

Folding@Home GPU Core
[16:06:15] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[16:06:15]
[16:06:15] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[16:06:15] Build host: amoeba
[16:06:15] Board Type: Nvidia
[16:06:15] Core :

Last attempt....

- Expanded 73800 -> 383588 (decompressed 519.7 percent)
[16:07:12] Called DecompressByteArray: compressed_data_size=73800 data_size=383588, decompressed_data_size=383588 diff=0
[16:07:12] - Digital signature verified
[16:07:12]
[16:07:12] Project: 6600 (Run 9, Clone 892, Gen 213)
[16:07:12]
[16:07:12] Assembly optimizations on if available.
[16:07:12] Entering M.D.
[16:07:18] Tpr hash work/wudata_05.tpr: 1395416845 223547143 2392889429 3914747237 1484695016
[16:07:18]
[16:07:18] Calling fah_main args: 14 usage=100
[16:07:18]
[16:07:20] Working on Protein
[16:07:21] mdrun_gpu returned
[16:07:21] Going to send back what have done -- stepsTotalG=0
[16:07:21] Work fraction=0.0000 steps=0.
[16:07:25] logfile size=9156 infoLength=9156 edr=0 trr=25
[16:07:25] + Opened results file
[16:07:25] - Writing 9694 bytes of core data to disk...
[16:07:25] Done: 9182 -> 3328 (compressed to 36.2 percent)
[16:07:25] ... Done.
[16:07:25] DeleteFrameFiles: successfully deleted file=work/wudata_05.ckp
[16:07:25]
[16:07:25] Folding@home Core Shutdown: UNSTABLE_MACHINE
[16:07:28] CoreStatus = 7A (122)
[16:07:28] Sending work to server
[16:07:28] Project: 6600 (Run 9, Clone 892, Gen 213)

[16:07:28] + Attempting to send results [August 24 16:07:28 UTC]
[16:07:28] Gpu type=2 species=0.
[16:07:29] + Results successfully sent
[16:07:29] Thank you for your contribution to Folding@Home.
[16:07:33] EUE limit exceeded. Pausing 24 hours.

sortofageek · Post by **sortofageek** » Tue Aug 24, 2010 3:40 pm

So far you are the only one who has returned any results on this one.

Project: 6600 (Run 9, Clone 892, Gen 213)

new08 · Post by **new08** » Tue Aug 24, 2010 3:49 pm

OK, so it may be coincidence, but I'll make sure I do a reboot before a retry occurs tomorrow.
This seems to be posted as a cure for some GPU issues connected with failed units.
It doesn't look that far from running, though, so I remain hopeful, if realistic.

PS: My video card driver is version 6.14.12.5896

sortofageek · Post by **sortofageek** » Tue Aug 24, 2010 3:56 pm

The evidence is not yet conclusive as to whether that is a bad WU or not. We need at least three failures to mark it as a bad one, and if it is actually a good WU, it will take time for somebody else to receive it, complete it and send it home, so we may not know for a while.

new08 · Post by **new08** » Tue Aug 24, 2010 4:05 pm

OK. There's no way to purge the WU, is there? Otherwise many more would bounce back!
I'll sit tight for now and see if any more reports come in on this one.
As I have no history on this GPU work, that's all I can do for now.
I can still run the CPU units, inefficient as they are; I don't think there's a clash there anywhere.

sortofageek · Post by **sortofageek** » Tue Aug 24, 2010 4:09 pm

Yes, the WU is removed when a mod or Pande Group member marks it bad. We need to be certain first, however.

Post by **toTOW** » Tue Aug 24, 2010 8:35 pm

Very low-end GPUs are known to cause unexplained errors on some WUs... that's probably what you are seeing...

new08 · Post by **new08** » Wed Aug 25, 2010 12:42 pm

Yes, I wondered about that, as I was warned that various functions on the 8400 were marginal. Fortunately, the card is a better performer in general on my PC, so it was not wasted money @£30, but it's a shame if it can't be pulled in for some X10 speed work on folding.
Are there WUs that are kinder, so to speak, for cards with unquantified limitations?
Further, can I check occasionally to see if WUs will run better, now and then, without prejudicing project targets?
Are WUs reallocated right away when GPU work fails?
I notice that 3 [or so] 6000-series WUs have tried to set up during this so-far failed trial.

Post by **bruce** » Wed Aug 25, 2010 4:46 pm

sortofageek wrote: So far you are the only one who has returned any results on this one.

Project: 6600 (Run 9, Clone 892, Gen 213)

As expected, the WU was assigned to someone else who completed it successfully.

I don't know any reasonable way to exclude assigning p6600 to your machine, nor do I know if the problem is more widespread or "just you."

What are the other WUs that you've had trouble with?

new08 · Post by **new08** » Wed Aug 25, 2010 6:32 pm

Good news, Bruce!
I read a few items on the 8400 issues on here and decided to optimise my card using the NV utility, which raised the figures noticeably, i.e. Core to 795, Mem to 498 and Shader to 1829, about a 30% lift, with F@H running under load at 75 degC.
The work unit that loaded on retry was luckily 5772 [not 66xx at least] and is now running @ 550 ppd.

So, it can be done, albeit with a bit of palm oil...
Of course the unit has yet to finish successfully, but it's looking pretty OK up to now and my weekly work rate has gone up at least 5X [if not 10X], but time will tell.
Therefore, the query over 66xx units is still valid if they don't run in future on my m/c.
Dweeb Success for now...

[PS: I understand that the card runs @ about 75 Watts, probably not much more than the NV6200 it replaced]

For toTOW: the units that wouldn't run were 6600, 6601 & 6606. Do you need run data?

new08 · Post by **new08** » Fri Aug 27, 2010 11:33 am

Further update on the 66xx saga... I had a hiccup with the GPU client whilst loading other software, which crashed XP.
On return, my half-finished unit [10503], which had been doing well, was lost after 10 hours' work [500+ pts].
Subsequently 3 or 4 66xx units tried and failed to run. Luckily I then got 5767, which is nearly done.
From this, it looks like 66xx units are no-nos for my m/c, for whatever reason.
From my admittedly early view, the video card seems fine, as does the PC: both are capable of running other units pretty much straight off and worth persisting with.
It's a shame users can't be flagged as not accepting certain units, but with these faster turnarounds and big rigs running mega numbers, failed load-ups hanging in the system are much less of a problem than in earlier days.

bretth603 · Post by **bretth603** » Fri Aug 27, 2010 5:39 pm

As I mentioned in your Hardware thread before you bought the card, the 8400GS is known to crash on certain WUs. I and others have posted on this several times.

Your card will not run 66xx or 101xx WUs. It will run 57xx and 105xx WUs. Overclocking will not change this. It is not a WU problem.
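
As a rough sketch only (nobody in this thread posted it), the families reported above can be written as a small lookup. The function name `family_status` and both sets are hypothetical and simply restate this thread's reports for the 8400GS; they are not an official compatibility list.

```python
# Project families as reported in this thread for the 8400GS.
# Assumption: "66xx" means projects 6600-6699, "101xx" means 10100-10199, etc.
FAILING_FAMILIES = {66, 101}
WORKING_FAMILIES = {57, 105}


def family_status(project: int) -> str:
    """Return 'fails', 'runs' or 'unknown' for a project number."""
    family = project // 100  # drop the last two digits: 6605 -> 66
    if family in FAILING_FAMILIES:
        return "fails"
    if family in WORKING_FAMILIES:
        return "runs"
    return "unknown"


# Projects mentioned in this thread
for p in (6600, 6605, 5772, 10503):
    print(p, family_status(p))
```

Any project outside those four families comes back as "unknown", since nothing in this thread says how it behaves.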

So far the only way I've found that seems to exclude the 66xx and 101xx WUs is to block the work server IP address 171.64.65.61 where they seem to be coming from. I don't condone "cherry picking." This case is more like "poison cherry avoidance."

HTH

new08 · Post by **new08** » Thu Sep 02, 2010 7:35 pm

Thanks Brett, I've taken your hint. The alternative is for me to stop folding, frankly!
The output of many has risen so high that my input, as previously, would be minuscule.
I can live with this 8400 card working like this, and it will lift me to the top 10 in my team on production if 'good units' keep coming along.
I missed your reply and have just seen it by chance, btw.

Post by **toTOW** » Fri Sep 03, 2010 12:45 pm

I'm curious: does the GPU3 core run better than the GPU2 cores on this kind of GPU?

new08 · Post by **new08** » Fri Sep 03, 2010 1:04 pm

I tried it, toTOW, but can't remember the result; GPU2 was finally successful, so I left it at that.
I still have the [Console ver.] GPU3 onboard, so I can try it once the GPU2 client has finished the current unit.
[p5783 @783pts :~O ]
I've found that running the CPU client on top of the GPU client sometimes requests an exit [if not a crash] from the GPU client, as just happened.
Caught the data [@40% done] OK, though!

PS: Thinking back, the GPU3 version did no more than EUEs anyhow, but I can retry. Let me know...

PPS: I sniffed back through the log on the GPU3 [Console] attempt; this is a typical response:

[14:13:21] Decompressed FahCore_11.exe (1908736 bytes) successfully
[14:13:26] + Core successfully engaged
[14:13:31] 
[14:13:31] + Processing work unit
[14:13:31] Core required: FahCore_11.exe
[14:13:31] Core found.
[14:13:31] Working on queue slot 01 [August 25 14:13:31 UTC]
[14:13:31] + Working ...
[14:13:32] 
[14:13:32] *------------------------------*
[14:13:32] Folding@Home GPU Core
[14:13:32] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[14:13:32] 
[14:13:32] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[14:13:32] Build host: amoeba
[14:13:32] Board Type: Nvidia
[14:13:32] Core      : 
[14:13:32] Preparing to commence simulation
[14:13:32] - Looking at optimizations...
[14:13:32] DeleteFrameFiles: successfully deleted file=work/wudata_01.ckp
[14:13:32] - Created dyn
[14:13:32] - Files status OK
[14:13:32] - Expanded 73840 -> 383588 (decompressed 519.4 percent)
[14:13:32] Called DecompressByteArray: compressed_data_size=73840 data_size=383588, decompressed_data_size=383588 diff=0
[14:13:32] - Digital signature verified
[14:13:32] 
[14:13:32] Project: 6605 (Run 1, Clone 509, Gen 288)
[14:13:32] 
[14:13:32] Assembly optimizations on if available.
[14:13:32] Entering M.D.
[14:13:38] Tpr hash work/wudata_01.tpr:  484779275 393848482 3624720554 1138574818 1750419468
[14:13:38] 
[14:13:38] Calling fah_main args: 14 usage=100
[14:13:38] 
[14:13:41] Working on Protein
[14:13:42] mdrun_gpu returned 
[14:13:42] Going to send back what have done -- stepsTotalG=0
[14:13:42] Work fraction=0.0000 steps=0.
[14:13:46] logfile size=9155 infoLength=9155 edr=0 trr=25
[14:13:46] + Opened results file
[14:13:46] - Writing 9693 bytes of core data to disk...
[14:13:46] Done: 9181 -> 3348 (compressed to 36.4 percent)
[14:13:46]   ... Done.
[14:13:46] DeleteFrameFiles: successfully deleted file=work/wudata_01.ckp
[14:13:46] 
[14:13:46] Folding@home Core Shutdown: UNSTABLE_MACHINE
[14:13:48] CoreStatus = 7A (122)
[14:13:48] Sending work to server
[14:13:48] Project: 6605 (Run 1, Clone 509, Gen 288)


[14:13:48] + Attempting to send results [August 25 14:13:48 UTC]
[14:13:48] Gpu species not recognized.
[14:13:49] + Results successfully sent
[14:13:49] Thank you for your contribution to Folding@Home.
[14:13:53] - Preparing to get new work unit...
[14:13:53] Cleaning up work directory

Final comment, in passing:
Shame about these units not running on the 8400- not a bad facility compared to slower CPUs.
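
For anyone wanting to tally which projects end this way, here is a minimal sketch that scans a client log for the `Project:` and `Core Shutdown:` lines in the format pasted above. The function name `tally_shutdowns` and the trimmed `sample` text are illustrative assumptions; the two regular expressions only match the line shapes visible in this thread's logs.

```python
import re
from collections import Counter

# Patterns taken from the log lines shown in this thread.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def tally_shutdowns(log_text):
    """Count (project, shutdown reason) pairs found in a client log."""
    counts = Counter()
    current_project = None
    for line in log_text.splitlines():
        match = PROJECT_RE.search(line)
        if match:
            # Remember the most recent project; a shutdown line follows it.
            current_project = int(match.group(1))
            continue
        match = SHUTDOWN_RE.search(line)
        if match and current_project is not None:
            counts[(current_project, match.group(1))] += 1
    return counts


# Trimmed copy of lines from the log above
sample = """\
[14:13:32] Project: 6605 (Run 1, Clone 509, Gen 288)
[14:13:42] mdrun_gpu returned
[14:13:46] Folding@home Core Shutdown: UNSTABLE_MACHINE
[14:13:48] CoreStatus = 7A (122)
[14:13:48] Project: 6605 (Run 1, Clone 509, Gen 288)
"""

for (project, reason), n in tally_shutdowns(sample).items():
    print(project, reason, n)
```

The repeated `Project:` line that the client prints when sending results does not add to the count, because only shutdown lines are counted.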
