Re: returning to folding, problems with 680?

Posted: Thu Jan 03, 2013 12:21 am
by bruce
The text string containing the GF114 information is strictly for human convenience. FAH only uses the numbers in GPUs.txt to the left of that information, so that, by itself, doesn't cause a folding problem.

Obviously there's a possibility that those numbers are incorrect, the possibility of a bug in the drivers (even with WHQL certification), the possibility of a GROMACS bug that's only exposed by certain projects, the possibility of specific projects stressing the GPU more than other projects leading to overclock instability, etc. etc. Isolating which is the actual cause is virtually impossible until somebody reports back what they did to fix it with enough system details for others with similar systems to try the same thing and confirm that the same fix worked for them.

We have had a rash of drivers from both ATI and NV that were less than perfect yet were WHQL, and the most common fix that has worked is to revert to older drivers (after running CCleaner to remove the vestiges of the newer driver). YMMV.

The FAH lab doesn't test all possible drivers on all possible GPUs on all possible versions of Windows (or WINE). The beta testing process exposes each project to a wider variety of combinations, but there will still be untested combinations, especially with new drivers. Advanced/advmethods testing exposes the projects to an even wider range of combinations, but there still may be inadequate testing on systems which EXACTLY match yours. That's a very good reason to avoid updating to the latest drivers unless you really need to, and a very good reason to avoid setting client-type to anything other than the default value.

Re: returning to folding, problems with 680?

Posted: Thu Jan 03, 2013 12:55 am
by Dark_n_Beyond
Thanks for your quick reply, Bruce. That's good to know about the naming convention, so I can rule that out. As it is now, I've done a clean Windows install and a clean driver install with the drivers that were working for me up until a few weeks ago, with no overclock, and I'm just plain out of ideas. I really think I'm just going to have to shut it down until I can find someone on any forum with a similar problem and a solution. Thing is, I feel really bad about returning bad units. I'm willing to keep trying, but I don't know of any other way of testing different things without hurting the project.

Re: returning to folding, problems with 680?

Posted: Thu Jan 03, 2013 2:03 am
by mmonnin
FAH seems to push hardware more than benchmarks, so what may seem stable in a benchmark may fail a Work Unit. The unstable machine error is an indicator of an overclock gone too far or some other hardware problem.

Re: returning to folding, problems with 680?

Posted: Thu Jan 03, 2013 11:17 pm
by codysluder
mmonnin wrote:FAH seems to push hardware more than benchmarks, so what may seem stable in a benchmark may fail a Work Unit. The unstable machine error is an indicator of an overclock gone too far or some other hardware problem.
Absolutely. Do not assume that you can run F@H with an overclock that has been established based on some other program calling it stable. In fact, if F@H is just barely stable with one project, that doesn't guarantee it will be stable with another project.

Re: returning to folding, problems with 680?

Posted: Fri Jan 04, 2013 6:11 am
by ford316
codysluder wrote:
mmonnin wrote:FAH seems to push hardware more than benchmarks, so what may seem stable in a benchmark may fail a Work Unit. The unstable machine error is an indicator of an overclock gone too far or some other hardware problem.
Absolutely. Do not assume that you can run F@H with an overclock that has been established based on some other program calling it stable. In fact, if F@H is just barely stable with one project, that doesn't guarantee it will be stable with another project.

This program IS my benchmark tool, precisely because it pushes far beyond what other programs do. LOL :D I also had a problem using auto volts in the BIOS: Windows kept freezing up after 8 hours or so, so I changed the settings and upped the volts. Everything has been perfect for the last 20 or so hours, with no freezing. By far this would be the best benchmark tool, and the best way to check whether a computer is stable.

Re: returning to folding, problems with 680?

Posted: Sat Jan 05, 2013 12:08 am
by Dark_n_Beyond
mmonnin wrote:FAH seems to push hardware more than benchmarks, so what may seem stable in a benchmark may fail a Work Unit. The unstable machine error is an indicator of an overclock gone too far or some other hardware problem.
I agree with you that FAH probably pushes the hardware more than benchmarks. If you go back about 4 posts, you will see that I stated my folding problems are with NO overclock. I certainly wouldn't use a benchmark as an indication of stability, and unless I am actually benchmarking the card, I have no need for any overclock and don't.

Some other hardware problem is a possibility. I had updated the motherboard bios about the time this problem started; perhaps something changed, and not for the better? It should be easy enough to reflash if I can figure out what the original bios version was. Power supply is another possibility, and I'll be testing that tonight. Memory passes memtest fine, and I even tried a different set. Prime95 runs 24 hours without issue. A corrupted bios in the card is possible, but there are 3 separate ones, and none work. It could be that the card is actually bad, but no other test I can find shows any kind of issue. I was getting nvlddmkm errors in the event logs with the newest driver, but since going back to 306.97 they have stopped. Card is watercooled, and maxes out at about 45C. I still get the following:

02:47:46:WU02:FS01:0x15:Run: exception thrown in GuardedRun -- cannot continue further.
02:47:46:WU02:FS01:0x15:Going to send back what have done -- stepsTotalG=40000000
02:47:46:WU02:FS01:0x15:Work fraction=0.2828 steps=40000000.
02:47:50:WU02:FS01:0x15:logfile size=19168 infoLength=19168 edr=0 trr=23
02:47:50:WU02:FS01:0x15:+ Opened results file
02:47:50:WU02:FS01:0x15:- Writing 19704 bytes of core data to disk...
02:47:50:WU02:FS01:0x15:Done: 19192 -> 5380 (compressed to 28.0 percent)
02:47:50:WU02:FS01:0x15: ... Done.
02:47:50:WU02:FS01:0x15:DeleteFrameFiles: successfully deleted file=02/wudata_01.ckp
02:47:50:WU02:FS01:0x15:
02:47:50:WU02:FS01:0x15:Folding@home Core Shutdown: UNSTABLE_MACHINE
02:47:51:WARNING:WU02:FS01:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
02:47:51:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:7623 run:417 clone:4 gen:17 core:0x15 unit:0x00000013664f2dd14fe4facc3cec5a6d
02:47:51:WU02:FS01:Uploading 5.75KiB to 171.64.65.105
02:47:51:WU02:FS01:Connecting to 171.64.65.105:8080
02:47:51:WU02:FS01:Upload complete
02:47:51:WU02:FS01:Server responded WORK_ACK (400)
02:47:51:WU02:FS01:Cleaning up

I really didn't mean to hijack this thread, but if someone knows of some test I could run that would be comparable to folding, please let me know. Until I can figure this out, I've taken the GPU offline.
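
For anyone comparing failures across several machines, a log excerpt like the one above can be summarized automatically. What follows is only a minimal sketch in Python: it assumes that every client log line has the same layout as the lines quoted in this post, and the function name summarize_failures and the two regular expressions are made up for illustration, not part of any FAH tool.

```python
import re

# Matches the warning line quoted above, e.g.
# 02:47:51:WARNING:WU02:FS01:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
RETURN_RE = re.compile(
    r"^(?P<time>\d\d:\d\d:\d\d):WARNING:(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"FahCore returned: (?P<status>\w+) \((?P<code>\d+) = 0x[0-9a-fA-F]+\)"
)

# Matches any line that carries the work unit identity (project/run/clone/gen).
UNIT_RE = re.compile(
    r"^\d\d:\d\d:\d\d:(?P<wu>WU\d+):(?P<slot>FS\d+):.*?"
    r"project:(?P<project>\d+) run:(?P<run>\d+) "
    r"clone:(?P<clone>\d+) gen:(?P<gen>\d+)"
)


def summarize_failures(lines):
    """Return one dict per 'FahCore returned' warning, completed with the
    next project/run/clone/gen line logged for the same WU and slot."""
    failures = []
    pending = {}  # (wu, slot) -> failure entry still missing its unit identity
    for line in lines:
        m = RETURN_RE.match(line)
        if m:
            entry = {
                "time": m["time"],
                "status": m["status"],
                "code": int(m["code"]),
            }
            failures.append(entry)
            pending[(m["wu"], m["slot"])] = entry
            continue
        u = UNIT_RE.match(line)
        if u:
            entry = pending.pop((u["wu"], u["slot"]), None)
            if entry is not None:
                for field in ("project", "run", "clone", "gen"):
                    entry[field] = int(u[field])
    return failures


# Three lines copied from the log excerpt in this post.
SAMPLE = [
    "02:47:50:WU02:FS01:0x15:Folding@home Core Shutdown: UNSTABLE_MACHINE",
    "02:47:51:WARNING:WU02:FS01:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)",
    "02:47:51:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY "
    "project:7623 run:417 clone:4 gen:17 core:0x15 "
    "unit:0x00000013664f2dd14fe4facc3cec5a6d",
]

if __name__ == "__main__":
    for failure in summarize_failures(SAMPLE):
        print(failure)
```

Collecting the status together with project, run, clone and gen makes it easier to see whether the failures cluster on particular projects, which is one of the possibilities raised earlier in the thread.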

Re: returning to folding, problems with 680?

Posted: Sat Jan 05, 2013 12:13 am
by bollix47
For the CPU there's StressCPU v2. Also listed here.

Re: returning to folding, problems with 680?

Posted: Sat Jan 05, 2013 3:57 am
by ford316
Dark_n_Beyond wrote:
mmonnin wrote:FAH seems to push hardware more than benchmarks, so what may seem stable in a benchmark may fail a Work Unit. The unstable machine error is an indicator of an overclock gone too far or some other hardware problem.
Card is watercooled, and maxes out at about 45C
GPU card watercooled and temp is 45C... I don't have a watercooled GPU and my temp is 45C for the GPU. Processor is watercooled and it runs about 38.8C on low fan setting. Dark, you didn't make a backup of the bios before updating? That is the first thing to do before updating a bios. Look at the bios screen when the computer starts and see what version you are running now, then check the motherboard home page and see if they have one that is older than what you have now. If not, google it. Good luck.

Re: returning to folding, problems with 680?

Posted: Sat Jan 05, 2013 4:16 am
by Dark_n_Beyond
Yes, 45C. I can run it cooler, but I keep the fans turned down, because I don't like to listen to them. Got the bios flashed back to what the motherboard shipped with. For whatever reason, I started getting the feeling it may be the pcie slot, so I moved the card to another. Doing some searching, that's probably unlikely, but possible. PSU checks out with a multimeter, which I figured it would. Ran MemtestG80 a few times with no errors, although I haven't figured out how to run more than 50 iterations. I'd like to run many more. I'll load up the cpu stress test and let that run.

Re: returning to folding, problems with 680?

Posted: Sat Jan 05, 2013 3:33 pm
by ford316
Nvidia just released 310.90 today for the 680 cards. It might be worth checking out, or maybe not; who knows until someone tries it and posts the results. Sent ya a pm, dark: try what I listed and post back here. I will check back later today or in the next few days. I am still working on the problem for ya, so don't give up. Also, dark, which version of Windows are you running?

Re: returning to folding, problems with 680?

Posted: Sat Jan 05, 2013 5:26 pm
by Dark_n_Beyond
ford316 wrote:Nvidia just released 310.90 today for the 680 cards. It might be worth checking out, or maybe not; who knows until someone tries it and posts the results. Sent ya a pm, dark: try what I listed and post back here. I will check back later today or in the next few days. I am still working on the problem for ya, so don't give up. Also, dark, which version of Windows are you running?
I don't think new drivers are a good idea at this point, I'll stick with what worked for a couple of months without issue. Running Windows 7 Ultimate 64.

After "downgrading" the bios and moving the card to another slot, spent a few hours and ran all the tests I could think of, and found no issue. Crossed my fingers and prayed, added the gpu slot, and made it thru the first work unit (7626 109,0,53). That's further than I've got in 3 weeks, and project 7626 was one that I had problems with, so hopefully this is progress. Now running 7624 78,4,4 and so far so good.

Only other thing of note, and whether it's really significant or not I'm not sure (being based on limited projects), is that the gpu is now folding at 42279 ppd, versus right around 43.5k when I was having issues. I'm also not experiencing any of the video lag and choppiness I was before (and has been talked about at length in other places) while doing other tasks.
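
To put those two figures in proportion, here is a quick back-of-the-envelope calculation in Python. It is only a sketch: 43500 stands in for "right around 43.5k", so it is an approximation, and percent_change is just a helper name chosen for this example.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


ppd_before = 43500  # "right around 43.5k" while the errors were occurring (approximate)
ppd_after = 42279   # reported after the bios downgrade and slot change

print(f"{percent_change(ppd_before, ppd_after):.1f}%")  # prints -2.8%
```

A drop of under 3 percent over a handful of work units is small enough that it may just reflect differences between projects.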

Re: returning to folding, problems with 680?

Posted: Sat Jan 05, 2013 7:46 pm
by ford316
Dark_n_Beyond wrote:After "downgrading" the bios and moving the card to another slot, spent a few hours and ran all the tests I could think of, and found no issue. Crossed my fingers and prayed, added the gpu slot, and made it thru the first work unit (7626 109,0,53). That's further than I've got in 3 weeks, and project 7626 was one that I had problems with, so hopefully this is progress. Now running 7624 78,4,4 and so far so good.

Only other thing of note, and whether it's really significant or not I'm not sure (being based on limited projects), is that the gpu is now folding at 42279 ppd, versus right around 43.5k when I was having issues. I'm also not experiencing any of the video lag and choppiness I was before (and has been talked about at length in other places) while doing other tasks.

That reminds me: a few weeks back when I upgraded my bios I had to go in and reset all the settings because everything went to default, so I lost my overclock and how I wanted it to boot, along with many other settings. So it could be that the pci slot went bad, or the bios upgrade changed things, or maybe some other program ya picked up along the way messed it up in one way or another. At least you got it working now. I am still researching it. Since I work on and build computers, it's one of the little problems that is not fully solved yet, and it's a challenge. :lol:

Re: returning to folding, problems with 680?

Posted: Sun Jan 06, 2013 3:22 pm
by Dark_n_Beyond
I've made it through 4 work units as of now, without a single error:
7626 109,0,53
7624 78,4,4
7623 594,6,13
7623 48,6,16
The last 2 ran with the most current bios. My suspicion is that it has something to do with the PCIE slot; whether the slot is bad, has something in it, or something else entirely will take more investigation than I have time for at the moment. I never would have thought of that if it weren't for finding some posts from 2-3 years ago about changing PCIE slots (I think the reasoning was different then, but I didn't have anything to lose). I'll obviously keep a close eye on it, but unless things get all flaky again, I'll consider this solved.
Thanks bollix47 for pointing out the stress tests, and ford316 for your help!