General Troubleshooting ideas

bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: General Troubleshooting ideas

Post by bruce »

ChasingTheDream wrote:I guess my question would be why would the FAHClient not launch as it always does when I launch FAHControl? It is something I can look for in the future though.
PantherX has already answered this question, but let me explain it in a different way.

"Launch" is the wrong word. "Connect" would be more accurate.

FAHClient is designed to operate as a service, meaning it should always be running in the background, hidden from view. It can be monitored and/or controlled with FAHControl (or WEBControl). If you're not interested in either monitoring or changing settings, FAHControl need not be running, and FAHClient will continue to process in accordance with the last settings you provided. For example, if you want to start or stop folding, you tell FAHControl to send commands to FAHClient. Every system that folds needs FAHClient running in the background, just as every system that has a printer needs the printer daemon running in the background in case some other application sends it data to process.

Moreover, a single copy of FAHControl can manage/monitor a number of copies of FAHClient running on various systems by remote control.
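
To make the "connect, not launch" distinction concrete, here is a minimal Python sketch that only checks whether a FAHClient is accepting connections on a given machine. It assumes the v7 client listens on TCP port 36330 (the usual default, but check your own configuration); the function name `client_is_reachable` is made up for this illustration and is not part of FAH.

```python
import socket

# Assumption: FAHClient v7 accepts control connections on TCP port 36330
# by default. Change this if your client is configured differently.
DEFAULT_PORT = 36330


def client_is_reachable(host="127.0.0.1", port=DEFAULT_PORT, timeout=2.0):
    """Return True if something accepts a TCP connection at host:port.

    This does not start FAHClient. It only tests whether a control
    program could connect to a client that is already running.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical list of machines; replace with your own host names.
    for host in ["127.0.0.1"]:
        state = "reachable" if client_is_reachable(host) else "not reachable"
        print(f"FAHClient on {host}: {state}")
```

If this reports "not reachable" on a machine that should be folding, the background client is probably not running there (or a firewall is in the way), and opening FAHControl will not change that.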
ChasingTheDream
Posts: 56
Joined: Mon Jun 02, 2014 10:56 pm

Re: General Troubleshooting ideas

Post by ChasingTheDream »

Thanks for all the info PantherX and Bruce!
PantherX wrote:BTW, are those systems headless or not?
No. Two of the systems have a monitor plugged directly into them. The other four systems share a monitor via an HDMI switch.
PantherX wrote:You can use Task Scheduler to reboot the system after every 6 or 12 hours. Maybe it might help you run those systems unattended.
I set up a task to do this via a batch file, but found that if too many driver failures happen it won't work, just like remote rebooting via TeamViewer. When that happens, the only thing that will reboot the machine is pressing the reset button.

Update on the systems:

Yesterday I pulled a GPU out of three different systems and reset the clocks on the GPU's I left in the systems (single GPU's now) back to their default factory overclock settings. So far all the systems have run without any issues at the much higher clock speeds.
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: General Troubleshooting ideas

Post by PantherX »

ChasingTheDream wrote:...Two of the systems have a monitor plugged directly into them. The other four systems share a monitor via an HDMI switch...
Not sure if you will find this helpful, but it could be worth trying:
1) For a dual GPU system, try plugging in two real monitors, one in each GPU and see if the stability improves or not.
2) For the shared systems, remove the HDMI switch to ensure that the switch isn't causing issues with the GPUs. I know that you can connect to those headless systems via TeamViewer; the screen resolution might be low, but it should be sufficient for you to see whether the GPUs are working (if you have remote monitoring configured, you can use FAHControl to view the performance).

Some years ago, in order for multiple GPUs to fold in a single system, each GPU needed to have a monitor plugged into it to operate in 3D mode. Later, a dummy plug was used (to trick the GPU into thinking that a monitor was attached) and eventually, the drivers supported GPUs without an active monitor connection so you can fold on a headless system without worrying that the GPU is operating in 2D mode or has stopped functioning.
ChasingTheDream wrote:...Yesterday I pulled a GPU out of three different systems and reset the clocks on the GPU's I left in the systems (single GPU's now) back to their default factory overclock settings. So far all the systems have run without any issues at the much higher clock speeds.
So, with factory default frequencies, 3 systems are folding without issues, each folding on the CPU and GPU, right?
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
ChasingTheDream
Posts: 56
Joined: Mon Jun 02, 2014 10:56 pm

Re: General Troubleshooting ideas

Post by ChasingTheDream »

PantherX wrote: Not sure if you will find this helpful, but it could be worth trying:
1) For a dual GPU system, try plugging in two real monitors, one in each GPU and see if the stability improves or not.
2) For the shared systems, remove the HDMI switch to ensure that the switch isn't causing issues with the GPUs. I know that you can connect to those headless systems via TeamViewer; the screen resolution might be low, but it should be sufficient for you to see whether the GPUs are working (if you have remote monitoring configured, you can use FAHControl to view the performance).

Some years ago, in order for multiple GPUs to fold in a single system, each GPU needed to have a monitor plugged into it to operate in 3D mode. Later, a dummy plug was used (to trick the GPU into thinking that a monitor was attached) and eventually, the drivers supported GPUs without an active monitor connection so you can fold on a headless system without worrying that the GPU is operating in 2D mode or has stopped functioning.
I had done this some time ago and found no difference in stability. In fact, the machines that behave the best are actually the machines that use the HDMI switch.

I had to use dummy plugs on some old 7850's and actually thought I was going to have to with the R9 290X TRI-X's. Imagine my surprise when I discovered they were digital only and couldn't use all the neat resistors I had bought in preparation for the dummy plugs... :oops:

I can try two monitors on the same machine to see if it makes a difference but it wouldn't be very practical to use that way (I would need 8 more monitors) and the most stable machine I have that uses two GPU's (the one with different hardware from the six machines I've been talking about here) uses one monitor. It also has crossfire enabled because I do game with that machine at times.

I've tried crossfire on the six machines that I've been having trouble with as well but it made no difference so I disabled it again.
PantherX wrote: So, with factory default frequencies, 3 systems are folding without issues, each folding on the CPU and GPU, right?
Yep, that is right. Now, of the three machines that are still using two GPU's and that have been very stable for several days, I've got one that is locking up all the time again. It started doing that today. So I went to set a GPU to finish so I could pull it and noticed that this machine is now working on project 10466. Hmmm... That is the same project that one of my other machines simply could not finish. It failed constantly.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: General Troubleshooting ideas

Post by bruce »

Crossfire isn't used by FAH and setting/unsetting it shouldn't matter.

How many of the machines have pairs of matched/unmatched GPUs and is there any correlation with the troubles?

You don't need a lot of extra monitors. Just move one or two around for long enough to establish a pattern. (I doubt this will matter, but it doesn't hurt to try.)
ChasingTheDream
Posts: 56
Joined: Mon Jun 02, 2014 10:56 pm

Re: General Troubleshooting ideas

Post by ChasingTheDream »

bruce wrote: How many of the machines have pairs of matched/unmatched GPUs and is there any correlation with the troubles?
I'm not sure what you mean by the pairs of matched / unmatched GPU's. All the GPU's were exactly the same. Bought at the same time. There are six machines that appear to have the issues and each of them runs fine with a single GPU and CPU folding enabled at full factory clock speeds. With two GPU's installed they all become highly unstable but not all at once. They have their moments so that it appears things are getting better and then they will fail repeatedly.

Right now I've removed the second GPU from 4 of the 6 machines and none of the machines have had an issue since they went down to one GPU.

As the two remaining machines run into their issues (and they will; it is just a matter of time), I'll remove their second GPU as well.

Still not sure what I'm going to do with the spare GPU's. I may just hold them as spares since their prices have dropped so much. Not much point in selling them at current prices.

I do have another machine that also runs two of the same GPU's but it has been stable from the start although I did underclock the GPU's in this system as well. It has completely different hardware than the six systems I've been talking about. The only common component among them would be an SSD drive.
bruce wrote: You don't need a lot of extra monitors. Just move one or two around for long enough to establish a pattern. (I doubt this will matter, but it doesn't hurt to try.)
Understood. I just meant if that really was the issue I would need to buy a lot of monitors and it wouldn't be practical. I tried all sorts of things early on though and I'm reasonably sure I already used two monitors on one of the machines and it didn't make any difference. I may give it a go when I get some free time and deal with the stability issues while trying to complete the WU if I add a second GPU back into a system.

One update about what I had mentioned earlier. Project 10466 was the WU that still needed to process when I pulled the GPU out of the machine today, and it completed without incident with only a single GPU in the system. This machine also became very unstable when that WU was processing with two GPU's in the system, although it could just be a coincidence.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: General Troubleshooting ideas

Post by bruce »

If FAH is configured for two GPUs, there will be two slots and each one will download a WU. If you remove a GPU, FAH will reconfigure itself by deleting one slot. Any WU which was previously assigned to that slot will be moved to another slot, provided there is another slot that has the same configuration -- which would apply to you since all GPUs are identical. If there is no other slot that's similar, the WU will be deleted.

When testing two GPUs one at a time, I do not recommend removing a GPU. You should get the same results by simply suspending processing on one slot or the other.
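
The slot behaviour described above can be sketched as a small Python model. This is only an illustration of the rule as stated in this post (move the WU to a slot with the same configuration, otherwise delete it), not FAHClient's actual code; the `Slot` class, the `remove_slot` function and the WU names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Slot:
    slot_id: int
    config: str                      # hypothetical label, e.g. the GPU model
    work_units: List[str] = field(default_factory=list)


def remove_slot(slots: List[Slot], slot_id: int) -> Tuple[List[Slot], List[str]]:
    """Remove one slot and return (remaining_slots, deleted_work_units).

    Work units from the removed slot are appended to the first remaining
    slot with the same configuration. If no such slot exists, they are
    reported as deleted.
    """
    removed = next((s for s in slots if s.slot_id == slot_id), None)
    if removed is None:
        raise ValueError(f"no slot with id {slot_id}")

    remaining = [s for s in slots if s.slot_id != slot_id]
    match = next((s for s in remaining if s.config == removed.config), None)

    if match is None:
        # No similar slot is left, so the work is lost.
        return remaining, list(removed.work_units)

    # A similar slot exists, so the work moves there and nothing is deleted.
    match.work_units.extend(removed.work_units)
    return remaining, []


if __name__ == "__main__":
    slots = [
        Slot(0, "cpu", ["wu-a"]),
        Slot(1, "R9 290X", ["wu-b"]),
        Slot(2, "R9 290X", ["wu-c"]),
    ]
    remaining, deleted = remove_slot(slots, 2)
    print([(s.slot_id, s.work_units) for s in remaining], deleted)
```

In this model, removing one of two identical GPU slots keeps its WU, while removing the only slot of a given type loses it. Suspending a slot instead of removing the hardware avoids the question entirely.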
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Location: Land Of The Long White Cloud

Re: General Troubleshooting ideas

Post by PantherX »

ChasingTheDream wrote:...I had done this some time ago and found no difference in stability. In fact, the machines that behave the best are actually the machines that use the HDMI switch.

I had to use dummy plugs on some old 7850's and actually thought I was going to have to with the R9 290X TRI-X's. Imagine my surprise when I discovered they were digital only and couldn't use all the neat resistors I had bought in preparation for the dummy plugs... :oops:

I can try two monitors on the same machine to see if it makes a difference but it wouldn't be very practical to use that way (I would need 8 more monitors) and the most stable machine I have that uses two GPU's (the one with different hardware from the six machines I've been talking about here) uses one monitor. It also has crossfire enabled because I do game with that machine at times...
Okay, in that case, see if not connecting the monitor to either GPU has any effect or not. You may reboot the system while headless and see if the stability is affected by it or not.
ChasingTheDream
Posts: 56
Joined: Mon Jun 02, 2014 10:56 pm

Re: General Troubleshooting ideas

Post by ChasingTheDream »

bruce wrote: When testing two GPUs one at a time, I do not recommend removing a GPU. You should get the same results by simply suspending processing on one slot or the other.
Understood, Bruce. I actually did what you suggested for the first GPU rotation test roughly a week ago, and each GPU ran for days without issues while folding alone. They simply won't run very long together. What caused me to actually pull a GPU was when one of the machines started getting driver failures literally within seconds of starting to fold. In 36 hours it processed less than 4% of a WU. I wrote a long post about it earlier in this thread.

So now I've started moving toward stability because my time will be much more limited next week. I've removed a GPU from 4 of the 6 systems I've been talking about. I had one Windows crash and recovery on one of the machines since I've been folding with a single GPU and the CPU. There have been no GPU hangs, no driver failures, etc.

In any event, I'm not sure what else we are going to try. So far nothing has made any difference at all and that is after 5 weeks of non-stop experimenting. I'm guessing these machines will just have to run with a single GPU. I can always add one back in for testing if there are new ideas.
PantherX wrote: Okay, in that case, see if not connecting the monitor to either GPU has any effect or not. You may reboot the system while headless and see if the stability is affected by it or not.
I actually can't run headless on these motherboards. The BIOS will throw a D6 error code because it requires a console output device to be detected. Each machine also has a keyboard because without one the BIOS will throw a D7 error code.

I called EVGA support at the time because I couldn't find a way to override it in the BIOS. The reason I couldn't find it was because there was no way to override it in the BIOS. At least not in the version I had at the time. I have since upgraded but I haven't looked through all the BIOS settings to see if they added an override.
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Location: Land Of The Long White Cloud

Re: General Troubleshooting ideas

Post by PantherX »

ChasingTheDream wrote:...I actually can't run headless on these motherboards. The BIOS will throw a D6 error code because it requires a console output device to be detected. Each machine also has a keyboard because without one the BIOS will throw a D7 error code.

I called EVGA support at the time because I couldn't find a way to override it in the BIOS. The reason I couldn't find it was because there was no way to override it in the BIOS. At least not in the version I had at the time. I have since upgraded but I haven't looked through all the BIOS settings to see if they added an override.
Humm, that's unexpected since I expected it to boot without any external devices. If your new BIOS doesn't allow you to run headless, what happens if you disconnect the monitor once the system is booted but F@H hasn't started folding and then start folding without the monitor attached?
ChasingTheDream
Posts: 56
Joined: Mon Jun 02, 2014 10:56 pm

Re: General Troubleshooting ideas

Post by ChasingTheDream »

I have had the most peaceful few days ever since I started folding. I haven't made any changes. The machines just seem to run with one GPU. Two of the six machines still have two GPU's. No rhyme or reason why two of them will run and four of them won't, but I think I'm going to leave it alone at this point.

I can unplug the monitors etc just to see what happens but in all honesty I don't see the point. Assuming the machines ran more stable "headless" (which I have no idea why it would) it isn't an option long term because a monitor has to be attached to the system for the machine to reboot. So it would just mean I have to do manual intervention any time a reboot is required which is what I'm trying to avoid.

I'll just leave them as they are for now. At least they don't require much time to maintain at this point. I certainly have spare hardware as a result of this but it is what it is. I appreciate all the information and ideas on this matter. Unfortunately, unless I'm willing to change motherboards I don't think we are going to resolve this issue.
P5-133XL
Posts: 2948
Joined: Sun Dec 02, 2007 4:36 am
Hardware configuration: Machine #1:

Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).

Machine #2:

Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.

Machine 3:

Dell Dimension 8400, 3.2GHz P4 4x512GB Ram, Video card GTX 460, Windows 7 X32

I am currently folding just on the 5x GTX 460's for aprox. 70K PPD
Location: Salem. OR USA

Re: General Troubleshooting ideas

Post by P5-133XL »

Sorry we could not help more...
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: General Troubleshooting ideas

Post by bruce »

You may have already tried this, but I have a vague recollection of reports that if one GPU is working and if you add a second GPU, you need to reinstall the drivers -- even if the GPUs happen to be identical. That's a step that might easily have been overlooked.
ChasingTheDream
Posts: 56
Joined: Mon Jun 02, 2014 10:56 pm

Re: General Troubleshooting ideas

Post by ChasingTheDream »

bruce wrote:You may have already tried this, but I have a vague recollection of reports that if one GPU is working and if you add a second GPU, you need to reinstall the drivers -- even if the GPUs happen to be identical. That's a step that might easily have been overlooked.
Thanks Bruce, yeah unfortunately I tried that many times. I've probably reinstalled drivers dozens of times at this point. Sometimes back to back because I had run into the issue you mention while I was mining as well.

I have found one of the two machines that was running with two GPU's doesn't seem to want to now. I expected that. I haven't pulled the second GPU from that machine yet but if it fails more than once a day I will remove it.

I may try overvolting the CPU just to try it but I don't see how the CPU can be the problem when I had the same issue while the CPU was under very little load when I had the CPU folding disabled.

I suspect if I really want to find the cause of the issue I'm going to have to disassemble the machines and try a new motherboard. A candidate would be using the same ASUS motherboard I use in the machine that has always run, but I don't want to put that much time into it at this point. Not to mention I'm done putting more money into the project. I'll just have to run with what I got.
ChasingTheDream
Posts: 56
Joined: Mon Jun 02, 2014 10:56 pm

Re: General Troubleshooting ideas

Post by ChasingTheDream »

It appears that the issues this whole thread was about are now solved with the release of AMD's CCC 14.7rc3 drivers. Literally all the issues I was having on all my machines have now gone away. Machines that would not run two R9 290X TRI-X cards are now running three without incident. I finally had a chance to try the new drivers when I needed to wipe a machine. They appeared more stable and provided higher PPD, so I put the new beta drivers on all the machines. Then I started adding cards. So it appears it was a driver issue all along, and since so much time was spent on it I wanted to report what seems to have fixed it.