ChasingTheDream wrote:...I didn't leave a machine sitting on the BIOS screen for several hours. I suppose I can. The issue isn't keep the machines up. The issue is keeping the machine up and running the F@H client. As I've mentioned previously these machines ran for weeks unattended while Scrypt mining. They only have trouble when I try to use them for F@H and I'm not sure why. I'm aware the memory usage is quite different with F@H...
Okay, so if the system can sit idle for several hours without issue, then you know the problem only surfaces when there is load on the system. Note that it isn't only the memory usage that differs between Scrypt mining and F@H. The two can also exercise different GPU components, which may explain why you see the issue while folding but not while mining.
ChasingTheDream wrote:...I guess the issue is I'm not exactly sure what else can to done to troubleshoot. I've swapped GPU's, memory, fresh Windows installs, fresh AMD driver installs, under clocked the GPU's, under clocked the system memory, moved power cables, switched GPU slots, updated BIOS on all machines, switched to PCI2, and enabled crossfire. None of it has made any difference at all...
It seems that you have ruled out a significant portion of the hardware components. Below are the ones you may consider:
1) Motherboard -> Perhaps you were unlucky and got a batch of defective ones. Try swapping one with a board from a known-good system to see whether they are indeed defective.
2) HDD/SSD -> Maybe the drive is wonky and causing issues. One of my systems refused to boot and the cause turned out to be a faulty drive.
3) PSU -> Swap it with one from a known-good system to see if it makes any difference.
4) Power cables -> Are you using extenders? If so, try without any extenders and see if it solves your issue or not.
5) Minimum hardware -> Physically remove all excess hardware and only use the minimum basic hardware to fold, i.e. 1 RAM stick, CPU, 1 GPU, 1 HDD.
6) CPU -> Swap the CPU with one from a known-good system to see whether the CPU is damaged.
ChasingTheDream wrote:...The only thing left I can even think of it running the memory stress test and maybe CPU stess test. I will be stunned if all the memory in all the computers is bad though and same applies to the CPU...
Here are some F@H-specific tools:
CPU -> StressCPU (http://folding.stanford.edu/home/downlo ... ties#ntoc1)
GPU -> FAHBench (http://fahbench.com/) - The website is currently down and I have informed the appropriate personnel, so you may want to check it once it's available.
ChasingTheDream wrote:...I would be more inclined to think it is a hardware defect if it was one machine. It's hard to imagine I have the same defect in 6 nearly identical machines...
That would suggest that something common is the issue but unfortunately, it hasn't been discovered yet.
ChasingTheDream wrote:...The machines were up and running and I could even remote in to them but both GPU's on each machine were out of sync with the client and Windows was unable to do a restart on it's own. I had to intervene with a reset...
How did you remote in: Microsoft Remote Desktop, TeamViewer, etc.? Was there any driver reset information in the Windows Event Log?
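If it helps, here is a minimal sketch for pulling driver reset entries out of the Event Log from a script rather than clicking through Event Viewer. It shells out to the built-in wevtutil tool. The event ID is an assumption on my part: 4101 in the System log is what Windows usually records when a display driver stops responding and recovers, but your machines may log something different, so adjust it to whatever you actually see.

```python
import subprocess
import sys

# Assumption: Event ID 4101 in the System log is what Windows usually records
# when a display driver stops responding and is reset. Change it if needed.
DRIVER_RESET_EVENT_ID = 4101


def build_query_command(event_id=DRIVER_RESET_EVENT_ID, max_events=10):
    """Return the wevtutil command that lists the newest matching events."""
    xpath = "*[System[(EventID={})]]".format(event_id)
    return [
        "wevtutil", "qe", "System",   # query events from the System log
        "/q:" + xpath,                # filter on the event ID
        "/c:{}".format(max_events),   # limit the number of events returned
        "/rd:true",                   # newest events first
        "/f:text",                    # plain-text output
    ]


def recent_driver_resets(max_events=10):
    """Run the query on Windows and return wevtutil's text output."""
    result = subprocess.run(
        build_query_command(max_events=max_events),
        capture_output=True, text=True, check=False,
    )
    return result.stdout


if __name__ == "__main__":
    if sys.platform == "win32":
        print(recent_driver_resets() or "No matching events found.")
    else:
        print("This check only applies to Windows.")
```

Comparing the timestamps of any entries it returns against the times the GPUs dropped out would show whether driver resets line up with the failures.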
ChasingTheDream wrote:...The only thing with the auto restarts is that I would rather do it when an out of sync condition is detected and I'm not sure Windows will be stable enough to actually complete the restart on its own...
Assuming that checkpoints are written correctly, restarting the system shouldn't cause the loss of any WUs. Only the work done since the last checkpoint will be lost. Windows should be able to restart without any issues, so if it can't, that could indicate an unstable OS.
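For the "restart only when something is wrong" idea, a rough watchdog sketch is below. It is not an official F@H tool, and it rests on one assumption: that the client's log file keeps getting written to while folding is healthy. It only detects a log that has stopped growing, which may or may not coincide with the out of sync condition you describe. The log path, idle threshold, and polling interval are all placeholders you would need to set yourself.

```python
import os
import subprocess
import time


def is_stalled(last_update, now, max_idle_seconds):
    """True when nothing has been written for longer than max_idle_seconds."""
    return (now - last_update) > max_idle_seconds


def restart_command(delay_seconds=60):
    """Windows command for a forced restart after a short delay."""
    return ["shutdown", "/r", "/f", "/t", str(delay_seconds)]


def watch(log_path, max_idle_seconds=1800, poll_seconds=60):
    """Poll the log's modification time and restart once it goes stale.

    log_path is a placeholder: point it at wherever your client writes its log.
    """
    while True:
        last_update = os.path.getmtime(log_path)
        if is_stalled(last_update, time.time(), max_idle_seconds):
            # /f forces applications to close so a hung program cannot block the restart
            subprocess.run(restart_command(), check=False)
            return
        time.sleep(poll_seconds)
```

You would start it by calling watch() with the path to your log file, for example from a scheduled task at logon. If Windows itself is too unstable to complete the restart, a script like this will not help, and that would again point at the OS or hardware.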
ChasingTheDream wrote:...I've got about two more weeks before my available time drops to next to nothing. At that point detailed troubleshooting just isn't going to be an option because I'll only be near the machines a couple hours a day at most so the machines will need to be able to run unattended. So far they have shown zero ability to do that while folding...
You currently have two options. Neither is standard and both are considered experimental:
1) You can install Ubuntu 12.04 64-bit with the AMD proprietary driver. Then install V7.4.4, set client-type to advanced, and allow it to fold FahCore_17 WUs. Help can be provided if you need it.
2) You could install Ubuntu 12.04 with the AMD proprietary driver and attempt to run ocores (viewtopic.php?f=66&t=26218). However, do note that since you aren't a Beta Team Member, Forum policy means help can't be provided in this Forum, but you can get help on IRC.