Problems with 155.247.166.220?


patermann
Posts: 17
Joined: Mon Apr 21, 2008 9:39 am


Post by patermann »

I have two slots, one of which is assigned to work server 155.247.166.220. It finished a work unit yesterday but failed to upload it to the work server, so it sent it to the collection server 155.247.166.219 instead. Since then, however, it has been sitting idle, and there are no messages for that slot in the log after the work unit was uploaded. (FAHClient shows the slot as "Ready" and the status as "Download", but everything else is 0/Unknown, and it has been that way for more than 24 hours now.) The other slot is assigned to work server 171.64.65.99 and is folding away merrily. Everything looks OK on the server status page, so I do not understand why it is not working. Any ideas?

Thanks,
patermann
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Problems with 155.247.166.220?

Post by Joe_H »

There is a known problem where the client sometimes fails to detect a failed download or upload attempt and then retry it. It is not a frequent issue, but once it happens, the only known way out is to restart the FAHClient process. The easiest way to do this is a system reboot, or, on Windows, pause folding and then stop and restart the FAHClient process from the taskbar icon. Restarting the FAHClient process without a reboot is a bit more involved on Linux and OS X.
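To make the failure mode concrete: a transfer loop is normally expected to notice a failed attempt and schedule another one. The sketch below is a generic Python illustration of that detect-and-retry pattern with exponential backoff. It is not FAHClient's code, and `transfer_with_retry`, `attempt`, `max_tries` and `base_delay` are hypothetical names chosen for the example. As described above, the bug appears to be the client never reaching the retry step, which would leave the slot sitting at "Download" until the process is restarted.

```python
import time
from typing import Callable


def transfer_with_retry(attempt: Callable[[], bool],
                        max_tries: int = 5,
                        base_delay: float = 10.0,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call attempt() until it succeeds or max_tries is exhausted.

    attempt() is a hypothetical stand-in for one download/upload try:
    it returns True on success and False (or raises OSError) on failure.
    The wait between tries doubles after each failure (exponential backoff).
    sleep is injectable so the loop can be tested without real waiting.
    """
    delay = base_delay
    for n in range(1, max_tries + 1):
        try:
            if attempt():
                return True  # transfer succeeded, stop retrying
        except OSError:
            pass  # treat a network error the same as a failed attempt
        if n < max_tries:
            sleep(delay)  # wait before the next try
            delay *= 2
    return False  # every try failed; caller must decide what to do next
```

The key point is the explicit check after every attempt: if that check is skipped or the failure goes unnoticed, nothing ever schedules the next try.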

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Problems with 155.247.166.220?

Post by bruce »

Server 155.247.166.220 seems to be operating normally, so you must have encountered the bug Joe_H is talking about.
Ricky
Posts: 474
Joined: Sat Aug 01, 2015 1:34 am
Hardware configuration: 1. 2 each E5-2630 V3 processors, 64 GB RAM, GTX980SC GPU, and GTX980 GPU running on the Windows 8.1 operating system.
2. I7-6950X V3 processor, 32 GB RAM, 1 GTX980tiFTW, and 2 each GTX1080FTW GPUs running on the Windows 8.1 operating system.
Location: New Mexico

Re: Problems with 155.247.166.220?

Post by Ricky »

Somehow I have developed this problem on one of my two Windows machines. I don't believe it has anything to do with a particular server. After a reboot, I can go about a day running multiple WUs on the CPU slot and two GPU slots. After about a day of folding, the CPU slot stalls on a download and never restarts without a reboot. This problem started about a week ago.

When I reboot, one GPU work unit gets dumped most of the time for bad platform size. I believe this is from the client not reassigning the right WU to the right GPU on restart. One card is a GTX980 and the other is a GTX980ti. So I have to let the GPU work units finish before rebooting to guarantee there is nothing to dump. This issue substantially reduces the folding efficiency of the computer. It would be best to give up on folding in the CPU slot if I can't resolve its stalled download problem.

UPDATE:

I had a GPU slot stall on a download while the CPU slot still had its stalled download. I turned off the Windows Firewall, and the GPU slot immediately started its download and completed it. I then rebooted to get the CPU slot to download, but lost the GTX980ti work unit. There are two copies of FAH listed in the firewall monitor: one was private and the other was public. I have now set FAHControl in the firewall to both public and private.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Problems with 155.247.166.220?

Post by bruce »

Ricky wrote: I believe this is from the client not reassigning the right WU to the right GPU on restart. One card is a GTX980 and the other is a GTX980ti.
That's not what's happening. The GTX980 and the GTX980 ti are both classified as GPU type 2, subtype 5 (see GPUs.txt) so FAH treats those two GPUs identically. Even if they're reassigned as you suggest, the WU can be completed by the other GPU.

Internally, they are, in fact, different (a GM204 and a GM200), but the drivers present the same OpenCL 1.2 interface, so FAH doesn't distinguish between the two. What is not clear is what internal differences might exist within the drivers, but that's a question for NVIDIA. (I expect that they also present the same CUDA interface, but I have not confirmed that. Not that it matters, though, since CUDA isn't currently used by the FAHCores.)
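To illustrate why the two cards look the same to FAH, here is a minimal Python sketch that parses GPUs.txt-style entries and compares the (type, subtype) pair. It assumes a colon-separated `vendor:device:type:subtype:description` line layout and uses two illustrative entries whose type 2, subtype 5 classification comes from the post above; check your own copy of GPUs.txt for the real format and contents. `parse_gpus` and `same_fah_class` are hypothetical helpers, not part of FAHClient.

```python
from typing import Dict, Tuple

# Illustrative excerpt only; the line layout is an assumption:
# vendor:device:type:subtype:description
SAMPLE = """\
0x10de:0x13c0:2:5:GM204 [GeForce GTX 980]
0x10de:0x17c8:2:5:GM200 [GeForce GTX 980 Ti]
"""

GpuTable = Dict[Tuple[int, int], Tuple[int, int, str]]


def parse_gpus(text: str) -> GpuTable:
    """Map (vendor_id, device_id) -> (type, subtype, description)."""
    table: GpuTable = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # maxsplit=4 keeps any colons inside the description intact
        vendor, device, gtype, subtype, desc = line.split(":", 4)
        table[(int(vendor, 16), int(device, 16))] = (int(gtype), int(subtype), desc)
    return table


def same_fah_class(table: GpuTable, a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two PCI IDs share the same (type, subtype) classification."""
    return table[a][:2] == table[b][:2]
```

With both entries classified as type 2, subtype 5, `same_fah_class` reports them as interchangeable, even though the descriptions name different chips.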

UPDATE: When a firewall misbehaves, it can be tricky to fix, other than by moving to a different firewall and hoping it doesn't act the same way.