Multiple WU's Fail downld/upld to 155.247.166.*

Moderators: Site Moderators, FAHC Science Team

bollix47
Posts: 2958
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Post by bollix47 »

I had a similar problem last weekend ... here's what worked for me:

Open FAHControl
Click on Pause
Exit FAHControl
Re-boot computer
Open FAHControl
Click on Fold
Frisa
Posts: 26
Joined: Fri Jan 18, 2019 6:34 am

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Post by Frisa »

As a CPU-only folder, I noticed one interesting thing: A7 projects never failed to download or upload.
Currently most A7 WUs are assigned from the A7-only server 128.252.203.9, but occasionally I get WUs from 155.247.166.219. Sometimes I get transfer failures from 128.252.203.9, but I have not had a SINGLE transfer failure from the .219 server in a month.
bollix47
Posts: 2958
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Post by bollix47 »

My failure on the weekend was an a7 project ... unfortunately it happens to all projects regardless of the core. It does not happen every time and some will not experience it for weeks at a time or maybe never, but it's certainly a 'pain' when it happens. The fact is that the core has nothing to do with the download/upload sequences ... that's all done by the client.
biodoc
Posts: 20
Joined: Sun Jan 06, 2008 10:15 am

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Post by biodoc »

bollix47 wrote:I had a similar problem last weekend ... here's what worked for me:

Open FAHControl
Click on Pause
Exit FAHControl
Re-boot computer
Open FAHControl
Click on Fold
This worked. Thanks!

I tried stopping FAHClient, killing any remaining processes owned by the fahclient user, and then restarting the client, but that didn't work in this case. Rebooting Linux is not a satisfying solution for me, but it worked.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Post by bruce »

There's a known bug in FAHCore_A7. It has been fixed in a new version of that FAHCore, which is being beta tested, so it should be ready for release soon. The bug adds extra (unnecessary) information to the result, making the file too large to upload, and excessively large uploads are being rejected by the servers. (Yours is 68 MiB; it should be about 10 MiB.)

I would probably discard that file, but it will eventually expire and delete itself from your system.

Most likely you have been processing that WU with the "on idle" setting. I recommend you discontinue using that setting until the new CPU FAHCore_a7 is released.
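One way to spot an inflated result like this is to scan the client log for the size reported on each "Uploading" line. A minimal Python sketch, assuming log lines shaped like `2019-11-06:23:46:54:WU01:FS01:Uploading 9.30MiB to 155.247.166.220` (the format seen in the logs later in this thread). The 30 MiB threshold is a hypothetical cut-off; the servers' actual upload limit is not stated here.

```python
import re

# Matches client log lines such as:
# 2019-11-06:23:46:54:WU01:FS01:Uploading 9.30MiB to 155.247.166.220
UPLOAD_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}:\d{2}:\d{2}:\d{2}:(?P<wu>WU\d+):FS\d+:"
    r"Uploading (?P<size>[\d.]+)MiB to (?P<host>\S+)$"
)

# Hypothetical threshold: the real server-side limit is not given in the thread.
SUSPICIOUS_MIB = 30.0


def oversized_uploads(lines, limit_mib=SUSPICIOUS_MIB):
    """Return (wu, host, size_mib) for every upload larger than limit_mib."""
    flagged = []
    for line in lines:
        m = UPLOAD_RE.match(line.strip())
        if m and float(m.group("size")) > limit_mib:
            flagged.append((m.group("wu"), m.group("host"), float(m.group("size"))))
    return flagged
```

A normal ~10 MiB GPU or CPU result passes; a 68 MiB result like the one above would be flagged.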
Catalina588
Posts: 41
Joined: Thu Oct 09, 2008 8:59 pm

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Post by Catalina588 »

November 6 1600 EST - Temple server .219 is failing to download GPU work units.
DocJonz
Posts: 244
Joined: Thu Dec 06, 2007 6:31 pm
Hardware configuration: Folding with: 4x RTX 4070Ti, 1x RTX 4080 Super
Location: United Kingdom

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Post by DocJonz »

Catalina588 wrote:November 6 1600 EST - Temple server .219 is failing to download GPU work units.
I concur - looks like the download issues are back with the 155.247.166.* server.
Folding Stats (HFM.NET): DocJonz Folding Farm Stats
JimF
Posts: 651
Joined: Thu Jan 21, 2010 2:03 pm

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Post by JimF »

I am down on two out of four Folding machines. I will keep them down until/unless someone can give the "all clear".
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Post by bruce »

What messages are you seeing when .219 doesn't issue a WU?

=====

The bug in the CPU FAHCore_A7 has been fixed, so CPU WUs going out now are no longer inflated with extra data that consumes bandwidth. Over the next couple of weeks, the CPU WUs still being processed will be completed and the congestion problem will gradually ease.
dfgirl12
Posts: 38
Joined: Fri Aug 21, 2009 8:34 am

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Post by dfgirl12 »

Same. I've been getting hung folding slots for the past 4+ hours. Downloads fail from *.219, and just stop like this:


2019-11-06:23:46:53:WU01:FS01:0x21:Completed 25000000 out of 25000000 steps (100%)
2019-11-06:23:46:53:WU01:FS01:0x21:Saving result file logfile_01.txt
2019-11-06:23:46:53:WU01:FS01:0x21:Saving result file checkpointState.xml
2019-11-06:23:46:53:WU01:FS01:0x21:Saving result file checkpt.crc
2019-11-06:23:46:53:WU01:FS01:0x21:Saving result file log.txt
2019-11-06:23:46:53:WU01:FS01:0x21:Saving result file positions.xtc
2019-11-06:23:46:54:WU01:FS01:0x21:Folding@home Core Shutdown: FINISHED_UNIT
2019-11-06:23:46:54:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
2019-11-06:23:46:54:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:14191 run:17 clone:13 gen:89 core:0x21 unit:0x0000007f0002894c5d5d742b6b992b05
2019-11-06:23:46:54:WU01:FS01:Uploading 9.30MiB to 155.247.166.220
2019-11-06:23:46:54:WU01:FS01:Connecting to 155.247.166.220:8080
2019-11-06:23:46:54:WU02:FS01:Connecting to 65.254.110.245:8080
2019-11-06:23:46:54:WU02:FS01:Assigned to work server 155.247.166.219
2019-11-06:23:46:54:WU02:FS01:Requesting new work unit for slot 01: READY gpu:1:GP104 [GeForce GTX 1080] 8873 from 155.247.166.219
2019-11-06:23:46:54:WU02:FS01:Connecting to 155.247.166.219:8080
2019-11-06:23:46:55:WU02:FS01:Downloading 27.50MiB
2019-11-06:23:47:00:WU01:FS01:Upload 11.43%
2019-11-06:23:47:02:WU02:FS01:Download 1.59%
2019-11-06:23:47:06:WU01:FS01:Upload 21.51%
2019-11-06:23:47:09:WU02:FS01:Download 2.27%
2019-11-06:23:47:13:WU01:FS01:Upload 30.92%
2019-11-06:23:47:17:WU02:FS01:Download 2.73%
2019-11-06:23:47:19:WU01:FS01:Upload 43.02%
2019-11-06:23:47:24:WU02:FS01:Download 3.18%
2019-11-06:23:47:25:WU01:FS01:Upload 56.46%
2019-11-06:23:47:31:WU01:FS01:Upload 67.88%
2019-11-06:23:47:33:WU02:FS01:Download 3.86%
2019-11-06:23:47:37:WU01:FS01:Upload 84.02%
2019-11-06:23:47:42:WU02:FS01:Download 4.32%
2019-11-06:23:47:43:WU01:FS01:Upload 96.79%
2019-11-06:23:47:45:WU01:FS01:Upload complete
2019-11-06:23:47:45:WU01:FS01:Server responded WORK_ACK (400)
2019-11-06:23:47:45:WU01:FS01:Final credit estimate, 192064.00 points
2019-11-06:23:47:45:WU01:FS01:Cleaning up
2019-11-06:23:48:21:WU02:FS01:Download 4.55%
2019-11-06:23:48:29:WU02:FS01:Download 4.77%
2019-11-06:23:48:35:WU02:FS01:Download 5.23%
2019-11-06:23:51:11:WU02:FS01:Download 5.45%
2019-11-06:23:51:12:ERROR:WU02:FS01:Exception: Transfer failed
2019-11-06:23:51:13:WU02:FS01:Connecting to 65.254.110.245:8080
2019-11-06:23:51:13:WU02:FS01:Assigned to work server 155.247.166.219
2019-11-06:23:51:13:WU02:FS01:Requesting new work unit for slot 01: READY gpu:1:GP104 [GeForce GTX 1080] 8873 from 155.247.166.219
2019-11-06:23:51:13:WU02:FS01:Connecting to 155.247.166.219:8080
2019-11-06:23:51:13:WU02:FS01:Downloading 27.45MiB
2019-11-06:23:51:21:WU02:FS01:Download 0.46%
2019-11-06:23:51:28:WU02:FS01:Download 1.59%
2019-11-06:23:51:54:WU02:FS01:Download 2.28%
2019-11-06:23:52:20:WU02:FS01:Download 2.96%
2019-11-06:23:52:26:WU02:FS01:Download 3.87%
2019-11-06:23:52:33:WU02:FS01:Download 4.55%
2019-11-06:23:52:39:WU02:FS01:Download 5.46%
2019-11-06:23:52:45:WU02:FS01:Download 6.15%
2019-11-06:23:52:56:WU02:FS01:Download 7.06%
2019-11-06:23:53:03:WU02:FS01:Download 7.51%
2019-11-06:23:53:10:WU02:FS01:Download 8.42%
2019-11-06:23:53:20:WU02:FS01:Download 8.65%
2019-11-06:23:53:26:WU02:FS01:Download 8.88%
2019-11-06:23:53:35:WU02:FS01:Download 9.56%
2019-11-06:23:53:42:WU02:FS01:Download 9.79%
2019-11-06:23:53:49:WU02:FS01:Download 10.47%
2019-11-06:23:53:56:WU02:FS01:Download 10.93%
2019-11-06:23:54:02:WU02:FS01:Download 11.15%
2019-11-06:23:54:26:WU02:FS01:Download 11.38%
2019-11-06:23:54:32:WU02:FS01:Download 12.29%
2019-11-06:23:54:40:WU02:FS01:Download 12.98%
2019-11-06:23:54:46:WU02:FS01:Download 13.43%
2019-11-06:23:54:53:WU02:FS01:Download 14.34%
2019-11-06:23:55:14:WU02:FS01:Download 14.57%
2019-11-06:23:55:20:WU02:FS01:Download 15.94%
2019-11-06:23:55:26:WU02:FS01:Download 16.85%
2019-11-06:23:55:32:WU02:FS01:Download 17.76%
2019-11-06:23:55:38:WU02:FS01:Download 19.35%
2019-11-06:23:55:45:WU02:FS01:Download 20.72%
2019-11-06:23:55:52:WU02:FS01:Download 21.63%
2019-11-06:23:55:58:WU02:FS01:Download 22.31%
2019-11-06:23:56:05:WU02:FS01:Download 23.22%
2019-11-06:23:56:11:WU02:FS01:Download 24.13%
2019-11-06:23:56:17:WU02:FS01:Download 25.04%
2019-11-06:23:56:36:WU02:FS01:Download 25.95%
2019-11-06:23:56:45:WU02:FS01:Download 26.18%
2019-11-06:23:56:53:WU02:FS01:Download 26.41%
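To put a number on how slow these stuck downloads are, the progress lines can be turned into an effective transfer rate per attempt. A minimal Python sketch, assuming the log format shown above; `download_rates` is a hypothetical helper, not part of FAHClient.

```python
import re
from datetime import datetime

# Timestamp, optional "ERROR:" tag, WU id, slot, then the message text.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}:\d{2}:\d{2}:\d{2}):(?:ERROR:)?"
    r"(?P<wu>WU\d+):FS\d+:(?P<msg>.*)$"
)
START_RE = re.compile(r"^Downloading (?P<size>[\d.]+)MiB$")
PROGRESS_RE = re.compile(r"^Download (?P<pct>[\d.]+)%$")
TS_FMT = "%Y-%m-%d:%H:%M:%S"


def _summarize(wu, state):
    """Turn [start, size_mib, last_time, last_pct] into (wu, size, pct, KiB/s)."""
    start, size_mib, last, pct = state
    elapsed = (last - start).total_seconds()
    done_kib = size_mib * 1024 * pct / 100
    rate = done_kib / elapsed if elapsed > 0 else 0.0
    return (wu, size_mib, pct, rate)


def download_rates(lines):
    """Return one (wu, size_mib, last_pct, kib_per_s) tuple per download attempt."""
    attempts = []
    current = {}  # wu -> [start_time, size_mib, last_progress_time, last_pct]
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        wu, msg = m.group("wu"), m.group("msg")
        ts = datetime.strptime(m.group("ts"), TS_FMT)
        start = START_RE.match(msg)
        if start:
            # A new "Downloading" line closes the previous attempt for this WU.
            if wu in current:
                attempts.append(_summarize(wu, current[wu]))
            current[wu] = [ts, float(start.group("size")), ts, 0.0]
            continue
        prog = PROGRESS_RE.match(msg)
        if prog and wu in current:
            current[wu][2] = ts
            current[wu][3] = float(prog.group("pct"))
    for wu, state in current.items():
        attempts.append(_summarize(wu, state))
    return attempts
```

Applied to the first attempt above (27.50 MiB, 5.45% reached after 256 seconds), it gives roughly 6 KiB/s.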
Or this one, which fixed itself: downloads failed from *.219 but succeeded from *.220:


2019-11-06:23:47:36:WU00:FS00:Connecting to 65.254.110.245:8080
2019-11-06:23:47:36:WU00:FS00:Assigned to work server 155.247.166.219
2019-11-06:23:47:36:WU00:FS00:Requesting new work unit for slot 00: READY gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 155.247.166.219
2019-11-06:23:47:36:WU00:FS00:Connecting to 155.247.166.219:8080
2019-11-06:23:47:37:ERROR:WU00:FS00:Exception: Server did not assign work unit
2019-11-06:23:47:37:WU00:FS00:Connecting to 65.254.110.245:8080
2019-11-06:23:47:38:WU00:FS00:Assigned to work server 155.247.166.219
2019-11-06:23:47:38:WU00:FS00:Requesting new work unit for slot 00: READY gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 155.247.166.219
2019-11-06:23:47:38:WU00:FS00:Connecting to 155.247.166.219:8080
2019-11-06:23:47:38:WU00:FS00:Downloading 27.49MiB
2019-11-06:23:47:52:WU00:FS00:Download 0.68%
2019-11-06:23:47:59:WU00:FS00:Download 0.91%
2019-11-06:23:48:09:WU00:FS00:Download 1.36%
2019-11-06:23:48:17:WU00:FS00:Download 2.05%
2019-11-06:23:48:25:WU00:FS00:Download 2.73%
2019-11-06:23:48:47:WU00:FS00:Download 2.96%
2019-11-06:23:49:45:WU00:FS00:Download 3.08%
2019-11-06:23:49:45:ERROR:WU00:FS00:Exception: Transfer failed
2019-11-06:23:49:45:WU00:FS00:Connecting to 65.254.110.245:8080
2019-11-06:23:49:46:WU00:FS00:Assigned to work server 155.247.166.220
2019-11-06:23:49:46:WU00:FS00:Requesting new work unit for slot 00: READY gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 155.247.166.220
2019-11-06:23:49:46:WU00:FS00:Connecting to 155.247.166.220:8080
2019-11-06:23:49:46:WU00:FS00:Downloading 15.58MiB
2019-11-06:23:49:52:WU00:FS00:Download 89.88%
2019-11-06:23:49:52:WU00:FS00:Download complete
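Tallying outcomes per work server makes the .219/.220 pattern easier to see across a longer log. A sketch under the same log-format assumption; `tally_by_server` is a hypothetical helper. It attributes each failure or completion to the last host that WU connected to, which in these logs is the work server rather than the assignment server at 65.254.110.245.

```python
import re
from collections import Counter

# Optional "ERROR:" tag, WU id, slot, then the message text.
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}:\d{2}:\d{2}:\d{2}:(?:ERROR:)?(?P<wu>WU\d+):FS\d+:(?P<msg>.*)$"
)
CONNECT_RE = re.compile(r"^Connecting to (?P<host>[\d.]+):\d+$")
FAILURES = ("Exception: Transfer failed", "Exception: Server did not assign work unit")
SUCCESSES = ("Download complete", "Upload complete")


def tally_by_server(lines):
    """Count failed and completed transfers per host, keyed by each WU's last connection."""
    last_host = {}
    failed, ok = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        wu, msg = m.group("wu"), m.group("msg")
        conn = CONNECT_RE.match(msg)
        if conn:
            last_host[wu] = conn.group("host")
        elif msg in FAILURES and wu in last_host:
            failed[last_host[wu]] += 1
        elif msg in SUCCESSES and wu in last_host:
            ok[last_host[wu]] += 1
    return failed, ok
```

On the log above it counts two failures against 155.247.166.219 and one completed download from 155.247.166.220.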
absolutefunk
Posts: 22
Joined: Mon Mar 10, 2014 1:41 am

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Post by absolutefunk »

Both of my 1070s were hung on 'download' this morning. 155.247.166.219 needs to be pulled until the underlying problem can be fixed. It's been over a month intermittently now. This is not a good look for the project.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Post by bruce »

absolutefunk wrote:Both of my 1070s were hung on 'download' this morning. 155.247.166.219 needs to be pulled until the underlying problem can be fixed. It's been over a month intermittently now. This is not a good look for the project.

I don't think there's any chance that the server will be pulled. vav3.ocis.temple.edu is currently supporting about 25% of FAH's activity. Taking that capacity off-line would severely disrupt your ability to get an assignment when you need one. I understand it looks bad and is an inconvenience for you, but pulling it is unrealistic and would turn a moderate problem into a big one. New hardware is being ordered to handle the recent increase in production, but provisioning for that increase takes time and money.

Besides, the first step (fixing the FAHCore_a7 software) has been completed, and rolling out that fix takes time: it cannot be called "completed" until every WU currently in the field has been replaced with a new one, no matter how slow the donor hardware happens to be.
absolutefunk
Posts: 22
Joined: Mon Mar 10, 2014 1:41 am

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Post by absolutefunk »

Wow, 25%, I thought the load was more distributed than that. These issues don't bother me that much, but the hanging downloads require manual intervention on our part, and for folders who don't (or can't) check their systems periodically, they result in lost output. I'm hoping the next client release supports a hard timeout on downloads, which would help a lot.
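A hard timeout like this could combine two limits: a no-progress (stall) limit and an overall deadline. The sketch below is hypothetical and is not an existing FAHClient option; the class name, parameters and values are illustrative only.

```python
import time


class TransferWatchdog:
    """Hypothetical watchdog: abort a transfer that exceeds a hard deadline
    or makes no progress for stall_s seconds."""

    def __init__(self, hard_timeout_s, stall_s, clock=time.monotonic):
        self.clock = clock  # injectable so the logic can be tested without waiting
        self.start = clock()
        self.hard_timeout_s = hard_timeout_s
        self.stall_s = stall_s
        self.last_progress_time = self.start
        self.last_bytes = 0

    def report(self, bytes_done):
        """Record bytes transferred so far; only an increase counts as progress."""
        if bytes_done > self.last_bytes:
            self.last_bytes = bytes_done
            self.last_progress_time = self.clock()

    def should_abort(self):
        """True once the overall deadline passes or progress has stalled too long."""
        now = self.clock()
        if now - self.start >= self.hard_timeout_s:
            return True
        return now - self.last_progress_time >= self.stall_s
```

The stall limit alone would not be enough: in the logs above, the downloads from .219 keep reporting small increases every few seconds while averaging only a few KiB/s, so the overall deadline is what would eventually let the client abandon the transfer and request a new assignment.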
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Post by bruce »

My 25% number came from https://apps.foldingathome.org/serverstats.

There are a lot of servers currently off-line, and I don't know why. (Possibly related to a recent critical upgrade of the server software.)

The essential part of my post is that the problems are understood and are being addressed, and it takes a lot of "red tape" to get enough signatures to spend as much money as it takes to buy one or more new servers.
Joe_H
Site Admin
Posts: 7936
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Post by Joe_H »

Also probably related: servers are no longer being operated out of Stanford; the last one was shut off in the past couple of months. That leaves servers at WUSTL, Temple and MSKCC.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3