Re: Connection problems to 140.163.4.231
Posted: Sat Mar 14, 2020 1:08 am
by Vester
I went to close the log and found this work unit downloading. Download speed was about 3 Mbps. My internet speed is about 50 Mbps.
Code: Select all
00:53:15:WU01:FS03:Connecting to 65.254.110.245:8080
00:53:15:WU01:FS03:Assigned to work server 140.163.4.231
00:53:15:WU01:FS03:Requesting new work unit for slot 03: READY gpu:2:Tahiti [Radeon HD 7900 Series] from 140.163.4.231
00:53:15:WU01:FS03:Connecting to 140.163.4.231:8080
00:54:15:ERROR:WU01:FS03:Exception: 10002: Received short response, expected 512 bytes, got 0
00:54:52:WU01:FS03:Connecting to 65.254.110.245:8080
00:54:53:WU01:FS03:Assigned to work server 140.163.4.231
00:54:53:WU01:FS03:Requesting new work unit for slot 03: READY gpu:2:Tahiti [Radeon HD 7900 Series] from 140.163.4.231
00:54:53:WU01:FS03:Connecting to 140.163.4.231:8080
00:55:53:ERROR:WU01:FS03:Exception: 10002: Received short response, expected 512 bytes, got 0
00:57:08:FS03:Paused
00:57:16:FS03:Unpaused
00:57:16:WU01:FS03:Connecting to 65.254.110.245:8080
00:57:17:WU01:FS03:Assigned to work server 140.163.4.241
00:57:17:WU01:FS03:Requesting new work unit for slot 03: READY gpu:2:Tahiti [Radeon HD 7900 Series] from 140.163.4.241
00:57:17:WU01:FS03:Connecting to 140.163.4.241:8080
00:58:17:ERROR:WU01:FS03:Exception: 10002: Received short response, expected 512 bytes, got 0
00:58:54:WU01:FS03:Connecting to 65.254.110.245:8080
00:58:54:WU01:FS03:Assigned to work server 140.163.4.241
00:58:54:WU01:FS03:Requesting new work unit for slot 03: READY gpu:2:Tahiti [Radeon HD 7900 Series] from 140.163.4.241
00:58:54:WU01:FS03:Connecting to 140.163.4.241:8080
01:00:05:WU01:FS03:Downloading 50.83MiB
01:00:11:WU01:FS03:Download 2.46%
01:00:17:WU01:FS03:Download 5.90%
01:00:23:WU01:FS03:Download 7.62%
01:00:29:WU01:FS03:Download 9.34%
01:00:35:WU01:FS03:Download 10.45%
01:00:41:WU01:FS03:Download 12.17%
01:00:47:WU01:FS03:Download 14.63%
01:00:53:WU01:FS03:Download 16.97%
01:00:59:WU01:FS03:Download 18.57%
01:01:05:WU01:FS03:Download 20.04%
01:01:11:WU01:FS03:Download 22.13%
01:01:17:WU01:FS03:Download 23.61%
01:01:23:WU01:FS03:Download 26.43%
01:01:29:WU01:FS03:Download 28.52%
01:01:35:WU01:FS03:Download 29.88%
01:01:41:WU01:FS03:Download 32.09%
01:01:47:WU01:FS03:Download 34.06%
01:01:53:WU01:FS03:Download 36.76%
01:01:59:WU01:FS03:Download 40.82%
01:02:05:WU01:FS03:Download 45.00%
01:02:11:WU01:FS03:Download 46.97%
01:02:17:WU01:FS03:Download 48.93%
01:02:23:WU01:FS03:Download 50.78%
01:02:29:WU01:FS03:Download 52.62%
01:02:35:WU01:FS03:Download 55.33%
01:02:41:WU01:FS03:Download 56.56%
01:02:47:WU01:FS03:Download 58.28%
01:02:53:WU01:FS03:Download 58.77%
01:03:01:WU01:FS03:Download 59.63%
01:03:07:WU01:FS03:Download 61.35%
01:03:13:WU01:FS03:Download 62.46%
01:03:19:WU01:FS03:Download 64.30%
01:03:25:WU01:FS03:Download 66.64%
01:03:31:WU01:FS03:Download 68.73%
01:03:37:WU01:FS03:Download 71.56%
01:03:43:WU01:FS03:Download 73.40%
01:03:49:WU01:FS03:Download 75.98%
01:03:55:WU01:FS03:Download 78.32%
01:04:01:WU01:FS03:Download 79.43%
01:04:07:WU01:FS03:Download 80.04%
01:04:15:WU01:FS03:Download 80.90%
01:04:21:WU01:FS03:Download 81.52%
01:04:27:WU01:FS03:Download 82.25%
01:04:33:WU01:FS03:Download 83.48%
01:04:43:WU01:FS03:Download 84.34%
01:04:57:WU01:FS03:Download 86.31%
01:05:04:WU01:FS03:Download 86.93%
01:05:42:WU01:FS03:Download 87.66%
01:05:48:WU01:FS03:Download 87.91%
01:05:54:WU01:FS03:Download 90.00%
01:06:00:WU01:FS03:Download 92.09%
01:06:06:WU01:FS03:Download 94.79%
01:06:13:WU01:FS03:Download 97.62%
01:06:19:WU01:FS03:Download 99.83%
01:06:23:WU01:FS03:Download complete
01:06:23:WU01:FS03:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:11744 run:0 clone:7784 gen:0 core:0x22 unit:0x000000008ca304f15e6bc41225bc32cb
01:06:23:WU01:FS03:Starting
01:06:23:WU01:FS03:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\veste\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/beta/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 11128 -checkpoint 3 -gpu-vendor amd -opencl-platform 0 -opencl-device 5 -gpu 5
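For anyone who wants to check the average rate from a log like the one above, here is a minimal Python sketch. It only uses the size and the two timestamps that appear in the log; the function name `average_mbps` is made up for illustration, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime

def average_mbps(size_mib: float, start: str, end: str) -> float:
    """Average rate in megabits per second between two HH:MM:SS log timestamps."""
    fmt = "%H:%M:%S"
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    # 1 MiB = 1024 * 1024 bytes; 1 megabit = 1,000,000 bits
    megabits = size_mib * 1024 * 1024 * 8 / 1_000_000
    return megabits / elapsed.total_seconds()

# Values from the log above: "Downloading 50.83MiB" at 01:00:05,
# "Download complete" at 01:06:23.
rate = average_mbps(50.83, "01:00:05", "01:06:23")
print(f"{rate:.2f} Mbps")
```

Averaged over the whole transfer this works out to a little over 1 Mbps. A momentary reading taken during the download can be higher than the average.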
Re: Connection problems to 140.163.4.231
Posted: Sat Mar 14, 2020 1:13 am
by Joe_H
The "Received short response" messages are more often caused by connections being blocked by firewall or anti-malware software. The FAHClient process needs the same permissions to use HTTP over ports 8080 and/or 80 as a web browser. Many anti-malware programs allow this for "known" browsers such as Chrome, Firefox or Opera, but FAHClient is not one of the recognized ones.
Re: Connection problems to 140.163.4.231
Posted: Sat Mar 14, 2020 1:16 am
by toTOW
Same answer as for the other server (.241): viewtopic.php?p=312450#p312450
Don't worry, it is not blocking you ... but with 8 GPUs, the odds that you'll have trouble are higher.
Re: Connection problems to 140.163.4.231
Posted: Sat Mar 14, 2020 3:07 am
by JiiPee
Is anyone else seeing a lot of failures with WUs, either right after download or after some folding?
Code: Select all
02:59:31:WU01:FS01:Download 99.73%
02:59:33:WU01:FS01:Download complete
02:59:33:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:11746 run:0 clone:2935 gen:5 core:0x22 unit:0x000000078ca304f15e6aa66583b9500f
02:59:33:WU01:FS01:Starting
02:59:33:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 4088 -checkpoint 30 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
02:59:33:WU01:FS01:Started FahCore on PID 4600
02:59:33:WU01:FS01:Core PID:14868
02:59:33:WU01:FS01:FahCore 0x22 started
02:59:34:WU01:FS01:0x22:*********************** Log Started 2020-03-14T02:59:33Z ***********************
02:59:34:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
02:59:34:WU01:FS01:0x22: Type: 0x22
02:59:34:WU01:FS01:0x22: Core: Core22
02:59:34:WU01:FS01:0x22: Website: https://foldingathome.org/
02:59:34:WU01:FS01:0x22: Copyright: (c) 2009-2018 foldingathome.org
02:59:34:WU01:FS01:0x22: Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
02:59:34:WU01:FS01:0x22: <rafal.wiewiora@choderalab.org>
02:59:34:WU01:FS01:0x22: Args: -dir 01 -suffix 01 -version 705 -lifeline 4600 -checkpoint 30
02:59:34:WU01:FS01:0x22: -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
02:59:34:WU01:FS01:0x22: Config: <none>
02:59:34:WU01:FS01:0x22:************************************ Build *************************************
02:59:34:WU01:FS01:0x22: Version: 0.0.2
02:59:34:WU01:FS01:0x22: Date: Dec 6 2019
02:59:34:WU01:FS01:0x22: Time: 21:30:31
02:59:34:WU01:FS01:0x22: Repository: Git
02:59:34:WU01:FS01:0x22: Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
02:59:34:WU01:FS01:0x22: Branch: HEAD
02:59:34:WU01:FS01:0x22: Compiler: Visual C++ 2008
02:59:34:WU01:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
02:59:34:WU01:FS01:0x22: Platform: win32 10
02:59:34:WU01:FS01:0x22: Bits: 64
02:59:34:WU01:FS01:0x22: Mode: Release
02:59:34:WU01:FS01:0x22:************************************ System ************************************
02:59:34:WU01:FS01:0x22: CPU: AMD Ryzen 7 1800X Eight-Core Processor
02:59:34:WU01:FS01:0x22: CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
02:59:34:WU01:FS01:0x22: CPUs: 16
02:59:34:WU01:FS01:0x22: Memory: 15.93GiB
02:59:34:WU01:FS01:0x22:Free Memory: 4.33GiB
02:59:34:WU01:FS01:0x22: Threads: WINDOWS_THREADS
02:59:34:WU01:FS01:0x22: OS Version: 6.2
02:59:34:WU01:FS01:0x22:Has Battery: false
02:59:34:WU01:FS01:0x22: On Battery: false
02:59:34:WU01:FS01:0x22: UTC Offset: 2
02:59:34:WU01:FS01:0x22: PID: 14868
02:59:34:WU01:FS01:0x22: CWD: C:\ProgramData\FAHClient\work
02:59:34:WU01:FS01:0x22: OS: Windows 10 Pro
02:59:34:WU01:FS01:0x22: OS Arch: AMD64
02:59:34:WU01:FS01:0x22:********************************************************************************
02:59:34:WU01:FS01:0x22:Project: 11746 (Run 0, Clone 2935, Gen 5)
02:59:34:WU01:FS01:0x22:Unit: 0x000000078ca304f15e6aa66583b9500f
02:59:34:WU01:FS01:0x22:Reading tar file core.xml
02:59:34:WU01:FS01:0x22:Reading tar file integrator.xml
02:59:34:WU01:FS01:0x22:Reading tar file state.xml
02:59:34:WU01:FS01:0x22:Reading tar file system.xml
02:59:35:WU01:FS01:0x22:Digital signatures verified
02:59:35:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
02:59:35:WU01:FS01:0x22:Version 0.0.2
02:59:52:WU01:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
02:59:52:WU01:FS01:0x22:Saving result file ..\logfile_01.txt
02:59:52:WU01:FS01:0x22:Saving result file science.log
02:59:52:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
02:59:52:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
02:59:52:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11746 run:0 clone:2935 gen:5 core:0x22 unit:0x000000078ca304f15e6aa66583b9500f
02:59:52:WU01:FS01:Uploading 8.00KiB to 140.163.4.241
I think if this is happening to others as well, it may put a lot of stress on the servers because clients are constantly downloading and uploading.
Sometimes I get folding going, but it will usually end like this:
Code: Select all
02:24:53:WU00:FS01:0x22:Project: 11744 (Run 0, Clone 8349, Gen 0)
02:24:53:WU00:FS01:0x22:Unit: 0x000000008ca304f15e6bc406da3f5e39
02:24:53:WU00:FS01:0x22:Reading tar file core.xml
02:24:53:WU00:FS01:0x22:Reading tar file integrator.xml
02:24:53:WU00:FS01:0x22:Reading tar file state.xml
02:24:53:WU00:FS01:0x22:Reading tar file system.xml
02:24:54:WU00:FS01:0x22:Digital signatures verified
02:24:54:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
02:24:54:WU00:FS01:0x22:Version 0.0.2
02:25:09:WU00:FS01:0x22:Completed 0 out of 1000000 steps (0%)
02:25:09:WU00:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
02:26:27:WU00:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
02:27:44:WU00:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
02:29:04:WU00:FS01:0x22:Completed 30000 out of 1000000 steps (3%)
02:30:28:WU00:FS01:0x22:Completed 40000 out of 1000000 steps (4%)
02:31:54:WU00:FS01:0x22:Completed 50000 out of 1000000 steps (5%)
02:33:20:WU00:FS01:0x22:Completed 60000 out of 1000000 steps (6%)
02:34:40:WU00:FS01:0x22:Completed 70000 out of 1000000 steps (7%)
02:35:58:WU00:FS01:0x22:Completed 80000 out of 1000000 steps (8%)
02:37:15:WU00:FS01:0x22:Completed 90000 out of 1000000 steps (9%)
02:38:32:WU00:FS01:0x22:Completed 100000 out of 1000000 steps (10%)
02:39:52:WU00:FS01:0x22:Completed 110000 out of 1000000 steps (11%)
02:41:11:WU00:FS01:0x22:Completed 120000 out of 1000000 steps (12%)
02:42:27:WU00:FS01:0x22:Completed 130000 out of 1000000 steps (13%)
02:43:43:WU00:FS01:0x22:Completed 140000 out of 1000000 steps (14%)
02:45:02:WU00:FS01:0x22:Completed 150000 out of 1000000 steps (15%)
02:46:19:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
02:46:19:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
02:46:34:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
02:46:34:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
02:46:50:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
02:46:50:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
02:46:50:WU00:FS01:0x22:ERROR:114: Max Retries Reached
02:46:50:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
02:46:50:WU00:FS01:0x22:Saving result file badstate-0.xml
02:46:50:WU00:FS01:0x22:Saving result file badstate-1.xml
02:46:50:WU00:FS01:0x22:Saving result file badstate-2.xml
02:46:51:WU00:FS01:0x22:Saving result file checkpointState.xml
02:46:51:WU00:FS01:0x22:Saving result file checkpt.crc
02:46:51:WU00:FS01:0x22:Saving result file positions.xtc
02:46:51:WU00:FS01:0x22:Saving result file science.log
02:46:51:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
02:46:51:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
02:46:51:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11744 run:0 clone:8349 gen:0 core:0x22 unit:0x000000008ca304f15e6bc406da3f5e39
I even went back to the default GPU profile because I thought it might not like the memory OC, but it didn't make any difference. I have completed only a few of those new WUs.
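To see how often this happens over a longer log, the "FahCore returned" lines can be tallied. This is only a rough sketch that matches the line format shown in the logs above; `tally_core_results` is a hypothetical helper, not something that ships with the client.

```python
import re
from collections import Counter

# Matches client lines such as:
# 02:59:52:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
RETURNED = re.compile(r"FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)")

def tally_core_results(lines):
    """Count how often each FahCore return status appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = RETURNED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample lines copied from the two logs above.
sample = [
    "02:46:51:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "02:59:52:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "02:59:52:WU01:FS01:Uploading 8.00KiB to 140.163.4.241",
]
result = tally_core_results(sample)
print(result)
```

To run it on a real log, pass an open file instead of the sample list, since iterating a file yields its lines.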
Mod note: please use Code tags on log files, not Quote
Re: Connection problems to 140.163.4.231
Posted: Sat Mar 14, 2020 4:20 am
by Joe_H
Code: Select all
02:59:52:WU01:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
This message is usually associated with a problem with the OpenCL setup. But since your system can also proceed at other times, do you happen to be folding on a laptop that switches between an integrated GPU for low power and a discrete GPU when more power is needed?
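For reference, -5 in the OpenCL header (CL/cl.h) is CL_OUT_OF_RESOURCES. Below is a minimal sketch for decoding the number at the end of such a line. It assumes the core prints the raw OpenCL status code in parentheses after the API call name; that reading of the log format is an assumption, and `decode_opencl_error` is a hypothetical helper, not part of FAHClient.

```python
import re

# The first few status codes as defined in the Khronos OpenCL header (CL/cl.h).
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

# An OpenCL API call name followed by a status code in parentheses.
CL_CALL = re.compile(r"(cl[A-Za-z]+) \((-?\d+)\)")

def decode_opencl_error(line):
    """Return (api_call, error_name) for a log line, or None if nothing matches."""
    match = CL_CALL.search(line)
    if match is None:
        return None
    code = int(match.group(2))
    return match.group(1), OPENCL_ERRORS.get(code, f"unknown ({code})")

line = ("02:59:52:WU01:FS01:0x22:ERROR:exception: Error invoking kernel "
        "sortShortList: clEnqueueNDRangeKernel (-5)")
decoded = decode_opencl_error(line)
print(decoded)
```

The table only covers the first few codes; anything else is reported as unknown rather than guessed.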
Re: Connection problems to 140.163.4.231
Posted: Sat Mar 14, 2020 4:26 am
by JiiPee
Joe_H wrote:
Code: Select all
02:59:52:WU01:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
This message is usually associated with a problem with the OpenCL setup. But since your system can also proceed at other times, do you happen to be folding on a laptop that switches between an integrated GPU for low power and a discrete GPU when more power is needed?
No, this is a desktop system with a single GPU.
Work Server 140.163.4.231 Unresponsive?
Posted: Sat Mar 14, 2020 7:08 am
by schertt
Code: Select all
06:59:38:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
06:59:59:ERROR:WU01:FS01:Exception: Failed to connect to 140.163.4.231:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
07:00:21:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
07:00:42:ERROR:WU01:FS01:Exception: Failed to connect to 140.163.4.231:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
The CPU is pulling jobs fine, but the GPU is failing to get work from this server. A traceroute to the host gets out to the internet but dies at the far end; telnet on both ports is unresponsive. The GPU is a Vega 64.
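If telnet is not installed, the same check can be done with a few lines of Python. This is a minimal sketch using only the standard `socket` module; `port_open` and `probe` are names made up for this example, and the 5 second timeout is an arbitrary choice.

```python
import socket

def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

def probe(host: str, ports=(8080, 80), timeout: float = 5.0) -> dict:
    """Check each port in turn and report which ones accept a connection."""
    return {port: port_open(host, port, timeout) for port in ports}

# Example (performs real network I/O, so it is left commented out):
# print(probe("140.163.4.231"))
```

A result of False for both ports only says that no TCP connection could be made from this machine. It does not tell you whether the cause is the server, the route, or a local firewall.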
Re: Connection problems to 140.163.4.231
Posted: Sat Mar 14, 2020 7:11 am
by Texno
I have also been experiencing issues with 140.163.4.231 and 140.163.4.241.
Code: Select all
05:46:10:WU02:FS01:Assigned to work server 140.163.4.231
05:46:10:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:GP106 [GeForce GTX 1060 3GB] 3935 from 140.163.4.231
05:46:10:WU02:FS01:Connecting to 140.163.4.231:8080
05:46:31:ERROR:WU02:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0
Code: Select all
05:57:16:WU02:FS01:Assigned to work server 140.163.4.231
05:57:16:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:GP106 [GeForce GTX 1060 3GB] 3935 from 140.163.4.231
05:57:16:WU02:FS01:Connecting to 140.163.4.231:8080
05:57:37:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
05:57:37:WU02:FS01:Connecting to 140.163.4.231:80
05:57:58:ERROR:WU02:FS01:Exception: Failed to connect to 140.163.4.231:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
Code: Select all
06:08:21:WU02:FS01:Assigned to work server 140.163.4.241
06:08:21:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:GP106 [GeForce GTX 1060 3GB] 3935 from 140.163.4.241
06:08:21:WU02:FS01:Connecting to 140.163.4.241:8080
06:10:13:ERROR:WU02:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0
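To summarize which server produces which kind of failure, the ERROR lines can be attributed to the most recent "Assigned to work server" line. This is a rough sketch based only on the line formats shown above; `failures_by_server` and the three category labels are made up for illustration.

```python
import re
from collections import Counter

ASSIGNED = re.compile(r"Assigned to work server (\d+\.\d+\.\d+\.\d+)")

def failures_by_server(lines):
    """Attribute each ERROR line to the most recently assigned work server."""
    current = None
    counts = Counter()
    for line in lines:
        match = ASSIGNED.search(line)
        if match:
            current = match.group(1)
        elif ":ERROR:" in line and current is not None:
            if "Received short response" in line:
                kind = "short response"
            elif "Failed to connect" in line:
                kind = "connect failed"
            else:
                kind = "other"
            counts[(current, kind)] += 1
    return counts

# Lines taken from the three logs above; the long connect error is shortened here.
sample = [
    "05:46:10:WU02:FS01:Assigned to work server 140.163.4.231",
    "05:46:31:ERROR:WU02:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0",
    "05:57:16:WU02:FS01:Assigned to work server 140.163.4.231",
    "05:57:58:ERROR:WU02:FS01:Exception: Failed to connect to 140.163.4.231:80",
    "06:08:21:WU02:FS01:Assigned to work server 140.163.4.241",
    "06:10:13:ERROR:WU02:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0",
]
summary = failures_by_server(sample)
for (server, kind), count in sorted(summary.items()):
    print(server, kind, count)
```

With several slots logging at once, lines from different slots interleave, so a real version would also need to key on the WU and FS fields.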
Re: Work Server 140.163.4.231 Unresponsive
Posted: Sat Mar 14, 2020 7:12 am
by Texno
I am having the same issue with both 140.163.4.231 and 140.163.4.241
Re: Work Server 140.163.4.231 Unresponsive?
Posted: Sat Mar 14, 2020 7:20 am
by jonault
Yes, it's a known issue; there are a couple of other discussion threads about this. If you leave F@H running it will keep trying to connect & may eventually get through. If you do get through, the downloads may be slow. I've been seeing download times as long as an hour in some cases, but they do eventually complete & start folding.
Re: Work Server 140.163.4.231 Unresponsive?
Posted: Sat Mar 14, 2020 7:28 am
by schertt
Thanks. Looks like it finally managed to pull a job.
Re: Connection problems to 140.163.4.231
Posted: Sat Mar 14, 2020 11:57 am
by JacobKlein
For the connection problem in the first posts...
I'm having the same problem and I have some questions:
- Is this problem typical? I'm new to Folding@home, and am not used to having idle GPUs!
- How long until this is typically resolved?
- Have there been any/many cases where the entire project is out of GPU work?
Thanks,
Jacob
Re: Connection problems to 140.163.4.231
Posted: Sat Mar 14, 2020 12:32 pm
by Nathan_P
Hi
It's not unheard of to have an idle GPU, and there have been occasions in the past when the project has been low on or out of work. However, in this instance the cause is a good one; usually it's a server that has crashed. A couple of big cloud computing companies have dedicated serious resources to the project. By serious I mean 6,000 GPUs from one and over 60,000 idle CPU cores from another. These are causing unprecedented demand on the server infrastructure. F@H usually issues around 4k WUs per hour on average; this morning it's at over 27k. The teams are spinning up new servers and additional projects as fast as they can, but it all takes time. Please be patient and a WU will come your way eventually. If you want to try to hurry the process up, try pausing and then unpausing your slots. You *MAY* get a WU quicker, but no guarantees.
Re: Connection problems to 140.163.4.231
Posted: Sat Mar 14, 2020 12:38 pm
by JacobKlein
Thanks for the response. I understand that the current turbulence for Folding@home is something the team is trying to handle. I'm just trying to figure out the best configuration for BOINC and Folding@home for my resources, and I really hate to see GPUs that I've reallocated and dedicated to Folding@home now go idle.
I am hopeful for more transparency about these connection problems, in the form of announcements.
Re: Connection problems to 140.163.4.231
Posted: Sat Mar 14, 2020 1:50 pm
by blandyuk
Both 140.163.4.231 and 140.163.4.241 are up and down like a yoyo. I wish we could select what work our GPUs work on. Currently, most of my GPUs are not working. Waste of time! Sort it please.