Work Unit Upload Failure

tau2pi4u
Posts: 2
Joined: Thu Mar 19, 2020 8:15 pm

Post by tau2pi4u »

A work unit completed by the CPU on one of my machines has been failing to upload. I waited to see if it'd fix itself, but it's been over a day. Other work units have completed and uploaded successfully since this one finished.

I'm on version 7.5.1 and this machine has an i7 2600K and a GTX 1060 6GB.

The work unit information is as follows:

Code:

PRCG 11758 (0, 2069, 0)
Slot ID: 1
Work ID: 02
Status: Send
Progress: 100%
FahCore 0x22
Waiting on: Send results
Attempts: 29
Assigned 2020-03-17T17:12:45Z
Timeout 2020-03-18T17:12:45Z
Expiration 2020-03-25T22:00:44Z
Work Server: 155.247.164.213
Collection Server: 155.247.164.214
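As a rough sketch of what those dates imply, the Python below works out how far past the Timeout the unit is and how long remains before the Expiration. It assumes that Timeout and Expiration are deadlines, as their names suggest, and it measures from the most recent failed attempt quoted later in this post (20:03:55 UTC on 2020-03-19). The helper name `parse_utc` is made up for this example.

```python
from datetime import datetime, timezone


def parse_utc(stamp: str) -> datetime:
    """Parse a timestamp like 2020-03-17T17:12:45Z into an aware UTC datetime."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Values copied from the work unit information above.
assigned = parse_utc("2020-03-17T17:12:45Z")
timeout = parse_utc("2020-03-18T17:12:45Z")
expiration = parse_utc("2020-03-25T22:00:44Z")

# Assumption: use the latest failed upload attempt in the log as "now".
last_attempt = parse_utc("2020-03-19T20:03:55Z")

past_timeout = last_attempt - timeout
until_expiration = expiration - last_attempt

print("Time allowed before timeout:", timeout - assigned)
print("Past timeout by:", past_timeout)
print("Remaining until expiration:", until_expiration)
```

If the assumption about these fields holds, the unit is already past its Timeout but still has several days before it expires.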
The relevant part of the log should be below. The full log is too long to post.

Code:

21:14:21:WU00:FS00:0xa7:*********************** Log Started 2020-03-17T21:14:21Z ***********************
21:14:21:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
21:14:21:WU00:FS00:0xa7:       Type: 0xa7
21:14:21:WU00:FS00:0xa7:       Core: Gromacs
21:14:21:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 14416 -checkpoint 15 -np
21:14:21:WU00:FS00:0xa7:             7
21:14:21:WU00:FS00:0xa7:************************************ CBang *************************************
21:14:21:WU00:FS00:0xa7:       Date: Oct 26 2019
21:14:21:WU00:FS00:0xa7:       Time: 01:38:25
21:14:21:WU00:FS00:0xa7:   Revision: c46a1a011a24143739ac7218c5a435f66777f62f
21:14:21:WU00:FS00:0xa7:     Branch: master
21:14:21:WU00:FS00:0xa7:   Compiler: Visual C++ 2008
21:14:21:WU00:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:14:21:WU00:FS00:0xa7:   Platform: win32 10
21:14:21:WU00:FS00:0xa7:       Bits: 64
21:14:21:WU00:FS00:0xa7:       Mode: Release
21:14:21:WU00:FS00:0xa7:************************************ System ************************************
21:14:21:WU00:FS00:0xa7:        CPU: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
21:14:21:WU00:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 42 Stepping 7
21:14:21:WU00:FS00:0xa7:       CPUs: 8
21:14:21:WU00:FS00:0xa7:     Memory: 15.97GiB
21:14:21:WU00:FS00:0xa7:Free Memory: 10.17GiB
21:14:21:WU00:FS00:0xa7:    Threads: WINDOWS_THREADS
21:14:21:WU00:FS00:0xa7: OS Version: 6.2
21:14:21:WU00:FS00:0xa7:Has Battery: false
21:14:21:WU00:FS00:0xa7: On Battery: false
21:14:21:WU00:FS00:0xa7: UTC Offset: 0
21:14:21:WU00:FS00:0xa7:        PID: 13208
21:14:21:WU00:FS00:0xa7:        CWD: C:\Users\Will\AppData\Roaming\FAHClient\work
21:14:21:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
21:14:21:WU00:FS00:0xa7:    Version: 0.0.18
21:14:21:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:14:21:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
21:14:21:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
21:14:21:WU00:FS00:0xa7:       Date: Oct 26 2019
21:14:21:WU00:FS00:0xa7:       Time: 01:52:30
21:14:21:WU00:FS00:0xa7:   Revision: c1e3513b1bc0c16013668f2173ee969e5995b38e
21:14:21:WU00:FS00:0xa7:     Branch: master
21:14:21:WU00:FS00:0xa7:   Compiler: Visual C++ 2008
21:14:21:WU00:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:14:21:WU00:FS00:0xa7:   Platform: win32 10
21:14:21:WU00:FS00:0xa7:       Bits: 64
21:14:21:WU00:FS00:0xa7:       Mode: Release
21:14:21:WU00:FS00:0xa7:************************************ Build *************************************
21:14:21:WU00:FS00:0xa7:       SIMD: avx_256
21:14:21:WU00:FS00:0xa7:********************************************************************************
21:14:21:WU00:FS00:0xa7:Project: 14328 (Run 4, Clone 5506, Gen 6)
21:14:21:WU00:FS00:0xa7:Unit: 0x000000079bf7a4d65e6d0fdd6b722c0a
21:14:21:WU00:FS00:0xa7:Reading tar file core.xml
21:14:21:WU00:FS00:0xa7:Reading tar file frame6.tpr
21:14:21:WU00:FS00:0xa7:Digital signatures verified
21:14:21:WU00:FS00:0xa7:Reducing thread count from 7 to 6 to avoid domain decomposition by a prime number > 3
21:14:21:WU00:FS00:0xa7:Calling: mdrun -s frame6.tpr -o frame6.trr -cpt 15 -nt 6
21:14:22:WU00:FS00:0xa7:Steps: first=1500000 total=250000
[deleted to reduce length]
22:46:44:WU02:FS01:0x22:Completed 1000000 out of 1000000 steps (100%)
22:46:55:WU02:FS01:0x22:Saving result file ..\logfile_01.txt
22:46:55:WU02:FS01:0x22:Saving result file checkpointState.xml
22:46:56:WU02:FS01:0x22:Saving result file checkpt.crc
22:46:56:WU02:FS01:0x22:Saving result file positions.xtc
22:46:56:WU02:FS01:0x22:Saving result file science.log
22:46:56:WU02:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
22:46:57:WU02:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
22:46:57:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11758 run:0 clone:2069 gen:0 core:0x22 unit:0x000000029bf7a4d55e6d7714cf5c1f2e
22:46:57:WU02:FS01:Uploading 55.24MiB to 155.247.164.213
22:46:57:WU02:FS01:Connecting to 155.247.164.213:8080
22:46:58:WARNING:WU02:FS01:Exception: Failed to send results to work server: Transfer failed
22:46:58:WU02:FS01:Trying to send results to collection server
22:46:58:WU02:FS01:Uploading 55.24MiB to 155.247.164.214
22:46:58:WU02:FS01:Connecting to 155.247.164.214:8080
22:46:58:ERROR:WU02:FS01:Exception: Transfer failed
22:46:58:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11758 run:0 clone:2069 gen:0 core:0x22 unit:0x000000029bf7a4d55e6d7714cf5c1f2e
22:46:58:WU02:FS01:Uploading 55.24MiB to 155.247.164.213
22:46:58:WU02:FS01:Connecting to 155.247.164.213:8080
22:46:59:WARNING:WU02:FS01:Exception: Failed to send results to work server: Transfer failed
It has kept failing in the same way since then. This is from today (2020-03-19):

Code:

20:03:54:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11758 run:0 clone:2069 gen:0 core:0x22 unit:0x000000029bf7a4d55e6d7714cf5c1f2e
20:03:54:WU02:FS01:Uploading 55.24MiB to 155.247.164.213
20:03:54:WU02:FS01:Connecting to 155.247.164.213:8080
20:03:55:WARNING:WU02:FS01:Exception: Failed to send results to work server: Transfer failed
20:03:55:WU02:FS01:Trying to send results to collection server
20:03:55:WU02:FS01:Uploading 55.24MiB to 155.247.164.214
20:03:55:WU02:FS01:Connecting to 155.247.164.214:8080
20:03:55:ERROR:WU02:FS01:Exception: Transfer failed
If you do need the full log, I have it saved, but I'd need to split it across multiple posts because it's about 3.5x the character limit.
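Since the full log is too long to post, one way to summarize it is to count the failed transfers per work unit, slot, and server. The Python below is a minimal sketch that assumes every log line follows the layout seen in the excerpts above (`HH:MM:SS:[WARNING|ERROR:]WUxx:FSxx:message`) and that a "Transfer failed" line belongs to the most recent "Connecting to" line for the same work unit and slot. The names `count_failed_uploads`, `LINE`, `CONNECT`, and `SAMPLE` are made up for this example and are not part of the FAH client.

```python
import re
from collections import Counter

# Assumed line layout, inferred only from the excerpts in this post.
LINE = re.compile(r"^(\d\d:\d\d:\d\d):(?:(WARNING|ERROR):)?(WU\d+):(FS\d+):(.*)$")
CONNECT = re.compile(r"Connecting to (\S+)$")

SAMPLE = """\
20:03:54:WU02:FS01:Uploading 55.24MiB to 155.247.164.213
20:03:54:WU02:FS01:Connecting to 155.247.164.213:8080
20:03:55:WARNING:WU02:FS01:Exception: Failed to send results to work server: Transfer failed
20:03:55:WU02:FS01:Trying to send results to collection server
20:03:55:WU02:FS01:Uploading 55.24MiB to 155.247.164.214
20:03:55:WU02:FS01:Connecting to 155.247.164.214:8080
20:03:55:ERROR:WU02:FS01:Exception: Transfer failed
"""


def count_failed_uploads(log_text: str) -> Counter:
    """Count 'Transfer failed' lines, keyed by (work unit, slot, server)."""
    failures = Counter()
    last_target = {}  # (work unit, slot) -> most recent "Connecting to" target
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue  # skip lines that do not fit the assumed layout
        _time, level, wu, slot, message = match.groups()
        connect = CONNECT.match(message)
        if connect:
            last_target[(wu, slot)] = connect.group(1)
        elif level in ("WARNING", "ERROR") and "Transfer failed" in message:
            target = last_target.get((wu, slot), "unknown")
            failures[(wu, slot, target)] += 1
    return failures


if __name__ == "__main__":
    for (wu, slot, target), count in sorted(count_failed_uploads(SAMPLE).items()):
        print(f"{wu} {slot} -> {target}: {count} failed transfer(s)")
```

To run it over a saved log, read the file into a string and pass that to `count_failed_uploads` in place of `SAMPLE`.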
Jesse_V
Site Moderator
Posts: 2850
Joined: Mon Jul 18, 2011 4:44 am
Hardware configuration: OS: Windows 10, Kubuntu 19.04
CPU: i7-6700k
GPU: GTX 970, GTX 1080 TI
RAM: 24 GB DDR4
Location: Western Washington

Re: Work Unit Upload Failure

Post by Jesse_V »

Yeah, I'm guessing that's due to the flood of new users and the high demand on the servers at the moment. The developers and research teams are currently focused on getting the work servers back up and capable of meeting demand. I expect that this will help sort out these issues with uploading workunits.
F@h is now the top computing platform on the planet and nothing unites people like a dedicated fight against a common enemy. This virus affects all of us. Let's end it together.
Joe_H
Site Admin
Posts: 8002
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
Location: W. MA

Re: Work Unit Upload Failure

Post by Joe_H »

There is also a known issue with these servers and several projects hosted there. They are aware of it and are looking into it.
tau2pi4u
Posts: 2
Joined: Thu Mar 19, 2020 8:15 pm

Re: Work Unit Upload Failure

Post by tau2pi4u »

Thanks for the help! If it's already known then I'll just leave the machine running, which I was planning on doing anyway. Good job by all the devs/researchers to scale up and deal with all of this - it's gotta be a lot of work.