54 failed attempts to upload to 66.170.111.50
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21 - Location: UK
Re: 54 failed attempts to upload to 66.170.111.50
Could you post the log showing the attempts to send each WU? I'm interested in the timings (how long each attempt takes to fail across multiple attempts, and whether it is always the same time/amount transmitted for each WU) and the size (so I can work out whether it consistently fails at a certain amount transmitted across both WUs) ... so far neither has been completed, or so the WU Status indicates. Can you confirm that all attempts have failed for each WU that wasn't uploaded successfully, and that none were deleted, as bruce is concerned about? Have you had WUs from the same projects upload successfully to this server (before and since)? ... Happy to keep trying to diagnose this if you have the time. We need to get to the heart of it, as regular issues like the ones you are suffering are just not a great user experience for you.
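For the timing question, the time to failure of each attempt is just the difference between two log stamps. A minimal Python sketch, assuming each client log line starts with an HH:MM:SS stamp and that you copy by hand the stamp where an attempt starts and the stamp where it fails; the function names and the demo stamps are hypothetical, not taken from the posted logs.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def seconds_between(start: str, end: str) -> int:
    """Elapsed seconds between two 'HH:MM:SS' stamps from the same log.

    The stamps carry no date, so if end is earlier than start the attempt
    is assumed to have crossed midnight.
    """
    fmt = "%H:%M:%S"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 < t0:
        t1 += timedelta(days=1)  # wrapped past 00:00:00
    return int((t1 - t0).total_seconds())


def attempt_durations(attempts: List[Tuple[str, str]]) -> List[int]:
    """Time to failure for each (start_stamp, fail_stamp) pair copied from the log."""
    return [seconds_between(start, end) for start, end in attempts]


if __name__ == "__main__":
    # Hypothetical stamps for illustration only.
    demo = [("22:10:05", "22:11:58"), ("23:59:30", "00:01:20")]
    print(attempt_durations(demo))  # [113, 110]
```

Comparing these durations across attempts (and across WUs) shows directly whether failures cluster at a fixed time.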
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
Re: 54 failed attempts to upload to 66.170.111.50
Total long shot ... but could you try disabling any firewall/AV shortly before a resend attempt is due and see whether that makes any difference? This would rule out some oddity in the AV heuristics tripping over some part of the WU file package ... It is a real long shot, but IMO worth a try, if only to rule it out.
I'm also currently checking whether something like TCPView or Wireshark would allow monitoring of the upload attempts and identify which end of the connection is dropping. I'm trying to ascertain whether it is something the server is doing (either just dropping the connection or sending a failed-transfer message), something your client is doing, or something somewhere in between ... I really am not a networking specialist, actually just a very sub-amateur tinkerer, but I reckon there ought to be some tool that can help identify this ... It may be buried somewhere in the system logs on your machine and/or the server, but I wouldn't know where to look, hence trying to monitor it as it actually happens.
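Whichever tool is used, "which end dropped it" comes down to which address sent the first FIN (orderly close) or RST (abort) on that connection. A Python sketch of that check, assuming the segments have been transcribed from a capture into a simple hypothetical (time, source IP, flags) form; the Segment type, who_closed and the client address are illustrative only. One caveat: a router or firewall in between can send a RST carrying either endpoint's address, so "server" strictly means "arrived with the server's address".

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    """One TCP segment transcribed from a capture (hypothetical format)."""
    time: float  # seconds since the capture started
    src: str     # source IP address
    flags: str   # comma-separated TCP flags, e.g. "PSH,ACK" or "FIN,ACK"


def who_closed(segments: List[Segment], server_ip: str) -> Optional[str]:
    """Return "server" or "client" for whichever address sent the first FIN or RST.

    Returns None if the capture contains no FIN or RST at all.
    """
    for seg in sorted(segments, key=lambda s: s.time):
        flags = {f.strip().upper() for f in seg.flags.split(",")}
        if flags & {"FIN", "RST"}:
            return "server" if seg.src == server_ip else "client"
    return None


if __name__ == "__main__":
    server = "66.170.111.50"
    client = "192.168.0.10"  # hypothetical LAN address of the folding machine
    capture = [
        Segment(0.00, client, "SYN"),
        Segment(0.05, server, "SYN,ACK"),
        Segment(0.06, client, "ACK"),
        Segment(112.40, server, "RST"),
    ]
    print(who_closed(capture, server))  # server
```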
-
- Posts: 49
- Joined: Sat Aug 15, 2020 5:43 pm
- Location: Pacific Northwest, USA
Re: 54 failed attempts to upload to 66.170.111.50
@Neil-B
Not sure that I understand the question about the logs. The logs I've uploaded are the ones shown in F@HControl; I just filter for the unit and copy the log. The times between attempts should be consistent and set by the client, and I have no idea how to manually trigger an upload attempt. I have noticed that the time between attempts gets longer with each attempt, which makes sense to me, and I assume there is a pattern built into the client for this. To date, all of the WUs for this server eventually go away, even after failing for a long time, but I do not know whether that is because they finally uploaded or because they timed out. I have not figured out how to read the logs to see what happened to them after they disappear. I see several different logs when I look in the F@H directory, but the naming convention is indecipherable to me, and when I look for a particular WU in the different log files I can't find the ones with trouble. I do not understand how the files are produced and named, which may be the problem: my ignorance.
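The lengthening gap between attempts is what a retry backoff looks like. A Python sketch of a capped exponential backoff, only to illustrate the shape: the 60 s base, doubling factor and one-hour cap are made-up values, not the Folding@home client's actual retry schedule.

```python
from typing import List


def retry_delays(attempts: int, base_s: float = 60.0, factor: float = 2.0,
                 cap_s: float = 3600.0) -> List[float]:
    """Wait before each retry under a capped exponential backoff.

    base_s, factor and cap_s are illustrative values only; they are NOT
    the Folding@home client's real retry settings.
    """
    delays = []
    delay = base_s
    for _ in range(attempts):
        delays.append(min(delay, cap_s))  # never wait longer than the cap
        delay *= factor                   # each failure lengthens the next wait
    return delays


if __name__ == "__main__":
    print(retry_delays(7))  # [60.0, 120.0, 240.0, 480.0, 960.0, 1920.0, 3600.0]
```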
I have just figured out how to install HFM to help me work out how many WUs I've processed per project, so I can answer the question about WUs from the same project. But HFM shows the status as "FINISHED UNIT" even though the FAH WU Status page shows "Not Found" for the same WU. I may (probably) not know how to interpret HFM data, and I have not found a good explanation of how to read the information. That is something I find endemic to all F@H tools, BTW: at least, I cannot find good step-by-step instructions for doing anything other than installing the client. Information on troubleshooting, understanding what to expect, when to get excited about apparent failures, etc. is hard or impossible to find, and the learning curve is steep, steep enough that if I weren't so passionate about helping with the science behind defeating Covid I would have stopped ages ago. I guess I could just stop checking the client's status multiple times a day and look once a week instead, but that just isn't me. And yes, I know you, Bruce and the other admins/mods are volunteers, and I really appreciate the effort you and Bruce are putting in to help noobs like me get up to speed.
As I stated in a previous post, I do not know how (see whine about hard to find information in previous paragraph) to manually delete a WU from the system, so I am not manually deleting them. The only thing I delete is the entries in the log showing the % increase towards completion, in an effort to minimize the length of the post per the instructions @PantherX posted on how to submit code in a post.
Colonel_Klink
RTX 2080 Super
AMD Ryzen 9 3900X
Re: 54 failed attempts to upload to 66.170.111.50
On a standard Windows installation, FAH's working files are in %AppData%\FAHClient (just paste this into the address bar of an Explorer window: it leads to C:\Users\Colonel_Klink\AppData\Roaming\FAHClient).
In there, you'll find a folder "work" containing one or several folders (00, 01, 02, nn) corresponding to the configured slots in Advanced Control.
In order to delete them, you have to quit Advanced Control - you can do that using the contextual menu that appears when you click the molecule icon in the right part of your taskbar, under ^. It is best to first finish all WUs before quitting, or at least to pause all WUs you don't want to delete, but it is safer the first time to finish everything. (Just pausing the concerned WU should work too, actually. You won't be able to delete everything, but it should be enough for FAH to consider the WU as failed and move on.)
Then you can delete the 00, 01, 02, or nn concerned folder within "work", or the whole "work" folder if you have only that one slot.
After that you restart FAH using the Folding@home icon on your desktop. And FAH will restart the slot with a new WU.
Wait a bit before trying: in case I'm mistaken someone sure will correct me.
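Before deleting anything it helps to see what is actually in the work folder. A read-only Python sketch, assuming the default Windows location given above (%AppData%\FAHClient\work) and that the WU folders are the numbered ones; fah_work_dir and list_work_folders are hypothetical helper names. It deliberately never deletes anything: the deletion itself should still follow the steps above, with FAH quit first.

```python
import os
from pathlib import Path
from typing import List


def fah_work_dir() -> Path:
    """Default work folder on Windows: %AppData%\\FAHClient\\work."""
    appdata = os.environ.get("APPDATA")
    if not appdata:
        raise RuntimeError("APPDATA is not set; is this a normal Windows session?")
    return Path(appdata) / "FAHClient" / "work"


def list_work_folders(work: Path) -> List[str]:
    """Names of the numbered sub-folders (00, 01, ...) in `work`, sorted.

    Read-only: this only looks, it never deletes anything.
    """
    if not work.is_dir():
        return []
    return sorted(p.name for p in work.iterdir() if p.is_dir() and p.name.isdigit())


if __name__ == "__main__":
    try:
        work = fah_work_dir()
    except RuntimeError as exc:
        print(exc)
    else:
        print(work, list_work_folders(work))
```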
Re: 54 failed attempts to upload to 66.170.111.50
@ajm
Thanks for the information.
I've copied one of the logs and wonder why it does not show any of the upload attempts, even though I do see the upload attempts in the log displayed in F@HControl. Confusion level is spiking.......
I'll wait a few days before trying to manually delete using this method.
Code:
*********************** Log Started 2020-10-02T22:03:48Z ***********************
************************** Gromacs Folding@home Core ***************************
Type: 0xa7
Core: Gromacs
Args: -dir 00 -suffix 01 -version 706 -lifeline 7332 -checkpoint 15 -np
21
************************************ CBang *************************************
Date: Nov 27 2019
Time: 03:40:09
Revision: d25803215b59272441049dfa05a0a9bf7a6e3c48
Branch: master
Compiler: Visual C++ 2008
Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
Platform: win32 10
Bits: 64
Mode: Release
************************************ System ************************************
CPU: AMD Ryzen 9 3900X 12-Core Processor
CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
CPUs: 24
Memory: 31.93GiB
Free Memory: 27.36GiB
Threads: WINDOWS_THREADS
OS Version: 6.2
Has Battery: false
On Battery: false
UTC Offset: -7
PID: 6524
CWD: C:\Users\ejoep\AppData\Roaming\FAHClient\work
******************************** Build - libFAH ********************************
Version: 0.0.19
Author: Joseph Coffland <joseph@cauldrondevelopment.com>
Copyright: 2019 foldingathome.org
Homepage: https://foldingathome.org/
Date: Nov 25 2019
Time: 17:12:41
Revision: d5b5c747532224f986b7cd02c968ed9a20c16d6e
Branch: master
Compiler: Visual C++ 2008
Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
Platform: win32 10
Bits: 64
Mode: Release
************************************ Build *************************************
SIMD: avx_256
********************************************************************************
Project: 17408 (Run 0, Clone 46, Gen 40)
Unit: 0x0000002d42aa6f325f618431680c3624
Reading tar file core.xml
Reading tar file frame40.tpr
Digital signatures verified
Calling: mdrun -s frame40.tpr -o frame40.trr -x frame40.xtc -cpt 15 -nt 21
Steps: first=5000000 total=125000
Completed 1 out of 125000 steps (0%)
Completed 1250 out of 125000 steps (1%)
Completed 2500 out of 125000 steps (2%)
Completed 3750 out of 125000 steps (3%)
Completed 120000 out of 125000 steps (96%)
Completed 121250 out of 125000 steps (97%)
Completed 122500 out of 125000 steps (98%)
Completed 123750 out of 125000 steps (99%)
Completed 125000 out of 125000 steps (100%)
Saving result file ..\logfile_01.txt
Saving result file frame40.trr
Saving result file frame40.xtc
Saving result file md.log
Saving result file science.log
Folding@home Core Shutdown: FINISHED_UNIT
Re: 54 failed attempts to upload to 66.170.111.50
Early in your log, I see the following entry
CWD: C:\Users\ejoep\AppData\Roaming\FAHClient\work
I've never seen that in anybody else's log. Mine reads
CWD: C:\Users\ejoep\AppData\Roaming\FAHClient
Did you do something to cause that to happen? That could be enough to have FAH create two different logs and the entries you are not finding may be in the other one. How do you start FAH?
Posting FAH's log:
How to provide enough info to get helpful support.
Re: 54 failed attempts to upload to 66.170.111.50
@Colonel_Klink ... my bad ... I had only spotted the post immediately preceding mine which has the WU Status links - should have looked at your previous post.
Re: 54 failed attempts to upload to 66.170.111.50
OK ... having looked at four of the PRCGs from the logs posted, there may be a bit of a pattern.
The failures all appear to happen roughly 100-120 seconds into the transfer attempt, and this was for four different WUs of different sizes, so my gut tells me this is some sort of timeout dropping the connection rather than a size-related cause. However, all the WUs had progressed about the same amount into the upload by size (3.5-4.0MiB) before stalling, so size may be linked to why it is stalling ... the one that failed a few times and then went through uploaded in less than 10 seconds, and obviously managed to get past the size-related point ... this might mean size is a red herring, simply reflecting a slow connection speed that is similar across these WUs and therefore fails at the same point after 100-120 seconds.
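To put a number on the slow-connection possibility: 3.5-4.0 MiB in 100-120 seconds is an average of only about 30-41 KiB/s. The arithmetic as a small Python sketch; implied_rate_kib_s is a hypothetical name and the inputs are the ranges quoted above.

```python
BYTES_PER_MIB = 1024 * 1024
BYTES_PER_KIB = 1024


def implied_rate_kib_s(mib_sent: float, seconds: float) -> float:
    """Average rate in KiB/s if `mib_sent` MiB were transmitted in `seconds` s."""
    return mib_sent * BYTES_PER_MIB / BYTES_PER_KIB / seconds


# Ranges quoted above for the failed attempts: 3.5-4.0 MiB sent, 100-120 s in.
slowest = implied_rate_kib_s(3.5, 120)  # about 29.9 KiB/s
fastest = implied_rate_kib_s(4.0, 100)  # about 41.0 KiB/s

if __name__ == "__main__":
    print(f"{slowest:.1f} to {fastest:.1f} KiB/s")  # 29.9 to 41.0 KiB/s
```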
One thing that is noticeable for the failures is that within a single WU PRCG times to failure do vary a bit (sometimes 15 seconds or so) but the reported upload percentage seems to remain surprisingly consistent (this may however just be an artefact of the reporting intervals for upload percentages).
I guess first we need to work out what is causing the upload of something that isn't large to take that long, especially when the one that completed proves it can upload much more quickly, as sorting that may solve the issue ... Either the upload is stalling (maybe size-related) and then times out, or the upload rate is for some reason throttled and the timeout curtails the upload.
Then it would be useful to track down what is causing the timeout: is the server side dropping a stalled connection, or is the client/folding-kit side doing so for the same reason?
The use of something like TCPView would allow you to see if the failing upload connections are initially quick and then stall for a period before the connections timeout or whether they are simply very slow and timeout for some other reason curtailing the upload.
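The stall-versus-slow question can be answered mechanically from a handful of readings of bytes sent over time during one attempt (for example noted by hand from TCPView's per-connection sent-bytes counter). A minimal Python sketch; the sample format, classify_upload and the 20-second threshold are hypothetical choices, not anything FAH or TCPView provides.

```python
from typing import List, Tuple


def classify_upload(samples: List[Tuple[float, int]], stall_s: float = 20.0) -> str:
    """Label a failed upload "stalled", "slow" or "unknown".

    samples: (elapsed_seconds, bytes_sent_so_far) readings for one attempt.
    "stalled" means the byte count stopped changing at least stall_s seconds
    before the last reading; otherwise it was still creeping up ("slow").
    """
    if len(samples) < 2:
        return "unknown"
    ordered = sorted(samples)
    final_t, final_bytes = ordered[-1]
    plateau_start = final_t
    # Walk back to the earliest reading that already showed the final byte count.
    for t, sent in reversed(ordered):
        if sent != final_bytes:
            break
        plateau_start = t
    return "stalled" if final_t - plateau_start >= stall_s else "slow"


if __name__ == "__main__":
    # Hypothetical readings, not measured data.
    quick_then_stuck = [(0, 0), (5, 2_000_000), (10, 3_800_000), (110, 3_800_000)]
    steady_crawl = [(0, 0), (30, 1_000_000), (60, 2_000_000), (110, 3_700_000)]
    print(classify_upload(quick_then_stuck), classify_upload(steady_crawl))  # stalled slow
```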
I am sure one/all of the mods will chip in if they recall seeing this type of pattern before ... I know there have been "hung connection" issues which may be where the timeout has come from - but why it is hanging part way into the WU upload is tricky, especially since it does sometimes clear, and from what I believe you have indicated doesn't happen with every WU to that server.
Last edited by Neil-B on Sat Oct 03, 2020 9:29 pm, edited 3 times in total.
Re: 54 failed attempts to upload to 66.170.111.50
The log in the WU's work folder records the core's processing of the WU, so it doesn't have the entries for the upload attempts that the client makes; it is simply about the core processing.
Re: 54 failed attempts to upload to 66.170.111.50
@bruce
I installed F@H using the installer on the F@H site and did not change anything, as I do not normally change things during installs that I do not understand. As a matter of fact, I had never looked in the /work sub-directory until @ajm mentioned it today. Prior to this I had only looked at the files in the /logs sub-directory for logs, and I could not work out what the file names meant. As an example, the file named log-200200820-160126 has what appears to be a date in the first numerical part of the name, but in this example the 160126 is nonsensical to me. The contents appear to be the basis for the log viewer in F@HControl, without the filtering.
I have looked at all of the logs I have access to, and every one of them has the /work form of the path you mention. No idea where it came from.
Since I've only been collecting logs for WUs that do not upload, I can only assume that the logs for the ones that did upload contain the same path. But there must be some client activity that deletes the log entries for the ones that upload successfully, because I cannot find logs for them. Do they exist somewhere?
As I wrote this, a WU for another server uploaded before I could see its log, but when I check the status link you gave me earlier I see I have credit for it https://apps.foldingathome.org/wu#proje ... 377&gen=74 so I am confused as to why most of the WUs assigned to me process and appear to upload properly, while for this particular server there seems to be something unique causing a problem.
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400 - Location: Land Of The Long White Cloud
Re: 54 failed attempts to upload to 66.170.111.50
Colonel_Klink wrote:...Then I could not determine what the file name meant. as an example the file named log-200200820-160126 has what appears to be a date in the first part of the numerical part of the name and in this example the 160126 is nonsensical to me...
log-20200920-200831 means this:
log - file name
2020 - the year
09 - the month
20 - the day
- - divider between date and time (in UTC)
20 - that is hours (8:00 PM)
08 - that is minutes (8:08 PM)
31 - that is seconds
In your case, I assume it's a typo, since the year is 2020, not 2002.
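The same breakdown in code: a small Python sketch that turns a rotated log name into a UTC timestamp using the log-YYYYMMDD-HHMMSS layout above. The regex, the optional .txt suffix and log_timestamp are assumptions for illustration; note that the nine-digit name from the question is rejected rather than guessed at.

```python
import re
from datetime import datetime, timezone

# Assumed layout: log-YYYYMMDD-HHMMSS, optionally followed by ".txt".
_LOG_NAME = re.compile(r"^log-(\d{8})-(\d{6})(?:\.txt)?$")


def log_timestamp(name: str) -> datetime:
    """Return the UTC date/time encoded in a rotated FAH log file name."""
    m = _LOG_NAME.match(name)
    if m is None:
        raise ValueError(f"not a log-YYYYMMDD-HHMMSS name: {name!r}")
    stamp = datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)  # stamps are written in UTC


if __name__ == "__main__":
    print(log_timestamp("log-20200920-200831"))  # 2020-09-20 20:08:31+00:00
```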
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Re: 54 failed attempts to upload to 66.170.111.50
The /logs folder contains (by default) the last 16 logfiles (in sequence), using the naming structure PantherX explained ... The logfiles should contain all WUs processed, both the ones that uploaded first time and those that have had issues ... you should find that the date/time stamps of the listed logs advance date-wise; each log contains all the log entries from the preceding log's stamp up to the date/time stamp in its own name.
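Following that rule, finding the file for a given moment means picking the earliest rotated log stamped at or after it. A Python sketch under that assumption; log_covering is a hypothetical name, and times are naive UTC to match the file names.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

_STAMP = re.compile(r"log-(\d{8}-\d{6})")


def log_covering(names: Iterable[str], when: datetime) -> Optional[str]:
    """Pick the rotated log that should hold entries written at `when` (naive UTC).

    Assumes each rotated log holds everything from the previous log's stamp up
    to its own stamp, so the answer is the earliest log stamped at or after
    `when`. None means `when` is newer than every rotated log, i.e. it should
    still be in the current, not-yet-rotated log. For times older than the
    retained logs the oldest name is returned, but those entries may already
    have been rotated away.
    """
    stamped = []
    for name in names:
        m = _STAMP.search(name)
        if m:  # ignore files that don't follow the naming scheme
            stamped.append((datetime.strptime(m.group(1), "%Y%m%d-%H%M%S"), name))
    for stamp, name in sorted(stamped):
        if stamp >= when:
            return name
    return None
```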
The individual core WU logs in the /work directories get deleted once the WU gets uploaded, as does the whole of the WU sub-folder ... the main logs should contain all WUs, whether they had problems or not.
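To check whether a WU that uploaded fine appears in the main logs, search them for the PRCG line the core prints, e.g. "Project: 17408 (Run 0, Clone 46, Gen 40)" in the log posted earlier. A Python sketch, assuming the client log echoes that line and that the logs are plain .txt files; logs_mentioning is a hypothetical name. Point it at the /logs folder; the current, not-yet-rotated log may sit one level up in the FAHClient folder.

```python
from pathlib import Path
from typing import List


def logs_mentioning(log_dir: Path, needle: str) -> List[str]:
    """Names of the .txt logs in log_dir that contain `needle`, oldest first.

    Rotated names (log-YYYYMMDD-HHMMSS) sort chronologically as plain names.
    """
    hits = []
    for path in sorted(log_dir.glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle in text:
            hits.append(path.name)
    return hits


# Example use (path assumed to be the default Windows location):
# import os
# logs_mentioning(Path(os.environ["APPDATA"]) / "FAHClient" / "logs",
#                 "Project: 17408 (Run 0, Clone 46, Gen 40)")
```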
Re: 54 failed attempts to upload to 66.170.111.50
I've run several trace routes on the IP for this server over the past 2 hours. It appears to me that there is a network connectivity issue near the server that may be contributing to this issue.
Code:
C:\Users\ejoep>tracert 66.170.111.50
Tracing route to fah-w1.vmware.com [66.170.111.50]
over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms 192.168.0.1
2 7 ms 6 ms 7 ms 67-218-102-103.cust.layer42.net [67.218.102.103]
3 8 ms 9 ms 8 ms 174.127.182.64
4 14 ms 17 ms 7 ms be1.cr1-ptahe-b.bb.as11404.net [174.127.136.189]
5 9 ms 9 ms 10 ms be12.cr1-ptw-b.bb.as11404.net [174.127.150.176]
6 10 ms 9 ms 9 ms be10.cr1-ptw-a.bb.as11404.net [174.127.149.228]
7 15 ms 11 ms 11 ms be11.cr3-sea-b.bb.as11404.net [174.127.150.174]
8 12 ms 13 ms 18 ms be10.cr3-sea-a.bb.as11404.net [65.50.198.62]
9 11 ms 11 ms 11 ms be55.cr2-sea-a.bb.as11404.net [65.50.198.67]
10 11 ms 11 ms 11 ms be4.cr2-tuk2.bb.as11404.net [174.127.136.21]
11 13 ms 12 ms 12 ms 9-1-2.ear2.Seattle1.Level3.net [4.16.175.9]
12 12 ms 11 ms 11 ms ae-2-52.ear3.Seattle1.Level3.net [4.69.203.169]
13 16 ms 15 ms 17 ms VMWARE-INC.ear3.Seattle1.Level3.net [4.16.169.194]
14 * * * Request timed out.
15 * * * Request timed out.
16 55 ms 51 ms 55 ms fah-w1.vmware.com [66.170.111.50]
Trace complete.
C:\Users\ejoep>tracert 66.170.111.50
Tracing route to fah-w1.vmware.com [66.170.111.50]
over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms 192.168.0.1
2 8 ms 7 ms 7 ms 67-218-102-103.cust.layer42.net [67.218.102.103]
3 10 ms 8 ms 9 ms 174.127.182.64
4 10 ms 9 ms 8 ms be1.cr1-ptahe-b.bb.as11404.net [174.127.136.189]
5 10 ms 11 ms 9 ms be12.cr1-ptw-b.bb.as11404.net [174.127.150.176]
6 10 ms 10 ms 9 ms be10.cr1-ptw-a.bb.as11404.net [174.127.149.228]
7 11 ms 11 ms 11 ms be11.cr3-sea-b.bb.as11404.net [174.127.150.174]
8 16 ms 12 ms 12 ms be10.cr3-sea-a.bb.as11404.net [65.50.198.62]
9 11 ms 11 ms 11 ms be55.cr2-sea-a.bb.as11404.net [65.50.198.67]
10 13 ms 12 ms 12 ms be4.cr2-tuk2.bb.as11404.net [174.127.136.21]
11 12 ms 13 ms 11 ms 9-1-2.ear2.Seattle1.Level3.net [4.16.175.9]
12 * * * Request timed out.
13 16 ms 18 ms 18 ms VMWARE-INC.ear3.Seattle1.Level3.net [4.16.169.194]
14 * * * Request timed out.
15 * * * Request timed out.
16 54 ms 51 ms 51 ms fah-w1.vmware.com [66.170.111.50]
Trace complete.
C:\Users\ejoep>tracert 66.170.111.50
Tracing route to fah-w1.vmware.com [66.170.111.50]
over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms 192.168.0.1
2 7 ms 11 ms 8 ms 67-218-102-103.cust.layer42.net [67.218.102.103]
3 9 ms 8 ms 10 ms 174.127.182.64
4 8 ms 9 ms 7 ms be1.cr1-ptahe-b.bb.as11404.net [174.127.136.189]
5 10 ms 9 ms 9 ms be12.cr1-ptw-b.bb.as11404.net [174.127.150.176]
6 10 ms 9 ms 10 ms be10.cr1-ptw-a.bb.as11404.net [174.127.149.228]
7 12 ms 12 ms 11 ms be11.cr3-sea-b.bb.as11404.net [174.127.150.174]
8 12 ms 12 ms 12 ms be10.cr3-sea-a.bb.as11404.net [65.50.198.62]
9 10 ms 13 ms 13 ms be55.cr2-sea-a.bb.as11404.net [65.50.198.67]
10 10 ms 11 ms 11 ms be4.cr2-tuk2.bb.as11404.net [174.127.136.21]
11 12 ms 11 ms 11 ms 9-1-2.ear2.Seattle1.Level3.net [4.16.175.9]
12 * * * Request timed out.
13 25 ms 24 ms 15 ms VMWARE-INC.ear3.Seattle1.Level3.net [4.16.169.194]
14 * * * Request timed out.
15 * * * Request timed out.
16 52 ms 50 ms 51 ms fah-w1.vmware.com [66.170.111.50]
Trace complete.
C:\Users\ejoep>tracert 66.170.111.50
Tracing route to fah-w1.vmware.com [66.170.111.50]
over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms 192.168.0.1
2 8 ms 6 ms 7 ms 67-218-102-103.cust.layer42.net [67.218.102.103]
3 8 ms 8 ms 11 ms 174.127.182.64
4 14 ms 8 ms 8 ms be1.cr1-ptahe-b.bb.as11404.net [174.127.136.189]
5 10 ms 9 ms 9 ms be12.cr1-ptw-b.bb.as11404.net [174.127.150.176]
6 16 ms 9 ms 10 ms be10.cr1-ptw-a.bb.as11404.net [174.127.149.228]
7 12 ms 11 ms 11 ms be11.cr3-sea-b.bb.as11404.net [174.127.150.174]
8 11 ms 12 ms 11 ms be10.cr3-sea-a.bb.as11404.net [65.50.198.62]
9 11 ms 74 ms 10 ms be55.cr2-sea-a.bb.as11404.net [65.50.198.67]
10 12 ms 11 ms 11 ms be4.cr2-tuk2.bb.as11404.net [174.127.136.21]
11 15 ms 17 ms 15 ms 9-1-2.ear2.Seattle1.Level3.net [4.16.175.9]
12 * 11 ms 11 ms ae-2-52.ear3.Seattle1.Level3.net [4.69.203.169]
13 15 ms 16 ms 16 ms VMWARE-INC.ear3.Seattle1.Level3.net [4.16.169.194]
14 * * * Request timed out.
15 * * * Request timed out.
16 51 ms 50 ms 51 ms fah-w1.vmware.com [66.170.111.50]
Trace complete.
C:\Users\ejoep>tracert 66.170.111.50
Tracing route to fah-w1.vmware.com [66.170.111.50]
over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms 192.168.0.1
2 8 ms 8 ms 9 ms 67-218-102-103.cust.layer42.net [67.218.102.103]
3 9 ms 7 ms 9 ms 174.127.182.64
4 8 ms 8 ms 14 ms be1.cr1-ptahe-b.bb.as11404.net [174.127.136.189]
5 9 ms 10 ms 14 ms be12.cr1-ptw-b.bb.as11404.net [174.127.150.176]
6 9 ms 10 ms 9 ms be10.cr1-ptw-a.bb.as11404.net [174.127.149.228]
7 11 ms 12 ms 14 ms be11.cr3-sea-b.bb.as11404.net [174.127.150.174]
8 11 ms 11 ms 11 ms be10.cr3-sea-a.bb.as11404.net [65.50.198.62]
9 11 ms 11 ms 11 ms be55.cr2-sea-a.bb.as11404.net [65.50.198.67]
10 11 ms 11 ms 11 ms be4.cr2-tuk2.bb.as11404.net [174.127.136.21]
11 12 ms 11 ms 11 ms 9-1-2.ear2.Seattle1.Level3.net [4.16.175.9]
12 * * * Request timed out.
13 16 ms 15 ms 15 ms VMWARE-INC.ear3.Seattle1.Level3.net [4.16.169.194]
14 * * * Request timed out.
15 * * * Request timed out.
16 50 ms 51 ms 52 ms fah-w1.vmware.com [66.170.111.50]
Trace complete.
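A quick way to see which hops actually lose probes across the five runs is to tally the `*` entries hop by hop. The following is a minimal Python sketch, assuming the Windows `tracert` line format shown above (three probes per hop, `*` for no reply, `<1 ms` for very fast replies); `parse_hop` and `timeout_counts` are hypothetical helper names, not part of any FAH or Windows tool.

```python
import re
from collections import Counter

# One hop line of Windows tracert output, in the three shapes seen in this thread:
#   "13    16 ms    15 ms    17 ms  VMWARE-INC.ear3.Seattle1.Level3.net [4.16.169.194]"
#   "14     *        *        *     Request timed out."
#   "12     *       11 ms    11 ms  ae-2-52.ear3.Seattle1.Level3.net [4.69.203.169]"
HOP_RE = re.compile(r"^\s*(\d+)\s+((?:(?:<?\d+ ms|\*)\s+){3})(.*)$")
PROBE_RE = re.compile(r"<?\d+ ms|\*")


def parse_hop(line):
    """Return (hop_number, probes, host) for a hop line, or None for any other line.

    Each probe is a round-trip time in ms, or None for a '*' (no reply).
    """
    m = HOP_RE.match(line)
    if m is None:
        return None
    probes = []
    for tok in PROBE_RE.findall(m.group(2)):
        if tok == "*":
            probes.append(None)
        else:
            # "<1 ms" is recorded as 1 ms; sub-millisecond precision does not matter here.
            probes.append(float(tok.lstrip("<").split()[0]))
    return int(m.group(1)), probes, m.group(3).strip()


def timeout_counts(traces):
    """Map hop number -> (unanswered probes, probes sent), summed over several runs."""
    lost, sent = Counter(), Counter()
    for text in traces:
        for line in text.splitlines():
            hop = parse_hop(line)
            if hop is None:
                continue  # prompts, "Tracing route to ...", "Trace complete." etc.
            n, probes, _host = hop
            sent[n] += len(probes)
            lost[n] += sum(p is None for p in probes)
    return {n: (lost[n], sent[n]) for n in sorted(sent)}
```

Applied to the five runs above, the tally should come out as: hops 14 and 15 lose 15 of 15 probes, hop 12 loses 10 of 15, and hop 16 (the server itself) loses none.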
Colonel_Klink
RTX 2080 Super
AMD Ryzen 9 3900X
Re: 54 failed attempts to upload to 66.170.111.50
1 <1 ms <1 ms <1 ms 192.168.0.1
2 7 ms 6 ms 7 ms 67-218-102-103.cust.layer42.net [67.218.102.103]
3 8 ms 9 ms 8 ms 174.127.182.64
4 14 ms 17 ms 7 ms be1.cr1-ptahe-b.bb.as11404.net [174.127.136.189]
5 9 ms 9 ms 10 ms be12.cr1-ptw-b.bb.as11404.net [174.127.150.176]
6 10 ms 9 ms 9 ms be10.cr1-ptw-a.bb.as11404.net [174.127.149.228]
7 15 ms 11 ms 11 ms be11.cr3-sea-b.bb.as11404.net [174.127.150.174]
8 12 ms 13 ms 18 ms be10.cr3-sea-a.bb.as11404.net [65.50.198.62]
9 11 ms 11 ms 11 ms be55.cr2-sea-a.bb.as11404.net [65.50.198.67]
10 11 ms 11 ms 11 ms be4.cr2-tuk2.bb.as11404.net [174.127.136.21]
11 13 ms 12 ms 12 ms 9-1-2.ear2.Seattle1.Level3.net [4.16.175.9]
12 12 ms 11 ms 11 ms ae-2-52.ear3.Seattle1.Level3.net [4.69.203.169]
13 16 ms 15 ms 17 ms VMWARE-INC.ear3.Seattle1.Level3.net [4.16.169.194]
14 * * * Request timed out.
15 * * * Request timed out.
16 55 ms 51 ms 55 ms fah-w1.vmware.com [66.170.111.50]
Nodes 1-3 are between your client and your ISP. Nodes 4-10 are the backend transport layer your ISP uses to get to Seattle. Nodes 11-12 are the routers that get you to VMware's gateway. Nodes 13-16 are the routers inside VMware-INC. I wouldn't worry much about nodes 14 and 15. There are often local routers in front of the actual image of the server ("fah-w1") that is doing FAH's workload.
Any router can be configured to drop pings and not answer traceroute probes, which is what shows up as "Request timed out." It's a commonly used technique that lets the router concentrate on routing efficiently instead of wasting processing cycles responding to pings. I expect that node 15 is a hypervisor managing a number of VMware images.
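The rule of thumb above, that a hop which never answers is not a problem as long as the hops after it (and the destination) do answer, can be written down as a small heuristic. This is a minimal Python sketch of that reasoning, not a feature of `tracert`; `classify_hops` and its labels are hypothetical names, and it assumes the per-hop probe times have already been extracted from the output (`None` for each `*`).

```python
from typing import List, Optional

# Round-trip time of one probe in ms, or None where tracert printed "*".
Probe = Optional[float]


def classify_hops(hops: List[List[Probe]]) -> List[str]:
    """Label each hop of a single trace (hops[0] is hop 1).

    "ok"      - every probe got a reply.
    "silent"  - at least one probe went unanswered, but some later hop replied
                to every probe, so packets are being forwarded past this router;
                it is most likely just not answering traceroute probes.
    "suspect" - probes went unanswered and no later hop replied cleanly, which
                is the pattern genuine packet loss on the path would produce.
    """
    def clean(probes: List[Probe]) -> bool:
        return all(p is not None for p in probes)

    labels = []
    for i, probes in enumerate(hops):
        if clean(probes):
            labels.append("ok")
        elif any(clean(later) for later in hops[i + 1:]):
            labels.append("silent")
        else:
            labels.append("suspect")
    return labels
```

For example, hops 13 to 16 of the quoted trace, `[[16.0, 15.0, 17.0], [None, None, None], [None, None, None], [55.0, 51.0, 55.0]]`, come out as `["ok", "silent", "silent", "ok"]`. It is only a heuristic: it says nothing about loss that happens above the IP layer, such as an upload timing out on the server.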
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 49
- Joined: Sat Aug 15, 2020 5:43 pm
- Location: Pacific Northwest, USA
Re: 54 failed attempts to upload to 66.170.111.50
@bruce,
Thanks for the comments on the tracert output. I'd forgotten that some sysadmins turn off responses to pings and tracert; it's been too long since I messed around with TCP/IP. Pings and tracerts were very useful in the early-to-mid '80s.
BTW, all of the WUs that were not uploading late yesterday afternoon uploaded overnight and have been replaced by a few new ones assigned by the same server. Having seen this pattern repeat a few times, I started looking at the Server status page and noticed that the two servers that do this most often for me (66.170.111.50 and 140.163.4.231) both lack a collection server. Following the note at the bottom of the server status table that says to hover over the CS field for more information, I hovered over the NO in the HAS CS column for both servers and saw the comment "Failed: aws3foldingathome.org", which is common to both.
Can you explain the comment "Failed: aws3foldingathome.org" to me and what having or not having a collection server might have to do with this issue, or point me to a tutorial on how the collection and assignment servers work? I wonder whether not having a collection server means that collecting finished WUs has a lower priority than assigning new ones, so the upload process times out on the server before an upload is complete. If that is what is happening, I will stop worrying about WUs failing to upload until they are close to their timeout date/time.
Colonel_Klink
RTX 2080 Super
AMD Ryzen 9 3900X