Any problems with project:16927 run:18 clone:323 gen:91?
Posted: Thu Feb 11, 2021 10:00 pm
by Der_mit_dem_Hund
Hello everyone,
I have learnt that, recently, the acknowledgement of delivered work units (WUs) seems to be delayed. Credits seem to get lost but may suddenly appear some time after a topic is opened in this forum. This is why I would like to ask what may have happened to this WU, delivered five days ago:
19:31:40:WU01:FS00:0xa7:*********************** Log Started 2021-02-06T19:31:39Z ***********************
19:31:40:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
...
20:02:40:WU01:FS00:0xa7:Completed 485000 out of 500000 steps (97%)
20:36:18:WU01:FS00:0xa7:Completed 490000 out of 500000 steps (98%)
21:10:07:WU01:FS00:0xa7:Completed 495000 out of 500000 steps (99%)
21:44:32:WU01:FS00:0xa7:Completed 500000 out of 500000 steps (100%)
21:44:34:WU01:FS00:0xa7:Saving result file ../logfile_01.txt
21:44:34:WU01:FS00:0xa7:Saving result file frame91.trr
21:44:34:WU01:FS00:0xa7:Saving result file md.log
21:44:34:WU01:FS00:0xa7:Saving result file science.log
21:44:34:WU01:FS00:0xa7:Saving result file traj_comp.xtc
21:44:34:WU01:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
21:44:35:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
21:44:35:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:16927 run:18 clone:323 gen:91 core:0xa7 unit:0x000001430000005b0000421f00000012
21:44:35:WU01:FS00:Uploading 6.40MiB to 129.32.209.201
21:44:35:WU01:FS00:Connecting to 129.32.209.201:8080
So far, I still get "No results." when opening the respective webpage, https://apps.foldingathome.org/wu#project=16927&run=18&clone=323&gen=91.
Any ideas why?
BTW: Another WU delivered after that, on 2021-02-08 at 19:44:34, has been credited reliably. I will stop running version 7.6.13 and install version 7.6.21 as soon as possible.
Thank you very much.
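For anyone checking several WUs this way, the lookup link can be built from the "Sending unit results" log line. A minimal Python sketch, assuming the log format quoted above and the URL format of the link in this post; the regex and the function name wu_lookup_url are illustrative and not part of any Folding@home tool.

```python
import re
from urllib.parse import urlencode

# Matches the project/run/clone/gen fields as they appear in the
# "Sending unit results" log line quoted in this thread.
PRCG_RE = re.compile(
    r"project:(?P<project>\d+) run:(?P<run>\d+) "
    r"clone:(?P<clone>\d+) gen:(?P<gen>\d+)"
)

# Base of the WU lookup page linked in this thread.
WU_PAGE = "https://apps.foldingathome.org/wu"


def wu_lookup_url(log_line: str) -> str:
    """Build the WU lookup URL from a log line containing PRCG fields."""
    m = PRCG_RE.search(log_line)
    if m is None:
        raise ValueError("no project/run/clone/gen found in line")
    # Parameters go after '#', matching the link format used in this thread.
    params = [(k, m.group(k)) for k in ("project", "run", "clone", "gen")]
    return WU_PAGE + "#" + urlencode(params)


line = ("21:44:35:WU01:FS00:Sending unit results: id:01 state:SEND "
        "error:NO_ERROR project:16927 run:18 clone:323 gen:91 core:0xa7")
print(wu_lookup_url(line))
# prints https://apps.foldingathome.org/wu#project=16927&run=18&clone=323&gen=91
```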
Re: Any problems with project:16927 run:18 clone:323 gen:91?
Posted: Thu Feb 11, 2021 10:43 pm
by bruce
There have been reports of problems with 129.32.209.201 and other servers at temple.edu. (Perhaps the ice storm.) Since it was acknowledged, it's probably just a matter of being patient.
Re: Any problems with project:16927 run:18 clone:323 gen:91?
Posted: Sun Feb 14, 2021 9:17 am
by PantherX
Please note that as long as the WU has been successfully uploaded to the server, the science carries on. Just because you don't see the points in the stats system doesn't mean that you lost work; it just means that the stats system hasn't been updated yet. In 99% of cases, your WUs will be credited sooner or later, depending on how widespread/laborious the process is.
Re: Any problems with project:16927 run:18 clone:323 gen:91?
Posted: Thu Feb 18, 2021 9:57 am
by Der_mit_dem_Hund
Thank you, PantherX,
but the problem seems to persist. Is it due to the weather conditions in the US? Anyway, the WU is still NOT FOUND under
https://apps.foldingathome.org/wu#proje ... 323&gen=91 after nearly two weeks.
Here's the log:
19:31:40:WU01:FS00:0xa7:*********************** Log Started 2021-02-06T19:31:39Z ***********************
..
21:10:07:WU01:FS00:0xa7:Completed 495000 out of 500000 steps (99%)
21:44:32:WU01:FS00:0xa7:Completed 500000 out of 500000 steps (100%)
21:44:34:WU01:FS00:0xa7:Saving result file ../logfile_01.txt
21:44:34:WU01:FS00:0xa7:Saving result file frame91.trr
21:44:34:WU01:FS00:0xa7:Saving result file md.log
21:44:34:WU01:FS00:0xa7:Saving result file science.log
21:44:34:WU01:FS00:0xa7:Saving result file traj_comp.xtc
21:44:34:WU01:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
21:44:35:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
21:44:35:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:16927 run:18 clone:323 gen:91 core:0xa7 unit:0x000001430000005b0000421f00000012
21:44:35:WU01:FS00:Uploading 6.40MiB to 129.32.209.201
21:44:35:WU01:FS00:Connecting to 129.32.209.201:8080
There is another WU which seems to have been lost:
https://apps.foldingathome.org/wu#proje ... =786&gen=1 (project:14188 run:3 clone:786 gen:1 core:0xa7 unit:0x00000312000000010000376c00000003)
Re: Any problems with project:16927 run:18 clone:323 gen:91?
Posted: Thu Feb 18, 2021 11:31 am
by Neil-B
Your log posts only show the start of the uploads - upload attempts can fail for a variety of reasons - can you confirm there was an upload complete sequence for those WUs later in the log, with the work acknowledged and an estimated points figure?
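That check can be automated. A minimal Python sketch, assuming log lines in the format quoted in this thread ("HH:MM:SS:WUxx:FSyy:message") and the message texts shown in the complete upload sequence posted later in the thread ("Upload complete", "Server responded WORK_ACK", "Final credit estimate, ... points"). The function name check_upload and its result fields are hypothetical. A WORK_ACK only records the server's acknowledgement; it does not mean the stats pages have been updated.

```python
import re
from typing import Iterable, Optional

# Assumed per-WU line prefix, e.g. "12:20:13:WU01:FS00:".
TAG_RE = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+:FS\d+):")
CREDIT_RE = re.compile(r"Final credit estimate, ([\d.]+) points")


def check_upload(lines: Iterable[str], prcg: str) -> dict:
    """Summarize the upload sequence of one WU as recorded in a client log.

    prcg is the identifier text exactly as logged, for example
    "project:16927 run:21 clone:1264 gen:35".
    """
    # (?!\d) stops e.g. "gen:3" from matching "gen:35".
    unit_re = re.compile(re.escape(prcg) + r"(?!\d)")
    result = {"sent": False, "upload_complete": False,
              "work_ack": False, "credit_estimate": None}
    tag: Optional[str] = None  # slot tag such as "WU01:FS00"
    for line in lines:
        m = TAG_RE.match(line)
        if m is None:
            continue  # not a per-WU line (e.g. "FS00:Initialized ...")
        if tag is None:
            # Wait for the "Sending unit results" line naming this WU.
            if "Sending unit results" in line and unit_re.search(line):
                tag = m.group(1)
                result["sent"] = True
            continue
        if m.group(1) != tag:
            continue  # belongs to a different work slot
        if "Sending unit results" in line and not unit_re.search(line):
            break  # the WUxx id has been reused for another unit
        if "Upload complete" in line:
            result["upload_complete"] = True
        elif "Server responded WORK_ACK" in line:
            result["work_ack"] = True
        elif "Cleaning up" in line:
            break  # end of this unit's sequence
        else:
            c = CREDIT_RE.search(line)
            if c:
                result["credit_estimate"] = float(c.group(1))
    return result
```

Fed only the excerpt from the first post (which ends at "Connecting to 129.32.209.201:8080"), this reports sent=True with everything else unset, which is why that excerpt alone cannot show whether the upload finished.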
Re: Any problems with project:16927 run:18 clone:323 gen:91?
Posted: Thu Feb 18, 2021 12:11 pm
by Der_mit_dem_Hund
Good question, Neil-B.
I'm afraid that my logs in /var/lib/fahclient/logs/ no longer go back to Feb 2 or 6. The oldest remaining logs are from Feb 11 (your time).
Re: Any problems with project:16927 run:18 clone:323 gen:91?
Posted: Thu Feb 18, 2021 5:10 pm
by Neil-B
There is a flag, log-rotate-max, that can be set to keep more than 16 logs ... it doesn't solve the issue with these two WUs but might help in future ... In Windows I use the Advanced Control configure option, select the Expert tab and add log-rotate-max with a value ... I believe using 0 will keep all logs, but I have never actually used that, as I use 99 and move my logs off my folding kit onto another machine every now and then for manipulation/analysis ... If you use Linux, someone else will be able to weigh in and help.
For these two, all I can suggest is that if they were uploaded OK, the points will turn up at some point down the line (maybe even months later) ... I have yet to miss out on points for any WU that was correctly uploaded and accepted as valid ... in fact I use missing points in the stats system for a WU I know was on my kit as a flag to go and see what the issue was, in case I need to adjust my kit!!
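Moving logs off the folding machine every now and then, as Neil-B describes, can be scripted. A minimal Python sketch, assuming the Linux log directory /var/lib/fahclient/logs/ mentioned in this thread and a destination directory of your choosing; the function name archive_logs is hypothetical. It assumes each rotated log keeps a unique file name: files already present at the destination are skipped, so a log reused under an old name would not be copied again. It does not change log-rotate-max itself, which is still set in the client configuration.

```python
import shutil
from pathlib import Path
from typing import List


def archive_logs(src: Path, dest: Path) -> List[str]:
    """Copy log files from src to dest, skipping ones already archived.

    Returns the names of the files copied in this run.
    """
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue  # ignore subdirectories
        target = dest / path.name
        if target.exists():
            continue  # already archived on an earlier run
        shutil.copy2(path, target)  # copy2 also preserves the modification time
        copied.append(path.name)
    return copied
```

For example, archive_logs(Path("/var/lib/fahclient/logs"), Path("/mnt/backup/fah-logs")), where the backup path is only a placeholder; run it periodically so logs are copied before rotation removes them.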
Re: Any problems with project:16927 run:18 clone:323 gen:91?
Posted: Fri Feb 26, 2021 11:08 pm
by Der_mit_dem_Hund
Hi Neil-B,
I think the parameter log-rotate-max is the same on all systems, including Linux.
You may be right in assuming that WUs for the Voelz Lab (129.32.209.201) in particular are processed slowly. I have another one which has not been credited yet.
12:04:51:FS00:Initialized folding slot 00: cpu:1
12:04:51:WU01:FS00:Starting
12:04:51:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-sse2/a7-0.0.19/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 706 -lifeline 953 -checkpoint 10 -np 1
12:04:51:WU01:FS00:Started FahCore on PID 962
12:04:51:WU01:FS00:Core PID:966
12:04:51:WU01:FS00:FahCore 0xa7 started
12:04:52:WU01:FS00:0xa7:*********************** Log Started 2021-02-25T12:04:51Z ***********************
...
12:04:52:WU01:FS00:0xa7:Calling: mdrun -s frame35.tpr -o frame35.trr -cpi state.cpt -cpt 10 -nt 1
12:04:53:WU01:FS00:0xa7:Steps: first=17500000 total=500000
12:04:55:WU01:FS00:0xa7:Completed 497762 out of 500000 steps (99%)
12:19:58:WU01:FS00:0xa7:Completed 500000 out of 500000 steps (100%)
12:20:00:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
12:20:00:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:16927 run:21 clone:1264 gen:35 core:0xa7 unit:0x000004f0000000230000421f00000015
12:20:00:WU01:FS00:Uploading 6.42MiB to 129.32.209.201
12:20:00:WU01:FS00:Connecting to 129.32.209.201:8080
12:20:06:WU01:FS00:Upload 50.63%
12:20:12:WU01:FS00:Upload 99.32%
12:20:12:WU01:FS00:Upload complete
12:20:13:WU01:FS00:Server responded WORK_ACK (400)
12:20:13:WU01:FS00:Final credit estimate, 2984.00 points
12:20:13:WU01:FS00:Cleaning up
It reads "WORK_ACK" and "Final credit estimate, 2984.00 points". Let's see...
Good evening!
Re: Any problems with project:16927 run:18 clone:323 gen:91?
Posted: Fri Feb 26, 2021 11:17 pm
by Neil-B
That lab has another server down (Vince posted about that one in another thread today), so they might be up against it trying to sort out science-related issues, which will take precedence over sorting out the stats catch-up.
Re: Any problems with project:16927 run:18 clone:323 gen:91?
Posted: Fri Feb 26, 2021 11:25 pm
by bruce
Yes. I think this post applies here, too:
viewtopic.php?f=19&t=36716&p=349536#p349536
Anyway, that lab is working on their servers.
Re: Any problems with project:16927 run:18 clone:323 gen:91?
Posted: Sat Feb 27, 2021 9:50 pm
by J.C.Roeloffzen
I get no credits for project 16927. This project is demotivating people from folding for FAH, so please sort this out or remove this project from FAH.
Re: Any problems with project:16927 run:18 clone:323 gen:91?
Posted: Sun Feb 28, 2021 12:17 am
by Neil-B
Welcome to the Forums
The delay in stats generation for this project is a known issue (a quick search of the forums will show this) ... Folders will get the points once it is sorted out; until then, patience is required.
Priorities on FaH projects are based upon the science and results - they are not driven by folders' wishes on points ... If FaH projects were put on hold every time there is a stats issue, the science would be badly impeded ... From another post on this forum I surmise that, in order to complete research on this project, it was necessary to release some additional WUs. Unfortunately, the server releasing these is not communicating with the stats servers properly. Investigations are underway and points will be resolved, but calls to stop a research project from being completed because points are delayed are exactly why I personally dislike the gamification and points system so much - folders lose sight of the real reason for FaH existing, which is to increase scientific discovery and knowledge, not to "gain points".
Stats delays happen - folders learn to be patient.
Re: Any problems with project:16927 run:18 clone:323 gen:91?
Posted: Mon Mar 01, 2021 8:03 pm
by Der_mit_dem_Hund
So far, the WU (project:16927 run:21 clone:1264 gen:35 core:0xa7 unit:0x000004f0000000230000421f00000015) has not been credited.
I am currently working on another WU from the Voelz group, although "Covid-19" is my Cause Preference. I kind of wonder why I am assigned WUs of the group when they have server problems.
Re: Any problems with project:16927 run:18 clone:323 gen:91?
Posted: Mon Mar 01, 2021 8:19 pm
by bruce
Der_mit_dem_Hund wrote: I kind of wonder why I am assigned WUs of the group when they have server problems.
I look at it differently. I feel sorry for the FAH lab that has to deal with the "red tape" of getting the campus net-admins to fix the network they're forced to use. (Note: I'm not singling out the network administrators on that campus. That problem appears in lots of places.)
Re: Any problems with project:16927 run:18 clone:323 gen:91?
Posted: Mon Mar 01, 2021 8:28 pm
by Der_mit_dem_Hund
Hi bruce,
English is not my mother tongue, and in written form it may be perceived even more differently than when spoken. This is why I wrote "kind of wonder". If the team cannot handle finished WUs, I would expect that, to avoid work being done in vain, they would stop distributing WUs for a while, get things fixed, and start afresh once everything is in order.
That's it. I do not mind having my one CPU calculating for non-Covid causes. However, as long as my machine is working into the void, I cannot spend its power on teams which can actually use the WUs for their science.
I remain patient. What choice do I have?