Stats backlog
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 1003
- Joined: Thu May 02, 2013 8:46 pm
- Hardware configuration: Full Time:
2x NVidia GTX 980
1x NVidia GTX 780 Ti
2x 3GHz Core i5 PC (Linux)
Retired:
3.2GHz Core i5 PC (Linux)
3.2GHz Core i5 iMac
2.8GHz Core i5 iMac
2.16GHz Core 2 Duo iMac
2GHz Core 2 Duo MacBook
1.6GHz Core 2 Duo Acer laptop
- Location: Near Oxford, United Kingdom
- Contact:
Stats backlog
A few thoughts.
From past experience (and paying a bit more attention to serverstats this morning) I get the following impressions (which I accept may be mistaken):
1) the stats system works on a "last in, first out" basis, ie recently uploaded WUs get processed before any backlog is addressed.
2) it stops processing when some limit is reached (time, space, WU count, I have no idea)
3) in the "contest" between clients adding uploaded WUs to the queue and the stats system removing them, the stats system isn't winning.
If I'm right, the parameter in 2) needs to be increased… comments?
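To make impressions 1) to 3) concrete, here is a toy Python sketch of a "last in, first out" queue with a per-run cap. It only illustrates the hypothesis and is not the actual stats code; the name `run_lifo`, the cap value and the WU labels are all made up for the example.

```python
def run_lifo(stack, new_wus, cap):
    """One hypothetical stats run: newest WUs first, stop after `cap` WUs."""
    stack.extend(new_wus)  # fresh uploads land on top of the backlog
    processed = []
    while stack and len(processed) < cap:
        processed.append(stack.pop())  # take the most recent upload first
    return processed


# Assumed numbers, purely for illustration.
backlog = ["old-1", "old-2"]
done = run_lifo(backlog, ["new-1", "new-2", "new-3"], cap=3)
print(done)     # the newest uploads, most recent first
print(backlog)  # the older WUs are still waiting
```

If the uploads arriving before each run are at least as numerous as the cap, the older WUs are never reached, which is what impression 3) amounts to.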
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
- Location: Land Of The Long White Cloud
- Contact:
Re: Stats backlog
1) Seems plausible.
2) When the Stats System is out of sync with the WS/CS or manually taken offline. This happens due to network issues, maintenance work, etc. Generally the Stats system is quite robust and any issues happen infrequently.
3) Clients will always upload results to the WS. In some cases, if the WS isn't online (keeping the WS online is very important), the CS is used (not all Projects use this). The Stats system takes the data from the WS/CS and then processes it. This is usually done as a batch process, but it can take more than one attempt to clear the backlog. Not sure what the reasons are to split up the backlog into more than one batch job. Every few months, a manual "sync" takes place so that any WUs which weren't added to the Stats Server are added.
I do realize that the start of the year has been rocky from the Server side but do keep in mind that there were some significant changes to the infrastructure which may have resulted in unexpected issues.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
-
- Posts: 1003
- Joined: Thu May 02, 2013 8:46 pm
- Location: Near Oxford, United Kingdom
Re: Stats backlog
PantherX wrote: 2) When the Stats System is out of sync with the WS/CS or manually taken offline. This happens due to network issues, maintenance work, etc. Generally the Stats system is quite robust and any issues happen infrequently.
Poor phrasing on my part; I meant that it stops processing the current run even when everything is running correctly. Also see next.
PantherX wrote: 3) … It is usually done in a batch process but it can take more than one attempt to clear the backlog.
That's more or less what I'm saying: whatever determines the size of the batch isn't making it big enough!
PantherX wrote: Not sure what the reasons are to split up the backlog into more than one batch job.
I don't have any problem with that. For any job that runs at fixed intervals with varying (and unpredictable) amounts of input data, it makes sense to ensure that a specific run won't overlap the start time of the next. It can be coded so that an overlap won't cause problems should it happen, but it's easier just to make sure it won't.

In this particular case it also means that there are predictable periods when the stats will be "static" (even if not entirely up to date) which is convenient for those who want to build their own stats from them, be they individual donors or third-party "aggregation" sites.
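The "make sure a run can't overlap the next one" idea can be sketched in Python as a batch that stops when its time budget runs out. This is a guess at the general technique, not the real stats job; `run_batch`, the budget and the fake clock are assumptions for the sake of a deterministic example.

```python
import itertools
import time
from collections import deque


def run_batch(queue, process, budget_seconds, clock=time.monotonic):
    """Process queued items until the queue is empty or the time budget is spent."""
    deadline = clock() + budget_seconds
    handled = 0
    while queue and clock() < deadline:
        process(queue.popleft())
        handled += 1
    return handled  # anything still in `queue` waits for the next run


# Demo with a fake clock that advances one "second" per call (an assumption
# made so that the example always behaves the same way).
ticks = itertools.count()
pending = deque(["wu-a", "wu-b", "wu-c", "wu-d", "wu-e"])
credited = []
handled = run_batch(pending, credited.append, budget_seconds=3,
                    clock=lambda: next(ticks))
print(handled, list(pending))
```

Whatever is left in the queue is simply picked up by the next scheduled run, so a backlog gets spread over several runs without any of them overrunning.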
PantherX wrote: Every few months, a manual "sync" takes place so that any WUs which weren't added to the Stats Server are added.
Perhaps the Operating Procedure should be amended to "Every few months and after a significant outage…"?
I think it would make sense, after an outage such as this one, for the backlog to be cleared in a single giant batch before the hourly update goes live again. The extra delay would, I think, be more acceptable to donors than "they'll be up to date eventually… give it a few months". Especially if they suspect that one or more uploaded WUs may (or may not) have gone missing.
(Yes, I know the mods can check if requested, but at the moment I'm 4 WUs light out of about 25, but I can't identify which ones. I don't really want to present the mods with a list of all 25 to check any more than they are likely to want to check them, and I'm fairly sure I'm not the only one who could come up with such a list…)
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
- Contact:
Re: Stats backlog
Please post the basis for your conclusions, because the stats server has never been unable to dig out of a backlog within a day or two of all the servers coming back online.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
-
- Posts: 1003
- Joined: Thu May 02, 2013 8:46 pm
- Location: Near Oxford, United Kingdom
Re: Stats backlog
Just keep an eye on the WUs Rcv total on the server stats page- if it's going down at all it's doing it very slowly.
The feeling that each run of the stats update is taking less time (ie processing fewer WUs) than before is, I admit, little more than that- I've no hard evidence. But I think I'm right.
-
- Posts: 1003
- Joined: Thu May 02, 2013 8:46 pm
- Location: Near Oxford, United Kingdom
Re: Stats backlog
7im wrote: Please post the basis for your conclusions, because the stats server has never been unable to dig out of a backlog within a day or two of all the servers coming back online.
OK, a few figures (later ones edited in), some dug out from odd notes I've made, some exact from the stats:
Code:
Time WU Rcv
09:00Z ~12.6K
09:30Z >10K
12:00Z ~14K
14:26Z 11147
15:05Z 13920
15:26Z 11262
16:05Z 14257
16:22Z 11362
16:41Z 12587
17:01Z 14035
17:22Z 11369
(Sorry about the code tags, it was the only way I could find to keep the formatting)
-
- Posts: 320
- Joined: Sat May 23, 2009 4:49 pm
- Hardware configuration: eVga x299 DARK 2070 Super, eVGA 2080, eVga 1070, eVga 2080 Super
MSI x399 eVga 2080, eVga 1070, eVga 1070, GT970
- Location: Mississippi near Memphis, Tn
Re: Stats backlog
My thoughts:
1. A batch file is scheduled to run at H+59 or possibly H+00 each and every hour.
2. Catchup batch file runs until pre-empted by the batch file in item 1 which takes priority.
3. Catchup batch file begins where it left off after batch file in item 1 finishes.
This continues until catchup batch file has no more data to process.
Files containing data on finished wu's are limited in size by some means; from 0000Z to 0059Z or by some size limitations.
The longer the system is down the longer it will take the catchup batch file to process them.
I'm sure that Pande Group has a printout of files that have not been processed, and compares that list to files as they are completed to ensure that any missed files are resubmitted.
It works, eventually everything will be processed, maybe not as fast as some would like, but eventually.
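For what it's worth, the scheme in items 1 to 3 can be put into a rough Python model: each hour the regular job takes what it needs first, and the catchup job only gets the capacity left over. The function name and all the numbers are invented for illustration; nothing here comes from the real servers.

```python
def simulate_catchup(backlog, hourly_arrivals, capacity_per_hour, hours):
    """Return the backlog left after each hour under the assumed scheme."""
    remaining = []
    for _ in range(hours):
        # The hourly job is served first; spare capacity goes to the backlog.
        # If arrivals exceed capacity, the overflow joins the backlog instead.
        backlog = max(0, backlog + hourly_arrivals - capacity_per_hour)
        remaining.append(backlog)
    return remaining


# Assumed figures: 1000 WUs piled up during an outage, 300 new WUs per hour,
# and room to process 500 WUs per hour.
print(simulate_catchup(1000, 300, 500, 6))
```

With only 200 WUs of spare capacity per hour the pile takes several runs to clear, and a bigger pile takes proportionally longer, which fits "the longer the system is down the longer it will take".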
Rick
I'm folding because Dec 2005 I had radical prostate surgery.
Lost brother to spinal cancer, brother-in-law to prostate cancer.
Several 1st cousins lost and a few who have survived.
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Location: Land Of The Long White Cloud
Re: Stats backlog
billford wrote: ...I meant that it stops processing the current run, even when everything is running correctly...
After the Stats System comes back online, I have seen one of the following happen:
1) There is a massive delay while almost all backlog WUs are processed. It takes a significant amount of time compared to a normal run. The current run is affected by this.
2) The current run happens normally. However, the backlog of WUs is processed "slowly", i.e. it takes a couple of updates to clear the backlog.
billford wrote: ...Perhaps the Operating Procedure should be amended to "Every few months and after a significant outage…" ?
I think it would make sense that after an outage such as this one that the backlog is cleared in a single giant batch before the hourly update goes live again. The extra delay would, I think, be more acceptable to donors than "they'll be up to date eventually… give it a few months". Especially if they suspect that one or more uploaded WUs may (or may not) have gone missing...
Since the manual update requires more resources than an automatic one, it is unlikely to happen if the researcher is very busy. However, if the researcher has time, it could be done. Moreover, bruce has informed PG about a possible backlog (viewtopic.php?p=256057#p256057) so you just might see your points.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
-
- Posts: 1003
- Joined: Thu May 02, 2013 8:46 pm
- Location: Near Oxford, United Kingdom
Re: Stats backlog
PantherX wrote: However, the backlog of WUs is processed "slowly" i.e. it takes a couple of updates to clear the backlog.
I don't dispute the accuracy of anything I'm being told, but this morning at 09:00Z the server stats gave the total of unprocessed WUs as ~12,600; currently (22:00Z) it's 14,622.
Shouldn't it be going down, not up?
edit- I'm assuming that "received" implies "not yet processed into stats". If I'm wrong then I apologise for the noise I've created and I'll shut up.
Re: Stats backlog
billford wrote: Shouldn't it be going down, not up?
Not necessarily so ... each time a WU is returned a new one is created. So that alone will keep the count fairly constant. Also, there are new projects added which would in fact increase the number of WUs.
-
- Posts: 1003
- Joined: Thu May 02, 2013 8:46 pm
- Location: Near Oxford, United Kingdom
Re: Stats backlog
bollix47 wrote: Also, there are new projects added which would in fact increase the number of WUs.
The number of received WUs?
Re: Stats backlog
Sorry, I posted before I saw your edit.
The number of received WUs is only since the last stats update (i.e. normally the number of WUs returned since shortly after the top of the hour). It could be more or less than the last time the stats were run. Not all servers actually report those figures so the validity of the total is a bit questionable. Although it is correct for the figures shown, detail for some servers is missing so the Totals figure for that column does not represent a true total of all the work units received ... just for what is showing. For example, if you look at 171.64.65.69 you won't see a figure in that column but we know there are a lot of core_17 work units returned every hour to that server. On a server where this feature is working you'll see the number increase throughout the hour following the last stats update until a new stats update ... then the count starts over. Here is an example from the log of 171.64.65.124:
Code:
Sat Jan 11 15:10:11 PST 2014 171.64.65.124 vspg14e sryckbos SMP full Accepting 4.18 42 4 50541 42976 0 5895 5895 5895 63 - - - - 236 1 - - 1 - 0 1 WL; X; 10000, 10000 6.34, 7.00 5, 5 10000, 10000 64, 64 - 1, 1 - F, A, F, A 8080, 8080
Sat Jan 11 15:30:11 PST 2014 171.64.65.124 vspg14e sryckbos SMP full Accepting 4.26 41 4 50541 42976 0 5932 5932 5932 67 - - - - 613 1 - - 1 - 0 1 WL; X; 10000, 10000 6.34, 7.00 5, 5 10000, 10000 64, 64 - 1, 1 - F, A, F, A 8080, 8080
Sat Jan 11 15:50:11 PST 2014 171.64.65.124 vspg14e sryckbos SMP full Accepting 3.36 44 9 50541 42975 1 5968 5968 5968 62 - - - - 978 1 - - 1 - 0 1 WL; X;
At 10 minutes past the hour the count is 236, at 30 minutes past the hour it is 613 and finally at 50 minutes the count is 978. If you look at the log you'll see the pattern repeating and, depending on the timing, the total could be higher or lower than the last time you looked.
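If anyone wants to watch the pattern for themselves, a small Python sketch along these lines would pull the count out of each log line and spot where it starts over. The field position (27th whitespace-separated field) is only inferred from the three sample lines above and is an assumption, as are the names `wu_received` and `count_resets`; the sample lines are cut off right after the count.

```python
WU_RCV_FIELD = 26  # assumed position, inferred from the sample lines only


def wu_received(log_line):
    """Extract the received-WU count from one serverstat log line."""
    return int(log_line.split()[WU_RCV_FIELD])


def count_resets(readings):
    """Count how often a reading is lower than the one before it."""
    return sum(1 for prev, cur in zip(readings, readings[1:]) if cur < prev)


# The three log lines quoted above, truncated after the count.
sample_lines = [
    "Sat Jan 11 15:10:11 PST 2014 171.64.65.124 vspg14e sryckbos SMP full "
    "Accepting 4.18 42 4 50541 42976 0 5895 5895 5895 63 - - - - 236",
    "Sat Jan 11 15:30:11 PST 2014 171.64.65.124 vspg14e sryckbos SMP full "
    "Accepting 4.26 41 4 50541 42976 0 5932 5932 5932 67 - - - - 613",
    "Sat Jan 11 15:50:11 PST 2014 171.64.65.124 vspg14e sryckbos SMP full "
    "Accepting 3.36 44 9 50541 42975 1 5968 5968 5968 62 - - - - 978",
]
counts = [wu_received(line) for line in sample_lines]
print(counts, count_resets(counts))

# The exact totals billford posted earlier (14:26Z to 17:22Z).
totals = [11147, 13920, 11262, 14257, 11362, 12587, 14035, 11369]
print(count_resets(totals))
```

Within the hour the single-server count only climbs, while the posted totals drop back several times in three hours, which is what you would expect from a counter that restarts after each stats update rather than from a backlog that is steadily growing.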
-
- Posts: 1003
- Joined: Thu May 02, 2013 8:46 pm
- Location: Near Oxford, United Kingdom
Re: Stats backlog
bollix47 wrote: Not all servers actually report those figures so the validity of the total is a bit questionable.
Ah, OK… serves me right for believing what I read on the internet, I should know better by now.
I'll shut up forthwith

-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Location: Arizona
Re: Stats backlog
The validity is NOT in question, only that not all the factors have been considered when drawing conclusions, as I tried to say earlier.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Re: Stats backlog
I don't think our Points or Status page has updated in almost 3 days.
From http://folding.extremeoverclocking.com/ ... =&u=638326
Code:
24 264 1,439,478 1,298,147 246,666 246,666 186,306,868 9,187 04.27.13
From http://fah-web2.stanford.edu/cgi-bin/ma ... num=111065
Code:
25 bcavnaugh 182730886 9092
This looks correct so maybe it is only the Team page above.
From http://fah-web.stanford.edu/cgi-bin/mai ... =bcavnaugh
Code:
Date of last work unit 2014-01-12 08:07:09
Total score 186342277
Overall rank (if points are combined) 243 of 1718776
Active clients (within 50 days) 85
Active clients (within 7 days) 24
Looks like it is fixed now.
Last edited by bcavnaugh on Sun Jan 12, 2014 11:39 pm, edited 1 time in total.
US Army Retired | Folding@EVGA The Number One Team in the Folding@Home Community.