I wanted to toss out a few numbers to show that, while the server in question has had a lot of downtime, the effect on my folding over the last month has been minimal.
Hats off to PG and to the flexibility of the system.
I'm running one GTX 560Ti, still with v6, so I can, and do, track all of the wu-by-wu stats in the HFM WU history log. I typically run DatAdmin 3 to export the MySQL DB to a CSV file for massaging with Excel.
Bottom line: during June, I had 9 instances out of 226 completed GPU WUs where the "turn around time" was anomalous. That is the time from the completion of one WU until the start of the next.
158 of the 226 June WUs were P6801, which involve 171.64.65.64. In the last couple of weeks, more and more of the WUs are from other projects.
96% of my WUs in June turned around in 10-30 seconds. The 9 anomalous instances range from 1:02 (mm:ss) to 17:35, with one outlier at over 4 hours. June is also the period in which so many of the 171.64.65.64 REJECT periods occurred.
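For anyone who wants to pull the same finish-to-start numbers out of their own CSV export without Excel, here is a rough Python sketch. The column names (DownloadDateTime, CompletionDateTime), the timestamp format, and the 30-second cutoff are my assumptions for illustration, and the three sample rows are made up, so adjust all of them to match your own export.

```python
import csv
import io
from datetime import datetime, timedelta

# Assumed column names and timestamp format; a real export may differ.
START_COL = "DownloadDateTime"
END_COL = "CompletionDateTime"
TIME_FMT = "%Y-%m-%d %H:%M:%S"


def turnaround_gaps(rows):
    """Return the gap between each WU's completion and the next WU's start."""
    parsed = sorted(
        (datetime.strptime(r[START_COL], TIME_FMT),
         datetime.strptime(r[END_COL], TIME_FMT))
        for r in rows
    )
    # Pair each WU with the one that started after it.
    return [nxt_start - prev_end
            for (_, prev_end), (nxt_start, _) in zip(parsed, parsed[1:])]


def split_anomalous(gaps, threshold=timedelta(seconds=30)):
    """Split gaps into quick ones (<= threshold) and anomalous ones."""
    quick = [g for g in gaps if g <= threshold]
    slow = [g for g in gaps if g > threshold]
    return quick, slow


# Made-up sample rows, only to show the shape of the input.
SAMPLE = """DownloadDateTime,CompletionDateTime
2000-06-01 00:00:00,2000-06-01 02:00:00
2000-06-01 02:00:15,2000-06-01 04:00:00
2000-06-01 04:05:09,2000-06-01 06:00:00
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
gaps = turnaround_gaps(rows)
quick, slow = split_anomalous(gaps)
print("gaps:", [str(g) for g in gaps])
print("quick:", len(quick), "anomalous:", len(slow))
```

To run it on a real export, replace the SAMPLE string with an open file handle passed to csv.DictReader.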
The anomalous turn arounds were mostly correlated with either 171.64.65.64 REJECT periods or heavy "WU Received" periods, as reported in the Server Stats. Some of these WU RCV loads have been heavy; see the charts at the bottom of the post.
That outlier occurred during one of the server REJECT periods. I can't remember exactly, but I may have stopped the GPU folding for a while to play with my SMP settings.
My 96% quick turn around could have been a little better in v7, since v6 won't attempt to get a new WU until after several failed attempts to upload the just-completed WU. v7 separates the upload and download, so if the assignment server knows not to assign me to the downed work server, I could possibly pick up a new WU sooner.
Bottom line, for me at least:
Good job to PG. The SYSTEM seems to be working well. Once I looked at the actual data, I was surprised at how good the overall turn-around was for my series of Core 16 Nvidia GPU projects.
"anomalous" finish-to-start times
hh:mm:ss
00:05:09
00:04:53
00:13:04
00:12:34
00:01:02
04:13:30
00:01:59
00:02:36
00:17:25
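As a quick sanity check on the list above, this small Python sketch converts the nine hh:mm:ss values to seconds and recomputes the quick-turnaround share against the 226 completed WUs. The times and the 226 total come straight from this post; the function and variable names are just mine.

```python
from statistics import median

# The nine "anomalous" finish-to-start times listed above (hh:mm:ss).
ANOMALOUS = [
    "00:05:09", "00:04:53", "00:13:04", "00:12:34", "00:01:02",
    "04:13:30", "00:01:59", "00:02:36", "00:17:25",
]
TOTAL_WUS = 226  # completed GPU WUs in June, from the post


def to_seconds(hms):
    """Convert an hh:mm:ss string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


secs = sorted(to_seconds(t) for t in ANOMALOUS)
quick_share = (TOTAL_WUS - len(secs)) / TOTAL_WUS

print(f"{len(secs)} anomalous of {TOTAL_WUS}; quick share = {quick_share:.1%}")
print(f"shortest = {secs[0]} s, median = {median(secs)} s, longest = {secs[-1]} s")
```

The median is more useful than the mean here, since the single 4-hour outlier would swamp an average of the other eight values.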
While I was looking at the server stats, I put together a couple of Excel spreadsheets and charts. These two charts show how busy this server has been. No deep message or analysis here, just some interesting collateral information. Each point on the timescale is one half-hour update to the stats page, as of when I pulled it a couple of days ago.
http://fah-web.stanford.edu/logs/171.64.65.64.log.html
NETLOAD tells how busy the server is by netstat (i.e. how many current connections the server is handling). Too many connections means that the server is heavily loaded. How many are "too many" depends on the server, but most of our servers can now handle a couple hundred connections without a problem.
NOTE LOG SCALE
WUS RCVD shows how many WUs have been received since the last time the servers WUs were updated into the stats. This shows the relative number of WUs being received on the different servers (if all is fine) or which servers are not being inputted into the stats db if there is some problem.
NOTE LINEAR SCALE