Something seemingly serious wrong with receiving credit
Posted: Sun Mar 08, 2009 12:40 am
by Leonardo
Let me preface this by saying this is not a whine or blame thread. I am merely looking for a solution.
Problem: daily credited points are about 10,000 less than what they apparently should be, as measured by FAHMon.
Considerations: I've been folding since the beginning. Currently I'm running a total of 20 clients, a mix of GPU2 Nvidia and quad-core SMP.
Last week I dismantled one of my platforms, selling most of the parts, but moving a dual GPU (9800GX2) over to another computer with an open PCI-e slot. So the net loss in production should have been only that of the CPU that is no longer in my Folding array, a Q6600 that was producing 3,000-4,000 PPD. Instead, I am now down about 12,000-13,000 from where I was before. Actual credited production is consistently 10,000 PPD or more lower than what FAHMon shows production rates to be. I have observed this disparity between estimated and credited production for a week now.
Troubleshooting and analysis:
1. I have checked the configuration of each client, ensuring that the user name and team number are correct for each. They are correct.
2. I have scoured the clients' logs for botched work units. Yes, there have been a number of failed Nvidia 575x work units, but only enough to account for maybe 1,500 PPD, certainly not 10,000.
What is happening? I've looked at this closely enough to believe it's not in my head.
What other troubleshooting or investigative steps can I take?
If any moderators or PG staff are reading, and you think checking the 'server' would be advisable, I am Folder Leonardo, Team 93. All my clients are registered so.
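The arithmetic in the post above can be laid out as a rough sketch. It takes the reported gap between the FAHMon estimate and the credited points (about 10,000 PPD) and subtracts the one known loss (about 1,500 PPD from failed work units). The function name and structure are illustrative only; the figures are the approximate ones quoted in the post.

```python
def unexplained_gap(gap_ppd, known_losses_ppd):
    """Return the part of the estimate-vs-credit gap not covered by known losses.

    gap_ppd: FAHMon estimate minus credited points, in points per day.
    known_losses_ppd: iterable of losses already accounted for (points per day).
    """
    return gap_ppd - sum(known_losses_ppd)


# Approximate figures from the post: a ~10,000 PPD gap, of which
# failed Nvidia 575x work units explain roughly 1,500 PPD.
remaining = unexplained_gap(10_000, [1_500])
print(f"Unexplained shortfall: about {remaining} PPD")
```

With these rough numbers, about 8,500 PPD of the shortfall is still unaccounted for, which is why the failed work units alone do not explain it.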
Re: Something seemingly serious wrong with receiving credit
Posted: Sun Mar 08, 2009 12:49 am
by MtM
Question: how do you monitor them? Last 3 frames, all frames or effective rate? Effective rate should show the most realistic figures, as it uses the actual time taken to complete a work unit from download to completion.
There have been server issues as well. I trust you looked for stuck WUs too (WUs in the queue that have not been uploaded are of course not credited, and a WU worth a lot of points is going to create a large difference in actual PPD).
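To illustrate why the monitoring method matters, here is a minimal sketch of the two kinds of estimate. It is not FAHMon's actual code: the function names, the 100-frames-per-WU default, and the sample numbers (a 480-point WU, 60-second frames, 9,000 seconds from download to completion) are all assumptions chosen for illustration.

```python
SECONDS_PER_DAY = 86_400


def ppd_from_frame_time(points, frame_seconds, frames_per_wu=100):
    """Estimate PPD by extrapolating a single frame time to the whole WU.

    This resembles a 'last frame' style estimate: it ignores pauses,
    download/upload time and any slower frames earlier in the WU.
    """
    wu_seconds = frame_seconds * frames_per_wu
    return points * SECONDS_PER_DAY / wu_seconds


def ppd_effective(points, elapsed_seconds, fraction_done=1.0):
    """Estimate PPD from wall-clock time since download.

    elapsed_seconds: time since the WU was downloaded.
    fraction_done: share of the WU completed so far (0 < fraction_done <= 1).
    """
    return points * fraction_done * SECONDS_PER_DAY / elapsed_seconds


# Hypothetical WU: 480 points, latest frame took 60 s,
# but the whole WU took 9,000 s of wall-clock time.
print(ppd_from_frame_time(480, 60))   # frame-based estimate
print(ppd_effective(480, 9_000))      # effective-rate estimate
```

In this made-up case the frame-based estimate is 6,912 PPD while the effective rate is 4,608 PPD, so a client that is interrupted or waits on the server can look far more productive on a frame-based reading than it really is.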
Re: Something seemingly serious wrong with receiving credit
Posted: Sun Mar 08, 2009 1:05 am
by Leonardo
Monitoring: I've been using the last frame method in FAHMon for a year, so my measuring methodology is a constant, not a variable. I will, though, try a different monitoring method, just in case something strange is going on.
I'm monitoring the 'farm' on two different computers. I have now set one FAHMon instance to monitor for current and the other to monitor for effective rate.
Work units stuck in the queue and not released to Stanford servers? I don't think so. I've performed spot checks on different clients and so far have observed that all completed work units were "successfully sent" to the server. I will perform more checks on this. For the life of me, I can't think of anything concerning client configurations that is different from before. The only differences, at least that I know of, are that there is one less CPU and that two GPUs from a former machine are now consolidated in another machine.
Re: Something seemingly serious wrong with receiving credit
Posted: Sun Mar 08, 2009 1:17 am
by MtM
There is a problem with the current update as well, though that can't influence average PPD for a whole week.
If you have a lot of Nvidia GPU2 clients on advmethods, you might have gotten a lot of 5900 WUs, for which PPD measurements using last frame are way off the mark (see the thread about it; I'm pretty worn out and was about to turn in, so I'm afraid I don't have the stamina to link you).
For the stuck WUs, don't look through the logs but use -queueinfo; it's much less time consuming.
Re: Something seemingly serious wrong with receiving credit
Posted: Tue Mar 10, 2009 11:25 pm
by Leonardo
The problem seems to have resolved itself. I wrote 'resolved itself' as I do not know the cause of the resolution or the cause of the original problem.
The amount of credit awarded for completed units is now very close to the estimated production, by both the effective-rate and 'last frame' calculations, as shown by FAHMon. Whatever the problem was, just BING, one day the documented production credit went right back up to where it theoretically should have been. The only things I can think of are that there were server problems (the catch-all scapegoat, I believe, no?) or that client queues weren't releasing finished work units.
MtM wrote:For the stuck WUs, don't look through the logs but use -queueinfo; it's much less time consuming.
Indeed, I will make use of that in the future.