Re: List of SMP WUs with the "1 core usage" issue
Posted: Fri Sep 11, 2009 4:44 am
by GTron
Sigh, one of my folders has now been served the following WU for the 5th time, and it hung deleting again!
Project: 2671 (Run 50, Clone 97, Gen 91)
Prof. Pande: Rather than "whack a mole" based on user reports, how about scanning the ws for projects 2671 and 2677 and removing (or at least not serving as aggressively/repeatedly) WUs with a payload/compressed size of less than 1.5MB? Longer term, it would help if these WUs were reported back to the ws/cs automatically by the core/client rather than depending on numerous manual reports in this forum. Hope you all come up with a better approach soon.
Greg
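For illustration, the size heuristic Greg suggests could be sketched as a simple filter. This is a minimal sketch, assuming a hypothetical `WorkUnit` record with the project/run/clone/gen fields and the `compressed_data_size` value quoted in this thread; it is not a real work-server API. "1.5MB" is read here as 1,500,000 bytes, which is an assumption (1.5 MiB would be 1,572,864), so the threshold is a parameter. The sample sizes are the ones bollix47 reports below, two of which sit just above a 1,500,000-byte cut-off, so the exact threshold matters.

```python
from dataclasses import dataclass

# Hypothetical WU record; field names are illustrative, not a real F@h server API.
@dataclass(frozen=True)
class WorkUnit:
    project: int
    run: int
    clone: int
    gen: int
    compressed_data_size: int  # bytes

# Projects named in the suggestion above.
SUSPECT_PROJECTS = {2671, 2677}

# Assumption: "1.5MB" means 1,500,000 bytes; pass 1_572_864 if MiB was meant.
DEFAULT_THRESHOLD = 1_500_000

def flag_small_payloads(wus, projects=SUSPECT_PROJECTS, threshold=DEFAULT_THRESHOLD):
    """Return WUs from the suspect projects whose compressed payload is below threshold."""
    return [wu for wu in wus
            if wu.project in projects and wu.compressed_data_size < threshold]

if __name__ == "__main__":
    # Sizes as reported by bollix47 in this thread.
    sample = [
        WorkUnit(2671, 50, 97, 91, 1492868),
        WorkUnit(2671, 12, 40, 89, 1506827),
        WorkUnit(2675, 2, 114, 130, 1531457),
    ]
    for wu in flag_small_payloads(sample):
        print(wu)
```

With the default threshold only Project 2671 (Run 50, Clone 97, Gen 91) is flagged; Project 2675 is skipped because it is not in the suspect set.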
Re: List of SMP WUs with the "1 core usage" issue
Posted: Fri Sep 11, 2009 9:36 am
by martiou
One more WU with CoreStatus = FF (255):
Project: 2677 (Run 22, Clone 42, Gen 33)
Re: List of SMP WUs with the "1 core usage" issue
Posted: Fri Sep 11, 2009 9:52 am
by bollix47
Project: 2671 (Run 50, Clone 97, Gen 91)
compressed_data_size=1492868
[02:31:16] Entering M.D.
[02:31:45] CoreStatus = FF (255)
[02:31:45] Sending work to server
[02:31:45] Project: 2671 (Run 50, Clone 97, Gen 91)
[02:31:45] - Error: Could not get length of results file work/wuresults_04.dat
[02:31:45] - Error: Could not read unit 04 file. Removing from queue.
Project: 2671 (Run 12, Clone 40, Gen 89)
compressed_data_size=1506827
Project: 2675 (Run 2, Clone 114, Gen 130)
compressed_data_size=1531457
In each case the client tried the same WU a couple of times and then moved on to a different WU. None produced a 'hang' or any other obvious problem.
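As a rough idea of how such failures could be picked out of a client log automatically (in the spirit of Greg's suggestion above), here is a minimal sketch that scans log lines for `CoreStatus = FF (255)` and takes the next `Project:` line as the failed WU. It assumes the log lines look exactly like the excerpt bollix47 posted; other client versions may format them differently, and `find_ff_units` is a hypothetical helper, not part of the F@h client.

```python
import re

# Patterns for the log lines quoted above (format assumed from that excerpt).
CORE_STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")

def find_ff_units(log_lines):
    """Return (project, run, clone, gen) for each WU whose core exited with status FF (255)."""
    failed = []
    pending_ff = False  # True after an FF status, until the following Project line
    for line in log_lines:
        status = CORE_STATUS_RE.search(line)
        if status:
            pending_ff = int(status.group(2)) == 255
            continue
        project = PROJECT_RE.search(line)
        if project and pending_ff:
            failed.append(tuple(int(g) for g in project.groups()))
            pending_ff = False
    return failed

if __name__ == "__main__":
    log = """[02:31:16] Entering M.D.
[02:31:45] CoreStatus = FF (255)
[02:31:45] Sending work to server
[02:31:45] Project: 2671 (Run 50, Clone 97, Gen 91)
[02:31:45] - Error: Could not get length of results file work/wuresults_04.dat
[02:31:45] - Error: Could not read unit 04 file. Removing from queue.""".splitlines()
    print(find_ff_units(log))  # [(2671, 50, 97, 91)]
```

Only the first `Project:` line after an FF status is taken, so the `Project:` line printed when the next WU starts is not misattributed, provided the client does log a `Project:` line after each FF exit.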
Re: List of SMP WUs with the "1 core usage" issue
Posted: Fri Sep 11, 2009 1:55 pm
by VijayPande
Here's an update from Kasson (who is away at a CECAM conference presenting the results, but still working on it while away). I think we're close.
For the 1-core WUs, we're close to the fix. The new core was step 1: it errors out on the bad WUs, but it also errors out when a bad WU is being *generated*. Server-side, I had written about half of the script to "unroll" the bad WUs and regenerate good ones before I left for this conference (I'm on my way back but got delayed). I would anticipate having this running early next week at the latest (hopefully sooner).
I'll send your suggestions regarding client upgrades to help with this to Joe, who is working on the next gen v7 client.
Re: List of SMP WUs with the "1 core usage" issue
Posted: Fri Sep 11, 2009 2:08 pm
by bollix47
Thank you for the update... nice to see a light at the end of the tunnel.
Re: List of SMP WUs with the "1 core usage" issue
Posted: Fri Sep 11, 2009 4:00 pm
by JackOfAll
Vijay, thank you for the update. But just to be clear on this, v2.10 of the core has more negatives than positives. Yes, it does error out on the bad units. But if it then hangs (while deleting files), i.e. the client does not move on to folding a new WU, then it is no better than the previous v2.08 behaviour, just different. Science is still not being done! Also, the fact that PPD drops by up to 50% for users running 2 cores on native Linux or in a 2-core Linux VM is a major negative, and rather a lot of people seem to run a 2-core VM configuration. Perhaps the requirement for the v2.10 core could be reverted as soon as Kasson has finished the server-side script? If these bad WUs are no longer being assigned, there is no need for a client-side "fix". Or at least the core could be reverted until the other issues with v2.10 are resolved. If people could delete their local v2.10 core and auto-download v2.08 with the next WU, that would be a step forward until the speed (PPD loss) issue is fixed.
Re: List of SMP WUs with the "1 core usage" issue
Posted: Sat Sep 12, 2009 12:52 am
by bollix47
Project: 2671 (Run 32, Clone 41, Gen 88) (repeat)
compressed_data_size=1498670
no hang
Re: List of SMP WUs with the "1 core usage" issue
Posted: Sat Sep 12, 2009 12:35 pm
by bollix47
Project: 2671 (Run 37, Clone 79, Gen 79)
compressed_data_size=1513359
no hang
Re: List of SMP WUs with the "1 core usage" issue
Posted: Sat Sep 12, 2009 1:18 pm
by ikerekes
Project: 2677 (Run 22, Clone 42, Gen 33)
Project: 2677 (Run 3, Clone 75, Gen 38)
Project: 2677 (Run 38, Clone 62, Gen 48)
and 2 others that I lost when my NF (running natively from a USB stick) hung deleting for 3 hours.
Re: List of SMP WUs with the "1 core usage" issue
Posted: Sat Sep 12, 2009 4:34 pm
by bollix47
Project: 2671 (Run 50, Clone 97, Gen 92)
compressed_data_size=1492883
no hang
Re: List of SMP WUs with the "1 core usage" issue
Posted: Sat Sep 12, 2009 8:35 pm
by GTron
Two more WUs on my folders to report, both involving multiple hangs deleting (and hours lost):
Project: 2671 (Run 37, Clone 79, Gen 79)
Project: 2669 (Run 7, Clone 51, Gen 110)
Greg
Re: List of SMP WUs with the "1 core usage" issue
Posted: Sun Sep 13, 2009 1:46 am
by ikerekes
Project: 2677 (Run 3, Clone 10, Gen 36)
Re: List of SMP WUs with the "1 core usage" issue
Posted: Sun Sep 13, 2009 4:50 am
by GTron
One of my folders was served:
Project: 2671 (Run 37, Clone 79, Gen 79)
yet again, and it hung deleting yet again. Then it was served
Project: 2669 (Run 14, Clone 57, Gen 199)
6 times, hanging on the 5th and 6th attempts.
Greg
Re: List of SMP WUs with the "1 core usage" issue
Posted: Sun Sep 13, 2009 10:39 am
by bollix47
Project: 2671 (Run 51, Clone 50, Gen 89)
compressed_data_size=1497732
no hang
Re: List of SMP WUs with the "1 core usage" issue
Posted: Sun Sep 13, 2009 5:18 pm
by bollix47
Project: 2671 (Run 52, Clone 43, Gen 83)
compressed_data_size=1506351
no hang