List of SMP WUs with the "1 core usage" issue
Re: List of SMP WUs with the "1 core usage" issue
Sigh, one of my folders has now been served the following WU for the 5th time, and it hung deleting again!
Project: 2671 (Run 50, Clone 97, Gen 91)
Prof. Pande: Rather than playing "whack-a-mole" based on user reports, how about scanning the ws for projects 2671 and 2677 and removing (or at least not serving as aggressively/repeatedly) WUs with a payload/compressed size of less than 1.5MB? Longer term, it would help if such WUs were automatically reported back to the ws/cs by the core/client rather than depending on numerous manual reports in this forum. I hope you all come up with a better approach soon.
Greg
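A minimal sketch of the filter proposed above, assuming the work server can enumerate its WUs along with project/run/clone/gen and compressed payload size. The `WorkUnit` record and the `suspect_units` function are hypothetical; the thread does not describe how the ws actually stores WUs. Because "1.5MB" is ambiguous, the threshold is a parameter: at 1,500,000 bytes only the first of the three sizes bollix47 reports further down falls below it, while at 1.5 MiB (1,572,864 bytes) all three do. The project set is a parameter too, since later posts also report problem WUs in projects 2669 and 2675.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List

# Hypothetical WU record; the real work-server data layout is not given in the thread.
@dataclass(frozen=True)
class WorkUnit:
    project: int
    run: int
    clone: int
    gen: int
    compressed_data_size: int  # bytes

SUSPECT_PROJECTS = frozenset({2671, 2677})
# "1.5MB" could mean 1.5 * 10**6 or 1.5 * 2**20 bytes; the default here is an assumption.
DEFAULT_THRESHOLD = 1_500_000

def suspect_units(units: Iterable[WorkUnit],
                  projects: AbstractSet[int] = SUSPECT_PROJECTS,
                  threshold: int = DEFAULT_THRESHOLD) -> List[WorkUnit]:
    """Return WUs from the given projects whose compressed payload is below threshold bytes."""
    return [wu for wu in units
            if wu.project in projects and wu.compressed_data_size < threshold]

if __name__ == "__main__":
    # Sizes as reported later in this thread.
    reported = [
        WorkUnit(2671, 50, 97, 91, 1492868),
        WorkUnit(2671, 12, 40, 89, 1506827),
        WorkUnit(2675, 2, 114, 130, 1531457),
    ]
    for wu in suspect_units(reported):
        print(f"Project: {wu.project} (Run {wu.run}, Clone {wu.clone}, Gen {wu.gen})")
```

Rather than deleting matches outright, the same list could just as well be used to lower how often those WUs are reassigned, which is the softer option suggested in the post.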
Re: List of SMP WUs with the "1 core usage" issue
One more WU with CoreStatus = FF (255):
Project: 2677 (Run 22, Clone 42, Gen 33)
Re: List of SMP WUs with the "1 core usage" issue
Project: 2671 (Run 50, Clone 97, Gen 91)
compressed_data_size=1492868
Project: 2671 (Run 12, Clone 40, Gen 89)
compressed_data_size=1506827
Project: 2675 (Run 2, Clone 114, Gen 130)
compressed_data_size=1531457
In each case the client tried to do the same WU a couple times then moved on to a different WU. None produced a 'hang' or any other obvious problem.
Code:
[02:31:16] Entering M.D.
[02:31:45] CoreStatus = FF (255)
[02:31:45] Sending work to server
[02:31:45] Project: 2671 (Run 50, Clone 97, Gen 91)
[02:31:45] - Error: Could not get length of results file work/wuresults_04.dat
[02:31:45] - Error: Could not read unit 04 file. Removing from queue.
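The client log above can be scanned mechanically instead of by eye. A rough sketch, assuming that the `Project:` line following a `CoreStatus = FF (255)` line names the WU being returned, as it does in this excerpt. The regular expressions are fitted to these lines only, and `failed_units` is a hypothetical helper, not part of any Folding@home tool. Counting the results with `collections.Counter` shows how often the same WU has been reissued and failed.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Regexes fitted to the log excerpt above; other client versions may format lines differently.
CORE_STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")

WuId = Tuple[int, int, int, int]  # (project, run, clone, gen)

def failed_units(log_lines: Iterable[str]) -> List[WuId]:
    """Return the WU named after each CoreStatus = FF (255) line, in log order."""
    failures: List[WuId] = []
    pending_ff = False  # True between an FF status line and the next Project line
    for line in log_lines:
        status = CORE_STATUS_RE.search(line)
        if status:
            pending_ff = int(status.group(2)) == 255
            continue
        project = PROJECT_RE.search(line)
        if project and pending_ff:
            p, r, c, g = (int(x) for x in project.groups())
            failures.append((p, r, c, g))
            pending_ff = False
    return failures

if __name__ == "__main__":
    excerpt = """\
[02:31:16] Entering M.D.
[02:31:45] CoreStatus = FF (255)
[02:31:45] Sending work to server
[02:31:45] Project: 2671 (Run 50, Clone 97, Gen 91)
[02:31:45] - Error: Could not get length of results file work/wuresults_04.dat
[02:31:45] - Error: Could not read unit 04 file. Removing from queue.
"""
    for (p, r, c, g), count in Counter(failed_units(excerpt.splitlines())).items():
        print(f"Project: {p} (Run {r}, Clone {c}, Gen {g}) failed {count} time(s)")
```

Fed several days of log instead of one excerpt, a WU that keeps being reassigned would show a count above 1.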
Last edited by bollix47 on Fri Sep 11, 2009 5:34 pm, edited 1 time in total.
- Pande Group Member
- Posts: 2058
- Joined: Fri Nov 30, 2007 6:25 am
- Location: Stanford
Re: List of SMP WUs with the "1 core usage" issue
Here's an update from Kasson (who is away at a CECAM conference presenting the results, but still working on it while away). I think we're close.
I'll send your suggestions regarding client upgrades to help with this to Joe, who is working on the next-gen v7 client.
"For the 1-core WUs, we're close to the fix. The new core was step 1: it errors out on the bad WUs, but it also errors out when a bad WU is being *generated*. Server-side, I had written about half the script to 'unroll' the bad WUs and regenerate good ones before I left for this conference (I'm on my way back but got delayed). I would anticipate having this running early next week at the latest (hopefully sooner)."
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
Re: List of SMP WUs with the "1 core usage" issue
Thank you for the update... nice to see a light at the end of the tunnel.
Re: List of SMP WUs with the "1 core usage" issue
Vijay, thank you for the update. But just to be clear on this, v2.10 of the core has more negatives than positives. Yes, it does error out on the bad units. But if it then hangs after the error (while deleting files), i.e. the client does not move on to folding a new WU, it is no better than the previous v2.08 behaviour, just different. Science is still not being done!
Also, the fact that PPD is decreased by up to 50% for users running 2 cores on native Linux or 2 cores in a Linux VM is a major negative, and there seem to be rather a lot of people running a 2-core configuration in a VM.
Perhaps it will be possible to lift the requirement for the v2.10 core as soon as Kasson has finished the work on the server-side script? If these bad WUs are no longer being assigned, then there is no need for a client-side "fix". Or at least the core could be reverted until the other issues with the v2.10 core are resolved. I'm sure that if people could delete their local v2.10 core and auto-download v2.08 with the next WU, it would be a step forward until the speed (PPD loss) issue is resolved.
Folding on Linux - Fedora 11 x86_64 / nVidia 180.60 driver / CUDA 2.1
Re: List of SMP WUs with the "1 core usage" issue
Project: 2671 (Run 32, Clone 41, Gen 88) (repeat)
compressed_data_size=1498670
no hang
Re: List of SMP WUs with the "1 core usage" issue
Project: 2671 (Run 37, Clone 79, Gen 79)
compressed_data_size=1513359
no hang
- Posts: 94
- Joined: Thu Nov 13, 2008 4:18 pm
- Hardware configuration: q6600 @ 3.3GHz windows xp-sp3 one SMP2 (2.15 core) + 1 9800GT native GPU2
Athlon x2 6000+ @ 3.0GHz ubuntu 8.04 smp + asus 9600GSO gpu2 in wine wrapper
5600X2 @ 3.19GHz ubuntu 8.04 smp + asus 9600GSO gpu2 in wine wrapper
E5200 @ 3.7GHz ubuntu 8.04 smp2 + asus 9600GT silent gpu2 in wine wrapper
E5200 @ 3.65GHz ubuntu 8.04 smp2 + asus 9600GSO gpu2 in wine wrapper
E6550 vmware ubuntu 8.4.1
q8400 @ 3.3GHz windows xp-sp3 one SMP2 (2.15 core) + 1 9800GT native GPU2
Athlon II 620 @ 2.6GHz windows xp-sp3 one SMP2 (2.15 core) + 1 9800GT native GPU2
- Location: Calgary, Canada
Re: List of SMP WUs with the "1 core usage" issue
Project: 2677 (Run 22, Clone 42, Gen 33)
Project: 2677 (Run 3, Clone 75, Gen 38)
Project: 2677 (Run 38, Clone 62, Gen 48)
and 2 others that I lost when my NF, running natively from a USB stick, hung deleting for 3 hours.
Re: List of SMP WUs with the "1 core usage" issue
Project: 2671 (Run 50, Clone 97, Gen 92)
compressed_data_size=1492883
no hang
Re: List of SMP WUs with the "1 core usage" issue
Two more WUs on my folders to report, both involving multiple hangs deleting (and hours lost):
Project: 2671 (Run 37, Clone 79, Gen 79)
Project: 2669 (Run 7, Clone 51, Gen 110)
Greg
Re: List of SMP WUs with the "1 core usage" issue
One of my folders was served:
Project: 2671 (Run 37, Clone 79, Gen 79)
yet again, and it hung deleting yet again. Then it was served
Project: 2669 (Run 14, Clone 57, Gen 199)
6 times, hanging on the 5th and 6th attempts.
Greg
Re: List of SMP WUs with the "1 core usage" issue
Project: 2671 (Run 51, Clone 50, Gen 89)
compressed_data_size=1497732
no hang
Re: List of SMP WUs with the "1 core usage" issue
Project: 2671 (Run 52, Clone 43, Gen 83)
compressed_data_size=1506351
no hang