List of SMP WUs with the "1 core usage" issue


GTron
Posts: 53
Joined: Wed Dec 05, 2007 3:47 pm
Location: Denver area, Colorado

Re: List of SMP WUs with the "1 core usage" issue

Post by GTron »

Sigh, one of my folders has now been served the following WU for the 5th time, and it hung deleting again!
Project: 2671 (Run 50, Clone 97, Gen 91)

Prof. Pande: Rather than playing "whack-a-mole" based on user reports, how about scanning the ws for projects 2671 and 2677 and removing (or at least not serving as aggressively/repeatedly) WUs with a payload/compressed size of less than 1.5MB? Longer term, it would help if bad WUs could be reported back to the ws/cs automatically by the core/client, rather than depending on numerous manual reports in this forum. Hope you all come up with a better approach soon.
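To make the size-based scan concrete, here is a minimal sketch of the filter, not the actual work server code. The `WorkUnit` record and the `suspect_units` helper are hypothetical names invented for illustration, and the sample sizes are the `compressed_data_size` values posted elsewhere in this thread. Note that "1.5MB" is ambiguous: read as 1.5 × 2^20 bytes it catches both 2671 units below, while read as 1,500,000 bytes it catches only one of them.

```python
from dataclasses import dataclass


# Hypothetical record type: the real work server data model is not
# described in this thread.
@dataclass(frozen=True)
class WorkUnit:
    project: int
    run: int
    clone: int
    gen: int
    compressed_size: int  # bytes, as in the client's compressed_data_size


# Assumption: "1.5MB" means 1.5 * 2**20 bytes (1572864).
THRESHOLD_BYTES = int(1.5 * 1024 * 1024)


def suspect_units(units, projects=(2671, 2677), threshold=THRESHOLD_BYTES):
    """Return units of the given projects whose payload is below threshold."""
    return [
        u for u in units
        if u.project in projects and u.compressed_size < threshold
    ]


# Sizes taken from reports in this thread.
reported = [
    WorkUnit(2671, 50, 97, 91, 1492868),
    WorkUnit(2671, 12, 40, 89, 1506827),
    WorkUnit(2675, 2, 114, 130, 1531457),  # project not in the scan list
]

for u in suspect_units(reported):
    print(f"Project: {u.project} (Run {u.run}, Clone {u.clone}, Gen {u.gen})")
```

Passing `threshold=1_500_000` instead shows how sensitive the scan is to the exact cutoff, so the value would need to be confirmed against known-good WUs before anything is pulled from assignment.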

Greg
martiou
Posts: 34
Joined: Fri May 23, 2008 12:13 pm
Location: France

Re: List of SMP WUs with the "1 core usage" issue

Post by martiou »

One more WU with CoreStatus = FF (255):
Project: 2677 (Run 22, Clone 42, Gen 33)
bollix47
Posts: 2959
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: List of SMP WUs with the "1 core usage" issue

Post by bollix47 »

Project: 2671 (Run 50, Clone 97, Gen 91)

compressed_data_size=1492868

Code:

[02:31:16] Entering M.D.
[02:31:45] CoreStatus = FF (255)
[02:31:45] Sending work to server
[02:31:45] Project: 2671 (Run 50, Clone 97, Gen 91)
[02:31:45] - Error: Could not get length of results file work/wuresults_04.dat
[02:31:45] - Error: Could not read unit 04 file. Removing from queue.

Project: 2671 (Run 12, Clone 40, Gen 89)

compressed_data_size=1506827

Project: 2675 (Run 2, Clone 114, Gen 130)

compressed_data_size=1531457

In each case the client tried the same WU a couple of times, then moved on to a different WU. None produced a 'hang' or any other obvious problem.
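For anyone who would rather pull these reports out of the client log than copy them by hand, here is a rough sketch. It assumes the `Project:` line follows the `CoreStatus` line, as it does in the excerpt above; other client versions may order the lines differently. `failed_units` is a made-up helper name, not part of the client.

```python
import re

# Log excerpt copied from the post above.
LOG = """\
[02:31:16] Entering M.D.
[02:31:45] CoreStatus = FF (255)
[02:31:45] Sending work to server
[02:31:45] Project: 2671 (Run 50, Clone 97, Gen 91)
[02:31:45] - Error: Could not get length of results file work/wuresults_04.dat
[02:31:45] - Error: Could not read unit 04 file. Removing from queue.
"""

STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def failed_units(log_text, bad_status=0xFF):
    """Return (project, run, clone, gen) tuples that follow a bad CoreStatus."""
    failed = []
    pending = False  # True once a bad CoreStatus has been seen
    for line in log_text.splitlines():
        status = STATUS_RE.search(line)
        if status:
            pending = int(status.group(1), 16) == bad_status
            continue
        prcg = PRCG_RE.search(line)
        if prcg and pending:
            failed.append(tuple(int(g) for g in prcg.groups()))
            pending = False
    return failed


print(failed_units(LOG))  # [(2671, 50, 97, 91)]
```

The `compressed_data_size` value is not in this log excerpt, so it would still have to be looked up separately.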
Last edited by bollix47 on Fri Sep 11, 2009 5:34 pm, edited 1 time in total.
VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: List of SMP WUs with the "1 core usage" issue

Post by VijayPande »

Here's an update from Kasson (who is away at a CECAM conference presenting the results, but still working on it while away). I think we're close.
For the 1-core WU's, we're close to the fix. The new core was step 1: it errors out on the bad WU's, but it also errors out when a bad WU is being *generated*. Server-side I had written about half the script to "unroll" the bad WU's and regenerate good ones before I left for this conference (I'm on my way back but got delayed). I would anticipate having this running early next week at the latest (hopefully sooner).
I'll send your suggestions regarding client upgrades to help with this to Joe, who is working on the next gen v7 client.
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
bollix47
Posts: 2959
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: List of SMP WUs with the "1 core usage" issue

Post by bollix47 »

Thank you for the update .... nice to see a light at the end of the tunnel :D
JackOfAll
Posts: 14
Joined: Tue Mar 17, 2009 3:40 pm

Re: List of SMP WUs with the "1 core usage" issue

Post by JackOfAll »

Vijay, thank you for the update. But just to be clear on this, v2.10 of the core has more negatives than positives. Yes, it does error out on the bad units. But if it causes a hang after the error (while deleting files), i.e. the client does not move on to folding a new WU, then it is no better than the previous v2.08 behaviour, just different. Science is still not being done! Also, the fact that PPD is decreased by up to 50% for users running 2 cores on native Linux or 2 cores in a Linux VM is a major negative. There seem to be rather a lot of people running a 2-core configuration in a VM. Perhaps it will be possible to revert the requirement for the v2.10 core as soon as Kasson has finished the work on the server-side script? If these bad WUs are no longer being assigned, then there is no need for a client-side "fix". Or at least the core could be reverted until the other issues with the v2.10 core are resolved. I'm sure that if people could delete their local v2.10 core and auto-download v2.08 on the next WU, it would be a step forward until the speed (PPD loss) issue is resolved.
Folding on Linux - Fedora 11 x86_64 / nVidia 180.60 driver / CUDA 2.1
bollix47
Posts: 2959
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: List of SMP WUs with the "1 core usage" issue

Post by bollix47 »

Project: 2671 (Run 32, Clone 41, Gen 88) .... repeat

compressed_data_size=1498670

no hang
bollix47
Posts: 2959
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: List of SMP WUs with the "1 core usage" issue

Post by bollix47 »

Project: 2671 (Run 37, Clone 79, Gen 79)

compressed_data_size=1513359

no hang
ikerekes
Posts: 94
Joined: Thu Nov 13, 2008 4:18 pm
Hardware configuration: q6600 @ 3.3Ghz windows xp-sp3 one SMP2 (2.15 core) + 1 9800GT native GPU2
Athlon x2 6000+ @ 3.0Ghz ubuntu 8.04 smp + asus 9600GSO gpu2 in wine wrapper
5600X2 @ 3.19Ghz ubuntu 8.04 smp + asus 9600GSO gpu2 in wine wrapper
E5200 @ 3.7Ghz ubuntu 8.04 smp2 + asus 9600GT silent gpu2 in wine wrapper
E5200 @ 3.65Ghz ubuntu 8.04 smp2 + asus 9600GSO gpu2 in wine wrapper
E6550 vmware ubuntu 8.4.1
q8400 @ 3.3Ghz windows xp-sp3 one SMP2 (2.15 core) + 1 9800GT native GPU2
Athlon II 620 @ 2.6 Ghz windows xp-sp3 one SMP2 (2.15 core) + 1 9800GT native GPU2
Location: Calgary, Canada

Re: List of SMP WUs with the "1 core usage" issue

Post by ikerekes »

Project: 2677 (Run 22, Clone 42, Gen 33)
Project: 2677 (Run 3, Clone 75, Gen 38)
Project: 2677 (Run 38, Clone 62, Gen 48)

and 2 others that I lost when my NF, running natively from a USB stick, hung deleting for 3 hours.
bollix47
Posts: 2959
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: List of SMP WUs with the "1 core usage" issue

Post by bollix47 »

Project: 2671 (Run 50, Clone 97, Gen 92)

compressed_data_size=1492883

no hang
GTron
Posts: 53
Joined: Wed Dec 05, 2007 3:47 pm
Location: Denver area, Colorado

Re: List of SMP WUs with the "1 core usage" issue

Post by GTron »

Two more WUs on my folders to report, both involving multiple hangs deleting (and hours lost):

Project: 2671 (Run 37, Clone 79, Gen 79)
Project: 2669 (Run 7, Clone 51, Gen 110)

Greg
ikerekes
Posts: 94
Joined: Thu Nov 13, 2008 4:18 pm
Location: Calgary, Canada

Re: List of SMP WUs with the "1 core usage" issue

Post by ikerekes »

Project: 2677 (Run 3, Clone 10, Gen 36)
GTron
Posts: 53
Joined: Wed Dec 05, 2007 3:47 pm
Location: Denver area, Colorado

Re: List of SMP WUs with the "1 core usage" issue

Post by GTron »

One of my folders was served:
Project: 2671 (Run 37, Clone 79, Gen 79)
yet again, and it hung deleting yet again. Then it was served
Project: 2669 (Run 14, Clone 57, Gen 199)
6 times, hanging on the 5th and 6th attempts.
Greg
bollix47
Posts: 2959
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: List of SMP WUs with the "1 core usage" issue

Post by bollix47 »

Project: 2671 (Run 51, Clone 50, Gen 89)

compressed_data_size=1497732

no hang
bollix47
Posts: 2959
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: List of SMP WUs with the "1 core usage" issue

Post by bollix47 »

Project: 2671 (Run 52, Clone 43, Gen 83)

compressed_data_size=1506351

no hang