Project: 6801 (Run 4394, Clone 1, Gen 2)

HendricksSA
Posts: 336
Joined: Fri Jun 26, 2009 4:34 am

Re: Project: 6801 (Run 4394, Clone 1, Gen 2)

Post by HendricksSA »

Sticks435, you will probably have to do the following to get rid of the offending work unit:
1. Stop your client.
2. Delete the work directory and the queue.dat file.
3. Restart the client.

If that doesn't work and you get the same unit again, repeat the steps above AND change your machine ID to a new unique number (one not used by any other client on this computer). You should then get a new work unit. Good luck!
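
For anyone who has to do this more than once, the delete step can be scripted. This is only a rough sketch, not an official tool: it assumes the client is already stopped, and that the client folder holds a `work` subdirectory and a `queue.dat` file, as the paths in the logs posted in this thread suggest. The function name `clear_work_unit` is made up for this example. It does not change the machine ID; that still has to be done in the client's own configuration.

```python
import shutil
from pathlib import Path


def clear_work_unit(client_dir):
    """Delete the work directory and queue.dat from a STOPPED client's folder.

    Assumption: the folder layout is <client_dir>/work and <client_dir>/queue.dat.
    Returns the list of paths that were actually removed.
    """
    client_dir = Path(client_dir)
    removed = []

    work = client_dir / "work"
    if work.is_dir():
        shutil.rmtree(work)  # removes the directory and everything inside it
        removed.append(work)

    queue = client_dir / "queue.dat"
    if queue.is_file():
        queue.unlink()
        removed.append(queue)

    return removed
```

Calling `clear_work_unit(r"C:\FAH\GPU0")` would be the equivalent of step 2 for a client installed in that folder. Restart the client afterwards as in step 3.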
sticks435
Posts: 40
Joined: Thu Mar 03, 2011 8:29 am

Re: Project: 6801 (Run 4394, Clone 1, Gen 2)

Post by sticks435 »

Yeah, once I realized I was getting the same unit over and over, and that a couple of others had the same issue, I went ahead and did that (including changing the machine ID) last night. I've completed 6 WUs now without issue.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 6801 (Run 4394, Clone 1, Gen 2)

Post by bruce »

The WU (P6801,R4394,C1,G2) has been reported as a bad WU.
Dave_Goodchild
Posts: 25
Joined: Thu Jun 19, 2008 7:10 pm

Re: Project: 6801 (Run 4394, Clone 1, Gen 2)

Post by Dave_Goodchild »

Project: 6801 (Run 4394, Clone 1, Gen 2)

Just had this on one of the GPU folders and I also had it at the weekend on another machine so it's still being assigned.
cordis
Posts: 18
Joined: Wed May 27, 2009 6:35 am

Re: Project: 6801 (Run 4394, Clone 1, Gen 2)

Post by cordis »

I also have a GPU down because of this WU. I'm trying to downclock it to see if I can get it working, but given the reports here, this just seems like a bad WU. It's troubling, because I keep deleting the work folder and the queue file, and the client keeps downloading the same WU again. Guess I'll see what else I can do...
a_fool
Posts: 7
Joined: Fri Feb 12, 2010 4:35 am

Re: Project: 6801 (Run 4394, Clone 1, Gen 2)

Post by a_fool »

I've failed this WU 20+ times now: Project: 6801 (Run 4394, Clone 1, Gen 2)

If it has been flagged as a bad WU, why is it still being assigned?

This particular WU is being folded on a GTX 470 @ stock settings. Here's a snippet of one failed attempt.

Code:

# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.41r2

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\FAH\GPU0
Executable: C:\FAH\FAH_GPU3.exe
Arguments: -oneunit -forcegpu nvidia_fermi -advmethods -verbosity 9 -gpu 0 

[03:40:26] - Ask before connecting: No
[03:40:26] - User name: a_fool (Team 111065)
[03:40:26] - User ID: 551C209503FE274
[03:40:26] - Machine ID: 3
[03:40:26] 
[03:40:26] Gpu type=3 species=20.
[03:40:26] Work directory not found. Creating...
[03:40:26] Could not open work queue, generating new queue...
[03:40:26] - Preparing to get new work unit...
[03:40:26] - Autosending finished units... [May 4 03:40:26 UTC]
[03:40:26] Cleaning up work directory
[03:40:26] Trying to send all finished work units
[03:40:26] + No unsent completed units remaining.
[03:40:26] - Autosend completed
[03:40:26] + Attempting to get work packet
[03:40:26] Passkey found
[03:40:26] - Will indicate memory of 4094 MB
[03:40:26] Gpu type=3 species=20.
[03:40:26] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 7
[03:40:26] - Connecting to assignment server
[03:40:26] Connecting to http://assign-GPU.stanford.edu:8080/
[03:40:27] Posted data.
[03:40:27] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[03:40:27] + News From Folding@Home: Welcome to Folding@Home
[03:40:27] Loaded queue successfully.
[03:40:27] Gpu type=3 species=20.
[03:40:27] Sent data
[03:40:27] Connecting to http://171.64.65.64:8080/
[03:40:27] Posted data.
[03:40:27] Initial: 0000; - Receiving payload (expected size: 39634)
[03:40:27] Conversation time very short, giving reduced weight in bandwidth avg
[03:40:27] - Downloaded at ~77 kB/s
[03:40:27] - Averaged speed for that direction ~77 kB/s
[03:40:27] + Received work.
[03:40:27] + Closed connections
[03:40:27] 
[03:40:27] + Processing work unit
[03:40:27] Core required: FahCore_15.exe
[03:40:27] Core found.
[03:40:27] Working on queue slot 01 [May 4 03:40:27 UTC]
[03:40:27] + Working ...
[03:40:27] - Calling '.\FahCore_15.exe -dir work/ -suffix 01 -nice 19 -priority 96 -nocpulock -checkpoint 3 -verbose -lifeline 4352 -version 641'

[03:40:28] 
[03:40:28] *------------------------------*
[03:40:28] Folding@Home GPU Core
[03:40:28] Version 2.15 (Tue Nov 16 09:05:18 PST 2010)
[03:40:28] 
[03:40:28] Build host: SimbiosNvdWin7
[03:40:28] Board Type: NVIDIA/CUDA
[03:40:28] Core      : x=15
[03:40:28]  Window's signal control handler registered.
[03:40:28] Preparing to commence simulation
[03:40:28] - Looking at optimizations...
[03:40:28] DeleteFrameFiles: successfully deleted file=work/wudata_01.ckp
[03:40:28] - Created dyn
[03:40:28] - Files status OK
[03:40:28] sizeof(CORE_PACKET_HDR) = 512 file=<>
[03:40:28] - Expanded 39122 -> 171827 (decompressed 439.2 percent)
[03:40:28] Called DecompressByteArray: compressed_data_size=39122 data_size=171827, decompressed_data_size=171827 diff=0
[03:40:28] - Digital signature verified
[03:40:28] 
[03:40:28] Project: 6801 (Run 4394, Clone 1, Gen 2)
[03:40:28] 
[03:40:28] Assembly optimizations on if available.
[03:40:28] Entering M.D.
[03:40:30] Tpr hash work/wudata_01.tpr:  3072867433 4181462573 2689415243 1031181546 1959007100
[03:40:30] Working on ALZHEIMER'S DISEASE AMYLOID
[03:40:30] Client config found, loading data.
[03:40:30] Starting GUI Server
[03:40:30] Setting checkpoint frequency: 500000
[03:40:30] Setting checkpoint frequency: 500000
[03:42:05] Completed    500000 out of 50000000 steps (1%).
[03:42:06] mdrun_gpu returned 52
[03:42:06] NANs detected on GPU
[03:42:06] 
[03:42:06] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:42:08] CoreStatus = 7A (122)
[03:42:08] Sending work to server
[03:42:08] Project: 6801 (Run 4394, Clone 1, Gen 2)
[03:42:08] - Read packet limit of 540015616... Set to 524286976.
[03:42:08] - Error: Could not get length of results file work/wuresults_01.dat
[03:42:08] - Error: Could not read unit 01 file. Removing from queue.
[03:42:08] Trying to send all finished work units
[03:42:08] + No unsent completed units remaining.
[03:42:08] + -oneunit flag given and have now finished a unit. Exiting.***** Got a SIGTERM signal (2)
[03:42:08] Killing all core threads

Folding@Home Client Shutdown.

And here is another failed attempt after downgrading to the 6.30 client:

Code:

# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.30r1

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\FAH\GPU0
Executable: C:\FAH\FAH_GPU3.exe
Arguments: -oneunit -forcegpu nvidia_fermi -advmethods -verbosity 9 -gpu 0 

[00:37:45] - Ask before connecting: No
[00:37:45] - User name: a_fool (Team 111065)
[00:37:45] - User ID: 551C209503FE274
[00:37:45] - Machine ID: 3
[00:37:45] 
[00:37:45] Gpu type=3 species=30.
[00:37:45] Loaded queue successfully.
[00:37:45] 
[00:37:45] - Autosending finished units... [May 5 00:37:45 UTC]
[00:37:45] + Processing work unit
[00:37:45] Trying to send all finished work units
[00:37:45] Core required: FahCore_15.exe
[00:37:45] + No unsent completed units remaining.
[00:37:45] - Autosend completed
[00:37:45] Core found.
[00:37:45] Working on queue slot 01 [May 5 00:37:45 UTC]
[00:37:45] + Working ...
[00:37:45] - Calling '.\FahCore_15.exe -dir work/ -suffix 01 -nice 19 -priority 96 -nocpulock -checkpoint 3 -verbose -lifeline 5100 -version 630'

[00:37:45] 
[00:37:45] *------------------------------*
[00:37:45] Folding@Home GPU Core
[00:37:45] Version 2.15 (Tue Nov 16 09:05:18 PST 2010)
[00:37:45] 
[00:37:45] Build host: SimbiosNvdWin7
[00:37:45] Board Type: NVIDIA/CUDA
[00:37:45] Core      : x=15
[00:37:45]  Window's signal control handler registered.
[00:37:45] Preparing to commence simulation
[00:37:45] - Looking at optimizations...
[00:37:45] - Files status OK
[00:37:45] sizeof(CORE_PACKET_HDR) = 512 file=<>
[00:37:45] - Expanded 39122 -> 171827 (decompressed 439.2 percent)
[00:37:45] Called DecompressByteArray: compressed_data_size=39122 data_size=171827, decompressed_data_size=171827 diff=0
[00:37:45] - Digital signature verified
[00:37:45] 
[00:37:45] Project: 6801 (Run 4394, Clone 1, Gen 2)
[00:37:45] 
[00:37:45] Assembly optimizations on if available.
[00:37:45] Entering M.D.
[00:37:47] Tpr hash work/wudata_01.tpr:  3072867433 4181462573 2689415243 1031181546 1959007100
[00:37:47] Working on ALZHEIMER'S DISEASE AMYLOID
[00:37:47] Client config found, loading data.
[00:37:48] Starting GUI Server
[00:37:48] Setting checkpoint frequency: 500000
[00:37:48] Setting checkpoint frequency: 500000
[00:39:23] Completed    500000 out of 50000000 steps (1%).
[00:39:23] mdrun_gpu returned 52
[00:39:23] NANs detected on GPU
[00:39:23] 
[00:39:23] Folding@home Core Shutdown: UNSTABLE_MACHINE
[00:39:26] CoreStatus = 7A (122)
[00:39:26] Sending work to server
[00:39:26] Project: 6801 (Run 4394, Clone 1, Gen 2)
[00:39:26] - Read packet limit of 540015616... Set to 524286976.
[00:39:26] - Error: Could not get length of results file work/wuresults_01.dat
[00:39:26] - Error: Could not read unit 01 file. Removing from queue.
[00:39:26] Trying to send all finished work units
[00:39:26] + No unsent completed units remaining.
[00:39:26] + -oneunit flag given and have now finished a unit. Exiting.***** Got a SIGTERM signal (2)
[00:39:26] Killing all core threads

Folding@Home Client Shutdown.
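
Both logs above fail the same way: the same Project/Run/Clone/Gen, then `NANs detected on GPU`, then `UNSTABLE_MACHINE` with `CoreStatus = 7A (122)`. If you have a lot of log files to go through, something like the following could pull those lines out. It is a minimal sketch that only matches the exact line formats seen in these two logs; the function name `summarize_attempt` and the `SAMPLE` text are made up for the example, and other client or core versions may word things differently.

```python
import re

# Patterns copied from the line formats in the logs above (an assumption:
# other client/core versions may format these lines differently).
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def summarize_attempt(log_text):
    """Extract the WU identity and failure markers from one log's text."""
    project = PROJECT_RE.search(log_text)
    shutdown = SHUTDOWN_RE.search(log_text)
    status = STATUS_RE.search(log_text)
    return {
        # (project, run, clone, gen) as integers, or None if not found
        "wu": tuple(int(g) for g in project.groups()) if project else None,
        "nans_detected": "NANs detected on GPU" in log_text,
        "shutdown": shutdown.group(1) if shutdown else None,
        # decimal value shown in parentheses after the hex code
        "core_status": int(status.group(2)) if status else None,
    }


# A few lines taken from the first log above.
SAMPLE = """\
[03:40:28] Project: 6801 (Run 4394, Clone 1, Gen 2)
[03:42:06] mdrun_gpu returned 52
[03:42:06] NANs detected on GPU
[03:42:06] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:42:08] CoreStatus = 7A (122)
"""

print(summarize_attempt(SAMPLE))
```

Running it over each log and comparing the `wu` tuples makes it easy to confirm that every failed attempt is the same work unit rather than different ones failing on the same card.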
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 6801 (Run 4394, Clone 1, Gen 2)

Post by bruce »

According to the records that I can see, the last time this WU was assigned was 2011-03-23 20:52 UTC, but I only see results that are actually returned, not ones that are assigned and then never returned.

I reported it as a bad WU on Wed Apr 13, 2011 16:09 UTC and, just to be thorough, reported it again just now.
The WU (P6801,R4394,C1,G2) has been reported as a bad WU. Note that the WUs on the reported list are stopped daily at 8 am Pacific time.

Your log shows it being downloaded at [May 4 03:40 UTC]. That should be impossible, but I'll ask and see if somebody understands what's going on.