Re: Project: 6801 (Run 4394, Clone 1, Gen 2)
Posted: Wed Apr 13, 2011 3:24 pm
by HendricksSA
Sticks435, you will probably have to do the following to get rid of the offending work unit.
1. Stop your client.
2. Delete the work directory and the queue.dat file.
3. Restart the client.
If that doesn't work and you get the same unit again, repeat the steps above AND change your machine ID to a new unique number (one not used by other clients on this computer). You should then get a new work unit. Good luck!
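For anyone who would rather script the cleanup in step 2, here is a minimal Python sketch. It assumes the client is already stopped and that the client folder holds a `work` directory and a `queue.dat` file as described above. The function name `reset_client_state` and the folder you pass in are placeholders of mine, not part of the F@H client, and the script does not change the machine ID.

```python
import shutil
from pathlib import Path


def reset_client_state(client_dir):
    """Delete the work directory and queue.dat inside client_dir.

    Hypothetical helper: stop the client yourself before calling it.
    Returns the list of paths that were actually removed.
    """
    client_dir = Path(client_dir)
    removed = []

    # Step 2a: remove the whole work directory, if it exists.
    work_dir = client_dir / "work"
    if work_dir.is_dir():
        shutil.rmtree(work_dir)
        removed.append(work_dir)

    # Step 2b: remove the queue file, if it exists.
    queue_file = client_dir / "queue.dat"
    if queue_file.is_file():
        queue_file.unlink()
        removed.append(queue_file)

    return removed
```

Call it with the folder your client runs from, then restart the client by hand. Everything else in the folder is left alone.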
Re: Project: 6801 (Run 4394, Clone 1, Gen 2)
Posted: Wed Apr 13, 2011 3:41 pm
by sticks435
Yeah, once I realized I was getting the same unit over and over, and that a couple of others had the same issue, I went ahead and did that (including changing the machine ID) last night. I've completed 6 WUs since then without issue.
Re: Project: 6801 (Run 4394, Clone 1, Gen 2)
Posted: Wed Apr 13, 2011 4:09 pm
by bruce
The WU (P6801,R4394,C1,G2) has been reported as a bad WU.
Re: Project: 6801 (Run 4394, Clone 1, Gen 2)
Posted: Mon Apr 18, 2011 7:31 pm
by Dave_Goodchild
Project: 6801 (Run 4394, Clone 1, Gen 2)
Just had this on one of the GPU folders, and I also had it at the weekend on another machine, so it's still being assigned.
Re: Project: 6801 (Run 4394, Clone 1, Gen 2)
Posted: Sun Apr 24, 2011 8:12 am
by cordis
I also have a GPU down because of this WU. I'm trying to downclock it to see if I can get it working, but given the reports here, this just seems like a bad WU. It's troubling, because I keep deleting the work folders and all that stuff, and the client keeps downloading the same WU again. Guess I'll see what else I can do...
Re: Project: 6801 (Run 4394, Clone 1, Gen 2)
Posted: Thu May 05, 2011 12:53 am
by a_fool
I've failed this WU 20+ times now: Project: 6801 (Run 4394, Clone 1, Gen 2)
If it has been flagged as a bad WU, why is it still being assigned?
This particular WU is being folded on a GTX 470 @ stock settings. Here's a snippet of one failed attempt.
Code:
# Windows GPU Console Edition #################################################
###############################################################################
Folding@Home Client Version 6.41r2
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: C:\FAH\GPU0
Executable: C:\FAH\FAH_GPU3.exe
Arguments: -oneunit -forcegpu nvidia_fermi -advmethods -verbosity 9 -gpu 0
[03:40:26] - Ask before connecting: No
[03:40:26] - User name: a_fool (Team 111065)
[03:40:26] - User ID: 551C209503FE274
[03:40:26] - Machine ID: 3
[03:40:26]
[03:40:26] Gpu type=3 species=20.
[03:40:26] Work directory not found. Creating...
[03:40:26] Could not open work queue, generating new queue...
[03:40:26] - Preparing to get new work unit...
[03:40:26] - Autosending finished units... [May 4 03:40:26 UTC]
[03:40:26] Cleaning up work directory
[03:40:26] Trying to send all finished work units
[03:40:26] + No unsent completed units remaining.
[03:40:26] - Autosend completed
[03:40:26] + Attempting to get work packet
[03:40:26] Passkey found
[03:40:26] - Will indicate memory of 4094 MB
[03:40:26] Gpu type=3 species=20.
[03:40:26] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 7
[03:40:26] - Connecting to assignment server
[03:40:26] Connecting to http://assign-GPU.stanford.edu:8080/
[03:40:27] Posted data.
[03:40:27] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[03:40:27] + News From Folding@Home: Welcome to Folding@Home
[03:40:27] Loaded queue successfully.
[03:40:27] Gpu type=3 species=20.
[03:40:27] Sent data
[03:40:27] Connecting to http://171.64.65.64:8080/
[03:40:27] Posted data.
[03:40:27] Initial: 0000; - Receiving payload (expected size: 39634)
[03:40:27] Conversation time very short, giving reduced weight in bandwidth avg
[03:40:27] - Downloaded at ~77 kB/s
[03:40:27] - Averaged speed for that direction ~77 kB/s
[03:40:27] + Received work.
[03:40:27] + Closed connections
[03:40:27]
[03:40:27] + Processing work unit
[03:40:27] Core required: FahCore_15.exe
[03:40:27] Core found.
[03:40:27] Working on queue slot 01 [May 4 03:40:27 UTC]
[03:40:27] + Working ...
[03:40:27] - Calling '.\FahCore_15.exe -dir work/ -suffix 01 -nice 19 -priority 96 -nocpulock -checkpoint 3 -verbose -lifeline 4352 -version 641'
[03:40:28]
[03:40:28] *------------------------------*
[03:40:28] Folding@Home GPU Core
[03:40:28] Version 2.15 (Tue Nov 16 09:05:18 PST 2010)
[03:40:28]
[03:40:28] Build host: SimbiosNvdWin7
[03:40:28] Board Type: NVIDIA/CUDA
[03:40:28] Core : x=15
[03:40:28] Window's signal control handler registered.
[03:40:28] Preparing to commence simulation
[03:40:28] - Looking at optimizations...
[03:40:28] DeleteFrameFiles: successfully deleted file=work/wudata_01.ckp
[03:40:28] - Created dyn
[03:40:28] - Files status OK
[03:40:28] sizeof(CORE_PACKET_HDR) = 512 file=<>
[03:40:28] - Expanded 39122 -> 171827 (decompressed 439.2 percent)
[03:40:28] Called DecompressByteArray: compressed_data_size=39122 data_size=171827, decompressed_data_size=171827 diff=0
[03:40:28] - Digital signature verified
[03:40:28]
[03:40:28] Project: 6801 (Run 4394, Clone 1, Gen 2)
[03:40:28]
[03:40:28] Assembly optimizations on if available.
[03:40:28] Entering M.D.
[03:40:30] Tpr hash work/wudata_01.tpr: 3072867433 4181462573 2689415243 1031181546 1959007100
[03:40:30] Working on ALZHEIMER'S DISEASE AMYLOID
[03:40:30] Client config found, loading data.
[03:40:30] Starting GUI Server
[03:40:30] Setting checkpoint frequency: 500000
[03:40:30] Setting checkpoint frequency: 500000
[03:42:05] Completed 500000 out of 50000000 steps (1%).
[03:42:06] mdrun_gpu returned 52
[03:42:06] NANs detected on GPU
[03:42:06]
[03:42:06] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:42:08] CoreStatus = 7A (122)
[03:42:08] Sending work to server
[03:42:08] Project: 6801 (Run 4394, Clone 1, Gen 2)
[03:42:08] - Read packet limit of 540015616... Set to 524286976.
[03:42:08] - Error: Could not get length of results file work/wuresults_01.dat
[03:42:08] - Error: Could not read unit 01 file. Removing from queue.
[03:42:08] Trying to send all finished work units
[03:42:08] + No unsent completed units remaining.
[03:42:08] + -oneunit flag given and have now finished a unit. Exiting.***** Got a SIGTERM signal (2)
[03:42:08] Killing all core threads
Folding@Home Client Shutdown.
and another after downgrading to the 6.30 client...
Code:
# Windows GPU Console Edition #################################################
###############################################################################
Folding@Home Client Version 6.30r1
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: C:\FAH\GPU0
Executable: C:\FAH\FAH_GPU3.exe
Arguments: -oneunit -forcegpu nvidia_fermi -advmethods -verbosity 9 -gpu 0
[00:37:45] - Ask before connecting: No
[00:37:45] - User name: a_fool (Team 111065)
[00:37:45] - User ID: 551C209503FE274
[00:37:45] - Machine ID: 3
[00:37:45]
[00:37:45] Gpu type=3 species=30.
[00:37:45] Loaded queue successfully.
[00:37:45]
[00:37:45] - Autosending finished units... [May 5 00:37:45 UTC]
[00:37:45] + Processing work unit
[00:37:45] Trying to send all finished work units
[00:37:45] Core required: FahCore_15.exe
[00:37:45] + No unsent completed units remaining.
[00:37:45] - Autosend completed
[00:37:45] Core found.
[00:37:45] Working on queue slot 01 [May 5 00:37:45 UTC]
[00:37:45] + Working ...
[00:37:45] - Calling '.\FahCore_15.exe -dir work/ -suffix 01 -nice 19 -priority 96 -nocpulock -checkpoint 3 -verbose -lifeline 5100 -version 630'
[00:37:45]
[00:37:45] *------------------------------*
[00:37:45] Folding@Home GPU Core
[00:37:45] Version 2.15 (Tue Nov 16 09:05:18 PST 2010)
[00:37:45]
[00:37:45] Build host: SimbiosNvdWin7
[00:37:45] Board Type: NVIDIA/CUDA
[00:37:45] Core : x=15
[00:37:45] Window's signal control handler registered.
[00:37:45] Preparing to commence simulation
[00:37:45] - Looking at optimizations...
[00:37:45] - Files status OK
[00:37:45] sizeof(CORE_PACKET_HDR) = 512 file=<>
[00:37:45] - Expanded 39122 -> 171827 (decompressed 439.2 percent)
[00:37:45] Called DecompressByteArray: compressed_data_size=39122 data_size=171827, decompressed_data_size=171827 diff=0
[00:37:45] - Digital signature verified
[00:37:45]
[00:37:45] Project: 6801 (Run 4394, Clone 1, Gen 2)
[00:37:45]
[00:37:45] Assembly optimizations on if available.
[00:37:45] Entering M.D.
[00:37:47] Tpr hash work/wudata_01.tpr: 3072867433 4181462573 2689415243 1031181546 1959007100
[00:37:47] Working on ALZHEIMER'S DISEASE AMYLOID
[00:37:47] Client config found, loading data.
[00:37:48] Starting GUI Server
[00:37:48] Setting checkpoint frequency: 500000
[00:37:48] Setting checkpoint frequency: 500000
[00:39:23] Completed 500000 out of 50000000 steps (1%).
[00:39:23] mdrun_gpu returned 52
[00:39:23] NANs detected on GPU
[00:39:23]
[00:39:23] Folding@home Core Shutdown: UNSTABLE_MACHINE
[00:39:26] CoreStatus = 7A (122)
[00:39:26] Sending work to server
[00:39:26] Project: 6801 (Run 4394, Clone 1, Gen 2)
[00:39:26] - Read packet limit of 540015616... Set to 524286976.
[00:39:26] - Error: Could not get length of results file work/wuresults_01.dat
[00:39:26] - Error: Could not read unit 01 file. Removing from queue.
[00:39:26] Trying to send all finished work units
[00:39:26] + No unsent completed units remaining.
[00:39:26] + -oneunit flag given and have now finished a unit. Exiting.***** Got a SIGTERM signal (2)
[00:39:26] Killing all core threads
Folding@Home Client Shutdown.
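To check how often a client fails on the same WU, logs like the two above can be scanned with a short script. Here is a rough Python sketch. It relies only on the two line formats visible in these logs (`Project: N (Run N, Clone N, Gen N)` and `Folding@home Core Shutdown: STATUS`); the function name `shutdown_counts` is hypothetical, and other client versions may format these lines differently.

```python
import re
from collections import Counter

# Patterns copied from the log lines shown in this thread.
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def shutdown_counts(log_text):
    """Count core shutdowns per work unit and status.

    Returns a Counter keyed by ((project, run, clone, gen), status).
    A shutdown line is attributed to the most recent Project line above it.
    """
    counts = Counter()
    current = None  # last (project, run, clone, gen) seen so far

    for line in log_text.splitlines():
        prcg = PRCG_RE.search(line)
        if prcg:
            current = tuple(int(n) for n in prcg.groups())
            continue
        shutdown = SHUTDOWN_RE.search(line)
        if shutdown and current is not None:
            counts[(current, shutdown.group(1))] += 1

    return counts
```

Feed it the text of the client log and look up the WU you suspect, for example `counts[((6801, 4394, 1, 2), "UNSTABLE_MACHINE")]`.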
Re: Project: 6801 (Run 4394, Clone 1, Gen 2)
Posted: Thu May 05, 2011 7:12 am
by bruce
According to the records that I can see, the last time this project was assigned was 2011-03-23 20:52 UTC, but I only see results that are actually returned, not ones that are assigned and then never returned.
I reported it as a bad WU on Wed Apr 13, 2011 at 16:09 UTC and, just to be thorough, reported it again just now.
The WU (P6801,R4394,C1,G2) has been reported as a bad WU. Note that WUs on the reported list are stopped once a day, at 8am Pacific time.
Your log shows it being downloaded at [May 4 03:40 UTC]. That should be impossible, but I'll ask and see if somebody understands what's going on.