Project: 5771 (Run 1, Clone 112, Gen 955)
Posted: Fri Aug 14, 2009 4:40 am
by anko1
I'm reporting this instant failure because it's on Big Red, which almost never fails on any unit. [Big Red: Windows SMP Console 6.23 Beta R1; Windows GPU console 6.20r1; FAH core 11, v.1.19; Intel Q9450 2.66G; ASUS P5Q 775 P45; NVidia GeForce 8800 GTX]
[23:07:45] + Processing work unit
[23:07:45] Core required: FahCore_11.exe
[23:07:45] Core found.
[23:07:45] Working on queue slot 09 [August 11 23:07:45 UTC]
[23:07:45] + Working ...
[23:07:45] - Calling '.\FahCore_11.exe -dir work/ -suffix 09 -priority 96 -checkpoint 15 -verbose -lifeline 2204 -version 620'
[23:07:45]
[23:07:45] *------------------------------*
[23:07:45] Folding@Home GPU Core - Beta
[23:07:45] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[23:07:45]
[23:07:45] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[23:07:45] Build host: amoeba
[23:07:45] Board Type: Nvidia
[23:07:45] Core :
[23:07:45] Preparing to commence simulation
[23:07:45] - Looking at optimizations...
[23:07:45] - Created dyn
[23:07:45] - Files status OK
[23:07:45] - Expanded 45389 -> 251112 (decompressed 553.2 percent)
[23:07:45] Called DecompressByteArray: compressed_data_size=45389 data_size=251112, decompressed_data_size=251112 diff=0
[23:07:45] - Digital signature verified
[23:07:45]
[23:07:45] Project: 5771 (Run 1, Clone 112, Gen 955)
[23:07:45]
[23:07:45] Assembly optimizations on if available.
[23:07:45] Entering M.D.
[23:07:52] Working on Protein
[23:07:52] Client config found, loading data.
[23:07:52] mdrun_gpu returned
[23:07:52] NANs detected on GPU
[23:07:52]
[23:07:52] Folding@home Core Shutdown: UNSTABLE_MACHINE
[23:07:56] CoreStatus = 7A (122)
[23:07:56] Sending work to server
[23:07:56] Project: 5771 (Run 1, Clone 112, Gen 955)
[23:07:56] - Read packet limit of 540015616... Set to 524286976.
[23:07:56] - Error: Could not get length of results file work/wuresults_09.dat
[23:07:56] - Error: Could not read unit 09 file. Removing from queue.
[23:07:56] Trying to send all finished work units
[23:07:56] + No unsent completed units remaining.
Re: Project: 5771 (Run 1, Clone 112, Gen 955)
Posted: Mon Aug 17, 2009 6:27 pm
by anko1
I got the same WU on Big Red with identical results on 8/17/09 12:55. [Coincidentally, it was also in queue slot 09, but I checked and there were successful units in slot 09 between the failures.]
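The check above can be automated: the failure follows the work unit, not the queue slot. The following is a minimal sketch of a hypothetical helper, not part of the Folding@home client. It assumes the v6 console log format quoted in this thread. It groups UNSTABLE_MACHINE shutdowns by Project/Run/Clone/Gen and records which queue slot each failure used.

```python
import re
from collections import defaultdict

# Hypothetical patterns based on the v6 console log lines quoted in this thread;
# other client versions may word these lines differently.
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SLOT_RE = re.compile(r"Working on queue slot (\d+)")
FAIL_MARKER = "Core Shutdown: UNSTABLE_MACHINE"


def failed_units(log_lines):
    """Map (project, run, clone, gen) -> queue slots whose attempt ended in UNSTABLE_MACHINE.

    Failures are keyed on the work unit rather than the slot, because the
    client reuses slot numbers for unrelated units.
    """
    failures = defaultdict(list)
    slot = None
    unit = None
    for line in log_lines:
        slot_match = SLOT_RE.search(line)
        unit_match = PRCG_RE.search(line)
        if slot_match:
            # A new attempt starts in this slot; forget the previous unit.
            slot = slot_match.group(1)
            unit = None
        elif unit_match:
            unit = tuple(int(g) for g in unit_match.groups())
        elif FAIL_MARKER in line and unit is not None:
            failures[unit].append(slot)
    return dict(failures)


# Usage: a trimmed excerpt of one of the logs posted in this thread.
sample = """\
[14:32:10] Working on queue slot 01 [August 20 14:32:10 UTC]
[14:32:10] Project: 5771 (Run 1, Clone 112, Gen 955)
[14:32:18] Folding@home Core Shutdown: UNSTABLE_MACHINE
[14:32:26] Working on queue slot 02 [August 20 14:32:26 UTC]
[14:32:27] Project: 5771 (Run 1, Clone 112, Gen 955)
[14:32:34] Folding@home Core Shutdown: UNSTABLE_MACHINE
""".splitlines()

print(failed_units(sample))  # {(5771, 1, 112, 955): ['01', '02']}
```

A single Project/Run/Clone/Gen that keeps failing across different slots points to a bad work unit rather than unstable hardware. The case is stronger when, as in this thread, it also fails on different machines.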
Re: Project: 5771 (Run 1, Clone 112, Gen 955)
Posted: Fri Aug 21, 2009 9:57 am
by ElectricVehicle
This looks like a bad WU for me also. It fails after less than one second of actual computation on a GPU I've had running for months with no failures.
[14:32:05] - Preparing to get new work unit...
[14:32:05] + Attempting to get work packet
[14:32:05] - Will indicate memory of 2046 MB
[14:32:05] - Connecting to assignment server
[14:32:05] Connecting to http://assign-GPU.stanford.edu:8080/
[14:32:05] Posted data.
[14:32:05] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[14:32:05] + News From Folding@Home: Welcome to Folding@Home
[14:32:05] Loaded queue successfully.
[14:32:05] Connecting to http://171.67.108.11:8080/
[14:32:05] Posted data.
[14:32:05] Initial: 0000; - Receiving payload (expected size: 45901)
[14:32:05] Conversation time very short, giving reduced weight in bandwidth avg
[14:32:05] - Downloaded at ~89 kB/s
[14:32:05] - Averaged speed for that direction ~88 kB/s
[14:32:05] + Received work.
[14:32:05] Trying to send all finished work units
[14:32:05] + No unsent completed units remaining.
[14:32:05] + Closed connections
[14:32:10]
[14:32:10] + Processing work unit
[14:32:10] Core required: FahCore_11.exe
[14:32:10] Core found.
[14:32:10] Working on queue slot 01 [August 20 14:32:10 UTC]
[14:32:10] + Working ...
[14:32:10] - Calling '.\FahCore_11.exe -dir work/ -suffix 01 -priority 96 -checkpoint 15 -verbose -lifeline 2488 -version 620'
[14:32:10]
[14:32:10] *------------------------------*
[14:32:10] Folding@Home GPU Core - Beta
[14:32:10] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[14:32:10]
[14:32:10] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[14:32:10] Build host: amoeba
[14:32:10] Board Type: Nvidia
[14:32:10] Core :
[14:32:10] Preparing to commence simulation
[14:32:10] - Looking at optimizations...
[14:32:10] - Created dyn
[14:32:10] - Files status OK
[14:32:10] - Expanded 45389 -> 251112 (decompressed 553.2 percent)
[14:32:10] Called DecompressByteArray: compressed_data_size=45389 data_size=251112, decompressed_data_size=251112 diff=0
[14:32:10] - Digital signature verified
[14:32:10]
[14:32:10] Project: 5771 (Run 1, Clone 112, Gen 955)
[14:32:10]
[14:32:10] Assembly optimizations on if available.
[14:32:10] Entering M.D.
[14:32:17] Working on Protein
[14:32:18] Client config found, loading data.
[14:32:18] mdrun_gpu returned
[14:32:18] NANs detected on GPU
[14:32:18]
[14:32:18] Folding@home Core Shutdown: UNSTABLE_MACHINE
[14:32:21] CoreStatus = 7A (122)
[14:32:21] Sending work to server
[14:32:21] Project: 5771 (Run 1, Clone 112, Gen 955)
[14:32:21] - Read packet limit of 540015616... Set to 524286976.
[14:32:21] - Error: Could not get length of results file work/wuresults_01.dat
[14:32:21] - Error: Could not read unit 01 file. Removing from queue.
[14:32:21] Trying to send all finished work units
[14:32:21] + No unsent completed units remaining.
[14:32:21] - Preparing to get new work unit...
[14:32:21] + Attempting to get work packet
[14:32:21] - Will indicate memory of 2046 MB
[14:32:21] - Connecting to assignment server
[14:32:21] Connecting to http://assign-GPU.stanford.edu:8080/
[14:32:21] Posted data.
[14:32:21] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[14:32:21] + News From Folding@Home: Welcome to Folding@Home
[14:32:21] Loaded queue successfully.
[14:32:21] Connecting to http://171.67.108.11:8080/
[14:32:21] Posted data.
[14:32:21] Initial: 0000; - Receiving payload (expected size: 45901)
[14:32:21] Conversation time very short, giving reduced weight in bandwidth avg
[14:32:21] - Downloaded at ~89 kB/s
[14:32:21] - Averaged speed for that direction ~88 kB/s
[14:32:21] + Received work.
[14:32:21] Trying to send all finished work units
[14:32:21] + No unsent completed units remaining.
[14:32:21] + Closed connections
[14:32:26]
[14:32:26] + Processing work unit
[14:32:26] Core required: FahCore_11.exe
[14:32:26] Core found.
[14:32:26] Working on queue slot 02 [August 20 14:32:26 UTC]
[14:32:26] + Working ...
[14:32:26] - Calling '.\FahCore_11.exe -dir work/ -suffix 02 -priority 96 -checkpoint 15 -verbose -lifeline 2488 -version 620'
[14:32:27]
[14:32:27] *------------------------------*
[14:32:27] Folding@Home GPU Core - Beta
[14:32:27] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[14:32:27]
[14:32:27] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[14:32:27] Build host: amoeba
[14:32:27] Board Type: Nvidia
[14:32:27] Core :
[14:32:27] Preparing to commence simulation
[14:32:27] - Looking at optimizations...
[14:32:27] - Created dyn
[14:32:27] - Files status OK
[14:32:27] - Expanded 45389 -> 251112 (decompressed 553.2 percent)
[14:32:27] Called DecompressByteArray: compressed_data_size=45389 data_size=251112, decompressed_data_size=251112 diff=0
[14:32:27] - Digital signature verified
[14:32:27]
[14:32:27] Project: 5771 (Run 1, Clone 112, Gen 955)
[14:32:27]
[14:32:27] Assembly optimizations on if available.
[14:32:27] Entering M.D.
[14:32:33] Working on Protein
[14:32:34] Client config found, loading data.
[14:32:34] mdrun_gpu returned
[14:32:34] NANs detected on GPU
[14:32:34]
[14:32:34] Folding@home Core Shutdown: UNSTABLE_MACHINE
[14:32:37] CoreStatus = 7A (122)
[14:32:37] Sending work to server
[14:32:37] Project: 5771 (Run 1, Clone 112, Gen 955)
[14:32:37] - Read packet limit of 540015616... Set to 524286976.
[14:32:37] - Error: Could not get length of results file work/wuresults_02.dat
[14:32:37] - Error: Could not read unit 02 file. Removing from queue.
[14:32:37] EUE limit exceeded. Pausing 24 hours.
[16:33:24] - Autosending finished units... [August 20 16:33:24 UTC]
[16:33:24] Trying to send all finished work units
[16:33:24] + No unsent completed units remaining.
[16:33:24] - Autosend completed
[16:33:24] + Working...
[22:33:24] - Autosending finished units... [August 20 22:33:24 UTC]
[22:33:24] Trying to send all finished work units
[22:33:24] + No unsent completed units remaining.
[22:33:24] - Autosend completed
[22:33:24] + Working...
[04:33:24] - Autosending finished units... [August 21 04:33:24 UTC]
[04:33:24] Trying to send all finished work units
[04:33:24] + No unsent completed units remaining.
[04:33:24] - Autosend completed
[04:33:24] + Working...
Re: Project: 5771 (Run 1, Clone 112, Gen 955)
Posted: Sun Aug 23, 2009 8:51 am
by ei57
This one put one of my clients to sleep too. Same log as those above.
Re: Project: 5771 (Run 1, Clone 112, Gen 955)
Posted: Mon Aug 24, 2009 3:22 am
by Archangelboy
Ditto. Vista 64, GTX 280, fails before a single computation is performed.
Re: Project: 5771 (Run 1, Clone 112, Gen 955)
Posted: Mon Aug 24, 2009 6:15 pm
by bruce
Reported
Re: Project: 5771 (Run 1, Clone 112, Gen 955)
Posted: Sun Aug 30, 2009 5:40 am
by anko1
Thanks for reporting it, Bruce. It's still out there, though. I got it 8 more times, the latest on 8/29. I know it's the weekend now, but hopefully the Pande Group (PG) will take care of it Monday morning.
Re: Project: 5771 (Run 1, Clone 112, Gen 955)
Posted: Tue Sep 01, 2009 7:21 pm
by vvoelz
Sorry for the delay on this one, guys. I think the WU was stopped, but the server may not have been restarted. I restarted the server and put a complete halt on this RUN/CLONE. It shouldn't give you any more trouble. -- Vince
Re: Project: 5771 (Run 1, Clone 112, Gen 955)
Posted: Tue Sep 01, 2009 10:52 pm
by anko1
Thanks for taking care of it, Vince.
Angela