Project: 5771 (Run 1, Clone 112, Gen 955)

Moderators: Site Moderators, FAHC Science Team

anko1
Posts: 438
Joined: Mon Dec 03, 2007 1:31 am
Hardware configuration: Old Faithful CPU: Windows Graphical 5.03; Intel Pentium 4 Processor 540 (3.2GHz) HT; Windows XP
Big Red: Windows SMP Console 6.29; Windows GPU console 6.20r1; Intel Q9450 2.66G; ASUS P5Q 775 P45; [BFG 9800GTX+ old graphics card] NVidia GeForce 8800 GTX [as of 5/9/09]; Windows XP Pro SP3
Lenovo Think Pad: Windows 6.29 w/ SMP; Windows GPU Console 6.20r1 systray; Intel QX9300; NVIDIA Quadro FX-3700M; Windows XP Professional
Location: SF Peninsula

Project: 5771 (Run 1, Clone 112, Gen 955)

Post by anko1 »

I'm reporting this instant failure because it happened on Big Red, which almost never fails on any unit. [Big Red: Windows SMP Console 6.23 Beta R1; Windows GPU console 6.20r1; FAH core 11, v.1.19; Intel Q9450 2.66G; ASUS P5Q 775 P45; NVidia GeForce 8800 GTX]

[23:07:45] + Processing work unit
[23:07:45] Core required: FahCore_11.exe
[23:07:45] Core found.
[23:07:45] Working on queue slot 09 [August 11 23:07:45 UTC]
[23:07:45] + Working ...
[23:07:45] - Calling '.\FahCore_11.exe -dir work/ -suffix 09 -priority 96 -checkpoint 15 -verbose -lifeline 2204 -version 620'

[23:07:45] 
[23:07:45] *------------------------------*
[23:07:45] Folding@Home GPU Core - Beta
[23:07:45] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[23:07:45] 
[23:07:45] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[23:07:45] Build host: amoeba
[23:07:45] Board Type: Nvidia
[23:07:45] Core      : 
[23:07:45] Preparing to commence simulation
[23:07:45] - Looking at optimizations...
[23:07:45] - Created dyn
[23:07:45] - Files status OK
[23:07:45] - Expanded 45389 -> 251112 (decompressed 553.2 percent)
[23:07:45] Called DecompressByteArray: compressed_data_size=45389 data_size=251112, decompressed_data_size=251112 diff=0
[23:07:45] - Digital signature verified
[23:07:45] 
[23:07:45] Project: 5771 (Run 1, Clone 112, Gen 955)
[23:07:45] 
[23:07:45] Assembly optimizations on if available.
[23:07:45] Entering M.D.
[23:07:52] Working on Protein
[23:07:52] Client config found, loading data.
[23:07:52] mdrun_gpu returned 
[23:07:52] NANs detected on GPU
[23:07:52] 
[23:07:52] Folding@home Core Shutdown: UNSTABLE_MACHINE
[23:07:56] CoreStatus = 7A (122)
[23:07:56] Sending work to server
[23:07:56] Project: 5771 (Run 1, Clone 112, Gen 955)
[23:07:56] - Read packet limit of 540015616... Set to 524286976.
[23:07:56] - Error: Could not get length of results file work/wuresults_09.dat
[23:07:56] - Error: Could not read unit 09 file. Removing from queue.
[23:07:56] Trying to send all finished work units
[23:07:56] + No unsent completed units remaining.
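For anyone comparing logs across machines, here is a minimal Python sketch that pulls the work unit ID and the core exit status out of a log like the one above. It assumes only the line formats visible in this thread; the function name `summarize_core_exit` and the returned dictionary keys are hypothetical and are not part of any Folding@home tool.

```python
import re

# Patterns match the line formats seen in the log above (an assumption:
# other client or core versions may word these lines differently).
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def summarize_core_exit(log_text):
    """Return details of the first CoreStatus line in log_text, or None.

    The result records the most recent Project/Run/Clone/Gen seen before
    the status line and whether a NaN message appeared before it.
    """
    prcg = None
    nans_detected = False
    for line in log_text.splitlines():
        match = PRCG_RE.search(line)
        if match:
            prcg = tuple(int(g) for g in match.groups())
        if "NANs detected on GPU" in line:
            nans_detected = True
        match = STATUS_RE.search(line)
        if match:
            return {
                "prcg": prcg,
                "status_hex": match.group(1).upper(),
                "status_dec": int(match.group(2)),
                "nans_detected": nans_detected,
            }
    return None
```

Run against the log above, this should report project 5771, run 1, clone 112, gen 955 with status 7A (122 decimal) and the NaN flag set, which makes it easy to see that different machines are hitting the same unit in the same way.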
anko1

Re: Project: 5771 (Run 1, Clone 112, Gen 955)

Post by anko1 »

I got the same WU on Big Red with identical results on 8/17/09 at 12:55. [Coincidentally, it was also in queue slot 09, but I checked and there were successful 09s between the failures.]
ElectricVehicle
Posts: 157
Joined: Fri Feb 01, 2008 6:41 pm

Re: Project: 5771 (Run 1, Clone 112, Gen 955)

Post by ElectricVehicle »

This looks like a bad WU to me as well. It fails after less than one second of actual computation on a GPU I've had running for months with no failures.

[14:32:05] - Preparing to get new work unit...
[14:32:05] + Attempting to get work packet
[14:32:05] - Will indicate memory of 2046 MB
[14:32:05] - Connecting to assignment server
[14:32:05] Connecting to http://assign-GPU.stanford.edu:8080/
[14:32:05] Posted data.
[14:32:05] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[14:32:05] + News From Folding@Home: Welcome to Folding@Home
[14:32:05] Loaded queue successfully.
[14:32:05] Connecting to http://171.67.108.11:8080/
[14:32:05] Posted data.
[14:32:05] Initial: 0000; - Receiving payload (expected size: 45901)
[14:32:05] Conversation time very short, giving reduced weight in bandwidth avg
[14:32:05] - Downloaded at ~89 kB/s
[14:32:05] - Averaged speed for that direction ~88 kB/s
[14:32:05] + Received work.
[14:32:05] Trying to send all finished work units
[14:32:05] + No unsent completed units remaining.
[14:32:05] + Closed connections
[14:32:10] 
[14:32:10] + Processing work unit
[14:32:10] Core required: FahCore_11.exe
[14:32:10] Core found.
[14:32:10] Working on queue slot 01 [August 20 14:32:10 UTC]
[14:32:10] + Working ...
[14:32:10] - Calling '.\FahCore_11.exe -dir work/ -suffix 01 -priority 96 -checkpoint 15 -verbose -lifeline 2488 -version 620'

[14:32:10] 
[14:32:10] *------------------------------*
[14:32:10] Folding@Home GPU Core - Beta
[14:32:10] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[14:32:10] 
[14:32:10] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[14:32:10] Build host: amoeba
[14:32:10] Board Type: Nvidia
[14:32:10] Core      : 
[14:32:10] Preparing to commence simulation
[14:32:10] - Looking at optimizations...
[14:32:10] - Created dyn
[14:32:10] - Files status OK
[14:32:10] - Expanded 45389 -> 251112 (decompressed 553.2 percent)
[14:32:10] Called DecompressByteArray: compressed_data_size=45389 data_size=251112, decompressed_data_size=251112 diff=0
[14:32:10] - Digital signature verified
[14:32:10] 
[14:32:10] Project: 5771 (Run 1, Clone 112, Gen 955)
[14:32:10] 
[14:32:10] Assembly optimizations on if available.
[14:32:10] Entering M.D.
[14:32:17] Working on Protein
[14:32:18] Client config found, loading data.
[14:32:18] mdrun_gpu returned 
[14:32:18] NANs detected on GPU
[14:32:18] 
[14:32:18] Folding@home Core Shutdown: UNSTABLE_MACHINE
[14:32:21] CoreStatus = 7A (122)
[14:32:21] Sending work to server
[14:32:21] Project: 5771 (Run 1, Clone 112, Gen 955)
[14:32:21] - Read packet limit of 540015616... Set to 524286976.
[14:32:21] - Error: Could not get length of results file work/wuresults_01.dat
[14:32:21] - Error: Could not read unit 01 file. Removing from queue.
[14:32:21] Trying to send all finished work units
[14:32:21] + No unsent completed units remaining.
[14:32:21] - Preparing to get new work unit...
[14:32:21] + Attempting to get work packet
[14:32:21] - Will indicate memory of 2046 MB
[14:32:21] - Connecting to assignment server
[14:32:21] Connecting to http://assign-GPU.stanford.edu:8080/
[14:32:21] Posted data.
[14:32:21] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[14:32:21] + News From Folding@Home: Welcome to Folding@Home
[14:32:21] Loaded queue successfully.
[14:32:21] Connecting to http://171.67.108.11:8080/
[14:32:21] Posted data.
[14:32:21] Initial: 0000; - Receiving payload (expected size: 45901)
[14:32:21] Conversation time very short, giving reduced weight in bandwidth avg
[14:32:21] - Downloaded at ~89 kB/s
[14:32:21] - Averaged speed for that direction ~88 kB/s
[14:32:21] + Received work.
[14:32:21] Trying to send all finished work units
[14:32:21] + No unsent completed units remaining.
[14:32:21] + Closed connections
[14:32:26] 
[14:32:26] + Processing work unit
[14:32:26] Core required: FahCore_11.exe
[14:32:26] Core found.
[14:32:26] Working on queue slot 02 [August 20 14:32:26 UTC]
[14:32:26] + Working ...
[14:32:26] - Calling '.\FahCore_11.exe -dir work/ -suffix 02 -priority 96 -checkpoint 15 -verbose -lifeline 2488 -version 620'

[14:32:27] 
[14:32:27] *------------------------------*
[14:32:27] Folding@Home GPU Core - Beta
[14:32:27] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[14:32:27] 
[14:32:27] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[14:32:27] Build host: amoeba
[14:32:27] Board Type: Nvidia
[14:32:27] Core      : 
[14:32:27] Preparing to commence simulation
[14:32:27] - Looking at optimizations...
[14:32:27] - Created dyn
[14:32:27] - Files status OK
[14:32:27] - Expanded 45389 -> 251112 (decompressed 553.2 percent)
[14:32:27] Called DecompressByteArray: compressed_data_size=45389 data_size=251112, decompressed_data_size=251112 diff=0
[14:32:27] - Digital signature verified
[14:32:27] 
[14:32:27] Project: 5771 (Run 1, Clone 112, Gen 955)
[14:32:27] 
[14:32:27] Assembly optimizations on if available.
[14:32:27] Entering M.D.
[14:32:33] Working on Protein
[14:32:34] Client config found, loading data.
[14:32:34] mdrun_gpu returned 
[14:32:34] NANs detected on GPU
[14:32:34] 
[14:32:34] Folding@home Core Shutdown: UNSTABLE_MACHINE
[14:32:37] CoreStatus = 7A (122)
[14:32:37] Sending work to server
[14:32:37] Project: 5771 (Run 1, Clone 112, Gen 955)
[14:32:37] - Read packet limit of 540015616... Set to 524286976.
[14:32:37] - Error: Could not get length of results file work/wuresults_02.dat
[14:32:37] - Error: Could not read unit 02 file. Removing from queue.
[14:32:37] EUE limit exceeded. Pausing 24 hours.
[16:33:24] - Autosending finished units... [August 20 16:33:24 UTC]
[16:33:24] Trying to send all finished work units
[16:33:24] + No unsent completed units remaining.
[16:33:24] - Autosend completed
[16:33:24] + Working...
[22:33:24] - Autosending finished units... [August 20 22:33:24 UTC]
[22:33:24] Trying to send all finished work units
[22:33:24] + No unsent completed units remaining.
[22:33:24] - Autosend completed
[22:33:24] + Working...
[04:33:24] - Autosending finished units... [August 21 04:33:24 UTC]
[04:33:24] Trying to send all finished work units
[04:33:24] + No unsent completed units remaining.
[04:33:24] - Autosend completed
[04:33:24] + Working...
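The log above shows the same unit being reassigned and failing repeatedly until the client pauses. A rough Python sketch for counting UNSTABLE_MACHINE shutdowns per work unit follows. It assumes the line formats shown in this thread; the function name `count_unstable_shutdowns` is hypothetical, and the sketch makes no assumption about how many failures the client allows before it pauses.

```python
import re
from collections import Counter

# Matches the "Project: N (Run N, Clone N, Gen N)" lines seen in the log above.
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def count_unstable_shutdowns(log_lines):
    """Count UNSTABLE_MACHINE core shutdowns per (project, run, clone, gen).

    Each shutdown is attributed to the most recent Project line before it.
    """
    counts = Counter()
    current = None
    for line in log_lines:
        match = PRCG_RE.search(line)
        if match:
            current = tuple(int(g) for g in match.groups())
        elif "Core Shutdown: UNSTABLE_MACHINE" in line and current is not None:
            counts[current] += 1
    return counts
```

A unit that shows up with a count above one on hardware that otherwise completes work normally points at the unit rather than the machine, which is the pattern reported in this thread.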
Fold On! (with 100% Renewable, 0 Carbon electricity) ElectricVehicle EV1, RAV4 EV, LEAF, Bolt EV, Volt, M3, s4 Simulator
ei57
Posts: 64
Joined: Thu Jun 12, 2008 10:23 am

Re: Project: 5771 (Run 1, Clone 112, Gen 955)

Post by ei57 »

This one put one of my clients to sleep too. Same log as those above.
Archangelboy
Posts: 9
Joined: Thu Jul 09, 2009 5:52 pm
Hardware configuration: i7 920 on EVGA x58 SLI Vanilla @ 3.66 GHz Watercooled
6GB DDR3
3x GTX 280 GPU
4 notfreds smp clients, 3 GPU clients
Location: Bozeman, MT

Re: Project: 5771 (Run 1, Clone 112, Gen 955)

Post by Archangelboy »

Ditto. Vista 64, GTX 280, fails before a single computation is performed.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 5771 (Run 1, Clone 112, Gen 955)

Post by bruce »

Reported
anko1

Re: Project: 5771 (Run 1, Clone 112, Gen 955)

Post by anko1 »

Thanks for reporting it, Bruce. It's still out there, though. I got it 8 more times, the latest on 8/29. I know it's the weekend now, but hopefully PG will take care of it Monday morning.
vvoelz
Pande Group Member
Posts: 552
Joined: Sun Dec 02, 2007 8:07 pm
Location: Temple University, Philadelphia PA

Re: Project: 5771 (Run 1, Clone 112, Gen 955)

Post by vvoelz »

Sorry for the delay on this one, guys. I think the WU was stopped, but the server might not have been restarted. I restarted the server and put a complete halt on this RUN/CLONE. It shouldn't be giving you trouble from now on. -- Vince
anko1

Re: Project: 5771 (Run 1, Clone 112, Gen 955)

Post by anko1 »

Thanks for taking care of it, Vince.

Angela