Project: 2665 (Run 0, Clone 479, Gen 20)

314159
Posts: 232
Joined: Sun Dec 02, 2007 2:46 am
Location: http://www.teammacosx.org/

Project: 2665 (Run 0, Clone 479, Gen 20)

Post by 314159 »

Linux client on a Q6600 at stock clock (not the same computer for which I have posted other EUEs)

At least this one did not "hang". :)

[12:19:47] Core required: FahCore_a1.exe
[12:19:47] Core found.
[12:19:47] Working on Unit 08 [August 26 12:19:47]
[12:19:47] + Working ...
[12:19:47] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 08 -checkpoint 15 -forceasm -verbose -lifeline 5847 -version 602'

[12:19:47] 
[12:19:47] *------------------------------*
[12:19:47] Folding@Home Gromacs SMP Core
[12:19:47] Version 1.74 (November 27, 2006)
[12:19:47] 
[12:19:47] Preparing to commence simulation
[12:19:47] - Ensuring status. Please wait.
[12:20:04] - Assembly optimizations manually forced on.
[12:20:04] - Not checking prior termination.
[12:20:05] - Expanded 4735055 -> 24426905 (decompressed 515.8 percent)
[12:20:05] - Starting from initial work packet
[12:20:05] 
[12:20:05] Project: 2665 (Run 0, Clone 479, Gen 20)
[12:20:05] 
[12:20:05] Assembly optimizations on if available.
[12:20:05] Entering M.D.
[12:20:11] Rejecting checkpoint
[12:20:13] Protein: HGG in waterExtra SSE boost OK.
[12:20:13] 
[12:20:13] Extra SSE boost OK.
[12:20:14] Writing local files
[12:20:14] Completed 0 out of 250000 steps  (0 percent)
[12:20:14] 
[12:20:14] Folding@home Core Shutdown: INTERRUPTED
[12:20:18] CoreStatus = 0 (0)
[12:20:18] Client-core communications error: ERROR 0x0
[12:20:18] Deleting current work unit & continuing...
[12:24:40] - Warning: Could not delete all work unit files (8): Core returned invalid code
[12:24:40] Trying to send all finished work units
[12:24:40] + No unsent completed units remaining.
[12:24:40] - Preparing to get new work unit...
[12:24:40] + Attempting to get work packet
[12:24:40] - Will indicate memory of 1024 MB
[12:24:40] - Connecting to assignment server
[12:24:40] Connecting to http://assign.stanford.edu:8080/
[12:24:40] Posted data.
[12:24:40] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[12:24:40] + News From Folding@Home: Welcome to Folding@Home
[12:24:40] Loaded queue successfully.
[12:24:40] Connecting to http://171.64.65.64:8080/
[12:24:46] Posted data.
[12:24:46] Initial: 0000; - Receiving payload (expected size: 4735567)
[12:24:49] - Downloaded at ~1541 kB/s
[12:24:49] - Averaged speed for that direction ~1133 kB/s
[12:24:49] + Received work.
[12:24:49] + Closed connections
[12:24:54] 
[12:24:54] + Processing work unit
[12:24:54] Core required: FahCore_a1.exe
[12:24:54] Core found.
[12:24:54] Working on Unit 09 [August 26 12:24:54]
[12:24:54] + Working ...
[12:24:54] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 09 -checkpoint 15 -forceasm -verbose -lifeline 5847 -version 602'

[12:24:54] 
[12:24:54] *------------------------------*
[12:24:54] Folding@Home Gromacs SMP Core
[12:24:54] Version 1.74 (November 27, 2006)
[12:24:54] 
[12:24:54] Preparing to commence simulation
[12:24:54] - Ensuring status. Please wait.
[12:25:11] - Assembly optimizations manually forced on.
[12:25:11] - Not checking prior termination.
[12:25:11] - Expanded 4735055 -> 24426905 (decompressed 515.8 percent)
[12:25:12] - Starting from initial work packet
[12:25:12] 
[12:25:12] Project: 2665 (Run 0, Clone 479, Gen 20)
[12:25:12] 
[12:25:12] Assembly optimizations on if available.
[12:25:12] Entering M.D.
[12:25:18] Rejecting checkpoint
[12:25:20] Protein: HGG in waterExtra SSE boost OK.
[12:25:20] 
[12:25:20] Extra SSE boost OK.
[12:25:21] Writing local files
[12:25:21] Completed 0 out of 250000 steps  (0 percent)
[12:25:21] 
[12:25:21] Folding@home Core Shutdown: INTERRUPTED
[12:25:25] CoreStatus = 0 (0)
[12:25:25] Client-core communications error: ERROR 0x0
[12:25:25] Deleting current work unit & continuing...
[12:29:47] - Warning: Could not delete all work unit files (9): Core returned invalid code
[12:29:47] Trying to send all finished work units
[12:29:47] + No unsent completed units remaining.
[12:29:47] - Preparing to get new work unit...
[12:29:47] + Attempting to get work packet
[12:29:47] - Will indicate memory of 1024 MB
[12:29:47] - Connecting to assignment server
[12:29:47] Connecting to http://assign.stanford.edu:8080/
[12:29:47] Posted data.
[12:29:47] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[12:29:47] + News From Folding@Home: Welcome to Folding@Home
[12:29:47] Loaded queue successfully.
[12:29:47] Connecting to http://171.64.65.64:8080/
[12:29:47] Posted data.
[12:29:47] Initial: 0000; - Error: Bad packet type from server, expected work assignment
[12:29:48] - Attempt #1  to get work failed, and no other work to do.
             Waiting before retry.
[12:29:54] + Attempting to get work packet
[12:29:54] - Will indicate memory of 1024 MB
[12:29:54] - Connecting to assignment server
[12:29:54] Connecting to http://assign.stanford.edu:8080/
[12:29:54] Posted data.
[12:29:54] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[12:29:54] + News From Folding@Home: Welcome to Folding@Home
[12:29:54] Loaded queue successfully.
[12:29:54] Connecting to http://171.64.65.64:8080/
[12:30:00] Posted data.
[12:30:00] Initial: 0000; - Receiving payload (expected size: 4675358)
[12:30:03] - Downloaded at ~1521 kB/s
[12:30:03] - Averaged speed for that direction ~1211 kB/s
[12:30:03] + Received work.
[12:30:03] + Closed connections
[12:30:08] 
[12:30:08] + Processing work unit
[12:30:08] Core required: FahCore_a1.exe
[12:30:08] Core found.
[12:30:08] Working on Unit 00 [August 26 12:30:08]
[12:30:08] + Working ...
[12:30:08] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 00 -checkpoint 15 -forceasm -verbose -lifeline 5847 -version 602'

[12:30:08] 
[12:30:08] *------------------------------*
[12:30:08] Folding@Home Gromacs SMP Core
[12:30:08] Version 1.74 (November 27, 2006)
[12:30:08] 
[12:30:08] Preparing to commence simulation
[12:30:08] - Ensuring status. Please wait.
[12:30:25] - Assembly optimizations manually forced on.
[12:30:25] - Not checking prior termination.
[12:30:26] - Expanded 4674846 -> 24111057 (decompressed 515.7 percent)
[12:30:26] - Starting from initial work packet
[12:30:26] 
[12:30:26] Project: 2665 (Run 1, Clone 208, Gen 43)
[12:30:26] 
[12:30:26] Assembly optimizations on if available.
[12:30:26] Entering M.D.
[12:30:32] Rejecting checkpoint
[12:30:33] Protein: IBX in water
[12:30:33] Writing local files
[12:30:34] Extra SSE boost OK.
[12:30:34] Writing local files
[12:30:35] Completed 0 out of 250000 steps  (0 percent)
[12:44:17] Writing local files
[12:44:17] Completed 2500 out of 250000 steps  (1 percent)
[12:58:03] Writing local files
[12:58:03] Completed 5000 out of 250000 steps  (2 percent)
[13:11:52] Writing local files
[13:11:52] Completed 7500 out of 250000 steps  (3 percent)
[13:25:36] Writing local files
[13:25:36] Completed 10000 out of 250000 steps  (4 percent)
[13:39:19] Writing local files
[13:39:19] Completed 12500 out of 250000 steps  (5 percent)
John (from the central part of the Commonwealth of Virginia, U.S.A.)

A friendly visitor to what hopefully will remain a friendly Forum.
With thanks to all of the dedicated volunteers on the staff here!!
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project: 2665 (Run 0, Clone 479, Gen 20)

Post by toTOW »

I see more than 20 reports for this WU ... all are EUE :(

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
314159
Posts: 232
Joined: Sun Dec 02, 2007 2:46 am
Location: http://www.teammacosx.org/

Re: Project: 2665 (Run 0, Clone 479, Gen 20)

Post by 314159 »

:e?:

At least the darn thing doesn't simply cause the client to stop and remain idle until detected.

The thing that burns me is that I have one Quad that has been assigned the identical defective WU on at least 5 occasions. It runs for many hours, then fails with a 0xwhatever error and starts again from scratch.
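
A rough way to confirm this kind of repeat assignment from your own client log is sketched below in Python. It is only an illustration: the function name `count_failures` is made up here, it keys on the "Project: ... (Run, Clone, Gen)" and "CoreStatus = ..." lines as they appear in the logs posted in this thread, and it assumes that a core status of 0x64 (100) means a normally finished unit, so any other status is counted as a failure.

```python
import re
from collections import Counter

# Patterns taken from the log lines quoted in this thread.
WU_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \(\d+\)")

# Assumption: 0x64 (100) is the status of a normally finished unit.
FINISHED_UNIT = 0x64


def count_failures(log_lines):
    """Count non-finished core exits per (project, run, clone, gen)."""
    failures = Counter()
    current_wu = None
    for line in log_lines:
        wu_match = WU_RE.search(line)
        if wu_match:
            # Remember which WU the following CoreStatus lines belong to.
            current_wu = tuple(int(g) for g in wu_match.groups())
            continue
        status_match = STATUS_RE.search(line)
        if status_match and current_wu is not None:
            # The first number on a CoreStatus line is hexadecimal.
            if int(status_match.group(1), 16) != FINISHED_UNIT:
                failures[current_wu] += 1
    return failures
```

Passing it the lines of a log file (for example `count_failures(open("FAHlog.txt"))`, where the file name is an assumption about your setup) gives a counter keyed by WU, so any entry with a count above 1 is a unit that failed on the same machine more than once.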

Due to the lack of error trapping (in certain cases), the Powers that Be would be totally unaware of the failure(s) and others have undoubtedly suffered through the same thing.

I know that the SMP processing is extremely difficult to debug.

On the other hand, it seems to me that the code could be modified to revert to the last checkpoint and actually send results back to our friends at Stanford for ALL of the 0x cases for which the cause has not yet been determined. This communication should eliminate multiple assignments of the same WU.
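
To make the suggestion concrete, here is a minimal Python sketch of what such reporting could look like. Everything in it is hypothetical: `FailureReport`, `on_core_exit` and the field names are invented for illustration and are not part of the real client or core, and the 0x64 "finished" status is an assumption. The idea is simply that any exit other than a normal finish produces a small report carrying the WU identity, the core status, and the progress at the last checkpoint.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: 0x64 (100) is the status of a normally finished unit.
FINISHED_UNIT = 0x64


@dataclass(frozen=True)
class FailureReport:
    """Hypothetical record a client could send back after a failed WU."""
    wu: Tuple[int, int, int, int]  # (project, run, clone, gen)
    core_status: int
    steps_completed: int           # progress at the last good checkpoint
    total_steps: int

    @property
    def fraction_done(self) -> float:
        # Could serve as a basis for partial credit.
        return self.steps_completed / self.total_steps


def on_core_exit(wu, core_status, last_checkpoint_step, total_steps) -> Optional[FailureReport]:
    """Return a report for any exit that is not a normal finish."""
    if core_status == FINISHED_UNIT:
        return None  # normal completion, nothing to report here
    # No checkpoint yet means zero usable progress.
    steps = last_checkpoint_step or 0
    return FailureReport(tuple(wu), core_status, steps, total_steps)
```

With a report like this on the server side, a unit that fails at 0 steps on many machines would stand out quickly, and a unit that fails late would still carry usable progress information.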

Partial credit would be welcomed by many who fold for points (I fold "In Memory of").

The main benefit would be eliminating frustration among what I will continue to refer to as the Project's #1 Asset, i.e. its enthusiastic base of VOLUNTEERS. :)

The combined Linux Client is supposed to be a "final" stable release? :?:
Leoslocks
Posts: 120
Joined: Fri Jan 25, 2008 3:20 am
Hardware configuration: Q6600 | P35-DQ6 | Crucial 2 x 1 GB ram | VisionTek 3870
GPU2 Version 6.20| CPU three 6.20 Clients

Re: Project: 2665 (Run 0, Clone 479, Gen 20)

Post by Leoslocks »

I too fold [inMEMORYof]

I have seen a similar SMP situation where the client hangs or just plain shuts down after an EUE.
Using Vista Ultimate 64, dropping in the 6.22beta2r3 executable cured the 'shut down' after EUE issue.
ChelseaOilman
Posts: 1037
Joined: Sun Dec 02, 2007 3:47 pm
Location: Colorado @ 10,000 feet

Re: Project: 2665 (Run 0, Clone 479, Gen 20)

Post by ChelseaOilman »

I couldn't get anywhere with this WU.

[12:04:27] Working on queue slot 08 [August 30 12:04:27 UTC]
[12:04:27] + Working ...
[12:04:27] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 08 -checkpoint 15 -forceasm -verbose -lifeline 2816 -version 622'

[12:04:28] 
[12:04:28] *------------------------------*
[12:04:28] Folding@Home Gromacs SMP Core
[12:04:28] Version 1.76 (February 23, 2008)
[12:04:28] 
[12:04:28] Preparing to commence simulation
[12:04:28] - Ensuring status. Please wait.
[12:04:45] - Assembly optimizations manually forced on.
[12:04:45] - Not checking prior termination.
[12:05:01] - Expanded 4735055 -> 24426905 (decompressed 515.8 percent)
[12:05:02] - Starting from initial work packet
[12:05:02] 
[12:05:02] Project: 2665 (Run 0, Clone 479, Gen 20)
[12:05:02] 
[12:05:39] Assembly optimizations on if available.
[12:05:39] Entering M.D.
[12:05:46] Rejecting checkpoint
[12:05:48] 
[12:05:48] Writing local files
[12:05:48] 
[12:05:48] Writing local files
[12:05:59] Extra SSE boost OK.
[12:05:59] Writing local files
[12:06:00]  send back what have done.
[12:06:00] logfile size: 9422Gromacs cannot continue further.
[12:06:00] Going to send back what have done.
[12:06:00] logfile size: 9422
[12:06:00] - Writing 9958 bytes of core data to disk...
[12:06:00]   ... Done.
[12:06:00] o delete work/wudata_08.bed
[12:06:00] - Failed to delete work/wudata_08.sas
[12:06:00] - Failed to delete work/wudata_08.goe
[12:06:00] Warning:  check for stray files
[12:06:00] 
[12:06:00] Folding@home Core Shutdown: EARLY_UNIT_END
[12:06:00] Finalizing output
[12:08:07] CoreStatus = 63 (99)
[12:08:07] + Error starting Folding@Home core.
[12:08:12] 
[12:08:12] + Processing work unit
[12:08:12] Work type a1 not eligible for variable processors
[12:08:12] Core required: FahCore_a1.exe
[12:08:12] Core found.
[12:08:12] Working on queue slot 08 [August 30 12:08:12 UTC]
[12:08:12] + Working ...
[12:08:12] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 08 -checkpoint 15 -forceasm -verbose -lifeline 2816 -version 622'

[12:08:13] 
[12:08:13] *------------------------------*
[12:08:13] Folding@Home Gromacs SMP Core
[12:08:13] Version 1.76 (February 23, 2008)
[12:08:13] 
[12:08:13] Preparing to commence simulation
[12:08:13] - Ensuring status. Please wait.
[12:08:30] - Assembly optimizations manually forced on.
[12:08:30] - Not checking prior termination.
[12:10:13] SING_WORK_FILES
[12:10:13] Finalizing output
[12:10:30] NG_WORK_FILES
[12:10:30] Finalizing output
[12:10:35] CoreStatus = 1 (1)
[12:10:35] Client-core communications error: ERROR 0x1
[12:10:35] Deleting current work unit & continuing...
[12:12:55] - Warning: Could not delete all work unit files (8): Core returned invalid code
[12:12:55] Trying to send all finished work units
[12:12:55] + No unsent completed units remaining.
[12:12:55] - Preparing to get new work unit...
[12:12:55] + Attempting to get work packet
[12:12:55] - Will indicate memory of 2047 MB
[12:12:55] - Connecting to assignment server
[12:12:55] Connecting to http://assign.stanford.edu:8080/
[12:12:55] Posted data.
[12:12:55] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[12:12:55] + News From Folding@Home: Welcome to Folding@Home
[12:12:55] Loaded queue successfully.
[12:12:55] Connecting to http://171.64.65.64:8080/
[12:13:01] Posted data.
[12:13:01] Initial: 0000; - Receiving payload (expected size: 4735567)
[12:13:10] - Downloaded at ~513 kB/s
[12:13:10] - Averaged speed for that direction ~498 kB/s
[12:13:10] + Received work.
[12:13:12] + Closed connections
[12:13:17] 
[12:13:17] + Processing work unit
[12:13:17] Work type a1 not eligible for variable processors
[12:13:17] Core required: FahCore_a1.exe
[12:13:17] Core found.
[12:13:17] Working on queue slot 09 [August 30 12:13:17 UTC]
[12:13:17] + Working ...
[12:13:17] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 09 -checkpoint 15 -forceasm -verbose -lifeline 2816 -version 622'

[12:13:18] 
[12:13:18] *------------------------------*
[12:13:18] Folding@Home Gromacs SMP Core
[12:13:18] Version 1.76 (February 23, 2008)
[12:13:18] 
[12:13:18] Preparing to commence simulation
[12:13:18] - Ensuring status. Please wait.
[12:13:35] - Assembly optimizations manually forced on.
[12:13:35] - Not checking prior termination.
[12:13:50] - Expanded 4735055 -> 24426905 (decompressed 515.8 percent)
[12:13:50] - Starting from initial work packet
[12:13:50] 
[12:13:50] Project: 2665 (Run 0, Clone 479, Gen 20)
[12:13:50] 
[12:14:41] Assembly optimizations on if available.
[12:14:41] Entering M.D.
[12:14:48] Rejecting checkpoint
[12:14:50] Protein: HGG in water
[12:14:50] Writing local files
[12:15:00] Extra SSE boost OK.
[12:15:01] ue further.
[12:15:01] Going to send back what have done.
[12:15:01] logfile size: 9422
[12:15:01] - Writing 9958 bytes of core data to disk...
[12:15:01]   ... Done.
[12:15:01] - Failed to delete work/wudata_09.arc
[12:15:01]  9958 bytes of core data to disk...
[12:15:01]   ... Done.
[12:15:01] o delete work/wudata_09.bed
[12:15:01] - Failed to delete work/wudata_09.sas
[12:15:01] - Failed to delete work/wudata_09.goe
[12:15:01] Warning:  check for stray files
[12:15:01] ck for stray files
[12:15:01] 9.xvg
[12:15:01] Warning:  check for stray files
[12:15:01] 
[12:15:01] Folding@home Core Shutdown: EARLY_UNIT_END
[12:15:01] Finalizing output
[12:17:06] CoreStatus = 63 (99)
[12:17:06] + Error starting Folding@Home core.
[12:17:06] - Attempting to download new core...
[12:17:06] + Downloading new core: FahCore_a1.exe
[12:17:06] Downloading core (/~pande/Win32/x86_Deino/Core_a1.fah from www.stanford.edu)
[12:17:06] Initial: AFDE; + 10240 bytes downloaded
<SNIP>
[12:17:08] Initial: 24B3; + 795847 bytes downloaded
[12:17:08] Verifying core Core_a1.fah...
[12:17:08] Signature is VALID
[12:17:08] 
[12:17:08] Trying to unzip core FahCore_a1.exe
[12:17:12] Decompressed FahCore_a1.exe (2117632 bytes) successfully
[12:17:17] + Core successfully engaged
[12:17:25] 
[12:17:25] + Processing work unit
[12:17:25] Work type a1 not eligible for variable processors
[12:17:25] Core required: FahCore_a1.exe
[12:17:25] Core found.
[12:17:25] Working on queue slot 09 [August 30 12:17:25 UTC]
[12:17:25] + Working ...
[12:17:25] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 09 -checkpoint 15 -forceasm -verbose -lifeline 2816 -version 622'

[12:17:26] 
[12:17:26] *------------------------------*
[12:17:26] Folding@Home Gromacs SMP Core
[12:17:26] Version 1.76 (February 23, 2008)
[12:17:26] 
[12:17:26] Preparing to commence simulation
[12:17:26] - Ensuring status. Please wait.
[12:17:43] - Assembly optimizations manually forced on.
[12:17:43] - Not checking prior termination.
[12:19:43] 
[12:19:43] Folding@home Core Shutdown: MISSING_WORK_FILES
[12:19:43] Finalizing output
[12:19:47] CoreStatus = 1 (1)
[12:19:47] Client-core communications error: ERROR 0x1
[12:19:47] Deleting current work unit & continuing...
[12:22:09] - Warning: Could not delete all work unit files (9): Core returned invalid code
[12:22:09] Trying to send all finished work units
[12:22:09] + No unsent completed units remaining.
[12:22:09] - Preparing to get new work unit...
[12:22:09] + Attempting to get work packet
[12:22:09] - Will indicate memory of 2047 MB
[12:22:09] - Connecting to assignment server
[12:22:09] Connecting to http://assign.stanford.edu:8080/
[12:22:10] Posted data.
[12:22:10] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[12:22:10] + News From Folding@Home: Welcome to Folding@Home
[12:22:10] Loaded queue successfully.
[12:22:10] Connecting to http://171.64.65.64:8080/
[12:22:15] Posted data.
[12:22:15] Initial: 0000; - Receiving payload (expected size: 4735567)
[12:22:26] - Downloaded at ~420 kB/s
[12:22:26] - Averaged speed for that direction ~482 kB/s
[12:22:26] + Received work.
[12:22:28] + Closed connections
[12:22:33] 
[12:22:33] + Processing work unit
[12:22:33] Work type a1 not eligible for variable processors
[12:22:33] Core required: FahCore_a1.exe
[12:22:33] Core found.
[12:22:33] Working on queue slot 00 [August 30 12:22:33 UTC]
[12:22:33] + Working ...
[12:22:33] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 00 -checkpoint 15 -forceasm -verbose -lifeline 2816 -version 622'

[12:22:33] 
[12:22:33] *------------------------------*
[12:22:33] Folding@Home Gromacs SMP Core
[12:22:33] Version 1.76 (February 23, 2008)
[12:22:33] 
[12:22:33] Preparing to commence simulation
[12:22:33] - Ensuring status. Please wait.
[12:22:50] - Assembly optimizations manually forced on.
[12:22:50] - Not checking prior termination.
[12:23:07] - Expanded 4735055 -> 24426905 (decompressed 515.8 percent)
[12:23:07] - Starting from initial work packet
[12:23:07] 
[12:23:07] Project: 2665 (Run 0, Clone 479, Gen 20)
[12:23:07] 
[12:23:56] Assembly optimizations on if available.
[12:23:56] Entering M.D.
[12:24:03] Rejecting checkpoint
[12:24:05] PWriting local files
[12:24:05] 
[12:24:05] Writing local files
[12:24:15] Extra SSE boost OK.
[12:24:16] ue further.
[12:24:16] Going to send back what have done.
[12:24:16] logfile size: 9421
[12:24:16] - Writing 9957 bytes of core data to disk...
[12:24:16]   ... Done.
[12:24:16] - Failed to delete work/wudata_00.arc
[12:24:16] - Failed to delete work/wudata_00.xtc
[12:24:16] - Failed to delete work/wudata_00.bed
[12:24:16] - Failed to delete work/wudata_00.sas
[12:24:16] - Failed to delete work/wudata_00.goe
[12:24:16] Warning:  check for stray files
[12:24:16] 
[12:24:16] Folding@home Core Shutdown: EARLY_UNIT_END
[12:24:16] Finalizing output
[12:26:29] CoreStatus = 63 (99)
[12:26:29] + Error starting Folding@Home core.
[12:26:34] 
[12:26:34] + Processing work unit
[12:26:34] Work type a1 not eligible for variable processors
[12:26:34] Core required: FahCore_a1.exe
[12:26:34] Core found.
[12:26:34] Working on queue slot 00 [August 30 12:26:34 UTC]
[12:26:34] + Working ...
[12:26:34] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 00 -checkpoint 15 -forceasm -verbose -lifeline 2816 -version 622'

[12:26:34] 
[12:26:34] *------------------------------*
[12:26:34] Folding@Home Gromacs SMP Core
[12:26:34] Version 1.76 (February 23, 2008)
[12:26:34] 
[12:26:34] Preparing to commence simulation
[12:26:34] - Ensuring status. Please wait.
[12:26:34] y forced on.
[12:26:34] - Not checking prior termination.
[12:26:34] 
[12:26:34] Folding@home Core Shutdown: MISSING_WORK_FILES
[12:26:34] Finalizing output
[12:28:51] NG_WORK_FILES
[12:28:51] Finalizing output
[12:28:54] CoreStatus = 1 (1)
[12:28:54] Client-core communications error: ERROR 0x1
[12:28:54] - Attempting to download new core...
[12:28:54] + Downloading new core: FahCore_a1.exe
[12:28:54] Downloading core (/~pande/Win32/x86_Deino/Core_a1.fah from www.stanford.edu)
[12:28:55] Initial: AFDE; + 10240 bytes downloaded
<SNIP>
[12:28:56] Initial: 24B3; + 795847 bytes downloaded
[12:28:56] Verifying core Core_a1.fah...
[12:28:56] Signature is VALID
[12:28:56] 
[12:28:56] Trying to unzip core FahCore_a1.exe
[12:28:57] Decompressed FahCore_a1.exe (2117632 bytes) successfully
[12:29:02] + Core successfully engaged
[12:29:03] Deleting current work unit & continuing...
[12:31:25] - Warning: Could not delete all work unit files (0): Core returned invalid code
[12:31:25] Trying to send all finished work units
[12:31:25] + No unsent completed units remaining.
[12:31:25] - Preparing to get new work unit...
[12:31:25] + Attempting to get work packet
[12:31:25] - Will indicate memory of 2047 MB
[12:31:25] - Connecting to assignment server
[12:31:25] Connecting to http://assign.stanford.edu:8080/
[12:31:25] Posted data.
[12:31:25] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[12:31:25] + News From Folding@Home: Welcome to Folding@Home
[12:31:25] Loaded queue successfully.
[12:31:25] Connecting to http://171.64.65.64:8080/
[12:31:26] Posted data.
[12:31:26] Initial: 0000; - Error: Bad packet type from server, expected work assignment
[12:31:26] - Attempt #1  to get work failed, and no other work to do.
Waiting before retry.
[12:31:34] + Attempting to get work packet
[12:31:34] - Will indicate memory of 2047 MB
[12:31:34] - Connecting to assignment server
[12:31:34] Connecting to http://assign.stanford.edu:8080/
[12:31:34] Posted data.
[12:31:34] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[12:31:34] + News From Folding@Home: Welcome to Folding@Home
[12:31:35] Loaded queue successfully.
[12:31:35] Connecting to http://171.64.65.64:8080/
[12:31:42] Posted data.
[12:31:42] Initial: 0000; - Receiving payload (expected size: 4682396)
[12:31:50] - Downloaded at ~571 kB/s
[12:31:50] - Averaged speed for that direction ~500 kB/s
[12:31:50] + Received work.
[12:31:52] + Closed connections
[12:31:57] 
[12:31:57] + Processing work unit
[12:31:57] Work type a1 not eligible for variable processors
[12:31:57] Core required: FahCore_a1.exe
[12:31:57] Core found.
[12:31:57] Working on queue slot 01 [August 30 12:31:57 UTC]
[12:31:57] + Working ...
[12:31:57] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 01 -checkpoint 15 -forceasm -verbose -lifeline 2816 -version 622'

[12:31:58] 
[12:31:58] *------------------------------*
[12:31:58] Folding@Home Gromacs SMP Core
[12:31:58] Version 1.76 (February 23, 2008)
[12:31:58] 
[12:31:58] Preparing to commence simulation
[12:31:58] - Ensuring status. Please wait.
[12:32:15] - Assembly optimizations manually forced on.
[12:32:15] - Not checking prior termination.
[12:32:29] - Expanded 4681884 -> 24111057 (decompressed 514.9 percent)
[12:32:29] - Starting from initial work packet
[12:32:29] 
[12:32:29] Project: 2665 (Run 1, Clone 664, Gen 46)
[12:32:29] 
[12:33:14] Assembly optimizations on if available.
[12:33:14] Entering M.D.
[12:33:21] Rejecting checkpoint
[12:33:23] PWriting local files
[12:33:23] 
[12:33:23] Writing local files
[12:33:32] Extra SSE boost OK.
[12:33:33] Writing local files
[12:33:33] Completed 0 out of 250000 steps  (0 percent)
Had to use qfix to upload the three wuresults_0x.dat files.
314159
Posts: 232
Joined: Sun Dec 02, 2007 2:46 am
Location: http://www.teammacosx.org/

Re: Project: 2665 (Run 0, Clone 479, Gen 20)

Post by 314159 »

Geesh!

We spend the time and expense of running these defective WUs.
We come up with a 0xwhatever - multiple times each in most cases.
We take the time to report the defective WU here - more than 20 times on this one (per toTOW).

Nothing happens!!! :?

Why has this one not been pulled?
Why are we even wasting our time and effort in reporting these?

The worst news is that I have had a defective one re-assigned after completion of a good WU, only to have to go through the entire process again.

I am a bit miffed. (to say it as politely and mildly as possible) :)

I know that you Forum Mods (and above) have contacts with the Pande folks.
Is there not something that you can do?
Perhaps it's time to bring up the subject in your Mods Forum or whatever you call it here?

People are dropping out of the project like flies!! :e(
314159
Posts: 232
Joined: Sun Dec 02, 2007 2:46 am
Location: http://www.teammacosx.org/

Re: Project: 2665 (Run 0, Clone 479, Gen 20) - ERROR 0X0

Post by 314159 »

Geesh! (#2) :(

Did I mention that defective WUs are being assigned to the same machine that attempted to complete them previously - and often days later?
What a waste!!

Here is the evidence (at least it is not a failure at frame 99): :)

[14:38:02] Working on Unit 09 [August 31 14:38:02]
[14:38:02] + Working ...
[14:38:02] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 09 -checkpoint 15 -forceasm -verbose -lifeline 5623 -version 602'

[14:38:02] 
[14:38:02] *------------------------------*
[14:38:02] Folding@Home Gromacs SMP Core
[14:38:02] Version 1.74 (November 27, 2006)
[14:38:02] 
[14:38:02] Preparing to commence simulation
[14:38:02] - Ensuring status. Please wait.
[14:38:19] - Assembly optimizations manually forced on.
[14:38:19] - Not checking prior termination.
[14:38:20] - Expanded 4735055 -> 24426905 (decompressed 515.8 percent)
[14:38:20] - Starting from initial work packet
[14:38:20] 
[14:38:20] Project: 2665 (Run 0, Clone 479, Gen 20)
[14:38:20] 
[14:38:21] Assembly optimizations on if available.
[14:38:21] Entering M.D.
[14:38:27] Rejecting checkpoint
[14:38:28] Protein: HGG in waterExtra SSE boost OK.
[14:38:28] 
[14:38:28] Extra SSE boost OK.
[14:38:29] Writing local files
[14:38:29] Completed 0 out of 250000 steps  (0 percent)
[14:38:29] 
[14:38:29] Folding@home Core Shutdown: INTERRUPTED
[14:38:34] CoreStatus = 0 (0)
[14:38:34] Client-core communications error: ERROR 0x0
[14:38:34] Deleting current work unit & continuing...
[14:42:55] - Warning: Could not delete all work unit files (9): Core returned invalid code
[14:42:55] Trying to send all finished work units
[14:42:55] + No unsent completed units remaining.
[14:42:55] - Preparing to get new work unit...
[14:42:55] + Attempting to get work packet
[14:42:55] - Will indicate memory of 1000 MB
[14:42:55] - Connecting to assignment server
[14:42:55] Connecting to http://assign.stanford.edu:8080/
[14:42:56] Posted data.
[14:42:56] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[14:42:56] + News From Folding@Home: Welcome to Folding@Home
[14:42:56] Loaded queue successfully.
[14:42:56] Connecting to http://171.64.65.64:8080/
[14:43:01] Posted data.
[14:43:01] Initial: 0000; - Receiving payload (expected size: 4735567)
[14:43:04] - Downloaded at ~1541 kB/s
[14:43:04] - Averaged speed for that direction ~1255 kB/s
[14:43:04] + Received work.
[14:43:04] + Closed connections
[14:43:09] 
[14:43:09] + Processing work unit
[14:43:09] Core required: FahCore_a1.exe
[14:43:09] Core found.
[14:43:09] Working on Unit 00 [August 31 14:43:09]
[14:43:09] + Working ...
[14:43:09] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 00 -checkpoint 15 -forceasm -verbose -lifeline 5623 -version 602'

[14:43:09] 
[14:43:09] *------------------------------*
[14:43:09] Folding@Home Gromacs SMP Core
[14:43:09] Version 1.74 (November 27, 2006)
[14:43:09] 
[14:43:09] Preparing to commence simulation
[14:43:09] - Ensuring status. Please wait.
[14:43:26] - Assembly optimizations manually forced on.
[14:43:26] - Not checking prior termination.
[14:43:27] - Expanded 4735055 -> 24426905 (decompressed 515.8 percent)
[14:43:27] - Starting from initial work packet
[14:43:27] 
[14:43:27] Project: 2665 (Run 0, Clone 479, Gen 20)
[14:43:27] 
[14:43:27] Assembly optimizations on if available.
[14:43:27] Entering M.D.
[14:43:33] Rejecting checkpoint
[14:43:34] Protein: HGG in water
[14:43:35] xtra SSE boost OK.
[14:43:35] 
[14:43:35] Extra SSE boost OK.
[14:43:36] Writing local files
[14:43:36] Completed 0 out of 250000 steps  (0 percent)
[14:43:36] 
[14:43:36] Folding@home Core Shutdown: INTERRUPTED
[14:43:40] CoreStatus = 0 (0)
[14:43:40] Client-core communications error: ERROR 0x0
[14:43:40] Deleting current work unit & continuing...
[14:48:01] - Warning: Could not delete all work unit files (0): Core returned invalid code
[14:48:01] Trying to send all finished work units
[14:48:01] + No unsent completed units remaining.
[14:48:01] - Preparing to get new work unit...
[14:48:01] + Attempting to get work packet
[14:48:01] - Will indicate memory of 1000 MB
[14:48:01] - Connecting to assignment server
[14:48:01] Connecting to http://assign.stanford.edu:8080/
[14:48:02] Posted data.
[14:48:02] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[14:48:02] + News From Folding@Home: Welcome to Folding@Home
[14:48:02] Loaded queue successfully.
[14:48:02] Connecting to http://171.64.65.64:8080/
[14:48:07] Posted data.
[14:48:07] Initial: 0000; - Receiving payload (expected size: 4735567)
[14:48:11] - Downloaded at ~1156 kB/s
[14:48:11] - Averaged speed for that direction ~1235 kB/s
[14:48:11] + Received work.
[14:48:11] + Closed connections
[14:48:16] 
[14:48:16] + Processing work unit
[14:48:16] Core required: FahCore_a1.exe
[14:48:16] Core found.
[14:48:16] Working on Unit 01 [August 31 14:48:16]
[14:48:16] + Working ...
[14:48:16] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 01 -checkpoint 15 -forceasm -verbose -lifeline 5623 -version 602'

[14:48:16] 
[14:48:16] *------------------------------*
[14:48:16] Folding@Home Gromacs SMP Core
[14:48:16] Version 1.74 (November 27, 2006)
[14:48:16] 
[14:48:16] Preparing to commence simulation
[14:48:16] - Ensuring status. Please wait.
[14:48:34] - Assembly optimizations manually forced on.
[14:48:34] - Not checking prior termination.
[14:48:35] - Expanded 4735055 -> 24426905 (decompressed 515.8 percent)
[14:48:35] - Starting from initial work packet
[14:48:35] 
[14:48:35] Project: 2665 (Run 0, Clone 479, Gen 20)
[14:48:35] 
[14:48:35] Assembly optimizations on if available.
[14:48:35] Entering M.D.
[14:48:41] Rejecting checkpoint
[14:48:42] Protein: HGG in water
[14:48:42] xtra SSE boost OK.
[14:48:42] 
[14:48:43] Extra SSE boost OK.
[14:48:43] Writing local files
[14:48:43] Completed 0 out of 250000 steps  (0 percent)
[14:48:43] 
[14:48:43] Folding@home Core Shutdown: INTERRUPTED
[14:48:47] CoreStatus = 0 (0)
[14:48:47] Client-core communications error: ERROR 0x0
Would one of you Mods please change the topic of this thread to Project: 2665 (Run 0, Clone 479, Gen 20) - ERROR 0X0 if this post did not accomplish that.
Thanks!

Can we assign this one to the waste basket? :?: :?
John (from the central part of the Commonwealth of Virginia, U.S.A.)

A friendly visitor to what hopefully will remain a friendly Forum.
With thanks to all of the dedicated volunteers on the staff here!!
ChelseaOilman
Posts: 1037
Joined: Sun Dec 02, 2007 3:47 pm
Location: Colorado @ 10,000 feet

Re: Project: 2665 (Run 0, Clone 479, Gen 20) - ERROR 0X0

Post by ChelseaOilman »

314159 wrote: Did I mention that defective WUs are being assigned to the same machine that attempted to complete them previously - and often days later?
Uploading the wuresults_0x.dat files will decrease the chance that these bad WUs get reassigned to you. You'll probably need to use qfix like I did. I checked the WU database for this WU; you're not listed.
314159
Posts: 232
Joined: Sun Dec 02, 2007 2:46 am
Location: http://www.teammacosx.org/

Re: Project: 2665 (Run 0, Clone 479, Gen 20)

Post by 314159 »

No wuresults_x.dat file is generated in these cases - at least with the Linux Client.

I think that they need to kill this WU and re-run Run 0, Clone 479, Gen 19.

In any event, my expectation is that I should NOT have to watch the machines on my "farm" closely.
They should be "fed" WUs that are reasonably stable AND do not cause client hangs (if the WUs are the cause of this).
Client/Cores should be properly coded so that ALL errors are reported to Pande and partial credit awarded.
Multiple assignments of defective WUs to the same machine should be eliminated immediately.

That said I understand a few things, namely:

1. Errors generated by this genre of code can be extremely difficult to troubleshoot.
2. Apparently, the Linux Client is FAR down in the Project's priorities - quite subordinate to the GPU and WIN SMP work.
3. I am not particularly pleased with the release of what is purported to be a "final combined client"/core(s) with this many bugs in it.
IMHO it is still a "beta" release.

4. I greatly appreciate the efforts of those people at Pande who are trying to stabilize things.
I also appreciate their responsiveness.

Fold on!

John
Baowoulf
Posts: 208
Joined: Wed Dec 12, 2007 8:44 pm
Hardware configuration: Pentium 4 2.8 GHz, 512MB DDR Ram, 128MB Radeon 9800, Creative Soundblaster Audigy 4 Pro
Location: Jupiter 6

Re: Project: 2665 (Run 0, Clone 479, Gen 20)

Post by Baowoulf »

314159 wrote: 3. I am not particularly pleased with the release of what is purported to be a "final combined client"/core(s) with this many bugs in it.
IMHO it is still a "beta" release.
I thought SMP was still in beta? And that even version 6.22, the one that lets you switch back and forth between the CPU and SMP clients, was also in beta, and was the only version 6 client in beta?
314159
Posts: 232
Joined: Sun Dec 02, 2007 2:46 am
Location: http://www.teammacosx.org/

Re: Project: 2665 (Run 0, Clone 479, Gen 20)

Post by 314159 »

Linux
Linux (x86) and BSD *combined uniprocessor and SMP client* (64-bit required for SMP) 6.02

No "expiration date" that I know of and not labeled "beta" (as before).

You may, of course, be right. I am easily and gracefully correctable. :)

Are we talking about the same client?
Baowoulf
Posts: 208
Joined: Wed Dec 12, 2007 8:44 pm
Hardware configuration: Pentium 4 2.8 GHz, 512MB DDR Ram, 128MB Radeon 9800, Creative Soundblaster Audigy 4 Pro
Location: Jupiter 6

Re: Project: 2665 (Run 0, Clone 479, Gen 20)

Post by Baowoulf »

Turns out we're both right. The Windows SMP is still in beta but not the Linux one.
314159
Posts: 232
Joined: Sun Dec 02, 2007 2:46 am
Location: http://www.teammacosx.org/

Re: Project: 2665 (Run 0, Clone 479, Gen 20)

Post by 314159 »

:wink:
arfyness
Posts: 13
Joined: Sun Aug 31, 2008 3:13 pm
Hardware configuration: 5 x WinXP (console version installed as service)
1 x Linux (manually, until I figure out what's wrong)
Asus nForce2 Mobo (A7N8X-E) w/ AMD AthlonXP 3200+ :: 1.0GB RAM :: Ubuntu Hardy 8.04.1
Location: Columbus, Ohio, USA

Re: Project: 2665 (Run 0, Clone 479, Gen 20)

Post by arfyness »

Have y'all tried running one instance per core? I ask cause I'm curious whether the Linux 6.02 client will work that way. That's what I do on my mom's winxp dual core anyway, with the 6.20 console/service version. It works fine. But that's in Windows, which sadly still seems to have higher priority. I chose that route to avoid possible beta SMP baloney. Besides, the CPU runs to its potential, which it didn't with the Windows SMP version. (It's also sad that I built my mom a computer way better than mine in every way when her most intensive tasks are email and freecell.) :e?:

But this is about Linux. I'm using the same (current Linux version) 6.02 on a uniprocessor (AthlonXP 3200+) and I'm having some similar trouble of my own. So maybe it's related?

I agree that it's a terrible thing to disenfranchise the user base, since this project clearly would be NOWHERE without a user base. On the other hand, I think the average Linux user tends to be a bit more patient with the process of working bugs out than the average Windows user. :lol:

The crash reporting process, on the other hand, should DEFINITELY be built into the client / server communication structure, whether directly to the Pande group, or via assignment servers. This seems to me a fairly obvious concept which still escapes the attention it deserves. C'mon, I even have a bittorrent client (Miro) that calls home with crash-report details.
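
To give a rough idea of what such a report could contain, here is a minimal sketch that scans a client log like the ones posted in this thread and counts how often each work unit ended in a client-core communications error. It is not part of the Folding@Home client. The function name `failed_units` and the two regular expressions are my own assumptions, based only on the log lines visible above.

```python
import re
from collections import Counter

# Patterns written against the log lines posted in this thread (assumed format).
WU_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
ERR_RE = re.compile(r"Client-core communications error: ERROR (0x[0-9A-Fa-f]+)")


def failed_units(log_text):
    """Count failures per (project, run, clone, gen, error_code)."""
    failures = Counter()
    current = None  # most recent work unit announced in the log
    for line in log_text.splitlines():
        wu = WU_RE.search(line)
        if wu:
            current = tuple(int(n) for n in wu.groups())
            continue
        err = ERR_RE.search(line)
        if err and current is not None:
            failures[current + (err.group(1),)] += 1
            current = None  # avoid counting the same failure twice
    return failures


if __name__ == "__main__":
    sample = """\
[14:48:35] Project: 2665 (Run 0, Clone 479, Gen 20)
[14:48:43] Folding@home Core Shutdown: INTERRUPTED
[14:48:47] CoreStatus = 0 (0)
[14:48:47] Client-core communications error: ERROR 0x0
"""
    for unit, count in failed_units(sample).items():
        print(unit, count)
```

A count above 1 for the same work unit would point at the repeat assignments complained about earlier in the thread.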

Next time it crashes, I'll try this qfix thing you guys are talking about.
(And it likely might; I'm still on the same work unit - Project: 781 (Run 0, Clone 83, Gen 2).) :x

-- Nate
:: ./fah6 v6.02 :: Ubuntu Hardy 8.04.1 :: Asus A7N8X-E (nForce2) :: AMD AthlonXP 3200+ :: 1.0GB RAM :: 1 cpu ::

crapiecorn
Posts: 5
Joined: Sat Mar 29, 2008 8:42 am
Location: Belgium

Re: Project: 2665 (Run 0, Clone 479, Gen 20)

Post by crapiecorn »

Code: Select all

Launch directory: /home/folding/folding
Executable: ./fah6
Arguments: -smp 

[14:56:06] - Ask before connecting: No
[14:56:06] - User name: StrikeTeam (Team 34517)
[14:56:06] - User ID: 6D2FDEB400C80E92
[14:56:06] - Machine ID: 2
[14:56:06] 
[14:56:06] Work directory not found. Creating...
[14:56:06] Could not open work queue, generating new queue...
[14:56:06] - Preparing to get new work unit...
[14:56:06] + Attempting to get work packet
[14:56:06] - Connecting to assignment server
[14:56:07] - Successful: assigned to (171.64.65.64).
[14:56:07] + News From Folding@Home: Welcome to Folding@Home
[14:56:07] Loaded queue successfully.
[14:56:39] + Closed connections
[14:56:39] 
[14:56:39] + Processing work unit
[14:56:39] Core required: FahCore_a1.exe
[14:56:39] Core found.
[14:56:39] Working on Unit 01 [September 1 14:56:39]
[14:56:39] + Working ...
[14:56:39] 
[14:56:39] *------------------------------*
[14:56:39] Folding@Home Gromacs SMP Core
[14:56:39] Version 1.74 (November 27, 2006)
[14:56:39] 
[14:56:39] Preparing to commence simulation
[14:56:39] - Ensuring status. Please wait.
[14:56:40] - Starting from initial work packet
[14:56:40] 
[14:56:40] Project: 2665 (Run 0, Clone 828, Gen 47)
[14:56:40] 
[14:56:41] Assembly optimizations on if available.
[14:56:41] Entering M.D.
[14:56:57]  percent)
[14:56:57] - Starting from initial work packet
[14:56:57] 
[14:56:57] Project: 2665 (Run 0, Clone 828, Gen 47)
[14:56:57] 
[14:56:57] Entering M.D.
[14:57:05] Protein: HGG in water
[14:57:05] Writing local files
[14:57:09] Extra SSE boost OK.
[14:58:16] Finalizing output
[14:58:20] CoreStatus = 0 (0)
[14:58:20] Client-core communications error: ERROR 0x0
[14:58:20] Deleting current work unit & continuing...

Folding@Home Client Shutdown.