More Bad WUs From Project: 2665 (Run 3, Clone 807, Gen 37)

Andrius
Posts: 4
Joined: Thu Jun 19, 2008 4:36 pm
Hardware configuration: PC1 : Q6600 (3GHz), P35, 2GB DDR2, 9600GT (720,1800,2016) : 4150 to 5000 PPD
PC2 : A64 (2GHz), nf4, 2GB DDR, 8600GT (600,1450,1440) : 1450 to 1750 PPD
Location: Slovenia

Post by Andrius »

Project: 2665 (Run 3, Clone 807, Gen 37)
The WU died after 2 frames, failed to finalize, and killed the client (with the popup).

[August 13 ]
[07:38:53] Preparing to commence simulation
[07:38:53] - Looking at optimizations...
[07:38:53] - Created dyn
[07:38:53] - Files status OK
[07:39:08] - Expanded 4756305 -> 24426905 (decompressed 513.5 percent)
[07:39:08] - Starting from initial work packet
[07:39:08] 
[07:39:08] Project: 2665 (Run 3, Clone 807, Gen 37)
[07:39:08] 
[07:39:10] Assembly optimizations on if available.
[07:39:10] Entering M.D.
[07:39:16] Rejecting checkpoint
[07:39:17] 
[07:39:17] Writing local files
[07:39:18] 
[07:39:18] Writing local files
[07:39:26] Extra SSE boost OK.
[07:39:27] Writing local files
[07:39:27] Completed 0 out of 250000 steps  (0 percent)
[07:54:27] Timered checkpoint triggered.
[07:54:31] Writing local files
[07:54:31] Completed 2500 out of 250000 steps  (1 percent)
[08:09:31] Timered checkpoint triggered.
[08:09:36] Writing local files
[08:09:36] Completed 5000 out of 250000 steps  (2 percent)
[08:24:31] ning:  check for stray files
[08:24:31] 0.sas
[08:24:31] Warning:  check for stray files
[08:24:31] 
[08:24:31] Folding@home Core Shutdown: EARLY_UNIT_END
[08:24:31] Finalizing output
[08:24:31]  13501 bytes of core data to disk...
[08:24:31]   ... Done.
[08:26:31] 
[08:26:31] Folding@home Core Shutdown: EARLY_UNIT_END
[08:26:31] 
[08:26:31] Folding@home Core Shutdown: EARLY_UNIT_END
[08:26:35] CoreStatus = 63 (99)
[08:26:35] + Error starting Folding@Home core.
[08:26:40] 
[08:26:40] + Processing work unit
[08:26:40] Work type a1 not eligible for variable processors
[08:26:40] Core required: FahCore_a1.exe
[08:26:40] Core found.
[08:26:40] Working on queue slot 00 [August 13 08:26:40 UTC]
[08:26:40] + Working ...
[08:26:40] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 00 -checkpoint 15 -verbose -lifeline 1496 -version 622'

[08:26:41] 
[08:26:41] *------------------------------*
[08:26:41] Folding@Home Gromacs SMP Core
[08:26:41] Version 1.76 (February 23, 2008)
[08:26:41] 
[08:26:41] Preparing to commence simulation
[08:26:41] - Looking at optimizations...
[08:26:41] - Created dyn
[08:26:41] - Files status OK
[08:26:41] 
[08:26:41] Folding@home Core Shutdown: MISSING_WORK_FILES
[08:26:41] Finalizing output
[08:28:44] CoreStatus = 1 (1)
[08:28:44] Client-core communications error: ERROR 0x1
[08:28:44] This is a sign of more serious problems, shutting down.
[10:00:06] - Autosending finished units... [August 13 10:00:06 UTC]
[10:00:06] Trying to send all finished work units
[10:00:06] + No unsent completed units remaining.
[10:00:06] - Autosend completed
UPDATE:
I deleted the bad WU with "-delete x" and tried downloading a new WU, but got the same one again. It died again after 2 frames.
How do I get a different WU? Do I configure a new client with a different MachineID or something?

UPDATE2:
So after 3 failed attempts (and after deleting the work folder and queue.dat files 3 times) I got a different WU.
This time it's a Project: 2665 (Run 1, Clone 183, Gen 39) WU and it finished without problems.
Last edited by Andrius on Thu Aug 14, 2008 5:03 pm, edited 1 time in total.
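When a log is this long, it can help to pull out just the Project/Run/Clone/Gen line and the core shutdown codes that followed it. Below is a minimal sketch in Python, assuming the log lines look like the ones pasted above; the function name summarize_log and the two regular expressions are illustrative only and are not part of the Folding@home client.

```python
import re

# Patterns inferred from the log lines quoted in this thread (an assumption,
# not an official description of the log format).
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def summarize_log(lines):
    """Group core shutdown codes under the most recent Project/Run/Clone/Gen.

    Returns a list of ((project, run, clone, gen), [shutdown codes]) in log order.
    Shutdown lines seen before any Project line are ignored.
    """
    results = []
    current_codes = None
    for line in lines:
        match = PRCG_RE.search(line)
        if match:
            prcg = tuple(int(group) for group in match.groups())
            current_codes = []
            results.append((prcg, current_codes))
            continue
        match = SHUTDOWN_RE.search(line)
        if match and current_codes is not None:
            current_codes.append(match.group(1))
    return results


if __name__ == "__main__":
    sample = [
        "[07:39:08] Project: 2665 (Run 3, Clone 807, Gen 37)",
        "[08:24:31] Folding@home Core Shutdown: EARLY_UNIT_END",
        "[08:26:41] Folding@home Core Shutdown: MISSING_WORK_FILES",
    ]
    for prcg, codes in summarize_log(sample):
        print(prcg, codes)
```

Note that the restarted core in the log above does not print a new Project line before MISSING_WORK_FILES, so this sketch files that code under the same WU.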
d-con
Posts: 7
Joined: Fri Dec 07, 2007 9:27 am
Location: Makawao, HI USA

Re: More Bad WUs From Project: 2665 (Run 3, Clone 807, Gen 37)

Post by d-con »

Running 2665 (Run 3, Clone 165, Gen 40) I got a NaN at 19%.

This is the Windows SMP client 5.91 on a stock box, no overclocking.

It's running the same WU again, now at 6%.
d-con
Posts: 7
Joined: Fri Dec 07, 2007 9:27 am
Location: Makawao, HI USA

Re: More Bad WUs From Project: 2665 (Run 3, Clone 807, Gen 37)

Post by d-con »

I finally killed the client and moved queue.dat and the work directory out of the way when it was running the same WU (same Run/Clone/Gen) for the 4th time.

Of course, it assigned the same WU again, so I killed it again, and finally I got a different Run/Clone/Gen in the same project.

2665 (3, 165, 40) doesn't work. It gets a NaN every time at 19% on an unmodified AMD-based system with the Windows SMP beta client 5.91.

It's now running 2665 (1, 377, 41). I hope this one completes.

-David
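For anyone who wants to script the "move queue.dat and the work directory aside" step instead of doing it by hand, here is a rough sketch in Python. It assumes the client has already been stopped and that queue.dat and work/ sit directly in the client directory, as the log earlier in this thread suggests ("-dir work/"). The function name set_aside_client_state and the backup location are made up for illustration; nothing here is an official Folding@home tool.

```python
import shutil
from pathlib import Path


def set_aside_client_state(client_dir, backup_dir):
    """Move queue.dat and the work directory out of the client directory.

    Assumes the client is NOT running. backup_dir should be a fresh, empty
    location; if it already contains a "work" directory, shutil.move would
    nest the new one inside it.
    Returns the list of names that were actually moved.
    """
    client_dir = Path(client_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for name in ("queue.dat", "work"):
        source = client_dir / name
        if source.exists():
            shutil.move(str(source), str(backup_dir / name))
            moved.append(name)
    return moved
```

Moving the files rather than deleting them keeps the old WU around in case someone asks for it later. As this thread shows, the server may still hand out the same WU on the next request, so this does not guarantee a different assignment.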
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: More Bad WUs From Project: 2665 (Run 3, Clone 807, Gen 37)

Post by toTOW »

Someone else completed Project: 2665 (Run 3, Clone 807, Gen 37) successfully ...

Same for Project: 2665 (Run 3, Clone 165, Gen 40).

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
Andrius
Posts: 4
Joined: Thu Jun 19, 2008 4:36 pm
Hardware configuration: PC1 : Q6600 (3GHz), P35, 2GB DDR2, 9600GT (720,1800,2016) : 4150 to 5000 PPD
PC2 : A64 (2GHz), nf4, 2GB DDR, 8600GT (600,1450,1440) : 1450 to 1750 PPD
Location: Slovenia

Re: More Bad WUs From Project: 2665 (Run 3, Clone 807, Gen 37)

Post by Andrius »

@toTOW
Any idea which client the other person used? I used 6.22 beta2 with the "SHM" fix.
Can you check this WU: Project: 2665 (Run 2, Clone 615, Gen 38)
[12:05:34] Completed 50000 out of 250000 steps (20 percent)
[12:14:16] Warning: long 1-4 interactions
[12:14:17] Gromacs cannot continue further.

I updated to R3 and it's at 7%.
I've done 13 units so far with the SMP client.
This was my second error so I don't think my machine is unstable.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: More Bad WUs From Project: 2665 (Run 3, Clone 807, Gen 37)

Post by bruce »

Nobody has returned Project: 2665 (Run 2, Clone 615, Gen 38) yet. (This may be related to the delayed stats announced earlier today.)

The data that the Mods can see doesn't tell us which client is being used. In any case, 6.22b2-shm and 5.91 are probably both using the same version of Gromacs, and it's unlikely that the client version number matters when the warning message is about a long 1-4 interaction.
Andrius
Posts: 4
Joined: Thu Jun 19, 2008 4:36 pm
Hardware configuration: PC1 : Q6600 (3GHz), P35, 2GB DDR2, 9600GT (720,1800,2016) : 4150 to 5000 PPD
PC2 : A64 (2GHz), nf4, 2GB DDR, 8600GT (600,1450,1440) : 1450 to 1750 PPD
Location: Slovenia

Re: More Bad WUs From Project: 2665 (Run 3, Clone 807, Gen 37)

Post by Andrius »

@bruce
True, but if it was a random instability, a second run could fix it.
I'm guessing here, but if it was done on a Linux client, that could explain why it completed (I'm not sure which projects are run on the Linux SMP clients).
If it fails again, I'll try once more.

UPDATE (August 21 21:10 UTC) : The unit finished successfully on the second run. :shock:
[20:39:19] Project: 2665 (Run 2, Clone 615, Gen 38)
[21:06:58] + Number of Units Completed: 15