Project: 2653 (Run 14, Clone 0, Gen. 114) Hung@ 27%

Moderators: Site Moderators, FAHC Science Team

rexrzer
Posts: 44
Joined: Sat Dec 08, 2007 10:45 am

Project: 2653 (Run 14, Clone 0, Gen. 114) Hung@ 27%

Post by rexrzer »

These Beta SMP WUs on the Windows side of things are really starting to bother me! :twisted: :cry:

This one seems to be "hung up" at 27%, as the log below shows, and I frankly have no idea what to do about it. Any assistance from the experienced folks, or from any Admin who cares to venture an idea or two, would be greatly appreciated:

[03:00:30] Writing local files
[03:00:30] Completed 130000 out of 500000 steps (26 percent)
[03:15:30] Timered checkpoint triggered.
[03:30:30] Timered checkpoint triggered.
[03:43:46] - Autosending finished units... [February 22 03:43:46 UTC]
[03:43:46] Trying to send all finished work units
[03:43:46] + No unsent completed units remaining.
[03:43:46] - Autosend completed
[03:45:31] Timered checkpoint triggered.
[03:50:21] Writing local files
[03:50:22] Completed 135000 out of 500000 steps (27 percent)
[04:06:00] Timered checkpoint triggered.
[04:21:18] Timered checkpoint triggered.
[04:38:18] Timered checkpoint triggered.
[04:54:42] Timered checkpoint triggered.
[05:10:42] Timered checkpoint triggered.
[05:26:44] Timered checkpoint triggered.
[05:42:30] Timered checkpoint triggered.
[05:50:39] Killing all core threads
[05:50:39] Killing 2 cores
[05:50:39] Killing core 0
[05:50:39] Killing core 1

Folding@Home Client Shutdown at user request.
[05:50:39] ***** Got a SIGTERM signal (2)
[05:50:39] Killing all core threads
[05:50:39] Killing 2 cores
[05:50:39] Killing core 0
[05:50:39] Killing core 1

Folding@Home Client Shutdown.


--- Opening Log file [February 22 06:16:35 UTC]


# Windows SMP Console Edition #################################################
###############################################################################

Folding@Home Client Version 6.23 Beta R1

http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Users\David\FAH
Executable: C:\Users\David\FAH\Folding@home-Win32-x86.exe
Arguments: -smp -verbosity 9

[06:16:35] - Ask before connecting: No
[06:16:35] - User name: rexrzer (Team 38910)
[06:16:35] - User ID: 572B163062277FFF
[06:16:35] - Machine ID: 1
[06:16:35]
[06:16:35] Loaded queue successfully.
[06:16:35] - Autosending finished units... [February 22 06:16:35 UTC]
[06:16:35] Trying to send all finished work units
[06:16:35]
[06:16:35] + No unsent completed units remaining.
[06:16:35] - Autosend completed
[06:16:35] + Processing work unit
[06:16:35] Work type a1 not eligible for variable processors
[06:16:35] Core required: FahCore_a1.exe
[06:16:35] Core found.
[06:16:35] Using generic mpiexec calls
[06:16:35] Working on queue slot 05 [February 22 06:16:35 UTC]
[06:16:35] + Working ...
[06:16:35] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 05 -checkpoint 15 -verbose -lifeline 2532 -version 623'

[06:16:36]
[06:16:36] *------------------------------*
[06:16:36] Folding@Home Gromacs SMP Core
[06:16:36] Version 1.74 (March 10, 2007)
[06:16:36]
[06:16:36] Preparing to commence simulation
[06:16:37] - Ensuring status. Please wait.
[06:16:53] - Looking at optimizations...
[06:16:53] - Working with standard loops on this execution.
[06:16:53] Examination of work files indicates 8 consecutive improper terminations of core.
[06:16:57] - Expanded 2445548 -> 12901369 (decompressed 527.5 percent)
[06:16:58]
[06:16:58] Project: 2653 (Run 14, Clone 0, Gen 114)
[06:16:58]
[06:16:59] Entering M.D.
[06:17:09] Calling FAH init
[06:17:11] Read topology
[06:17:12] g local files
[06:17:12] checkpoint)
[06:17:12] Read checkpoint
[06:17:12] Protein: Protein in POPC
[06:17:12] Writing local files
[06:17:12] Completed 135289 out of 500000 steps (27 percent)
[06:17:15] Extra SSE boost OK.


As you can see, I've stopped and restarted the Client to no avail thus far, and if I check up on it in a few minutes and it is STILL stuck @ 27% I am shutting it down until I can get some advice about what to do...

Is there some "Prompt" or 'qfix' routine to do when this happens? Any Command Line exercise to try? I am scratching my head right now so please excuse me while I go cry in the corner! :?
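One way to tell a slow WU from a hung one is to measure how long each progress step took in the log. Below is a minimal sketch, not anything that ships with the client: the function name `frame_times` is made up for illustration, and it assumes the log timestamps are plain wall-clock times with no date, so it only corrects for a single rollover past midnight.

```python
import re
from datetime import timedelta

# Matches lines such as
# "[03:50:22] Completed 135000 out of 500000 steps (27 percent)".
# The opening bracket is optional because pasted logs sometimes lose it.
PROGRESS = re.compile(
    r"\[?(\d{2}):(\d{2}):(\d{2})\]\s+Completed (\d+) out of (\d+) steps"
)


def frame_times(log_text):
    """Return (steps_done, elapsed) for each advance between progress lines."""
    frames = []
    previous = None
    for line in log_text.splitlines():
        m = PROGRESS.search(line)
        if not m:
            continue
        h, mi, s, done, _total = map(int, m.groups())
        stamp = timedelta(hours=h, minutes=mi, seconds=s)
        if previous is not None:
            prev_stamp, prev_done = previous
            elapsed = stamp - prev_stamp
            if elapsed < timedelta(0):
                elapsed += timedelta(days=1)  # log rolled past midnight
            if done > prev_done:
                frames.append((done, elapsed))
        previous = (stamp, done)
    return frames


sample = """\
[03:00:30] Completed 130000 out of 500000 steps (26 percent)
[03:15:30] Timered checkpoint triggered.
[03:50:22] Completed 135000 out of 500000 steps (27 percent)
"""

for done, elapsed in frame_times(sample):
    print(done, elapsed)  # prints: 135000 0:49:52
```

If the elapsed time per step keeps growing from one progress line to the next, the WU is slowing down rather than frozen. Note that a client restart in the middle of the log will show up as one artificially long step.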

Cheers,
rexrzer 8-)
i7 970 HexCore @ 4.3Ghz/24GB RAM; i7 920 @ 4.2Ghz/6GB RAM; Asus G73SW-3DE laptop/Core i7 2630QM @ 2.5 Ghz/16GB RAM; i7 920 @ 4.2Ghz/6GB RAM+GPU Clients: 2 EVGA GTX-560 Ti SC's-SLI+2 EVGA GTX-560 Ti SC's, all 'clocked 980/1960/2170
rexrzer
Posts: 44
Joined: Sat Dec 08, 2007 10:45 am

Re: Project: 2653 (Run 14, Clone 0, Gen. 114) Hung@ 27%

Post by rexrzer »

NEVERmind!

It just kicked over after two more 15-minute intervals! :!: :?: :shock:

I have never seen a WU take that long to advance a single percent. If it continues, I will miss the deadline for sure, so don't think I won't post again about this one if it's still acting totally nuts like this...

Next time I will wait EVEN LONGer than I did for this one, not that I am sitting here watching my PC's all night or anything...in fact it was simple coincidence that I even saw this...was using the notebook for something else altogether when this happened.

SORRY for wasting the bandwidth. :oops:

Cheers,
rexrzer 8-)
toTOW
Site Moderator
Posts: 6453
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project: 2653 (Run 14, Clone 0, Gen. 114) Hung@ 27%

Post by toTOW »

There are two entries in the DB, both for full credit. But yours doesn't show up yet ...

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
rexrzer
Posts: 44
Joined: Sat Dec 08, 2007 10:45 am

Re: Project: 2653 (Run 14, Clone 0, Gen. 114) Hung@ 27%

Post by rexrzer »

Hello Mr. toTOW;

Well, this one probably won't show up EVER again the way things look now. :o :shock: :twisted:

It is totally messed up, HUNG UP, will not move...I am pulling the plug before it wrecks the CPU, to wit:

[07:50:47] Timered checkpoint triggered.
[08:05:47] Timered checkpoint triggered.
[08:10:41] Writing local files
[08:10:41] Completed 150000 out of 500000 steps (30 percent)
[08:25:42] Timered checkpoint triggered.
[08:40:42] Timered checkpoint triggered.
[08:48:30] Writing local files
[08:48:30] Completed 155000 out of 500000 steps (31 percent)
[09:03:30] Timered checkpoint triggered.
[09:18:31] Timered checkpoint triggered.
[09:22:34] Writing local files
[09:22:34] Completed 160000 out of 500000 steps (32 percent)
[09:37:35] Timered checkpoint triggered.
[09:52:36] Timered checkpoint triggered.
[09:56:33] Writing local files
[09:56:34] Completed 165000 out of 500000 steps (33 percent)
[10:11:34] Timered checkpoint triggered.
[10:26:35] Timered checkpoint triggered.
[10:31:12] Writing local files
[10:31:13] Completed 170000 out of 500000 steps (34 percent)
[10:46:13] Timered checkpoint triggered.
[11:01:14] Timered checkpoint triggered.
[11:08:21] Writing local files
[11:08:21] Completed 175000 out of 500000 steps (35 percent)
[11:23:22] Timered checkpoint triggered.
[11:38:26] Timered checkpoint triggered.
[11:56:09] Writing local files
[11:57:21] Completed 180000 out of 500000 steps (36 percent)
[12:15:29] Timered checkpoint triggered.
[12:16:34] - Autosending finished units... [February 22 12:16:34 UTC]
[12:16:34] Trying to send all finished work units
[12:16:34] + No unsent completed units remaining.
[12:16:34] - Autosend completed
[12:32:53] Timered checkpoint triggered.
[12:51:51] Timered checkpoint triggered.
[13:09:21] Timered checkpoint triggered.
[13:29:23] Timered checkpoint triggered.
[13:46:41] Timered checkpoint triggered.
[14:03:41] Timered checkpoint triggered.
[14:23:05] Timered checkpoint triggered.
[14:41:11] Timered checkpoint triggered.
[14:58:27] Timered checkpoint triggered.
[15:15:57] Timered checkpoint triggered.
[15:39:03] Timered checkpoint triggered.
[15:57:51] Timered checkpoint triggered.
[16:16:05] Timered checkpoint triggered.
[16:34:25] Timered checkpoint triggered.
[16:53:15] Timered checkpoint triggered.
[17:11:45] Timered checkpoint triggered.
[17:29:59] Timered checkpoint triggered.
[17:47:59] Timered checkpoint triggered.
[18:06:53] Timered checkpoint triggered.
[18:16:33] - Autosending finished units... [February 22 18:16:33 UTC]
[18:16:33] Trying to send all finished work units
[18:16:33] + No unsent completed units remaining.
[18:16:33] - Autosend completed
[18:28:05] Timered checkpoint triggered.
[18:45:41] Timered checkpoint triggered.
[19:04:25] Timered checkpoint triggered.
[19:23:37] Timered checkpoint triggered.
[19:32:23] Killing all core threads
[19:32:23] Killing 2 cores
[19:32:23] Killing core 0
[19:32:23] Killing core 1

Folding@Home Client Shutdown at user request.
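In a log like the one above, the checkpoints keep firing but the "Completed ... steps" lines stop. The gap between the last progress line and the last timestamped line gives a rough figure for how long the WU has been stalled. This is only a sketch under assumptions: `stalled_for` is a hypothetical helper, not part of the client, and since the timestamps carry no date it can only correct for a single rollover past midnight.

```python
import re
from datetime import timedelta

# A timestamp at the start of a log line, e.g. "[19:32:23]".
# The opening bracket is optional because pasted logs sometimes lose it.
STAMP = re.compile(r"^\[?(\d{2}):(\d{2}):(\d{2})\]")


def stalled_for(log_text):
    """Gap between the last 'Completed ... steps' line and the last timestamped line."""
    last_progress = None
    last_seen = None
    for line in log_text.splitlines():
        m = STAMP.match(line.strip())
        if not m:
            continue
        h, mi, s = map(int, m.groups())
        last_seen = timedelta(hours=h, minutes=mi, seconds=s)
        # Case-sensitive on purpose: "Autosend completed" must not count.
        if "Completed" in line and "steps" in line:
            last_progress = last_seen
    if last_progress is None or last_seen is None:
        return None  # no progress line found, nothing to measure
    gap = last_seen - last_progress
    if gap < timedelta(0):
        gap += timedelta(days=1)  # log rolled past midnight
    return gap


sample = """\
[11:57:21] Completed 180000 out of 500000 steps (36 percent)
[12:15:29] Timered checkpoint triggered.
[12:16:34] - Autosend completed
[19:23:37] Timered checkpoint triggered.
[19:32:23] Killing all core threads
"""

print(stalled_for(sample))  # prints: 7:35:02
```

A gap of many hours, when earlier steps took well under an hour each, is a reasonable sign that the WU is stuck and not merely slow.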

I don't have time right now to deal with this one, as I'm already late to the California 500 NASCAR Race and have to run, but would appreciate any suggestions as to what to do with the WU in order to start again, fresh, without this one starting up again. I've never had a bad one before on the Windows side, and have no idea what to do, what to trash, what files to deal with. :?

Thanking everyone who wants to help in advance...see you all online in the AM tomorrow, gotta run... :mrgreen:

Cheers,
rexrzer 8-)
rexrzer
Posts: 44
Joined: Sat Dec 08, 2007 10:45 am

Re: Project: 2653 (Run 14, Clone 0, Gen. 114) Hung@ 27%

Post by rexrzer »

I got a chance to deal with the bad WU this AM: I deleted the Work Folder's contents, and the Current WU File went in the trash and I flushed it...seemed to do the trick, pretty simple. I saw that the WU "tried" to start up again, but the SMP WU calling wasn't there, no Work Folder contents either, and it calmly downloaded a fresh WU, which the notebook has been crunching very quickly and normally. Everything is back to GOOD STATUS now. :D

The 2653 was either a BAD WU, or it's been done before and the program has some new tricks for it, what do I know? I only reported what happened to my notebook on Saturday, and again the next AM on Sunday: totally HUNG UP and not going anywhere, as the Log shows. This notebook hasn't had much trouble since going online and has crunched 4 SMP WUs without incident, well almost...one had the classic 'Hang at Finish' thing happen, but I used QFIX to flush out the bad code line, and it reported fine after I restarted it.

That's the story with this 2653 WU. It's my opinion that it was a BAD WU, but what do I really know? I just report what happens... :e)

Cheers.
rexrzer 8-)