Project: 2674 (Run 2, Clone 185, Gen 69)

Moderators: Site Moderators, FAHC Science Team

error10
Posts: 11
Joined: Sun Oct 05, 2008 2:04 pm

Project: 2674 (Run 2, Clone 185, Gen 69)

Post by error10 »

I got one of these about two hours ago, and so far it has printed no frames. It also hasn't updated unitinfo.txt, so I have no idea whether it's making any progress. I can see it chewing up CPU in 'top', though. (A normal 1920-point WU does about one frame every 9:25 or so on this particular computer.) How can I find out the status of this work unit?
toTOW
Site Moderator
Posts: 6349
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project: 2674 (Run 2, Clone 185, Gen 69)

Post by toTOW »

Try restarting the client to see if that produces some log or screen output.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
GTron
Posts: 53
Joined: Wed Dec 05, 2007 3:47 pm
Location: Denver area, Colorado

Re: Project: 2674 (Run 2, Clone 185, Gen 69)

Post by GTron »

I am having a problem with this WU as well: it is running SLOW. A section of FAHlog.txt follows, starting with the end of the previous WU (also a 2674, but Run 1, Clone 42, Gen 86) for comparison. This system has been stable for some time. I am going to kill and restart the client to see if it makes a difference, and I will report back.
Greg

Code:

[00:13:59] Completed 225000 out of 250000 steps  (90%)
[00:19:31] Completed 227500 out of 250000 steps  (91%)
[00:25:03] Completed 230000 out of 250000 steps  (92%)
[00:30:36] Completed 232500 out of 250000 steps  (93%)
[00:36:09] Completed 235000 out of 250000 steps  (94%)
[00:41:42] Completed 237500 out of 250000 steps  (95%)
[00:47:14] Completed 240000 out of 250000 steps  (96%)
[00:52:47] Completed 242500 out of 250000 steps  (97%)
[00:58:20] Completed 245000 out of 250000 steps  (98%)
[01:03:52] Completed 247500 out of 250000 steps  (99%)
[01:09:24] Completed 250000 out of 250000 steps  (100%)
[01:10:25] 
[01:10:25] Finished Work Unit:
[01:10:25] - Reading up to 21144528 from "work/wudata_08.trr": Read 21144528
[01:10:25] trr file hash check passed.
[01:10:25] - Reading up to 4509196 from "work/wudata_08.xtc": Read 4509196
[01:10:25] xtc file hash check passed.
[01:10:25] edr file hash check passed.
[01:10:25] logfile size: 177178
[01:10:25] Leaving Run
[01:10:25] - Writing 26030806 bytes of core data to disk...
[01:10:25]   ... Done.
[01:10:28] - Shutting down core
[01:10:28] 
[01:10:28] Folding@home Core Shutdown: FINISHED_UNIT
[01:13:47] CoreStatus = 64 (100)
[01:13:47] Unit 8 finished with 87 percent of time to deadline remaining.
[01:13:47] Updated performance fraction: 0.869598
[01:13:47] Sending work to server


[01:13:47] + Attempting to send results
[01:13:47] - Reading file work/wuresults_08.dat from core
[01:13:47]   (Read 26030806 bytes from disk)
[01:13:47] Connecting to http://171.67.108.24:8080/
[01:27:16] Posted data.
[01:27:16] Initial: 0000; - Uploaded at ~31 kB/s
[01:27:24] - Averaged speed for that direction ~31 kB/s
[01:27:24] + Results successfully sent
[01:27:24] Thank you for your contribution to Folding@Home.
[01:27:24] + Number of Units Completed: 366

[01:27:30] - Warning: Could not delete all work unit files (8): Core file absent
[01:27:30] Trying to send all finished work units
[01:27:30] + No unsent completed units remaining.
[01:27:30] - Preparing to get new work unit...
[01:27:30] + Attempting to get work packet
[01:27:30] - Will indicate memory of 1536 MB
[01:27:30] - Connecting to assignment server
[01:27:30] Connecting to http://assign.stanford.edu:8080/
[01:27:30] Posted data.
[01:27:30] Initial: 43AB; - Successful: assigned to (171.67.108.24).
[01:27:30] + News From Folding@Home: Welcome to Folding@Home
[01:27:30] Loaded queue successfully.
[01:27:30] Connecting to http://171.67.108.24:8080/
[01:27:36] Posted data.
[01:27:36] Initial: 0000; - Receiving payload (expected size: 4846156)
[01:27:49] - Downloaded at ~364 kB/s
[01:27:49] - Averaged speed for that direction ~425 kB/s
[01:27:49] + Received work.
[01:27:49] Trying to send all finished work units
[01:27:49] + No unsent completed units remaining.
[01:27:49] + Closed connections
[01:27:49] 
[01:27:49] + Processing work unit
[01:27:49] Core required: FahCore_a2.exe
[01:27:49] Core found.
[01:27:49] Working on Unit 09 [December 12 01:27:49]
[01:27:49] + Working ...
[01:27:49] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 09 -checkpoint 20 -verbose -lifeline 7297 -version 602'

[01:27:49] 
[01:27:49] *------------------------------*
[01:27:49] Folding@Home Gromacs SMP Core
[01:27:49] Version 2.01 (Wed Aug 13 13:11:25 PDT 2008)
[01:27:49] 
[01:27:49] Preparing to commence simulation
[01:27:49] - Ensuring status. Please wait.
[01:27:50] Called DecompressByteArray: compressed_data_size=4845644 data_size=24004849, decompressed_data_size=24004849 diff=0
[01:27:50] - Digital signature verified
[01:27:50] 
[01:27:50] Project: 2674 (Run 2, Clone 185, Gen 69)
[01:27:50] 
[01:27:50] Assembly optimizations on if available.
[01:27:50] Entering M.D.
[01:28:00] Run 2, Clone 185, Gen 69)
[01:28:00] 
[01:28:00] Entering M.D.
[06:09:25] - Autosending finished units...
[06:09:25] Trying to send all finished work units
[06:09:25] + No unsent completed units remaining.
[06:09:25] - Autosend completed
[06:16:16] 1%)
[11:04:23] Completed 255008 out of 12750000 steps  (2%)
[12:09:25] - Autosending finished units...
[12:09:25] Trying to send all finished work units
[12:09:25] + No unsent completed units remaining.
[12:09:25] - Autosend completed
[15:52:51] Completed 382508 out of 12750000 steps  (3%)
[18:09:25] - Autosending finished units...
[18:09:25] Trying to send all finished work units
[18:09:25] + No unsent completed units remaining.
[18:09:25] - Autosend completed
[20:41:11] Completed 510008 out of 12750000 steps  (4%)
[00:09:25] - Autosending finished units...
[00:09:25] Trying to send all finished work units
[00:09:25] + No unsent completed units remaining.
[00:09:25] - Autosend completed
[01:29:33] Completed 637508 out of 12750000 steps  (5%)
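
For anyone who wants to quantify the pace from a log like the one above, here is a minimal Python sketch. It assumes progress lines have the form `[HH:MM:SS] Completed N out of M steps` as in this excerpt, and that consecutive progress lines are less than 24 hours apart, so a smaller timestamp means one midnight rollover. The function names are illustrative and are not part of the FAH client.

```python
import re
from datetime import timedelta

# Matches progress lines as they appear in the FAHlog.txt excerpt above.
PROGRESS = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+) out of (\d+) steps"
)

def parse_progress(lines):
    """Return (seconds_since_midnight, steps_done, steps_total) per progress line."""
    out = []
    for line in lines:
        m = PROGRESS.match(line.strip())
        if m:
            h, mi, s, done, total = map(int, m.groups())
            out.append((h * 3600 + mi * 60 + s, done, total))
    return out

def seconds_per_step(lines):
    """Average wall-clock seconds per step between first and last progress line.

    Assumption: the log only carries HH:MM:SS, so a timestamp smaller than
    the previous one is treated as a single midnight rollover.
    """
    points = parse_progress(lines)
    if len(points) < 2:
        return None
    elapsed = 0
    for (t0, _, _), (t1, _, _) in zip(points, points[1:]):
        elapsed += (t1 - t0) % 86400
    steps = points[-1][1] - points[0][1]
    return elapsed / steps if steps > 0 else None

log = [
    "[11:04:23] Completed 255008 out of 12750000 steps  (2%)",
    "[12:09:25] - Autosending finished units...",
    "[15:52:51] Completed 382508 out of 12750000 steps  (3%)",
    "[20:41:11] Completed 510008 out of 12750000 steps  (4%)",
    "[01:29:33] Completed 637508 out of 12750000 steps  (5%)",
]
rate = seconds_per_step(log)
remaining_steps = 12750000 - 637508
eta = timedelta(seconds=round(rate * remaining_steps))
print(f"{rate:.4f} s/step, about {eta} remaining")
```

For the lines shown, this works out to roughly 0.136 seconds per step and about 19 days of remaining run time at that pace.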
error10
Posts: 11
Joined: Sun Oct 05, 2008 2:04 pm

Re: Project: 2674 (Run 2, Clone 185, Gen 69)

Post by error10 »

I had already restarted it, an hour after I initially started it, and nothing happened. I let it run overnight, and woke up to this:

Code:

[17:58:21] Project: 2674 (Run 2, Clone 185, Gen 69)
[17:58:21] 
[17:58:21] Entering M.D.
[17:58:27] Will resume from checkpoint file
[17:58:28] Resuming from checkpoint
[17:58:29] Verified work/wudata_02.log
[17:58:29] Verified work/wudata_02.trr
[17:58:29] Verified work/wudata_02.xtc
[17:58:29] Verified work/wudata_02.edr
[01:59:02] Completed 127508 out of 12750000 steps  (1%)
[09:55:40] Completed 255008 out of 12750000 steps  (2%)
I'm deleting this one.
Ivoshiee
Site Moderator
Posts: 822
Joined: Sun Dec 02, 2007 12:05 am
Location: Estonia

Re: Project: 2674 (Run 2, Clone 185, Gen 69)

Post by Ivoshiee »

What are your system specifications, and do you see anything CPU-intensive running alongside the FAH client?
GTron
Posts: 53
Joined: Wed Dec 05, 2007 3:47 pm
Location: Denver area, Colorado

Re: Project: 2674 (Run 2, Clone 185, Gen 69)

Post by GTron »

My restart of this WU has not made a difference (FAHlog.txt of the restart is below). Nothing other than the cores shows above 0% on the process list for all users. The 4 cores push total CPU utilization close to 90%, so they are getting and using the CPU they should. The system has a Q6600 @ 2.88GHz and 2GB of memory, running Ubuntu 8.04.

This WU will NOT come anywhere close to meeting its deadline (local time is UTC-7):
issue: Thu Dec 11 18:26:47 2008; begin: Thu Dec 11 18:27:49 2008
expect: Wed Dec 31 21:47:19 2008; due: Sun Dec 14 18:27:49 2008 (3 days)
preferred: Sun Dec 14 18:27:49 2008 (3 days)
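
The gap between `expect` and `due` can be reproduced with a back-of-the-envelope projection. The Python sketch below is an estimate under stated assumptions, not how the client computes its own `expect` value: it takes the begin time and deadline from the queue info above and assumes a steady pace of 4 h 48 min 20 s per 1%, which is the spacing between the 3% and 4% lines in the first log posted in this thread.

```python
from datetime import datetime, timedelta

# Values copied from the queue info above (local time, UTC-7).
begin = datetime(2008, 12, 11, 18, 27, 49)
due = datetime(2008, 12, 14, 18, 27, 49)

# Assumed steady pace: 15:52:51 -> 20:41:11 for one percent in the earlier log.
per_percent = timedelta(hours=4, minutes=48, seconds=20)

# Project the finish time for all 100 percent and compare with the deadline.
projected_finish = begin + 100 * per_percent
overrun = projected_finish - due

print("projected finish:", projected_finish.isoformat(sep=" "))
print("past deadline by:", overrun)
```

Under that assumption the projection lands on the evening of December 31, within a few hours of the client's own `expect` value and about 17 days past the deadline.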

Perhaps the Pande Group should pull this WU back in house for investigation, or at least the researcher should be alerted.

Greg

Code:

--- Opening Log file [December 13 06:28:33] 


# SMP Client ##################################################################
###############################################################################

                       Folding@Home Client Version 6.02

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/smpfold/foldingathome/CPU1
Executable: /home/smpfold/foldingathome/CPU1/fah6
Arguments: -smp -verbosity 9 

[06:28:33] - Ask before connecting: No
[06:28:33] - User name: GTron (Team 0)
[06:28:33] - User ID: 76E5E3D439736F7C
[06:28:33] - Machine ID: 5
[06:28:33] 
[06:28:33] Loaded queue successfully.
[06:28:33] - Autosending finished units...
[06:28:33] Trying to send all finished work units
[06:28:33] + No unsent completed units remaining.
[06:28:33] - Autosend completed
[06:28:33] 
[06:28:33] + Processing work unit
[06:28:33] Core required: FahCore_a2.exe
[06:28:33] Core found.
[06:28:33] Working on Unit 09 [December 13 06:28:33]
[06:28:33] + Working ...
[06:28:33] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 09 -checkpoint 20 -verbose -lifeline 6105 -version 602'

[06:28:33] 
[06:28:33] *------------------------------*
[06:28:33] Folding@Home Gromacs SMP Core
[06:28:33] Version 2.01 (Wed Aug 13 13:11:25 PDT 2008)
[06:28:33] 
[06:28:33] Preparing to commence simulation
[06:28:33] - Ensuring status. Please wait.
[06:28:33] Files status OK
[06:28:34] - Expanded 4845644 -> 24004849 (decompressed 495.3 percent)
[06:28:34] Called DecompressByteArray: compressed_data_size=4845644 data_size=24004849, decompressed_data_size=24004849 diff=0
[06:28:34] - Digital signature verified
[06:28:34] 
[06:28:34] Project: 2674 (Run 2, Clone 185, Gen 69)
[06:28:34] 
[06:28:34] Assembly optimizations on if available.
[06:28:34] Entering M.D.
[06:28:40] Will resume from checkpoint file
[06:28:44] ng M.D.
[06:28:50] Will resume from checkpoint file
[06:28:51] Resuming from checkpoint
[06:28:51] Verified work/wudata_09.log
[06:28:52] Verified work/wudata_09.trr
[06:28:52] Verified work/wudata_09.xtc
[06:28:52] Verified work/wudata_09.edr
[06:28:52] Completed 765018 out of 12750000 steps  (6%)
[11:17:48] Completed 892508 out of 12750000 steps  (7%)
[12:28:33] - Autosending finished units...
[12:28:33] Trying to send all finished work units
[12:28:33] + No unsent completed units remaining.
[12:28:33] - Autosend completed
[16:05:22] Completed 1020008 out of 12750000 steps  (8%)
(edit for OS)
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: Project: 2674 (Run 2, Clone 185, Gen 69)

Post by kasson »

This one has too many steps. We fixed a problem relating to this in the past, but one seems to have snuck past our checks. I'll pull and reformulate the work unit in the next day or two.
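
A quick ratio check on the logs posted earlier in this thread is consistent with that diagnosis: the per-step pace looks roughly unchanged, and only the step count differs. The Python sketch below uses figures copied from GTron's first log. The `looks_oversized` helper and its 10x threshold are hypothetical, chosen only for illustration; they are not a check the Pande Group is known to use.

```python
# Figures taken from the log excerpts earlier in this thread.
normal_total_steps = 250_000       # previous 2674 WU (Run 1, Clone 42, Gen 86)
suspect_total_steps = 12_750_000   # Run 2, Clone 185, Gen 69

# Observed wall-clock seconds for one percent of each WU.
normal_secs_per_pct = 5 * 60 + 32                # 00:13:59 -> 00:19:31
suspect_secs_per_pct = 4 * 3600 + 48 * 60 + 20   # 15:52:51 -> 20:41:11

# Convert to seconds per simulation step (one percent = total / 100 steps).
normal_rate = normal_secs_per_pct / (normal_total_steps / 100)
suspect_rate = suspect_secs_per_pct / (suspect_total_steps / 100)

def looks_oversized(total_steps, reference_steps, factor=10):
    """Hypothetical sanity check: flag a WU with far more steps than a reference WU."""
    return total_steps > factor * reference_steps

print(f"step ratio: {suspect_total_steps / normal_total_steps:.0f}x")
print(f"seconds per step: {normal_rate:.4f} vs {suspect_rate:.4f}")
print("oversized:", looks_oversized(suspect_total_steps, normal_total_steps))
```

Both WUs come out near 0.13 seconds per step on this machine, while the suspect WU has 51 times as many steps, which would explain why each 1% takes hours instead of minutes.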