Project: 2674 (Run 2, Clone 185, Gen 69)
Moderators: Site Moderators, FAHC Science Team
I got one of these about two hours ago and so far it has printed no frames. It also hasn't updated unitinfo.txt, so I have no idea whether it's making any progress, although I can see it using CPU in 'top'. (A normal 1920-point WU completes one frame in about 9:25 on this particular computer.) How can I find out the status of this work unit?
Re: Project: 2674 (Run 2, Clone 185, Gen 69)
Try restarting the client to see if that produces some log or screen output.
Re: Project: 2674 (Run 2, Clone 185, Gen 69)
I am having a problem with this WU as well. It is running SLOW. A section of FAHlog.txt follows, starting with the end of the previous WU (also a 2674, but Run 1, Clone 42, Gen 86) for comparison. This system has been stable for some time now. I am going to kill and restart the WU to see if it makes a difference, and I will report back.
Greg
Code:
[00:13:59] Completed 225000 out of 250000 steps (90%)
[00:19:31] Completed 227500 out of 250000 steps (91%)
[00:25:03] Completed 230000 out of 250000 steps (92%)
[00:30:36] Completed 232500 out of 250000 steps (93%)
[00:36:09] Completed 235000 out of 250000 steps (94%)
[00:41:42] Completed 237500 out of 250000 steps (95%)
[00:47:14] Completed 240000 out of 250000 steps (96%)
[00:52:47] Completed 242500 out of 250000 steps (97%)
[00:58:20] Completed 245000 out of 250000 steps (98%)
[01:03:52] Completed 247500 out of 250000 steps (99%)
[01:09:24] Completed 250000 out of 250000 steps (100%)
[01:10:25]
[01:10:25] Finished Work Unit:
[01:10:25] - Reading up to 21144528 from "work/wudata_08.trr": Read 21144528
[01:10:25] trr file hash check passed.
[01:10:25] - Reading up to 4509196 from "work/wudata_08.xtc": Read 4509196
[01:10:25] xtc file hash check passed.
[01:10:25] edr file hash check passed.
[01:10:25] logfile size: 177178
[01:10:25] Leaving Run
[01:10:25] - Writing 26030806 bytes of core data to disk...
[01:10:25] ... Done.
[01:10:28] - Shutting down core
[01:10:28]
[01:10:28] Folding@home Core Shutdown: FINISHED_UNIT
[01:13:47] CoreStatus = 64 (100)
[01:13:47] Unit 8 finished with 87 percent of time to deadline remaining.
[01:13:47] Updated performance fraction: 0.869598
[01:13:47] Sending work to server
[01:13:47] + Attempting to send results
[01:13:47] - Reading file work/wuresults_08.dat from core
[01:13:47] (Read 26030806 bytes from disk)
[01:13:47] Connecting to http://171.67.108.24:8080/
[01:27:16] Posted data.
[01:27:16] Initial: 0000; - Uploaded at ~31 kB/s
[01:27:24] - Averaged speed for that direction ~31 kB/s
[01:27:24] + Results successfully sent
[01:27:24] Thank you for your contribution to Folding@Home.
[01:27:24] + Number of Units Completed: 366
[01:27:30] - Warning: Could not delete all work unit files (8): Core file absent
[01:27:30] Trying to send all finished work units
[01:27:30] + No unsent completed units remaining.
[01:27:30] - Preparing to get new work unit...
[01:27:30] + Attempting to get work packet
[01:27:30] - Will indicate memory of 1536 MB
[01:27:30] - Connecting to assignment server
[01:27:30] Connecting to http://assign.stanford.edu:8080/
[01:27:30] Posted data.
[01:27:30] Initial: 43AB; - Successful: assigned to (171.67.108.24).
[01:27:30] + News From Folding@Home: Welcome to Folding@Home
[01:27:30] Loaded queue successfully.
[01:27:30] Connecting to http://171.67.108.24:8080/
[01:27:36] Posted data.
[01:27:36] Initial: 0000; - Receiving payload (expected size: 4846156)
[01:27:49] - Downloaded at ~364 kB/s
[01:27:49] - Averaged speed for that direction ~425 kB/s
[01:27:49] + Received work.
[01:27:49] Trying to send all finished work units
[01:27:49] + No unsent completed units remaining.
[01:27:49] + Closed connections
[01:27:49]
[01:27:49] + Processing work unit
[01:27:49] Core required: FahCore_a2.exe
[01:27:49] Core found.
[01:27:49] Working on Unit 09 [December 12 01:27:49]
[01:27:49] + Working ...
[01:27:49] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 09 -checkpoint 20 -verbose -lifeline 7297 -version 602'
[01:27:49]
[01:27:49] *------------------------------*
[01:27:49] Folding@Home Gromacs SMP Core
[01:27:49] Version 2.01 (Wed Aug 13 13:11:25 PDT 2008)
[01:27:49]
[01:27:49] Preparing to commence simulation
[01:27:49] - Ensuring status. Please wait.
[01:27:50] Called DecompressByteArray: compressed_data_size=4845644 data_size=24004849, decompressed_data_size=24004849 diff=0
[01:27:50] - Digital signature verified
[01:27:50]
[01:27:50] Project: 2674 (Run 2, Clone 185, Gen 69)
[01:27:50]
[01:27:50] Assembly optimizations on if available.
[01:27:50] Entering M.D.
[01:28:00] Run 2, Clone 185, Gen 69)
[01:28:00]
[01:28:00] Entering M.D.
[06:09:25] - Autosending finished units...
[06:09:25] Trying to send all finished work units
[06:09:25] + No unsent completed units remaining.
[06:09:25] - Autosend completed
[06:16:16] 1%)
[11:04:23] Completed 255008 out of 12750000 steps (2%)
[12:09:25] - Autosending finished units...
[12:09:25] Trying to send all finished work units
[12:09:25] + No unsent completed units remaining.
[12:09:25] - Autosend completed
[15:52:51] Completed 382508 out of 12750000 steps (3%)
[18:09:25] - Autosending finished units...
[18:09:25] Trying to send all finished work units
[18:09:25] + No unsent completed units remaining.
[18:09:25] - Autosend completed
[20:41:11] Completed 510008 out of 12750000 steps (4%)
[00:09:25] - Autosending finished units...
[00:09:25] Trying to send all finished work units
[00:09:25] + No unsent completed units remaining.
[00:09:25] - Autosend completed
[01:29:33] Completed 637508 out of 12750000 steps (5%)
Re: Project: 2674 (Run 2, Clone 185, Gen 69)
I had already restarted it an hour after I initially started it, and nothing happened. I let it run overnight and woke up to the log below. I'm deleting this one.
Code:
[17:58:21] Project: 2674 (Run 2, Clone 185, Gen 69)
[17:58:21]
[17:58:21] Entering M.D.
[17:58:27] Will resume from checkpoint file
[17:58:28] Resuming from checkpoint
[17:58:29] Verified work/wudata_02.log
[17:58:29] Verified work/wudata_02.trr
[17:58:29] Verified work/wudata_02.xtc
[17:58:29] Verified work/wudata_02.edr
[01:59:02] Completed 127508 out of 12750000 steps (1%)
[09:55:40] Completed 255008 out of 12750000 steps (2%)
Re: Project: 2674 (Run 2, Clone 185, Gen 69)
What are your system specifications, and do you see anything CPU-intensive running alongside the FAH client?
Re: Project: 2674 (Run 2, Clone 185, Gen 69)
My restart of this WU has not made a difference (FAHlog.txt of the restart is below). Nothing else in the process list for all users shows above 0% CPU except the cores. The 4 cores push total CPU utilization to close to 90%, so they are getting and using the CPU they should. The system is a Q6600 @ 2.88GHz with 2GB of memory, running Ubuntu 8.04.
This WU will NOT come anywhere close to meeting its deadline (local time is UTC-7):
issue: Thu Dec 11 18:26:47 2008; begin: Thu Dec 11 18:27:49 2008
expect: Wed Dec 31 21:47:19 2008; due: Sun Dec 14 18:27:49 2008 (3 days)
preferred: Sun Dec 14 18:27:49 2008 (3 days)
Perhaps the Pande Group should pull this WU back in house for investigation, or at least alert the researcher.
Greg
(edit for OS)
Code:
--- Opening Log file [December 13 06:28:33]
# SMP Client ##################################################################
###############################################################################
Folding@Home Client Version 6.02
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /home/smpfold/foldingathome/CPU1
Executable: /home/smpfold/foldingathome/CPU1/fah6
Arguments: -smp -verbosity 9
[06:28:33] - Ask before connecting: No
[06:28:33] - User name: GTron (Team 0)
[06:28:33] - User ID: 76E5E3D439736F7C
[06:28:33] - Machine ID: 5
[06:28:33]
[06:28:33] Loaded queue successfully.
[06:28:33] - Autosending finished units...
[06:28:33] Trying to send all finished work units
[06:28:33] + No unsent completed units remaining.
[06:28:33] - Autosend completed
[06:28:33]
[06:28:33] + Processing work unit
[06:28:33] Core required: FahCore_a2.exe
[06:28:33] Core found.
[06:28:33] Working on Unit 09 [December 13 06:28:33]
[06:28:33] + Working ...
[06:28:33] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 09 -checkpoint 20 -verbose -lifeline 6105 -version 602'
[06:28:33]
[06:28:33] *------------------------------*
[06:28:33] Folding@Home Gromacs SMP Core
[06:28:33] Version 2.01 (Wed Aug 13 13:11:25 PDT 2008)
[06:28:33]
[06:28:33] Preparing to commence simulation
[06:28:33] - Ensuring status. Please wait.
[06:28:33] Files status OK
[06:28:34] - Expanded 4845644 -> 24004849 (decompressed 495.3 percent)
[06:28:34] Called DecompressByteArray: compressed_data_size=4845644 data_size=24004849, decompressed_data_size=24004849 diff=0
[06:28:34] - Digital signature verified
[06:28:34]
[06:28:34] Project: 2674 (Run 2, Clone 185, Gen 69)
[06:28:34]
[06:28:34] Assembly optimizations on if available.
[06:28:34] Entering M.D.
[06:28:40] Will resume from checkpoint file
[06:28:44] ng M.D.
[06:28:50] Will resume from checkpoint file
[06:28:51] Resuming from checkpoint
[06:28:51] Verified work/wudata_09.log
[06:28:52] Verified work/wudata_09.trr
[06:28:52] Verified work/wudata_09.xtc
[06:28:52] Verified work/wudata_09.edr
[06:28:52] Completed 765018 out of 12750000 steps (6%)
[11:17:48] Completed 892508 out of 12750000 steps (7%)
[12:28:33] - Autosending finished units...
[12:28:33] Trying to send all finished work units
[12:28:33] + No unsent completed units remaining.
[12:28:33] - Autosend completed
[16:05:22] Completed 1020008 out of 12750000 steps (8%)
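For anyone who wants to put a number on the deadline problem, here is a minimal sketch that estimates total runtime from the "Completed ... steps" lines in FAHlog.txt. It is not part of the FAH client; the function names `parse_frames` and `projected_runtime` are made up for illustration, and the sample lines are copied from the first log posted in this thread. It assumes the log timestamps only carry a time of day, so a timestamp that goes backwards is treated as a midnight rollover.

```python
import re
from datetime import timedelta

# Matches frame lines such as:
# [11:04:23] Completed 255008 out of 12750000 steps (2%)
FRAME_RE = re.compile(
    r"\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+) out of (\d+) steps"
)

def parse_frames(log_text):
    """Return a list of (elapsed_seconds, steps_done, total_steps) per frame line."""
    frames = []
    day_offset = 0      # seconds added for each midnight rollover seen so far
    previous = None
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue    # skip autosend messages and other non-frame lines
        hours, minutes, secs, done, total = map(int, match.groups())
        stamp = hours * 3600 + minutes * 60 + secs + day_offset
        # Assumption: a timestamp earlier than the previous one means the
        # clock passed midnight exactly once between the two frames.
        if previous is not None and stamp < previous:
            stamp += 86400
            day_offset += 86400
        previous = stamp
        frames.append((stamp, done, total))
    return frames

def projected_runtime(frames):
    """Extrapolate the whole-WU runtime from the first and last frame seen."""
    first_time, first_done, total = frames[0]
    last_time, last_done, _ = frames[-1]
    seconds_per_step = (last_time - first_time) / (last_done - first_done)
    return timedelta(seconds=seconds_per_step * total)

# Frame lines copied from the first log in this thread.
SAMPLE = """\
[11:04:23] Completed 255008 out of 12750000 steps (2%)
[15:52:51] Completed 382508 out of 12750000 steps (3%)
[20:41:11] Completed 510008 out of 12750000 steps (4%)
[01:29:33] Completed 637508 out of 12750000 steps (5%)
"""

estimate = projected_runtime(parse_frames(SAMPLE))
print(f"Projected runtime: about {estimate.days} days (deadline window: 3 days)")
```

On these four frames the estimate comes out at roughly 20 days, which is in line with the "expect: Wed Dec 31" figure in the queue information above and far beyond the 3-day deadline.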
Re: Project: 2674 (Run 2, Clone 185, Gen 69)
This one has too many steps. We fixed a problem relating to this in the past, but one seems to have snuck past our checks. I'll pull and reformulate the work unit in the next day or two.
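As a rough check of the step-count explanation, the sketch below compares the time per step of the previous WU with this one, using timestamps taken from the logs posted earlier in the thread. The helper `to_seconds` and the variable names are hypothetical, chosen only for this illustration, and the one-day correction assumes the second interval crossed midnight exactly once.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' log timestamp to seconds since midnight."""
    hours, minutes, secs = map(int, hms.split(":"))
    return hours * 3600 + minutes * 60 + secs

# Previous WU (Run 1, Clone 42, Gen 86): 225000 -> 250000 steps.
prev_rate = (to_seconds("01:09:24") - to_seconds("00:13:59")) / (250000 - 225000)

# This WU (Run 2, Clone 185, Gen 69): 255008 -> 637508 steps.
# The interval crosses midnight once, so one day (86400 s) is added.
this_rate = (to_seconds("01:29:33") + 86400 - to_seconds("11:04:23")) / (637508 - 255008)

# Ratio of total step counts reported by the two WUs.
step_ratio = 12750000 / 250000

print(f"previous WU: {prev_rate:.4f} s/step")
print(f"this WU:     {this_rate:.4f} s/step")
print(f"total steps: {step_ratio:.0f}x the previous WU")
```

Both WUs run at about 0.13 seconds per step on this machine, so the core is not actually computing any slower. The long frame times come from the total step count being 51 times larger than in the previous WU.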