
Project: 2677 (Run 3, Clone 78, Gen 28) -- one core issue

Posted: Sun Aug 16, 2009 9:15 am
by shunter
The unit downloaded last night and completed only 4% in 6+ hours, so there is no chance of it finishing within the permitted deadline. I would normally complete a 2677 unit in approx 24 hours on this client PC. I have deleted it from my system, but presumably it is still being handed out. An extract of my FAHlog is below.

Could someone look into this unit and amend or delete it, as it does appear to be faulty?
Thanks
Shunter

[23:53:59] - Digital signature verified
[23:53:59]
[23:53:59] Project: 2677 (Run 3, Clone 78, Gen 28)
[23:53:59]
[23:54:01] Assembly optimizations on if available.
[23:54:01] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=ubuntu
NNODES=4, MYRANK=2, HOSTNAME=ubuntu
NNODES=4, MYRANK=3, HOSTNAME=ubuntu
NNODES=4, MYRANK=1, HOSTNAME=ubuntu
NODEID=2 argc=22
NODEID=3 argc=22
NODEID=0 argc=22
NODEID=1 argc=22

:-) G R O M A C S (-:

Groningen Machine for Chemical Simulation

:-) VERSION 4.0.99_development_20090425 (-:

Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2008, The GROMACS development team,
check out http://www.gromacs.org for more information.
:-) mdrun (-:
Reading file work/wudata_06.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 65
NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp
Making 1D domain decomposition 1 x 1 x 4
starting mdrun 'IBX in water'
7250000 steps, 14500.0 ps (continuing from step 7000000, 14000.0 ps).
[23:54:33] Completed 0 out of 250000 steps (0%)
[00:36:12] - Autosending finished units...
[00:36:12] Trying to send all finished work units
[00:36:12] + No unsent completed units remaining.
[00:36:12] - Autosend completed
[01:29:26] Completed 2500 out of 250000 steps (1%)
[03:04:04] Completed 5000 out of 250000 steps (2%)
[04:38:21] Completed 7500 out of 250000 steps (3%)
[06:12:49] Completed 10000 out of 250000 steps (4%)
[06:36:13] - Autosending finished units...
[06:36:13] Trying to send all finished work units
[06:36:13] + No unsent completed units remaining.
[06:36:13] - Autosend completed
[07:47:25] Completed 12500 out of 250000 steps (5%)
[cli_0]: aborting job:
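
As a rough check on the figures above, the progress lines in the log can be turned into a projected run time for the whole unit. This is a minimal Python sketch, not a Folding@home tool: it assumes the `[HH:MM:SS] Completed X out of Y steps` format shown in this log and a steady step rate, and `estimate_total_hours` is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta
from typing import List, Tuple

# Matches progress lines such as:
#   [01:29:26] Completed 2500 out of 250000 steps (1%)
PROGRESS = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] Completed (\d+) out of (\d+) steps")


def estimate_total_hours(log_text: str) -> float:
    """Project total run time in hours from the first and last progress lines."""
    points: List[Tuple[datetime, int, int]] = []  # (time, steps done, total steps)
    day = 0              # days elapsed since the first progress line
    prev_clock = None
    for line in log_text.splitlines():
        m = PROGRESS.match(line.strip())
        if m is None:
            continue  # skip autosend messages and other non-progress lines
        clock = datetime.strptime(m.group(1), "%H:%M:%S")
        # The log records only the time of day, so assume a new day
        # whenever the clock goes backwards (e.g. 23:54 -> 01:29).
        if prev_clock is not None and clock < prev_clock:
            day += 1
        prev_clock = clock
        points.append((clock + timedelta(days=day), int(m.group(2)), int(m.group(3))))

    if len(points) < 2:
        raise ValueError("need at least two progress lines")
    (t0, s0, total), (t1, s1, _) = points[0], points[-1]
    if s1 <= s0:
        raise ValueError("no progress between first and last progress lines")
    seconds_per_step = (t1 - t0).total_seconds() / (s1 - s0)
    return total * seconds_per_step / 3600


if __name__ == "__main__":
    log = """[23:54:33] Completed 0 out of 250000 steps (0%)
[00:36:12] - Autosending finished units...
[01:29:26] Completed 2500 out of 250000 steps (1%)
[03:04:04] Completed 5000 out of 250000 steps (2%)
[04:38:21] Completed 7500 out of 250000 steps (3%)
[06:12:49] Completed 10000 out of 250000 steps (4%)
[07:47:25] Completed 12500 out of 250000 steps (5%)"""
    print(round(estimate_total_hours(log), 1))  # prints 157.6
```

With the progress lines from this log, the unit would need roughly 158 hours (about 6.5 days) for its 250000 steps, against the ~24 hours this PC normally takes for a 2677 unit.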

Edited title to add "one core issue" -- susato

Re: Project: 2677 (Run 3, Clone 78, Gen 28)

Posted: Sun Aug 16, 2009 12:42 pm
by parkut
Did you happen to notice what the system load was? There are scattered reports of WUs using only one core, which would produce exactly this behaviour. If you still have the earlier logfile entries where the WU was downloaded, what is the compressed_data_size?

It may be the same issue as reported here:

viewtopic.php?f=19&t=11065

viewtopic.php?f=19&t=11059

Re: Project: 2677 (Run 3, Clone 78, Gen 28)

Posted: Mon Aug 17, 2009 4:05 pm
by toTOW
I got it too ... it folds only on one core :(

Re: Project: 2677 (Run 3, Clone 78, Gen 28) -- one core issue

Posted: Wed Sep 02, 2009 4:40 am
by J T
I have it as well (Linux 6.02 client on an 8-core Xeon). Four FahCore_a2.exe processes are started and persist, but only one gets any appreciable CPU time. The other three get around 1/10 of a percent of a CPU core each, even though each has access to almost a full core.
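
To confirm the symptom described here, one can look at how much CPU each FahCore process has used. This is a minimal Python sketch, assuming a Linux system with the procps `ps` (its `-C` and `-o pid=,pcpu=` options); the helper names and the 50% threshold are hypothetical choices, not part of the client. Note that `ps` reports `%CPU` as CPU time divided by the process's elapsed run time, so it is an average over the process lifetime rather than an instantaneous reading.

```python
import subprocess
from typing import Dict, List


def parse_pcpu(ps_output: str) -> Dict[int, float]:
    """Parse 'PID %CPU' lines (as printed by `ps -o pid=,pcpu=`) into a dict."""
    usage: Dict[int, float] = {}
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # ignore blank or unexpected lines
        usage[int(fields[0])] = float(fields[1])
    return usage


def idle_workers(usage: Dict[int, float], min_pcpu: float = 50.0) -> List[int]:
    """Return the PIDs whose lifetime-average %CPU is below min_pcpu."""
    return sorted(pid for pid, pcpu in usage.items() if pcpu < min_pcpu)


def sample_fahcore(name: str = "FahCore_a2.exe") -> Dict[int, float]:
    """Ask ps for the PID and %CPU of every process with the given command name."""
    result = subprocess.run(
        ["ps", "-C", name, "-o", "pid=,pcpu="],
        capture_output=True,
        text=True,
    )
    return parse_pcpu(result.stdout)


if __name__ == "__main__":
    # Made-up PIDs and values resembling the report above:
    # one busy worker, three nearly idle ones.
    sample = " 4101 98.7\n 4102  0.1\n 4103  0.1\n 4104  0.1\n"
    print(idle_workers(parse_pcpu(sample)))
```

On a healthy multi-core unit all four workers should sit well above the threshold; if three of the four are listed as idle, the WU is effectively running on one core.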

Any thoughts? I'd hate to just kill it, but it won't get done.
:(

Re: Project: 2677 (Run 3, Clone 78, Gen 28) -- one core issue

Posted: Wed Sep 02, 2009 4:46 am
by bruce
See the discussion here: viewtopic.php?f=19&t=11098