Project: 2662 (Run 2, Clone 5, Gen 20)

Post by TFarchive »

Hi, I had this WU running in my Ubuntu VM on a quad-core machine, using 2 cores. It was taking forever to progress, and FahMon reported 138 PPD, which is way below normal for A2 WUs. It was going to miss the deadline by at least a week, so I tried rebooting the VM and resuming, but it was still running slowly. I nuked it and got a new WU, and so far that one is running normally.

Could someone check this WU? Something is wrong with it.

Thanks

Here is the logfile:

[18:28:33] - Ask before connecting: No
[18:28:33] - User name: TFArchive (Team 80856)
[18:28:33] - User ID: F2460D103A2149C
[18:28:33] - Machine ID: 1
[18:28:33] 
[18:28:33] Loaded queue successfully.
[18:28:33] + Processing work unit
[18:28:33] Core required: FahCore_a2.exe
[18:28:33] Core found.
[18:28:33] Working on Unit 06 [August 23 18:28:33]
[18:28:33] + Working ...
[18:28:33] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 06 -checkpoint 15 -verbose -lifeline 5335 -version 602'

[18:28:34] 
[18:28:34] *------------------------------*
[18:28:34] Folding@Home Gromacs SMP Core
[18:28:34] Version 2.00 (Wed Jul 9 13:11:25 PDT 2008)
[18:28:34] 
[18:28:34] Preparing to commence simulation
[18:28:34] - Ensuring status. Please wait.
[18:28:34] Files status OK
[18:28:35] - Expanded 4923482 -> 24360573 (decompressed 494.7 percent)
[18:28:36] Called DecompressByteArray: compressed_data_size=4923482 data_size=24360573, decompressed_data_size=24360573 diff=0
[18:28:36] - Digital signature verified
[18:28:36] 
[18:28:36] Project: 2662 (Run 2, Clone 5, Gen 20)
[18:28:36] 
[18:28:36] Assembly optimizations on if available.
[18:28:36] Entering M.D.
[18:28:42] Will resume from checkpoint file
[18:28:57] Resuming from checkpoint
[18:28:57] fcSaveRestoreState: I/O failed dir=0, var=0000000001EC2D00, varsize=592836
[18:28:57] Verified work/wudata_06.log
[18:28:58] Verified work/wudata_06.trr
[18:28:58] Verified work/wudata_06.xtc
[18:28:59] Verified work/wudata_06.edr
[18:28:59] Completed 190020 out of 4750001 steps  (4%)
[21:54:47] Completed 237510 out of 4750001 steps  (5%)
[01:20:39] Completed 285010 out of 4750001 steps  (6%)
[04:46:09] Completed 332510 out of 4750001 steps  (7%)
[08:11:35] Completed 380010 out of 4750001 steps  (8%)
[11:36:54] Completed 427510 out of 4750001 steps  (9%)
[14:26:34] ***** Got an Activate signal (2)
[14:26:34] Killing all core threads
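
For anyone who wants to sanity-check the "going to miss the deadline" estimate, the "Completed ... steps" lines in the log above are enough to work out a rough time per step and the time still needed. The sketch below is only an illustration and is not part of the FAH client or FahMon: the names `parse_progress` and `estimate_remaining_days` are made up for this example, it assumes the rate stays constant, and it assumes the clock wraps past midnight at most once between two consecutive progress lines.

```python
import re

# Progress lines copied from the log above.
LOG = """
[18:28:59] Completed 190020 out of 4750001 steps  (4%)
[21:54:47] Completed 237510 out of 4750001 steps  (5%)
[01:20:39] Completed 285010 out of 4750001 steps  (6%)
[04:46:09] Completed 332510 out of 4750001 steps  (7%)
[08:11:35] Completed 380010 out of 4750001 steps  (8%)
[11:36:54] Completed 427510 out of 4750001 steps  (9%)
"""

LINE_RE = re.compile(
    r"\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+) out of (\d+) steps"
)


def parse_progress(log_text):
    """Return a list of (seconds, steps_done, total_steps) tuples.

    The log only has HH:MM:SS, so a timestamp smaller than the previous
    one is treated as a rollover into the next day.
    """
    points = []
    day_offset = 0
    prev = None
    for m in LINE_RE.finditer(log_text):
        h, mi, s, done, total = map(int, m.groups())
        t = h * 3600 + mi * 60 + s
        if prev is not None and t < prev:
            day_offset += 86400  # crossed midnight
        prev = t
        points.append((t + day_offset, done, total))
    return points


def estimate_remaining_days(points):
    """Estimate days left, using the average rate between first and last point."""
    (t0, s0, _), (t1, s1, total) = points[0], points[-1]
    sec_per_step = (t1 - t0) / (s1 - s0)
    return (total - s1) * sec_per_step / 86400


if __name__ == "__main__":
    pts = parse_progress(LOG)
    print(f"Estimated time remaining: {estimate_remaining_days(pts):.1f} days")
```

With the six progress lines above, this works out to roughly 13 more days at the observed rate, on top of the time already spent.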

Re: Project: 2662 (Run 2, Clone 5, Gen 20)

Post by kasson (Pande Group Member) »

Yes, something's definitely wrong with it: the work unit has too many steps. We saw this with one other work unit and are looking into it. Thanks for the report.