
Project: 7611 (Run 2, Clone 39, Gen 319) runs very slowly

Posted: Tue May 28, 2013 4:39 pm
by DrSpalding
This WU is running very slowly, with a TPF of around 1:06:00 (1h+), whereas other WUs from this project have averaged 0:07:26 on that machine. I don't believe anything else is interfering with it. I could reboot the machine to see if that helps, but I doubt it would. Here is the relevant part of the log file:

Code:

11:39:19:WARNING:WU01:FS00:Failed to get assignment from 'assign3.stanford.edu:8080': Failed to connect to assign3.stanford.edu:8080: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
11:39:19:WU01:FS00:Connecting to assign4.stanford.edu:80
11:39:19:WU01:FS00:News: Welcome to Folding@Home
11:39:19:WU01:FS00:Assigned to work server 171.64.65.104
11:39:19:WU01:FS00:Requesting new work unit for slot 00: READY smp:8 from 171.64.65.104
11:39:19:WU01:FS00:Connecting to 171.64.65.104:8080
11:39:19:WU01:FS00:Downloading 29.66KiB
11:39:19:WU01:FS00:Download complete
11:39:19:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:7611 run:2 clone:39 gen:319 core:0xa4 unit:0x000001b2664f2dd04df0f55b8d304537
11:39:20:WU01:FS00:Starting
11:39:20:WU01:FS00:Running FahCore: c:\\FAH\\Program/FAHCoreWrapper.exe C:/FAH/Data/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 01 -suffix 01 -version 702 -lifeline 2036 -checkpoint 15 -np 8 -service
11:39:20:WU00:FS00:Upload 46.84%
11:39:20:WU01:FS00:Started FahCore on PID 5184
11:39:20:WU01:FS00:Core PID:2712
11:39:20:WU01:FS00:FahCore 0xa4 started
11:39:20:WU01:FS00:Downloading project 7611 description
11:39:20:WU01:FS00:Connecting to fah-web.stanford.edu:80
11:39:20:WU01:FS00:Project 7611 description downloaded successfully
11:39:20:WU01:FS00:0xa4:
11:39:20:WU01:FS00:0xa4:*------------------------------*
11:39:20:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
11:39:20:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
11:39:20:WU01:FS00:0xa4:
11:39:20:WU01:FS00:0xa4:Preparing to commence simulation
11:39:20:WU01:FS00:0xa4:- Looking at optimizations...
11:39:20:WU01:FS00:0xa4:- Created dyn
11:39:20:WU01:FS00:0xa4:- Files status OK
11:39:20:WU01:FS00:0xa4:- Expanded 29859 -> 644556 (decompressed 2158.6 percent)
11:39:20:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=29859 data_size=644556, decompressed_data_size=644556 diff=0
11:39:20:WU01:FS00:0xa4:- Digital signature verified
11:39:20:WU01:FS00:0xa4:
11:39:20:WU01:FS00:0xa4:Project: 7611 (Run 2, Clone 39, Gen 319)
11:39:20:WU01:FS00:0xa4:
11:39:20:WU01:FS00:0xa4:Assembly optimizations on if available.
11:39:20:WU01:FS00:0xa4:Entering M.D.
11:39:26:WU00:FS00:Upload 100.00%
11:39:26:WU01:FS00:0xa4:Mapping NT from 8 to 8 
11:39:26:WU00:FS00:Upload complete
11:39:26:WU00:FS00:Server responded WORK_ACK (400)
11:39:26:WU00:FS00:Final credit estimate, 2008.00 points
11:39:26:WU00:FS00:Cleaning up
11:39:27:WU01:FS00:0xa4:Completed 0 out of 2000000 steps  (0%)
12:45:50:WU01:FS00:0xa4:Completed 20000 out of 2000000 steps  (1%)
13:52:10:WU01:FS00:0xa4:Completed 40000 out of 2000000 steps  (2%)
14:05:47:Server connection id=30 on 0.0.0.0:36330 from 192.168.1.21
14:05:54:Server connection id=29 ended
******************************** Date: 28/05/13 ********************************
14:58:34:WU01:FS00:0xa4:Completed 60000 out of 2000000 steps  (3%)
16:04:56:WU01:FS00:0xa4:Completed 80000 out of 2000000 steps  (4%)
16:11:51:Server connection id=31 on 0.0.0.0:36330 from 127.0.0.1
16:17:36:FS00:Paused
16:17:36:FS00:Shutting core down
16:17:36:WARNING:FS00:FahCore not accepting gentle shutdown, killing
16:17:36:WARNING:FS00:Killing WU01
16:17:36:WU01:FS00:FahCore terminated
16:17:45:FS00:Unpaused
16:17:45:WU01:FS00:Starting
16:17:46:WU01:FS00:Running FahCore: c:\\FAH\\Program/FAHCoreWrapper.exe C:/FAH/Data/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 01 -suffix 01 -version 702 -lifeline 2036 -checkpoint 15 -np 8 -service
16:17:46:WU01:FS00:Started FahCore on PID 5796
16:17:46:WU01:FS00:Core PID:2424
16:17:46:WU01:FS00:FahCore 0xa4 started
16:17:46:WU01:FS00:0xa4:
16:17:46:WU01:FS00:0xa4:*------------------------------*
16:17:46:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
16:17:46:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
16:17:46:WU01:FS00:0xa4:
16:17:46:WU01:FS00:0xa4:Preparing to commence simulation
16:17:46:WU01:FS00:0xa4:- Ensuring status. Please wait.
16:17:55:WU01:FS00:0xa4:- Looking at optimizations...
16:17:55:WU01:FS00:0xa4:- Working with standard loops on this execution.
16:17:55:WU01:FS00:0xa4:- Previous termination of core was improper.
16:17:55:WU01:FS00:0xa4:- Files status OK
16:17:55:WU01:FS00:0xa4:- Expanded 29859 -> 644556 (decompressed 2158.6 percent)
16:17:55:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=29859 data_size=644556, decompressed_data_size=644556 diff=0
16:17:55:WU01:FS00:0xa4:- Digital signature verified
16:17:55:WU01:FS00:0xa4:
16:17:55:WU01:FS00:0xa4:Project: 7611 (Run 2, Clone 39, Gen 319)
16:17:55:WU01:FS00:0xa4:
16:17:55:WU01:FS00:0xa4:Entering M.D.
16:18:01:WU01:FS00:0xa4:Using Gromacs checkpoints
16:18:01:WU01:FS00:0xa4:Mapping NT from 8 to 8 
16:18:01:WU01:FS00:0xa4:Resuming from checkpoint
16:18:01:WU01:FS00:0xa4:Verified 01/wudata_01.log
16:18:01:WU01:FS00:0xa4:Verified 01/wudata_01.trr
16:18:01:WU01:FS00:0xa4:Verified 01/wudata_01.xtc
16:18:01:WU01:FS00:0xa4:Verified 01/wudata_01.edr
16:18:02:WU01:FS00:0xa4:Completed 81370 out of 2000000 steps  (4%)
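
For anyone who wants to check the reported TPF against the log, the per-frame time can be read off the "Completed ... steps" lines, since each 1% step is one frame. The following is only a minimal sketch: the function name `frame_times`, the regular expression, and the `SAMPLE` list are illustrative assumptions and not part of FAHClient, and it only looks at the time stamps the log already prints.

```python
import re
from datetime import timedelta

# Matches progress lines such as
# "12:45:50:WU01:FS00:0xa4:Completed 20000 out of 2000000 steps  (1%)"
PROGRESS = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):WU\d+:FS\d+:0x\w+:"
    r"Completed \d+ out of \d+ steps\s+\((\d+)%\)"
)

def frame_times(log_lines):
    """Return (percent, elapsed) pairs for each newly completed 1% frame."""
    results = []
    prev = None  # (seconds since midnight, percent) of the last progress line
    for line in log_lines:
        m = PROGRESS.match(line)
        if not m:
            continue  # ignore everything that is not a progress line
        h, mi, s, pct = map(int, m.groups())
        now = h * 3600 + mi * 60 + s
        # Only count a step from N% to N+1%; a repeated percentage
        # (for example after a restart from checkpoint) is skipped.
        if prev is not None and pct == prev[1] + 1:
            elapsed = now - prev[0]
            if elapsed < 0:  # the log only has times, so handle midnight
                elapsed += 24 * 3600
            results.append((pct, timedelta(seconds=elapsed)))
        prev = (now, pct)
    return results

# Three progress lines copied from the log above.
SAMPLE = [
    "11:39:27:WU01:FS00:0xa4:Completed 0 out of 2000000 steps  (0%)",
    "12:45:50:WU01:FS00:0xa4:Completed 20000 out of 2000000 steps  (1%)",
    "13:52:10:WU01:FS00:0xa4:Completed 40000 out of 2000000 steps  (2%)",
]

for pct, elapsed in frame_times(SAMPLE):
    print(f"{pct}%: {elapsed}")
```

On the three sample lines this gives 1:06:23 and 1:06:20, which matches the TPF of roughly 1:06:00 reported in the post. A frame that spans a pause or restart would not be measured correctly by this sketch, because the time stamp after the restart does not mark the start of a frame.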

Re: Project: 7611 (Run 2, Clone 39, Gen 319) runs very slowly

Posted: Tue May 28, 2013 6:03 pm
by Napoleon
Probably one of these: Project 7611 (0, 23, 259) - High TPF
tjlane wrote: I am sorry for the recurrence of this issue. Unfortunately I can't always guarantee this issue won't crop up, but I've done my best to mitigate it. The root cause is a bug in the A4 core, and I've reported it to the correct people. I think that because this issue occurs in >.1% of WUs, it hasn't been a priority for our dev team, which is usually swamped.

This issue will be resolved eventually, but in the meantime please feel free to dump these WUs. Not only are the points bad, but when this issue occurs the returned WU is meaningless. Therefore it's beneficial for the science (and the donor) to dump the WU.

I do apologize and will bug the core dev team again. Please let me know if there are any further questions or concerns.
EDIT: perhaps the report above should be appended to the existing topic I linked to.