Project: 3043 (Run 6, Clone 39, Gen 75) runs to 8%, reports "long 1-4 interactions" and hangs the client. Stopping and restarting the client restarts the WU at 8%, but it quits with a segmentation fault within a few minutes, whereupon the client deletes the WU. The process then repeats with identical results. Q6600 @ 3.4 GHz, native Linux SMP client (6.01beta2).
[09:49:16] Project: 3043 (Run 6, Clone 39, Gen 75)
[09:49:16]
[09:49:16] Assembly optimizations on if available.
[09:49:16] Entering M.D.
[09:49:22] Protein: 9684 p3029_SMP-emsv-03
[09:49:22] Writing local files
[09:49:22] Extra SSE boost OK.
[09:49:22]
[09:49:22] Extra SSE boost OK.
[09:49:22] Writing local files
[09:49:22] Completed 0 out of 10000000 steps (0 percent)
[10:00:12] Writing local files
[10:00:12] Completed 100000 out of 10000000 steps (1 percent)
[10:08:01] Writing local files
[10:08:01] Completed 200000 out of 10000000 steps (2 percent)
[10:12:45] - Autosending finished units...
[10:12:45] Trying to send all finished work units
[10:12:45] + No unsent completed units remaining.
[10:12:45] - Autosend completed
[10:17:43] Writing local files
[10:17:43] Completed 300000 out of 10000000 steps (3 percent)
[10:28:37] Writing local files
[10:28:37] Completed 400000 out of 10000000 steps (4 percent)
[10:39:29] Writing local files
[10:39:29] Completed 500000 out of 10000000 steps (5 percent)
[10:50:20] Writing local files
[10:50:20] Completed 600000 out of 10000000 steps (6 percent)
[11:01:08] Writing local files
[11:01:08] Completed 700000 out of 10000000 steps (7 percent)
[11:11:57] Writing local files
[11:11:57] Completed 800000 out of 10000000 steps (8 percent)
[11:20:52] Warning: long 1-4 interactions
[16:12:45] - Autosending finished units...
[16:12:45] Trying to send all finished work units
[16:12:45] + No unsent completed units remaining.
[16:12:45] - Autosend completed
[22:12:45] - Autosending finished units...
[22:12:45] Trying to send all finished work units
[22:12:45] + No unsent completed units remaining.
[22:12:45] - Autosend completed
[04:12:45] - Autosending finished units...
I got this WU again over the weekend. It needs to be taken out of circulation. If it is assigned to an unmonitored machine, it may well sit there doing nothing but autosending until the machine is rebooted. As it was, I lost a day and a half with the client hung.
[00:36:13] Project: 3043 (Run 0, Clone 82, Gen 6)
[00:36:13]
[00:36:13] Entering M.D.
[00:36:19]
[00:36:19] Writing local files
[00:36:19] Extra SSE boost OK.
[00:36:19] SMP-emsv-03Extra SSE boost OK.
[00:36:19]
[00:36:19] Extra SSE boost OK.
[00:36:19] Writing local files
[00:36:19] Completed 0 out of 10000000 steps (0 percent)
[00:45:40] Writing local files
[00:45:40] Completed 100000 out of 10000000 steps (1 percent)
///////////////////////////////////////////////////////
[16:21:56] Completed 10000000 out of 10000000 steps (100 percent)
[16:21:56] Writing final coordinates.
[16:21:56] Past main M.D. loop
[16:21:56] Will end MPI now
anandhanju wrote: Please see this topic indicating similar reports with this WU (although at a different %). Can a mod please look this up and advise?
Sure, but I'm not sure that's too helpful. The WU has been returned only once, and that was from Linux for partial credit.
Your WU (P3043 R6 C39 G75) was added to the stats database on 2007-12-22 08:45:14 for 172.67 points of credit.
We don't have access to any numbers about how many times it has been assigned. The fact that Windows Beta SMP tends to delete most WUs after an error makes it very difficult to tell what's going on in situations like this.
I'll merge the two threads since they're about the same WU.
[12:02:53] Working on Unit 06 [March 10 12:02:53]
[12:02:53] + Working ...
[12:02:53]
[12:02:53] *------------------------------*
[12:02:53] Folding@Home Gromacs SMP Core
[12:02:53] Version 1.74 (March 10, 2007)
[12:02:53]
[12:02:53] Preparing to commence simulation
[12:02:53] - Ensuring status. Please wait.
[12:02:54] - Couldn't send HTTP request to server
[12:02:54] + Could not connect to Work Server (results)
[12:02:54] (171.64.65.64:8080)
[12:02:54] - Error: Could not transmit unit 05 (completed March 10) to work server.
[12:02:54] + Attempting to send results
[12:03:10] - Looking at optimizations...
[12:03:10] - Working with standard loops on this execution.
[12:03:10] Examination of work files indicates 8 consecutive improper terminations of core.
[12:03:11] - Expanded 283027 -> 1508541 (decompressed 533.0 percent)
[12:03:11]
[12:03:11] Project: 3043 (Run 6, Clone 39, Gen 75)
[12:03:11]
[12:03:11] Entering M.D.
[12:03:14] - Couldn't send HTTP request to server
[12:03:14] + Could not connect to Work Server (results)
[12:03:14] (171.64.122.76:8080)
[12:03:14] Could not transmit unit 05 to Collection server; keeping in queue.
[12:03:20] Calling FAH init
[12:03:20] Read topology
[12:03:21] (Starting from checkpoint)
[12:03:22] SSE boost OK.
[12:03:22] t
[12:03:22] Protein: 9684 p3029_SMP-emsv-03
[12:03:22] Writing local files
[12:03:22] Extra SSE boost OK.
[12:03:22] Writing local files
[12:03:22] Completed 0 out of 10000000 steps (0 percent)
[12:20:30] Writing local files
[19:13:51] Writing local files
[19:13:51] Completed 3100000 out of 10000000 steps (31 percent)
[19:21:43] Gromacs cannot continue further.
[19:21:43] Going to send back what have done.
[19:21:43] logfile size: 106449
[19:21:43] - Writing 106985 bytes of core data to disk...
[19:21:43] ... Done.
[19:21:43] - Failed to delete work/wudata_06.xtc
[19:21:44] - Failed to delete work/wudata_06.bed
[19:21:44] - Failed to delete work/wudata_06.sas
[19:21:44] - Failed to delete work/wudata_06.goe
[19:21:44] Warning: check for stray files
[19:23:44]
[19:23:44] Folding@home Core Shutdown: EARLY_UNIT_END
[19:23:44]
[19:23:44] Folding@home Core Shutdown: EARLY_UNIT_END
[19:23:47] CoreStatus = 7B (123)
[19:23:47] Client-core communications error: ERROR 0x7b
[19:23:47] Deleting current work unit & continuing...
Last edited by Rolo71 on Thu Mar 20, 2008 5:36 pm, edited 3 times in total.
So far it appears that every WU is getting ERROR 0x7b, which by itself tells us very little. All we really know is that Windows cancelled the folding process, with no explanation as to why.
You can TRY stopping the folding process shortly before the point where the "Warning: long 1-4 interactions" message appears, making a backup, and then resuming. For reasons nobody has explained, some people have been able to get past the error and complete WUs like this one after a stop/resume.
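To time that stop, it helps to know which checkpoint preceded the warning on a previous attempt and how long one frame (1%) takes. The sketch below is a hypothetical helper, not part of the Folding@home client: it assumes the log line format quoted in this thread ("[HH:MM:SS] Completed N out of M steps ..." and "Warning: long 1-4 interactions"), and because those timestamps carry no date, it assumes any backwards jump in time means midnight has passed (so it is wrong if a gap exceeds 24 hours).

```python
import re

# Hypothetical patterns, assuming the log format quoted in this thread.
PROGRESS = re.compile(r"\[(\d\d):(\d\d):(\d\d)\] Completed (\d+) out of (\d+) steps")
WARNING = re.compile(r"\[(\d\d):(\d\d):(\d\d)\] Warning: long 1-4 interactions")


def _seconds(match):
    """Convert the HH:MM:SS groups of a match to seconds since midnight."""
    h, m, s = (int(x) for x in match.group(1, 2, 3))
    return h * 3600 + m * 60 + s


def find_warning_point(lines):
    """Return stats about the first long 1-4 warning, or None if there is none."""
    frames = []      # (absolute seconds, completed steps, total steps)
    day_offset = 0   # added whenever the clock appears to wrap past midnight
    last_t = None
    for line in lines:
        prog = PROGRESS.search(line)
        warn = WARNING.search(line)
        match = prog or warn
        if match is None:
            continue  # autosend, "Writing local files", etc.
        t = _seconds(match) + day_offset
        if last_t is not None and t < last_t:
            # Assumption: a backwards jump means the log crossed midnight.
            day_offset += 86400
            t += 86400
        last_t = t
        if prog:
            frames.append((t, int(prog.group(4)), int(prog.group(5))))
        elif frames:
            gaps = [b[0] - a[0] for a, b in zip(frames, frames[1:])]
            t_last, steps, total = frames[-1]
            return {
                "last_completed_steps": steps,
                "total_steps": total,
                "percent": 100 * steps // total,
                "mean_frame_seconds": sum(gaps) / len(gaps) if gaps else None,
                "seconds_after_last_frame": t - t_last,
            }
    return None


if __name__ == "__main__":
    # Excerpt from the first log in this thread.
    excerpt = [
        "[11:01:08] Completed 700000 out of 10000000 steps (7 percent)",
        "[11:11:57] Completed 800000 out of 10000000 steps (8 percent)",
        "[11:20:52] Warning: long 1-4 interactions",
    ]
    print(find_warning_point(excerpt))
```

On that excerpt it reports the 8% checkpoint, a 649-second frame, and the warning 535 seconds after the last checkpoint was written. Feeding it the whole log averages over every frame instead of just one; the frame that includes core startup will skew the mean slightly.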