WU hung after 100% - Project: 3052 (Run 8, Clone 41, Gen 19)

Moderators: Site Moderators, FAHC Science Team

dschief
Posts: 163
Joined: Tue Dec 04, 2007 5:56 am
Hardware configuration: Gigabyte Z790 UD AC , i7-14700K (x2) Win11
ASUS Z97_K , i5-4460 (x2) Win10 & UBUNTU 24.04
GPU's ASUS RTX1660 x2, RTX3050 x2
EVGA1060
Location: California Wine country


Post by dschief »

This has been frozen for 4+ hours. I wanted to post the log before shutting down to see what happens on a restart.

3052 Run 8 Clone 41 Gen 19 9660 P3029_SMP-emsv-03


[10:21:45] - Preparing to get new work unit...
[10:21:45] + Attempting to get work packet
[10:21:45] - Will indicate memory of 2013 MB
[10:21:45] - Connecting to assignment server
[10:21:45] Connecting to http://assign.stanford.edu:8080/
[10:21:45] Posted data.
[10:21:45] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[10:21:45] + News From Folding@Home: Welcome to Folding@Home
[10:21:45] Loaded queue successfully.
[10:21:45] Connecting to http://171.64.65.63:8080/
[10:21:45] Posted data.
[10:21:45] Initial: 0000; - Receiving payload (expected size: 282795)
[10:21:47] - Downloaded at ~138 kB/s
[10:21:47] - Averaged speed for that direction ~156 kB/s
[10:21:47] + Received work.
[10:21:47] Trying to send all finished work units
[10:21:47] + No unsent completed units remaining.
[10:21:47] + Closed connections
[10:21:47] 
[10:21:47] + Processing work unit
[10:21:47] Core required: FahCore_a1.exe
[10:21:47] Core found.
[10:21:47] Working on Unit 02 [April 3 10:21:47]
[10:21:47] + Working ...
[10:21:47] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 02 -priority 96 -checkpoint 15 -verbose -lifeline 31306 -version 601'

[10:21:47] 
[10:21:47] *------------------------------*
[10:21:47] Folding@Home Gromacs SMP Core
[10:21:47] Version 1.74 (November 27, 2006)
[10:21:47] 
[10:21:47] Preparing to commence simulation
[10:21:47] - Ensuring status. Please wait.
[10:21:47] - Starting from initial work packet
[10:21:47] 
[10:21:47] Project: 3052 (Run 8, Clone 41, Gen 19)
[10:21:47] 
[10:21:47] Assembly optimizations on if available.
[10:21:47] Entering M.D.
[10:22:04] ial work pa- Sta
[10:22:04] Project: 3052 (Run 8, Clone 41
[10:22:04] Project: 3Entering M.D.
[10:22:04] one 41, Gen 19)
[10:22:04] 
[10:22:04] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=1, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=2, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=3, HOSTNAME=localhost.localdomain
NODEID=3 argc=15
NODEID=2 argc=15
NODEID=0 argc=15
NODEID=1 argc=15
      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2004, The GROMACS development team,
            check out http://www.gromacs.org for more information.

        This inclusion of Gromacs code in the Folding@Home Core is under
        a special license (see http://folding.stanford.edu/gromacs.html)
         specially granted to Stanford by the copyright holders. If you
          are interested in using Gromacs, visit http://www.gromacs.org where
                you can download a free version of Gromacs under
         the terms of the GNU General Public License (GPL) as published
       by the Free Software Foundation; either version 2 of the License,
                     or (at your option) any later version.

starting mdrun '9660 p3029_SMP-emsv-03'
10000000 steps,  20000.0 ps.

[10:22:11] cal files
[10:22:11] Completed 0 out of 10000000 steps  (0 percent)
[10:22:11]  SSE boost OK.
[10:37:11] int triggered.
[10:38:15] Writing local files
[10:38:15] Completed 100000 out of 10000000 steps  (1 percent)
[10:53:15] Timered checkpoint triggered.
[10:54:20] Writing local files
[10:54:20] Completed 200000 out of 10000000 steps  (2 percent)
[11:09:20] Timered checkpoint triggered.
[11:10:28] Writing local files
[11:10:28] Completed 300000 out of 10000000 steps  (3 percent)
[11:25:29] Timered checkpoint triggered.
[11:26:36] Writing local files
[11:26:36] Completed 400000 out of 10000000 steps  (4 percent)
[11:41:36] Timered checkpoint triggered.
[11:42:45] Writing local files
[11:42:45] Completed 500000 out of 10000000 steps  (5 percent)
[11:57:22] - Autosending finished units...
[11:57:22] Trying to send all finished work units
[11:57:22] + No unsent completed units remaining.
[11:57:22] - Autosend completed
[11:57:45] Timered checkpoint triggered.
[11:58:50] Writing local files
[11:58:50] Completed 600000 out of 10000000 steps  (6 percent)
[12:13:50] Timered checkpoint triggered.
[12:14:58] Writing local files
[12:14:58] Completed 700000 out of 10000000 steps  (7 percent)
[12:29:59] Timered checkpoint triggered.
[12:31:04] Writing local files
[12:31:04] Completed 800000 out of 10000000 steps  (8 percent)
[12:46:04] Timered checkpoint triggered.
[12:47:10] Writing local files
[12:47:10] Completed 900000 out of 10000000 steps  (9 percent)
[13:02:11] Timered checkpoint triggered.
[13:03:15] Writing local files
[13:03:15] Completed 1000000 out of 10000000 steps  (10 percent)
[13:18:16] Timered checkpoint triggered.
[13:19:20] Writing local files
[13:19:20] Completed 1100000 out of 10000000 steps  (11 percent)
[13:34:19] Timered checkpoint triggered.
[13:35:22] Writing local files
[13:35:22] Completed 1200000 out of 10000000 steps  (12 percent)
[13:50:22] Timered checkpoint triggered.
[13:51:26] Writing local files
[13:51:26] Completed 1300000 out of 10000000 steps  (13 percent)
[14:06:27] Timered checkpoint triggered.
[14:07:33] Writing local files
[14:07:33] Completed 1400000 out of 10000000 steps  (14 percent)
[14:22:33] Timered checkpoint triggered.
[14:23:38] Writing local files
[14:23:38] Completed 1500000 out of 10000000 steps  (15 percent)
[14:38:38] Timered checkpoint triggered.
[14:39:42] Writing local files
[14:39:42] Completed 1600000 out of 10000000 steps  (16 percent)
[14:54:42] Timered checkpoint triggered.
[14:55:48] Writing local files
[14:55:48] Completed 1700000 out of 10000000 steps  (17 percent)
[15:10:49] Timered checkpoint triggered.
[15:11:54] Writing local files
[15:11:54] Completed 1800000 out of 10000000 steps  (18 percent)
[15:26:53] Timered checkpoint triggered.
[15:28:01] Writing local files
[15:28:01] Completed 1900000 out of 10000000 steps  (19 percent)
[15:43:02] Timered checkpoint triggered.
[15:44:09] Writing local files
[15:44:09] Completed 2000000 out of 10000000 steps  (20 percent)
[15:59:08] Timered checkpoint triggered.
[16:00:17] Writing local files
[16:00:17] Completed 2100000 out of 10000000 steps  (21 percent)
[16:15:17] Timered checkpoint triggered.
[16:16:26] Writing local files
[16:16:26] Completed 2200000 out of 10000000 steps  (22 percent)
[16:31:26] Timered checkpoint triggered.
[16:32:35] Writing local files
[16:32:35] Completed 2300000 out of 10000000 steps  (23 percent)
[16:47:36] Timered checkpoint triggered.
[16:48:44] Writing local files
[16:48:45] Completed 2400000 out of 10000000 steps  (24 percent)
[17:03:44] Timered checkpoint triggered.
[17:04:52] Writing local files
[17:04:52] Completed 2500000 out of 10000000 steps  (25 percent)
[17:19:52] Timered checkpoint triggered.
[17:21:00] Writing local files
[17:21:00] Completed 2600000 out of 10000000 steps  (26 percent)
[17:36:01] Timered checkpoint triggered.
[17:37:11] Writing local files
[17:37:11] Completed 2700000 out of 10000000 steps  (27 percent)
[17:52:10] Timered checkpoint triggered.
[17:53:18] Writing local files
[17:53:18] Completed 2800000 out of 10000000 steps  (28 percent)
[17:57:25] - Autosending finished units...
[17:57:25] Trying to send all finished work units
[17:57:25] + No unsent completed units remaining.
[17:57:25] - Autosend completed
[18:08:19] Timered checkpoint triggered.
[18:09:25] Writing local files
[18:09:25] Completed 2900000 out of 10000000 steps  (29 percent)
[18:24:24] Timered checkpoint triggered.
[18:25:32] Writing local files
[18:25:32] Completed 3000000 out of 10000000 steps  (30 percent)
[18:40:33] Timered checkpoint triggered.
[18:41:38] Writing local files
[18:41:38] Completed 3100000 out of 10000000 steps  (31 percent)
[18:56:37] Timered checkpoint triggered.
[18:57:44] Writing local files
[18:57:44] Completed 3200000 out of 10000000 steps  (32 percent)
[19:12:45] Timered checkpoint triggered.
[19:13:53] Writing local files
[19:13:53] Completed 3300000 out of 10000000 steps  (33 percent)
[19:28:53] Timered checkpoint triggered.
[19:29:58] Writing local files
[19:29:58] Completed 3400000 out of 10000000 steps  (34 percent)
[19:44:58] Timered checkpoint triggered.
[19:46:02] Writing local files
[19:46:02] Completed 3500000 out of 10000000 steps  (35 percent)
[20:01:02] Timered checkpoint triggered.
[20:02:09] Writing local files
[20:02:09] Completed 3600000 out of 10000000 steps  (36 percent)
[20:17:10] Timered checkpoint triggered.
[20:18:15] Writing local files
[20:18:15] Completed 3700000 out of 10000000 steps  (37 percent)
[20:33:16] Timered checkpoint triggered.
[20:34:23] Writing local files
[20:34:23] Completed 3800000 out of 10000000 steps  (38 percent)
[20:49:23] Timered checkpoint triggered.
[20:50:30] Writing local files
[20:50:30] Completed 3900000 out of 10000000 steps  (39 percent)
[21:05:30] Timered checkpoint triggered.
[21:06:37] Writing local files
[21:06:37] Completed 4000000 out of 10000000 steps  (40 percent)
[21:21:38] Timered checkpoint triggered.
[21:22:45] Writing local files
[21:22:45] Completed 4100000 out of 10000000 steps  (41 percent)
[21:37:46] Timered checkpoint triggered.
[21:38:52] Writing local files
[21:38:52] Completed 4200000 out of 10000000 steps  (42 percent)
[21:53:52] Timered checkpoint triggered.
[21:54:57] Writing local files
[21:54:57] Completed 4300000 out of 10000000 steps  (43 percent)
[22:09:57] Timered checkpoint triggered.
[22:11:01] Writing local files
[22:11:01] Completed 4400000 out of 10000000 steps  (44 percent)
[22:26:01] Timered checkpoint triggered.
[22:27:08] Writing local files
[22:27:08] Completed 4500000 out of 10000000 steps  (45 percent)
[22:42:09] Timered checkpoint triggered.
[22:43:15] Writing local files
[22:43:15] Completed 4600000 out of 10000000 steps  (46 percent)
[22:58:15] Timered checkpoint triggered.
[22:59:24] Writing local files
[22:59:24] Completed 4700000 out of 10000000 steps  (47 percent)
[23:14:24] Timered checkpoint triggered.
[23:15:33] Writing local files
[23:15:33] Completed 4800000 out of 10000000 steps  (48 percent)
[23:30:33] Timered checkpoint triggered.
[23:31:43] Writing local files
[23:31:43] Completed 4900000 out of 10000000 steps  (49 percent)
[23:46:44] Timered checkpoint triggered.
[23:47:51] Writing local files
[23:47:51] Completed 5000000 out of 10000000 steps  (50 percent)
[23:57:29] - Autosending finished units...
[23:57:29] Trying to send all finished work units
[23:57:29] + No unsent completed units remaining.
[23:57:29] - Autosend completed
[00:02:52] Timered checkpoint triggered.
[00:03:58] Writing local files
[00:03:58] Completed 5100000 out of 10000000 steps  (51 percent)
[00:18:59] Timered checkpoint triggered.
[00:20:07] Writing local files
[00:20:07] Completed 5200000 out of 10000000 steps  (52 percent)
[00:35:08] Timered checkpoint triggered.
[00:36:15] Writing local files
[00:36:15] Completed 5300000 out of 10000000 steps  (53 percent)
[00:51:14] Timered checkpoint triggered.
[00:52:23] Writing local files
[00:52:23] Completed 5400000 out of 10000000 steps  (54 percent)
[01:07:23] Timered checkpoint triggered.
[01:08:30] Writing local files
[01:08:30] Completed 5500000 out of 10000000 steps  (55 percent)
[01:23:30] Timered checkpoint triggered.
[01:24:36] Writing local files
[01:24:36] Completed 5600000 out of 10000000 steps  (56 percent)
[01:39:37] Timered checkpoint triggered.
[01:40:43] Writing local files
[01:40:43] Completed 5700000 out of 10000000 steps  (57 percent)
[01:55:44] Timered checkpoint triggered.
[01:56:50] Writing local files
[01:56:50] Completed 5800000 out of 10000000 steps  (58 percent)
[02:11:50] Timered checkpoint triggered.
[02:12:59] Writing local files
[02:12:59] Completed 5900000 out of 10000000 steps  (59 percent)
[02:28:00] Timered checkpoint triggered.
[02:29:09] Writing local files
[02:29:09] Completed 6000000 out of 10000000 steps  (60 percent)
[02:44:10] Timered checkpoint triggered.
[02:45:17] Writing local files
[02:45:17] Completed 6100000 out of 10000000 steps  (61 percent)
[03:00:17] Timered checkpoint triggered.
[03:01:22] Writing local files
[03:01:22] Completed 6200000 out of 10000000 steps  (62 percent)
[03:16:23] Timered checkpoint triggered.
[03:17:26] Writing local files
[03:17:26] Completed 6300000 out of 10000000 steps  (63 percent)
[03:32:26] Timered checkpoint triggered.
[03:33:30] Writing local files
[03:33:31] Completed 6400000 out of 10000000 steps  (64 percent)
[03:48:30] Timered checkpoint triggered.
[03:49:37] Writing local files
[03:49:37] Completed 6500000 out of 10000000 steps  (65 percent)
[04:04:38] Timered checkpoint triggered.
[04:05:43] Writing local files
[04:05:43] Completed 6600000 out of 10000000 steps  (66 percent)
[04:20:44] Timered checkpoint triggered.
[04:21:45] Writing local files
[04:21:45] Completed 6700000 out of 10000000 steps  (67 percent)
[04:36:46] Timered checkpoint triggered.
[04:37:49] Writing local files
[04:37:49] Completed 6800000 out of 10000000 steps  (68 percent)
[04:52:50] Timered checkpoint triggered.
[04:53:53] Writing local files
[04:53:53] Completed 6900000 out of 10000000 steps  (69 percent)
[05:08:53] Timered checkpoint triggered.
[05:09:56] Writing local files
[05:09:56] Completed 7000000 out of 10000000 steps  (70 percent)
[05:24:57] Timered checkpoint triggered.
[05:25:59] Writing local files
[05:25:59] Completed 7100000 out of 10000000 steps  (71 percent)
[05:40:59] Timered checkpoint triggered.
[05:42:03] Writing local files
[05:42:03] Completed 7200000 out of 10000000 steps  (72 percent)
[05:57:04] Timered checkpoint triggered.
[05:57:31] Writing local files
[05:57:31] Completed 7300000 out of 10000000 steps  (73 percent)
[05:57:32] - Autosending finished units...
[05:57:32] Trying to send all finished work units
[05:57:32] + No unsent completed units remaining.
[05:57:32] - Autosend completed
[06:12:32] Timered checkpoint triggered.
[06:12:48] Writing local files
[06:12:48] Completed 7400000 out of 10000000 steps  (74 percent)
[06:27:48] Timered checkpoint triggered.
[06:28:12] Writing local files
[06:28:12] Completed 7500000 out of 10000000 steps  (75 percent)
[06:43:12] Timered checkpoint triggered.
[06:43:34] Writing local files
[06:43:34] Completed 7600000 out of 10000000 steps  (76 percent)
[06:58:35] Timered checkpoint triggered.
[06:58:57] Writing local files
[06:58:57] Completed 7700000 out of 10000000 steps  (77 percent)
[07:13:57] Timered checkpoint triggered.
[07:14:18] Writing local files
[07:14:18] Completed 7800000 out of 10000000 steps  (78 percent)
[07:29:18] Timered checkpoint triggered.
[07:29:41] Writing local files
[07:29:41] Completed 7900000 out of 10000000 steps  (79 percent)
[07:44:42] Timered checkpoint triggered.
[07:45:06] Writing local files
[07:45:06] Completed 8000000 out of 10000000 steps  (80 percent)
[08:00:07] Timered checkpoint triggered.
[08:00:32] Writing local files
[08:00:32] Completed 8100000 out of 10000000 steps  (81 percent)
[08:15:33] Timered checkpoint triggered.
[08:15:57] Writing local files
[08:15:57] Completed 8200000 out of 10000000 steps  (82 percent)
[08:30:58] Timered checkpoint triggered.
[08:31:24] Writing local files
[08:31:24] Completed 8300000 out of 10000000 steps  (83 percent)
[08:46:25] Timered checkpoint triggered.
[08:46:53] Writing local files
[08:46:53] Completed 8400000 out of 10000000 steps  (84 percent)
[09:01:54] Timered checkpoint triggered.
[09:02:23] Writing local files
[09:02:23] Completed 8500000 out of 10000000 steps  (85 percent)
[09:17:23] Timered checkpoint triggered.
[09:17:54] Writing local files
[09:17:54] Completed 8600000 out of 10000000 steps  (86 percent)
[09:32:54] Timered checkpoint triggered.
[09:33:22] Writing local files
[09:33:22] Completed 8700000 out of 10000000 steps  (87 percent)
[09:48:21] Timered checkpoint triggered.
[09:48:46] Writing local files
[09:48:46] Completed 8800000 out of 10000000 steps  (88 percent)
[10:03:46] Timered checkpoint triggered.
[10:04:10] Writing local files
[10:04:10] Completed 8900000 out of 10000000 steps  (89 percent)
[10:19:10] Timered checkpoint triggered.
[10:19:36] Writing local files
[10:19:36] Completed 9000000 out of 10000000 steps  (90 percent)
[10:34:36] Timered checkpoint triggered.
[10:35:02] Writing local files
[10:35:02] Completed 9100000 out of 10000000 steps  (91 percent)
[10:50:02] Timered checkpoint triggered.
[10:50:27] Writing local files
[10:50:27] Completed 9200000 out of 10000000 steps  (92 percent)
[11:05:27] Timered checkpoint triggered.
[11:05:52] Writing local files
[11:05:52] Completed 9300000 out of 10000000 steps  (93 percent)
[11:20:52] Timered checkpoint triggered.
[11:21:19] Writing local files
[11:21:19] Completed 9400000 out of 10000000 steps  (94 percent)
[11:36:19] Timered checkpoint triggered.
[11:36:46] Writing local files
[11:36:46] Completed 9500000 out of 10000000 steps  (95 percent)
[11:51:46] Timered checkpoint triggered.
[11:52:18] Writing local files
[11:52:18] Completed 9600000 out of 10000000 steps  (96 percent)
[11:57:35] - Autosending finished units...
[11:57:35] Trying to send all finished work units
[11:57:35] + No unsent completed units remaining.
[11:57:35] - Autosend completed
[12:07:18] Timered checkpoint triggered.
[12:07:46] Writing local files
[12:07:46] Completed 9700000 out of 10000000 steps  (97 percent)
[12:22:46] Timered checkpoint triggered.
[12:23:16] Writing local files
[12:23:16] Completed 9800000 out of 10000000 steps  (98 percent)
[12:38:15] Timered checkpoint triggered.
[12:38:44] Writing local files
[12:38:44] Completed 9900000 out of 10000000 steps  (99 percent)
[12:53:44] Timered checkpoint triggered.
[12:54:13] Writing local files
[12:54:13] Completed 10000000 out of 10000000 steps  (100 percent)



        M E G A - F L O P S   A C C O U N T I N G

        Parallel run - timing based on wallclock.
   RF=Reaction-Field  FE=Free Energy  SCFE=Soft-Core/Free Energy
   T=Tabulated        W3=SPC/TIP3p    W4=TIP4p (single or pairs)
   NF=No Forces

 Computing:                        M-Number         M-Flops  % of Flops
-----------------------------------------------------------------------
 RF Coul                      146457.729122  4833105.061026     1.6
 RF Coul [W3]                    513.482502    50321.285196     0.0
 RF Coul + VdW(T)             264278.784627 17178121.000755     5.7
 RF Coul + VdW(T) [W3]         71899.575020  9346944.752600     3.1
 RF Coul + VdW(T) [W3-W3]     714644.534686 228686251.099520    75.8
 Outer nonbonded loop         291601.232103  2916012.321030     1.0
 1,4 nonbonded interactions    14520.001452  1306800.130680     0.4
 NS-Pairs                     395933.476239  8314603.001019     2.8
 Reset In Box                   9664.009664    86976.086976     0.0
 Shift-X                      193280.019328  1159680.115968     0.4
 CG-CoM                         3590.003590   104110.104110     0.0
 Sum Forces                   289920.028992   289920.028992     0.1
 Bonds                          2850.000285   122550.012255     0.0
 Angles                        10160.001016  1656080.165608     0.5
 Propers                        1170.000117   267930.026793     0.1
 RB-Dihedrals                  11640.001164  2875080.287508     1.0
 Virial                        97720.009772  1758960.175896     0.6
 Ext.ens. Update               96640.009664  5218560.521856     1.7
 Stop-CM                        9664.000000    96640.000000     0.0
 Calc-Ekin                     96640.019328  2609280.521856     0.9
 Shake                          6403.776131   192113.283930     0.1
 Constraint-V                  96640.009664   579840.057984     0.2
 Shake-Init                     2780.000278    27800.002780     0.0
 Constraint-Vir                93860.009386  2252640.225264     0.7
 Settle                        30360.003036  9806280.980628     3.2
-----------------------------------------------------------------------
 Total                                      301736601.250230   100.0
-----------------------------------------------------------------------

               NODE (s)   Real (s)      (%)
       Time:  95523.000  95523.000    100.0
                       1d02h32:03
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:     12.539      3.159     18.090      1.327
[12:54:13] Writing final coordinates.
[12:54:13] Past main M.D. loop
[12:54:13] Will end MPI now
[12:55:13] 
[12:55:13] Finished Work Unit:
[12:55:13] - Reading up to 232056 from "work/wudata_02.arc": Read 232056
[12:55:13] - Reading up to 13734064 from "work/wudata_02.xtc": Read 13734064
[12:55:14] goefile size: 0
[12:55:14] logfile size: 257328
[12:55:14] Leaving Run
[12:55:16] - Writing 14622812 bytes of core data to disk...
[12:55:16]   ... Done.
[12:55:17] - Shutting down core
[12:55:17] 
[12:55:17] Folding@home Core Shutdown: FINISHED_UNIT
[17:57:39] - Autosending finished units...
[17:57:39] Trying to send all finished work units
[17:57:39] + No unsent completed units remaining.
[17:57:39] - Autosend completed

It's now 22:28:47 and there has been no connection to the server, no upload, and no results successfully sent. Nada.
Last edited by 7im on Fri Apr 04, 2008 3:54 pm, edited 2 times in total.
Reason: code blocks
butc8
Posts: 42
Joined: Wed Mar 19, 2008 3:37 pm

Re: Wu hung up after 100% completion

Post by butc8 »

Is there a wuresults_xx.dat file in your work folder?
Try running qfix

Anyway how do you get it to show the gflops etc?
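A quick way to answer the results-file question is to list whatever matches that name pattern in the work folder. The sketch below is only an illustration: the folder name `work` is taken from the log's `-dir work/` argument, the `wuresults_*.dat` pattern is taken from the file name given in this post, and the helper name `find_results_files` is made up for the example.

```python
from pathlib import Path
from typing import List, Tuple


def find_results_files(work_dir: str) -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) for every wuresults_*.dat in work_dir.

    The name pattern is an assumption based on the file name quoted in the
    thread; a missing folder simply yields an empty list.
    """
    work = Path(work_dir)
    if not work.is_dir():
        return []
    return sorted((p.name, p.stat().st_size) for p in work.glob("wuresults_*.dat"))


if __name__ == "__main__":
    # "work" is the directory the client passed to the core via "-dir work/".
    for name, size in find_results_files("work"):
        print(f"{name}: {size} bytes")
```

If this prints a results file for the stuck slot, the finished data is still on disk and only the queue entry needs attention.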
Flathead74
Posts: 266
Joined: Sun Dec 02, 2007 6:08 pm
Location: Central New York
Contact:

Re: Wu hung up after 100% completion

Post by Flathead74 »

[12:55:17] Folding@home Core Shutdown: FINISHED_UNIT
[17:57:39] - Autosending finished units...
[17:57:39] Trying to send all finished work units
[17:57:39] + No unsent completed units remaining.
[17:57:39] - Autosend completed
It looks like Autosend got in the way...

Open your "Process Monitor" and see if there are any "FahCores" still running.
If so, select one of them and "kill" it.
If there are others, they will follow.

Upload should then continue as usual.

I have had this happen many, many times and this method has always been successful.
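On Linux, the check for leftover cores can be sketched in a few lines of Python. This is a minimal illustration, not part of the client: it assumes a procps-style `ps -eo pid,comm` is available, the helper names are made up, and the sample PIDs are invented for the example. It only lists matching processes and leaves the decision to kill one to the user.

```python
import subprocess
from typing import List, Tuple


def find_fahcores(ps_output: str) -> List[Tuple[int, str]]:
    """Pick out (pid, command) pairs whose command name contains 'FahCore'."""
    matches = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip the header line and anything malformed
        pid, name = int(parts[0]), parts[1].strip()
        if "fahcore" in name.lower():
            matches.append((pid, name))
    return matches


def list_running_fahcores() -> List[Tuple[int, str]]:
    """Query the live process table (assumes a procps-style ps on Linux)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,comm"], capture_output=True, text=True, check=True
    ).stdout
    return find_fahcores(out)


# Hypothetical ps output; the PIDs here are made up for illustration.
sample = """  PID COMMAND
31306 fah6
31320 mpiexec
31322 FahCore_a1.exe
31323 FahCore_a1.exe
"""
print(find_fahcores(sample))
```

Calling `list_running_fahcores()` on the affected machine shows whether any cores survived the `FINISHED_UNIT` shutdown. An empty list would suggest the autosend interference described above is not the cause.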
dschief
Posts: 163
Joined: Tue Dec 04, 2007 5:56 am
Location: California Wine country

Re: Wu hung up after 100% completion

Post by dschief »

butc8 wrote:Is there a wuresults_xx.dat file in your work folder?
Try running qfix

Anyway how do you get it to show the gflops etc?
I have never worked with qfix. Where do I get it, and is there a version that runs under Linux?

All my logs show GFlops etc. I don't do anything other than start the client with -verbosity 9.
Last edited by dschief on Fri Apr 04, 2008 4:41 pm, edited 1 time in total.
butc8
Posts: 42
Joined: Wed Mar 19, 2008 3:37 pm

Re: WU hung after 100% - Project: 3052 (Run 8, Clone 41, Gen 19)

Post by butc8 »

dschief
Posts: 163
Joined: Tue Dec 04, 2007 5:56 am
Location: California Wine country

Re: WU hung after 100% - Project: 3052 (Run 8, Clone 41, Gen 19)

Post by dschief »

Found and ran qfix, which basically accomplished nothing. It printed out the contents of the queue, showing the correct WU info for location 02, said the file was OK, and exited. I tried -send all: no joy. I tried -send 02: also no joy.

Restarting the client would hang and not download the next WU, so I deleted everything and started over. Now I'm back up and running,
and got the very same WU back again. Will wait and see what happens tomorrow.
uncle_fungus
Site Admin
Posts: 1288
Joined: Fri Nov 30, 2007 9:37 am
Location: Oxfordshire, UK

Re: WU hung after 100% - Project: 3052 (Run 8, Clone 41, Gen 19)

Post by uncle_fungus »

dschief wrote:Found and ran qfix, which basically accomplished nothing. It printed out the contents of the queue, showing the correct WU info for location 02, said the file was OK, and exited. I tried -send all: no joy. I tried -send 02: also no joy.

Restarting the client would hang and not download the next WU, so I deleted everything and started over. Now I'm back up and running,
and got the very same WU back again. Will wait and see what happens tomorrow.
Hmm, that post of toTOW's wasn't meant for this situation.

What you should have done was:

* Run "qfix" - says queue is fine, ignore it
* Run "./fah --delete 02" - says delete failed, ignore it
* Run "qfix" - says it's re-queued the results
* Run "./fah6 -smp" - sends WU and continues as normal
Flathead74
Posts: 266
Joined: Sun Dec 02, 2007 6:08 pm
Location: Central New York
Contact:

Re: WU hung after 100% - Project: 3052 (Run 8, Clone 41, Gen 19)

Post by Flathead74 »

There is apparently more than one reason for hanging at completion...

dschief, did you even bother to check whether you still had FahCores running?

I have seen your exact situation many times, as I said previously.
The directions that I gave, simple as they were, always work in these cases,
that is, when Autosend interferes with the normal upload procedure of the WU.

But what would I know, I am only speaking from experience.
uncle_fungus
Site Admin
Posts: 1288
Joined: Fri Nov 30, 2007 9:37 am
Location: Oxfordshire, UK

Re: WU hung after 100% - Project: 3052 (Run 8, Clone 41, Gen 19)

Post by uncle_fungus »

There are multiple causes. When autosend isn't the culprit, all the cores quit and the client just sits there doing its six-hour autosend.

The fix for that is here: viewtopic.php?f=12&t=1938 (same as above, just more detailed).

In dschief's case I don't think the autosend bug is happening because there's a 5 hour gap between the unit finishing and the first autosend.
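The gap can be checked directly from the two timestamps in the posted log: `[12:55:17]` for the `FINISHED_UNIT` shutdown and `[17:57:39]` for the next autosend. A small sketch follows; the helper name `gap_between` is made up, and it assumes the second stamp falls less than 24 hours after the first, since the log carries no dates.

```python
from datetime import datetime, timedelta


def gap_between(first: str, second: str) -> timedelta:
    """Elapsed time between two HH:MM:SS log stamps.

    Assumes the second stamp is less than 24 hours after the first, so a
    negative difference means the clock rolled over midnight.
    """
    fmt = "%H:%M:%S"
    delta = datetime.strptime(second, fmt) - datetime.strptime(first, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return delta


finished = "12:55:17"  # Folding@home Core Shutdown: FINISHED_UNIT
autosend = "17:57:39"  # first "Autosending finished units..." afterwards
print(gap_between(finished, autosend))  # prints 5:02:22
```

So the core had been shut down for just over five hours before autosend first ran, which is the basis for ruling out the autosend collision here.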
Flathead74
Posts: 266
Joined: Sun Dec 02, 2007 6:08 pm
Location: Central New York
Contact:

Re: WU hung after 100% - Project: 3052 (Run 8, Clone 41, Gen 19)

Post by Flathead74 »

In dschief's case I don't think the autosend bug is happening because there's a 5 hour gap between the unit finishing and the first autosend.
That could be, but we will never know now, will we?
uncle_fungus
Site Admin
Posts: 1288
Joined: Fri Nov 30, 2007 9:37 am
Location: Oxfordshire, UK

Re: WU hung after 100% - Project: 3052 (Run 8, Clone 41, Gen 19)

Post by uncle_fungus »

Sadly not, no.
Ren02
Posts: 98
Joined: Tue Dec 11, 2007 1:16 am
Location: Estonia

Re: WU hung after 100% - Project: 3052 (Run 8, Clone 41, Gen 19)

Post by Ren02 »

When qfix doesn't work, the next thing to try is qgen. I had a similar problem where qfix said everything was OK and ./fah6 -send all still accomplished nothing. The queue.dat used by fah6 is version 6. Qgen generates a new ver5 queue.dat, and in my case that worked. I doubt it is a good idea to continue using the ver5 queue.dat with v6 FAH though; it's just good enough for sending the results.
uncle_fungus
Site Admin
Posts: 1288
Joined: Fri Nov 30, 2007 9:37 am
Location: Oxfordshire, UK

Re: WU hung after 100% - Project: 3052 (Run 8, Clone 41, Gen 19)

Post by uncle_fungus »

Ren02 wrote:When qfix doesn't work, the next thing to try is qgen. I had a similar problem where qfix said everything was OK and ./fah6 -send all still accomplished nothing. The queue.dat used by fah6 is version 6. Qgen generates a new ver5 queue.dat, and in my case that worked. I doubt it is a good idea to continue using the ver5 queue.dat with v6 FAH though; it's just good enough for sending the results.
The "delete" action in the post I mentioned gets around the need for qgen, as it removes the entry from the queue and then qfix can re-attach the orphaned results file.
Ren02
Posts: 98
Joined: Tue Dec 11, 2007 1:16 am
Location: Estonia

Re: WU hung after 100% - Project: 3052 (Run 8, Clone 41, Gen 19)

Post by Ren02 »

I'll be damned...
Well that just proves that I didn't pay enough attention when reading this thread. :oops:
Interesting workaround, will try it next time. ;)
dschief
Posts: 163
Joined: Tue Dec 04, 2007 5:56 am
Location: California Wine country

Re: WU hung after 100% - Project: 3052 (Run 8, Clone 41, Gen 19)

Post by dschief »

The problem WU and folder were backed up onto a USB stick before I deleted them from the Linux box.

I reloaded the folder and tried the steps uncle_fungus suggested:


* Run "qfix" - says queue is fine, ignore it
* Run "./fah --delete 02" - says delete failed, ignore it { this step did not return "delete failed" }
* Run "qfix" - says it's re-queued the results { this step did not return "re-queued" }
* Run "./fah6 -smp" - sends WU and continues as normal

Basically, it's still hosed.

I still have the folder if anybody wants it or has other ideas to try.