Re: 7520 (R119 C4 G264) Could not get length of results file

Posted: Tue Mar 10, 2015 1:18 pm
by ThunderRd
Thanks for that, u_f.

WUs with wrong number of steps

Posted: Wed Mar 11, 2015 12:50 pm
by folding_hoomer
Got two Projects with a wrong number of steps:
Project: 7522 (Run 0, Clone 96, Gen 50) with 25.500.000 steps:

Code:

01:53:44:WU00:FS00:0xa3:Project: 7522 (Run 0, Clone 96, Gen 50)
01:53:44:WU00:FS00:0xa3:
01:53:44:WU00:FS00:0xa3:Assembly optimizations on if available.
01:53:44:WU00:FS00:0xa3:Entering M.D.
01:53:50:WU00:FS00:0xa3:Mapping NT from 8 to 8 
01:53:50:WU01:FS00:Upload 3.83%
01:53:50:WU00:FS00:0xa3:Completed 0 out of 25500000 steps  (0%)
01:53:56:WU01:FS00:Upload 30.63%
01:54:02:WU01:FS00:Upload 95.73%
01:54:10:WU01:FS00:Upload complete
01:54:10:WU01:FS00:Server responded WORK_ACK (400)
01:54:10:WU01:FS00:Final credit estimate, 2066.00 points
01:54:10:WU01:FS00:Cleaning up
******************************* Date: 2015-03-11 *******************************
04:24:00:WU00:FS00:0xa3:Completed 255000 out of 25500000 steps  (1%)
06:52:50:WU00:FS00:0xa3:Completed 510000 out of 25500000 steps  (2%)
09:21:41:WU00:FS00:0xa3:Completed 765000 out of 25500000 steps  (3%)
09:53:48:FS00:Paused
09:53:48:FS00:Shutting core down
09:53:53:WU00:FS00:0xa3:Client no longer detected. Shutting down core.
09:53:53:WU00:FS00:0xa3:
09:53:53:WU00:FS00:0xa3:Folding@home Core Shutdown: CLIENT_DIED
09:53:53:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
Project: 7522 (Run 0, Clone 84, Gen 126) with 63.500.000 steps:

Code:

10:48:33:WU02:FS00:0xa3:Project: 7522 (Run 0, Clone 84, Gen 126)
10:48:33:WU02:FS00:0xa3:
10:48:33:WU02:FS00:0xa3:Assembly optimizations on if available.
10:48:33:WU02:FS00:0xa3:Entering M.D.
10:48:39:WU02:FS00:0xa3:Mapping NT from 10 to 10 
10:48:39:WU01:FS00:Upload 17.04%
10:48:39:WU02:FS00:0xa3:Completed 0 out of 63500000 steps  (0%)
10:48:45:WU01:FS00:Upload 59.63%
10:49:05:WU01:FS00:Upload complete
10:49:05:WU01:FS00:Server responded WORK_ACK (400)
10:49:05:WU01:FS00:Final credit estimate, 2153.00 points
10:49:05:WU01:FS00:Cleaning up
12:04:41:FS00:Paused
12:04:41:FS00:Shutting core down
12:04:44:WU02:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
CPU-Slot deleted to dump the Projects.
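
For a sense of scale, two consecutive progress lines from the first log are enough for a rough runtime estimate. The sketch below assumes the v7 log line format shown above and that both lines come from the same day; the helper names `parse_progress` and `estimate_total_runtime` are made up for illustration and are not part of any client tool.

```python
import re
from datetime import timedelta

# Matches v7 core progress lines such as
# "04:24:00:WU00:FS00:0xa3:Completed 255000 out of 25500000 steps  (1%)"
PROGRESS = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):WU\d+:FS\d+:0x\w+:"
    r"Completed (\d+) out of (\d+) steps"
)

def parse_progress(line):
    """Return (time of day, steps done, total steps), or None if no match."""
    m = PROGRESS.match(line)
    if m is None:
        return None
    hours, minutes, seconds, done, total = map(int, m.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds), done, total

def estimate_total_runtime(earlier, later):
    """Extrapolate the full-WU runtime from two progress lines (same day assumed)."""
    t0, done0, total = parse_progress(earlier)
    t1, done1, _ = parse_progress(later)
    seconds_per_step = (t1 - t0).total_seconds() / (done1 - done0)
    return timedelta(seconds=seconds_per_step * total)

a = "04:24:00:WU00:FS00:0xa3:Completed 255000 out of 25500000 steps  (1%)"
b = "06:52:50:WU00:FS00:0xa3:Completed 510000 out of 25500000 steps  (2%)"
eta = estimate_total_runtime(a, b)
print(f"Estimated total runtime: {eta.total_seconds() / 86400:.1f} days")
```

At roughly 2.5 hours per percent, the 25,500,000-step unit extrapolates to about ten days on this slot, which is why the step count looks wrong.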

Re: Various CPU projects cause client to crash

Posted: Wed Mar 11, 2015 12:57 pm
by folding_hoomer
Project: 7523 (Run 0, Clone 74, Gen 487) - infinite loop:

Code:

 . . .
09:10:52:WU01:FS00:Starting
09:10:52:WU01:FS00:Removing old file './work/01/logfile_01-20150311-083852.txt'
09:10:52:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a3.fah/FahCore_a3 -dir 01 -suffix 01 -version 703 -lifeline 1190 -checkpoint 30 -np 10
09:10:52:WU01:FS00:Started FahCore on PID 20399
09:10:52:WU01:FS00:Core PID:20403
09:10:52:WU01:FS00:FahCore 0xa3 started
09:10:53:WU01:FS00:0xa3:
09:10:53:WU01:FS00:0xa3:*------------------------------*
09:10:53:WU01:FS00:0xa3:Folding@Home Gromacs SMP Core
09:10:53:WU01:FS00:0xa3:Version 2.27 (Dec. 15, 2010)
09:10:53:WU01:FS00:0xa3:
09:10:53:WU01:FS00:0xa3:Preparing to commence simulation
09:10:53:WU01:FS00:0xa3:- Ensuring status. Please wait.
09:11:02:WU01:FS00:0xa3:- Looking at optimizations...
09:11:02:WU01:FS00:0xa3:- Working with standard loops on this execution.
09:11:02:WU01:FS00:0xa3:Examination of work files indicates 8 consecutive improper terminations of core.
09:11:02:WU01:FS00:0xa3:- Expanded 2568587 -> 3131980 (decompressed 121.9 percent)
09:11:02:WU01:FS00:0xa3:Called DecompressByteArray: compressed_data_size=2568587 data_size=3131980, decompressed_data_size=3131980 diff=0
09:11:02:WU01:FS00:0xa3:- Digital signature verified
09:11:02:WU01:FS00:0xa3:
09:11:02:WU01:FS00:0xa3:Project: 7523 (Run 0, Clone 74, Gen 487)
09:11:02:WU01:FS00:0xa3:
09:11:02:WU01:FS00:0xa3:Entering M.D.
09:11:08:WU01:FS00:0xa3:Mapping NT from 10 to 10 
09:11:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:11:52:WU01:FS00:Starting
09:11:52:WU01:FS00:Removing old file './work/01/logfile_01-20150311-083952.txt'
09:11:52:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a3.fah/FahCore_a3 -dir 01 -suffix 01 -version 703 -lifeline 1190 -checkpoint 30 -np 10
09:11:52:WU01:FS00:Started FahCore on PID 20408
09:11:52:WU01:FS00:Core PID:20412
09:11:52:WU01:FS00:FahCore 0xa3 started
09:11:53:WU01:FS00:0xa3:
09:11:53:WU01:FS00:0xa3:*------------------------------*
09:11:53:WU01:FS00:0xa3:Folding@Home Gromacs SMP Core
09:11:53:WU01:FS00:0xa3:Version 2.27 (Dec. 15, 2010)
09:11:53:WU01:FS00:0xa3:
09:11:53:WU01:FS00:0xa3:Preparing to commence simulation
09:11:53:WU01:FS00:0xa3:- Ensuring status. Please wait.
09:12:02:WU01:FS00:0xa3:- Looking at optimizations...
09:12:02:WU01:FS00:0xa3:- Working with standard loops on this execution.
09:12:02:WU01:FS00:0xa3:Examination of work files indicates 8 consecutive improper terminations of core.
09:12:02:WU01:FS00:0xa3:- Expanded 2568587 -> 3131980 (decompressed 121.9 percent)
09:12:02:WU01:FS00:0xa3:Called DecompressByteArray: compressed_data_size=2568587 data_size=3131980, decompressed_data_size=3131980 diff=0
09:12:02:WU01:FS00:0xa3:- Digital signature verified
09:12:02:WU01:FS00:0xa3:
09:12:02:WU01:FS00:0xa3:Project: 7523 (Run 0, Clone 74, Gen 487)
09:12:02:WU01:FS00:0xa3:
09:12:02:WU01:FS00:0xa3:Entering M.D.
09:12:08:WU01:FS00:0xa3:Mapping NT from 10 to 10 
09:12:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:12:52:WU01:FS00:Starting
09:12:52:WU01:FS00:Removing old file './work/01/logfile_01-20150311-084052.txt'
09:12:52:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a3.fah/FahCore_a3 -dir 01 -suffix 01 -version 703 -lifeline 1190 -checkpoint 30 -np 10
09:12:52:WU01:FS00:Started FahCore on PID 20417
09:12:52:WU01:FS00:Core PID:20421
09:12:52:WU01:FS00:FahCore 0xa3 started
09:12:53:WU01:FS00:0xa3:
09:12:53:WU01:FS00:0xa3:*------------------------------*
09:12:53:WU01:FS00:0xa3:Folding@Home Gromacs SMP Core
09:12:53:WU01:FS00:0xa3:Version 2.27 (Dec. 15, 2010)
09:12:53:WU01:FS00:0xa3:
09:12:53:WU01:FS00:0xa3:Preparing to commence simulation
09:12:53:WU01:FS00:0xa3:- Ensuring status. Please wait.
09:13:02:WU01:FS00:0xa3:- Looking at optimizations...
09:13:02:WU01:FS00:0xa3:- Working with standard loops on this execution.
09:13:02:WU01:FS00:0xa3:Examination of work files indicates 8 consecutive improper terminations of core.
09:13:02:WU01:FS00:0xa3:- Expanded 2568587 -> 3131980 (decompressed 121.9 percent)
09:13:02:WU01:FS00:0xa3:Called DecompressByteArray: compressed_data_size=2568587 data_size=3131980, decompressed_data_size=3131980 diff=0
09:13:02:WU01:FS00:0xa3:- Digital signature verified
09:13:02:WU01:FS00:0xa3:
09:13:02:WU01:FS00:0xa3:Project: 7523 (Run 0, Clone 74, Gen 487)
09:13:02:WU01:FS00:0xa3:
09:13:02:WU01:FS00:0xa3:Entering M.D.
09:13:08:WU01:FS00:0xa3:Mapping NT from 10 to 10 
09:13:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:13:52:WU01:FS00:Starting
09:13:52:WU01:FS00:Removing old file './work/01/logfile_01-20150311-084152.txt'
09:13:52:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a3.fah/FahCore_a3 -dir 01 -suffix 01 -version 703 -lifeline 1190 -checkpoint 30 -np 10
09:13:52:WU01:FS00:Started FahCore on PID 20427
09:13:52:WU01:FS00:Core PID:20431
09:13:52:WU01:FS00:FahCore 0xa3 started
09:13:53:WU01:FS00:0xa3:
09:13:53:WU01:FS00:0xa3:*------------------------------*
09:13:53:WU01:FS00:0xa3:Folding@Home Gromacs SMP Core
09:13:53:WU01:FS00:0xa3:Version 2.27 (Dec. 15, 2010)
09:13:53:WU01:FS00:0xa3:
09:13:53:WU01:FS00:0xa3:Preparing to commence simulation
09:13:53:WU01:FS00:0xa3:- Ensuring status. Please wait.
09:14:02:WU01:FS00:0xa3:- Looking at optimizations...
09:14:02:WU01:FS00:0xa3:- Working with standard loops on this execution.
09:14:02:WU01:FS00:0xa3:Examination of work files indicates 8 consecutive improper terminations of core.
09:14:02:WU01:FS00:0xa3:- Expanded 2568587 -> 3131980 (decompressed 121.9 percent)
09:14:02:WU01:FS00:0xa3:Called DecompressByteArray: compressed_data_size=2568587 data_size=3131980, decompressed_data_size=3131980 diff=0
09:14:02:WU01:FS00:0xa3:- Digital signature verified
09:14:02:WU01:FS00:0xa3:
09:14:02:WU01:FS00:0xa3:Project: 7523 (Run 0, Clone 74, Gen 487)
09:14:02:WU01:FS00:0xa3:
09:14:02:WU01:FS00:0xa3:Entering M.D.
09:14:08:WU01:FS00:0xa3:Mapping NT from 10 to 10 
09:14:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:14:30:FS00:Paused
CPU-Slot deleted to dump data.
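
A restart loop like this one can be spotted by counting how often the core exits for the same work-unit slot. The following is a minimal sketch, assuming the v7 log format shown above; the function names and the threshold of 3 are arbitrary choices for illustration. A single INTERRUPTED is normal after pausing a slot, so only repeated exits are treated as suspicious.

```python
import re
from collections import Counter

# v7 client lines look like
# "09:11:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)"
RETURNED = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WU\d+):(FS\d+):FahCore returned: (\w+)"
)

def count_core_exits(lines):
    """Count FahCore exit results per (work-unit slot, folding slot, result)."""
    counts = Counter()
    for line in lines:
        m = RETURNED.match(line)
        if m:
            counts[m.groups()] += 1
    return counts

def looping_units(lines, threshold=3):
    """Return the entries whose exit count reaches the (arbitrary) threshold."""
    return {key: n for key, n in count_core_exits(lines).items() if n >= threshold}

sample = [
    "09:11:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
    "09:11:52:WU01:FS00:Starting",
    "09:12:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
    "09:13:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
    "09:14:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
]
loops = looping_units(sample)
print(loops)
```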

Re: 75XX Project issues (crashes, too many steps etc.)

Posted: Wed Mar 11, 2015 2:13 pm
by uncle_fungus
Merged a couple of posts, renamed thread and moved topic into more appropriate category.

Re: 75XX Project issues (crashes, too many steps etc.)

Posted: Wed Mar 11, 2015 6:15 pm
by kasson
Thanks. We've been fixing WUs as we can and have also corrected 30 work units with too many steps.

Re: 75XX Project issues (crashes, too many steps etc.)

Posted: Thu Mar 12, 2015 5:56 am
by ThunderRd
The above-mentioned *7520 (R119 C4 G264) Could not get length of results file* is still in the pipe. Today I finished an 8831 and the box immediately downloaded the faulty 7520 yet again. If it's already been checked/fixed, then there's still something wrong with it, and the log is identical to what I posted above.

After a dozen or so attempts to run the WU again with the same failed result, the client abandoned it and got another 8831, so I'm running that now.

Re: 75XX Project issues (crashes, too many steps etc.)

Posted: Thu Mar 12, 2015 12:11 pm
by autogrog
I have been getting a variation of this problem. The client completes an 8xxx WU and then spends 10-15 minutes trying to process a 7520 before finally getting a good non-75xx WU. Today it has gone through several cycles of a good WU followed by the same bad 7520. It seems that this work unit family is totally rubbish and should be regenerated. The problem has been evident for weeks.
Sample log fragment follows (6.34 client on Fedora 18):

Code:

[06:45:14] Project: 7520 (Run 81, Clone 6, Gen 49)
[06:45:14] 
[06:45:14] Entering M.D.
[06:45:20] CoreStatus = 0 (0)
[06:45:20] Sending work to server
[06:45:20] Project: 7520 (Run 81, Clone 6, Gen 49)
[06:45:20] - Error: Could not get length of results file work/wuresults_05.dat
[06:45:20] - Error: Could not read unit 05 file. Removing from queue.
[06:45:20] Trying to send all finished work units
[06:45:20] + No unsent completed units remaining.
[06:45:20] - Preparing to get new work unit...
[06:45:20] Cleaning up work directory
[06:45:22] + Attempting to get work packet
[06:45:22] Passkey found
[06:45:22] - Will indicate memory of 3949 MB
[06:45:22] - Connecting to assignment server
[06:45:22] Connecting to http://assign.stanford.edu:8080/
[06:45:22] Posted data.
[06:45:22] Initial: 8F80; - Successful: assigned to (128.143.199.97).
[06:45:22] + News From Folding@Home: 
[06:45:22] Loaded queue successfully.
[06:45:22] Sent data
[06:45:22] Connecting to http://128.143.199.97:8080/
[06:45:23] Posted data.
[06:45:23] Initial: 0000; - Receiving payload (expected size: 2357744)
[06:45:24] - Downloaded at ~2302 kB/s

Re: 75XX Project issues (crashes, too many steps etc.)

Posted: Thu Mar 12, 2015 2:53 pm
by 7im
Dr. Kasson should be actively looking through the server logs to find bad work units with multiple recorded failures, and trajectories stuck on a specific generation because of these corrupted work units, instead of waiting for us miners' canaries to die and report the failures.

Re: 75XX Project issues (crashes, too many steps etc.)

Posted: Thu Mar 12, 2015 5:30 pm
by Joe_H
Persons running the version 6 client will not be sending in a failure report when it fails a WU; that is a feature that usually works in the version 7 client. That is one reason to upgrade clients to the current release.

Re: 75XX Project issues (crashes, too many steps etc.)

Posted: Thu Mar 12, 2015 7:41 pm
by Gary480six
Since people running the version 6 client have no other way of reporting problem work units, here are a few.
These are from a variety of systems, all running the last SMP version 6 client on Windows 7.
I believe that all the errors are from work units received from the 128.143.199.97 server.
The same systems run fine on other work units, and even on some P7520 work units (presumably ones already 'fixed').

From February 25th to February 28th, one of these systems downloaded and crashed on the same P7520 R89 C4 G425 work unit 47 times. Mind you, not 47 in a row: it would crash a few times, complete a different work unit, then go back to this P7520.
Several times the work folder, queue.dat and unitinfo.txt were deleted, but that P7520 kept coming back.

Code:

[18:18:09] Project: 7520 (Run 89, Clone 4, Gen 425)
[18:18:09] 
[18:18:09] Entering M.D.
[18:18:15] Mapping NT from 8 to 8 
[18:18:25] CoreStatus = C0000417 (-1073740777)
[18:18:25] Client-core communications error: ERROR 0xc0000417


[19:11:12] Project: 7520 (Run 106, Clone 8, Gen 2)
[19:11:12] 
[19:11:12] Entering M.D.
[19:11:18] Mapping NT from 8 to 8 
[19:11:28] CoreStatus = C0000417 (-1073740777)
[19:11:28] Client-core communications error: ERROR 0xc0000417


[04:30:01] Project: 7520 (Run 63, Clone 6, Gen 111)
[04:30:01] 
[04:30:01] Assembly optimizations on if available.
[04:30:01] Entering M.D.
[04:30:07] Mapping NT from 4 to 4 
[04:30:30] CoreStatus = C0000417 (-1073740777)
[04:30:30] Client-core communications error: ERROR 0xc0000417


[00:52:44] Project: 7520 (Run 115, Clone 3, Gen 252)
[00:52:44] 
[00:52:44] Assembly optimizations on if available.
[00:52:44] Entering M.D.
[00:52:50] Mapping NT from 4 to 4 
[00:52:50] Gromacs cannot continue further.
[00:52:50] Going to send back what have done -- stepsTotalG=0
[00:52:50] Work fraction=0.0000 steps=0.
[00:52:54] logfile size=2225 infoLength=2225 edr=0 trr=23
[00:52:54] logfile size: 2225 info=2225 bed=0 hdr=23
[00:52:54] - Writing 2761 bytes of core data to disk...
[00:52:54] Done: 2249 -> 1160 (compressed to 51.5 percent)
[00:52:54]   ... Done.
[00:52:55] 
[00:52:55] Folding@home Core Shutdown: UNSTABLE_MACHINE
[00:52:58] CoreStatus = 7A (122)

Code:

[04:28:29] Project: 7520 (Run 65, Clone 6, Gen 116)
[04:28:29] 
[04:28:29] Assembly optimizations on if available.
[04:28:29] Entering M.D.
[04:28:35] Mapping NT from 4 to 4 
[04:28:35] mdrun returned 255
[04:28:35] Going to send back what have done -- stepsTotalG=0
[04:28:35] Work fraction=0.0000 steps=0.
[04:28:39] logfile size=2225 infoLength=2225 edr=0 trr=25
[04:28:39] logfile size: 2225 info=2225 bed=0 hdr=25
[04:28:39] - Writing 2763 bytes of core data to disk...
[04:28:39] Done: 2251 -> 1156 (compressed to 51.3 percent)
[04:28:39]   ... Done.
[04:28:39] 
[04:28:39] Folding@home Core Shutdown: UNSTABLE_MACHINE
[04:28:58] Project: 7520 (Run 88, Clone 6, Gen 114)
[04:28:58] 
[04:28:58] Assembly optimizations on if available.
[04:28:58] Entering M.D.
[04:29:04] Mapping NT from 4 to 4 
[04:29:04] Gromacs cannot continue further.
[04:29:04] Going to send back what have done -- stepsTotalG=500000
[04:29:04] Work fraction=0.0000 steps=500000.
[04:29:08] logfile size=6817 infoLength=6817 edr=0 trr=23
[04:29:08] logfile size: 6817 info=6817 bed=0 hdr=23
[04:29:08] - Writing 7353 bytes of core data to disk...
[04:29:08] Done: 6841 -> 2423 (compressed to 35.4 percent)
[04:29:08]   ... Done.
[04:29:08] 
[04:29:08] Folding@home Core Shutdown: EARLY_UNIT_END
[04:29:11] CoreStatus = 72 (114)


[04:29:26] Project: 7520 (Run 104, Clone 7, Gen 119)
[04:29:26] 
[04:29:26] Assembly optimizations on if available.
[04:29:26] Entering M.D.
[04:29:32] Mapping NT from 4 to 4 
[04:29:32] mdrun returned 255
[04:29:32] Going to send back what have done -- stepsTotalG=0
[04:29:32] Work fraction=0.0000 steps=0.
[04:29:36] logfile size=2225 infoLength=2225 edr=0 trr=25
[04:29:36] logfile size: 2225 info=2225 bed=0 hdr=25
[04:29:36] - Writing 2763 bytes of core data to disk...
[04:29:36] Done: 2251 -> 1157 (compressed to 51.3 percent)
[04:29:36]   ... Done.
[04:29:37] 
[04:29:37] Folding@home Core Shutdown: UNSTABLE_MACHINE
[04:29:40] CoreStatus = 7A (122)



[18:51:20] Project: 7520 (Run 64, Clone 9, Gen 1)
[18:51:20] 
[18:51:20] Assembly optimizations on if available.
[18:51:20] Entering M.D.
[18:51:26] Mapping NT from 4 to 4 
[18:51:26] mdrun returned 255
[18:51:26] Going to send back what have done -- stepsTotalG=0
[18:51:26] Work fraction=0.0000 steps=0.
[18:51:30] logfile size=2225 infoLength=2225 edr=0 trr=25
[18:51:30] logfile size: 2225 info=2225 bed=0 hdr=25
[18:51:30] - Writing 2763 bytes of core data to disk...
[18:51:30] Done: 2251 -> 1157 (compressed to 51.3 percent)
[18:51:30]   ... Done.
[18:51:30] 
[18:51:30] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:51:34] CoreStatus = 7A (122)


[18:52:29] Project: 7520 (Run 67, Clone 9, Gen 1)
[18:52:29] 
[18:52:29] Assembly optimizations on if available.
[18:52:29] Entering M.D.
[18:52:35] Mapping NT from 4 to 4 
[18:52:35] mdrun returned 255
[18:52:35] Going to send back what have done -- stepsTotalG=0
[18:52:35] Work fraction=0.0000 steps=0.
[18:52:39] logfile size=2225 infoLength=2225 edr=0 trr=25
[18:52:39] logfile size: 2225 info=2225 bed=0 hdr=25
[18:52:39] - Writing 2763 bytes of core data to disk...
[18:52:39] Done: 2251 -> 1162 (compressed to 51.6 percent)
[18:52:39]   ... Done.
[18:52:42] 
[18:52:42] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:52:44] CoreStatus = 7A (122)


[19:58:24] Project: 7520 (Run 41, Clone 5, Gen 200)
[19:58:24] 
[19:58:24] Assembly optimizations on if available.
[19:58:24] Entering M.D.
[19:58:30] Mapping NT from 8 to 8 
[19:58:38] CoreStatus = C0000417 (-1073740777)
[19:58:38] Client-core communications error: ERROR 0xc0000417
[19:58:38] Deleting current work unit & continuing...
[19:58:50] - Preparing to get new work unit...
[19:58:50] Cleaning up work directory



[19:17:58] Project: 7520 (Run 55, Clone 8, Gen 5)
[19:17:58] 
[19:17:58] Assembly optimizations on if available.
[19:17:58] Entering M.D.
[19:18:04] Mapping NT from 8 to 8 
[19:18:14] CoreStatus = C0000417 (-1073740777)
[19:18:14] Client-core communications error: ERROR 0xc0000417
[19:18:14] Deleting current work unit & continuing...
[19:18:26] - Preparing to get new work unit...
[19:18:26] Cleaning up work directory
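
With this many fragments, it can help to group the reported CoreStatus values by work unit. The sketch below assumes the v6 log format shown above; the function names are hypothetical. It also shows that the number in parentheses after CoreStatus is the same value as the hex code, read as a signed 32-bit integer.

```python
import re
from collections import defaultdict

PROJECT = re.compile(
    r"^\[\d{2}:\d{2}:\d{2}\] Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)"
)
STATUS = re.compile(r"^\[\d{2}:\d{2}:\d{2}\] CoreStatus = ([0-9A-Fa-f]+) \((-?\d+)\)")

def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - (1 << 32) if value & 0x80000000 else value

def outcomes_by_unit(lines):
    """Map (project, run, clone, gen) to the hex CoreStatus codes logged after it."""
    result = defaultdict(list)
    current = None
    for line in lines:
        m = PROJECT.match(line)
        if m:
            current = tuple(int(g) for g in m.groups())
            continue
        m = STATUS.match(line)
        if m and current is not None:
            result[current].append(m.group(1).upper())
    return dict(result)

sample = [
    "[18:18:09] Project: 7520 (Run 89, Clone 4, Gen 425)",
    "[18:18:09] Entering M.D.",
    "[18:18:25] CoreStatus = C0000417 (-1073740777)",
    "[00:52:44] Project: 7520 (Run 115, Clone 3, Gen 252)",
    "[00:52:55] Folding@home Core Shutdown: UNSTABLE_MACHINE",
    "[00:52:58] CoreStatus = 7A (122)",
]
found = outcomes_by_unit(sample)
for unit, codes in found.items():
    for code in codes:
        print(unit, code, to_signed32(int(code, 16)))
```

Units whose log fragment ends before a CoreStatus line do not appear in the result, so a missing entry does not mean the unit ran cleanly.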

Re: 75XX Project issues (crashes, too many steps etc.)

Posted: Thu Mar 12, 2015 7:45 pm
by 7im
You've been running v6 too long to remember to change the Machine ID value in the config after deleting the WU info to force a new WU in Windows, or to delete the ID .dat file in Linux to effect the same change.

Re: 75XX Project issues (crashes, too many steps etc.)

Posted: Thu Mar 12, 2015 8:14 pm
by autogrog
Since 6.34 apparently does not report bad WUs, here is a current list of failed WUs:

7520 (Run 123, Clone 2, Gen 492)
7520 (Run 20, Clone 6, Gen 98)
7520 (Run 30, Clone 7, Gen 57)
7520 (Run 123, Clone 2, Gen 492)

Error: Could not get length of results file
7520 (Run 102, Clone 2, Gen 496): many consecutive attempts, 1.5 hours of retrying before I dumped it
7520 (Run 81, Clone 6, Gen 49): 14 minutes of retries before a different (successful) WU

Re: 75XX Project issues (crashes, too many steps etc.)

Posted: Fri Mar 13, 2015 11:00 pm
by toTOW
Joe_H wrote: Persons running the version 6 client will not be sending in a failure report when it fails a WU; that is a feature that usually works in the version 7 client. That is one reason to upgrade clients to the current release.
People with the v7 client might not either, if these WUs crash their clients like mine ... :?

Re: 75XX Project issues (crashes, too many steps etc.)

Posted: Fri Mar 13, 2015 11:05 pm
by uncle_fungus
toTOW wrote: People with the v7 client might not either, if these WUs crash their clients like mine ... :?
Agreed, there are actually 2 issues here:
  1. There are broken work units which show a number of different failure modes
  2. The client crashes when the core dies in one of those failure modes

Re: 75XX Project issues (crashes, too many steps etc.)

Posted: Sat Mar 14, 2015 12:14 am
by 7im
toTOW wrote:
Joe_H wrote: Persons running the version 6 client will not be sending in a failure report when it fails a WU; that is a feature that usually works in the version 7 client. That is one reason to upgrade clients to the current release.
People with the v7 client might not either, if these WUs crash their clients like mine ... :?
Which is better? A v6 client stuck in an endless loop, crashing and downloading the same WU over and over, which might not get noticed for a few days, or a v7 client with a crash notification on the screen as soon as it happens?

And while the v6 client doesn't feed into the newer server analytics like v7 does, the researcher can still dig a little and find which runs or clones are stuck at a very low generation number (like in the old days) and regenerate them to restart them. Dr. Kasson has been around long enough to know how to do that; it's just tedious. ;) Plus, if you wait long enough, the WUs that fail and get dumped from a v6 client eventually get assigned to a v7 client, and then the analytics work. It just takes longer to fix all the reported bad WUs. :?