Project: 3062 (Run 4, Clone 91, Gen 40) repeated hang-ups
Posted: Sat May 31, 2008 3:17 pm
by dschief
Got to 15% three times, printed out the { long 1-4 interactions msg }, then froze with no error code. I've deleted everything and started over each time.
Now it's started the same WU a fourth time! If it crashes again, I'll most likely leave that rig shut down.
Re: P3062 lamda5_99sb run 4 clone 91 gen 40 repeated hang-ups
Posted: Sat May 31, 2008 4:07 pm
by Ivoshiee
If the WU is repeatedly crashing at the very same spot, post the relevant parts of FAHlog.txt, archive the WU and dump it. After a couple of attempts to send it to you again, the assignment logic will send you something else instead.
Re: P3062 lamda5_99sb run 4 clone 91 gen 40 repeated hang-ups
Posted: Sat May 31, 2008 4:49 pm
by tear
Hey dschief,
My personal take is that "hang-up" and "segfault/client-core comm" are the same class of problem
[different manifestations of the same issue].
Performing the stop-before-failure-and-start-again workaround is not an unreasonable thing to do, IMHO.
tear
Re: P3062 lamda5_99sb run 4 clone 91 gen 40 repeated hang-ups
Posted: Sun Jun 01, 2008 3:03 am
by dschief
Ivoshiee wrote: If the WU is repeatedly crashing at the very same spot, post the relevant parts of FAHlog.txt, archive the WU and dump it. After a couple of attempts to send it to you again, the assignment logic will send you something else instead.
I doubt there is any logic behind the assignment process. As noted in my previous post, after three straight failures the same WU was downloaded
a fourth time. That one also crashed right at 15%, and upon restart the same WU was downloaded a fifth time.
I've shut down that box.
Re: Project: 3062 (Run 4, Clone 91, Gen 40) repeated hang-ups
Posted: Sun Jun 01, 2008 5:11 am
by anandhanju
As tear suggested, you can try shutting down the client at 12% or so, wait for a minute or two and then fire it up. This step has been observed to get around repeated failures and you should be able to continue.
Re: Project: 3062 (Run 4, Clone 91, Gen 40) repeated hang-ups
Posted: Sun Jun 01, 2008 2:06 pm
by dschief
anandhanju wrote: As tear suggested, you can try shutting down the client at 12% or so, wait for a minute or two and then fire it up. This step has been observed to get around repeated failures and you should be able to continue.
I'm aware of that trick, and in the past I have been able to recover the occasional WU. I already attempted this procedure on this WU
and it crashed at 18%. This is just a crappy package.
Re: Project: 3062 (Run 4, Clone 91, Gen 40) repeated hang-ups
Posted: Sun Jun 01, 2008 3:14 pm
by ChelseaOilman
dschief wrote: I've deleted everything and started over each time.
Specifically, what have you been deleting? What OS: Windows or Linux? Deleting everything may be why you're getting the WU assigned to you more than three times.
If you still have the queue.dat file and the work folder, you can zip them up and email them to me to try. You can delete any files from previous WUs in the work folder first. Email them to my chelseaoilman Gmail account.
Re: Project: 3062 (Run 4, Clone 91, Gen 40) repeated hang-ups
Posted: Sun Jun 01, 2008 3:43 pm
by dschief
ChelseaOilman wrote: dschief wrote: I've deleted everything and started over each time.
Specifically, what have you been deleting? What OS: Windows or Linux? Deleting everything may be why you're getting the WU assigned to you more than three times.
If you still have the queue.dat file and the work folder, you can zip them up and email them to me to try. You can delete any files from previous WUs in the work folder first. Email them to my chelseaoilman Gmail account.
ASUS P5K-E / Intel Q6600, 2 GB RAM, running Fedora Linux.
That folder is gone; I've done a fresh download and install of f@h6. I've got five other clients to monitor besides this one, and too much time has already been wasted on one WU.
Re: Project: 3062 (Run 4, Clone 91, Gen 40) repeated hang-ups
Posted: Sun Jun 01, 2008 4:01 pm
by rbrandman
Thanks for your post. I have notified the researcher in charge of that project, Dan Ensign, so he can look into it.
Relly