Project: 3062 (Run 4, Clone 91, Gen 40) repeated hang-ups

Moderators: Site Moderators, FAHC Science Team

dschief
Posts: 146
Joined: Tue Dec 04, 2007 5:56 am
Hardware configuration: ASUS P5K-E, Q6600/ 8 gig ram Win-7

2X ASUS z97-K 16 G Ram Win-7_64

Project: 3062 (Run 4, Clone 91, Gen 40) repeated hang-ups

Post by dschief »

It got to 15% three times, printed the { long 1-4 interactions msg }, then froze with no error code. I've deleted everything and started over each time.
Now it's started the same WU a fourth time! If it crashes again, I'll most likely leave that rig shut down.
Ivoshiee
Site Moderator
Posts: 822
Joined: Sun Dec 02, 2007 12:05 am
Location: Estonia

Re: P3062 lamda5_99sb run 4 clone 91 gen 40 repeated hang-ups

Post by Ivoshiee »

If the WU is repeatedly crashing at the very same spot, then post the relevant parts of FAHlog.txt, archive the WU, and dump it. After a couple of attempts to send it to you again, the assignment logic will send you something else instead.
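For anyone who wants to grab the "relevant parts" without scrolling, here is a minimal sketch that pulls the last lines of a log file so they can be pasted into a post. The function name `tail_log` and the default of 60 lines are my own assumptions, not anything the client provides; adjust the count until the freeze and the lines leading up to it are included.

```python
from collections import deque
from pathlib import Path


def tail_log(log_path, max_lines=60):
    """Return the last `max_lines` lines of a text log as one string.

    `tail_log` is a hypothetical helper for illustration only.
    """
    # errors="replace" keeps the read from failing on odd bytes in the log.
    with Path(log_path).open("r", encoding="utf-8", errors="replace") as fh:
        # A bounded deque keeps only the most recent lines while reading.
        return "".join(deque(fh, maxlen=max_lines))


if __name__ == "__main__":
    # Assumes FAHlog.txt sits in the current directory; change the path as needed.
    print(tail_log("FAHlog.txt"))
```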
tear
Posts: 254
Joined: Sun Dec 02, 2007 4:08 am
Hardware configuration: None
Location: Rocky Mountains

Re: P3062 lamda5_99sb run 4 clone 91 gen 40 repeated hang-ups

Post by tear »

Hey dschief,

My personal take is that "hang-up" and "segfault/client-core comm" are the same class of problems
[different manifestations of the same issue].

Performing the stop-before-failure-and-start-again workaround is not an unreasonable thing to do, IMHO.


tear
One man's ceiling is another man's floor.
dschief
Posts: 146
Joined: Tue Dec 04, 2007 5:56 am
Hardware configuration: ASUS P5K-E, Q6600/ 8 gig ram Win-7

2X ASUS z97-K 16 G Ram Win-7_64

Re: P3062 lamda5_99sb run 4 clone 91 gen 40 repeated hang-ups

Post by dschief »

Ivoshiee wrote:If the WU is repeatedly crashing at the very same spot, then post the relevant parts of FAHlog.txt, archive the WU, and dump it. After a couple of attempts to send it to you again, the assignment logic will send you something else instead.
I doubt there is any logic behind the assignment process. As noted in my previous post, after 3 straight failures the same WU was downloaded a fourth time. That one also crashed right at 15%. And upon restart, the same WU was downloaded a 5th time.
I've shut down that box.
anandhanju
Posts: 522
Joined: Mon Dec 03, 2007 4:33 am
Location: Australia

Re: Project: 3062 (Run 4, Clone 91, Gen 40) repeated hang-ups

Post by anandhanju »

As tear suggested, you can try shutting down the client at 12% or so, wait for a minute or two and then fire it up. This step has been observed to get around repeated failures and you should be able to continue.
dschief
Posts: 146
Joined: Tue Dec 04, 2007 5:56 am
Hardware configuration: ASUS P5K-E, Q6600/ 8 gig ram Win-7

2X ASUS z97-K 16 G Ram Win-7_64

Re: Project: 3062 (Run 4, Clone 91, Gen 40) repeated hang-ups

Post by dschief »

anandhanju wrote:As tear suggested, you can try shutting down the client at 12% or so, wait for a minute or two and then fire it up. This step has been observed to get around repeated failures and you should be able to continue.
I am aware of that trick, and in the past I have been able to recover an occasional WU with it. I already attempted this procedure on this WU, and it crashed at 18%. This is just a crappy package.
ChelseaOilman
Posts: 1037
Joined: Sun Dec 02, 2007 3:47 pm
Location: Colorado @ 10,000 feet

Re: Project: 3062 (Run 4, Clone 91, Gen 40) repeated hang-ups

Post by ChelseaOilman »

dschief wrote:I've deleted everything and started over each time.
Specifically, what have you been deleting? What OS: Windows or Linux? Deleting everything may be why you're getting the WU assigned to you more than 3 times.

If you still have the queue.dat file and the work folder you can zip them up and email them to me to try. You can delete any files from previous WUs in the work folder first. Email them to my chelseaoilman gmail account.
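If zipping by hand is a chore, the packaging step above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `archive_wu` is hypothetical, and it assumes `queue.dat` and the `work` folder sit directly inside the client directory, as described above. Clear out files from previous WUs in the work folder before running it.

```python
import zipfile
from pathlib import Path


def archive_wu(client_dir, out_zip):
    """Zip queue.dat and everything under work/ from `client_dir`.

    Hypothetical helper for illustration; returns the archived
    paths relative to `client_dir`.
    """
    client_dir = Path(client_dir)
    members = []

    queue = client_dir / "queue.dat"
    if queue.is_file():
        members.append(queue)

    work = client_dir / "work"
    if work.is_dir():
        # Collect the file list before the archive is created.
        members.extend(sorted(p for p in work.rglob("*") if p.is_file()))

    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in members:
            # Store paths relative to the client directory.
            zf.write(path, path.relative_to(client_dir).as_posix())

    return [p.relative_to(client_dir).as_posix() for p in members]
```

Calling `archive_wu(".", "wu_archive.zip")` from the client directory would write the archive next to `queue.dat`; the output name is just an example.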
dschief
Posts: 146
Joined: Tue Dec 04, 2007 5:56 am
Hardware configuration: ASUS P5K-E, Q6600/ 8 gig ram Win-7

2X ASUS z97-K 16 G Ram Win-7_64

Re: Project: 3062 (Run 4, Clone 91, Gen 40) repeated hang-ups

Post by dschief »

ChelseaOilman wrote:
dschief wrote:I've deleted everything and started over each time.
Specifically, what have you been deleting? What OS: Windows or Linux? Deleting everything may be why you're getting the WU assigned to you more than 3 times.

If you still have the queue.dat file and the work folder you can zip them up and email them to me to try. You can delete any files from previous WUs in the work folder first. Email them to my chelseaoilman gmail account.

ASUS P5K-E / Intel Q6600, 2 GB RAM, running Fedora Linux.

That folder is gone; I've done a fresh download and install of F@H6. I've got 5 other clients to monitor besides this one, and too much time has already been wasted on one WU.
rbrandman
Pande Group Member
Posts: 22
Joined: Wed May 14, 2008 4:11 pm

Re: Project: 3062 (Run 4, Clone 91, Gen 40) repeated hang-ups

Post by rbrandman »

Thanks for your post. I have notified the researcher in charge of that project, Dan Ensign, so he can look into it.

Relly