Project: 6892 (Run 150, Clone 15, Gen 40)

Posted: Wed Dec 14, 2011 12:15 am
by Napoleon
Could unit 00 be a bad WU? I was running some fairly heavy-duty stuff of my own when FAHControl (v7.1.38) suddenly froze, and I ended up killing the FAHControl process tree in Task Manager. The last thing registered in this log is a connection attempt to a WS. BTW, what does "FahCore, running Unit 00, returned: FAILED_2 (1 = 0x1)" mean?

13:50:53:Unit 00:Writing local files
13:50:54:Unit 00:Completed 235000 out of 250000 steps  (94%)
13:51:26:Unit 03:Completed  25500000 out of 50000000 steps (51%).
13:54:32:Unit 01:Completed   7950000 out of 15000000 steps (53%).
13:56:13:Unit 03:Completed  26000000 out of 50000000 steps (52%).
13:57:13:Unit 02:Timered checkpoint triggered.
14:01:00:Unit 03:Completed  26500000 out of 50000000 steps (53%).
14:01:44:Unit 01:Completed   8100000 out of 15000000 steps (54%).
14:05:06:Unit 02:Writing local files
14:05:06:Unit 02:Completed 77500 out of 250000 steps  (31%)
14:05:46:Unit 03:Completed  27000000 out of 50000000 steps (54%).
14:09:07:Unit 01:Completed   8250000 out of 15000000 steps (55%).


14:10:33:Unit 03:Completed  27500000 out of 50000000 steps (55%).
14:15:20:Unit 03:Completed  28000000 out of 50000000 steps (56%).
14:16:06:Unit 01:Completed   8400000 out of 15000000 steps (56%).
14:20:06:Unit 03:Completed  28500000 out of 50000000 steps (57%).
14:20:58:Unit 00:Timered checkpoint triggered.
14:23:19:Unit 01:Completed   8550000 out of 15000000 steps (57%).
14:24:53:Unit 03:Completed  29000000 out of 50000000 steps (58%).
14:29:39:Unit 03:Completed  29500000 out of 50000000 steps (59%).
14:30:18:Unit 01:Completed   8700000 out of 15000000 steps (58%).
14:34:26:Unit 03:Completed  30000000 out of 50000000 steps (60%).
14:35:07:Unit 02:Timered checkpoint triggered.
14:37:31:Unit 01:Completed   8850000 out of 15000000 steps (59%).
14:39:13:Unit 03:Completed  30500000 out of 50000000 steps (61%).
14:41:40:Unit 00:Writing local files
14:41:41:Unit 00:Completed 237500 out of 250000 steps  (95%)
14:43:59:Unit 03:Completed  31000000 out of 50000000 steps (62%).
14:44:31:Unit 01:Completed   9000000 out of 15000000 steps (60%).
14:48:46:Unit 03:Completed  31500000 out of 50000000 steps (63%).
14:51:47:Unit 01:Completed   9150000 out of 15000000 steps (61%).
14:53:33:Unit 03:Completed  32000000 out of 50000000 steps (64%).
14:58:19:Unit 03:Completed  32500000 out of 50000000 steps (65%).
14:58:53:Unit 01:Completed   9300000 out of 15000000 steps (62%).
15:02:45:FahCore, running Unit 00, returned: FAILED_2 (1 = 0x1)
15:02:46:Sending unit results: id:00 state:SEND error:FAILED project:6892 run:150 clone:15 gen:40 core:0x78 unit:0x000000286652edc54e25ca2f38b5c6d4
15:02:46:Connecting to 171.67.108.53:8080

The story continues when I restarted FAHControl a bit later. On a side note, I've just recently tried my luck with ION1 folding once again, after I noticed that the troublesome (for ION1, that is) 660x projects no longer appear in psummary. I don't think retrying ION folding has anything to do with this new problem/anomaly, but I figured I'd mention it, just in case.

15:10:44:Trying to access database...
15:10:44:Successfully acquired database lock
15:10:44:Enabled folding slot 00: PAUSED gpu:1:"GF108 [GeForce GT 430]"
15:10:44:Enabled folding slot 01: PAUSED uniprocessor
15:10:44:Enabled folding slot 02: PAUSED uniprocessor
15:10:44:Enabled folding slot 03: PAUSED uniprocessor
15:10:44:Enabled folding slot 04: PAUSED uniprocessor
15:10:44:Enabled folding slot 05: PAUSED smp:4
15:10:44:Enabled folding slot 06: PAUSED gpu:0:"ION VGA"
15:10:44:Started thread 1 on PID 4052
15:10:44:Started thread 3 on PID 4052
15:10:44:Downloading project 5765 description
15:10:44:Started thread 5 on PID 4052
15:10:44:Started thread 6 on PID 4052
15:10:44:Started thread 4 on PID 4052
15:10:44:Connecting to fah-web.stanford.edu:80
15:10:44:Sending unit results: id:00 state:SEND error:FAILED project:6892 run:150 clone:15 gen:40 core:0x78 unit:0x000000286652edc54e25ca2f38b5c6d4
15:10:44:Connecting to 171.67.108.53:8080
15:10:45:Server responded GOT_ALREADY (434)
15:10:45:WARNING: Server did not like results, dumping
15:10:45:Cleaning up Unit 00

Apparently the WS (171.67.108.53) got something back from me, even though there's a gap in the logging. The gap is most likely there because I did "End process tree" on FAHControl.exe; I suppose I should've settled for "End Process" only. Anyway, even after restarting FAHControl, my system has been acting kind of funny (it can get horribly unresponsive occasionally), so I can't say for sure whether a bad WU caused this anomaly. It may be that my own stuff was causing some weird instability that manifested itself as a FahCore crash.

Yes, my CPU was OC'd from 1.6GHz to 2.1GHz, but it has been rock stable for almost a year (more info in sig link). In any case, it's time to install some OS updates right now, so I'll be watching closely to see if said updates and a reboot fix the problem(s). Nevertheless, I'd appreciate it if someone could follow up on this particular WU and let me know if it really is a bad one. TIA. :e)
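
For anyone who wants to pull the relevant events out of a long log like the ones above, here is a minimal Python sketch. The regular expressions only mirror the three kinds of lines quoted in this thread (the FahCore return line, the "Sending unit results" line, and the "Server responded" line); they are my own assumptions about the format, not an official log specification. The names `summarize`, `RETURN_RE`, `SEND_RE`, and `RESPONSE_RE` are hypothetical. It does not try to interpret what FAILED_2 means, it just extracts the fields.

```python
import re

# Assumed line shape: "HH:MM:SS:<message>", as seen in the excerpts above.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(.*)$")

# Patterns modeled on the lines quoted in this thread (assumptions, not a spec).
RETURN_RE = re.compile(
    r"FahCore, running Unit (\d+), returned: (\w+) \((\d+) = (0x[0-9a-fA-F]+)\)"
)
SEND_RE = re.compile(
    r"Sending unit results: id:(\d+) state:(\w+) error:(\w+) "
    r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)
RESPONSE_RE = re.compile(r"Server responded (\w+) \((\d+)\)")


def summarize(log_text):
    """Return a list of (timestamp, kind, fields) tuples for notable events."""
    events = []
    for raw in log_text.splitlines():
        line = LINE_RE.match(raw.strip())
        if not line:
            continue  # skip blank lines and anything without a timestamp
        stamp, body = line.group(1), line.group(2)

        m = RETURN_RE.search(body)
        if m:
            events.append((stamp, "core_return", {
                "unit": m.group(1),
                "status": m.group(2),
                "code": int(m.group(3)),
            }))
            continue

        m = SEND_RE.search(body)
        if m:
            events.append((stamp, "send", {
                "unit": m.group(1),
                "error": m.group(3),
                # PRCG = project, run, clone, gen
                "prcg": tuple(int(m.group(i)) for i in range(4, 8)),
            }))
            continue

        m = RESPONSE_RE.search(body)
        if m:
            events.append((stamp, "server_response", {
                "status": m.group(1),
                "code": int(m.group(2)),
            }))
    return events
```

Feeding it the two excerpts above should list the core return at 15:02:45, both send attempts for the same PRCG, and the server response at 15:10:45, which makes the gap between the first send attempt and the restart easy to spot.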


Mod Note: Edited PRCG format in title to make the continued mod database checks easier for us. ~sorto'

Re: project:6892 run:150 clone:15 gen:40

Posted: Wed Dec 14, 2011 10:37 am
by PantherX
Nothing in the WU Database yet so I have marked it for a followup.

Re: project:6892 run:150 clone:15 gen:40

Posted: Tue Jan 24, 2012 3:30 pm
by Napoleon
Any news on this WU yet?

Re: project:6892 run:150 clone:15 gen:40

Posted: Tue Jan 24, 2012 4:58 pm
by PantherX
Sorry, there's nothing so far:
No data back from query
:(

Mod followup 2-7-2012: Still nothing.
Mod followup 2-12-2012: Still nothing.
Mod followup 2-22-2012: Still nothing.
Marked bad 3-4-2012 as there is still nothing in the database.
~sorto'