Project 781 -- FahCore_a0.exe ERRORs -- Deletes WUs
Posted: Mon Sep 01, 2008 3:52 am
I'm not sure what's going on here, but I have yet to successfully submit a work unit on my Ubuntu 8.04.1 machine.
Every failure so far has been with FahCore_a0.exe: at least two cases of "ERROR 0x0" and another one or two with different hex codes. Any guesses whether these are just flukes? Is anyone else seeing these "results"?
To the point: pasted below the asterisks (and below the config stuff) are the errors I've been getting on every single work unit on the Gromacs 3.3 core (FahCore_a0.exe). I am using the latest available Linux client. There were a few errors before the work units pasted below, but the logs don't go back that far. This is my second attempt to get fah6 working properly on this computer: I completely erased the earlier folding installation (so I don't have those logs either) and started from scratch.
As I recall, there were at least two previous instances of "ERROR 0x0", whatever that is, plus one or two with different hex codes. It also seems I'm only being assigned core_a0 work units, none of which will finish...
I've checked the file permissions, and the user running the client has full permissions on the containing directory and everything below it. I don't see anything at the OS level that would cause trouble for the client. There are gigabytes free on that partition, and RAM use hovers around 50% (I have 1GB, plus 1.4GB swap).
I have yet to successfully submit *any* results with this computer, and it has been running for about three weeks now. I'd really like to have this machine contribute to the project. Here are some details about my configuration:
CONFIG STUFF
nate@Redtail:/usr/local/folding> cat client.cfg
[settings]
username=Arfyness
team=45104
passkey=<<REMOVED>>
asknet=no
machineid=1
local=10
[http]
active=no
host=localhost
port=8080
[clienttype]
memory=200
type=3
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
nate@Redtail:/usr/local/folding> uname -a
Linux Redtail 2.6.24-19-generic #1 SMP Wed Aug 20 22:56:21 UTC 2008 i686 GNU/Linux
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
nate@Redtail:/usr/local/folding> ./fah6 -queueinfo
[--- SNIP --- ]
CURRENT QUEUE:
00 EMPTY
01 EMPTY
02 EMPTY
03 EMPTY
04 *READY a0 171.64.122.138:8080 August 30 12:54 | January 31 12:54
[ P781R0C83F2 ]
05 EMPTY
06 EMPTY
07 EMPTY
08 EMPTY
09 EMPTY
Folding@Home Client Shutdown.
Unit 02 - August 26
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
--- Opening Log file [August 25 16:58:05]
< --- SNIP --- >
[16:58:06] Loaded queue successfully.
[16:58:06]
[16:58:06] + Processing work unit
[16:58:06] Core required: FahCore_a0.exe
[16:58:06] Core found.
[16:58:06] Working on Unit 02 [August 25 16:58:06]
[16:58:06] + Working ...
[16:58:06]
[16:58:06] *------------------------------*
[16:58:06] Folding@Home Gromacs 3.3 Core
[16:58:06] Version 1.92 (April 17. 2007)
[16:58:06]
[16:58:06] Preparing to commence simulation
[16:58:06] - Looking at optimizations...
[16:58:06] - Files status OK
[16:58:06] - Expanded 1169210 -> 6252409 (decompressed 534.7 percent)
[16:58:06]
[16:58:06] Project: 781 (Run 0, Clone 83, Gen 2)
[16:58:06]
[16:58:06] Assembly optimizations on if available.
[16:58:06] Entering M.D.
[16:58:28] (Starting from checkpoint)
[16:58:28] Protein: Mini chaperonin
[16:58:28] Writing local files
[16:58:28] Completed 116429 out of 500000 steps (23%)
[16:58:28] Extra 3DNow boost OK.
[16:58:28] Extra SSE boost OK.
[18:00:50] Writing local files
[18:00:51] Completed 120000 out of 500000 steps (24 percent)
[19:23:53] Writing local files
< --- SNIP --- >
[14:30:01] Completed 190000 out of 500000 steps (38 percent)
[16:00:06] Writing local files
[16:00:06] Completed 195000 out of 500000 steps (39 percent)
[16:09:25] CoreStatus = 0 (0)
[16:09:25] Client-core communications error: ERROR 0x0
[16:09:25] Deleting current work unit & continuing...
[16:09:43] - Preparing to get new work unit...
[16:09:43] + Attempting to get work packet
[16:09:43] - Connecting to assignment server
[16:09:44] - Successful: assigned to (171.64.122.138).
[16:09:44] + News From Folding@Home: Welcome to Folding@Home
[16:09:44] Loaded queue successfully.
[16:09:49] + Closed connections
[16:09:54]
[16:09:54] + Processing work unit
[16:09:54] Core required: FahCore_a0.exe
[16:09:54] Core found.
[16:09:54] Working on Unit 03 [August 26 16:09:54]
[16:09:54] + Working ...
[16:09:54]
[16:09:54] *------------------------------*
[16:09:54] Folding@Home Gromacs 3.3 Core
[16:09:54] Version 1.92 (April 17. 2007)
[16:09:54]
[16:09:54] Preparing to commence simulation
[16:09:54] - Looking at optimizations...
[16:09:54] - Created dyn
[16:09:54] - Files status OK
[16:09:55] - Expanded 1169210 -> 6252409 (decompressed 534.7 percent)
[16:09:55] - Starting from initial work packet
[16:09:55]
[16:09:55] Project: 781 (Run 0, Clone 83, Gen 2)
[16:09:55]
[16:09:55] Assembly optimizations on if available.
[16:09:55] Entering M.D.
[16:10:01] Protein: Mini chaperonin
[16:10:01] Writing local files
[16:10:02] Extra 3DNow boost OK.
[16:10:02] Extra SSE boost OK.
[16:10:03] Writing local files
[16:10:03] Completed 0 out of 500000 steps (0 percent)
[17:38:44] Writing local files
[17:38:44] Completed 5000 out of 500000 steps (1 percent)
Unit 03 - August 30
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
--- Opening Log file [August 27 16:38:17]
< --- SNIP --- >
[16:38:17] Loaded queue successfully.
[16:38:17]
[16:38:17] + Processing work unit
[16:38:17] Core required: FahCore_a0.exe
[16:38:17] Core found.
[16:38:17] Working on Unit 03 [August 27 16:38:17]
[16:38:17] + Working ...
[16:38:18]
[16:38:18] *------------------------------*
[16:38:18] Folding@Home Gromacs 3.3 Core
[16:38:18] Version 1.92 (April 17. 2007)
[16:38:18]
[16:38:18] Preparing to commence simulation
[16:38:18] - Looking at optimizations...
[16:38:18] - Files status OK
[16:38:20] - Expanded 1169210 -> 6252409 (decompressed 534.7 percent)
[16:38:20]
[16:38:20] Project: 781 (Run 0, Clone 83, Gen 2)
[16:38:20]
[16:38:20] Assembly optimizations on if available.
[16:38:20] Entering M.D.
[16:38:43] (Starting from checkpoint)
[16:38:43] Protein: Mini chaperonin
[16:38:43] Writing local files
[16:38:43] Completed 64865 out of 500000 steps (12%)
[16:38:44] Extra 3DNow boost OK.
[16:38:44] Extra SSE boost OK.
[16:42:29] Writing local files
[16:42:29] Completed 65000 out of 500000 steps (13 percent)
< --- SNIP --- >
[10:59:17] Completed 290000 out of 500000 steps (58 percent)
[12:30:02] Writing local files
[12:30:02] Completed 295000 out of 500000 steps (59 percent)
-------------------------------------------------------
Program Core_A0.exe, VERSION 3.3
Source code file: fatal.c, line: 342
Fatal error:
NaN detected: (ener[20])
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
[12:54:12] Gromacs error.
[12:54:12]
[12:54:12] Folding@home Core Shutdown: UNKNOWN_ERROR
[12:54:13] CoreStatus = 79 (121)
[12:54:13] Client-core communications error: ERROR 0x79
[12:54:13] Deleting current work unit & continuing...
[12:54:30] - Preparing to get new work unit...
[12:54:30] + Attempting to get work packet
[12:54:30] - Connecting to assignment server
[12:54:31] - Successful: assigned to (171.64.122.138).
[12:54:31] + News From Folding@Home: Welcome to Folding@Home
[12:54:31] Loaded queue successfully.
[12:54:36] + Closed connections
[12:54:41]
[12:54:41] + Processing work unit
[12:54:41] Core required: FahCore_a0.exe
[12:54:41] Core found.
[12:54:41] Working on Unit 04 [August 30 12:54:41]
[12:54:41] + Working ...
[12:54:41]
[12:54:41] *------------------------------*
[12:54:41] Folding@Home Gromacs 3.3 Core
[12:54:41] Version 1.92 (April 17. 2007)
[12:54:41]
[12:54:41] Preparing to commence simulation
[12:54:41] - Looking at optimizations...
[12:54:41] - Created dyn
[12:54:41] - Files status OK
[12:54:42] - Expanded 1169210 -> 6252409 (decompressed 534.7 percent)
[12:54:42] - Starting from initial work packet
[12:54:42]
[12:54:42] Project: 781 (Run 0, Clone 83, Gen 2)
[12:54:42]
[12:54:42] Assembly optimizations on if available.
[12:54:42] Entering M.D.
No option -tpi
starting mdrun 'Mini chaperonin'
500000 steps, 1000.0 ps.
[12:54:48] Protein: Mini chaperonin
[12:54:48] Writing local files
[12:54:49] Extra 3DNow boost OK.
[12:54:49] Extra SSE boost OK.
[12:54:50] Writing local files
[12:54:50] Completed 0 out of 500000 steps (0 percent)
[14:25:38] Writing local files
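For anyone who wants to check their own log for the same failures, here is a minimal sketch in Python that pulls out the "CoreStatus" lines shown in the logs above. The regular expression is inferred only from the excerpts in this post, not from any client documentation, and the function name `find_core_failures` is just a hypothetical helper.

```python
import re

# Matches lines such as "[12:54:13] CoreStatus = 79 (121)".
# The pattern is an assumption based on the log excerpts in this post.
CORE_STATUS = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\] CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)"
)

def find_core_failures(lines):
    """Return (time, hex_code, decimal_code) for every CoreStatus line."""
    failures = []
    for line in lines:
        match = CORE_STATUS.match(line.strip())
        if match:
            failures.append((match.group(1), match.group(2), int(match.group(3))))
    return failures

# Sample lines copied from the two failures in this post.
sample = [
    "[16:09:25] CoreStatus = 0 (0)",
    "[16:09:25] Client-core communications error: ERROR 0x0",
    "[12:54:13] CoreStatus = 79 (121)",
    "[12:54:13] Client-core communications error: ERROR 0x79",
]

failures = find_core_failures(sample)
for when, hex_code, dec_code in failures:
    # In these excerpts the number in parentheses is the hex code in decimal.
    print(f"{when}  status 0x{hex_code} ({dec_code})")
```

In the excerpts above, the value in parentheses looks like the decimal form of the hex status (0x79 is 121), which is why the sketch keeps both.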
Right now I'm running the client from an xterm so I can watch what it is doing. Pressing CTRL-C seems to be the best way to stop it before shutting down or rebooting, and that doesn't seem to cause any problems.
I hope to get this resolved because, as you can see in the logs, this is a HUGE work unit, and it runs for DAYS before it crashes and restarts.
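To put a rough number on "DAYS", here is a small Python sketch that extrapolates from two timestamps in the Unit 03 log above (0 steps at 16:10:03 and 5000 steps at 17:38:44). It assumes the client runs continuously at a constant rate, which is a simplification, and `estimate_total` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

def estimate_total(t_start, t_end, steps_done, total_steps):
    """Extrapolate total run time from the time taken for steps_done steps."""
    fmt = "%H:%M:%S"
    elapsed = datetime.strptime(t_end, fmt) - datetime.strptime(t_start, fmt)
    if elapsed < timedelta(0):
        # The log only records the time of day, so assume a single midnight rollover.
        elapsed += timedelta(days=1)
    return elapsed * (total_steps / steps_done)

# Timestamps taken from the start of Unit 03 in the log above.
total = estimate_total("16:10:03", "17:38:44", 5000, 500000)
print(total)
```

With those two timestamps the estimate comes out to a little over six days for the full 500000 steps, so a failure at 39 or 59 percent throws away several days of work.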
Thanks,
-- Nate