Completed, but not submitted WU

Moderators: Site Moderators, FAHC Science Team

dutchmm
Posts: 15
Joined: Fri Dec 14, 2007 2:37 pm

Post by dutchmm »

I noticed this morning that the latest WU on my wife's computer had not been credited. When I looked at the log, I could see that it had stopped at 100% and had not submitted itself:

Performance:     23.126      2.874      0.560     42.839
[06:54:08] Completed 500000 out of 500000 steps  (100 percent)
[06:54:08] Writing final coordinates.
[06:54:08] Past main M.D. loop
[06:54:08] Will end MPI now
[06:55:08]
[06:55:08] Finished Work Unit:
[06:55:08] - Reading up to 3721200 from "work/wudata_01.arc": Read 3721200
[06:55:08] - Reading up to 1774276 from "work/wudata_01.xtc": Read 1774276
[06:55:08] goefile size: 0
[06:55:08] logfile size: 16927
[06:55:08] Leaving Run
[06:55:08] - Writing 5516803 bytes of core data to disk...
[06:55:08]   ... Done.
[06:55:08] - Shutting down core
[06:55:08]
[06:55:08] Folding@home Core Shutdown: FINISHED_UNIT
Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.
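
For anyone who wants to make the same check without reading the whole log by eye, here is a minimal Python sketch. It only looks for the two line shapes visible in the log above (the `Completed ... (N percent)` progress line and the `Folding@home Core Shutdown:` line). The function name `summarize_log` is made up for illustration, and other client or core versions may word these lines differently.

```python
import re
from typing import Optional, Tuple

# Patterns copied from the line shapes in the log above; other client/core
# versions may format these lines differently (assumption).
PROGRESS_RE = re.compile(r"Completed \d+ out of \d+ steps\s+\((\d+) percent\)")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def summarize_log(text: str) -> Tuple[Optional[int], Optional[str]]:
    """Return (last reported percent, last core shutdown status) from log text."""
    percent = None
    status = None
    for line in text.splitlines():
        m = PROGRESS_RE.search(line)
        if m:
            percent = int(m.group(1))
        m = SHUTDOWN_RE.search(line)
        if m:
            status = m.group(1)
    return percent, status


sample = """\
[06:54:08] Completed 500000 out of 500000 steps  (100 percent)
[06:55:08] Folding@home Core Shutdown: FINISHED_UNIT
"""
print(summarize_log(sample))  # (100, 'FINISHED_UNIT')
```

A result of `(100, 'FINISHED_UNIT')` with no later sign of an upload in the log matches the situation described here: the unit finished but was never sent.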

I tried to restart the folding service (this is Linux SMP 6.0, beta 2), but the folding client did not want to run:

[09:06:16] - Ask before connecting: No
[09:06:16] - User name: Dutchmm (Team 31574)
[09:06:16] - User ID: 627445492C7D9818
[09:06:16] - Machine ID: 1
[09:06:16]

A potential conflict was detected:

Process 7655 is currently running and may also be a client with Mach. ID 1.
Program will now exit. Upon restart, this check will not be done --
you may wish to check that no client is currently running in
/root before restarting.

Please press any key to exit.
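
Before killing anything, it can help to confirm that the process named in the conflict message really is still alive. The following is a rough sketch for POSIX systems such as Linux; `pid_is_running` is a hypothetical helper name, and it only reports whether the PID exists, not whether the process is actually a folding client.

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        raise ValueError("pid must be positive")
    try:
        # Signal 0 delivers nothing; it only performs the existence and
        # permission checks for the target process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


if __name__ == "__main__":
    # 7655 is the PID named in the conflict message above.
    print(pid_is_running(7655))
```

If this prints `False`, the PID in the message is stale and there is nothing left to kill under that number.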
At this stage, I decided to kill off the rogue process and all the other FAH processes. Still, the client would not start:

[09:08:02] - Ask before connecting: No
[09:08:02] - User name: Dutchmm (Team 31574)
[09:08:02] - User ID: 627445492C7D9818
[09:08:02] - Machine ID: 1
[09:08:02]
[09:08:03] Loaded queue successfully.
[09:08:03]
[09:08:03] + Processing work unit
[09:08:03] Core required: FahCore_a1.exe
[09:08:03] Core found.
[09:08:03] Working on Unit 01 [March 6 09:08:03]
[09:08:03] + Working ...
[09:08:03]
[09:08:03] *------------------------------*
[09:08:03] Folding@Home Gromacs SMP Core
[09:08:03] Version 1.74 (November 27, 2006)
[09:08:03]
[09:08:03] Preparing to commence simulation
[09:08:03] - Ensuring status. Please wait.
[09:08:03] - Shutting down core
[09:08:20] k packet
[09:08:20]
[09:08:20] Project: 0i(Run 0, Clone 0, Gen 0)
[09:08:20]
[09:08:20] Error: Could not write local file.  Exiting.
[09:08:20] ot write local file.  Exiting.
[09:08:25] - Shutting down core
[0]0:Return code = 18
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[09:10:12] CoreStatus = 12 (18)
[09:10:12] Client-core communications error: ERROR 0x12
[09:10:12] Deleting current work unit & continuing...
[0]0:Return code = 0, signaled with Quit
[0]1:Return code = 18
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[09:14:33] - Preparing to get new work unit...
[09:14:33] + Attempting to get work packet
[09:14:33] - Connecting to assignment server
[09:14:34] - Successful: assigned to (171.64.65.56).
[09:14:34] + News From Folding@Home: Welcome to Folding@Home
[09:14:34] Loaded queue successfully.
[09:14:40] + Closed connections
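
A side note on the numbers in this log: `Return code = 18`, `CoreStatus = 12 (18)` and `ERROR 0x12` look like one and the same value, with 12 read as hexadecimal and 18 as its decimal form. That reading is an assumption based on the log alone, but the arithmetic is easy to check. The helper name below is made up for illustration.

```python
def core_status_to_decimal(hex_status: str) -> int:
    """Interpret a status string such as '12' or '0x12' as hexadecimal."""
    return int(hex_status, 16)


print(core_status_to_decimal("12"))    # 18
print(core_status_to_decimal("0x12"))  # 18
```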
So I decided that perhaps the only way to get this WU uploaded was to bounce the machine. But no, it still does not want to submit this WU's results, which, despite the alarming messages in the log, are still available. So my question is: where do I send them manually, and using which protocol?

FWIW, the unit in question is: Project: 2605 (Run 6, Clone 146, Gen 33)

Thanks in advance for your help.

Mike
Ren02
Posts: 98
Joined: Tue Dec 11, 2007 1:16 am
Location: Estonia

Re: Completed, but not submitted WU

Post by Ren02 »

You'll need a tool called qgen.
Basically, your queue.dat file is corrupt and the only fix is to create a new one. Make a copy of the directory and then run qgen in it. I've used it once and don't remember all the details at the moment, but it will tell you what to do. (It will probably say that the directory looks too much like a real folding directory and ask you to rename a few files.)
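
The "make a copy of the directory" step can be sketched in Python as below. This only makes the copy; it does not run qgen, whose options are not covered here. The function name and the `-backup-<timestamp>` naming are illustrative choices, and it is probably wise to stop the client first so the files do not change mid-copy.

```python
import shutil
import time
from pathlib import Path


def backup_client_dir(client_dir: str) -> Path:
    """Copy the whole client directory to a timestamped sibling directory."""
    src = Path(client_dir).resolve()
    if not src.is_dir():
        raise NotADirectoryError(str(src))
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Hypothetical naming scheme: <dir>-backup-<timestamp> next to the original.
    dest = src.with_name(f"{src.name}-backup-{stamp}")
    shutil.copytree(src, dest)  # raises FileExistsError if dest already exists
    return dest
```

Calling `backup_client_dir(...)` with the path of the folding directory returns the path of the copy, which is where any queue repair tool should be tried so the original work files stay untouched.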
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Completed, but not submitted WU

Post by 7im »

I'd start with qfix, and work up to qgen. ;)