
Project: 6062 (Run 0, Clone 198, Gen 177) Exits at .06 sec

Posted: Mon Mar 21, 2011 3:10 pm
by Joe_H
Got another unit that appears to be bad; it exited immediately both times it was downloaded. The system is stable, not overclocked, and has completed over 150 WUs successfully. Log extract follows:

Code:

--- Opening Log file [March 21 14:46:44 UTC] 


# Mac OS X SMP Console Edition ################################################
###############################################################################

                       Folding@Home Client Version 6.29r3

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /Users/************/Library/Folding@home
Executable: ./fah6
Arguments: -oneunit -smp 2 -verbosity 9 

[14:46:44] - Ask before connecting: No
[14:46:44] - User name: Joe_H (Team 38910)
[14:46:44] - User ID: ****************
[14:46:44] - Machine ID: 3
[14:46:44] 
[14:46:44] Loaded queue successfully.
[14:46:44] - Preparing to get new work unit...
[14:46:44] Cleaning up work directory
[14:46:44] - Autosending finished units... [March 21 14:46:44 UTC]
[14:46:44] Trying to send all finished work units
[14:46:44] + No unsent completed units remaining.
[14:46:44] - Autosend completed
[14:46:45] + Attempting to get work packet
[14:46:45] Passkey found
[14:46:45] - Will indicate memory of 2048 MB
[14:46:45] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 6
[14:46:45] - Connecting to assignment server
[14:46:45] Connecting to http://assign.stanford.edu:8080/
[14:46:45] Posted data.
[14:46:45] Initial: 40AB; - Successful: assigned to (171.64.65.54).
[14:46:45] + News From Folding@Home: Welcome to Folding@Home
[14:46:45] Loaded queue successfully.
[14:46:45] Sent data
[14:46:45] Connecting to http://171.64.65.54:8080/
[14:46:46] Posted data.
[14:46:46] Initial: 0000; - Receiving payload (expected size: 1766121)
[14:46:49] - Downloaded at ~574 kB/s
[14:46:49] - Averaged speed for that direction ~488 kB/s
[14:46:49] + Received work.
[14:46:49] + Closed connections
[14:46:49] 
[14:46:49] + Processing work unit
[14:46:49] Core required: FahCore_a3.exe
[14:46:49] Core found.
[14:46:49] Working on queue slot 00 [March 21 14:46:49 UTC]
[14:46:49] + Working ...
[14:46:49] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 00 -np 2 -checkpoint 15 -verbose -lifeline 6906 -version 629'

[14:46:50] 
[14:46:50] *------------------------------*
[14:46:50] Folding@Home Gromacs SMP Core
[14:46:50] Version 2.22 (May 7 2010)
[14:46:50] 
[14:46:50] Preparing to commence simulation
[14:46:50] - Ensuring status. Please wait.
[14:47:00] - Looking at optimizations...
[14:47:00] - Working with standard loops on this execution.
[14:47:00] - Created dyn
[14:47:00] - Files status OK
[14:47:00] - Expanded 1765609 -> 2254429 (decompressed 127.6 percent)
[14:47:00] Called DecompressByteArray: compressed_data_size=1765609 data_size=2254429, decompressed_data_size=2254429 diff=0
[14:47:00] - Digital signature verified
[14:47:00] 
[14:47:00] Project: 6062 (Run 0, Clone 198, Gen 177)
[14:47:00] 
[14:47:00] Entering M.D.
[14:47:06] CoreStatus = 0 (0)
[14:47:06] Sending work to server
[14:47:06] Project: 6062 (Run 0, Clone 198, Gen 177)
[14:47:06] - Error: Could not get length of results file work/wuresults_00.dat
[14:47:06] - Error: Could not read unit 00 file. Removing from queue.
[14:47:06] Trying to send all finished work units
[14:47:06] + No unsent completed units remaining.
[14:47:06] + -oneunit flag given and have now finished a unit. Exiting.- Preparing to get new work unit...
[14:47:06] Cleaning up work directory
[14:47:06] ***** Got a SIGTERM signal (15)
[14:47:06] Killing all core threads

Folding@Home Client Shutdown.
Well, off to change machine ID, etc. to see about getting another unit to run later.
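
For anyone comparing several logs like the one above, here is a minimal Python sketch that pulls out the PRCG, the time spent between "Entering M.D." and the "CoreStatus" line, and any "- Error:" messages. The function name `summarize_run` and the returned dictionary keys are made up for illustration; the line patterns are taken only from the log extract in this post, so other client versions may format things differently.

```python
import re

# Patterns based solely on the log extract in this thread.
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STAMP_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]")


def _seconds(line):
    """Return the [HH:MM:SS] stamp of a log line as seconds, or None."""
    m = STAMP_RE.match(line)
    if not m:
        return None
    h, mi, s = (int(x) for x in m.groups())
    return h * 3600 + mi * 60 + s


def summarize_run(log_text):
    """Hypothetical helper: summarize one work unit attempt in a client log."""
    prcg, md_start, core_exit, errors = None, None, None, []
    for line in log_text.splitlines():
        stamp = _seconds(line)
        m = PRCG_RE.search(line)
        if m and prcg is None:
            prcg = tuple(int(x) for x in m.groups())
        if "Entering M.D." in line and md_start is None:
            md_start = stamp
        if "CoreStatus" in line and core_exit is None:
            core_exit = stamp
        if "- Error:" in line:
            errors.append(line.split("- Error:", 1)[1].strip())
    md_seconds = None
    if md_start is not None and core_exit is not None:
        # Modulo handles a run that crosses midnight UTC.
        md_seconds = (core_exit - md_start) % 86400
    return {"prcg": prcg, "md_seconds": md_seconds, "errors": errors}


if __name__ == "__main__":
    sample = "\n".join([
        "[14:47:00] Project: 6062 (Run 0, Clone 198, Gen 177)",
        "[14:47:00] Entering M.D.",
        "[14:47:06] CoreStatus = 0 (0)",
        "[14:47:06] - Error: Could not get length of results file work/wuresults_00.dat",
    ])
    print(summarize_run(sample))
```

A unit that leaves the core within a few seconds of "Entering M.D." and then reports a missing results file, as here, is a candidate for a bad-WU report.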

Re: Project: 6062 (Run 0, Clone 198, Gen 177) Exits at .06 s

Posted: Tue Mar 22, 2011 6:00 pm
by Joe_H
A follow-up to my own post: apparently this unit was downloaded by another machine I have folding, about 2.5 hours before the download time in the log from my report. It ran fine on that Mac and completed in 8 hours. No idea why it failed on the second Mac.

Further investigation showed both machines had the same User ID but different Machine IDs. As best I can tell, when I migrated account settings from the older machine to the newer one last summer and then reinstalled F@H, one of the settings files that carried over did not get deleted when I deleted the previous folding setup. I have fixed that, and the machines now have unique User IDs. I did not notice the matching User IDs last year when both were folding because they were getting different WUs from the assignment servers when running at the same time.

One apparent consequence of both machines being listed as downloading the same WU, with one of them failing twice, is that the unit turned in successfully after 8 hours earned only the base points without any bonus. If that should not be the case, let me know. I had another unit last week that was turned in successfully in 8 hours and got no bonus then either. I checked my logs, and there were no duplicate units between machines then. I can't recall that unit's PRCG, as I did not make a note at the time, thinking it was just a temporary glitch.

Anyways, just putting this out there in case it helps someone else troubleshoot a similar issue.

Re: Project: 6062 (Run 0, Clone 198, Gen 177) Exits at .06 s

Posted: Tue Mar 22, 2011 8:10 pm
by HendricksSA
Joe_H, if you want points to accumulate for one username, you should use it (and the same passkey) on each machine you own. The machine id only needs to be unique on the same operating system (same pc). You didn't get a bonus because the work was credited to the new user name.

Re: Project: 6062 (Run 0, Clone 198, Gen 177) Exits at .06 s

Posted: Tue Mar 22, 2011 9:01 pm
by bruce
HendricksSA wrote: The machine id only needs to be unique on the same operating system (same pc).
You were right about UserName and Passkey, but the MachineID may or may not be a critical issue here. When you install FAH on a new machine, Stanford assigns it a UserID (no, I don't mean UserName). Each client must have either a unique UserID or a unique MachineID. The implicit assumption is that every machine was assigned a unique UserID, and occasionally that assumption fails. On Linux/OS X, if the complete client is copied from one machine to another, including a file called machinedependent.dat, this will cause duplicate assignments. On Windows, copying the Registry has the same effect. As long as a new client is installed on a new OS, there will be no problem.

The fix for this problem, if it exists, is to delete machinedependent.dat or the Registry key on the cloned installation.
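
For anyone running several clients, the uniqueness rule described in this post can be checked with a short script once the "User ID" and "Machine ID" lines have been copied out of each client's log. The following is only a rough Python sketch: the `Client` type, both function names, the host names, and the ID strings are hypothetical placeholders, and nothing here reads real client files or talks to the servers.

```python
from collections import defaultdict
from typing import NamedTuple


class Client(NamedTuple):
    host: str        # hypothetical label for the machine
    user_id: str     # "User ID" line from the client log (not the UserName)
    machine_id: int  # "Machine ID" line from the client log


def find_identity_clashes(clients):
    """Return (user_id, machine_id) pairs used by more than one client.

    These break the rule as stated: neither ID is unique.
    """
    by_pair = defaultdict(list)
    for c in clients:
        by_pair[(c.user_id, c.machine_id)].append(c.host)
    return {pair: hosts for pair, hosts in by_pair.items() if len(hosts) > 1}


def find_shared_user_ids(clients):
    """Return user IDs that appear on more than one host.

    This is the sign of a copied machinedependent.dat or Registry key,
    even when the machine IDs differ.
    """
    by_user = defaultdict(set)
    for c in clients:
        by_user[c.user_id].add(c.host)
    return {uid: sorted(hosts) for uid, hosts in by_user.items() if len(hosts) > 1}


if __name__ == "__main__":
    # Placeholder values only; real User IDs are long hex strings.
    clients = [
        Client("old-mac", "AAAA", 1),
        Client("new-imac", "AAAA", 3),
        Client("linux-box", "BBBB", 1),
    ]
    print(find_identity_clashes(clients))
    print(find_shared_user_ids(clients))
```

A non-empty result from `find_shared_user_ids` suggests one installation was cloned from another and would benefit from the fix above.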

Re: Project: 6062 (Run 0, Clone 198, Gen 177) Exits at .06 s

Posted: Wed Mar 23, 2011 3:22 am
by Joe_H
HendricksSA wrote: Joe_H, if you want points to accumulate for one username, you should use it (and the same passkey) on each machine you own. The machine id only needs to be unique on the same operating system (same pc). You didn't get a bonus because the work was credited to the new user name.
The username and passkey were correct. The machine on which I folded the two units that got no bonus had successfully folded units before and after them and received bonuses just fine. The unit without bonus points was credited to the same username and team I have been on since the beginning.
bruce wrote: The fix for this problem, if it exists, is to delete machinedependent.dat or the Registry key on the cloned installation.
For whatever reason, it looks like machinedependent.dat (if that is where the UserID is kept) was not deleted with everything else the first time I removed F@H and reinstalled after migrating from my older Mac. I had to enter my own settings when I started up folding on the new iMac. Anyway, I did a full uninstall and reinstall, and the IDs are properly unique again. Other details of the migration to the new machine are lost in a mental haze from finishing off chemo and surgery over the 6+ months in between.

Re: Project: 6062 (Run 0, Clone 198, Gen 177) Exits at .06 s

Posted: Wed Mar 23, 2011 6:29 am
by HendricksSA
Joe_H, I didn't read your post closely enough to understand it properly. Since you have received bonuses before and after this work unit, it may have been a timing issue. If I understand correctly, you downloaded and worked on the same work unit on two different machines. You may have received the bonus for the work unit on one machine and not the other. Another possibility is the Stanford database kept track of the earlier of the two work unit assignment times and calculated bonuses based on the resulting completion times. One got the bonus and the other did not. Anyway, I'm glad to hear it is not an ongoing problem but I'm sorry to hear about your chemo and surgery. I hope you get clear reports in the future!