Project 2673 - Linux SMP

Posted: Mon Nov 10, 2008 2:22 pm
by MichaelO
Project 2673 appears to run fine on my Linux (Ubuntu 8.04) machine, but it creates a unitinfo.txt file in the root FAH folder that is 80+ megabytes in size. This causes problems for monitoring software, and the size seems excessive. Is there some valid reason for this?

Re: Project 2673 - Linux SMP

Posted: Mon Nov 10, 2008 2:25 pm
by uncle_fungus
Can you run 'head' on the file and post the output? That will at least give us an indication of what is being written to the file.

head unitinfo.txt

Re: Project 2673 - Linux SMP

Posted: Mon Nov 10, 2008 2:54 pm
by Xilikon
80,000 megabytes as in 80 Gb ?

Re: Project 2673 - Linux SMP

Posted: Mon Nov 10, 2008 2:59 pm
by MichaelO
Uncle_Fungus.

Neither head, nor attempts to open with Gedit produced any meaningful results.

Head produced nothing but a screen full of pipes ( | ) in the terminal output, and Gedit just failed to open the file. My guess is that it is garbage, but when deleted it is simply recreated. Very strange, and I have only ever seen it with this WU.

Re: Project 2673 - Linux SMP

Posted: Mon Nov 10, 2008 3:00 pm
by MichaelO
Xilikon wrote:80,000 megabytes as in 80 Gb ?
Xilikon - sorry - typo - 80 Megabytes - still quite large. Post corrected.

Re: Project 2673 - Linux SMP

Posted: Mon Nov 10, 2008 3:05 pm
by uncle_fungus
OK, the pipes do tell me something useful (I'm not surprised gedit refused to open the file; it's not designed to open files that large). Can you post the contents of your FAHlog.txt as well, please? I suspect this WU has been misconfigured and is starting at a % value much larger than 0 and is filling unitinfo.txt with | to match.
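
For anyone checking a file like this, here is a minimal sketch of how it could be inspected without flooding the terminal. The helper names peek_file and count_pipes are made up for illustration and are not part of the FAH client. It assumes a head that supports -c (GNU and BSD both do).

```shell
#!/bin/sh
# Hypothetical helpers for inspecting an oversized unitinfo.txt.

# peek_file FILE [BYTES]: print only the first BYTES bytes (default 300),
# so a single multi-megabyte line cannot fill the screen.
peek_file() {
    head -c "${2:-300}" "$1"
    echo
}

# count_pipes FILE: print how many '|' characters the file contains.
count_pipes() {
    tr -cd '|' < "$1" | wc -c | tr -d ' '
}

# Usage, run from the FAH folder:
#   peek_file unitinfo.txt
#   count_pipes unitinfo.txt
```

If the pipe count is close to the file size in bytes, that would be consistent with the file being almost entirely progress-bar characters.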

Re: Project 2673 - Linux SMP

Posted: Mon Nov 10, 2008 3:05 pm
by kasson
We think it's a core problem rather than a WU problem. If you look at your FAHlog.txt, you'll probably notice some really large % progress values. There appears to be an issue with the A2 core that causes this to happen occasionally, often on a checkpoint restart. The WUs still appear to be valid, though.
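
A rough sketch of one way to scan FAHlog.txt for such values. It assumes progress lines have the form "Completed N out of M steps  (P%)" and treats anything above 100 as suspicious; both the function name find_bad_progress and the threshold of 100 are assumptions for illustration.

```shell
#!/bin/sh
# find_bad_progress LOGFILE: print progress lines whose percentage is over 100.
find_bad_progress() {
    awk '/Completed .* steps/ {
        pct = $0
        sub(/^.*\(/, "", pct)    # drop everything up to the last "("
        sub(/%\).*$/, "", pct)   # drop the trailing "%)"
        if (pct + 0 > 100) print
    }' "$1"
}

# Usage, run from the FAH folder:
#   find_bad_progress FAHlog.txt
```

No output means no progress line exceeded 100% in that log.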

Re: Project 2673 - Linux SMP

Posted: Mon Nov 10, 2008 3:16 pm
by MichaelO
Here is the fahlog for the project - it is currently still running.

[09:01:03] - Warning: Could not delete all work unit files (1): Core file absent
[09:01:03] Trying to send all finished work units
[09:01:03] + No unsent completed units remaining.
[09:01:03] - Preparing to get new work unit...
[09:01:03] + Attempting to get work packet
[09:01:03] - Will indicate memory of 2014 MB
[09:01:03] - Connecting to assignment server
[09:01:03] Connecting to http://assign.stanford.edu:8080/
[09:01:03] Posted data.
[09:01:03] Initial: 43AB; - Successful: assigned to (171.67.108.24).
[09:01:03] + News From Folding@Home: Welcome to Folding@Home
[09:01:03] Loaded queue successfully.
[09:01:03] Connecting to http://171.67.108.24:8080/
[09:01:09] Posted data.
[09:01:09] Initial: 0000; - Receiving payload (expected size: 4840064)
[09:01:19] - Downloaded at ~472 kB/s
[09:01:19] - Averaged speed for that direction ~334 kB/s
[09:01:19] + Received work.
[09:01:19] Trying to send all finished work units
[09:01:19] + No unsent completed units remaining.
[09:01:19] + Closed connections
[09:01:19] 
[09:01:19] + Processing work unit
[09:01:19] At least 4 processors must be requested.Core required: FahCore_a2.exe
[09:01:19] Core found.
[09:01:19] Working on queue slot 02 [November 10 09:01:19 UTC]
[09:01:19] + Working ...
[09:01:19] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 02 -checkpoint 30 -verbose -lifeline 6944 -version 623'

[09:01:19] 
[09:01:19] *------------------------------*
[09:01:19] Folding@Home Gromacs SMP Core
[09:01:19] Version 2.01 (Wed Aug 13 13:11:25 PDT 2008)
[09:01:19] 
[09:01:19] Preparing to commence simulation
[09:01:19] - Ensuring status. Please wait.
[09:01:19] Called DecompressByteArray: compressed_data_size=4839552 data_size=24005045, decompressed_data_size=24005045 diff=0
[09:01:20] - Digital signature verified
[09:01:20] 
[09:01:20] Project: 2673 (Run 3, Clone 20, Gen 11)
[09:01:20] 
[09:01:20] Assembly optimizations on if available.
[09:01:20] Entering M.D.
[09:01:29] (Run 3, Clone 20, Gen 11)
[09:01:29] 
[09:01:29] Entering M.D.
[09:20:05] Completed 10009 out of 499999 steps  (2%)
[09:29:22] Completed 15009 out of 499999 steps  (3%)
[09:38:40] Completed 20009 out of 499999 steps  (4%)
[09:47:58] Completed 25009 out of 499999 steps  (5%)
[09:57:15] Completed 30009 out of 499999 steps  (6%)
[10:06:33] Completed 35009 out of 499999 steps  (7%)
[10:15:49] Completed 40009 out of 499999 steps  (8%)
[10:25:07] Completed 45009 out of 499999 steps  (9%)
[10:34:24] Completed 50009 out of 499999 steps  (10%)
[10:43:40] Completed 55009 out of 499999 steps  (11%)
[10:52:58] Completed 60009 out of 499999 steps  (12%)
[11:02:14] Completed 65009 out of 499999 steps  (13%)
[11:11:32] Completed 70009 out of 499999 steps  (14%)
[11:20:49] Completed 75009 out of 499999 steps  (15%)
[11:30:06] Completed 80009 out of 499999 steps  (16%)
[11:39:23] Completed 85009 out of 499999 steps  (17%)
[11:48:40] Completed 90009 out of 499999 steps  (18%)
[11:57:56] Completed 95009 out of 499999 steps  (19%)
[12:07:12] Completed 100009 out of 499999 steps  (20%)
[12:16:29] Completed 105009 out of 499999 steps  (21%)
[12:25:45] Completed 110009 out of 499999 steps  (22%)
[12:35:01] Completed 115009 out of 499999 steps  (23%)
[12:44:18] Completed 120009 out of 499999 steps  (24%)
[12:53:34] Completed 125009 out of 499999 steps  (25%)
[13:02:48] Completed 130009 out of 499999 steps  (26%)
[13:12:01] Completed 135009 out of 499999 steps  (27%)
[13:14:39] - Autosending finished units... [November 10 13:14:39 UTC]
[13:14:39] Trying to send all finished work units
[13:14:39] + No unsent completed units remaining.
[13:14:39] - Autosend completed
[13:21:15] Completed 140009 out of 499999 steps  (28%)
[13:30:29] Completed 145009 out of 499999 steps  (29%)
[13:39:43] Completed 150009 out of 499999 steps  (30%)
[13:48:57] Completed 155009 out of 499999 steps  (31%)
[13:58:11] Completed 160009 out of 499999 steps  (32%)
[14:07:27] Completed 165009 out of 499999 steps  (33%)
[14:16:42] Completed 170009 out of 499999 steps  (34%)
[14:25:55] Completed 175009 out of 499999 steps  (35%)
[14:35:09] Completed 180009 out of 499999 steps  (36%)
[14:44:21] Completed 185009 out of 499999 steps  (37%)
[14:55:27] Completed 190009 out of 499999 steps  (38%)
[15:04:44] Completed 195009 out of 499999 steps  (39%)
Hope this helps.

Re: Project 2673 - Linux SMP

Posted: Mon Nov 10, 2008 3:18 pm
by parkut
cut -c 1-60 unitinfo.txt

this will remove the pipes
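
Note that cut only prints a trimmed copy to standard output; it does not change unitinfo.txt itself. A small sketch of wrapping it, where trim_lines is a made-up name and the default width of 60 is simply taken from the command above:

```shell
#!/bin/sh
# trim_lines FILE [WIDTH]: print FILE with every line cut to its first
# WIDTH characters (default 60). The original file is left untouched.
trim_lines() {
    cut -c 1-"${2:-60}" "$1"
}

# Usage, run from the FAH folder:
#   trim_lines unitinfo.txt > unitinfo_trimmed.txt
```

The output file name unitinfo_trimmed.txt is only an example; the trimmed copy should go to a different file than the one being read.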

Re: Project 2673 - Linux SMP

Posted: Mon Nov 10, 2008 3:27 pm
by MichaelO
parkut wrote:cut -c 1-60 unitinfo.txt

this will remove the pipes
Parkut -

Thanks for the comment. I am reporting this because of the monitoring problem and the fact that the file does not seem necessary. Fortunately the WU itself seems to process just fine, but there does appear to be something wrong that needed to be brought to the attention of the staff before more of these big projects get introduced.

Re: Project 2673 - Linux SMP

Posted: Mon Nov 10, 2008 3:36 pm
by MichaelO
One additional observation. I stopped the client, deleted the file, restarted the client, and watched the file be recreated. In Linux it is reported at 82 MB; it grows to that size immediately but no larger.

Re: Project 2673 - Linux SMP

Posted: Mon Nov 10, 2008 4:23 pm
by Ivoshiee
MichaelO wrote:One addition observation. I stopped the client, deleted the file, restarted the client, and watched the file be reproduced. In Linux it is reported at 82 Meg and it grows to that size immediately but no larger.
Back up the WU and post it for examination.

Re: Project 2673 - Linux SMP

Posted: Mon Nov 10, 2008 5:00 pm
by MichaelO
Ivoshiee wrote:
MichaelO wrote:One addition observation. I stopped the client, deleted the file, restarted the client, and watched the file be reproduced. In Linux it is reported at 82 Meg and it grows to that size immediately but no larger.
Back up the WU and post it for examination.
I have not done that before. Can you point me to the instructions for doing this?

Re: Project 2673 - Linux SMP

Posted: Mon Nov 10, 2008 5:14 pm
by Ivoshiee
MichaelO wrote:
Ivoshiee wrote:
MichaelO wrote:One addition observation. I stopped the client, deleted the file, restarted the client, and watched the file be reproduced. In Linux it is reported at 82 Meg and it grows to that size immediately but no larger.
Back up the WU and post it for examination.
I have not done that before. Can you point me to the instructions for doing this.

tar czf faulty_WU.tgz /path_to_the_FAH_client
The finstall script also provides a fah_backup script for exactly the same purpose.
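
A slightly more defensive sketch of the same tar backup, for illustration only. The function name backup_fah is hypothetical. It uses -C so the archive holds relative paths, and it assumes the archive is written outside the client folder so it does not end up inside itself. Stopping the client first is probably wise so the files are not changing during the copy.

```shell
#!/bin/sh
# backup_fah CLIENT_DIR ARCHIVE: pack CLIENT_DIR into a gzipped tar archive,
# storing paths relative to the parent of CLIENT_DIR.
backup_fah() {
    src=$1
    out=$2
    tar czf "$out" -C "$(dirname "$src")" "$(basename "$src")"
}

# Usage:
#   backup_fah /path_to_the_FAH_client "$HOME/faulty_WU.tgz"
#   tar tzf "$HOME/faulty_WU.tgz" | head    # list the first few entries
```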

Re: Project 2673 - Linux SMP

Posted: Mon Nov 10, 2008 6:28 pm
by MichaelO
Ivoshiee -

I have the tar file. Where should I post or upload it? Sorry for all the questions; this is my first time doing this.