Project 2673 - Linux SMP

Moderators: Site Moderators, FAHC Science Team

MichaelO
Posts: 50
Joined: Tue Jan 01, 2008 8:59 pm

Project 2673 - Linux SMP

Post by MichaelO »

Project 2673 appears to run fine on my Linux (Ubuntu 8.04) machine, but it creates a Unitinfo.txt file in the root FAH folder that is 80+ megabytes in size. This causes problems for monitoring software, and the size seems excessive. Is there a valid reason for this?
Last edited by MichaelO on Mon Nov 10, 2008 3:01 pm, edited 1 time in total.
uncle_fungus
Site Admin
Posts: 1288
Joined: Fri Nov 30, 2007 9:37 am
Location: Oxfordshire, UK

Re: Project 2673 - Linux SMP

Post by uncle_fungus »

Can you run `head' on the file and post the output? That will at least give us an indication of what is being written to the file.

Code:

head unitinfo.txt
Xilikon
Posts: 155
Joined: Sun Dec 02, 2007 1:34 pm

Re: Project 2673 - Linux SMP

Post by Xilikon »

80,000 megabytes, as in 80 GB?
MichaelO
Posts: 50
Joined: Tue Jan 01, 2008 8:59 pm

Re: Project 2673 - Linux SMP

Post by MichaelO »

Uncle_Fungus.

Neither head nor attempts to open the file with gedit produced any meaningful results.

head produced nothing but a screenful of pipes ( | ) in the terminal output, and gedit simply failed to open the file. My guess is that the content is garbage, but when the file is deleted it is simply recreated. Very strange, and I have only ever seen it with this WU.
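
For anyone wanting to check a file like this without flooding the terminal, here is a minimal sketch. It assumes a head that accepts -c (GNU coreutils on Ubuntu does), and the function name summarize_file is made up for illustration. It reports the size, counts the pipe characters, and shows only the first 200 bytes.

```shell
# Summarize a possibly huge text file without printing all of it.
# summarize_file is a hypothetical helper name, not part of the FAH client.
# Usage: summarize_file unitinfo.txt
summarize_file() {
    file=$1
    # Total size in bytes.
    printf 'bytes: %s\n' "$(wc -c < "$file" | tr -d ' ')"
    # Number of '|' characters: delete every other byte, then count what is left.
    printf 'pipes: %s\n' "$(tr -cd '|' < "$file" | wc -c | tr -d ' ')"
    # First 200 bytes only, so one enormous line cannot fill the screen.
    printf 'start: '
    head -c 200 "$file"
    printf '\n'
}
```

If the two counts come out nearly equal, the file is almost entirely pipes, which would fit the screenful of | seen above.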
MichaelO
Posts: 50
Joined: Tue Jan 01, 2008 8:59 pm

Re: Project 2673 - Linux SMP

Post by MichaelO »

Xilikon wrote: 80,000 megabytes as in 80 Gb ?
Xilikon - sorry, that was a typo. It is 80 megabytes, which is still quite large. Post corrected.
uncle_fungus
Site Admin
Posts: 1288
Joined: Fri Nov 30, 2007 9:37 am
Location: Oxfordshire, UK

Re: Project 2673 - Linux SMP

Post by uncle_fungus »

OK, the pipes do tell me something useful. (I'm not surprised gedit refused to open the file; it's not designed to open files that large.) Can you post the contents of your FAHlog.txt as well, please? I suspect this WU has been misconfigured and is starting at a % value much larger than 0 and is filling unitinfo.txt with | characters to match.
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: Project 2673 - Linux SMP

Post by kasson »

We think it's a core problem rather than a WU problem. If you look at your FAHlog.txt, you'll probably notice some really large % progress values. There appears to be an issue with the A2 core that causes this to happen occasionally, often on a checkpoint restart. The WUs still appear to be valid, though.
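
One rough way to look for such values is to filter the progress lines and keep only those reporting more than 100%. This sketch assumes the "Completed N out of M steps  (P%)" line format seen in the FAHlog.txt excerpt posted in this thread; find_bad_progress is a hypothetical helper name.

```shell
# Print FAHlog.txt progress lines whose reported percentage exceeds 100.
# find_bad_progress is an illustrative name, not a client tool.
# Usage: find_bad_progress FAHlog.txt
find_bad_progress() {
    grep 'Completed .* out of .* steps' "$1" |
    awk '{
        pct = $NF               # last field looks like "(2%)"
        gsub(/[^0-9]/, "", pct) # keep the digits only
        if (pct + 0 > 100) print
    }'
}
```

No output means every progress line in the log reports a value between 0% and 100%.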
MichaelO
Posts: 50
Joined: Tue Jan 01, 2008 8:59 pm

Re: Project 2673 - Linux SMP

Post by MichaelO »

Here is the fahlog for the project - it is currently still running.

Code:


[09:01:03] - Warning: Could not delete all work unit files (1): Core file absent
[09:01:03] Trying to send all finished work units
[09:01:03] + No unsent completed units remaining.
[09:01:03] - Preparing to get new work unit...
[09:01:03] + Attempting to get work packet
[09:01:03] - Will indicate memory of 2014 MB
[09:01:03] - Connecting to assignment server
[09:01:03] Connecting to http://assign.stanford.edu:8080/
[09:01:03] Posted data.
[09:01:03] Initial: 43AB; - Successful: assigned to (171.67.108.24).
[09:01:03] + News From Folding@Home: Welcome to Folding@Home
[09:01:03] Loaded queue successfully.
[09:01:03] Connecting to http://171.67.108.24:8080/
[09:01:09] Posted data.
[09:01:09] Initial: 0000; - Receiving payload (expected size: 4840064)
[09:01:19] - Downloaded at ~472 kB/s
[09:01:19] - Averaged speed for that direction ~334 kB/s
[09:01:19] + Received work.
[09:01:19] Trying to send all finished work units
[09:01:19] + No unsent completed units remaining.
[09:01:19] + Closed connections
[09:01:19] 
[09:01:19] + Processing work unit
[09:01:19] At least 4 processors must be requested.Core required: FahCore_a2.exe
[09:01:19] Core found.
[09:01:19] Working on queue slot 02 [November 10 09:01:19 UTC]
[09:01:19] + Working ...
[09:01:19] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 02 -checkpoint 30 -verbose -lifeline 6944 -version 623'

[09:01:19] 
[09:01:19] *------------------------------*
[09:01:19] Folding@Home Gromacs SMP Core
[09:01:19] Version 2.01 (Wed Aug 13 13:11:25 PDT 2008)
[09:01:19] 
[09:01:19] Preparing to commence simulation
[09:01:19] - Ensuring status. Please wait.
[09:01:19] Called DecompressByteArray: compressed_data_size=4839552 data_size=24005045, decompressed_data_size=24005045 diff=0
[09:01:20] - Digital signature verified
[09:01:20] 
[09:01:20] Project: 2673 (Run 3, Clone 20, Gen 11)
[09:01:20] 
[09:01:20] Assembly optimizations on if available.
[09:01:20] Entering M.D.
[09:01:29] (Run 3, Clone 20, Gen 11)
[09:01:29] 
[09:01:29] Entering M.D.
[09:20:05] Completed 10009 out of 499999 steps  (2%)
[09:29:22] Completed 15009 out of 499999 steps  (3%)
[09:38:40] Completed 20009 out of 499999 steps  (4%)
[09:47:58] Completed 25009 out of 499999 steps  (5%)
[09:57:15] Completed 30009 out of 499999 steps  (6%)
[10:06:33] Completed 35009 out of 499999 steps  (7%)
[10:15:49] Completed 40009 out of 499999 steps  (8%)
[10:25:07] Completed 45009 out of 499999 steps  (9%)
[10:34:24] Completed 50009 out of 499999 steps  (10%)
[10:43:40] Completed 55009 out of 499999 steps  (11%)
[10:52:58] Completed 60009 out of 499999 steps  (12%)
[11:02:14] Completed 65009 out of 499999 steps  (13%)
[11:11:32] Completed 70009 out of 499999 steps  (14%)
[11:20:49] Completed 75009 out of 499999 steps  (15%)
[11:30:06] Completed 80009 out of 499999 steps  (16%)
[11:39:23] Completed 85009 out of 499999 steps  (17%)
[11:48:40] Completed 90009 out of 499999 steps  (18%)
[11:57:56] Completed 95009 out of 499999 steps  (19%)
[12:07:12] Completed 100009 out of 499999 steps  (20%)
[12:16:29] Completed 105009 out of 499999 steps  (21%)
[12:25:45] Completed 110009 out of 499999 steps  (22%)
[12:35:01] Completed 115009 out of 499999 steps  (23%)
[12:44:18] Completed 120009 out of 499999 steps  (24%)
[12:53:34] Completed 125009 out of 499999 steps  (25%)
[13:02:48] Completed 130009 out of 499999 steps  (26%)
[13:12:01] Completed 135009 out of 499999 steps  (27%)
[13:14:39] - Autosending finished units... [November 10 13:14:39 UTC]
[13:14:39] Trying to send all finished work units
[13:14:39] + No unsent completed units remaining.
[13:14:39] - Autosend completed
[13:21:15] Completed 140009 out of 499999 steps  (28%)
[13:30:29] Completed 145009 out of 499999 steps  (29%)
[13:39:43] Completed 150009 out of 499999 steps  (30%)
[13:48:57] Completed 155009 out of 499999 steps  (31%)
[13:58:11] Completed 160009 out of 499999 steps  (32%)
[14:07:27] Completed 165009 out of 499999 steps  (33%)
[14:16:42] Completed 170009 out of 499999 steps  (34%)
[14:25:55] Completed 175009 out of 499999 steps  (35%)
[14:35:09] Completed 180009 out of 499999 steps  (36%)
[14:44:21] Completed 185009 out of 499999 steps  (37%)
[14:55:27] Completed 190009 out of 499999 steps  (38%)
[15:04:44] Completed 195009 out of 499999 steps  (39%)
Hope this helps.
parkut
Posts: 366
Joined: Tue Feb 12, 2008 7:33 am
Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
Location: SE Michigan, USA

Re: Project 2673 - Linux SMP

Post by parkut »

cut -c 1-60 unitinfo.txt

This will cut each line off after 60 characters, which removes the long run of pipes from the output.
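
Note that cut only prints the shortened lines; it does not change unitinfo.txt itself. If a monitoring tool needs a small file, one possible workaround (an assumption on my part, not something the client is known to support) is to write the trimmed lines to a separate copy. The function name trim_lines and the output filename are illustrative.

```shell
# Write a copy of a file with every line truncated to its first 60 characters.
# The original file is left untouched.
# trim_lines is a hypothetical helper name.
# Usage: trim_lines unitinfo.txt unitinfo.trimmed.txt
trim_lines() {
    cut -c 1-60 "$1" > "$2"
}
```

The monitoring software would then have to be pointed at the trimmed copy, and the copy would need refreshing whenever the client rewrites the original.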
MichaelO
Posts: 50
Joined: Tue Jan 01, 2008 8:59 pm

Re: Project 2673 - Linux SMP

Post by MichaelO »

parkut wrote:cut -c 1-60 unitinfo.txt

this will remove the pipes
Parkut -

Thanks for the comment. I am reporting this because of the monitoring problem and the fact that the file does not seem necessary. Fortunately, the WU itself seems to process just fine, but something does appear to be wrong, and it needed to be brought to the attention of the staff before more of these big projects are introduced.
MichaelO
Posts: 50
Joined: Tue Jan 01, 2008 8:59 pm

Re: Project 2673 - Linux SMP

Post by MichaelO »

One additional observation. I stopped the client, deleted the file, restarted the client, and watched the file be recreated. Linux reports it at 82 MB; it grows to that size immediately but no larger.
Ivoshiee
Site Moderator
Posts: 822
Joined: Sun Dec 02, 2007 12:05 am
Location: Estonia

Re: Project 2673 - Linux SMP

Post by Ivoshiee »

MichaelO wrote: One additional observation. I stopped the client, deleted the file, restarted the client, and watched the file be recreated. Linux reports it at 82 MB; it grows to that size immediately but no larger.
Back up the WU and post it for examination.
MichaelO
Posts: 50
Joined: Tue Jan 01, 2008 8:59 pm

Re: Project 2673 - Linux SMP

Post by MichaelO »

Ivoshiee wrote: Back up the WU and post it for examination.
I have not done that before. Can you point me to the instructions for doing this?
Ivoshiee
Site Moderator
Posts: 822
Joined: Sun Dec 02, 2007 12:05 am
Location: Estonia

Re: Project 2673 - Linux SMP

Post by Ivoshiee »

MichaelO wrote: I have not done that before. Can you point me to the instructions for doing this?

Code:

tar czf faulty_WU.tgz /path_to_the_FAH_client 
The finstall script also provides a fah_backup script for exactly the same purpose.
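
A slightly more defensive variant of the tar command, as a sketch only: it assumes GNU tar, and backup_wu is a hypothetical helper name. It stores paths relative to the client's parent directory and then lists the archive to confirm it can be read back. Stopping the client first is probably sensible so the files do not change while they are being archived.

```shell
# Archive a client directory and list the archive to confirm it is readable.
# backup_wu is an illustrative name, not part of finstall or the FAH client.
# Usage: backup_wu /path_to_the_FAH_client faulty_WU.tgz
backup_wu() {
    client_dir=$1
    archive=$2
    # -C switches to the parent directory so the archive stores relative paths.
    tar czf "$archive" -C "$(dirname "$client_dir")" "$(basename "$client_dir")"
    # Listing reads the whole archive; a failure here suggests it is damaged.
    tar tzf "$archive" > /dev/null && echo "archive OK: $archive"
}
```

Running tar tzf faulty_WU.tgz by hand afterwards shows exactly which files went into the archive.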
MichaelO
Posts: 50
Joined: Tue Jan 01, 2008 8:59 pm

Re: Project 2673 - Linux SMP

Post by MichaelO »

Ivoshiee -

I have the tar archive. Where should I post or upload it? Sorry for all the questions; this is my first time doing this.