Project 2673 - Linux SMP
Moderators: Site Moderators, FAHC Science Team
Project 2673 appears to run fine on my Linux (Ubuntu 8.04) machine, but it creates a Unitinfo.txt file in the root FAH folder that is 80+ Megabytes in size. This causes problems for monitoring software, and the size seems excessive. Is there some valid reason for this?
Last edited by MichaelO on Mon Nov 10, 2008 3:01 pm, edited 1 time in total.
-
- Site Admin
- Posts: 1288
- Joined: Fri Nov 30, 2007 9:37 am
- Location: Oxfordshire, UK
Re: Project 2673 - Linux SMP
Can you run `head` on the file and post the output? That will at least give us an indication of what is being written to the file.
Code:
head unitinfo.txt
Re: Project 2673 - Linux SMP
Uncle_Fungus,
Neither head nor attempts to open the file with Gedit produced any meaningful results.
head produced nothing but a screen full of pipes ( | ) in the terminal output, and Gedit just failed to open the file. My guess is that it is garbage, but when I delete it, it is simply recreated. Very strange, and I have only ever seen it with this WU.
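A note on inspecting a file like this: `head` prints the first ten lines by default, so if the file contains very long lines it will still flood the terminal. The sketch below summarises the file instead. It assumes the bulk of the file is `|` characters, as reported above; `peek_file` is a hypothetical helper name, not part of the FAH client.

```shell
# Hypothetical helper: summarise a file that may consist of very long lines.
peek_file() {
    file="$1"
    # Total size in bytes.
    printf 'bytes: %s\n' "$(wc -c < "$file" | tr -d ' ')"
    # Number of '|' characters (tr -cd deletes everything except '|').
    printf 'pipes: %s\n' "$(tr -cd '|' < "$file" | wc -c | tr -d ' ')"
    # First 300 bytes that remain once the pipes are stripped out.
    printf 'other: '
    tr -d '|' < "$file" | head -c 300
    printf '\n'
}
```

Running `peek_file unitinfo.txt` from the FAH folder should show how much of the file is padding and what readable text, if any, sits alongside it.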
Re: Project 2673 - Linux SMP
Xilikon wrote: 80,000 megabytes as in 80 Gb ?
Xilikon - sorry - typo - 80 Megabytes - still quite large. Post corrected.
Re: Project 2673 - Linux SMP
OK, the pipes do tell me something useful (I'm not surprised gedit refused to open the file, it's not designed to open files that large). Can you post the contents of your FAHlog.txt as well please? I suspect this WU has been misconfigured and is starting at a % value much larger than 0 and is filling unitinfo.txt with | to match.
Re: Project 2673 - Linux SMP
We think it's a core problem rather than a WU problem. If you look at your FAHlog.txt, you'll probably notice some really large % progress values. There appears to be an issue with the A2 core that causes this to happen occasionally, often on a checkpoint restart. The WUs still appear to be valid, though.
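One way to look for those large progress values is to scan FAHlog.txt for "Completed ... steps (N%)" lines where N is above 100. This is only a sketch: it assumes the log lines follow the "Completed X out of Y steps (N%)" layout, and `find_bad_progress` is a hypothetical helper name.

```shell
# Hypothetical helper: print log lines whose reported progress exceeds 100%.
find_bad_progress() {
    # Prefix each matching line with its percentage, e.g. "2 [09:20:05] Completed ... (2%)".
    sed -n 's/.*steps *(\([0-9][0-9]*\)%).*/\1 &/p' "$1" |
        # Keep lines above 100%, then strip the prefix again before printing.
        awk '$1 > 100 { sub(/^[0-9]+ /, ""); print }'
}
```

`find_bad_progress FAHlog.txt` prints nothing if every reported percentage is 100 or less.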
Re: Project 2673 - Linux SMP
Here is the fahlog for the project - it is currently still running.
Hope this helps.
Code:
[09:01:03] - Warning: Could not delete all work unit files (1): Core file absent
[09:01:03] Trying to send all finished work units
[09:01:03] + No unsent completed units remaining.
[09:01:03] - Preparing to get new work unit...
[09:01:03] + Attempting to get work packet
[09:01:03] - Will indicate memory of 2014 MB
[09:01:03] - Connecting to assignment server
[09:01:03] Connecting to http://assign.stanford.edu:8080/
[09:01:03] Posted data.
[09:01:03] Initial: 43AB; - Successful: assigned to (171.67.108.24).
[09:01:03] + News From Folding@Home: Welcome to Folding@Home
[09:01:03] Loaded queue successfully.
[09:01:03] Connecting to http://171.67.108.24:8080/
[09:01:09] Posted data.
[09:01:09] Initial: 0000; - Receiving payload (expected size: 4840064)
[09:01:19] - Downloaded at ~472 kB/s
[09:01:19] - Averaged speed for that direction ~334 kB/s
[09:01:19] + Received work.
[09:01:19] Trying to send all finished work units
[09:01:19] + No unsent completed units remaining.
[09:01:19] + Closed connections
[09:01:19]
[09:01:19] + Processing work unit
[09:01:19] At least 4 processors must be requested.Core required: FahCore_a2.exe
[09:01:19] Core found.
[09:01:19] Working on queue slot 02 [November 10 09:01:19 UTC]
[09:01:19] + Working ...
[09:01:19] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 02 -checkpoint 30 -verbose -lifeline 6944 -version 623'
[09:01:19]
[09:01:19] *------------------------------*
[09:01:19] Folding@Home Gromacs SMP Core
[09:01:19] Version 2.01 (Wed Aug 13 13:11:25 PDT 2008)
[09:01:19]
[09:01:19] Preparing to commence simulation
[09:01:19] - Ensuring status. Please wait.
[09:01:19] Called DecompressByteArray: compressed_data_size=4839552 data_size=24005045, decompressed_data_size=24005045 diff=0
[09:01:20] - Digital signature verified
[09:01:20]
[09:01:20] Project: 2673 (Run 3, Clone 20, Gen 11)
[09:01:20]
[09:01:20] Assembly optimizations on if available.
[09:01:20] Entering M.D.
[09:01:29] (Run 3, Clone 20, Gen 11)
[09:01:29]
[09:01:29] Entering M.D.
[09:20:05] Completed 10009 out of 499999 steps (2%)
[09:29:22] Completed 15009 out of 499999 steps (3%)
[09:38:40] Completed 20009 out of 499999 steps (4%)
[09:47:58] Completed 25009 out of 499999 steps (5%)
[09:57:15] Completed 30009 out of 499999 steps (6%)
[10:06:33] Completed 35009 out of 499999 steps (7%)
[10:15:49] Completed 40009 out of 499999 steps (8%)
[10:25:07] Completed 45009 out of 499999 steps (9%)
[10:34:24] Completed 50009 out of 499999 steps (10%)
[10:43:40] Completed 55009 out of 499999 steps (11%)
[10:52:58] Completed 60009 out of 499999 steps (12%)
[11:02:14] Completed 65009 out of 499999 steps (13%)
[11:11:32] Completed 70009 out of 499999 steps (14%)
[11:20:49] Completed 75009 out of 499999 steps (15%)
[11:30:06] Completed 80009 out of 499999 steps (16%)
[11:39:23] Completed 85009 out of 499999 steps (17%)
[11:48:40] Completed 90009 out of 499999 steps (18%)
[11:57:56] Completed 95009 out of 499999 steps (19%)
[12:07:12] Completed 100009 out of 499999 steps (20%)
[12:16:29] Completed 105009 out of 499999 steps (21%)
[12:25:45] Completed 110009 out of 499999 steps (22%)
[12:35:01] Completed 115009 out of 499999 steps (23%)
[12:44:18] Completed 120009 out of 499999 steps (24%)
[12:53:34] Completed 125009 out of 499999 steps (25%)
[13:02:48] Completed 130009 out of 499999 steps (26%)
[13:12:01] Completed 135009 out of 499999 steps (27%)
[13:14:39] - Autosending finished units... [November 10 13:14:39 UTC]
[13:14:39] Trying to send all finished work units
[13:14:39] + No unsent completed units remaining.
[13:14:39] - Autosend completed
[13:21:15] Completed 140009 out of 499999 steps (28%)
[13:30:29] Completed 145009 out of 499999 steps (29%)
[13:39:43] Completed 150009 out of 499999 steps (30%)
[13:48:57] Completed 155009 out of 499999 steps (31%)
[13:58:11] Completed 160009 out of 499999 steps (32%)
[14:07:27] Completed 165009 out of 499999 steps (33%)
[14:16:42] Completed 170009 out of 499999 steps (34%)
[14:25:55] Completed 175009 out of 499999 steps (35%)
[14:35:09] Completed 180009 out of 499999 steps (36%)
[14:44:21] Completed 185009 out of 499999 steps (37%)
[14:55:27] Completed 190009 out of 499999 steps (38%)
[15:04:44] Completed 195009 out of 499999 steps (39%)
-
- Posts: 366
- Joined: Tue Feb 12, 2008 7:33 am
- Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
- Location: SE Michigan, USA
Re: Project 2673 - Linux SMP
cut -c 1-60 unitinfo.txt
this will remove the pipes
Re: Project 2673 - Linux SMP
parkut wrote: cut -c 1-60 unitinfo.txt
this will remove the pipes
Parkut - Thanks for the comment. I am reporting this because of the monitoring problem and the fact that the file does not seem necessary. Fortunately the WU itself seems to process just fine, but there does appear to be something wrong that needed to be brought to the attention of the staff before more of these big projects get introduced.
Re: Project 2673 - Linux SMP
One additional observation. I stopped the client, deleted the file, restarted the client, and watched the file be recreated. In Linux it is reported at 82 Meg; it grows to that size immediately but no larger.
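For anyone who wants to repeat this observation, the sketch below polls the file size a few times after restarting the client. The helper names (`size_bytes`, `watch_size`), the interval and the sample count are all assumptions for illustration.

```shell
# Hypothetical helpers for watching a file's size over time.
size_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Usage: watch_size FILE [INTERVAL_SECONDS] [SAMPLES]
watch_size() {
    file="$1"
    interval="${2:-5}"
    samples="${3:-3}"
    i=0
    while [ "$i" -lt "$samples" ]; do
        if [ -f "$file" ]; then
            printf '%s %s bytes\n' "$(date +%H:%M:%S)" "$(size_bytes "$file")"
        else
            printf '%s (missing)\n' "$(date +%H:%M:%S)"
        fi
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `watch_size unitinfo.txt 10 6` prints six samples ten seconds apart. If the size is the same in every sample, the file was written in one go rather than growing over time.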
Re: Project 2673 - Linux SMP
MichaelO wrote: One additional observation. I stopped the client, deleted the file, restarted the client, and watched the file be reproduced. In Linux it is reported at 82 Meg and it grows to that size immediately but no larger.
Back up the WU and post it for examination.
Re: Project 2673 - Linux SMP
Ivoshiee wrote: Back up the WU and post it for examination.
I have not done that before. Can you point me to the instructions for doing this?
Re: Project 2673 - Linux SMP
MichaelO wrote: I have not done that before. Can you point me to the instructions for doing this?
Code:
tar czf faulty_WU.tgz /path_to_the_FAH_client
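A slightly more defensive variant of the command above, as a sketch: it uses `-C` so the archive stores paths relative to the client's parent directory, and it lists the archive afterwards to confirm it is readable. `backup_client_dir` is a hypothetical helper name, and stopping the client before archiving is an assumption rather than something stated in this thread.

```shell
# Hypothetical helper: archive a client directory and verify the result.
# Usage: backup_client_dir /path_to_the_FAH_client faulty_WU.tgz
backup_client_dir() {
    src="$1"
    out="$2"
    # -C changes into the parent directory first, so the archive holds
    # relative paths (e.g. "fah/unitinfo.txt") rather than absolute ones.
    tar czf "$out" -C "$(dirname "$src")" "$(basename "$src")" &&
        # Listing the archive fails if it is truncated or corrupt.
        tar tzf "$out" > /dev/null
}
```

The function returns non-zero if either the archive step or the verification step fails, so it can be chained with `&&` before uploading anything.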
Re: Project 2673 - Linux SMP
Ivoshiee -
I have the TAR. Where should I post or upload it? Sorry for all the questions. First time at this.