Re: FahMon (multi-platform app to monitor various F@h clients)

Posted: Sat Dec 13, 2008 12:55 am
by uncle_fungus
Gah. I'll check the rounding logic again then.

Re: FahMon (multi-platform app to monitor various F@h clients)

Posted: Fri Dec 19, 2008 5:14 pm
by Frodo The Hobbit
I still have some bugs with FahMon... 25% CPU load on my twin Opteron 2212 -_-

Does anybody have a solution?

Re: FahMon (multi-platform app to monitor various F@h clients)

Posted: Fri Dec 19, 2008 7:05 pm
by jrweiss
Revert to v2.3.2

Re: FahMon (multi-platform app to monitor various F@h clients)

Posted: Tue Dec 23, 2008 6:24 pm
by Hyperlife
uncle_fungus wrote: For those of you seeing massive CPU load while reloading, can you give me a ballpark figure for the size of your FAHlogs, please? I don't think this is the cause of the load but it might be contributing if the files are large.
I did a little more testing on the 99% CPU problem on 2.3.4 under Windows, and I can confirm that the logs are definitely the source of the problem for me. Long CPU spikes only happen when FahMon tries to reload a client with a large log file. Logs under 100K load quickly with no long CPU spike, but every time it reloads a client with a large FAHlog.txt file, it freezes my single-core Pentium M laptop for about 5-10 seconds with 99% CPU usage.

The large log problem is most noticeable on my GPU clients, since they can create large log files relatively quickly. I just rebooted two of them, thereby creating new FAHlog.txt files, and FahMon reloaded them right away with no long CPU spike. The third, with a FAHlog.txt file of 500K, still causes the problem.

So for now, the solution seems to be stopping and restarting my clients if their logs get too big.

Re: FahMon (multi-platform app to monitor various F@h clients)

Posted: Wed Dec 24, 2008 12:59 am
by P5-133XL
My solution to the problem has been to load FahMon only when I want to look at it, rather than keep it running continuously. The result is that whatever inefficiency occurs will only be there for a few minutes rather than 24x7.

Re: FahMon (multi-platform app to monitor various F@h clients)

Posted: Wed Dec 24, 2008 1:15 am
by codysluder
Maybe FahMon could keep track of the number of records in each FAHlog. It's probably less CPU intensive to skip to record N than to read them all sequentially (even over a LAN connection). There would need to be logic to read the whole file if this method fails (like when the file is replaced or if scrolling backwards).
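
A minimal sketch of that idea, not FahMon's actual code. It remembers a byte offset instead of a record count (a swap on my part, since seeking to a byte position avoids re-reading earlier lines), and the `LogTail` class and its method name are hypothetical. The fallback only detects a file that has become shorter than the stored offset; a replaced file that is already longer would need an extra check.

```python
import os


class LogTail:
    """Hypothetical helper: remembers how far into a log file we have read."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # byte position just past the last complete line read

    def read_new_lines(self):
        """Return only the complete lines appended since the previous call."""
        size = os.path.getsize(self.path)
        if size < self.offset:
            # The file shrank, so it was probably replaced (e.g. the client
            # restarted). Fall back to reading the whole file again.
            self.offset = 0
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Keep only complete lines; a partial trailing line is left for the
        # next call, once the client has finished writing it.
        end = data.rfind(b"\n") + 1
        self.offset += end
        return data[:end].decode("utf-8", errors="replace").splitlines()
```

Each reload then costs roughly the size of what was appended since the last one, rather than the size of the whole FAHlog.txt.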

Re: FahMon (multi-platform app to monitor various F@h clients)

Posted: Wed Dec 24, 2008 8:43 pm
by uncle_fungus
bollix47 wrote: Sorry to say, but there was no difference in the projections. Still as described above, except the current WU has 250001 steps.
I've just altered the logic again and it now works for me.

Re: FahMon (multi-platform app to monitor various F@h clients)

Posted: Wed Dec 24, 2008 9:02 pm
by bollix47
I don't currently have a WU with 250001 steps but do have one with 249999 steps and that one now appears to be working correctly.

The only message(s) that have an X beside them are:

Code:

[24/12/08 - 15:54:03] ! The progress value in file \\ATLANTISVM01\folding\unitinfo.txt could not be found/parsed
[24/12/08 - 15:54:03] X Error while reading \\ATLANTISVM01\folding\unitinfo.txt!

but I get that on all my clients.

Thanks for the Christmas gift and for all the work that you do.

Merry Christmas

Re: FahMon (multi-platform app to monitor various F@h clients)

Posted: Wed Dec 24, 2008 9:29 pm
by uncle_fungus
I've just uploaded another minor change that will also account for the 249999 case as well as the 250001 case (the calculated frame count would end up as 66 instead of 100 without this fix).

I'm working on the unitinfo error message above, as I can replicate that here too.
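
For anyone curious why step totals like 249999 and 250001 are awkward: the thread doesn't show FahMon's actual logic (or where the 66 came from), so the following is only an illustrative sketch under my own assumptions. It estimates the frame count from two consecutive "Completed ... out of ... steps" lines and rounds to the nearest integer instead of truncating. The function name `estimate_frame_count` is hypothetical.

```python
import re

# Matches client log lines such as:
# [13:56:15] Completed 2509 out of 249999 steps  (1%)
STEP_LINE = re.compile(r"Completed (\d+) out of (\d+) steps")


def estimate_frame_count(log_lines):
    """Estimate the number of progress frames in a WU from its log lines.

    Returns None if fewer than two progress lines are available.
    """
    completed = []
    total = None
    for line in log_lines:
        m = STEP_LINE.search(line)
        if m:
            completed.append(int(m.group(1)))
            total = int(m.group(2))
    if len(completed) < 2:
        return None
    steps_per_frame = completed[-1] - completed[-2]
    if steps_per_frame <= 0:
        return None
    # Truncating would give 249999 // 2500 == 99; rounding to the nearest
    # integer gives 100 for both the 249999 and the 250001 case.
    return round(total / steps_per_frame)
```

The point of rounding is that a total that is one step off a round number should not change the number of frames.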

Re: FahMon (multi-platform app to monitor various F@h clients)

Posted: Wed Dec 24, 2008 9:40 pm
by bollix47
I compiled the program again with 'another minor change' and can see that the projection for the WU with 249999 steps is now correct. The previous one was off by about an hour (it was at 90%) but now appears to be spot on. ;)

A bit off-topic, but what do I need on Windows to use svn? I've been doing it on a Linux setup and copying the files over to my Windows box. :?

Re: FahMon (multi-platform app to monitor various F@h clients)

Posted: Wed Dec 24, 2008 10:07 pm
by uncle_fungus
TortoiseSVN http://tortoisesvn.tigris.org/ (integrates with Explorer) or Slik SVN http://www.sliksvn.com/en/download (plain console version)

Re: FahMon (multi-platform app to monitor various F@h clients)

Posted: Tue Dec 30, 2008 6:34 am
by EvilAlchemist
Hyperlife wrote:
uncle_fungus wrote: For those of you seeing massive CPU load while reloading, can you give me a ballpark figure for the size of your FAHlogs, please? I don't think this is the cause of the load but it might be contributing if the files are large.
I did a little more testing on the 99% CPU problem on 2.3.4 under Windows, and I can confirm that the logs are definitely the source of the problem for me. Long CPU spikes only happen when FahMon tries to reload a client with a large log file. Logs under 100K load quickly with no long CPU spike, but every time it reloads a client with a large FAHlog.txt file, it freezes my single-core Pentium M laptop for about 5-10 seconds with 99% CPU usage.
I was checking the F@H logs from my 10 x ATI 4850 and many were 300K to over 600K.
It does take FahMon a good while to load each client.

Some are saying go back to 2.3.3 or 2.3.2, but if the logs are the problem, shouldn't it show up there as well?

Re: FahMon (multi-platform app to monitor various F@h clients)

Posted: Tue Dec 30, 2008 5:52 pm
by ikerekes
This morning I started a WU that caused FahMon to bring my E5200 @ 3.7 GHz to its knees.
The WU is:

Code:

[13:47:29] + News From Folding@Home: Welcome to Folding@Home
[13:47:30] Loaded queue successfully.
[13:47:30] Connecting to http://171.64.65.56:8080/
[13:47:35] Posted data.
[13:47:35] Initial: 0000; - Receiving payload (expected size: 4834396)
[13:47:45] - Downloaded at ~472 kB/s
[13:47:45] - Averaged speed for that direction ~409 kB/s
[13:47:45] + Received work.
[13:47:45] Trying to send all finished work units
[13:47:45] + No unsent completed units remaining.
[13:47:45] + Closed connections
[13:47:45] 
[13:47:45] + Processing work unit
[13:47:45] At least 4 processors must be requested.Core required: FahCore_a2.exe
[13:47:45] Core found.
[13:47:45] Working on queue slot 04 [December 30 13:47:45 UTC]
[13:47:45] + Working ...
[13:47:45] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 04 -checkpoint 15 -forceasm -verbose -lifeline 6668 -version 623'

[13:47:45] 
[13:47:45] *------------------------------*
[13:47:45] Folding@Home Gromacs SMP Core
[13:47:45] Version 2.01 (Wed Aug 13 13:11:25 PDT 2008)
[13:47:45] 
[13:47:45] Preparing to commence simulation
[13:47:45] - Ensuring status. Please wait.
[13:47:54] - Assembly optimizations manually forced on.
[13:47:54] - Not checking prior termination.
[13:47:55] - Expanded 4833884 -> 23977801 (decompressed 496.0 percent)
[13:47:55] Called DecompressByteArray: compressed_data_size=4833884 data_size=23977801, decompressed_data_size=23977801 diff=0
[13:47:56] - Digital signature verified
[13:47:56] 
[13:47:56] Project: 2669 (Run 7, Clone 17, Gen 41)
[13:47:56] 
[13:47:56] Assembly optimizations on if available.
[13:47:56] Entering M.D.
[13:56:15] Completed 2509 out of 249999 steps  (1%)
[14:04:35] Completed 5009 out of 249999 steps  (2%)
[14:12:54] Completed 7509 out of 249999 steps  (3%)
[14:20:50] Completed 10009 out of 249999 steps  (4%)
[14:28:45] Completed 12509 out of 249999 steps  (5%)
[14:36:41] Completed 15009 out of 249999 steps  (6%)
[14:44:36] Completed 17509 out of 249999 steps  (7%)
[14:52:31] Completed 20009 out of 249999 steps  (8%)
[15:00:28] Completed 22509 out of 249999 steps  (9%)
[15:08:23] Completed 25009 out of 249999 steps  (10%)
[15:16:18] Completed 27509 out of 249999 steps  (11%)
[15:24:14] Completed 30009 out of 249999 steps  (12%)
[15:32:09] Completed 32509 out of 249999 steps  (13%)

The unitinfo file is humongous, 165M:
-rw-r--r-- 1 kerekei kerekei 165M 2008-12-30 10:38 unitinfo.txt

Re: FahMon (multi-platform app to monitor various F@h clients)

Posted: Tue Dec 30, 2008 7:40 pm
by uncle_fungus
Yes, I know. You'll need to build FahMon from svn for now, as the current release versions try to parse the entire unitinfo file (which is bad for obvious reasons). The SVN version now loads only the first 512 bytes of the file (if possible), which fixes this problem.
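
Roughly, the idea looks like the sketch below. This is an illustration in Python, not the actual FahMon source; the helper names `read_head` and `parse_progress` are made up, and the `Progress: N%` line format is an assumption about what unitinfo.txt contains.

```python
import re

HEAD_BYTES = 512  # the limit mentioned above

# Assumed format of the progress line in unitinfo.txt, e.g. "Progress: 13%".
PROGRESS = re.compile(r"Progress:\s*(\d+)\s*%")


def read_head(path, limit=HEAD_BYTES):
    """Read at most `limit` bytes from the start of the file."""
    with open(path, "rb") as f:
        return f.read(limit).decode("utf-8", errors="replace")


def parse_progress(path):
    """Return the progress percentage found in the file head, or None."""
    m = PROGRESS.search(read_head(path))
    return int(m.group(1)) if m else None
```

With this approach a 165M unitinfo.txt costs the same to read as a normal one, since only the head of the file is ever touched.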

Re: FahMon (multi-platform app to monitor various F@h clients)

Posted: Sun Jan 11, 2009 10:46 pm
by Frodo The Hobbit
The SVN build now solves all my problems. Thx for your work uncle_fungus ;)