
Standardizing the Core/Client to Third-Party Interface

Posted: Thu Oct 15, 2009 7:41 pm
by jcoffland
I recently created a new core numbered b4 which runs the ProtoMol MD code. This is still in beta but you will see it in the wild soon.

This core was written from a whole new code base so we immediately had incompatibilities with third-party monitoring applications. The upcoming Desmond core also uses this code base. This sparked a discussion about making an official interface for accessing information about running cores.

I am starting this thread to open the discussion. Here are the main topics:

1) What is needed for a new core to support existing third-party software.
2) What would you like the core/client interface to look like?
3) What third-party software is important to you?

Re: Standardizing the Core/Client to Third-Party Interface

Posted: Thu Oct 15, 2009 9:26 pm
by parkut
Various monitoring applications have been written to satisfy the basic need: telling at a glance whether things are "normal", offline, or otherwise not progressing as expected.

FahMon looks for the details in unitinfo.txt (protein name, deadline, percentage complete), and parses the log file for the percent-completion lines to guesstimate the frame time and how much wall-clock time remains until completion.
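
For illustration, here's a minimal sketch of that kind of estimate. This is not FahMon's actual code; the second reading is taken from log lines quoted later in this thread, and the earlier reading is invented for the example.

Code: Select all

// Illustration only (not FahMon's code). Estimates time per frame and the
// remaining wall-clock time from two consecutive "Completed X out of Y steps"
// readings and their log timestamps.
using System;

class EtaSketch
{
    static void Main()
    {
        TimeSpan t1 = TimeSpan.Parse("12:06:50"), t2 = TimeSpan.Parse("13:06:36");
        int done1 = 32500, done2 = 35000, total = 250000;   // steps completed / total steps

        TimeSpan perStep   = TimeSpan.FromTicks((t2 - t1).Ticks / (done2 - done1));
        TimeSpan perFrame  = TimeSpan.FromTicks(perStep.Ticks * (total / 100));   // one frame = 1% of the WU
        TimeSpan remaining = TimeSpan.FromTicks(perStep.Ticks * (total - done2));

        Console.WriteLine("Time per frame: {0}", perFrame);
        Console.WriteLine("Estimated wall clock to completion: {0}", remaining);
    }
}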

My own scripts also use QD's output from the queue.dat file for the protein name, points per hour, deadline, etc.

So, QD and FahMon are important to me.

Re: Standardizing the Core/Client to Third-Party Interface

Posted: Thu Oct 15, 2009 9:31 pm
by 7im

Re: Standardizing the Core/Client to Third-Party Interface

Posted: Thu Oct 15, 2009 9:38 pm
by jcoffland
QD won't have any issues with the new cores, but it will be affected by the new client. The queue format is going to change completely, but the information will be made available through a socket interface.

As far as FahMon goes, I will update the cores so that they output 'unitinfo.txt' and the correct log lines.

As far as I know the log lines needed for FahMon look like this:

Code: Select all

[12:06:50] Project: 2662 (Run 1, Clone 209, Gen 47)
[13:06:36] Completed 35000 out of 250000 steps  (14%)
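
For reference, here is a minimal sketch (not FahMon's actual parser) of pulling the PRCG identifiers and the percentage complete out of those two lines:

Code: Select all

// Minimal sketch only: extract Project/Run/Clone/Gen and percent complete
// from the two log lines shown above.
using System;
using System.Text.RegularExpressions;

class LogLineSketch
{
    static void Main()
    {
        Match prcg = Regex.Match(
            "[12:06:50] Project: 2662 (Run 1, Clone 209, Gen 47)",
            @"Project: (?<p>\d+) \(Run (?<r>\d+), Clone (?<c>\d+), Gen (?<g>\d+)\)");

        Match step = Regex.Match(
            "[13:06:36] Completed 35000 out of 250000 steps  (14%)",
            @"Completed (?<done>\d+) out of (?<total>\d+) steps\s+\((?<pct>\d+)%\)");

        Console.WriteLine("P{0} (R{1}, C{2}, G{3}) is {4}% complete",
            prcg.Groups["p"].Value, prcg.Groups["r"].Value,
            prcg.Groups["c"].Value, prcg.Groups["g"].Value,
            step.Groups["pct"].Value);
    }
}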

Re: Standardizing the Core/Client to Third-Party Interface

Posted: Thu Oct 15, 2009 9:40 pm
by jcoffland
smartcat99s wrote:The key lines for log parsing with existing applications are the PRCG line and the step completion lines which can be seen below:

Code: Select all

[12:06:50] Project: 2662 (Run 1, Clone 209, Gen 47)
[13:06:36] Completed 35000 out of 250000 steps  (14%)
In addition, the unitinfo.txt file is used to track overall progress. (note that there's a progress bar on the progress line that I trim in code due to an old bug)

Code: Select all

Current Work Unit
-----------------
Name: Gromacs
Tag: P2669R0C34G165
Download time: October 14 04:24:45
Due time: October 17 04:24:45
Progress: 0%
One standard place to get all the information would be greatly appreciated, especially if features such as push notifications (e.g. a signal on frame completion) were available. Maybe JSON (or another similar format) is an option for status information in the future. However, that is a discussion for another thread.
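
To show how tools consume unitinfo.txt today, here's a rough sketch (not smartcat99s's actual code) of reading the fields of the sample above, including trimming the progress-bar residue off the Progress line:

Code: Select all

// Rough sketch only: read the "Key: Value" fields of a unitinfo.txt like the
// sample above, trimming any progress-bar characters off the Progress line.
using System;
using System.IO;

class UnitInfoSketch
{
    static void Main()
    {
        foreach (string line in File.ReadAllLines("unitinfo.txt"))
        {
            int colon = line.IndexOf(':');
            if (colon < 0) continue;                       // skip the header and divider lines

            string key = line.Substring(0, colon).Trim();
            string value = line.Substring(colon + 1).Trim();

            if (key == "Progress")
                value = value.Split(' ')[0];               // keep only the leading "NN%"

            Console.WriteLine("{0} = {1}", key, value);
        }
    }
}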

Re: Standardizing the Core/Client to Third-Party Interface

Posted: Thu Oct 15, 2009 9:42 pm
by jcoffland
jcoffland wrote:Do you need both the log lines and 'unitinfo.txt' or would one of these be sufficient?
smartcat99s wrote:I use both places for information about the current status, unit info for the current frame status, and the log for the frame time information. If the information was only in one place though, I'd rather have it in the log lines as that's what most systems expect.

Re: Standardizing the Core/Client to Third-Party Interface

Posted: Thu Oct 15, 2009 9:51 pm
by uncle fuzzy
I've been using FahMon since I started folding (2 1/2 years ago), so I can tell at a glance if my clients are playing nice.

If you can make the new cores talk to FahMon, I'll be a happy camper...eh, folder.

Re: Standardizing the Core/Client to Third-Party Interface

Posted: Thu Oct 15, 2009 10:20 pm
by kiore
3) FahMon or HFM to monitor my clients without needing to check logs individually, as I usually have 8 GPUs and 2 quad cores on the go at any given time. Anything else that does this would be fine for me.

2) At a glance: ppd, time to completion, time per frame, unit number etc.

In addition, some sort of viewer to show the protein would be nice; not a screen saver, but even just a snapshot/mock-up. I frequently have people interested in what I'm actually doing with my crazy-looking rigs, and showing just stats is a bit boring. Nothing that slows down the work, though. The PS3 one is brilliant, but anything would be useful.

kiore.

Re: Standardizing the Core/Client to Third-Party Interface

Posted: Fri Oct 16, 2009 12:22 am
by jcoffland
kiore wrote: In addition, some sort of viewer to show the protein would be nice; not a screen saver, but even just a snapshot/mock-up. I frequently have people interested in what I'm actually doing with my crazy-looking rigs, and showing just stats is a bit boring. Nothing that slows down the work, though. The PS3 one is brilliant, but anything would be useful.
The new client will allow you to get snapshots of the protein position information as often or as rarely as you like. The GUI part of the new client will render this in 3D.

Re: Standardizing the Core/Client to Third-Party Interface

Posted: Fri Oct 16, 2009 12:24 am
by bollix47
3) FahMon & qfix.

Re: Standardizing the Core/Client to Third-Party Interface

Posted: Fri Oct 16, 2009 1:05 am
by k1wi
3) Fahmon is my monitor of choice

Can I just say +1 for coming to third parties/the community for comment :)

Gives me warm fuzzies

Re: Standardizing the Core/Client to Third-Party Interface

Posted: Fri Oct 16, 2009 5:42 am
by Pick2
I use "qd" and tail the FAHlog.txt , it works well over SSH :)
'unitinfo.txt' has had a few problems in the past. I still have a couple Linux boxes with small amounts of storage where I've changed permissions of 'unitinfo.txt' to "root" because the user account client would write half a Gigabyte to 'unitinfo.txt' and crash the system :) Hey , it works , don't knock it !

Re: Standardizing the Core/Client to Third-Party Interface

Posted: Fri Oct 16, 2009 10:33 am
by kiore
jcoffland wrote:
kiore wrote: In addition, some sort of viewer to show the protein would be nice; not a screen saver, but even just a snapshot/mock-up. I frequently have people interested in what I'm actually doing with my crazy-looking rigs, and showing just stats is a bit boring. Nothing that slows down the work, though. The PS3 one is brilliant, but anything would be useful.
The new client will allow you to get snapshots of the protein position information as often or as rarely as you like. The GUI part of the new client will render this in 3D.
:D :eugeek: :D
Bring it on.
kiore.

Re: Standardizing the Core/Client to Third-Party Interface

Posted: Fri Oct 16, 2009 7:15 pm
by harlam357
Hi Joe,

harlam357 here... developer of HFM.NET, which was mentioned at least once in this thread... Thanks kiore! :)

http://code.google.com/p/hfm-net/

viewtopic.php?f=14&t=9903

I'm the new kid (monitoring tool) on the block (first beta released ~4 months ago), but suffice it to say HFM.NET works on much the same principles as FahMon, so much of what is needed to support FahMon is also needed to support HFM.NET.

I also appreciate you coming to the community regarding these changes and hope with all the input you receive that the transition to a new format can go decently smoothly for us all. :)
jcoffland wrote:I am starting this thread to open the discussion. Here are the main topics:

1) What is needed for a new core to support existing third-party software.
2) What would you like the core/client interface to look like?
3) What third-party software is important to you?
jcoffland wrote:QD won't have any issues with the new cores, but it will be affected by the new client. The queue format is going to change completely, but the information will be made available through a socket interface.

As far as FahMon goes, I will update the cores so that they output 'unitinfo.txt' and the correct log lines.

As far as I know the log lines needed for FahMon look like this:

Code: Select all

[12:06:50] Project: 2662 (Run 1, Clone 209, Gen 47)
[13:06:36] Completed 35000 out of 250000 steps  (14%)
1) FAHlog.txt and unitinfo.txt. I've also just recently finished (though not yet released) the addition of queue.dat reading in HFM.NET. So it hits home a bit now that you've said the queue.dat will be going away (I've spent quite a bit of time adding the code and unit tests) and I'm left with socket reads. :( I'd like to see some sort of queue file in the future clients, even if the structure is going to change drastically. After working through the qd.c code and rehashing everything in C#, the queue.dat could probably use a fresh start. As long as the structure is made public so we can read it... I'm OK with that.

As far as the FAHlog.txt file is concerned, yes, those are the primary lines of interest, but there are many others I key off of in HFM.NET.

The lines "] + Processing work unit", "] + Working ...", and "] *------------------------------*" are important to my parsing routines. These lines allow me to determine the bounds of where a WU log starts and subsequently stops... which allows me to isolate specific sections of log... and tie those sections to queue entries. Making the lines "] Working on Unit 0" and "] Working on queue slot 0" also very important.

Here's a not-so-brief list, some lines being more important to my application than others. This is just the section of code that identifies the line types. For some lines, such as "- Received User ID =", HFM.NET also has parsing code that pulls the UserID out of that log line; that's only one example, but any such formats would also be important.

Code: Select all

         if (logLine.Contains("--- Opening Log file"))
         {
            return LogLineType.LogOpen;
         }
         else if (logLine.Contains("###############################################################################"))
         {
            return LogLineType.LogHeader;
         }
         else if (logLine.Contains("] - Autosending finished units..."))
         {
            return LogLineType.ClientAutosendStart;
         }
         else if (logLine.Contains("] - Autosend completed"))
         {
            return LogLineType.ClientAutosendComplete;
         }
         else if (logLine.Contains("] + Attempting to send results"))
         {
            return LogLineType.ClientSendStart;
         }
         else if (logLine.Contains("] + Could not connect to Work Server"))
         {
            return LogLineType.ClientSendConnectFailed;
         }
         else if (logLine.Contains("] - Error: Could not transmit unit"))
         {
            return LogLineType.ClientSendFailed;
         }
         else if (logLine.Contains("] + Results successfully sent"))
         {
            return LogLineType.ClientSendComplete;
         }
         else if (logLine.Contains("Arguments:"))
         {
            return LogLineType.ClientArguments;
         }
         else if (logLine.Contains("] - User name:"))
         {
            return LogLineType.ClientUserNameTeam;
         }
         else if (logLine.Contains("] + Requesting User ID from server"))
         {
            return LogLineType.ClientRequestingUserID;
         }
         else if (logLine.Contains("- Received User ID ="))
         {
            return LogLineType.ClientReceivedUserID;
         }
         else if (logLine.Contains("] - User ID:"))
         {
            return LogLineType.ClientUserID;
         }
         else if (logLine.Contains("] - Machine ID:"))
         {
            return LogLineType.ClientMachineID;
         }
         else if (logLine.Contains("] + Attempting to get work packet"))
         {
            return LogLineType.ClientAttemptGetWorkPacket;
         }
         else if (logLine.Contains("] - Will indicate memory of"))
         {
            return LogLineType.ClientIndicateMemory;
         }
         else if (logLine.Contains("] - Detect CPU. Vendor:"))
         {
            return LogLineType.ClientDetectCpu;
         }
         else if (logLine.Contains("] + Processing work unit"))
         {
            return LogLineType.WorkUnitProcessing;
         }
         else if (logLine.Contains("] + Downloading new core"))
         {
            return LogLineType.WorkUnitCoreDownload;
         }
         else if (logLine.Contains("] Working on Unit 0"))
         {
            return LogLineType.WorkUnitIndex;
         }
         else if (logLine.Contains("] Working on queue slot 0"))
         {
            return LogLineType.WorkUnitQueueIndex;
         }
         else if (logLine.Contains("] + Working ..."))
         {
            return LogLineType.WorkUnitWorking;
         }
         else if (logLine.Contains("] *------------------------------*"))
         {
            return LogLineType.WorkUnitStart;
         }
         else if (logLine.Contains("] Version"))
         {
            return LogLineType.WorkUnitCoreVersion;
         }
         else if (IsLineTypeWorkUnitStarted(logLine))
         {
            return LogLineType.WorkUnitRunning;
         }
         else if (logLine.Contains("] Project:"))
         {
            return LogLineType.WorkUnitProject;
         }
         else if (logLine.Contains("] + Paused"))
         {
            return LogLineType.WorkUnitPaused;
         }
         else if (logLine.Contains("] - Shutting down core"))
         {
            return LogLineType.WorkUnitShuttingDownCore;
         }
         else if (logLine.Contains("] Folding@home Core Shutdown:"))
         {
            return LogLineType.WorkUnitCoreShutdown;
         }
         else if (logLine.Contains("] + Number of Units Completed:"))
         {
            return LogLineType.ClientNumberOfUnitsCompleted;
         }
         else if (logLine.Contains("] This is a sign of more serious problems, shutting down."))
         {
            return LogLineType.ClientCoreCommunicationsErrorShutdown;
         }
         else if (logLine.Contains("] EUE limit exceeded. Pausing 24 hours."))
         {
            return LogLineType.ClientEuePauseState;
         }
         else if (logLine.Contains("Folding@Home Client Shutdown"))
         {
            return LogLineType.ClientShutdown;
         }
         else
         {
            return LogLineType.Unknown;
         }
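
To make the role of those marker lines concrete, here's a rough sketch (not HFM.NET's actual code) of splitting a FAHlog.txt into per-work-unit sections by keying on the "] *------------------------------*" start marker:

Code: Select all

// Rough sketch only: split a FAHlog.txt into per-work-unit sections using
// the work unit start marker line.
using System;
using System.Collections.Generic;
using System.IO;

class LogSplitSketch
{
    static void Main()
    {
        var sections = new List<List<string>>();
        List<string> current = null;

        foreach (string line in File.ReadAllLines("FAHlog.txt"))
        {
            if (line.Contains("] *------------------------------*"))
            {
                current = new List<string>();              // a new work unit section starts here
                sections.Add(current);
            }
            if (current != null) current.Add(line);
        }

        Console.WriteLine("Found {0} work unit sections", sections.Count);
    }
}
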
2) I'm not exactly sure what you mean here... are you talking about the textual output of the console client and/or the look/feel of the GUI client?

If that's what you mean... I don't really have a preference. My concern is being able to read the log and queue information correctly.

3) Since we know the queue.dat will be changed, or gone (yikes!), qd and qfix are effectively broken already... however, I do use both of those in addition to HFM.NET.

Re: Standardizing the Core/Client to Third-Party Interface

Posted: Sat Oct 17, 2009 4:57 pm
by smoking2000
jcoffland wrote:I recently created a new core numbered b4 which runs the ProtoMol MD code. This is still in beta but you will see it in the wild soon.

This core was written from a whole new code base so we immediately had incompatibilities with third-party monitoring applications. The upcoming Desmond core also uses this code base. This sparked a discussion about making an official interface for accessing information about running cores.
As the developer of FCI, the current maintainer/developer of most of Dick Howell's utilities (qd, qfix, wuinfo, xyz2pdb, and friends), and the author of "A plea to open the queue.dat", I applaud this initiative wholeheartedly!
jcoffland wrote:1) What is needed for a new core to support existing third-party software.
Dick Howell's utilities use the information in the queue.dat, work/logfile_<##>.txt, work/wuinfo_<##>.dat, and work/current.xyz files.

FCI in turn depends on these utilities for its information to display, and additionally parses the FAHlog.txt and FAHlog-Prev.txt to detect issues, generate TPF graphs, etc.

For qd and qfix the queue.dat is the most important file, as they use all of the information it contains.

The work/logfile_<##>.txt is secondary and used to retrieve additional information not (always) available in the queue.dat.

The work unit name is parsed from lines like:

Code: Select all

Protein: p1234_my_pink_ribbon
The core version is parsed from lines matching:

Code: Select all

Folding@Home Client Core Version %19s %c
And the progress information comes from the various versions of the "Completed x out of y steps" lines.

GPU cores:

Code: Select all

Completed %d%%
Most Gromacs cores:

Code: Select all

Completed %d out of %d

Various cores:

Code: Select all

Iterations: %d of %d
Finished a frame (%d
Tinker and/or Genome core:

Code: Select all

- Frames Completed: %d, Remaining: %d
[SP%*c] Designing protein sequence  %d of %d
[SP%*c]  %d.0 %c
qd additionally uses the checkpoint lines to get higher-precision progress:

Code: Select all

Timered checkpoint triggered
Writing checkpoint files
[SPG]   %d positions in protei%c
[SP%*c] Writing current.pdb, chainlength = %u
As you can see, qd has had to adapt to every new format a core uses to report its progress. The near-standardization on the Gromacs format was a relief, but the GPU client breaking from tradition with yet another new format was a bit of a disappointment. Settling on a standard format for progress lines would be most welcome.
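
For what it's worth, here's a rough sketch (not qd's actual code) of what handling just a few of those progress formats looks like from the third-party side; every new core format adds another branch like these:

Code: Select all

// Rough sketch only: recognise a few of the progress line formats listed above
// and convert each to a percentage.
using System;
using System.Text.RegularExpressions;

class ProgressSketch
{
    // Returns percent complete, or -1 if the line is not a recognised progress line.
    static int PercentComplete(string line)
    {
        Match m = Regex.Match(line, @"Completed (\d+) out of (\d+)");     // most Gromacs cores
        if (m.Success)
            return 100 * int.Parse(m.Groups[1].Value) / int.Parse(m.Groups[2].Value);

        m = Regex.Match(line, @"Completed (\d+)%");                       // GPU cores
        if (m.Success)
            return int.Parse(m.Groups[1].Value);

        m = Regex.Match(line, @"Iterations: (\d+) of (\d+)");             // various cores
        if (m.Success)
            return 100 * int.Parse(m.Groups[1].Value) / int.Parse(m.Groups[2].Value);

        return -1;
    }

    static void Main()
    {
        Console.WriteLine(PercentComplete("[13:06:36] Completed 35000 out of 250000 steps  (14%)"));
        Console.WriteLine(PercentComplete("Completed 45%"));
    }
}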

The SMP cores are notable too, as they no longer log the Protein line in the logfile, nor store the name in wuinfo, so qd no longer knows the name of the current WU. As a last resort, qd uses the work/wuinfo_<##>.dat and work/current.xyz files to retrieve the work unit name if it was not logged in the work/logfile_<##>.txt (and it also falls back to wuinfo_<##>.dat for progress if there is none in the work/logfile_<##>.txt). The logfile is preferred for progress and other work unit info because wuinfo_<##>.dat has historically proven to be unreliable.

The use of current.xyz has been abandoned by most recent cores, so fpd and FCI have lost the ability to visualize the current work unit. The GPU client appears to start a GUI server which I think could be used to get data to visualize the WU, but the protocol for interfacing with this server is not available, so there is no way to figure out how (I don't run Windows, so I can't run the GUI client and sniff its traffic to help with reverse engineering).

Now that I mention reverse engineering: not having to do it to get data from the FAH client and cores sounds great. The hint towards a socket interface looks very promising, but it will be near useless without documentation. I had a lot of fun, but also did a lot of cursing, while figuring out what the new values written to the queue.dat by the v6 client were, and I'm still not sure I got them all right.
jcoffland wrote:2) What would you like the core/client interface to look like?
I'm quite fond of the nvidia-settings utility on Linux, which I can use to query various settings of the GPUs in the system. A query interface in the FAH client similar to `nvidia-settings -q <Property|all>` would be nice.

But a simple memcached-like protocol (`GET <key>`) over a socket would be nice too. The FAH client could use this protocol to fetch the data from the core, and then make it available via the query interface.
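
To make that concrete, here's a tiny sketch of such a query from the third-party side. The port number and key name are invented for this sketch, since no such protocol exists yet:

Code: Select all

// Purely hypothetical: a memcached-style "GET <key>" query against a local
// FAH client socket. The port (5555) and the key name are made up.
using System;
using System.IO;
using System.Net.Sockets;

class QuerySketch
{
    static void Main()
    {
        using (var client = new TcpClient("127.0.0.1", 5555))
        using (var stream = client.GetStream())
        using (var writer = new StreamWriter(stream) { AutoFlush = true })
        using (var reader = new StreamReader(stream))
        {
            writer.WriteLine("GET unit.progress");
            Console.WriteLine(reader.ReadLine());   // e.g. "14%"
        }
    }
}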

While I don't mind diving into socket programming, I think it's a bit too complex for the various "simpler" monitoring tools out there. Those usually just parse the unitinfo.txt, and the more advanced ones use qd to (also) parse the queue.dat.

Making it easy to get the various bits of information is important, I think, and making it easy to parse as well. The nvidia-settings utility displays a semi-human-readable format of its query answers by default, but with the -t parameter it returns them in an easy-to-parse format. Human-readable output similar to what's currently used in FAHlog.txt and unitinfo.txt by default, with XML output when specifically requested, would be an option.
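
Purely as a hypothetical illustration (no such schema exists; the element names are made up, the values are taken from samples earlier in this thread), the XML form of the current unit's status might look something like:

Code: Select all

<!-- Hypothetical example only -->
<unit>
  <project>2662</project>
  <run>1</run>
  <clone>209</clone>
  <gen>47</gen>
  <progress>14</progress>
  <deadline>October 17 04:24:45</deadline>
</unit>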

This query interface in the FAH client would be -queueinfo on steroids :D
jcoffland wrote:3) What third-party software is important to you?
Those I wrote and/or maintain, as mentioned above. :) Their affiliated programs (i.e. those using qd), like InCrease, F@H WUdget, Protein Think, and finstall. And the programs using ports of qd in other languages, like FahMon & HFM.NET.
jcoffland wrote:QD wont have any issues with the new cores, but it will be affected by the new client. The queue format is going to completely change but the information will be made available through a socket interface.
While a socket interface is cool, I can handle new formats of the queue.dat too. Dick Howell adapted qd over time, supporting even ancient clients like 3.24, and I've picked up the work since v5.91. Having documentation of the format this time would be nice though :)

Also, whatever you change in the queue.dat, please keep some sort of record of historic WUs. I find it quite valuable to be able to detect faulty clients by seeing multiple deleted WUs in the queue, so only providing info on the current WU is not enough.