Standardizing the Core/Client to Third-Party Interface
Moderator: Site Moderators
-
- Site Admin
- Posts: 1018
- Joined: Fri Oct 10, 2008 6:42 pm
- Location: Helsinki, Finland
- Contact:
Standardizing the Core/Client to Third-Party Interface
I recently created a new core numbered b4 which runs the ProtoMol MD code. This is still in beta but you will see it in the wild soon.
This core was written from a whole new code base so we immediately had incompatibilities with third-party monitoring applications. The upcoming Desmond core also uses this code base. This sparked a discussion about making an official interface for accessing information about running cores.
I am starting this thread to open the discussion. Here are the main topics:
1) What is needed for a new core to support existing third-party software.
2) What would you like the core/client interface to look like?
3) What third-party software is important to you?
Cauldron Development LLC
http://cauldrondevelopment.com/
-
- Posts: 363
- Joined: Tue Feb 12, 2008 7:33 am
- Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
- Location: SE Michigan, USA
Re: Standardizing the Core/Client to Third-Party Interface
Various monitoring applications have been written to satisfy the basic need to tell at a glance if things are "normal", offline, or otherwise not progressing as expected.
FahMon looks for the details in unitinfo.txt (protein name, deadline, percentage complete), and parses the log file looking for progress in the percent completion lines to guesstimate the frame time and how much wall clock time remains until completion.
My own scripts also utilize QD's output of the dat file for protein name, points per hour, deadline, etc.
So, QD and FahMon are important to me.
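The frame-time guesstimate described above can be sketched roughly like this. It is only an illustration in Python, not FahMon's actual code: it assumes progress lines shaped like `[HH:MM:SS] Completed N out of M steps (P%)`, treats each percent as one frame, and uses made-up function names and a made-up 13% sample line.

```python
import re
from datetime import timedelta

# Assumed line shape: "[13:06:36] Completed 35000 out of 250000 steps (14%)"
COMPLETED = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2})\] Completed \d+ out of \d+ steps \((\d+)%\)"
)

def frame_marks(lines):
    """Return (seconds since midnight, percent) for every progress line."""
    marks = []
    for line in lines:
        m = COMPLETED.match(line)
        if m:
            h, mi, s, pct = (int(g) for g in m.groups())
            marks.append((h * 3600 + mi * 60 + s, pct))
    return marks

def estimate(lines):
    """Return (time per frame, time remaining), or None without enough data."""
    marks = frame_marks(lines)
    if len(marks) < 2:
        return None
    (t0, p0), (t1, p1) = marks[0], marks[-1]
    if p1 <= p0:
        return None
    # Log timestamps carry no date, so tolerate a single midnight wrap.
    elapsed = (t1 - t0) % 86400
    per_frame = elapsed / (p1 - p0)
    return timedelta(seconds=per_frame), timedelta(seconds=per_frame * (100 - p1))

log = [
    "[12:06:50] Project: 2662 (Run 1, Clone 209, Gen 47)",
    "[12:50:00] Completed 32500 out of 250000 steps (13%)",  # invented sample line
    "[13:06:36] Completed 35000 out of 250000 steps (14%)",
]
print(estimate(log))
```

Averaging over the first and last progress line keeps the sketch short; a real monitor would more likely average only the most recent few frames so that pauses do not skew the estimate.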
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
- Contact:
Re: Standardizing the Core/Client to Third-Party Interface
Fahmon home: http://fahmon.net/
More about QD here... http://linuxminded.xs4all.nl/mirror/www ... h/fah.html
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
-
- Site Admin
- Posts: 1018
- Joined: Fri Oct 10, 2008 6:42 pm
- Location: Helsinki, Finland
- Contact:
Re: Standardizing the Core/Client to Third-Party Interface
QD won't have any issues with the new cores, but it will be affected by the new client. The queue format is going to completely change but the information will be made available through a socket interface.
As far as FahMon goes, I will update the cores so that they output 'unitinfo.txt' and the correct log lines.
As far as I know the log lines needed for FahMon look like this:
Code: Select all
[12:06:50] Project: 2662 (Run 1, Clone 209, Gen 47)
[13:06:36] Completed 35000 out of 250000 steps (14%)
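For anyone checking a new core's output against these two lines, here is a rough Python sketch of how a monitor might pick them apart. The regular expressions are inferred from the two sample lines only and the names are invented for illustration; real tools may be stricter or looser about spacing.

```python
import re

# Patterns inferred from the two sample log lines; not taken from any tool.
PROJECT = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\] Project: (?P<project>\d+) "
    r"\(Run (?P<run>\d+), Clone (?P<clone>\d+), Gen (?P<gen>\d+)\)"
)
STEPS = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\] Completed (?P<done>\d+) "
    r"out of (?P<total>\d+) steps\s+\((?P<percent>\d+)%\)"
)

def parse_line(line):
    """Classify one log line; returns (kind, fields) or None if unrecognised."""
    for kind, pattern in (("project", PROJECT), ("steps", STEPS)):
        m = pattern.match(line)
        if m:
            # Keep the timestamp as text, convert every other field to int.
            fields = {
                key: (value if key == "time" else int(value))
                for key, value in m.groupdict().items()
            }
            return kind, fields
    return None

print(parse_line("[12:06:50] Project: 2662 (Run 1, Clone 209, Gen 47)"))
print(parse_line("[13:06:36] Completed 35000 out of 250000 steps (14%)"))
```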
Cauldron Development LLC
http://cauldrondevelopment.com/
-
- Site Admin
- Posts: 1018
- Joined: Fri Oct 10, 2008 6:42 pm
- Location: Helsinki, Finland
- Contact:
Re: Standardizing the Core/Client to Third-Party Interface
smartcat99s wrote:The key lines for log parsing with existing applications are the PRCG line and the step completion lines which can be seen below:
Code: Select all
[12:06:50] Project: 2662 (Run 1, Clone 209, Gen 47)
[13:06:36] Completed 35000 out of 250000 steps (14%)
In addition, the unitinfo.txt file is used to track overall progress. (note that there's a progress bar on the progress line that I trim in code due to an old bug)
Code: Select all
Current Work Unit
-----------------
Name: Gromacs
Tag: P2669R0C34G165
Download time: October 14 04:24:45
Due time: October 17 04:24:45
Progress: 0%
One standard place to get all the information would be greatly appreciated, especially if features such as push notifications (i.e. signal on frame completion) were available. Maybe JSON (or other similar-style format) is an option for status information in the future. However that is a discussion not for this thread.
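As an illustration of the unitinfo.txt handling and the JSON idea mentioned here, the following Python sketch reads the "Key: value" lines, trims the progress bar, and prints the result as JSON. The bar's exact appearance (`[__________]`) and the JSON shape are assumptions made for the example, not a documented format.

```python
import json

SAMPLE = """\
Current Work Unit
-----------------
Name: Gromacs
Tag: P2669R0C34G165
Download time: October 14 04:24:45
Due time: October 17 04:24:45
Progress: 0%  [__________]
"""

def parse_unitinfo(text):
    """Turn 'Key: value' lines into a dict; header and rule lines are skipped."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # title and dashed rule carry no "Key: value" pair
        info[key.strip()] = value.strip()
    # Drop the trailing progress bar, keeping only the number before '%'.
    if "Progress" in info:
        info["Progress"] = int(info["Progress"].split("%", 1)[0])
    return info

print(json.dumps(parse_unitinfo(SAMPLE), indent=2))
```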
Cauldron Development LLC
http://cauldrondevelopment.com/
-
- Site Admin
- Posts: 1018
- Joined: Fri Oct 10, 2008 6:42 pm
- Location: Helsinki, Finland
- Contact:
Re: Standardizing the Core/Client to Third-Party Interface
jcoffland wrote:Do you need both the log lines and 'unitinfo.txt' or would one of these be sufficient?
smartcat99s wrote:I use both places for information about the current status, unit info for the current frame status, and the log for the frame time information. If the information was only in one place though, I'd rather have it in the log lines as that's what most systems expect.
Cauldron Development LLC
http://cauldrondevelopment.com/
-
- Posts: 460
- Joined: Sun Dec 02, 2007 10:15 pm
- Location: Michigan
Re: Standardizing the Core/Client to Third-Party Interface
I've been using FahMon since I started folding (2 1/2 years) , so I can tell at a glance if my clients are playing nice.
If you can make the new cores talk to FahMon, I'll be a happy camper...eh, folder.
Proud to crash my machines as a Beta Tester!
Re: Standardizing the Core/Client to Third-Party Interface
3) Fahmon or HFM to monitor my clients without needing to check logs individually, as I usually have 8 GPUs and 2 quad cores on the go at any given time. Anything else that does this would be fine for me.
2) At a glance: ppd, time to completion, time per frame, unit number etc.
Addit some sort of viewer to show the protein would be nice, not a screen saver but even just a snapshot/mockup, frequently have people interested in what I'm actually doing with my crazy looking rigs and showing just stats is a bit boring. Nothing that slows down the work though. The PS3 one is brilliant but anything would be useful.
kiore.
i7 7800x RTX 3070 OS= win10. AMD 3700x RTX 2080ti OS= win10 .
Team page: https://www.rationalskepticism.org/viewtopic.php?t=616
-
- Site Admin
- Posts: 1018
- Joined: Fri Oct 10, 2008 6:42 pm
- Location: Helsinki, Finland
- Contact:
Re: Standardizing the Core/Client to Third-Party Interface
kiore wrote: Addit some sort of viewer to show the protein would be nice, not a screen saver but even just a snapshot/mockup, frequently have people interested in what I'm actually doing with my crazy looking rigs and showing just stats is a bit boring. Nothing that slows down the work though. The PS3 one is brilliant but anything would be useful.
The new client will allow you to get snapshots of the protein position information as often or as rarely as you like. The GUI part of the new client will render this in 3D.
Cauldron Development LLC
http://cauldrondevelopment.com/
Re: Standardizing the Core/Client to Third-Party Interface
3) FahMon & qfix.
Re: Standardizing the Core/Client to Third-Party Interface
3) Fahmon is my monitor of choice
Can I just say +1 for coming to third parties/the community for comment
Gives me warm fuzzies
-
- Posts: 85
- Joined: Fri Feb 13, 2009 12:38 pm
- Hardware configuration: Linux & CPUs
- Location: USA
Re: Standardizing the Core/Client to Third-Party Interface
I use "qd" and tail the FAHlog.txt; it works well over SSH.
'unitinfo.txt' has had a few problems in the past. I still have a couple of Linux boxes with small amounts of storage where I've changed the ownership of 'unitinfo.txt' to "root", because the client running under the user account would write half a gigabyte to 'unitinfo.txt' and crash the system. Hey, it works, don't knock it!
Re: Standardizing the Core/Client to Third-Party Interface
jcoffland wrote:
kiore wrote: Addit some sort of viewer to show the protein would be nice, not a screen saver but even just a snapshot/mockup, frequently have people interested in what I'm actually doing with my crazy looking rigs and showing just stats is a bit boring. Nothing that slows down the work though. The PS3 one is brilliant but anything would be useful.
The new client will allow you to get snapshots of the protein position information as often or as rarely as you like. The GUI part of the new client will render this in 3D.
Bring it on.
kiore.
i7 7800x RTX 3070 OS= win10. AMD 3700x RTX 2080ti OS= win10 .
Team page: https://www.rationalskepticism.org/viewtopic.php?t=616
Re: Standardizing the Core/Client to Third-Party Interface
Hi Joe,
harlam357... developer of HFM.NET, which was at least mentioned once in this thread... Thanks kiore!
http://code.google.com/p/hfm-net/
viewtopic.php?f=14&t=9903
I'm the new kid (monitoring tool) on the block (released first beta ~4 months ago), but suffice it to say HFM.NET works on much the same principles as FahMon. So much of what people are saying is necessary to support FahMon is also necessary to support HFM.NET.
I also appreciate you coming to the community regarding these changes and hope that, with all the input you receive, the transition to a new format can go decently smoothly for us all.
jcoffland wrote:I am starting this thread to open the discussion. Here are the main topics:
1) What is needed for a new core to support existing third-party software.
2) What would you like the core/client interface to look like?
3) What third-party software is important to you?
jcoffland wrote:QD won't have any issues with the new cores, but it will be affected by the new client. The queue format is going to completely change but the information will be made available through a socket interface.
As far as FahMon goes, I will update the cores so that they output 'unitinfo.txt' and the correct log lines.
As far as I know the log lines needed for FahMon look like this:
Code: Select all
[12:06:50] Project: 2662 (Run 1, Clone 209, Gen 47)
[13:06:36] Completed 35000 out of 250000 steps (14%)
1) FAHlog.txt and unitinfo.txt. I've also just recently finished (though not released yet) the addition of queue.dat reading in HFM.NET. So it hits home a bit now that you've said the queue.dat will be going away (as I've spent quite a bit of time adding the code and unit tests) and I'm left with socket reads. I'd like to see some sort of queue file in the future clients, even if the structure is going to drastically change. After working through the qd.c code and rehashing everything in C#, the queue.dat could probably use a new start. As long as the structure is made public so we can read it... then I'm ok with that.
As far as the FAHlog.txt file is concerned, yes those are the primary lines of interest, but there are many others I use to key off of in HFM.NET.
The lines "] + Processing work unit", "] + Working ...", and "] *------------------------------*" are important to my parsing routines. These lines allow me to determine the bounds of where a WU log starts and subsequently stops... which allows me to isolate specific sections of log... and tie those sections to queue entries. That makes the lines "] Working on Unit 0" and "] Working on queue slot 0" also very important.
Here's a not so brief list... some being more important to my application than others. This is just a section identifying the lines. For some lines, such as "- Received User ID =", HFM.NET also has some parsing code to pull the UserID from that log line. This is a bad example, but any such formats would also be important.
Code: Select all
if (logLine.Contains("--- Opening Log file"))
{
    return LogLineType.LogOpen;
}
else if (logLine.Contains("###############################################################################"))
{
    return LogLineType.LogHeader;
}
else if (logLine.Contains("] - Autosending finished units..."))
{
    return LogLineType.ClientAutosendStart;
}
else if (logLine.Contains("] - Autosend completed"))
{
    return LogLineType.ClientAutosendComplete;
}
else if (logLine.Contains("] + Attempting to send results"))
{
    return LogLineType.ClientSendStart;
}
else if (logLine.Contains("] + Could not connect to Work Server"))
{
    return LogLineType.ClientSendConnectFailed;
}
else if (logLine.Contains("] - Error: Could not transmit unit"))
{
    return LogLineType.ClientSendFailed;
}
else if (logLine.Contains("] + Results successfully sent"))
{
    return LogLineType.ClientSendComplete;
}
else if (logLine.Contains("Arguments:"))
{
    return LogLineType.ClientArguments;
}
else if (logLine.Contains("] - User name:"))
{
    return LogLineType.ClientUserNameTeam;
}
else if (logLine.Contains("] + Requesting User ID from server"))
{
    return LogLineType.ClientRequestingUserID;
}
else if (logLine.Contains("- Received User ID ="))
{
    return LogLineType.ClientReceivedUserID;
}
else if (logLine.Contains("] - User ID:"))
{
    return LogLineType.ClientUserID;
}
else if (logLine.Contains("] - Machine ID:"))
{
    return LogLineType.ClientMachineID;
}
else if (logLine.Contains("] + Attempting to get work packet"))
{
    return LogLineType.ClientAttemptGetWorkPacket;
}
else if (logLine.Contains("] - Will indicate memory of"))
{
    return LogLineType.ClientIndicateMemory;
}
else if (logLine.Contains("] - Detect CPU. Vendor:"))
{
    return LogLineType.ClientDetectCpu;
}
else if (logLine.Contains("] + Processing work unit"))
{
    return LogLineType.WorkUnitProcessing;
}
else if (logLine.Contains("] + Downloading new core"))
{
    return LogLineType.WorkUnitCoreDownload;
}
else if (logLine.Contains("] Working on Unit 0"))
{
    return LogLineType.WorkUnitIndex;
}
else if (logLine.Contains("] Working on queue slot 0"))
{
    return LogLineType.WorkUnitQueueIndex;
}
else if (logLine.Contains("] + Working ..."))
{
    return LogLineType.WorkUnitWorking;
}
else if (logLine.Contains("] *------------------------------*"))
{
    return LogLineType.WorkUnitStart;
}
else if (logLine.Contains("] Version"))
{
    return LogLineType.WorkUnitCoreVersion;
}
else if (IsLineTypeWorkUnitStarted(logLine))
{
    return LogLineType.WorkUnitRunning;
}
else if (logLine.Contains("] Project:"))
{
    return LogLineType.WorkUnitProject;
}
else if (logLine.Contains("] + Paused"))
{
    return LogLineType.WorkUnitPaused;
}
else if (logLine.Contains("] - Shutting down core"))
{
    return LogLineType.WorkUnitShuttingDownCore;
}
else if (logLine.Contains("] Folding@home Core Shutdown:"))
{
    return LogLineType.WorkUnitCoreShutdown;
}
else if (logLine.Contains("] + Number of Units Completed:"))
{
    return LogLineType.ClientNumberOfUnitsCompleted;
}
else if (logLine.Contains("] This is a sign of more serious problems, shutting down."))
{
    return LogLineType.ClientCoreCommunicationsErrorShutdown;
}
else if (logLine.Contains("] EUE limit exceeded. Pausing 24 hours."))
{
    return LogLineType.ClientEuePauseState;
}
else if (logLine.Contains("Folding@Home Client Shutdown"))
{
    return LogLineType.ClientShutdown;
}
else
{
    return LogLineType.Unknown;
}
2) I'm not exactly sure what you mean here... are you talking about the textual output of the console client and/or the look/feel of the GUI client?
If that's what you mean... I don't really have a preference. My concern is being able to read the log and queue information correctly.
3) Since we know the queue.dat will be broken, or gone (yikes!), qd and qfix are already broken... however, I do use those in addition to HFM.NET.
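The idea of using marker lines to find where one work unit's log starts and the next begins can be sketched as below. This is a simplified Python illustration, not HFM.NET's implementation: it keys only on the "] *------------------------------*" marker quoted in this post, and the function name and sample log are made up.

```python
WU_START = "] *------------------------------*"  # marker line quoted in the post

def split_work_units(lines):
    """Group log lines into sections, opening a new one at each start marker.

    Lines before the first marker (client start-up output) are kept as a
    leading section of their own so nothing is dropped.
    """
    sections = [[]]
    for line in lines:
        if WU_START in line and sections[-1]:
            sections.append([])
        sections[-1].append(line)
    return [section for section in sections if section]

# Invented sample log, using only line shapes that appear in this thread.
log = [
    "[12:06:30] + Processing work unit",
    "[12:06:40] *------------------------------*",
    "[12:06:50] Project: 2662 (Run 1, Clone 209, Gen 47)",
    "[13:06:36] Completed 35000 out of 250000 steps (14%)",
    "[18:00:00] *------------------------------*",
    "[18:00:10] Project: 2669 (Run 0, Clone 34, Gen 165)",
]
for number, section in enumerate(split_work_units(log)):
    print(number, len(section), section[0])
```

Each section can then be matched to a queue entry, which is why a new core that omits or rewords the marker lines would break this kind of parsing.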
-
- Posts: 471
- Joined: Mon Dec 03, 2007 6:20 am
- Location: Amsterdam
- Contact:
Re: Standardizing the Core/Client to Third-Party Interface
jcoffland wrote:I recently created a new core numbered b4 which runs the ProtoMol MD code. This is still in beta but you will see it in the wild soon.
This core was written from a whole new code base so we immediately had incompatibilities with third-party monitoring applications. The upcoming Desmond core also uses this code base. This sparked a discussion about making an official interface for accessing information about running cores.
As the developer of FCI, current maintainer/developer of most of Dick Howell's utilities (qd, qfix, wuinfo, xyz2pdb, and friends), and author of A plea to open the queue.dat, I applaud this initiative wholeheartedly!
jcoffland wrote:1) What is needed for a new core to support existing third-party software.
Dick Howell's utilities use the information in the queue.dat, work/logfile_<##>.txt, work/wuinfo_<##>.dat and work/current.xyz files.
FCI in turn depends on these utilities for its information to display, and additionally parses the FAHlog.txt and FAHlog-Prev.txt to detect issues, generate TPF graphs, etc.
For qd and qfix the queue.dat is the most important as it uses all information contained in this file.
The work/logfile_<##>.txt is secondary and used to retrieve additional information not (always) available in the queue.dat.
The work unit name is parsed from the lines like:
Code: Select all
Protein: p1234_my_pink_ribbon
Code: Select all
Folding@Home Client Core Version %19s %c
GPU cores:
Code: Select all
Completed %d%%
Code: Select all
Completed %d out of %d
Various cores:
Code: Select all
Iterations: %d of %d
Finished a frame (%d
Code: Select all
- Frames Completed: %d, Remaining: %d
[SP%*c] Designing protein sequence %d of %d
[SP%*c] %d.0 %c
Code: Select all
Timered checkpoint triggered
Writing checkpoint files
[SPG] %d positions in protei%c
[SP%*c] Writing current.pdb, chainlength = %u
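To show how several of the scanf-style patterns above could feed one progress figure, here is a rough Python sketch with regular-expression equivalents of four of them. It is not qd's code; in particular, reading "Frames Completed / Remaining" as done / (done + remaining) is an assumption, and the helper names are invented.

```python
import re

def _ratio(done, total):
    """Percent complete, guarding against a zero total."""
    return 100.0 * done / total if total else 0.0

# (regex equivalent of a scanf pattern listed above, function giving percent)
PATTERNS = [
    (re.compile(r"Completed (\d+)%"),
     lambda m: float(m.group(1))),
    (re.compile(r"Completed (\d+) out of (\d+)"),
     lambda m: _ratio(int(m.group(1)), int(m.group(2)))),
    (re.compile(r"Iterations: (\d+) of (\d+)"),
     lambda m: _ratio(int(m.group(1)), int(m.group(2)))),
    (re.compile(r"- Frames Completed: (\d+), Remaining: (\d+)"),
     lambda m: _ratio(int(m.group(1)), int(m.group(1)) + int(m.group(2)))),
]

def progress_percent(line):
    """Return the percent complete found in a log line, or None."""
    for pattern, to_percent in PATTERNS:
        m = pattern.search(line)
        if m:
            return to_percent(m)
    return None

for sample in ("Completed 14%",
               "Completed 35000 out of 250000",
               "- Frames Completed: 25, Remaining: 75",
               "Writing checkpoint files"):
    print(repr(sample), "->", progress_percent(sample))
```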
The SMP cores are notable too, as they no longer log the Protein line in the logfile, nor store the name in wuinfo, whereby qd doesn't know the name of the current WU anymore. As a last resort qd uses the work/wuinfo_<##>.dat and work/current.xyz files to retrieve the work unit name if it was not logged in the work/logfile_<##>.txt (also progress from wuinfo_<##>.dat if there was no(ne in the) work/logfile_<##>.txt). The logfile is preferred for progress and other work unit info because the wuinfo_<##>.dat has historically proven to be unreliable.
The use of the current.xyz has been abandoned by most recent cores, whereby fpd and FCI lost the ability to visualize the current work unit. The GPU client appears to start a GUI server which I think can be used to get data to visualize the WU, but the protocol to interface with this server is not available to figure out how (I don't run windows so I can't run the GUI client and sniff the traffic from the GUI client to help reverse engineering).
Now that I mention reverse engineering: not having to do so to get data from the FAH client and cores sounds great. The hint towards a socket interface looks very promising, but this is near useless without documentation. I had a lot of fun, but also did a lot of cursing, while figuring out what the new values written in the queue.dat by the v6 client were, and I'm still not sure if I got them all right.
jcoffland wrote:2) What would you like the core/client interface to look like?
I'm quite fond of the nvidia-settings utility on Linux, which I can use to query various settings of the GPUs in the system. A query interface in the FAH client similar to `nvidia-settings -q <Property|all>` would be nice.
But a simple memcached-like protocol (`GET <key>`) over a socket is nice too. The FAH client could use this protocol to fetch the data from the core, which it makes available via the query interface.
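A `GET <key>` exchange of this kind could look roughly like the Python sketch below. Everything in it is hypothetical: the key names, the `VALUE`/`NOT_FOUND`/`ERROR` reply words and the status table are invented for illustration and are neither the real memcached wire format nor anything the FAH client offers.

```python
import socket

# Hypothetical status table; a real client would fill this from the running core.
STATUS = {"project": "2662", "progress": "14", "core": "b4"}

def handle_line(line, status):
    """Answer one request line of the form 'GET <key>'."""
    parts = line.strip().split()
    if len(parts) != 2 or parts[0].upper() != "GET":
        return "ERROR bad request\r\n"
    key = parts[1]
    if key not in status:
        return "NOT_FOUND {}\r\n".format(key)
    return "VALUE {} {}\r\n".format(key, status[key])

def serve_one(conn, status):
    """Read one request from a connected socket and send the reply.

    A single recv is enough for this short demo; a real server would
    buffer incoming data until it sees a newline.
    """
    request = conn.recv(1024).decode("ascii", errors="replace")
    conn.sendall(handle_line(request, status).encode("ascii"))

# Demonstrate over a connected socket pair instead of a listening port.
server_side, client_side = socket.socketpair()
client_side.sendall(b"GET progress\r\n")
serve_one(server_side, STATUS)
print(client_side.recv(1024).decode("ascii").strip())
server_side.close()
client_side.close()
```

Keeping the request handling in a plain function (`handle_line`) means the same logic could sit behind a command-line query option as well as a socket.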
While I don't mind diving into socket programming, I think it's a bit too complex for the various "simpler" monitoring tools out there. Those usually just parse the unitinfo.txt, and the more advanced use qd to (also) parse the queue.dat.
Making it easy to get the various bits of information is important I think, and making it easy to parse as well. The nvidia-settings utility displays a semi-human readable format of its query answers by default, but when using the -t parameter it returns it in easy to parse format. Human readable output similar to what's currently used in the FAHlog.txt and unitinfo.txt by default, and XML output when specifically requested would be an option.
This query interface in the FAH client would be -queueinfo on steroids
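The "human readable by default, XML on request" idea can be illustrated with a small Python sketch. The field names, the `workunit` element name and the `machine_readable` switch are all made up for the example; no real client option is implied.

```python
import xml.etree.ElementTree as ET

# Invented status fields, using values that appear earlier in this thread.
status = {"name": "Gromacs", "tag": "P2669R0C34G165", "progress": "0%"}

def as_text(info):
    """Human-readable 'Key: value' lines, loosely in the style of unitinfo.txt."""
    return "\n".join("{}: {}".format(k.capitalize(), v) for k, v in info.items())

def as_xml(info, root_name="workunit"):
    """The same fields as one flat XML element, meant for programs to parse."""
    root = ET.Element(root_name)
    for key, value in info.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

def render(info, machine_readable=False):
    """Pick the output style, much like a terse-output flag on a query tool."""
    return as_xml(info) if machine_readable else as_text(info)

print(render(status))
print(render(status, machine_readable=True))
```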
jcoffland wrote:3) What third-party software is important to you?
Those I wrote and/or maintain as mentioned above. Their affiliated programs (i.e. those using qd), like InCrease, F@H WUdget, Protein Think and finstall. And the programs using ports of qd in other languages, like FahMon & HFM.NET.
jcoffland wrote:QD won't have any issues with the new cores, but it will be affected by the new client. The queue format is going to completely change but the information will be made available through a socket interface.
While a socket interface is cool, I can handle new formats of the queue.dat too. Dick Howell adapted qd over time, supporting even ancient clients like 3.24, and I've picked up the work since v5.91. Having documentation of the format this time would be nice though.
Also, whatever you change in the queue.dat, please keep some sort of track of historic WUs. I find it quite valuable being able to detect faulty clients by seeing multiple deleted WUs in queue, so only providing info of the current WU is not enough.
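Why the history matters can be shown with a tiny Python sketch: given past queue entries, a monitor can flag a client that keeps deleting work units. The status labels, the entry layout and the threshold of two are invented for illustration; the real queue.dat encodes status differently.

```python
from collections import Counter

def looks_faulty(history, threshold=2):
    """Flag a client whose queue history holds several deleted work units."""
    counts = Counter(entry["status"] for entry in history)
    return counts["deleted"] >= threshold

# Hypothetical queue history, oldest slot first.
history = [
    {"slot": 0, "status": "finished"},
    {"slot": 1, "status": "deleted"},
    {"slot": 2, "status": "deleted"},
    {"slot": 3, "status": "folding"},
]
print(looks_faulty(history))
```

With only the current work unit available, `history` would hold a single entry and this check could never fire.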