jcoffland wrote:I recently created a new core numbered b4 which runs the ProtoMol MD code. This is still in beta but you will see it in the wild soon.
This core was written from a whole new code base so we immediately had incompatibilities with third-party monitoring applications. The upcoming Desmond core also uses this code base. This sparked a discussion about making an official interface for accessing information about running cores.
As the developer of FCI, current maintainer/developer of most of Dick Howell's utilities (qd, qfix, wuinfo, xyz2pdb, and friends), and author of A plea to open the queue.dat, I applaud this initiative wholeheartedly!
jcoffland wrote:1) What is needed for a new core to support existing third-party software.
Dick Howell's utilities use the information in the queue.dat, work/logfile_<##>.txt, work/wuinfo_<##>.dat and work/current.xyz files.
FCI in turn depends on these utilities for the information it displays, and additionally parses FAHlog.txt and FAHlog-Prev.txt to detect issues, generate TPF graphs, etc.
For qd and qfix the queue.dat is the most important, as they use all the information contained in this file.
The work/logfile_<##>.txt is secondary and used to retrieve additional information not (always) available in the queue.dat.
The work unit name is parsed from lines like:
The core version is parsed from lines matching:
Code:
Folding@Home Client Core Version %19s %c
And progress information from the various versions of the Completed x out of y steps lines.
GPU cores:
Most Gromacs cores:
Various cores:
Code:
Iterations: %d of %d
Finished a frame (%d
Tinker and/or Genome core:
Code:
- Frames Completed: %d, Remaining: %d
[SP%*c] Designing protein sequence %d of %d
[SP%*c] %d.0 %c
qd additionally uses the checkpoint lines to get higher-precision progress information:
Code:
Timered checkpoint triggered
Writing checkpoint files
[SPG] %d positions in protei%c
[SP%*c] Writing current.pdb, chainlength = %u
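For illustration, here is a minimal C sketch (not qd's actual code) of what applying a few of the formats quoted above looks like. The sample lines are made up; real logfile lines typically also carry a timestamp prefix that a real parser has to skip first.
Code:
#include <stdio.h>

/* Minimal sketch, NOT qd's actual code: applying a few of the
 * scanf-style formats quoted above to made-up log lines. */

/* Progress percentage from two of the formats listed above. */
static int percent_done(const char *line)
{
    int done, total, remaining;

    if (sscanf(line, "Iterations: %d of %d", &done, &total) == 2
        && total > 0)
        return 100 * done / total;                      /* various cores */

    if (sscanf(line, "- Frames Completed: %d, Remaining: %d",
               &done, &remaining) == 2 && done + remaining > 0)
        return 100 * done / (done + remaining);         /* Tinker/Genome */

    return -1;  /* no progress information on this line */
}

int main(void)
{
    char version[20];   /* %19s leaves room for the terminating '\0' */
    char trailer;

    if (sscanf("Folding@Home Client Core Version 2.27 (example)",
               "Folding@Home Client Core Version %19s %c",
               version, &trailer) >= 1)
        printf("core version: %s\n", version);

    printf("progress: %d%%\n", percent_done("Iterations: 250 of 500"));
    printf("progress: %d%%\n",
           percent_done("- Frames Completed: 75, Remaining: 25"));
    return 0;
}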
As you can see, qd has had to adapt to every new progress format introduced by a core. The near standardization on the Gromacs format was a relief, but the GPU client breaking from tradition with yet another new format was a bit of a disappointment. Settling on a standard format for progress lines would be most welcome.
The SMP cores are notable too: they no longer log the Protein line in the logfile, nor store the name in wuinfo, so qd no longer knows the name of the current WU. As a last resort qd uses the work/wuinfo_<##>.dat and work/current.xyz files to retrieve the work unit name if it was not logged in the work/logfile_<##>.txt (and likewise the progress from wuinfo_<##>.dat if there was none in the work/logfile_<##>.txt). The logfile is preferred for progress and other work unit info because the wuinfo_<##>.dat has historically proven to be unreliable.
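As a minimal sketch of that fallback order (assuming the two-digit slot numbering implied by <##>; only the existence of each source is checked, all of qd's actual parsing is left out):
Code:
#include <stdio.h>

/* Sketch of the fallback order described above, most reliable source
 * first.  It only reports which data source is present. */
int main(void)
{
    int slot = 1;                     /* queue index, for illustration */
    const char *sources[] = {
        "work/logfile_%02d.txt",      /* preferred source */
        "work/wuinfo_%02d.dat",       /* historically unreliable */
        "work/current.xyz",           /* last resort */
    };
    char path[64];

    for (unsigned i = 0; i < sizeof sources / sizeof *sources; i++) {
        snprintf(path, sizeof path, sources[i], slot);
        FILE *f = fopen(path, "r");
        if (f) {
            printf("would read work unit info from %s\n", path);
            fclose(f);
            return 0;
        }
    }
    printf("no work unit information found\n");
    return 0;
}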
The use of current.xyz has been abandoned by most recent cores, so fpd and FCI lost the ability to visualize the current work unit. The GPU client appears to start a GUI server which I think could be used to get data to visualize the WU, but the protocol for talking to this server is not available, so there is no way to figure out how (I don't run Windows, so I can't run the GUI client and sniff its traffic to help with reverse engineering).
Now that I mention reverse engineering: not having to do any of it to get data from the FAH client and cores sounds great. The hint towards a socket interface looks very promising, but it is near useless without documentation. I had a lot of fun, but also did a lot of cursing, figuring out what the new values written to the queue.dat by the v6 client were, and I'm still not sure I got them all right.
jcoffland wrote:2) What would you like the core/client interface to look like?
I'm quite fond of the nvidia-settings utility on Linux, which I can use to query various settings of the GPUs in the system. A query interface in the FAH client similar to `nvidia-settings -q <Property|all>` would be nice.
But a simple memcached-like protocol (`GET <key>`) over a socket would be nice too. The FAH client could use this protocol to fetch the data from the core, and then make that data available via its own query interface.
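A minimal sketch of what such a GET exchange could look like from the requesting side (whether that is the client asking a core, or a tool asking the client). The port number and the "progress" key are assumptions made purely for illustration, since no such interface exists yet:
Code:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* Hypothetical memcached-like exchange: connect, send "GET <key>\r\n",
 * print whatever comes back.  Port and key name are made up. */
int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(12345);                  /* assumed port */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        perror("connect");
        return 1;
    }

    const char *req = "GET progress\r\n";            /* assumed key */
    write(fd, req, strlen(req));

    char buf[512];
    ssize_t n = read(fd, buf, sizeof buf - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("reply: %s", buf);
    }
    close(fd);
    return 0;
}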
While I don't mind diving into socket programming, I think it's a bit too complex for the various "simpler" monitoring tools out there. Those usually just parse the unitinfo.txt, and the more advanced ones use qd to (also) parse the queue.dat.
Making it easy to get the various bits of information is important, I think, and so is making it easy to parse. The nvidia-settings utility displays a semi-human-readable format of its query answers by default, but with the -t parameter it returns them in an easy-to-parse format. Human-readable output similar to what's currently used in FAHlog.txt and unitinfo.txt by default, with XML output when specifically requested, would be an option.
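To make the idea concrete, a purely hypothetical sketch of the two output modes: human-readable by default, terse with a -t style flag. The key name, value and flag are all made up; nothing like this exists in the client today.
Code:
#include <stdio.h>
#include <string.h>

/* Hypothetical output modes for a query interface: readable by
 * default, bare value with -t for easy parsing. */
static void report(const char *key, const char *value, int terse)
{
    if (terse)
        printf("%s\n", value);                       /* easy to parse */
    else
        printf("Attribute '%s': %s\n", key, value);  /* easy to read */
}

int main(int argc, char **argv)
{
    int terse = argc > 1 && strcmp(argv[1], "-t") == 0;

    report("progress", "42%", terse);   /* made-up query result */
    return 0;
}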
This query interface in the FAH client would be -queueinfo on steroids.
jcoffland wrote:3) What third-party software is important to you?
Those I wrote and/or maintain as mentioned above.
Their affiliated programs (i.e. those using qd), like InCrease, F@H WUdget, Protein Think and finstall. And the programs using ports of qd in other languages, like FahMon & HFM.NET.
jcoffland wrote:QD won't have any issues with the new cores, but it will be affected by the new client. The queue format is going to completely change but the information will be made available through a socket interface.
While a socket interface is cool, I can handle new formats of the queue.dat too. Dick Howell adapted qd over time, supporting even ancient clients like 3.24, and I've picked up the work since v5.91. Having documentation of the format this time would be nice, though.
Also, whatever you change in the queue.dat, please keep some sort of record of historic WUs. I find it quite valuable to be able to detect faulty clients by seeing multiple deleted WUs in the queue, so providing info on only the current WU is not enough.
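To illustrate the kind of history I mean, a small sketch; every name, field and number in it is made up, and it is not a proposal for the actual format.
Code:
#include <stdio.h>

/* Hypothetical per-slot WU history and a check that flags a client as
 * suspect when several recent WUs were deleted. */
enum wu_status { WU_FINISHED, WU_RUNNING, WU_DELETED, WU_FAILED };

struct wu_record {
    int project, run, clone, gen;
    enum wu_status status;
};

static int looks_faulty(const struct wu_record *hist, int n)
{
    int deleted = 0;
    for (int i = 0; i < n; i++)
        if (hist[i].status == WU_DELETED)
            deleted++;
    return deleted >= 2;
}

int main(void)
{
    struct wu_record hist[] = {
        { 1234, 1, 2, 3, WU_DELETED },
        { 1234, 4, 5, 6, WU_DELETED },
        { 5678, 7, 8, 9, WU_RUNNING },
    };
    printf("looks faulty: %s\n", looks_faulty(hist, 3) ? "yes" : "no");
    return 0;
}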