Xilikon wrote: I like how the file size matches the % in unitinfo.txt. This means there is no cap check in the client (if the % is over 100, something is wrong), so the pipe character is repeated non-stop through the file.
There is one | character for each percent reported as completed in unitinfo.txt. I counted them when I first analyzed this bug, after it caused FCI to slow to a crawl (FCI stored the progress bar in an XML file, and parsing an XML file with one or more progress bars of several MB is not recommended).
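To illustrate the kind of cap check Xilikon is talking about, here is a minimal sketch of a progress bar that clamps the percentage before drawing it. This is not the client's code; `render_progress_bar` and its parameters are hypothetical names made up for the example.

```python
def render_progress_bar(percent, width=100, fill="|"):
    """Render a fixed-width text progress bar.

    Hypothetical helper: the percentage is clamped to 0..100 first,
    so a bogus value such as 1718012 cannot grow the bar without bound.
    """
    clamped = max(0, min(100, percent))
    filled = round(width * clamped / 100)
    return fill * filled + " " * (width - filled)


# A sane value and an insane value both yield a bar of exactly `width` chars.
print(len(render_progress_bar(25)))
print(len(render_progress_bar(1718012)))
```

With a clamp like this, the worst case is a bar that is stuck at 100%, which is still wrong but at least does not produce a multi-MB file.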
I've only seen this happen with projects for the a2 core. I have a copy of the folding directory with such a case saved for testing.
Dick Howell's wuinfo (a utility that parses the work/wuinfo_??.dat files) shows the following for this case:
index 6:
Core: Core_a2
Name: Gromacs
Progress: 1718012% (4295029 of 250 steps)
This shows that either the completed number of steps or the total number of steps is not correct.
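The percentage itself is just the ratio of those two numbers, as this quick check shows. That wuinfo rounds to the nearest integer is my assumption; the values are the ones from the output above.

```python
# Values as reported by wuinfo for index 6.
completed = 4295029
total = 250

# Assumption: the percentage is rounded to the nearest integer.
percent = round(100 * completed / total)
print(percent)  # 1718012
```

So the percentage is computed consistently; it is the inputs that are off.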
The corresponding work/logfile_06.txt:
*------------------------------*
Folding@Home Gromacs SMP Core
Version 2.01 (Wed Aug 13 13:11:25 PDT 2008)
Preparing to commence simulation
- Ensuring status. Please wait.
- Assembly optimizations manually forced on.
- Not checking prior termination.
- Expanded 4838305 -> 24033653 (decompressed 496.7 percent)
Called DecompressByteArray: compressed_data_size=4838305 data_size=24033653, decompressed_data_size=24033653 diff=0
- Digital signature verified
Project: 2670 (Run 5, Clone 11, Gen 17)
Assembly optimizations on if available.
Entering M.D.
Completed 2510 out of 250001 steps (1%)
Completed 5010 out of 250001 steps (2%)
[....]
Completed 60010 out of 250001 steps (24%)
Completed 62510 out of 250001 steps (25%)
This progress looks sane, and it allows qd to show the correct progress (qd doesn't use wuinfo_??.dat if logfile_??.txt provides progress data):
qd released 7 September 2008 (fr 071); qd info 10 November 2008 (update-qd.pl)
qd executed Wed Nov 12 22:20:35 CET 2008 (Wed Nov 12 21:20:35 UTC 2008)
Queue version 5.01
Current index: 6
[...]
Index 6: folding now 1920.00 pts (104.433 pt/hr) 3.92 X min speed; 25% complete
server: 171.67.108.24:8080; project: 2670
Folding: run 5, clone 11, generation 17; benchmark 0; misc: 500, 200
Project: 2670 (Run 5, Clone 11, Gen 17)
issue: Tue Oct 28 12:20:38 2008; begin: Tue Oct 28 12:21:00 2008
expect: Wed Oct 29 06:44:05 2008; due: Fri Oct 31 12:21:00 2008 (3 days)
preferred: Fri Oct 31 12:21:00 2008 (3 days)
core URL: http://www.stanford.edu/~pande/Linux/x86/Core_a2.fah (V2.01)
CPU: 1,0 x86; OS: 4,0 Linux
smp cores: 4; cores to use: 4
tag: P2670R5C11G17
flops: 1061161541 (1061.161541 megaflops)
assignment info (le): Tue Oct 28 12:20:37 2008; BBC399E2
CS: 171.67.108.25; P limit: 524286976
user: [DPC]_Fatal_Error_Group0smoking2000; team: 92; ID: 9E3B81209D0E757D; mach ID: 1
work/wudata_06.dat file size: 4838817; WU type: Folding@Home
Results successfully sent: Fri Jun 6 16:28:16 2008
Average download rate 377.409 KB/s (u=4); upload rate 65.087 KB/s (u=4)
Performance fraction 0.750157 (u=4)
Average pph: 75.427, ppd: 1810.25, ppw: 12671.7, ppy: 661176
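The logfile-first approach can be sketched roughly like this. It is not qd's actual code; the function names are hypothetical, and the regular expression is based only on the "Completed ... out of ... steps" lines shown in logfile_06.txt above.

```python
import re

# Matches lines such as: Completed 62510 out of 250001 steps (25%)
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps\s+\((\d+)%\)")


def latest_progress(log_text):
    """Return (done, total, percent) from the last progress line, or None."""
    last = None
    for match in STEP_RE.finditer(log_text):
        last = tuple(int(group) for group in match.groups())
    return last


def choose_progress(log_text, wuinfo_percent):
    """Prefer the percentage from the logfile; fall back to the wuinfo value."""
    parsed = latest_progress(log_text)
    if parsed is not None:
        return parsed[2]
    return wuinfo_percent


sample = (
    "Completed 60010 out of 250001 steps (24%)\n"
    "Completed 62510 out of 250001 steps (25%)\n"
)
print(choose_progress(sample, 1718012))  # 25
```

A tool that only trusts the wuinfo value has no such fallback, which is why FCI was hit by the bad percentage and qd was not.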
The work/wudata_06.log file, on the other hand, shows the core using the following steps:
$ grep step work/wudata_06.log
nsteps = 250001
init_step = 4250000
em_stepsize = 0.01
fc_stepsize = 0
will use an extra communication step for exclusion forces for Reaction-Field
Charge group distribution at step 4250000: 13534 19290 16635 15187
DD step 4250009 vol min/aver 1.000 load imb.: force 23.9%
DD step 4250999 vol min/aver 0.754 load imb.: force 9.0%
Writing checkpoint, step 4251370 at Tue Oct 28 12:26:20 2008
DD step 4251999 vol min/aver 0.775 load imb.: force 5.2%
Writing checkpoint, step 4252470 at Tue Oct 28 12:31:22 2008
[...]
DD step 4308999 vol min/aver 0.719 load imb.: force 11.4%
Writing checkpoint, step 4309560 at Tue Oct 28 16:46:20 2008
DD step 4309999 vol min/aver 0.773 load imb.: force 3.0%
Writing checkpoint, step 4310950 at Tue Oct 28 16:51:20 2008
DD step 4310999 vol min/aver 0.780 load imb.: force 3.7%
DD step 4311999 vol min/aver 0.773 load imb.: force 3.5%
Writing checkpoint, step 4312370 at Tue Oct 28 16:56:19 2008
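Note that this log counts absolute steps starting at init_step rather than at 0. A small sketch for converting the checkpoint steps to a position within this work unit follows; it assumes init_step is simply the offset for this generation, and `relative_checkpoints` is a hypothetical helper, not part of any FAH tool.

```python
import re


def relative_checkpoints(log_text):
    """Return (relative_step, percent) for each checkpoint line in the log.

    Assumption: progress within the unit is (step - init_step) / nsteps.
    """
    nsteps = int(re.search(r"^\s*nsteps\s*=\s*(\d+)", log_text, re.M).group(1))
    init_step = int(re.search(r"^\s*init_step\s*=\s*(\d+)", log_text, re.M).group(1))
    result = []
    for match in re.finditer(r"Writing checkpoint, step (\d+)", log_text):
        relative = int(match.group(1)) - init_step
        result.append((relative, 100 * relative / nsteps))
    return result


log = (
    "nsteps = 250001\n"
    "init_step = 4250000\n"
    "Writing checkpoint, step 4251370 at Tue Oct 28 12:26:20 2008\n"
    "Writing checkpoint, step 4312370 at Tue Oct 28 16:56:19 2008\n"
)
for relative, percent in relative_checkpoints(log):
    print(relative, round(percent, 1))
```

For the last checkpoint this gives 62370 relative steps, or about 24.9%, which agrees with the 25% reported in logfile_06.txt.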
Interestingly, the closest mention of the step that wuinfo_06.dat reports as completed is the following, which is not near the end of the file:
DD step 4294999 vol min/aver 0.799 load imb.: force 5.0%
Step Time Lambda
4295000 8590.00000 0.00000
Energies (kJ/mol)
Bond Angle Ryckaert-Bell. LJ-14 Coulomb-14
1.76815e+04 4.78336e+04 4.69625e+04 2.17307e+04 3.00686e+05
LJ (SR) Disper. corr. Coulomb (SR) RF excl. Potential
1.96170e+05 -1.70938e+04 -2.29351e+06 -2.00248e+05 -1.87979e+06
Kinetic En. Total Energy Temperature Pressure (bar) Cons. rmsd ()
3.98393e+05 -1.48140e+06 3.12924e+02 -2.40168e+02 3.67236e-06
Writing checkpoint, step 4295270 at Tue Oct 28 15:51:21 2008
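For what it's worth, if the completed value in wuinfo_06.dat is an absolute step (my guess, not something the files confirm), subtracting init_step puts it at a plausible point inside the work unit:

```python
# Values taken from the wuinfo and wudata_06.log output above.
init_step = 4250000
nsteps = 250001
wuinfo_completed = 4295029

# Assumption: wuinfo stored the absolute step instead of the relative one.
relative = wuinfo_completed - init_step
print(relative)                           # 45029
print(round(100 * relative / nsteps, 1))  # 18.0
```

That would be about 18% of the unit, written around 15:51, roughly an hour before the last checkpoint in the log. It does not explain the total of 250 steps, though.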
I can send a copy of this folding directory for analysis if that would be appreciated.