brityank wrote:
I hit 76% on my last one, so you're not safe yet.
toTOW -- I also checked my logs -- I've done 3 and believe all three failed; two are listed above, I don't have the RCG for my first one.
Yeah, of course, it crapped out at 20% a couple of hours after I replied with an ener[13] error like the rest.
Then I got a 2484 (Run 176, Clone 27, Gen 0) that errored out with an 0x0 right off the bat. It's currently working on another one of the same 2484s, but this one is still running... for now (it's at 5%). What is with these new Gromacs cores and all of these errors? It's almost acting like I have bad RAM or a highly overclocked CPU that's as stable as a rhinoceros on PCP, but my machine is at 100% stock clocks and passes Memtest86+ and Mprime for as long as you wish to sit there and run them.
Hi there,
Sorry, I just saw this post, and thank you for all the reports. I will bring the project back to beta until I see what's going on with the WUs.
I have had 2488 WUs in total returned from this project in very little time, so I thought it would be fine to release it to FAH because it was doing well. But given all the EUEs, I think it has to go back to testing.
I am going to look those WUs up and see whether it's just them or there are more. I have taken note of all of these and am running them on my machines.
Thanks for the patience.
[10:35:06] Completed 172500 out of 250000 steps (69)
[10:45:55] Quit 101 - Fatal error:
[10:45:55] Step 173183, time 346.366 (ps) LINCS WARNING
[10:45:55] relative constraint deviation after LINCS:
[10:45:55] max 22244861809039081000000000.000000 (between atoms 18554 and 18557) rms 170078998374000580000000.000000
[10:45:55]
[10:45:55] Simulation instability has been encountered. The run has entered a
[10:45:55] state from which no further progress can be made.
Let me know if you need the full log, as I see there are many similar endings for this Run.
Hope they get this sim corrected, great PPD return! Thanks.
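For anyone collecting these endings, here is a minimal Python sketch of pulling the step, the atom pair, and the deviation values out of a log like the one above. The function name and the regular expressions are assumptions of mine, based only on the layout of the log lines posted in this thread, not on any official log format.

```python
import re

# Patterns assumed from the log excerpts in this thread, not from an official spec.
STEP_RE = re.compile(r"Step (\d+), time ([\d.]+) \(ps\)\s+LINCS WARNING")
MAX_RE = re.compile(r"max ([\d.]+) \(between atoms (\d+) and (\d+)\) rms ([\d.]+)")

def parse_lincs_warning(log_text):
    """Return a dict describing the first LINCS warning in log_text, or None."""
    step = STEP_RE.search(log_text)
    # Only look for the deviation line after the "Step ..." line.
    dev = MAX_RE.search(log_text, step.end()) if step else None
    if not (step and dev):
        return None
    return {
        "step": int(step.group(1)),
        "time_ps": float(step.group(2)),
        "max_dev": float(dev.group(1)),
        "atoms": (int(dev.group(2)), int(dev.group(3))),
        "rms_dev": float(dev.group(4)),
    }

sample = """[10:45:55] Step 173183, time 346.366 (ps) LINCS WARNING
[10:45:55] relative constraint deviation after LINCS:
[10:45:55] max 22244861809039081000000000.000000 (between atoms 18554 and 18557) rms 170078998374000580000000.000000
"""
info = parse_lincs_warning(sample)
print(info["step"], info["atoms"])
```

Something like this makes it easier to compare the failing step and atom pair across several reports of the same Run.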
Hi Brityank, could you please send me a PM saying which WU gave you the EUE here? I am making a list and will stop these WUs.
Thanks
Paula
I've gotten a lot of LINCS WARNING messages from 2492 and 2496 as well. I'll also send you a PM with a few R/C/G numbers in case this is outside the expected realm of EUEs.
Maybe posting them is better, so that people can say whether they had the same problem. I am running a couple of WUs myself and also found that some of the reported WUs did end well:
PROJ 2492 (Run 3, Clone 14, Gen 0): COMPLETED
PROJ 2492 (Run 68, Clone 9, Gen 0): COMPLETED
The project itself does not seem to have problems, since many of the other CLONES of the EUE WUs came back successfully. The LINCS algorithm for protein constraints can be a little problematic sometimes. For those interested in how it works: it establishes constraints (bonds) between pairs of atoms that we want to keep together. Basically, it moves the atoms in a way compatible with their bonds. The algorithm can sometimes become unstable when the atoms are pulled apart by other forces. I am happy that so many other WUs have been returned, so I think that once the ones that don't work are sorted out it could be OK. So far the list runs to ~20 WUs that have EUE'd.
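To make the numbers in the LINCS warnings a bit more concrete, here is a minimal Python sketch. It assumes the reported "relative constraint deviation" is |d / d0 - 1| for each constrained pair, where d is the current distance between the two atoms and d0 is the target bond length. This is not the LINCS algorithm itself, only a check on how far the constraints are from being satisfied; the function name and the data layout are made up for illustration.

```python
import math

def relative_constraint_deviation(positions, constraints):
    """positions: {atom_id: (x, y, z)}; constraints: [(i, j, target_length)].

    Returns (max_dev, rms_dev, worst_pair), assuming the deviation of one
    constraint is |d / d0 - 1| (an assumption, see the text above).
    """
    max_dev, worst, sum_sq = 0.0, None, 0.0
    for i, j, d0 in constraints:
        d = math.dist(positions[i], positions[j])  # current distance
        dev = abs(d / d0 - 1.0)
        sum_sq += dev * dev
        if worst is None or dev > max_dev:
            max_dev, worst = dev, (i, j)
    rms = math.sqrt(sum_sq / len(constraints)) if constraints else 0.0
    return max_dev, rms, worst

# Toy example: atoms 1-2 sit at their bond length, atoms 2-3 have been
# pulled to three times theirs.
positions = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 3: (4.0, 0.0, 0.0)}
constraints = [(1, 2, 1.0), (2, 3, 1.0)]
max_dev, rms_dev, worst_pair = relative_constraint_deviation(positions, constraints)
print(max_dev, rms_dev, worst_pair)
```

A healthy run keeps these values tiny; the warnings posted in this thread show values of 1e19 and up, which means the atoms have flown apart and the run cannot continue.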
OK, then I will add some more...
2492/50/31/0 EUE at 61%
2492/48/5/0 EUE at 66%
and
2492/4/3/0 Completed!
So far I have folded these on my office PCs, all at stock speed.
At the moment I'm folding 3 other 2492 units. I will see on Monday, when I'm back in the office, whether they finished correctly.
OK, I am convinced. The project is currently out of F@H. We will figure out whether it really is that unstable before it comes back. Sometimes things don't work as predicted... that's why we try to simulate them...
In any case, 2492 is out. Thanks a lot to all of you for the effort and for the reports.
I've got a 2492 WU (Run 41, Clone 16, Gen 0) chugging along on an old Athlon 2600 and it's taking longer between % check points. Will this EUE too in the near future? (No scanning or anything else was/is running)
[06:01:55] Completed 205000 out of 250000 steps (82)
[07:49:17] Writing local files
[07:49:18] Completed 207500 out of 250000 steps (83)
[09:45:09] Writing local files
[09:45:09] Completed 210000 out of 250000 steps (84)
[12:33:38] Writing local files
[12:33:39] Completed 212500 out of 250000 steps (85)
[15:58:48] Writing local files
[15:58:49] Completed 215000 out of 250000 steps (86)
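A rough Python sketch for measuring the gap between the percent lines in a log like this one, to see whether the frames really are slowing down. The function name is hypothetical, and since the client timestamps carry no date, the code assumes that a time going backwards means exactly one midnight rollover.

```python
import re
from datetime import timedelta

# Layout assumed from the "Completed ..." lines quoted in this thread.
LINE_RE = re.compile(r"\[(\d\d):(\d\d):(\d\d)\] Completed (\d+) out of (\d+) steps")

def frame_intervals(log_lines):
    """Yield (steps_done, elapsed) for each 'Completed' line after the first."""
    prev = None
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip "Writing local files" and other lines
        h, mi, s, done, _total = map(int, m.groups())
        now = timedelta(hours=h, minutes=mi, seconds=s)
        if prev is not None:
            gap = now - prev
            if gap < timedelta(0):
                gap += timedelta(days=1)  # assumed single midnight rollover
            yield done, gap
        prev = now

log = [
    "[06:01:55] Completed 205000 out of 250000 steps (82)",
    "[07:49:17] Writing local files",
    "[07:49:18] Completed 207500 out of 250000 steps (83)",
    "[09:45:09] Writing local files",
    "[09:45:09] Completed 210000 out of 250000 steps (84)",
]
intervals = list(frame_intervals(log))
for done, gap in intervals:
    print(done, gap)
```

A gap of more than a day between two percent lines would be undercounted by this approach, so treat the result as an estimate.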
Hi John,
This one should be one of the last 2492s out there. The project will hopefully come back debugged and with a different number.
I'm not sure whether it will EUE or not, since there have been many 2492s that did well. The results from the 2492s that did survive are still useful. However, in the next version of the project I will use a different (and hopefully more stable) algorithm.
Thanks for folding!
paula
That particular WU is still going along, but it will not be the last as I've got another 2492 (Run 61, Clone 5, Gen 0) on a P4 2.8 that's only completed 42%, so that'll take a few more days yet to finish.
[23:02:58] + Closed connections
[23:02:58]
[23:02:58] + Processing work unit
[23:02:58] Core required: FahCore_78.exe
[23:02:58] Core found.
[23:02:58] Working on Unit 05 [October 29 23:02:58]
[23:02:58] + Working ...
[23:02:58] - Calling 'FahCore_78.exe -dir work/ -suffix 05 -checkpoint 15 -forceasm -verbose -lifeline 3092 -version 503'
[23:02:58]
[23:02:58] *------------------------------*
[23:02:58] Folding@Home Gromacs Core
[23:02:58] Version 1.90 (March 8, 2006)
[23:02:58]
[23:02:58] Preparing to commence simulation
[23:02:58] - Assembly optimizations manually forced on.
[23:02:58] - Not checking prior termination.
[23:03:01] + New frame time estimate; Working...
[23:03:03] - Expanded 2219149 -> 15100141 (decompressed 680.4 percent)
[23:03:03] - Starting from initial work packet
[23:03:03]
[23:03:03] Project: 2492 (Run 5, Clone 18, Gen 0)
[23:03:03]
[23:03:03] Assembly optimizations on if available.
[23:03:03] Entering M.D.
[23:03:06] + New frame time estimate; Working...
[23:03:11] + New frame time estimate; Working...
[23:03:11] Protein: system
[23:03:11]
[23:03:11] Writing local files
[23:03:15] Extra SSE boost OK.
[23:03:16] + New frame time estimate; Working...
[23:03:17] Writing local files
[23:03:17] Completed 0 out of 250000 steps (0)
[23:03:21] + New frame time estimate; Working...
[23:04:45] - Autosending finished units...
{snip}
[23:08:09] - Autosend completed
[23:08:09] + Working...
[23:18:17] Timered checkpoint triggered.
[23:33:18] Timered checkpoint triggered.
[23:48:20] Timered checkpoint triggered.
[00:03:21] Timered checkpoint triggered.
[00:04:43] Writing local files
[00:04:44] Completed 2500 out of 250000 steps (1)
[00:04:47] + Writing 'sec_per_frame = 5709.500000' to config
{snip - two days later: }
[00:09:32] Timered checkpoint triggered.
[00:11:33] Writing local files
[00:11:34] Completed 117500 out of 250000 steps (47)
[00:11:37] + Writing 'sec_per_frame = 1865.000000' to config
[00:11:37] + Working ...Timered checkpoint triggered.
[00:41:36] Timered checkpoint triggered.
[00:56:37] Timered checkpoint triggered.
[01:11:38] Timered checkpoint triggered.
[01:13:36] Writing local files
[01:13:37] Completed 120000 out of 250000 steps (48)
[01:13:37] + Writing 'sec_per_frame = 1240.000000' to config
[01:13:37] + Working ...Timered checkpoint triggered.
[01:43:38] Timered checkpoint triggered.
[01:58:39] Timered checkpoint triggered.
[02:04:17] Quit 101 - Fatal error:
[02:04:17] Step 122040, time 244.08 (ps) LINCS WARNING
[02:04:17] relative constraint deviation after LINCS:
[02:04:17] max 18184030063903638000.000000 (between atoms 6478 and 6479) rms 136547730578210820.000000
[02:04:17]
[02:04:17] Simulation instability has been encountered. The run has entered a
[02:04:17] state from which no further progress can be made.
[02:04:17] This may be the correct result of the simulation, however if you
[02:04:17] often see other project units terminating early like this
[02:04:17] too, you may wish to check the stability of your computer (issues
[02:04:17] such as high temperature, overclocking, etc.).
[02:04:17] Going to send back what have done.
[02:04:17] logfile size: 21982
[02:04:17] - Writing 22701 bytes of core data to disk...
[02:04:17] ... Done.
[02:04:17]
[02:04:17] Folding@home Core Shutdown: EARLY_UNIT_END
[02:04:21] CoreStatus = 72 (114)
[02:04:21] Sending work to server
[02:04:21] + Attempting to send results
[02:04:21] - Reading file work/wuresults_05.dat from core
[02:04:21] (Read 22701 bytes from disk)
[02:04:21] Connecting to http://171.65.103.160:8080/
[02:04:22] Posted data.
[02:04:22] Initial: 0000; - Uploaded at ~23 kB/s
[02:04:22] - Averaged speed for that direction ~43 kB/s
[02:04:22] + Results successfully sent
[02:04:22] Thank you for your contribution to Folding@Home.
[06:12:56] Completed 107500 out of 250000 steps (43)
[06:22:59] Timered checkpoint triggered.
[06:34:00] Timered checkpoint triggered.
[06:44:01] Timered checkpoint triggered.
[06:54:02] Timered checkpoint triggered.
[07:04:03] Timered checkpoint triggered.
[07:05:30] Quit 101 - Fatal error:
[07:05:30] Step 108998, time 217.996 (ps) LINCS WARNING
[07:05:30] relative constraint deviation after LINCS:
[07:05:30] max 108841527710163040000000000000.000000 (between atoms 18518 and 18519) rms 820560170950860630000000000.000000
[07:05:30]
[07:05:30] Simulation instability has been encountered. The run has entered a
[07:05:30] state from which no further progress can be made.
[07:05:30] This may be the correct result of the simulation, however if you
[07:05:30] often see other project units terminating early like this
[07:05:30] too, you may wish to check the stability of your computer (issues
[07:05:30] such as high temperature, overclocking, etc.).
[07:05:30] Going to send back what have done.
[07:05:30] logfile size: 193963
[07:05:30] - Writing 194704 bytes of core data to disk...
[07:05:30] ... Done.
[07:05:30]
[07:05:30] Folding@home Core Shutdown: EARLY_UNIT_END
[07:05:34] CoreStatus = 72 (114)
[07:05:34] Sending work to server
Still chugging along on the other one ... now at 95% done
Nice! Well, you still got your points, and the data is useful for sampling. But this project has gone in for "renovation"... so many LINCS failures. Thanks, beta testers, for your help!