Project: 2492 (Run 99, Clone 25, Gen 0) EUE

Mizzou_Engineer
Posts: 13
Joined: Tue Dec 18, 2007 3:30 pm

Re: Project: 2492 (Run 99, Clone 25, Gen 0) EUE

Post by Mizzou_Engineer »

brityank wrote:
I hit 76% on my last one, so you're not safe yet. :(

toTOW -- I also checked my logs -- I've done 3 and believe all three failed; two are listed above, I don't have the RCG for my first one.

Yeah, of course, it crapped out at 20% a couple of hours after I replied, with an ener[13] error like the rest.

Then I got a 2484 (Run 176, Clone 27, Gen 0) that errored out with an 0x0 right off the bat. It's currently working on another one of the same 2484s, but this one is still running... for now (it's at 5%). What is with these new Gromacs cores and all of these errors? It is almost acting like I have bad RAM or a highly overclocked CPU that's as stable as a rhinoceros on PCP, but my machine is at 100% stock clocks and passes Memtest86+ and Mprime for as long as you wish to sit there and run them.
ppetrone
Pande Group Member
Posts: 115
Joined: Wed Dec 12, 2007 6:20 pm
Location: Stanford

Re: Project: 2492 (Run 99, Clone 25, Gen 0) EUE

Post by ppetrone »

Hi there,
Sorry, I just saw this post, and thank you for all the reports. I will bring the project back to beta until I see what's going on with the WUs.
I do have 2488 total WUs returned from this project in very little time, so I thought it would be fine to release it to FAH because it was doing well. But given all the EUEs, I think it has to go back to testing.
I am going to look those WUs up and see whether it's just them or whether there are more. I took note of all of these and I am running them on my machines.
Thanks for the patience.

Pau
ppetrone

Re: Project: 2492 (Run 32, Clone 30, Gen 0) EUE

Post by ppetrone »

brityank wrote:Just got my first EUE on this box since I've had it live, and had to check if the :twisted: Gremlins :twisted: had invaded for the winter. :wink:

Code:

[10:35:06] Completed 172500 out of 250000 steps  (69)
[10:45:55] Quit 101 - Fatal error: 
[10:45:55] Step 173183, time 346.366 (ps)  LINCS WARNING
[10:45:55] relative constraint deviation after LINCS:
[10:45:55] max 22244861809039081000000000.000000 (between atoms 18554 and 18557) rms 170078998374000580000000.000000
[10:45:55] 
[10:45:55] Simulation instability has been encountered. The run has entered a
[10:45:55]   state from which no further progress can be made.
Let me know if you need the full log, as I see there are many similar endings for this Run.
Hope they get this sim corrected, great PPD return! Thanks.

Hi Brityank, could you please send me a PM saying which WU gave you the EUE here? I am making a list and will stop these WUs.
Thanks
Paula
Foxery
Posts: 118
Joined: Mon Mar 03, 2008 3:11 am
Hardware configuration: Intel Core2 Quad Q9300 (Intel P35 chipset)
Radeon 3850, 512MB model (Catalyst 8.10)
Windows XP, SP2
Location: Syracuse, NY

Re: Project: 2492 (Run 99, Clone 25, Gen 0) EUE

Post by Foxery »

I've gotten a lot of LINCS WARNING messages from 2492 and 2496 as well. I'll also send you a PM with a few R/C/G numbers in case this is outside the expected realm of EUEs.
Core2 Quad/Q9300, Radeon 3850/512MB (WinXP SP2)
ppetrone

Re: Project: 2492 (Run 99, Clone 25, Gen 0) EUE

Post by ppetrone »

Maybe posting them is better, so that people can say whether they had the same problem. I am running a couple of WUs myself and also found that some of the reported WUs did end well:
PROJ 2492: Run 3, Clone 14, Gen 0 COMPLETED
PROJ 2492: Run 68, Clone 9, Gen 0 COMPLETED

The project itself does not seem to have problems, since many of the other CLONES of the EUE WUs came back successfully. The LINCS algorithm for protein constraints can be a little problematic sometimes. For those interested in how it works, it establishes constraints (bonds) between pairs of atoms that we want to keep together. Basically, it moves the atoms in a way compatible with their bond. This algorithm can sometimes become unstable when the atoms are pulled apart by other forces. I am happy that so many other WUs have been returned, so I think that once the ones that don't work are sorted out, it could be OK. So far the list runs to ~20 WUs that have EUE'd.
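
To make the "relative constraint deviation" numbers in the logs a bit more concrete, here is a minimal sketch. It assumes the reported value for each constrained pair is |d - d0| / d0, where d is the current distance between the two atoms and d0 is the intended bond length; that definition, the function names, and the three-atom example are illustrative assumptions, not taken from the core itself.

```python
import math

def relative_deviations(positions, constraints):
    """Return |d - d0| / d0 for every constrained atom pair.

    positions   -- list of (x, y, z) tuples, one per atom
    constraints -- list of (i, j, d0): atom indices and intended distance
    """
    deviations = []
    for i, j, d0 in constraints:
        d = math.dist(positions[i], positions[j])
        deviations.append(abs(d - d0) / d0)
    return deviations

def summarize(deviations):
    """Return the (max, rms) pair, mirroring the two numbers in the log line."""
    max_dev = max(deviations)
    rms_dev = math.sqrt(sum(x * x for x in deviations) / len(deviations))
    return max_dev, rms_dev

# Hypothetical example: the bond 0-1 is at its intended length,
# while the bond 1-2 has been stretched to 2.5 times its intended length.
positions = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.1, 0.25, 0.0)]
constraints = [(0, 1, 0.1), (1, 2, 0.1)]

max_dev, rms_dev = summarize(relative_deviations(positions, constraints))
print(f"max {max_dev:.6f} rms {rms_dev:.6f}")
```

Under that assumption, a max value above 1 already means a bond stretched to more than twice its intended length, and the astronomically large values in some of the logs would mean the coordinates have blown up completely.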

Thanks,
Paula
Thorsten_Q.
Posts: 8
Joined: Wed Jan 02, 2008 9:05 pm
Hardware configuration: McOSX 10.6.4: MacBook, Mac mini, MacBook Pro
Ubuntu Linux 8.04:Core2Quad, Nvidia 9800gtx
Windows XP: Core2Quad, Nvidia 295gtx, Core2Duo (4x), Nvidia 240gt (2x), Nvidia 9600gt
Windows 7: Core i7 (3x), Nvidia 285gtx, , Nvidia 240gt, Nvidia 460 gtx (3x)
Location: old europe

Re: Project: 2492 (Run 99, Clone 25, Gen 0) EUE

Post by Thorsten_Q. »

OK, then I will add some more...
2492/50/31/0 EUE at 61%
2492/48/5/0 EUE at 66%
and
2492/4/3/0 Completed!
These are the ones I have folded so far on my office PCs, all at stock speed.
At the moment I'm folding 3 other 2492 units. I will see on Monday, when I'm back in the office, whether they finished correctly.

Fold on
Thorsten_Q.
goodyca
Posts: 187
Joined: Sun Dec 02, 2007 12:36 pm

Re: Project: 2492 (Run 99, Clone 25, Gen 0) EUE

Post by goodyca »

I also have had two of these projects. Both EUE'd

project 2492 r 26, c 11, g 0
[22:12:58] Completed 75000 out of 250000 steps (30%)
[22:57:22] Quit 101 - Fatal error:
[22:57:22] Step 77199, time 154.398 (ps) LINCS WARNING
[22:57:22] relative constraint deviation after LINCS:
[22:57:22] max 1.574977 (between atoms 18548 and 18549) rms 0.011617

project 2492 r 58, c 29, g 0
[23:05:12] Completed 182500 out of 250000 steps (73%)
[23:44:24] Quit 101 - Fatal error:
[23:44:24] Step 184500, time 369 (ps) LINCS WARNING
[23:44:24] relative constraint deviation after LINCS:
[23:44:24] max 1.247144 (between atoms 18524 and 18526) rms 0.009200
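
For anyone collecting these reports, the LINCS lines can be pulled out of a log automatically. The following is a rough sketch that assumes the log layout shown in this thread; the regular expression and the function name `parse_lincs_warnings` are made up for illustration and are not part of any F@H tool.

```python
import re

# Matches a "Step ..., time ... (ps) LINCS WARNING" line and the
# "max ... (between atoms A and B) rms ..." line that follows it.
LINCS_RE = re.compile(
    r"Step (?P<step>\d+), time (?P<time>[\d.]+) \(ps\)\s+LINCS WARNING.*?"
    r"max (?P<max>[\d.]+) \(between atoms (?P<a>\d+) and (?P<b>\d+)\) "
    r"rms (?P<rms>[\d.]+)",
    re.DOTALL,
)

def parse_lincs_warnings(log_text):
    """Return one dict per LINCS warning found in the log text."""
    results = []
    for m in LINCS_RE.finditer(log_text):
        results.append({
            "step": int(m["step"]),
            "time_ps": float(m["time"]),
            "max_dev": float(m["max"]),
            "atoms": (int(m["a"]), int(m["b"])),
            "rms_dev": float(m["rms"]),
        })
    return results

# Sample lines copied from the first log in this post.
sample = """\
[22:57:22] Quit 101 - Fatal error:
[22:57:22] Step 77199, time 154.398 (ps) LINCS WARNING
[22:57:22] relative constraint deviation after LINCS:
[22:57:22] max 1.574977 (between atoms 18548 and 18549) rms 0.011617
"""

warnings = parse_lincs_warnings(sample)
for w in warnings:
    print(w["step"], w["atoms"], w["max_dev"])
```

Grouping the results by atom pair would show whether the failures cluster around the same atoms across different runs and clones.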
ppetrone

Re: Project: 2492 (Run 99, Clone 25, Gen 0) EUE

Post by ppetrone »

OK, I am convinced. The project is currently out of F@H. We will figure out whether it is really that unstable before it comes back. Sometimes things don't work as predicted... that's why we try to simulate them... :)
In any case, 2492 is out. Thanks a lot to all of you for the effort and for the reports.

Paula
John_Weatherman
Posts: 289
Joined: Sun Dec 02, 2007 4:31 am
Location: Carrizo Plain National Monument, California

Re: Project: 2492 (Run 99, Clone 25, Gen 0) EUE

Post by John_Weatherman »

I've got a 2492 WU (Run 41, Clone 16, Gen 0) chugging along on an old Athlon 2600 and it's taking longer between % check points. Will this EUE too in the near future? (No scanning or anything else was/is running)

Code:

[06:01:55] Completed 205000 out of 250000 steps  (82)

[07:49:17] Writing local files
[07:49:18] Completed 207500 out of 250000 steps  (83)

[09:45:09] Writing local files
[09:45:09] Completed 210000 out of 250000 steps  (84)

[12:33:38] Writing local files
[12:33:39] Completed 212500 out of 250000 steps  (85)

[15:58:48] Writing local files
[15:58:49] Completed 215000 out of 250000 steps  (86)
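
The growing gap between the progress lines above can be measured with a short script. This is only a sketch: it assumes the `[HH:MM:SS] Completed N out of M steps` layout seen in this log, and because the timestamps carry no date it also assumes that less than 24 hours pass between two progress lines. The function name `frame_intervals` is made up for illustration.

```python
import re

PROGRESS_RE = re.compile(
    r"\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+) out of (\d+) steps"
)

def frame_intervals(log_text):
    """Return (steps_completed, seconds_since_previous_progress_line) pairs."""
    intervals = []
    previous = None
    for h, m, s, done, _total in PROGRESS_RE.findall(log_text):
        now = int(h) * 3600 + int(m) * 60 + int(s)
        if previous is not None:
            # No date in the timestamps, so wrap around midnight with a
            # modulo (assumes under 24 hours between progress lines).
            intervals.append((int(done), (now - previous) % 86400))
        previous = now
    return intervals

# Progress lines copied from the log above.
sample = """\
[06:01:55] Completed 205000 out of 250000 steps  (82)
[07:49:18] Completed 207500 out of 250000 steps  (83)
[09:45:09] Completed 210000 out of 250000 steps  (84)
[12:33:39] Completed 212500 out of 250000 steps  (85)
[15:58:49] Completed 215000 out of 250000 steps  (86)
"""

intervals = frame_intervals(sample)
for steps, seconds in intervals:
    print(f"{steps}: {seconds / 60:.1f} min since previous frame")
```

For this log, the time per frame rises from under two hours to almost three and a half hours.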
ppetrone

Re: Project: 2492 (Run 99, Clone 25, Gen 0) EUE

Post by ppetrone »

Hi John,
This one should be one of the last 2492 WUs out there. The project will hopefully come back after troubleshooting, with a different number.
I'm not sure whether it will EUE or not, since there have been many 2492 WUs that did well. The results for the 2492 WUs that did survive are still useful. However, in the next version of the project I will use a different (and hopefully more stable) algorithm.
Thanks for folding!
paula
John_Weatherman

Re: Project: 2492 (Run 99, Clone 25, Gen 0) EUE

Post by John_Weatherman »

That particular WU is still going along, but it will not be the last as I've got another 2492 (Run 61, Clone 5, Gen 0) on a P4 2.8 that's only completed 42%, so that'll take a few more days yet to finish.
anko1
Posts: 438
Joined: Mon Dec 03, 2007 1:31 am
Hardware configuration: Old Faithful CPU: Windows Graphical 5.03; Intel Pentium 4 Processor 540
(3.2GHz) HT;Windows XP
Big Red: Windows SMP Console 6.29; Windows GPU console 6.20r1; Intel Q9450 2.66G; ASUS P5Q 775 P45; [BFG 9800GTX+ old graphics card] NVidia GeForce 8800 GTX [as of 5/9/09]; Windows XP Pro SP3
Lenovo Think Pad: Windows 6.29 w/ SMP; Windows GPU Console 6.20r1 systray; Intel QX9300; NVIDIA Quadro FX-3700M; Windows XP Professional
Location: SF Peninsula

Re: Project: 2492 (Run 99, Clone 25, Gen 0) EUE

Post by anko1 »

I'm a little late checking my logs, but I also had one EUE on Nov. 1:

Project: 2492 (Run 5, Clone 18, Gen 0)

Code:

[23:02:58] + Closed connections
[23:02:58] 
[23:02:58] + Processing work unit
[23:02:58] Core required: FahCore_78.exe
[23:02:58] Core found.
[23:02:58] Working on Unit 05 [October 29 23:02:58]
[23:02:58] + Working ...
[23:02:58] - Calling 'FahCore_78.exe -dir work/ -suffix 05 -checkpoint 15 -forceasm -verbose -lifeline 3092 -version 503'

[23:02:58] 
[23:02:58] *------------------------------*
[23:02:58] Folding@Home Gromacs Core
[23:02:58] Version 1.90 (March 8, 2006)
[23:02:58] 
[23:02:58] Preparing to commence simulation
[23:02:58] - Assembly optimizations manually forced on.
[23:02:58] - Not checking prior termination.
[23:03:01] + New frame time estimate; Working...
[23:03:03] - Expanded 2219149 -> 15100141 (decompressed 680.4 percent)
[23:03:03] - Starting from initial work packet
[23:03:03] 
[23:03:03] Project: 2492 (Run 5, Clone 18, Gen 0)
[23:03:03] 
[23:03:03] Assembly optimizations on if available.
[23:03:03] Entering M.D.
[23:03:06] + New frame time estimate; Working...
[23:03:11] + New frame time estimate; Working...
[23:03:11] Protein: system
[23:03:11] 
[23:03:11] Writing local files
[23:03:15] Extra SSE boost OK.
[23:03:16] + New frame time estimate; Working...
[23:03:17] Writing local files
[23:03:17] Completed 0 out of 250000 steps  (0)
[23:03:21] + New frame time estimate; Working...
[23:04:45] - Autosending finished units...
   {snip}
[23:08:09] - Autosend completed
[23:08:09] + Working...
[23:18:17] Timered checkpoint triggered.
[23:33:18] Timered checkpoint triggered.
[23:48:20] Timered checkpoint triggered.
[00:03:21] Timered checkpoint triggered.
[00:04:43] Writing local files
[00:04:44] Completed 2500 out of 250000 steps  (1)
[00:04:47] + Writing 'sec_per_frame = 5709.500000' to config

   {snip - two days later: }

[00:09:32] Timered checkpoint triggered.
[00:11:33] Writing local files
[00:11:34] Completed 117500 out of 250000 steps  (47)
[00:11:37] + Writing 'sec_per_frame = 1865.000000' to config
[00:11:37] + Working ...Timered checkpoint triggered.
[00:41:36] Timered checkpoint triggered.
[00:56:37] Timered checkpoint triggered.
[01:11:38] Timered checkpoint triggered.
[01:13:36] Writing local files
[01:13:37] Completed 120000 out of 250000 steps  (48)
[01:13:37] + Writing 'sec_per_frame = 1240.000000' to config
[01:13:37] + Working ...Timered checkpoint triggered.
[01:43:38] Timered checkpoint triggered.
[01:58:39] Timered checkpoint triggered.
[02:04:17] Quit 101 - Fatal error: 
[02:04:17] Step 122040, time 244.08 (ps)  LINCS WARNING
[02:04:17] relative constraint deviation after LINCS:
[02:04:17] max 18184030063903638000.000000 (between atoms 6478 and 6479) rms 136547730578210820.000000
[02:04:17] 
[02:04:17] Simulation instability has been encountered. The run has entered a
[02:04:17]   state from which no further progress can be made.
[02:04:17] This may be the correct result of the simulation, however if you
[02:04:17]   often see other project units terminating early like this
[02:04:17]   too, you may wish to check the stability of your computer (issues
[02:04:17]   such as high temperature, overclocking, etc.).
[02:04:17] Going to send back what have done.
[02:04:17] logfile size: 21982
[02:04:17] - Writing 22701 bytes of core data to disk...
[02:04:17]   ... Done.
[02:04:17] 
[02:04:17] Folding@home Core Shutdown: EARLY_UNIT_END
[02:04:21] CoreStatus = 72 (114)
[02:04:21] Sending work to server


[02:04:21] + Attempting to send results
[02:04:21] - Reading file work/wuresults_05.dat from core
[02:04:21]   (Read 22701 bytes from disk)
[02:04:21] Connecting to http://171.65.103.160:8080/
[02:04:22] Posted data.
[02:04:22] Initial: 0000; - Uploaded at ~23 kB/s
[02:04:22] - Averaged speed for that direction ~43 kB/s
[02:04:22] + Results successfully sent
[02:04:22] Thank you for your contribution to Folding@Home.
John_Weatherman

Re: Project: 2492 (Run 99, Clone 25, Gen 0) EUE

Post by John_Weatherman »

Seems like I spoke too soon ... the project 2492 WU (Run 61, Clone 5, Gen 0) on the P4 2.8 EUE'd

Code:

[06:12:56] Completed 107500 out of 250000 steps  (43)
[06:22:59] Timered checkpoint triggered.
[06:34:00] Timered checkpoint triggered.
[06:44:01] Timered checkpoint triggered.
[06:54:02] Timered checkpoint triggered.
[07:04:03] Timered checkpoint triggered.
[07:05:30] Quit 101 - Fatal error: 
[07:05:30] Step 108998, time 217.996 (ps)  LINCS WARNING
[07:05:30] relative constraint deviation after LINCS:
[07:05:30] max 108841527710163040000000000000.000000 (between atoms 18518 and 18519) rms 820560170950860630000000000.000000
[07:05:30] 
[07:05:30] Simulation instability has been encountered. The run has entered a
[07:05:30]   state from which no further progress can be made.
[07:05:30] This may be the correct result of the simulation, however if you
[07:05:30]   often see other project units terminating early like this
[07:05:30]   too, you may wish to check the stability of your computer (issues
[07:05:30]   such as high temperature, overclocking, etc.).
[07:05:30] Going to send back what have done.
[07:05:30] logfile size: 193963
[07:05:30] - Writing 194704 bytes of core data to disk...
[07:05:30]   ... Done.
[07:05:30] 
[07:05:30] Folding@home Core Shutdown: EARLY_UNIT_END
[07:05:34] CoreStatus = 72 (114)
[07:05:34] Sending work to server
Still chugging along on the other one ... now at 95% done
John_Weatherman

Re: Project: 2492 (Run 99, Clone 25, Gen 0) EUE

Post by John_Weatherman »

1 completed ! :) Project: 2492 (Run 41, Clone 16, Gen 0) ...
ppetrone

Re: Project: 2492 (Run 99, Clone 25, Gen 0) EUE

Post by ppetrone »

Nice! Well, you still got your points, and the data is useful for sampling. But this project went into "renovation"... so many LINCS failures. Thanks, beta testers, for your help!

paula