Normal thekraken Behavior? [Yes]

patonb
Posts: 348
Joined: Thu Oct 23, 2008 2:42 am
Hardware configuration: WooHoo= SR-2 -- L5639 @ ?? -- Evga 560ti FPB -- 12Gig Corsair XMS3 -- Corsair 1050hx -- Blackhawk Ultra

Foldie = @3.2Ghz -- Noctua NH-U12 -- BFG GTX 260-216 -- 6Gig OCZ Gold -- x58a-ud3r -- 6Gig OCZ Gold -- hx520

Normal thekraken Behavior? [Yes]

Post by patonb »

I just noticed that when running bigadv, my client starts up, then about 30 minutes in, the core restarts and continues as normal.

Code: Select all

 :42:03] Called DecompressByteArray: compressed_data_size=57245215 data_size=71846524, decompressed_data_size=71846524 diff=0
[12:42:04] - Digital signature verified
[12:42:04] 
[12:42:04] Project: 6903 (Run 4, Clone 4, Gen 50)
[12:42:04] 
[12:42:04] Assembly optimizations on if available.
[12:42:04] Entering M.D.
[12:42:13] Mapping NT from 24 to 24 
[12:42:44] Completed 0 out of 250000 steps  (0%)
[13:14:13] ng M.D.
[13:14:20] Using Gromacs checkpoints
[13:14:28] Mapping NT from 24 to 24 
[13:15:12] Resuming from checkpoint
[13:15:15] Verified work/wudata_07.log
[13:15:16] Verified work/wudata_07.trr
[13:15:16] Verified work/wudata_07.xtc
[13:15:16] Verified work/wudata_07.edr
[13:15:29] Completed 1360 out of 250000 steps  (0%)
[13:40:02] Completed 2500 out of 250000 steps  (1%)
[14:33:21] Completed 5000 out of 250000 steps  (2%)
[15:26:17] Completed 7500 out of 250000 steps  (3%)
WooHoo = L5639 @ 3.3Ghz Evga SR-2 6x2gb Corsair XMS3 CM 212+ Corsair 1050hx Blackhawk Ultra EVGA 560ti

Foldie = i7 950@ 4.0Ghz x58a-ud3r 216-216 @ 850/2000 3x2gb OCZ Gold NH-u12 Heatsink Corsair hx520 Antec 900
MtM
Posts: 1579
Joined: Fri Jun 27, 2008 2:20 pm
Hardware configuration: Q6600 - 8gb - p5q deluxe - gtx275 - hd4350 ( not folding ) win7 x64 - smp:4 - gpu slot
E6600 - 4gb - p5wdh deluxe - 9600gt - 9600gso - win7 x64 - smp:2 - 2 gpu slots
E2160 - 2gb - ?? - onboard gpu - win7 x32 - 2 uniprocessor slots
T5450 - 4gb - ?? - 8600M GT 512 ( DDR2 ) - win7 x64 - smp:2 - gpu slot
Location: The Netherlands

Re: Normal Behavior?

Post by MtM »

patonb wrote: I just noticed that when running bigadv, my client starts up, then about 30 minutes in, the core restarts and continues as normal.

Your log snippet is too small :) Show the last frame completed before the restart, the restart itself, and the first frame after it. The snippet here shows only that it restarted at 1360 steps.
patonb

Re: Normal Behavior?

Post by patonb »

That's the entire log. It's a bigadv unit. It started chugging at 0%.

The restart is right there where it re-enters M.D. and maps the cores again. Notice it's there twice. I know it restarts because my system shows the CPU drop to 0%, then after a few minutes peg back to near 100% and spit out the checkpoint stuff.
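
For anyone who wants to spot this pattern without reading the log by eye, here is a minimal sketch in Python. It assumes every client log line starts with a `[HH:MM:SS]` stamp, as in the snippets in this thread, and it treats a second "Mapping NT" line within the same project as a core restart. The function name `find_restarts` is made up for illustration and is not part of the FAH client or thekraken.

```python
import re

# Assumption: log lines look like "[12:42:13] Mapping NT from 24 to 24".
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s?(.*)$")

def find_restarts(log_text):
    """Return timestamps of every 'Mapping NT' line after the first one
    seen since the most recent 'Project:' line."""
    restarts = []
    seen_mapping = False
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # not a timestamped client line
        stamp, msg = m.groups()
        if msg.startswith("Project:"):
            seen_mapping = False  # a new work unit begins
        elif msg.startswith("Mapping NT"):
            if seen_mapping:
                restarts.append(stamp)  # the core mapped threads again
            seen_mapping = True
    return restarts

sample = """\
[12:42:04] Project: 6903 (Run 4, Clone 4, Gen 50)
[12:42:04] Entering M.D.
[12:42:13] Mapping NT from 24 to 24
[12:42:44] Completed 0 out of 250000 steps  (0%)
[13:14:13] ng M.D.
[13:14:28] Mapping NT from 24 to 24
[13:15:12] Resuming from checkpoint
"""

print(find_restarts(sample))
```

Run against the lines from the first post, it reports the 13:14:28 remap. Logs where the restart lines come out garbled may need a different marker, such as the "Verified work/..." lines.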
ChelseaOilman
Posts: 1037
Joined: Sun Dec 02, 2007 3:47 pm
Location: Colorado @ 10,000 feet

Re: Normal Behavior?

Post by ChelseaOilman »

Well, not quite an entire log. I can't tell if you're running with the -verbosity 9 flag and we're seeing everything. I also can't tell if you're running tear's Kraken. If you're not, I would. What I see looks normal though.

Here's the start of a 6903 WU on my 4P machine:
[09:27:52] Project: 6903 (Run 3, Clone 4, Gen 53)
[09:27:52]
[09:27:52] Assembly optimizations on if available.
[09:27:52] Entering M.D.
[09:28:00] Mapping NT from 48 to 48
[09:28:04] Completed 0 out of 250000 steps (0%)
[09:41:10] Completed 2500 out of 250000 steps (1%)
[09:46:06] int
[09:46:21] Verified work/wudata_07.log
[09:46:22] Verified work/wudata_07.trr
[09:46:22] Verified work/wudata_07.xtc
[09:46:22] Verified work/wudata_07.edr
[09:46:22] Completed 2900 out of 250000 steps (1%)
[09:56:26] Completed 5000 out of 250000 steps (2%)
patonb

Re: Normal Behavior?

Post by patonb »

Okay... Yeah, I didn't set verbosity 9, and I definitely UNLEASHED THEKRAKEN!

Gotta love it, you're three times as fast! Damn 4P.
ChelseaOilman

Re: Normal Behavior?

Post by ChelseaOilman »

If you look in the terminal window you'll see more info than what prints out in the log. You can see what's happening during those pauses.
[10:38:29] Folding@Home Gromacs SMP Core
[10:38:29] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[10:38:29]
[10:38:29] Preparing to commence simulation
[10:38:29] - Assembly optimizations manually forced on.
[10:38:29] - Not checking prior termination.
[10:38:36] - Expanded 57239090 -> 71846524 (decompressed 50.4 percent)
[10:38:36] Called DecompressByteArray: compressed_data_size=57239090 data_size=71846524, decompressed_data_size=71846524 diff=0
[10:38:36] - Digital signature verified
[10:38:36]
[10:38:36] Project: 6903 (Run 4, Clone 19, Gen 38)
[10:38:36]
[10:38:36] Assembly optimizations on if available.
[10:38:36] Entering M.D.
:-) G R O M A C S (-:

Groningen Machine for Chemical Simulation

:-) VERSION 4.5.3 (-:

Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra,
Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff,
Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz,
Michael Shirts, Alfons Sijbers, Peter Tieleman,

Berk Hess, David van der Spoel, and Erik Lindahl.

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2010, The GROMACS development team at
Uppsala University & The Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.


:-) Gromacs (-:

Reading file work/wudata_06.tpr, VERSION 4.5.4-dev-20110530-cc815 (single precision)
[10:38:45] Mapping NT from 48 to 48
Starting 48 threads
Making 2D domain decomposition 8 x 6 x 1
starting mdrun 'Overlay'
9750000 steps, 39000.0 ps (continuing from step 9500000, 38000.0 ps).
[10:38:50] Completed 0 out of 250000 steps (0%)
[10:52:36] Completed 2500 out of 250000 steps (1%)
:-) G R O M A C S (-:

Groningen Machine for Chemical Simulation

:-) VERSION 4.5.3 (-:

Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra,
Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff,
Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz,
Michael Shirts, Alfons Sijbers, Peter Tieleman,

Berk Hess, David van der Spoel, and Erik Lindahl.

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2010, The GROMACS development team at
Uppsala University & The Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.


:-) Gromacs (-:

Reading file work/wudata_06.tpr, VERSION 4.5.4-dev-20110530-cc815 (single precision)
Starting 48 threads

Reading checkpoint file work/wudata_06.cpt generated: Sat Mar 17 04:53:52 2012


Making 2D domain decomposition 8 x 6 x 1
starting mdrun 'Overlay'
9750000 steps, 39000.0 ps (continuing from step 9502730, 38010.9 ps).
[10:56:33] int
[10:57:28] Verified work/wudata_06.log
[10:57:29] Verified work/wudata_06.trr
[10:57:29] Verified work/wudata_06.xtc
[10:57:29] Verified work/wudata_06.edr
[10:57:30] Completed 2730 out of 250000 steps (1%)

NOTE: Turning on dynamic load balancing

[11:09:18] Completed 5000 out of 250000 steps (2%)
[11:22:24] Completed 7500 out of 250000 steps (3%)
4 x 6174 CPUs @ 2,519 MHz with tears OC BIOS
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III 17 970 4.3Ghz DDR3 2000 2-500GB Segate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: Normal Behavior?

Post by Grandpa_01 »

It is normal behaviour for the kraken, at least on my 4P rigs. I am not sure whether the kraken, FAH, or Linux is turning on DLB, but I do know that when DLB turns on while the kraken is running, FAH restarts; FAH does not restart when DLB starts if the kraken is not running. I am guessing that the kraken has to re-wrap the core, but I do not know. Perhaps tear can better explain what is happening if he comes around. I do know that your time per frame will drop when both of them are running. :D
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
patonb

Re: Normal Behavior?

Post by patonb »

As long as it's normal, then I'm good.
musky
Posts: 3
Joined: Wed Aug 11, 2010 1:17 am

Re: Normal Behavior?

Post by musky »

This is the correct behavior for thekraken with the autorestart functionality enabled. The idea is that the core restarts after the first checkpoint is written, which usually causes dynamic load balancing (DLB) to engage. DLB makes a significant difference in performance. If you installed thekraken with "thekraken -c autorestart=1 -i", that is what is going on. You can verify by going into your folding directory and typing "cat thekraken.cfg". If you see "autorestart=1" at the bottom of that file, that is what is happening.
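
The same check can be scripted. This is only a sketch: it assumes thekraken.cfg is a plain text file of `key=value` lines, which is a guess based on the "autorestart=1" line mentioned above, and the helper name `autorestart_enabled` is invented for this example.

```python
from pathlib import Path

def autorestart_enabled(cfg_text):
    """Return True if the first 'autorestart' key found is set to 1.

    Assumption: the file holds simple key=value lines; only the
    'autorestart=1' line itself is confirmed in this thread.
    """
    for line in cfg_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "autorestart":
            return value.strip() == "1"
    return False  # key not present at all

# Run from the folding directory; does nothing if the file is absent.
cfg = Path("thekraken.cfg")
if cfg.exists():
    print("autorestart enabled:", autorestart_enabled(cfg.read_text()))
```

If it prints False, or the file is missing, the restart seen in the log probably has some other cause.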
patonb

Re: Normal Behavior?

Post by patonb »

Yup, it was your guide in the vbox, so if it's wrong, it's your fault.
bollix47
Posts: 2957
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Normal Behavior?

Post by bollix47 »

Seems normal.

From thekraken guide:

Code: Select all

6.3. Autorestart feature

        Background: GROMACS employs Dynamic Load Balancing (DLB)
        feature that aims at improving performance.

        GROMACS configuration used by FahCores enables DLB the moment
        cumulative performance loss due to load imbalance exceeds 5%.

        When enabled, DLB reduces times of bigadv units by noticeable
        amount of time. Reports include reduction of 30s with P6903
        and 45 seconds with P6904 (sometimes more).

        Depending on WU and system configuration (or even system state),
        DLB gets enabled in a way that may appear random (sometimes it's
        several minutes into WU; at other times it may be as late
        as 90% into WU, sometimes it doesn't engage at all).

        It has been determined that restarting WU from a checkpoint
        significantly increases probability of almost-instantaneous
        DLB engagement (with P6903 and P6904 units).

        Autorestart feature, when enabled, makes The Kraken restart
        FahCore upon completed write of first checkpoint (15 minutes
        in typical configuration).

        To enable autorestart feature add '-c autorestart=1' parameter
        to the command line, when installing, e.g. 'thekraken -i -c autorestart=1'.
        If already installed, uninstall, then install with '-c autorestart=1'.
        Stopping the client is not required.

        NOTE: when enabled, FahCore will appear to have "started twice"
              or restarted without user interaction; this is expected
              and normal

        NOTE: autorestart feature isn't guaranteed; DLB may not always engage

        NOTE: DLB engagement on units other than P6903 and P6904
              is rare

Gehacktesmacher
Posts: 3
Joined: Mon Aug 02, 2010 12:18 am

Re: Normal thekraken Behavior? [Yes]

Post by Gehacktesmacher »

Hi!

I am running Ubuntu 12.04 LTS with thekraken on a 4P system.
I have a strange issue when WUs finish.

Code: Select all

[14:48:46] Completed 235000 out of 250000 steps  (94%)
[14:59:00] Completed 237500 out of 250000 steps  (95%)
[15:09:14] Completed 240000 out of 250000 steps  (96%)
[15:19:30] Completed 242500 out of 250000 steps  (97%)
[15:29:44] Completed 245000 out of 250000 steps  (98%)
[15:39:58] Completed 247500 out of 250000 steps  (99%)
[15:50:16] Completed 250000 out of 250000 steps  (100%)
[15:50:31] DynamicWrapper: Finished Work Unit: sleep=10000
[15:50:41]
[15:50:41] Finished Work Unit:
[15:50:41] - Reading up to 64407792 from "work/wudata_01.trr": Read 64407792
[15:50:42] trr file hash check passed.
[15:50:42] - Reading up to 31686692 from "work/wudata_01.xtc": Read 31686692
[15:50:42] xtc file hash check passed.
[15:50:42] edr file hash check passed.
[15:50:42] logfile size: 188597
[15:50:42] Leaving Run
[15:50:43] - Writing 96443957 bytes of core data to disk...
[15:51:12] Done: 96443445 -> 91694831 (compressed to 6.0 percent)
[15:51:12]   ... Done.
[16:38:01] - Shutting down core
[16:38:01]
[16:38:01] Folding@home Core Shutdown: FINISHED_UNIT
[16:43:41] CoreStatus = 64 (100)
[16:43:41] Unit 1 finished with 81 percent of time to deadline remaining.
[16:43:41] Updated performance fraction: 0.852459
[16:43:41] Sending work to server
[16:43:41] Project: 8103 (Run 1, Clone 69, Gen 4)


[16:43:41] + Attempting to send results [April 1 16:43:41 UTC]
[16:43:41] - Reading file work/wuresults_01.dat from core
[16:43:41]   (Read 91695343 bytes from disk)
[16:43:41] Connecting to http://128.143.231.201:8080/
[16:46:28] - Couldn't send HTTP request to server
[16:46:28] + Could not connect to Work Server (results)
[16:46:28]     (128.143.231.201:8080)
[16:46:28] + Retrying using alternative port
[16:46:28] Connecting to http://128.143.231.201:80/
[16:59:14] Posted data.
[16:59:14] Initial: 0000; + Results successfully sent
[16:59:14] Thank you for your contribution to Folding@Home.
[16:59:14] + Number of Units Completed: 3

[17:16:24] Trying to send all finished work units
[17:16:24] + No unsent completed units remaining.
[17:16:24] - Preparing to get new work unit...
[17:16:24] Cleaning up work directory
[17:16:24] + Attempting to get work packet
[17:16:24] Passkey found
[17:16:24] - Will indicate memory of 64346 MB
[17:16:24] - Connecting to assignment server
[17:16:24] Connecting to http://assign.stanford.edu:8080/
[17:16:25] Posted data.
[17:16:25] Initial: 8F80; - Successful: assigned to (128.143.199.96).
[17:16:26] + News From Folding@Home: Welcome to Folding@Home
[17:16:26] Loaded queue successfully.
[17:16:26] Sent data
[17:16:26] Connecting to http://128.143.199.96:8080/
[17:16:27] Posted data.
[17:16:27] Initial: 0000; - Receiving payload (expected size: 1764558)
[17:16:29] - Downloaded at ~861 kB/s
[17:16:29] - Averaged speed for that direction ~1059 kB/s
[17:16:29] + Received work.
[17:16:29] Trying to send all finished work units
[17:16:29] + No unsent completed units remaining.
[17:16:29] + Closed connections
I don't have any idea why it takes about 47 minutes between "[15:51:12] ... Done." and "[16:38:01] - Shutting down core",
and about 17 minutes between "[16:59:14] + Number of Units Completed: 3" and "[17:16:24] + Attempting to get work packet".

I switched from Ubuntu 12.04 running on ESXi to a native Ubuntu installation. When running Ubuntu in a VM, I don't get these delays.
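
Stalls like these can be picked out of a long log automatically. The sketch below, in Python, assumes each line of interest starts with a `[HH:MM:SS]` stamp as in the log above; `find_gaps` and its 15-minute default threshold are illustrative choices, not anything the client provides.

```python
import re
from datetime import datetime, timedelta

STAMP_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]")

def find_gaps(log_text, threshold=timedelta(minutes=15)):
    """Return (gap, line_before, line_after) for consecutive timestamped
    lines that are at least `threshold` apart."""
    gaps = []
    prev_time = prev_line = None
    for line in log_text.splitlines():
        m = STAMP_RE.match(line)
        if not m:
            continue  # skip blank or unstamped lines
        now = datetime.strptime(m.group(1), "%H:%M:%S")
        if prev_time is not None:
            delta = now - prev_time
            if delta < timedelta(0):
                delta += timedelta(days=1)  # the log rolled past midnight
            if delta >= threshold:
                gaps.append((delta, prev_line, line))
        prev_time, prev_line = now, line
    return gaps

sample = """\
[15:51:12]   ... Done.
[16:38:01] - Shutting down core
[16:43:41] CoreStatus = 64 (100)
[16:46:28] + Retrying using alternative port
[16:59:14] + Number of Units Completed: 3
[17:16:24] Trying to send all finished work units
"""

for gap, before, after in find_gaps(sample):
    print(gap, "|", before, "->", after)
```

On the lines taken from the log above, it flags two gaps: 0:46:49 before the core shuts down and 0:17:10 before the client asks for new work.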
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: Normal thekraken Behavior? [Yes]

Post by PantherX »

What is the file system that you are using?
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Nathan_P
Posts: 1164
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Re: Normal thekraken Behavior? [Yes]

Post by Nathan_P »

By the looks of things, it's a file system with barriers enabled. There was a piece of code you could run to disable them, but I can't find it.
bollix47

Re: Normal thekraken Behavior? [Yes]

Post by bollix47 »

Perhaps this one?

Using that auto-fix will only work if the current file system is ext3 or ext4. If it's something else then you would have to edit /etc/fstab manually and add barrier=0 to the options for the disk containing the folding files.
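
For the manual route, here is a rough Python sketch of what the edit amounts to for one fstab line. It only builds the new text and writes nothing to disk. It assumes the usual whitespace-separated fstab layout with mount options in the fourth field, and both the helper name `add_barrier_off` and the example entry are hypothetical.

```python
def add_barrier_off(fstab_line):
    """Return the fstab line with barrier=0 in its options field.

    Comment lines, blank lines, and lines with fewer than four fields
    are returned unchanged. Any existing barrier=... option is replaced.
    """
    stripped = fstab_line.strip()
    if not stripped or stripped.startswith("#"):
        return fstab_line
    fields = stripped.split()
    if len(fields) < 4:
        return fstab_line
    options = [o for o in fields[3].split(",") if not o.startswith("barrier=")]
    options.append("barrier=0")
    fields[3] = ",".join(options)
    return "\t".join(fields)

# Hypothetical entry for the disk that holds the folding files.
example = "UUID=abcd-1234 / ext4 errors=remount-ro 0 1"
print(add_barrier_off(example))
```

Back up /etc/fstab before editing it by hand. The new option only applies once the file system is remounted or the machine is rebooted.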