
Re: Merged problems with projects 6903/6904

Posted: Fri Jun 01, 2012 7:17 pm
by Grandpa_01
kasson wrote:Stopped. BTW the v7 client will report these properly so you get fewer re-assigns.
Unfortunately there are not any good guides out there for 3rd party software that optimises F@H performance on the bigadv WU's, so if you use v7 most are going to lose production. I do not even know if the Kraken even works on v7 or not. Have not even tried yet. And v7 is still a hit-and-miss situation when it comes to installing it on Linux, which is what most of the bigadv are run on.

Re: Merged problems with projects 6903/6904

Posted: Fri Jun 01, 2012 11:01 pm
by markfw
Thanks, I am back working, but it's only a 75k ppd unit.

Re: Merged problems with projects 6903/6904

Posted: Sat Jun 02, 2012 5:35 pm
by bollix47
Grandpa_01 wrote:I do not even know if the Kraken even works on v7 or not.

theKraken works in v7. Top shows thekraken wrapped core using all the cpu resources and system monitor shows all cpus running at 100%. I can't be positive about the autorestart feature though as I've never seen it work in v7 but have in v6.

Re: Merged problems with projects 6903/6904

Posted: Sat Jun 02, 2012 8:23 pm
by Nathan_P
bollix47 wrote:
Grandpa_01 wrote:I do not even know if the Kraken even works on v7 or not.

theKraken works in v7. Top shows thekraken wrapped core using all the cpu resources and system monitor shows all cpus running at 100%. I can't be positive about the autorestart feature though as I've never seen it work in v7 but have in v6.

There is a new version of the kraken out that should fix the issue where DLB doesn't engage.

@Grandpa - I just saw a post on one of the forums where they have used Linux Mint to install v7 without issue. If I find the forum again I'll let you know.
It would be useful if a different distro works with v7 - kasson can then spend less time chasing these rogue units and more time creating new projects.

What I find strange is that these issues only really happen with these projects - is it the server, the size of the WU or a combination of both?

Edit:- Here we go from the good folks at OCF:- http://www.overclockers.com/forums/show ... stcount=17

Re: Merged problems with projects 6903/6904

Posted: Sat Jun 02, 2012 9:12 pm
by Grandpa_01
Nathan_P wrote:
bollix47 wrote:
Grandpa_01 wrote:I do not even know if the Kraken even works on v7 or not.

theKraken works in v7. Top shows thekraken wrapped core using all the cpu resources and system monitor shows all cpus running at 100%. I can't be positive about the autorestart feature though as I've never seen it work in v7 but have in v6.

There is a new version of the kraken out that should fix the issue where DLB doesn't engage.

@Grandpa - I just saw a post on one of the forums where they have used Linux Mint to install v7 without issue. If I find the forum again I'll let you know.
It would be useful if a different distro works with v7 - kasson can then spend less time chasing these rogue units and more time creating new projects.

What I find strange is that these issues only really happen with these projects - is it the server, the size of the WU or a combination of both?

Edit:- Here we go from the good folks at OCF:- http://www.overclockers.com/forums/show ... stcount=17
Thanks Nathan_P

The problem here is that v7 needs to be fixed to install easily on all versions of Linux. v7 is supposed to be easy to install and use, which for Windows (all versions) it is, but Linux is a different story, mainly because there are so many different versions of Linux and they never remain constant. At this time I do not believe v7 is something I would recommend for a Linux user unless you have the version of Linux that it happens to be working on. It used to work fine on 10.10 until they fixed it to work on 12.04, and then they borked it for 10.10. Oh well, that is the way it goes. I have used and do use v7 on Linux and it is great: it is easy to use, has auto start, is simple to set up, etc. That is, if you can get it to install without having to write special scripts and edit the package. Many of us do not have any idea how to get it installed if the package from Stanford does not work the first time.

99.9% of all the bigadv WU's can only be run on Linux and v7 would be a real plus here, but judging from the comments made before here on the forum about the Linux installer issues, it does not have a very high importance. So I would say the average Linux user will not be using v7 for a while yet. If v6 is easier to use due to the simple guides that are available, why would they?

In order for v7 to reach its goal of a simple-to-use client, some things still need to be addressed. A user should not have to change his OS or go through a special install in order to get a user-friendly client to work. :wink:

Re: Merged problems with projects 6903/6904

Posted: Sat Jun 02, 2012 11:03 pm
by bollix47
There is a new version of the kraken out that should fix the issue where DLB doesn't engage.
Okay, I upgraded to 0.7-pre11 and, although I see no visual evidence in the log that DLB has started, the TPF appears to be ~20 seconds faster (on the exact same WU which I paused to update theKraken) so it might be working. :e?: It does state in the README.txt file that DLB works by default in this version of theKraken:

http://www.amdzone.com/phpbb3/viewtopic ... 44#p218944

Re: Merged problems with projects 6903/6904

Posted: Sat Jun 02, 2012 11:26 pm
by kasson
Thanks--there has been a discussion of why more people aren't using v7, so I'm passing some of this feedback on to Dr. Pande. It's good for those of us involved in the project (but not developing the v7 client per se) to get a sense of what the issues are from the users' point of view.

Re: Merged problems with projects 6903/6904

Posted: Sat Jun 02, 2012 11:28 pm
by ChelseaOilman
You need to look in the terminal window to see DLB turning on.

Re: Merged problems with projects 6903/6904

Posted: Sat Jun 02, 2012 11:34 pm
by bollix47
ChelseaOilman wrote:You need to look in the terminal window to see DLB turning on.
Yes, I should have said that I looked there and saw no message re DLB. I'll have a look again when the next WU starts on Monday.

Current Terminal window:


21:06:34:************************* Folding@home Client *************************
21:06:34:    Website: http://folding.stanford.edu/
21:06:34:  Copyright: (c) 2009-2012 Stanford University
21:06:34:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:06:34:       Args: 
21:06:34:     Config: /home/bollix/fah/config.xml
21:06:34:******************************** Build ********************************
21:06:34:    Version: 7.1.52
21:06:34:       Date: Mar 20 2012
21:06:34:       Time: 13:19:11
21:06:34:    SVN Rev: 3515
21:06:34:     Branch: fah/trunk/client
21:06:34:   Compiler: GNU 4.6.2
21:06:34:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
21:06:34:             -fno-unsafe-math-optimizations -msse2
21:06:34:   Platform: linux2 3.2.0-1-amd64
21:06:34:       Bits: 64
21:06:34:       Mode: Release
21:06:34:******************************* System ********************************
21:06:34:        CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
21:06:34:     CPU ID: GenuineIntel Family 6 Model 44 Stepping 2
21:06:34:       CPUs: 12
21:06:34:     Memory: 11.75GiB
21:06:34:Free Memory: 10.00GiB
21:06:34:    Threads: POSIX_THREADS
21:06:34: On Battery: false
21:06:34: UTC offset: -4
21:06:34:        PID: 3216
21:06:34:        CWD: /home/bollix/fah
21:06:34:         OS: Linux 3.0.0-19-generic x86_64
21:06:34:    OS Arch: AMD64
21:06:34:       GPUs: 3
21:06:34:      GPU 0: UNSUPPORTED: Rage XL (Intel Corporation)
21:06:34:      GPU 1: UNSUPPORTED: Rage XL (Intel Corporation)
21:06:34:      GPU 2: NVIDIA:1 G92 [GeForce GTS 250]
21:06:34:       CUDA: 1.1
21:06:34:CUDA Driver: 4000
21:06:34:***********************************************************************
21:06:34:<config>
21:06:34:  <!-- FahCore Control -->
21:06:34:  <core-priority v='low'/>
21:06:34:
21:06:34:  <!-- Network -->
21:06:34:  <proxy v=':8080'/>
21:06:34:
21:06:34:  <!-- Remote Command Server -->
21:06:34:  <command-allow v='127.0.0.1,192.168.2.100-192.168.2.149'/>
21:06:34:  <command-allow-no-pass v='127.0.0.1,192.168.2.100-192.168.2.149'/>
21:06:34:
21:06:34:  <!-- User Information -->
21:06:34:  <passkey v='********************************'/>
21:06:34:  <team v='39340'/>
21:06:34:  <user v='bollix47'/>
21:06:34:
21:06:34:  <!-- Folding Slots -->
21:06:34:  <slot id='0' type='SMP'>
21:06:34:    <client-type v='bigadv'/>
21:06:34:    <cpus v='-1'/>
21:06:34:    <max-packet-size v='big'/>
21:06:34:    <next-unit-percentage v='98'/>
21:06:34:  </slot>
21:06:34:</config>
21:06:34:Trying to access database...
21:06:34:Successfully acquired database lock
21:06:34:Enabled folding slot 00: READY smp:12
21:06:34:WU00:FS00:Starting
21:06:34:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /home/bollix/fah/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a5.fah/FahCore_a5 -dir 00 -suffix 01 -version 701 -lifeline 3216 -checkpoint 15 -np 12
21:06:34:WU00:FS00:Started FahCore on PID 3224
21:06:34:WU00:FS00:Core PID:3228
21:06:34:WU00:FS00:FahCore 0xa5 started
21:06:34:Server connection id=1 on 0.0.0.0:36330 from 192.168.2.103
21:06:35:WU00:FS00:0xa5:
21:06:35:WU00:FS00:0xa5:*------------------------------*
21:06:35:WU00:FS00:0xa5:Folding@Home Gromacs SMP Core
21:06:35:WU00:FS00:0xa5:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
21:06:35:WU00:FS00:0xa5:
21:06:35:WU00:FS00:0xa5:Preparing to commence simulation
21:06:35:WU00:FS00:0xa5:- Looking at optimizations...
21:06:35:WU00:FS00:0xa5:- Files status OK
21:06:39:WU00:FS00:0xa5:- Expanded 57238645 -> 71846524 (decompressed 50.4 percent)
21:06:39:WU00:FS00:0xa5:Called DecompressByteArray: compressed_data_size=57238645 data_size=71846524, decompressed_data_size=71846524 diff=0
21:06:40:WU00:FS00:0xa5:- Digital signature verified
21:06:40:WU00:FS00:0xa5:
21:06:40:WU00:FS00:0xa5:Project: 6903 (Run 3, Clone 16, Gen 87)
21:06:40:WU00:FS00:0xa5:
21:06:40:WU00:FS00:0xa5:Assembly optimizations on if available.
21:06:40:WU00:FS00:0xa5:Entering M.D.
21:06:46:WU00:FS00:0xa5:Using Gromacs checkpoints
21:06:51:WU00:FS00:0xa5:Mapping NT from 12 to 12 
21:07:12:WU00:FS00:0xa5:Resuming from checkpoint
21:07:13:WU00:FS00:0xa5:Verified 00/wudata_01.log
21:07:14:WU00:FS00:0xa5:Verified 00/wudata_01.trr
21:07:14:WU00:FS00:0xa5:Verified 00/wudata_01.xtc
21:07:14:WU00:FS00:0xa5:Verified 00/wudata_01.edr
21:07:15:WU00:FS00:0xa5:Completed 78935 out of 250000 steps  (31%)
21:15:14:Server connection id=2 on 0.0.0.0:36330 from 192.168.2.103
21:15:17:Server connection id=3 on 0.0.0.0:36330 from 192.168.2.103
21:22:18:WU00:FS00:0xa5:Completed 80000 out of 250000 steps  (32%)
21:57:46:WU00:FS00:0xa5:Completed 82500 out of 250000 steps  (33%)
22:33:10:WU00:FS00:0xa5:Completed 85000 out of 250000 steps  (34%)
23:08:37:WU00:FS00:0xa5:Completed 87500 out of 250000 steps  (35%)
23:43:58:WU00:FS00:0xa5:Completed 90000 out of 250000 steps  (36%)
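The TPF (time per frame) mentioned earlier can be read off a log like the one above by timing consecutive 1% progress lines. The following is only a rough sketch in Python, assuming the v7 line format shown above and that one frame is 1% of the total steps; `frame_times` is a hypothetical helper name, not part of the client.

```python
import re
from datetime import datetime, timedelta

# Matches v7 progress lines such as:
# 21:22:18:WU00:FS00:0xa5:Completed 80000 out of 250000 steps  (32%)
PROGRESS = re.compile(r"^(\d{2}:\d{2}:\d{2}):.*Completed (\d+) out of (\d+) steps")

def frame_times(lines):
    """Return seconds elapsed for each full frame (1% of total steps)."""
    points = []
    for line in lines:
        m = PROGRESS.match(line)
        if m:
            stamp = datetime.strptime(m.group(1), "%H:%M:%S")
            points.append((stamp, int(m.group(2)), int(m.group(3))))
    deltas = []
    for (t0, s0, total), (t1, s1, _) in zip(points, points[1:]):
        if s1 - s0 != total // 100:
            continue  # partial frame, e.g. the first one after a resume
        if t1 < t0:
            t1 += timedelta(days=1)  # log clock wrapped past midnight
        deltas.append(int((t1 - t0).total_seconds()))
    return deltas
```

Fed the progress lines above, this skips the partial frame after the checkpoint resume (78935 to 80000 steps) and times the four full frames that follow, each a little over 35 minutes.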

Current kraken log:


thekraken: The Kraken 0.7-pre11 (compiled Sat Jun  2 17:04:29 EDT 2012 by bollix@Enterprise)
thekraken: Processor affinity wrapper for Folding@Home
thekraken: The Kraken comes with ABSOLUTELY NO WARRANTY; licensed under GPLv2
thekraken: PID: 3228
thekraken: launch binary: /home/bollix/fah/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a5.fah/thekraken-FahCore_a5
thekraken: config file: /home/bollix/fah/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a5.fah/thekraken.cfg  <=======  This file doesn't appear to exist anywhere 
thekraken: Forked 3229.
thekraken: child: ptrace(PTRACE_TRACEME) returns 0
thekraken: child: Executing...
thekraken: 3229: initial attach
thekraken: 3229: Continuing.
thekraken: 3229: cloned 3231
thekraken: 3229: binding 3231 to cpu 0
thekraken: 3229: talkative FahCore process identified (3231); listening to syscalls
thekraken: 3229: Continuing (SYSCALL).
thekraken: 3231: Continuing (SYSCALL).
thekraken: waitpid() returns 3232 with status 0x0000137f
thekraken: 3232: stopped with signal 0x00000013
thekraken: 3232: Continuing.
thekraken: 3229: cloned 3232
thekraken: 3229: Continuing (SYSCALL).
thekraken: 3229: cloned 3233
thekraken: 3229: Continuing (SYSCALL).
thekraken: waitpid() returns 3233 with status 0x0000137f
thekraken: 3233: stopped with signal 0x00000013
thekraken: 3233: Continuing.
thekraken: waitpid() returns 3234 with status 0x0000137f
thekraken: 3234: stopped with signal 0x00000013
thekraken: 3234: Continuing.
thekraken: 3231: cloned 3234
thekraken: 3231: binding 3234 to cpu 1
thekraken: 3231: Continuing (SYSCALL).
thekraken: waitpid() returns 3235 with status 0x0000137f
thekraken: 3235: stopped with signal 0x00000013
thekraken: 3235: Continuing.
thekraken: 3231: cloned 3235
thekraken: 3231: binding 3235 to cpu 2
thekraken: 3231: Continuing (SYSCALL).
thekraken: waitpid() returns 3236 with status 0x0000137f
thekraken: 3236: stopped with signal 0x00000013
thekraken: 3236: Continuing.
thekraken: 3231: cloned 3236
thekraken: 3231: binding 3236 to cpu 3
thekraken: 3231: Continuing (SYSCALL).
thekraken: waitpid() returns 3237 with status 0x0000137f
thekraken: 3237: stopped with signal 0x00000013
thekraken: 3237: Continuing.
thekraken: 3231: cloned 3237
thekraken: 3231: binding 3237 to cpu 4
thekraken: 3231: Continuing (SYSCALL).
thekraken: waitpid() returns 3238 with status 0x0000137f
thekraken: 3238: stopped with signal 0x00000013
thekraken: 3238: Continuing.
thekraken: 3231: cloned 3238
thekraken: 3231: binding 3238 to cpu 5
thekraken: 3231: Continuing (SYSCALL).
thekraken: waitpid() returns 3239 with status 0x0000137f
thekraken: 3239: stopped with signal 0x00000013
thekraken: 3239: Continuing.
thekraken: 3231: cloned 3239
thekraken: 3231: binding 3239 to cpu 6
thekraken: 3231: Continuing (SYSCALL).
thekraken: waitpid() returns 3240 with status 0x0000137f
thekraken: 3240: stopped with signal 0x00000013
thekraken: 3240: Continuing.
thekraken: 3231: cloned 3240
thekraken: 3231: binding 3240 to cpu 7
thekraken: 3231: Continuing (SYSCALL).
thekraken: waitpid() returns 3241 with status 0x0000137f
thekraken: 3241: stopped with signal 0x00000013
thekraken: 3241: Continuing.
thekraken: 3231: cloned 3241
thekraken: 3231: binding 3241 to cpu 8
thekraken: 3231: Continuing (SYSCALL).
thekraken: waitpid() returns 3242 with status 0x0000137f
thekraken: 3242: stopped with signal 0x00000013
thekraken: 3242: Continuing.
thekraken: 3231: cloned 3242
thekraken: 3231: binding 3242 to cpu 9
thekraken: 3231: Continuing (SYSCALL).
thekraken: waitpid() returns 3243 with status 0x0000137f
thekraken: 3243: stopped with signal 0x00000013
thekraken: 3243: Continuing.
thekraken: 3231: cloned 3243
thekraken: 3231: binding 3243 to cpu 10
thekraken: 3231: Continuing (SYSCALL).
thekraken: waitpid() returns 3244 with status 0x0000137f
thekraken: 3244: stopped with signal 0x00000013
thekraken: 3244: Continuing.
thekraken: 3231: cloned 3244
thekraken: 3231: binding 3244 to cpu 11
thekraken: 3231: Continuing (SYSCALL).
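For anyone wanting to check the affinity part of a log like the one above without reading it line by line, here is a small Python sketch. It assumes the "binding <thread> to cpu <n>" wording shown above; `cpu_bindings` and `unbound_cpus` are hypothetical helper names, not part of theKraken.

```python
import re

# Matches lines such as: "thekraken: 3231: binding 3234 to cpu 1"
BINDING = re.compile(r"^thekraken: \d+: binding (\d+) to cpu (\d+)$")

def cpu_bindings(lines):
    """Map each bound thread ID to the CPU number reported in the log."""
    bindings = {}
    for line in lines:
        m = BINDING.match(line.strip())
        if m:
            bindings[int(m.group(1))] = int(m.group(2))
    return bindings

def unbound_cpus(bindings, cpu_count):
    """Return CPU numbers in range(cpu_count) that no thread was bound to."""
    return sorted(set(range(cpu_count)) - set(bindings.values()))
```

With `cpu_count` set to the `-np` value from the client log (12 here), an empty result from `unbound_cpus` would mean every CPU got a thread, which is what the log above shows for cpus 0 through 11.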

Re: Merged problems with projects 6903/6904

Posted: Sun Jun 03, 2012 12:46 am
by Grandpa_01
bollix47, it is working and is doing what it is supposed to. When the WU first starts, the Kraken starts sending out messages to interrupt threads and create instability until DLB turns on. Once it turns on, the Kraken quits sending the interrupt signals. Your log shows it is working properly.

Re: Merged problems with projects 6903/6904

Posted: Sun Jun 03, 2012 5:34 am
by KMac
P6904 Run 2, Clone 19, Gen 84 CoreStatus = 8B (139) Client-core communications error: ERROR 0x8b
Unit did not delete automatically but stopped processing. Manual deletion successful.


[13:16:43] Connecting to http://130.237.232.237:8080/
[13:32:22] Posted data.
[13:32:22] Initial: 0000; - Uploaded at ~231 kB/s
[13:32:22] - Averaged speed for that direction ~216 kB/s
[13:32:22] + Results successfully sent
[13:32:22] Thank you for your contribution to Folding@Home.
[13:32:22] + Number of Units Completed: 266

[13:32:26] Trying to send all finished work units
[13:32:26] + No unsent completed units remaining.
[13:32:26] - Preparing to get new work unit...
[13:32:26] Cleaning up work directory
[13:32:27] + Attempting to get work packet
[13:32:27] Passkey found
[13:32:27] - Will indicate memory of 12033 MB
[13:32:27] - Connecting to assignment server
[13:32:27] Connecting to http://assign.stanford.edu:8080/
[13:32:28] Posted data.
[13:32:28] Initial: ED82; - Successful: assigned to (130.237.232.237).
[13:32:28] + News From Folding@Home: Welcome to Folding@Home
[13:32:28] Loaded queue successfully.
[13:32:28] Sent data
[13:32:28] Connecting to http://130.237.232.237:8080/
[13:32:39] Posted data.
[13:32:39] Initial: 0000; - Receiving payload (expected size: 45659464)
[13:37:12] - Downloaded at ~163 kB/s
[13:37:12] - Averaged speed for that direction ~329 kB/s
[13:37:12] + Received work.
[13:37:12] Trying to send all finished work units
[13:37:12] + No unsent completed units remaining.
[13:37:12] + Closed connections
[13:37:12] 
[13:37:12] + Processing work unit
[13:37:12] Core required: FahCore_a5.exe
[13:37:12] Core found.
[13:37:12] Working on queue slot 08 [June 2 13:37:12 UTC]
[13:37:12] + Working ...
[13:37:12] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 08 -np 24 -priority 96 -checkpoint 15 -verbose -lifeline 1821 -version 634'

[13:37:12] 
[13:37:12] *------------------------------*
[13:37:12] Folding@Home Gromacs SMP Core
[13:37:12] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[13:37:12] 
[13:37:12] Preparing to commence simulation
[13:37:12] - Looking at optimizations...
[13:37:12] - Created dyn
[13:37:12] - Files status OK
[13:37:15] - Expanded 45658952 -> 70963200 (decompressed 61.3 percent)
[13:37:15] Called DecompressByteArray: compressed_data_size=45658952 data_size=70963200, decompressed_data_size=70963200 diff=0
[13:37:15] - Digital signature verified
[13:37:15] 
[13:37:15] Project: 6904 (Run 2, Clone 19, Gen 84)
[13:37:15] 
[13:37:15] Assembly optimizations on if available.
[13:37:15] Entering M.D.
[13:37:22] Mapping NT from 24 to 24 
[13:53:28] CoreStatus = 8B (139)
[13:53:28] Client-core communications error: ERROR 0x8b
[13:53:28] Deleting current work unit & continuing...
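To spot units like this one across a long v6 log, the CoreStatus lines can be pulled out with a few lines of Python. This is only a sketch, assuming the "CoreStatus = XX (NNN)" format shown above, where the first field is hex and the second is the same value in decimal (0x8B = 139); `core_statuses` is a hypothetical helper name.

```python
import re

# Matches v6 lines such as: "[13:53:28] CoreStatus = 8B (139)"
CORE_STATUS = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")

def core_statuses(lines):
    """Return (hex_text, decimal_value) for every CoreStatus line found."""
    found = []
    for line in lines:
        m = CORE_STATUS.search(line)
        if m:
            hex_text, decimal = m.group(1), int(m.group(2))
            # Both fields encode the same number; make sure they agree.
            if int(hex_text, 16) != decimal:
                raise ValueError(f"inconsistent CoreStatus line: {line!r}")
            found.append((hex_text.upper(), decimal))
    return found
```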

Re: Merged problems with projects 6903/6904

Posted: Mon Jun 04, 2012 5:12 am
by firedfly
Grandpa_01 wrote:bollix47, it is working and is doing what it is supposed to. When the WU first starts, the Kraken starts sending out messages to interrupt threads and create instability until DLB turns on. Once it turns on, the Kraken quits sending the interrupt signals. Your log shows it is working properly.
It is not working correctly. The synthetic load is never starting since the kraken never detects that folding has started. I have not tested the new code on v7, so perhaps that is the issue. I'll try to do some testing on v7 this week.

If the new DLB load feature was working correctly, you would have extra entries in thekraken.log like below:


thekraken: 29501: binding 29550 to cpu 47
thekraken: 29501: Continuing (SYSCALL).
thekraken: 29501: first step identified, creating 24 synthload workers: on 8000ms, off 200ms, deadline 300000ms
thekraken: 29501: synthload manager created (29551)
thekraken: 29501: DLB has engaged; killing synthetic load manager
thekraken: waitpid() returns 29501 with status 0x0000057f
thekraken: 29501: stopped with signal 0x00000005
thekraken: 29501: Continuing (unhandled trap).
thekraken: waitpid() returns 29551 with status 0x0000000f
thekraken: 29551: terminated by signal 15
thekraken: 29551: synthetic load manager terminated (run time: 16 seconds)
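A quick way to check a thekraken.log for the entries above is to search for the two marker messages. The Python below is a minimal sketch, assuming the exact wording in the excerpt above; `dlb_state` is a hypothetical helper name and the returned strings are made up for illustration.

```python
# Marker substrings copied from the log excerpt above.
SYNTHLOAD_STARTED = "synthload manager created"
DLB_ENGAGED = "DLB has engaged"

def dlb_state(lines):
    """Classify a thekraken.log by which DLB-related messages it contains."""
    started = any(SYNTHLOAD_STARTED in line for line in lines)
    engaged = any(DLB_ENGAGED in line for line in lines)
    if engaged:
        return "engaged"
    if started:
        return "synthload started, DLB not yet engaged"
    return "synthload never started"
```

A log with only the clone and binding lines, like the one posted earlier in the thread, would come back as "synthload never started".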

Re: Merged problems with projects 6903/6904

Posted: Mon Jun 04, 2012 5:41 am
by tear
Hi bollix47,

Actually, 0.7-pre11 hasn't been tested w/V7 and you... have found a bug (which causes
DLB engagement code not to start under V7).

I've fixed it in 0.7-pre13 -- http://darkswarm.org/thekraken-0.7-pre13.tar.gz, so feel free to give it a shot
-- should work much better. Please follow up via PM, in Kraken's home thread in AMDzone
or in 3rd party software subforum so we don't hijack this thread.

0.7-pre13 also includes a startup deadline feature so b0rked units fail within 5 minutes
instead of tens of minutes.

Thanks,
tear

Re: Merged problems with projects 6903/6904

Posted: Fri Jun 08, 2012 11:50 am
by EXT64
Hi,

I got this ( Project 6904 (Run 2, Clone 19, Gen 84) ) WU, and when it started, it stalled at Mapping 48 to 48. I think it sat for around an hour before I stopped it. I switched to 8101s after I first got it and they worked great, but then when I switched back to 690xs I got it again, and again it would not work. I then deleted my machinedependent.dat, and got a 6901 which is working.

Re: Merged problems with projects 6903/6904

Posted: Fri Jun 08, 2012 2:55 pm
by Grandpa_01
That particular WU has been running around for a few days now playing havoc with bigadv. I have a feeling Kasson is having a hard time getting his net around that one. The only cure for it is to manually delete the machinedependent.dat file. You may want to try the latest version of the Kraken mentioned above and see if it works for you. I am running it on all of mine right now and it appears to have no problems. I do not know yet whether it will take care of the problem this particular WU creates, since I have not downloaded that particular WU since I installed the latest version of the Kraken.
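The manual deletion can be scripted if it has to be done on several machines. The Python below is only a sketch: `remove_machinedependent` is a hypothetical helper, the client directory is whatever folder your own install uses (the thread does not give a path), and it assumes the client has been stopped first.

```python
from pathlib import Path

def remove_machinedependent(client_dir):
    """Delete machinedependent.dat from client_dir.

    Returns True if a file was removed, False if none was there.
    Assumption: the client is stopped before this is called.
    """
    target = Path(client_dir) / "machinedependent.dat"
    if target.is_file():
        target.unlink()
        return True
    return False
```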