Page 4 of 6
Re: Merged problems with projects 6903/6904
Posted: Sun Feb 12, 2012 9:56 pm
by Grandpa_01
According to Kasson, the new server code does not allow nuking them the way the old code did; when they try the old way, the server regenerates the 512-byte download + missing file issue. I think he may be running into the same issue some of us are. I tried running one of them on my 4P, figuring it might help; I was running the new WUs anyway, so it really was not going to be a big loss as far as PPD goes. It took a little under 2 days to run, but it would not send; it just died at the end (same error as harlam and probably Patriot). I do not know whether Kasson is having the same problem, but I am sure he is working as fast as he can. I do not think there is anything any of us can do to help, but if there is, I am willing to do what I can.
MtM, I do not think any new ones are being generated. I think it is just that the old ones are not getting completed and keep getting regenerated, and as they time out on different folders' accounts, the problem just keeps increasing. Methinks Kasson may need a bigger mousetrap.
WU dumped after completion and next one seems to be hung
Posted: Mon Feb 13, 2012 5:23 pm
by SKeptical_Thinker
Code:
*********************** Log Started 2012-02-10T21:40:33 ************************
21:40:33:************************* Folding@home Client *************************
21:40:33: Website: http://folding.stanford.edu/
21:40:33: Copyright: (c) 2009-2012 Stanford University
21:40:33: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:40:33: Args: --child --lifeline 6630 /etc/fahclient/config.xml --run-as
21:40:33: fahclient --pid-file=/var/run/fahclient.pid --daemon
21:40:33: Config: /etc/fahclient/config.xml
21:40:33:******************************** Build ********************************
21:40:33: Version: 7.1.43
21:40:33: Date: Jan 2 2012
21:40:33: Time: 04:27:48
21:40:33: SVN Rev: 3223
21:40:33: Branch: fah/trunk/client
21:40:33: Compiler: GNU 4.1.2 20080704 (Red Hat 4.1.2-46)
21:40:33: Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
21:40:33: -fno-unsafe-math-optimizations -msse2
21:40:33: Platform: linux2 2.6.18-164.11.1.el5
21:40:33: Bits: 64
21:40:33: Mode: Release
21:40:33:******************************* System ********************************
21:40:33: CPU: Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
21:40:33: CPU ID: GenuineIntel Family 6 Model 44 Stepping 2
21:40:33: CPUs: 24
21:40:33: Memory: 47.13GiB
21:40:33:Free Memory: 46.89GiB
21:40:33: Threads: POSIX_THREADS
21:40:33: On Battery: false
21:40:33: UTC offset: -5
21:40:33: PID: 6637
21:40:33: CWD: /var/lib/fahclient
21:40:33: OS: Linux 2.6.38.2-f x86_64
21:40:33: OS Arch: AMD64
21:40:33: GPUs: 2
21:40:33: GPU 0: UNSUPPORTED: Rage XL (Intel Corporation)
21:40:33: GPU 1: UNSUPPORTED: ES1000
21:40:33: CUDA: Not detected
21:40:33:***********************************************************************
21:40:33:Started thread 1 on PID 6637
21:40:33:<config>
21:40:33: <!-- Client Control -->
21:40:33: <cycle-rate v='4'/>
21:40:33: <cycles v='-1'/>
21:40:33: <data-directory v='.'/>
21:40:33: <disable-project-lookup v='false'/>
21:40:33: <exec-directory v='/usr/bin'/>
21:40:33: <exit-when-done v='false'/>
21:40:33: <threads v='4'/>
21:40:33:
21:40:33: <!-- Configuration -->
21:40:33: <config-rotate v='true'/>
21:40:33: <config-rotate-dir v='configs'/>
21:40:33: <config-rotate-max v='16'/>
21:40:33:
21:40:33: <!-- Debugging -->
21:40:33: <assignment-servers>
21:40:33: assign3.stanford.edu:8080 assign4.stanford.edu:80
21:40:33: </assignment-servers>
21:40:33: <capture-directory v='capture'/>
21:40:33: <capture-sockets v='false'/>
21:40:33: <debug-sockets v='false'/>
21:40:33: <exception-locations v='true'/>
21:40:33: <gpu-assignment-servers>
21:40:33: assign-GPU.stanford.edu:80 assign-GPU.stanford.edu:8080
21:40:33: </gpu-assignment-servers>
21:40:33: <stack-traces v='false'/>
21:40:33:
21:40:33: <!-- Error Handling -->
21:40:33: <max-slot-errors v='5'/>
21:40:33: <max-unit-errors v='5'/>
21:40:33:
21:40:33: <!-- FahCore Control -->
21:40:33: <checkpoint v='15'/>
21:40:33: <core-dir v='cores'/>
21:40:33: <core-priority v='idle'/>
21:40:33: <cpu-affinity v='false'/>
21:40:33: <cpu-usage v='100'/>
21:40:33: <no-assembly v='false'/>
21:40:33:
21:40:33: <!-- Folding Slot Configuration -->
21:40:33: <client-subtype v='LINUX'/>
21:40:33: <client-type v='bigadv'/>
21:40:33: <cpu-species v='X86_PENTIUM_II'/>
21:40:33: <cpu-type v='AMD64'/>
21:40:33: <cpus v='-1'/>
21:40:33: <cuda-index v='0'/>
21:40:33: <gpu v='false'/>
21:40:33: <gpu-usage v='100'/>
21:40:33: <max-packet-size v='big'/>
21:40:33: <opencl-index v='0'/>
21:40:33: <os-species v='UNKNOWN'/>
21:40:33: <os-type v='LINUX'/>
21:40:33: <project-key v='0'/>
21:40:33: <smp v='true'/>
21:40:33:
21:40:33: <!-- Logging -->
21:40:33: <log v='log.txt'/>
21:40:33: <log-color v='true'/>
21:40:33: <log-crlf v='false'/>
21:40:33: <log-date v='false'/>
21:40:33: <log-date-periodically v='21600'/>
21:40:33: <log-debug v='true'/>
21:40:33: <log-domain v='false'/>
21:40:33: <log-header v='true'/>
21:40:33: <log-level v='true'/>
21:40:33: <log-no-info-header v='true'/>
21:40:33: <log-redirect v='false'/>
21:40:33: <log-rotate v='true'/>
21:40:33: <log-rotate-dir v='logs'/>
21:40:33: <log-rotate-max v='16'/>
21:40:33: <log-short-level v='false'/>
21:40:33: <log-simple-domains v='true'/>
21:40:33: <log-thread-id v='false'/>
21:40:33: <log-thread-prefix v='true'/>
21:40:33: <log-time v='true'/>
21:40:33: <log-to-screen v='true'/>
21:40:33: <log-truncate v='false'/>
21:40:33: <verbosity v='7'/>
21:40:33:
21:40:33: <!-- Network -->
21:40:33: <proxy v=''/>
21:40:33: <proxy-enable v='false'/>
21:40:33: <proxy-pass v=''/>
21:40:33: <proxy-user v=''/>
21:40:33:
21:40:33: <!-- Process Control -->
21:40:33: <child v='true'/>
21:40:33: <daemon v='true'/>
21:40:33: <pid v='false'/>
21:40:33: <pid-file v='/var/run/fahclient.pid'/>
21:40:33: <respawn v='false'/>
21:40:33: <service v='false'/>
21:40:33:
21:40:33: <!-- Remote Command Server -->
21:40:33: <command-address v='0.0.0.0'/>
21:40:33: <command-allow v='127.0.0.1'/>
21:40:33: <command-allow-no-pass v='127.0.0.1'/>
21:40:33: <command-deny v='0.0.0.0/0'/>
21:40:33: <command-deny-no-pass v='0.0.0.0/0'/>
21:40:33: <command-port v='36330'/>
21:40:33:
21:40:33: <!-- Slot Control -->
21:40:33: <max-shutdown-wait v='60'/>
21:40:33: <pause-on-battery v='false'/>
21:40:33: <pause-on-start v='false'/>
21:40:33:
21:40:33: <!-- User Information -->
21:40:33: <machine-id v='0'/>
21:40:33: <passkey v='********************************'/>
21:40:33: <team v='31574'/>
21:40:33: <user v='Skeptical_Thinker'/>
21:40:33:
21:40:33: <!-- Work Unit Control -->
21:40:33: <dump-after-deadline v='true'/>
21:40:33: <max-queue v='16'/>
21:40:33: <max-units v='0'/>
21:40:33: <next-unit-percentage v='99'/>
21:40:33:
21:40:33: <!-- Folding Slots -->
21:40:33:</config>
21:40:33:Switching to user fahclient
21:40:33:Trying to access database...
21:40:33:Successfully acquired database lock
21:40:33:Enabled folding slot 00: READY smp:24
21:40:33:Started thread 4 on PID 6637
21:40:33:Started thread 3 on PID 6637
21:40:33:Started thread 5 on PID 6637
21:40:33:Started thread 6 on PID 6637
21:40:33:Started thread 7 on PID 6637
21:40:33:WU01:FS00:Starting
21:40:33:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a5.fah/FahCore_a5 -dir 01 -suffix 01 -version 701 -checkpoint 15 -np 24
21:40:33:WU01:FS00:Started FahCore on PID 6645
21:40:33:Started thread 8 on PID 6637
21:40:33:WU01:FS00:Core PID:6649
21:40:33:WU01:FS00:FahCore 0xa5 started
21:40:34:WU01:FS00:0xa5:
21:40:34:WU01:FS00:0xa5:*------------------------------*
21:40:34:WU01:FS00:0xa5:Folding@Home Gromacs SMP Core
21:40:34:WU01:FS00:0xa5:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
21:40:34:WU01:FS00:0xa5:
21:40:34:WU01:FS00:0xa5:Preparing to commence simulation
21:40:34:WU01:FS00:0xa5:- Ensuring status. Please wait.
21:40:43:WU01:FS00:0xa5:- Looking at optimizations...
21:40:43:WU01:FS00:0xa5:- Working with standard loops on this execution.
21:40:43:WU01:FS00:0xa5:- Previous termination of core was improper.
21:40:43:WU01:FS00:0xa5:- Going to use standard loops.
21:40:43:WU01:FS00:0xa5:- Files status OK
21:40:49:WU01:FS00:0xa5:- Expanded 57246854 -> 71846524 (decompressed 50.4 percent)
21:40:49:WU01:FS00:0xa5:Called DecompressByteArray: compressed_data_size=57246854 data_size=71846524, decompressed_data_size=71846524 diff=0
21:40:49:WU01:FS00:0xa5:- Digital signature verified
21:40:49:WU01:FS00:0xa5:
21:40:49:WU01:FS00:0xa5:Project: 6903 (Run 5, Clone 13, Gen 69)
21:40:49:WU01:FS00:0xa5:
21:40:50:WU01:FS00:0xa5:Entering M.D.
21:40:56:WU01:FS00:0xa5:Using Gromacs checkpoints
21:41:02:WU01:FS00:0xa5:Mapping NT from 24 to 24
21:41:42:WU01:FS00:0xa5:Resuming from checkpoint
21:41:45:WU01:FS00:0xa5:Verified 01/wudata_01.log
21:41:48:WU01:FS00:0xa5:Verified 01/wudata_01.trr
21:41:53:WU01:FS00:0xa5:Verified 01/wudata_01.xtc
21:41:54:WU01:FS00:0xa5:Verified 01/wudata_01.edr
21:41:54:WU01:FS00:0xa5:Completed 216335 out of 500000 steps (43%)
22:29:19:WU01:FS00:0xa5:Completed 220000 out of 500000 steps (44%)
23:33:32:WU01:FS00:0xa5:Completed 225000 out of 500000 steps (45%)
00:36:41:WU01:FS00:0xa5:Completed 230000 out of 500000 steps (46%)
01:40:12:WU01:FS00:0xa5:Completed 235000 out of 500000 steps (47%)
02:43:37:WU01:FS00:0xa5:Completed 240000 out of 500000 steps (48%)
******************************** Date: 11/02/12 ********************************
03:47:12:WU01:FS00:0xa5:Completed 245000 out of 500000 steps (49%)
04:51:20:WU01:FS00:0xa5:Completed 250000 out of 500000 steps (50%)
05:54:38:WU01:FS00:0xa5:Completed 255000 out of 500000 steps (51%)
06:58:02:WU01:FS00:0xa5:Completed 260000 out of 500000 steps (52%)
08:01:31:WU01:FS00:0xa5:Completed 265000 out of 500000 steps (53%)
09:05:25:WU01:FS00:0xa5:Completed 270000 out of 500000 steps (54%)
******************************** Date: 11/02/12 ********************************
10:08:37:WU01:FS00:0xa5:Completed 275000 out of 500000 steps (55%)
11:12:08:WU01:FS00:0xa5:Completed 280000 out of 500000 steps (56%)
12:15:46:WU01:FS00:0xa5:Completed 285000 out of 500000 steps (57%)
13:19:16:WU01:FS00:0xa5:Completed 290000 out of 500000 steps (58%)
14:21:31:WU01:FS00:Downloading project 6903 description
14:21:31:WU01:FS00:Connecting to fah-web.stanford.edu:80
14:21:32:WU01:FS00:Project 6903 description downloaded successfully
14:23:06:WU01:FS00:0xa5:Completed 295000 out of 500000 steps (59%)
15:26:35:WU01:FS00:0xa5:Completed 300000 out of 500000 steps (60%)
******************************** Date: 11/02/12 ********************************
16:30:17:WU01:FS00:0xa5:Completed 305000 out of 500000 steps (61%)
17:34:10:WU01:FS00:0xa5:Completed 310000 out of 500000 steps (62%)
18:37:42:WU01:FS00:0xa5:Completed 315000 out of 500000 steps (63%)
19:40:54:WU01:FS00:0xa5:Completed 320000 out of 500000 steps (64%)
20:45:00:WU01:FS00:0xa5:Completed 325000 out of 500000 steps (65%)
21:48:13:WU01:FS00:0xa5:Completed 330000 out of 500000 steps (66%)
******************************** Date: 11/02/12 ********************************
22:51:33:WU01:FS00:0xa5:Completed 335000 out of 500000 steps (67%)
23:54:58:WU01:FS00:0xa5:Completed 340000 out of 500000 steps (68%)
00:58:19:WU01:FS00:0xa5:Completed 345000 out of 500000 steps (69%)
02:02:00:WU01:FS00:0xa5:Completed 350000 out of 500000 steps (70%)
03:06:00:WU01:FS00:0xa5:Completed 355000 out of 500000 steps (71%)
04:09:23:WU01:FS00:0xa5:Completed 360000 out of 500000 steps (72%)
******************************** Date: 12/02/12 ********************************
05:13:53:WU01:FS00:0xa5:Completed 365000 out of 500000 steps (73%)
06:17:01:WU01:FS00:0xa5:Completed 370000 out of 500000 steps (74%)
07:20:12:WU01:FS00:0xa5:Completed 375000 out of 500000 steps (75%)
08:23:30:WU01:FS00:0xa5:Completed 380000 out of 500000 steps (76%)
09:27:29:WU01:FS00:0xa5:Completed 385000 out of 500000 steps (77%)
10:31:15:WU01:FS00:0xa5:Completed 390000 out of 500000 steps (78%)
******************************** Date: 12/02/12 ********************************
11:35:29:WU01:FS00:0xa5:Completed 395000 out of 500000 steps (79%)
12:38:45:WU01:FS00:0xa5:Completed 400000 out of 500000 steps (80%)
13:42:27:WU01:FS00:0xa5:Completed 405000 out of 500000 steps (81%)
14:45:48:WU01:FS00:0xa5:Completed 410000 out of 500000 steps (82%)
15:49:06:WU01:FS00:0xa5:Completed 415000 out of 500000 steps (83%)
16:52:25:WU01:FS00:0xa5:Completed 420000 out of 500000 steps (84%)
******************************** Date: 12/02/12 ********************************
17:55:36:WU01:FS00:0xa5:Completed 425000 out of 500000 steps (85%)
18:59:04:WU01:FS00:0xa5:Completed 430000 out of 500000 steps (86%)
20:02:58:WU01:FS00:0xa5:Completed 435000 out of 500000 steps (87%)
21:06:13:WU01:FS00:0xa5:Completed 440000 out of 500000 steps (88%)
22:09:35:WU01:FS00:0xa5:Completed 445000 out of 500000 steps (89%)
23:12:29:WU01:FS00:0xa5:Completed 450000 out of 500000 steps (90%)
******************************** Date: 13/02/12 ********************************
00:15:40:WU01:FS00:0xa5:Completed 455000 out of 500000 steps (91%)
01:19:54:WU01:FS00:0xa5:Completed 460000 out of 500000 steps (92%)
02:23:10:WU01:FS00:0xa5:Completed 465000 out of 500000 steps (93%)
03:26:26:WU01:FS00:0xa5:Completed 470000 out of 500000 steps (94%)
04:30:22:WU01:FS00:0xa5:Completed 475000 out of 500000 steps (95%)
05:33:26:WU01:FS00:0xa5:Completed 480000 out of 500000 steps (96%)
******************************** Date: 13/02/12 ********************************
06:37:17:WU01:FS00:0xa5:Completed 485000 out of 500000 steps (97%)
07:40:32:WU01:FS00:0xa5:Completed 490000 out of 500000 steps (98%)
08:44:18:WU01:FS00:0xa5:Completed 495000 out of 500000 steps (99%)
08:44:19:WU00:FS00:Connecting to assign3.stanford.edu:8080
08:44:20:WU00:FS00:News: Welcome to Folding@Home
08:44:20:WU00:FS00:Assigned to work server 130.237.232.237
08:44:20:WU00:FS00:Requesting new work unit for slot 00: RUNNING smp:24 from 130.237.232.237
08:44:20:WU00:FS00:Connecting to 130.237.232.237:8080
08:44:31:WU00:FS00:Downloading 44.36MiB
08:44:37:WU00:FS00:Download 48.33%
08:44:41:WU00:FS00:Download complete
08:44:41:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:OK project:6904 run:2 clone:18 gen:54 core:0xa5 unit:0x0000005d52be746d4dfbca2cd51e4bf9
08:44:41:WU00:FS00:Downloading project 6904 description
08:44:41:WU00:FS00:Connecting to fah-web.stanford.edu:80
08:44:41:WU00:FS00:Project 6904 description downloaded successfully
09:47:35:WU01:FS00:0xa5:Completed 500000 out of 500000 steps (100%)
09:48:02:WU01:FS00:0xa5:DynamicWrapper: Finished Work Unit: sleep=10000
09:48:12:WU01:FS00:0xa5:
09:48:12:WU01:FS00:0xa5:Finished Work Unit:
09:48:16:WU01:FS00:0xa5:- Reading up to 182433744 from "01/wudata_01.trr": Read 182433744
09:48:17:WU01:FS00:0xa5:trr file hash check passed.
09:48:23:WU01:FS00:0xa5:- Reading up to 207685912 from "01/wudata_01.xtc": Read 207685912
09:48:24:WU01:FS00:0xa5:xtc file hash check passed.
09:48:24:WU01:FS00:0xa5:edr file hash check passed.
09:48:24:WU01:FS00:0xa5:logfile size: 414859
09:48:24:WU01:FS00:0xa5:Leaving Run
09:48:28:WU01:FS00:0xa5:- Writing 390878507 bytes of core data to disk...
09:49:39:WU01:FS00:0xa5:Done: 390877995 -> 378477591 (compressed to 8.9 percent)
09:49:39:WU01:FS00:0xa5:- Compressed data size (378477591) exceeds limit.
09:49:39:WU01:FS00:0xa5:- Error: Could not write out results to file
09:49:39:WU01:FS00:0xa5:- Shutting down core
09:49:39:WU01:FS00:0xa5:
09:49:39:WU01:FS00:0xa5:Folding@home Core Shutdown: FILE_IO_ERROR
09:49:39:WU01:FS00:FahCore returned: FILE_IO_ERROR (117 = 0x75)
09:49:39:WARNING:WU01:FS00:Fatal error, dumping
09:49:39:WU01:FS00:Sending unit results: id:01 state:SEND error:DUMPED project:6903 run:5 clone:13 gen:69 core:0xa5 unit:0x0000005452be746d4de923422e50378d
09:49:39:WU01:FS00:Uploading 512B to 130.237.232.237
09:49:39:WU01:FS00:Connecting to 130.237.232.237:8080
09:49:39:WU00:FS00:Starting
09:49:39:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a5.fah/FahCore_a5 -dir 00 -suffix 01 -version 701 -checkpoint 15 -np 24
09:49:39:WU00:FS00:Started FahCore on PID 13497
09:49:39:Started thread 9 on PID 6637
09:49:39:WU00:FS00:Core PID:13501
09:49:39:WU00:FS00:FahCore 0xa5 started
09:49:40:WU01:FS00:Upload complete
09:49:40:WU01:FS00:Server responded WORK_ACK (400)
09:49:40:WU01:FS00:Cleaning up
09:49:40:WU00:FS00:0xa5:
09:49:40:WU00:FS00:0xa5:*------------------------------*
09:49:40:WU00:FS00:0xa5:Folding@Home Gromacs SMP Core
09:49:40:WU00:FS00:0xa5:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
09:49:40:WU00:FS00:0xa5:
09:49:40:WU00:FS00:0xa5:Preparing to commence simulation
09:49:40:WU00:FS00:0xa5:- Looking at optimizations...
09:49:40:WU00:FS00:0xa5:- Created dyn
09:49:40:WU00:FS00:0xa5:- Files status OK
09:49:45:WU00:FS00:0xa5:- Expanded 46509365 -> 71843392 (decompressed 62.1 percent)
09:49:45:WU00:FS00:0xa5:Called DecompressByteArray: compressed_data_size=46509365 data_size=71843392, decompressed_data_size=71843392 diff=0
09:49:45:WU00:FS00:0xa5:- Digital signature verified
09:49:45:WU00:FS00:0xa5:
09:49:45:WU00:FS00:0xa5:Project: 6904 (Run 2, Clone 18, Gen 54)
09:49:45:WU00:FS00:0xa5:
09:49:45:WU00:FS00:0xa5:Assembly optimizations on if available.
09:49:45:WU00:FS00:0xa5:Entering M.D.
09:49:53:WU00:FS00:0xa5:Mapping NT from 24 to 24
09:49:58:WU00:FS00:0xa5:Completed 0 out of 13750000 steps (0%)
The last log entry is nearly 8 hours old.
This is what I see when I restart the service:
Code:
cat /var/lib/fahclient/log.txt
*********************** Log Started 2012-02-13T17:24:16 ************************
17:24:16:************************* Folding@home Client *************************
17:24:16: Website: http://folding.stanford.edu/
17:24:16: Copyright: (c) 2009-2012 Stanford University
17:24:16: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:24:16: Args: --child --lifeline 31158 /etc/fahclient/config.xml --run-as
17:24:16: fahclient --pid-file=/var/run/fahclient.pid --daemon
17:24:16: Config: /etc/fahclient/config.xml
17:24:16:******************************** Build ********************************
17:24:16: Version: 7.1.43
17:24:16: Date: Jan 2 2012
17:24:16: Time: 04:27:48
17:24:16: SVN Rev: 3223
17:24:16: Branch: fah/trunk/client
17:24:16: Compiler: GNU 4.1.2 20080704 (Red Hat 4.1.2-46)
17:24:16: Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
17:24:16: -fno-unsafe-math-optimizations -msse2
17:24:16: Platform: linux2 2.6.18-164.11.1.el5
17:24:16: Bits: 64
17:24:16: Mode: Release
17:24:16:******************************* System ********************************
17:24:16: CPU: Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
17:24:16: CPU ID: GenuineIntel Family 6 Model 44 Stepping 2
17:24:16: CPUs: 24
17:24:16: Memory: 47.13GiB
17:24:16:Free Memory: 44.68GiB
17:24:16: Threads: POSIX_THREADS
17:24:16: On Battery: false
17:24:16: UTC offset: -5
17:24:16: PID: 31165
17:24:16:Started thread 1 on PID 31165
17:24:16: CWD: /var/lib/fahclient
17:24:16: OS: Linux 2.6.38.2-f x86_64
17:24:16: OS Arch: AMD64
17:24:16: GPUs: 2
17:24:16: GPU 0: UNSUPPORTED: Rage XL (Intel Corporation)
17:24:16: GPU 1: UNSUPPORTED: ES1000
17:24:16: CUDA: Not detected
17:24:16:***********************************************************************
17:24:16:<config>
17:24:16: <!-- Client Control -->
17:24:16: <cycle-rate v='4'/>
17:24:16: <cycles v='-1'/>
17:24:16: <data-directory v='.'/>
17:24:16: <disable-project-lookup v='false'/>
17:24:16: <exec-directory v='/usr/bin'/>
17:24:16: <exit-when-done v='false'/>
17:24:16: <threads v='4'/>
17:24:16:
17:24:16: <!-- Configuration -->
17:24:16: <config-rotate v='true'/>
17:24:16: <config-rotate-dir v='configs'/>
17:24:16: <config-rotate-max v='16'/>
17:24:16:
17:24:16: <!-- Debugging -->
17:24:16: <assignment-servers>
17:24:16: assign3.stanford.edu:8080 assign4.stanford.edu:80
17:24:16: </assignment-servers>
17:24:16: <capture-directory v='capture'/>
17:24:16: <capture-sockets v='false'/>
17:24:16: <debug-sockets v='false'/>
17:24:16: <exception-locations v='true'/>
17:24:16: <gpu-assignment-servers>
17:24:16: assign-GPU.stanford.edu:80 assign-GPU.stanford.edu:8080
17:24:16: </gpu-assignment-servers>
17:24:16: <stack-traces v='false'/>
17:24:16:
17:24:16: <!-- Error Handling -->
17:24:16: <max-slot-errors v='5'/>
17:24:16: <max-unit-errors v='5'/>
17:24:16:
17:24:16: <!-- FahCore Control -->
17:24:16: <checkpoint v='15'/>
17:24:16: <core-dir v='cores'/>
17:24:16: <core-priority v='idle'/>
17:24:16: <cpu-affinity v='false'/>
17:24:16: <cpu-usage v='100'/>
17:24:16: <no-assembly v='false'/>
17:24:16:
17:24:16: <!-- Folding Slot Configuration -->
17:24:16: <client-subtype v='LINUX'/>
17:24:16: <client-type v='bigadv'/>
17:24:16: <cpu-species v='X86_PENTIUM_II'/>
17:24:16: <cpu-type v='AMD64'/>
17:24:16: <cpus v='-1'/>
17:24:16: <cuda-index v='0'/>
17:24:16: <gpu v='false'/>
17:24:16: <gpu-usage v='100'/>
17:24:16: <max-packet-size v='big'/>
17:24:16: <opencl-index v='0'/>
17:24:16: <os-species v='UNKNOWN'/>
17:24:16: <os-type v='LINUX'/>
17:24:16: <project-key v='0'/>
17:24:16: <smp v='true'/>
17:24:16:
17:24:16: <!-- Logging -->
17:24:16: <log v='log.txt'/>
17:24:16: <log-color v='true'/>
17:24:16: <log-crlf v='false'/>
17:24:16: <log-date v='false'/>
17:24:16: <log-date-periodically v='21600'/>
17:24:16: <log-debug v='true'/>
17:24:16: <log-domain v='false'/>
17:24:16: <log-header v='true'/>
17:24:16: <log-level v='true'/>
17:24:16: <log-no-info-header v='true'/>
17:24:16: <log-redirect v='false'/>
17:24:16: <log-rotate v='true'/>
17:24:16: <log-rotate-dir v='logs'/>
17:24:16: <log-rotate-max v='16'/>
17:24:16: <log-short-level v='false'/>
17:24:16: <log-simple-domains v='true'/>
17:24:16: <log-thread-id v='false'/>
17:24:16: <log-thread-prefix v='true'/>
17:24:16: <log-time v='true'/>
17:24:16: <log-to-screen v='true'/>
17:24:16: <log-truncate v='false'/>
17:24:16: <verbosity v='7'/>
17:24:16:
17:24:16: <!-- Network -->
17:24:16: <proxy v=''/>
17:24:16: <proxy-enable v='false'/>
17:24:16: <proxy-pass v=''/>
17:24:16: <proxy-user v=''/>
17:24:16:
17:24:16: <!-- Process Control -->
17:24:16: <child v='true'/>
17:24:16: <daemon v='true'/>
17:24:16: <pid v='false'/>
17:24:16: <pid-file v='/var/run/fahclient.pid'/>
17:24:16: <respawn v='false'/>
17:24:16: <service v='false'/>
17:24:16:
17:24:16: <!-- Remote Command Server -->
17:24:16: <command-address v='0.0.0.0'/>
17:24:16: <command-allow v='127.0.0.1'/>
17:24:16: <command-allow-no-pass v='127.0.0.1'/>
17:24:16: <command-deny v='0.0.0.0/0'/>
17:24:16: <command-deny-no-pass v='0.0.0.0/0'/>
17:24:16: <command-port v='36330'/>
17:24:16:
17:24:16: <!-- Slot Control -->
17:24:16: <max-shutdown-wait v='60'/>
17:24:16: <pause-on-battery v='false'/>
17:24:16: <pause-on-start v='false'/>
17:24:16:
17:24:16: <!-- User Information -->
17:24:16: <machine-id v='0'/>
17:24:16: <passkey v='********************************'/>
17:24:16: <team v='31574'/>
17:24:16: <user v='Skeptical_Thinker'/>
17:24:16:
17:24:16: <!-- Work Unit Control -->
17:24:16: <dump-after-deadline v='true'/>
17:24:16: <max-queue v='16'/>
17:24:16: <max-units v='0'/>
17:24:16: <next-unit-percentage v='99'/>
17:24:16:
17:24:16: <!-- Folding Slots -->
17:24:16:</config>
17:24:16:Switching to user fahclient
17:24:16:Trying to access database...
17:24:16:Successfully acquired database lock
17:24:16:Enabled folding slot 00: READY smp:24
17:24:17:Started thread 3 on PID 31165
17:24:17:WU00:FS00:Starting
17:24:17:Started thread 5 on PID 31165
17:24:17:Started thread 6 on PID 31165
17:24:17:Started thread 4 on PID 31165
17:24:17:Started thread 7 on PID 31165
17:24:17:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a5.fah/FahCore_a5 -dir 00 -suffix 01 -version 701 -checkpoint 15 -np 24
17:24:17:WU00:FS00:Started FahCore on PID 31173
17:24:17:Started thread 8 on PID 31165
17:24:17:WU00:FS00:Core PID:31177
17:24:17:WU00:FS00:FahCore 0xa5 started
17:24:17:WU00:FS00:0xa5:
17:24:17:WU00:FS00:0xa5:*------------------------------*
17:24:17:WU00:FS00:0xa5:Folding@Home Gromacs SMP Core
17:24:17:WU00:FS00:0xa5:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
17:24:17:WU00:FS00:0xa5:
17:24:17:WU00:FS00:0xa5:Preparing to commence simulation
17:24:17:WU00:FS00:0xa5:- Looking at optimizations...
17:24:17:WU00:FS00:0xa5:- Files status OK
17:24:22:WU00:FS00:0xa5:- Expanded 46509365 -> 71843392 (decompressed 62.1 percent)
17:24:22:WU00:FS00:0xa5:Called DecompressByteArray: compressed_data_size=46509365 data_size=71843392, decompressed_data_size=71843392 diff=0
17:24:23:WU00:FS00:0xa5:- Digital signature verified
17:24:23:WU00:FS00:0xa5:
17:24:23:WU00:FS00:0xa5:Project: 6904 (Run 2, Clone 18, Gen 54)
17:24:23:WU00:FS00:0xa5:
17:24:23:WU00:FS00:0xa5:Assembly optimizations on if available.
17:24:23:WU00:FS00:0xa5:Entering M.D.
17:24:29:WU00:FS00:0xa5:Using Gromacs checkpoints
17:24:33:WU00:FS00:0xa5:Mapping NT from 24 to 24
17:24:42:WU00:FS00:0xa5:Resuming from checkpoint
17:24:58:WU00:FS00:0xa5:Verified 00/wudata_01.log
17:25:00:WU00:FS00:0xa5:Verified 00/wudata_01.trr
17:25:00:WU00:FS00:0xa5:Verified 00/wudata_01.xtc
17:25:00:WU00:FS00:0xa5:Verified 00/wudata_01.edr
17:25:03:WU00:FS00:0xa5:Completed 24765 out of 13750000 steps (0%)
Is this WU really expected to take 173.5 days?
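As a rough cross-check of the 173.5-day figure, the two logs above contain enough numbers for a back-of-the-envelope estimate. The sketch below assumes a constant step rate and uses two samples from the 6904 unit: 0 steps at 09:49:58 (first log) and 24,765 steps at 17:24:16, when the restarted client resumed from its checkpoint. The checkpoint was written some time before the service was restarted, so the real elapsed time is somewhat shorter and the estimate below errs on the long side. The function name `estimate` is illustrative only and is not part of the client.

```python
from datetime import datetime


def estimate(t0, steps0, t1, steps1, total_steps):
    """Extrapolate from two (HH:MM:SS, completed steps) samples taken on the same day.

    Assumes a constant step rate. Returns (steps per second,
    days remaining, hours between 1% progress lines).
    """
    fmt = "%H:%M:%S"
    elapsed = (datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)).total_seconds()
    rate = (steps1 - steps0) / elapsed                  # steps per second
    days_left = (total_steps - steps1) / rate / 86_400  # remaining steps -> days
    # Assumes the core logs one "Completed ..." line per 1% of total_steps.
    hours_per_percent = (total_steps / 100) / rate / 3_600
    return rate, days_left, hours_per_percent


# Project 6904 (Run 2, Clone 18, Gen 54): started at 09:49:58,
# resumed at 24765 steps when the client restarted at 17:24:16.
rate, days_left, gap = estimate("09:49:58", 0, "17:24:16", 24_765, 13_750_000)
print(f"{rate:.2f} steps/s, about {days_left:.0f} days left, "
      f"a progress line every ~{gap:.0f} h")
```

With these samples it works out to roughly 0.9 steps per second and at most about 175 days remaining, in the same range as the client's figure. Feeding it two consecutive 6903 lines from the first log (220,000 steps at 22:29:19 and 225,000 at 23:33:32, out of 500,000) gives a progress-line interval of about 64 minutes, which matches the spacing actually seen there. If the core does print one line per 1% of the unit, the 6904 unit would only log progress about every 42 hours, so an 8-hour-old last entry would not by itself show that it was stuck.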
Re: WU dumped after completion and next one seems to be hung
Posted: Mon Feb 13, 2012 6:00 pm
by 7im
No, the ETAs in V7 are currently hosed. From the bug ticket history, it looks like the next beta version will have much improved ETA and PPD information.
It also looks like PG is still having issues with the 6903/6904 WUs. There was a combined thread on that here. Maybe watch that thread for updates/ideas.
Re: Merged problems with projects 6903/6904
Posted: Wed Feb 15, 2012 4:47 pm
by kasson
Bad WUs should be offline, but the rest of the project should be up and assigning now. Please post if you see further problems. Thanks.
Re: Merged problems with projects 6903/6904
Posted: Wed Feb 15, 2012 5:31 pm
by Nathan_P
6903 seems to be assigning fine. I had a string of SMP WUs last night and this morning but picked up a 6903 about 7 hours ago.
6903 (Run 2, Clone 13, Gen 39); 10,000,000 steps
Posted: Sun Feb 19, 2012 6:51 pm
by Leonardo
Project 6903 (Run 2, Clone 13, Gen 39); 10,000,000 steps; hung at 0 steps completed for nearly four hours.
Linux, Client Version 6.34
Code:
[14:46:51] Completed 250000 out of 250000 steps (100%)
...
[14:48:11] Sending work to server
...
[14:48:13] Connecting to http://130.237.232.237:8080/
[14:48:25] Posted data.
[14:48:25] Initial: 0000; - Receiving payload (expected size: 46513362)
...
[14:49:22] Project: 6903 (Run 2, Clone 13, Gen 39)
[14:49:22]...
[14:49:32] Completed 0 out of 10000000 steps (0%)
I stopped the client at approximately 18:30 and deleted work, queue.dat, machinedependent.dat, and unitinfo.txt. Upon client restart, the system downloaded a fresh unit and started processing normally, as far as I could tell, based on CPU core temps and utilization.
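Leonardo's recovery steps can be scripted. This is a minimal sketch, assuming a v6 client directory that contains the `work` folder and the three files he names; the directory path and the function name `clear_stuck_unit` are hypothetical. Stop the client before running it. By default it only reports what it would delete, and it has to be called with `dry_run=False` to actually remove anything.

```python
import shutil
from pathlib import Path

# The items Leonardo deleted from his v6 client directory before restarting.
STALE_ITEMS = ["work", "queue.dat", "machinedependent.dat", "unitinfo.txt"]


def clear_stuck_unit(client_dir, dry_run=True):
    """Remove the listed items from a stopped v6 client directory.

    Returns the names that were removed (or, with dry_run=True, would be).
    Items that do not exist are skipped; nothing else is touched.
    """
    removed = []
    for name in STALE_ITEMS:
        path = Path(client_dir) / name
        if not path.exists():
            continue
        removed.append(name)
        if dry_run:
            continue
        if path.is_dir():
            shutil.rmtree(path)  # "work" is a directory holding the WU files
        else:
            path.unlink()
    return removed


if __name__ == "__main__":
    # Hypothetical path; preview first, then rerun with dry_run=False.
    print(clear_stuck_unit("/path/to/fah-client"))
```

Keeping the dry run as the default is a deliberate choice: pointing this at the wrong directory would otherwise delete files without warning.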
Re: 6903 (Run 2, Clone 13, Gen 39); 10,000,000 steps
Posted: Sun Feb 19, 2012 7:00 pm
by Grandpa_01
Congrats, you are the winner: you got the first one since the fix. Hopefully Kasson will be able to catch it and remove it before it spreads too far. Just delete it and move on. You could fold it if you wanted to; it will only take a month or several, and then it will not send after it completes. Just joking about the last part.
Edit:
I am surprised that one is still floating around; I had it a while back, and the report was merged into another thread. I thought Kasson had got all of them. viewtopic.php?f=19&t=20692&start=0#p206671
Re: 6903 (Run 2, Clone 13, Gen 39); 10,000,000 steps
Posted: Sun Feb 19, 2012 7:07 pm
by Leonardo
Is there a prize, Grandpa?
EDIT: Moderators, sorry about not posting in the existing thread. Without my morning coffee, I couldn't find the subject thread's location.
Re: 6903 (Run 2, Clone 13, Gen 39); 10,000,000 steps
Posted: Sun Feb 19, 2012 7:09 pm
by Grandpa_01
Yes, there is: you get 0 points for the CPU cycles you used.
Re: Merged problems with projects 6903/6904
Posted: Mon Feb 20, 2012 6:17 am
by kasson
Thanks--stopped that one. Sorry you encountered it.
Re: Merged problems with projects 6903/6904
Posted: Fri Feb 24, 2012 3:54 pm
by ChelseaOilman
I got Project: 6903 (Run 6, Clone 0, Gen 72) last night which was reported by Amaruk on page 2 of this thread back on Feb. 9.
Code:
[20:48:38] Project: 6903 (Run 6, Clone 0, Gen 72)
[20:48:38]
[20:48:38] Assembly optimizations on if available.
[20:48:38] Entering M.D.
[20:48:46] Mapping NT from 48 to 48
[20:48:51] Completed 0 out of 500000 steps (0%)
[21:05:30] g NT from 48 to 48
[21:06:34] Resuming from checkpoint
[21:07:05] Verified work/wudata_06.log
[21:07:05] Verified work/wudata_06.trr
[21:07:05] Verified work/wudata_06.xtc
[21:07:05] Verified work/wudata_06.edr
[21:07:06] Completed 2615 out of 500000 steps (0%)
[21:19:35] Completed 5000 out of 500000 steps (1%)
[21:45:57] Completed 10000 out of 500000 steps (2%)
[22:12:18] Completed 15000 out of 500000 steps (3%)
[22:38:45] Completed 20000 out of 500000 steps (4%)
[23:05:01] Completed 25000 out of 500000 steps (5%)
Re: Merged problems with projects 6903/6904
Posted: Sat Feb 25, 2012 7:04 pm
by bruce
ChelseaOilman wrote: I got Project: 6903 (Run 6, Clone 0, Gen 72) last night which was reported by Amaruk on page 2 of this thread back on Feb. 9.
As I understand it, this is good. The bad projects had too many steps and they've been corrected to have only 500 000 steps.
Re: Merged problems with projects 6903/6904
Posted: Sat Feb 25, 2012 7:20 pm
by Joe_H
bruce wrote: ChelseaOilman wrote: I got Project: 6903 (Run 6, Clone 0, Gen 72) last night which was reported by Amaruk on page 2 of this thread back on Feb. 9.
As I understand it, this is good. The bad projects had too many steps and they've been corrected to have only 500 000 steps.
Not good as the correct number of steps is supposed to be 250,000. This was one of the least wrong at only 2x.
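A quick way to spot these units in a log is to compare the total in the first "Completed ... out of N steps" line after each "Project:" line with the expected count. This is a sketch, assuming 250,000 steps is the correct size (the figure given here for 6903); whether 6904 uses the same count is an assumption, so both the expected value and the watched projects are parameters. `check_step_counts` is a hypothetical helper, not part of any client.

```python
import re

# Matches e.g. "Project: 6903 (Run 6, Clone 0, Gen 72)" in v6 and v7 logs.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
# Captures the total N from "Completed X out of N steps".
STEPS_RE = re.compile(r"Completed \d+ out of (\d+) steps")


def check_step_counts(log_text, expected=250_000, projects=("6903", "6904")):
    """Yield ((project, run, clone, gen), total_steps, ratio) for each
    watched unit whose total step count differs from `expected`."""
    current = None
    for line in log_text.splitlines():
        m = PROJECT_RE.search(line)
        if m:
            current = m.groups()
            continue
        m = STEPS_RE.search(line)
        if current is not None and m:
            total = int(m.group(1))
            if current[0] in projects and total != expected:
                yield current, total, total / expected
            current = None  # only the first progress line after "Project:" is checked


if __name__ == "__main__":
    # Lines copied from ChelseaOilman's and Leonardo's logs in this thread.
    sample = """[20:48:38] Project: 6903 (Run 6, Clone 0, Gen 72)
[20:48:51] Completed 0 out of 500000 steps (0%)
[14:49:22] Project: 6903 (Run 2, Clone 13, Gen 39)
[14:49:32] Completed 0 out of 10000000 steps (0%)"""
    for prcg, total, ratio in check_step_counts(sample):
        print(prcg, total, f"{ratio:g}x")
```

On those two samples it flags Run 6, Clone 0, Gen 72 at 2x and Run 2, Clone 13, Gen 39 at 40x. The 6904 unit from the v7 log earlier in the thread (13,750,000 steps) would come out at 55x under the same 250,000-step assumption.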
Re: Merged problems with projects 6903/6904
Posted: Sat Feb 25, 2012 7:38 pm
by ChelseaOilman
Joe_H wrote: Not good as the correct number of steps is supposed to be 250,000. This was one of the least wrong at only 2x.
I believe that's correct. This WU has been out for a while. Dr. Kasson should have caught it on his first go around.
Re: Merged problems with projects 6903/6904
Posted: Sun Feb 26, 2012 7:16 pm
by Schmidde
No, Project 6903 (Run 6, Clone 0, Gen 72) is a "bad" WU.
I'm currently folding it myself, with those 500,000 steps; normally there are 250,000 steps.
Please fix it. It's disappointing to fold for 4-5 days and only get base points.