Project: 3866 (Run 589, Clone 0, Gen 6) mdrun returned 255

GreyWhiskers
Posts: 660
Joined: Mon Oct 25, 2010 5:57 am
Hardware configuration: a) Main unit
Sandybridge in HAF922 w/200 mm side fan
--i7 2600K@4.2 GHz
--ASUS P8P67 DeluxeB3
--4GB ADATA 1600 RAM
--750W Corsair PS
--2Seagate Hyb 750&500 GB--WD Caviar Black 1TB
--EVGA 660GTX-Ti FTW - Signature 2 GPU@ 1241 Boost
--MSI GTX560Ti @900MHz
--Win7Home64; FAH V7.3.2; 327.23 drivers

b) 2004 HP a475c desktop, 1 core Pent 4 HT@3.2 GHz; Mem 2GB;HDD 160 GB;Zotac GT430PCI@900 MHz
WinXP SP3-32 FAH v7.3.6 301.42 drivers - GPU slot only

c) 2005 Toshiba M45-S551 laptop w/2 GB mem, 160GB HDD;Pent M 740 CPU @ 1.73 GHz
WinXP SP3-32 FAH v7.3.6 [Receiving Core A4 work units]
d) 2011 lappy-15.6"-1920x1080;i7-2860QM,2.5;IC Diamond Thermal Compound;GTX 560M 1,536MB u/c@700;16GB-1333MHz RAM;HDD:500GBHyb w/ 4GB SSD;Win7HomePrem64;320.18 drivers FAH 7.4.2ß
Location: Saratoga, California USA


Post by GreyWhiskers »

I had been working on this WU for a couple of days, and it seemed stable. The machine then rebooted twice in fairly close succession for Windows updates: once automatically overnight, which seems to have come up fine, and again this morning after I manually selected a huge update for MS Office 2010 SP1.

The Windows restart after the Office update seemed to take a long time; I guess it was configuring Office in the background as it started up. I left everything alone, other than starting Windows Task Manager. Eventually FAHControl started up and tried to restart both the Core 11 GPU WU and the Core a6 Uniprocessor WU.

Almost right after the reboot, the log showed:
16:43:19:Unit 01:mdrun returned 255
16:43:19:Unit 01:Going to send back what have done -- stepsTotalG=250000
16:43:19:Unit 01:Work fraction=0.5305 steps=250000.
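
For reference, the reported work fraction can be sanity-checked against the step counts in the full log further down. This is only a rough sketch: the figure of 132610 completed steps comes from the "Completed 132610 out of 250000 steps" line logged at resume, and the assumption that the work fraction is simply completed steps divided by total steps is mine, since the log does not show how the core derives it.

```python
# Rough check of the work fraction using step counts from the log.
# Assumption: work fraction ~= completed steps / total steps. The core's
# actual formula is not shown in the log.
completed_steps = 132610   # "Completed 132610 out of 250000 steps  (53%)"
total_steps = 250000       # stepsTotalG=250000
reported_fraction = 0.5305 # "Work fraction=0.5305"

fraction = completed_steps / total_steps
print(f"fraction at resume: {fraction:.4f}")
print(f"reported by core:   {reported_fraction:.4f}")
```

The resume-time value comes out just under the reported 0.5305, which would be consistent with the core running a little past the checkpoint before mdrun failed, though the log alone does not confirm that.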

Eventually it uploaded partial results to the server, after some activity I hadn't seen before: it tried to restart the core after seemingly abandoning it, then reported "MISSING_WORK_FILES (116)". Once it was satisfied that the old WU was gone, it picked up, wonder of wonders, another 7 day marathon p3866.

The log does report "UNSTABLE_MACHINE (122)", which may have come from everything that was going on during the restart. The CPU isn't overclocked (that's very hard to do with a commodity HP Costco desktop).

At the moment, it seems stable. The recovered Core 11 GPU WU is processing normally. I haven't gotten a completed frame on the new 3866 yet; the TPF is about one hour 25 minutes. Task Manager shows the Core a6 getting about 40% of the CPU with Chrome and an instance of MS Word also running (this is a Pent 4/HT uniprocessor, so one full thread gets at most 50% of the CPU in Task Manager).
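
The arithmetic behind those numbers can be sketched as follows. The two logical CPUs come from the "CPUs: 2" line in the log; the figure of 100 frames per WU is an assumption on my part (one frame per percent), not something stated in the post, so treat the time estimate as a rough guess.

```python
# Task Manager reports CPU use as a share of all logical CPUs.
logical_cpus = 2          # "CPUs: 2" in the log (Pentium 4 with HT)
observed_share = 40.0     # percent shown for the a6 core in Task Manager

max_single_thread_share = 100.0 / logical_cpus
thread_utilization = observed_share / max_single_thread_share

# Rough time per WU. Assumption: 100 frames per WU (1% each).
tpf_minutes = 60 + 25     # "about one hour 25 minutes"
frames_per_wu = 100       # hypothetical; not stated in the post
wu_days = tpf_minutes * frames_per_wu / 60 / 24

print(f"one thread can reach at most {max_single_thread_share:.0f}%")
print(f"the core is using {thread_utilization:.0%} of one logical CPU")
print(f"estimated time per WU: {wu_days:.1f} days")
```

Under those assumptions the core is getting about four fifths of one logical CPU, and a full WU at the current TPF would take a little under six days.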


*********************** Log Started 29/Jun/2011-16:42:05 ***********************
16:42:05:************************* Folding@home Client *************************
16:42:05:      Website: http://folding.stanford.edu/
16:42:05:    Copyright: (c) 2009,2010 Stanford University
16:42:05:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:42:05:         Args: --lifeline 5276
16:42:05:       Config: C:/Documents and Settings/All Users/Application
16:42:05:               Data/FAHClient/config.xml
16:42:05:******************************** Build ********************************
16:42:05:      Version: 7.1.21
16:42:05:         Date: Mar 23 2011
16:42:05:         Time: 15:13:48
16:42:05:      SVN Rev: 2883
16:42:05:       Branch: fah/trunk/client
16:42:05:     Compiler: Intel(R) C++ MSVC 1500 mode 1110
16:42:05:      Options: /TP /nologo /EHa /wd4297 /wd4103 /wd1786 /Ox -arch:SSE
16:42:05:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qrestrict /MT
16:42:05:     Platform: win32 XP
16:42:05:         Bits: 32
16:42:05:         Mode: Release
16:42:05:******************************* System ********************************
16:42:05:           OS: Microsoft Windows XP Home Edition
16:42:05:          CPU: Intel(R) Pentium(R) 4 CPU 3.20GHz
16:42:05:       CPU ID: GenuineIntel Family 15 Model 2 Stepping 9
16:42:05:         CPUs: 2
16:42:05:       Memory: 2.00GiB
16:42:05:  Free Memory: 1.03GiB
16:42:05:      Threads: WINDOWS_THREADS
16:42:05:         GPUs: 2
16:42:05:        GPU 0: ATI:2 RV730 Pro AGP [Radeon HD 4600 Series]
16:42:05:        GPU 1: RV710/730
16:42:05:         CUDA: Not detected
16:42:05:   On Battery: false
16:42:05:   UTC offset: -7
16:42:05:          PID: 5752
16:42:05:          CWD: C:/Documents and Settings/All Users/Application Data/FAHClient
16:42:05:Win32 Service: false
16:42:05:***********************************************************************
16:42:05:<config>
16:42:05:  <service-description v='Folding@home Client'/>
16:42:05:  <service-restart v='true'/>
16:42:05:  <service-restart-delay v='5000'/>
16:42:05:
16:42:05:  <!-- Client Control -->
16:42:05:  <cycle-rate v='4'/>
16:42:05:  <cycles v='-1'/>
16:42:05:  <data-directory v='.'/>
16:42:05:  <exec-directory v='C:\Program Files\FAHClient'/>
16:42:05:  <exit-when-done v='false'/>
16:42:05:  <max-delay v='21600'/>
16:42:05:  <min-delay v='60'/>
16:42:05:  <threads v='4'/>
16:42:05:
16:42:05:  <!-- Configuration -->
16:42:05:  <config-rotate v='true'/>
16:42:05:  <config-rotate-dir v='configs'/>
16:42:05:  <config-rotate-max v='16'/>
16:42:05:
16:42:05:  <!-- Debugging -->
16:42:05:  <assignment-servers>
16:42:05:    assign3.stanford.edu:8080 assign4.stanford.edu:80
16:42:05:  </assignment-servers>
16:42:05:  <capture-directory v='capture'/>
16:42:05:  <capture-sockets v='false'/>
16:42:05:  <debug-sockets v='false'/>
16:42:05:  <exception-locations v='true'/>
16:42:05:  <gpu-assignment-servers>
16:42:05:    assign-GPU.stanford.edu:80 assign-GPU.stanford.edu:8080
16:42:05:  </gpu-assignment-servers>
16:42:05:  <stack-traces v='false'/>
16:42:05:
16:42:05:  <!-- Error Handling -->
16:42:05:  <max-slot-errors v='5'/>
16:42:05:  <max-unit-errors v='5'/>
16:42:05:
16:42:05:  <!-- FahCore Control -->
16:42:05:  <checkpoint v='15'/>
16:42:05:  <core-dir v='cores'/>
16:42:05:  <core-priority v='idle'/>
16:42:05:  <cpu-affinity v='false'/>
16:42:05:  <cpu-usage v='100'/>
16:42:05:  <no-assembly v='false'/>
16:42:05:
16:42:05:  <!-- Folding Slot Configuration -->
16:42:05:  <client-subtype v='STDCLI'/>
16:42:05:  <client-type v='normal'/>
16:42:05:  <cpu-species v='UNKNOWN'/>
16:42:05:  <cpu-type v='X86'/>
16:42:05:  <cpus v='2'/>
16:42:05:  <extra-core-args v='-forceasm'/>
16:42:05:  <gpu v='false'/>
16:42:05:  <gpu-id v='0'/>
16:42:05:  <max-packet-size v='normal'/>
16:42:05:  <os-species v='WIN_XP'/>
16:42:05:  <os-type v='WIN32'/>
16:42:05:  <project-key v='0'/>
16:42:05:  <smp v='true'/>
16:42:05:
16:42:05:  <!-- Logging -->
16:42:05:  <log v='log.txt'/>
16:42:05:  <log-color v='false'/>
16:42:05:  <log-crlf v='true'/>
16:42:05:  <log-date v='false'/>
16:42:05:  <log-debug v='true'/>
16:42:05:  <log-domain v='false'/>
16:42:05:  <log-header v='true'/>
16:42:05:  <log-level v='true'/>
16:42:05:  <log-no-info-header v='true'/>
16:42:05:  <log-redirect v='false'/>
16:42:05:  <log-rotate v='true'/>
16:42:05:  <log-rotate-dir v='logs'/>
16:42:05:  <log-rotate-max v='16'/>
16:42:05:  <log-short-level v='false'/>
16:42:05:  <log-simple-domains v='true'/>
16:42:05:  <log-thread-id v='false'/>
16:42:05:  <log-time v='true'/>
16:42:05:  <log-to-screen v='true'/>
16:42:05:  <log-truncate v='false'/>
16:42:05:  <verbosity v='4'/>
16:42:05:
16:42:05:  <!-- Process Control -->
16:42:05:  <child v='false'/>
16:42:05:  <daemon v='false'/>
16:42:05:  <pid v='false'/>
16:42:05:  <pid-file v='Folding@home Client.pid'/>
16:42:05:  <respawn v='false'/>
16:42:05:  <service v='false'/>
16:42:05:
16:42:05:  <!-- Remote Command Server -->
16:42:05:  <command-address v='0.0.0.0'/>
16:42:05:  <command-allow v='127.0.0.1'/>
16:42:05:  <command-allow-no-pass v='127.0.0.1'/>
16:42:05:  <command-deny v='0.0.0.0/0'/>
16:42:05:  <command-deny-no-pass v='0.0.0.0/0'/>
16:42:05:  <command-port v='36330'/>
16:42:05:  <password v=''/>
16:42:05:
16:42:05:  <!-- Slot Control -->
16:42:05:  <max-shutdown-wait v='60'/>
16:42:05:  <pause-on-battery v='false'/>
16:42:05:  <pause-on-start v='false'/>
16:42:05:
16:42:05:  <!-- User Information -->
16:42:05:  <machine-id v='0'/>
16:42:05:  <passkey v='********************************'/>
16:42:05:  <team v='0'/>
16:42:05:  <user v='GreyWhiskers'/>
16:42:05:
16:42:05:  <!-- Work Unit Control -->
16:42:05:  <dump-after-deadline v='true'/>
16:42:05:  <max-queue v='16'/>
16:42:05:  <max-units v='0'/>
16:42:05:  <next-unit-percentage v='99'/>
16:42:05:
16:42:05:  <!-- Folding Slots -->
16:42:05:  <slot id='1' type='GPU'>
16:42:05:    <client-type v='advanced'/>
16:42:05:  </slot>
16:42:05:  <slot id='0' type='UNIPROCESSOR'>
16:42:05:    <core-priority v='low'/>
16:42:05:  </slot>
16:42:05:</config>
16:42:10:Enabled folding slot 01: READY gpu:0:"RV730 Pro AGP [Radeon HD 4600 Series]"
16:42:10:Enabled folding slot 00: READY uniprocessor
16:42:11:Starting Unit 01
16:42:11:Running core: "C:/Documents and Settings/All Users/Application Data/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a6.fah/FahCore_a6.exe" -dir 01 -suffix 01 -lifeline 5752 -version 701 -checkpoint 15 -forceasm
16:42:11:Started core on PID 5312
16:42:11:FahCore 0xa6 started
16:42:11:Server connection id=1 on 0.0.0.0:36330 from 127.0.0.1:1070
16:42:12:Starting Unit 00
16:42:12:Running core: "C:/Documents and Settings/All Users/Application Data/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/ATI/R600/Core_11.fah/FahCore_11.exe" -dir 00 -suffix 01 -lifeline 5752 -version 701 -checkpoint 15 -gpu 0 -forceasm
16:42:12:Unit 01:
16:42:12:Unit 01:*------------------------------*
16:42:12:Unit 01:Folding@Home Gromacs Core
16:42:12:Unit 01:Version 2.28 (Wed Mar 23 13:51:17 PDT 2011)
16:42:12:Unit 01:
16:42:12:Unit 01:Preparing to commence simulation
16:42:12:Unit 01:- Ensuring status. Please wait.
16:42:12:Started core on PID 5404
16:42:12:FahCore 0x11 started
16:42:14:Unit 00:
16:42:14:Unit 00:*------------------------------*
16:42:14:Unit 00:Folding@Home GPU Core - Beta
16:42:14:Unit 00:Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
16:42:14:Unit 00:
16:42:14:Unit 00:Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
16:42:14:Unit 00:Build host: amoeba
16:42:14:Unit 00:Board Type: AMD
16:42:14:Unit 00:Core      : 
16:42:14:Unit 00:Preparing to commence simulation
16:42:14:Unit 00:- Ensuring status. Please wait.
16:42:22:Unit 01:- Assembly optimizations manually forced on.
16:42:22:Unit 01:- Not checking prior termination.
16:42:24:Unit 00:- Assembly optimizations manually forced on.
16:42:24:Unit 00:- Not checking prior termination.
16:42:25:Unit 00:- Expanded 96745 -> 489152 (decompressed 505.6 percent)
16:42:25:Unit 00:Called DecompressByteArray: compressed_data_size=96745 data_size=489152, decompressed_data_size=489152 diff=0
16:42:25:Unit 00:- Digital signature verified
16:42:25:Unit 00:
16:42:25:Unit 00:Project: 5738 (Run 0, Clone 629, Gen 487)
16:42:25:Unit 00:
16:42:25:Unit 00:Assembly optimizations on if available.
16:42:26:Unit 01:- Expanded 1487293 -> 2410840 (decompressed 162.0 percent)
16:42:26:Unit 01:Called DecompressByteArray: compressed_data_size=1487293 data_size=2410840, decompressed_data_size=2410840 diff=0
16:42:26:Unit 01:- Digital signature verified
16:42:26:Unit 00:Entering M.D.
16:42:26:Unit 01:
16:42:26:Unit 01:Project: 3866 (Run 589, Clone 0, Gen 6)
16:42:26:Unit 01:
16:42:28:Unit 01:Assembly optimizations on if available.
16:42:28:Unit 01:Entering M.D.
16:42:32:Unit 00:Will resume from checkpoint file
16:42:32:Unit 00:Tpr hash 00/wudata_01.tpr:  2748390342 573898720 1243433966 1251171428 1178523977
16:42:34:Unit 01:Using Gromacs checkpoints
16:42:36:Unit 01:Mapping NT from 1 to 1 
16:42:45:Unit 01:Resuming from checkpoint
16:42:45:Unit 01:Verified 01/wudata_01.log
16:42:46:Unit 01:Verified 01/wudata_01.trr
16:42:46:Unit 01:Verified 01/wudata_01.xtc
16:42:47:Unit 01:Verified 01/wudata_01.edr
16:42:49:Unit 00:Working on Protein
16:42:50:Unit 01:Completed 132610 out of 250000 steps  (53%)
16:42:51:Unit 00:Client config unavailable.
16:42:51:Unit 00:Starting GUI Server
16:43:07:Unit 00:Resuming from checkpoint
16:43:07:Unit 00:fcCheckPointResume: retreived and current tpr file hash:
16:43:07:Unit 00:   0   2748390342   2748390342
16:43:07:Unit 00:   1    573898720    573898720
16:43:07:Unit 00:   2   1243433966   1243433966
16:43:07:Unit 00:   3   1251171428   1251171428
16:43:07:Unit 00:   4   1178523977   1178523977
16:43:07:Unit 00:Verified 00/wudata_01.log
16:43:07:Unit 00:Verified 00/wudata_01.edr
16:43:07:Unit 00:Verified 00/wudata_01.xtc
16:43:07:Unit 00:Completed 9%
16:43:19:Unit 01:mdrun returned 255
16:43:19:Unit 01:Going to send back what have done -- stepsTotalG=250000
16:43:19:Unit 01:Work fraction=0.5305 steps=250000.
16:43:23:Unit 01:logfile size=99083 infoLength=99083 edr=0 trr=25
16:43:23:Unit 01:logfile size: 99083 info=99083 bed=0 hdr=25
16:43:25:Server connection id=2 on 0.0.0.0:36330 from 192.168.10.193:4017
16:43:25:Server connection id=3 on 0.0.0.0:36330 from 127.0.0.1:1075
16:43:28:Unit 01:- Writing 99621 bytes of core data to disk...
16:43:51:Unit 01:Done: 99109 -> 14973 (compressed to 15.1 percent)
16:43:51:Unit 01:  ... Done.
16:44:31:Unit 01:
16:44:31:Unit 01:Folding@home Core Shutdown: UNSTABLE_MACHINE
16:44:32:FahCore running Unit 01 returned: UNSTABLE_MACHINE (122)
16:44:32:Starting Unit 01
16:44:32:Running core: "C:/Documents and Settings/All Users/Application Data/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a6.fah/FahCore_a6.exe" -dir 01 -suffix 01 -lifeline 5752 -version 701 -checkpoint 15 -forceasm
16:44:32:Started core on PID 4448
16:44:32:FahCore 0xa6 started
16:44:33:Unit 01:
16:44:33:Unit 01:*------------------------------*
16:44:33:Unit 01:Folding@Home Gromacs Core
16:44:33:Unit 01:Version 2.28 (Wed Mar 23 13:51:17 PDT 2011)
16:44:33:Unit 01:
16:44:33:Unit 01:Preparing to commence simulation
16:44:33:Unit 01:- Assembly optimizations manually forced on.
16:44:33:Unit 01:- Not checking prior termination.
16:44:33:Unit 01:Error: Missing work file=<>
16:44:33:Unit 01:
16:44:33:Unit 01:Folding@home Core Shutdown: MISSING_WORK_FILES
16:44:33:FahCore running Unit 01 returned: MISSING_WORK_FILES (116)
16:44:33:WARNING: Unit 01 Fatal error, dumping
16:44:34:Sending unit results: id:01 state:SEND project:3866 run:589 clone:0 gen:6 core:0xa6 unit:0x00000006000000654dd9e38048bcfd97
16:44:35:Unit 01: Uploading 15.12KiB
16:44:35:Connecting to assign3.stanford.edu:8080
16:44:35:Connecting to 128.143.48.226:8080
16:44:35:News: Welcome to Folding@Home
16:44:35:Assigned to work server 128.143.48.226
16:44:35:Requesting new work unit for slot 00: READY uniprocessor from 128.143.48.226
16:44:35:Connecting to 128.143.48.226:8080
16:44:35:Unit 01: Upload complete
16:44:35:Server responded UNKNOWN_ENUM (575)
16:44:35:WARNING: Failed to send results, will try again later
16:44:37:Sending unit results: id:01 state:SEND project:3866 run:589 clone:0 gen:6 core:0xa6 unit:0x00000006000000654dd9e38048bcfd97
16:44:38:Unit 01: Uploading 15.12KiB
16:44:38:Slot 00: Downloading 1.42MiB
16:44:38:Connecting to 128.143.48.226:8080
16:44:38:Unit 01: Upload complete
16:44:38:Server responded WORK_ACK (400)
16:44:39:Cleaning up Unit 01
16:44:40:Slot 00: Download complete
16:44:42:Received Unit: id:02 state:DOWNLOAD project:3866 run:2360 clone:0 gen:4 core:0xa6 unit:0x00000006000000654dd9e829ee8bc3a7
16:44:43:Starting Unit 02
16:44:43:Running core: "C:/Documents and Settings/All Users/Application Data/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a6.fah/FahCore_a6.exe" -dir 02 -suffix 01 -lifeline 5752 -version 701 -checkpoint 15 -forceasm
16:44:43:Started core on PID 5976
16:44:43:FahCore 0xa6 started
16:44:44:Unit 02:
16:44:44:Unit 02:*------------------------------*
16:44:44:Unit 02:Folding@Home Gromacs Core
16:44:44:Unit 02:Version 2.28 (Wed Mar 23 13:51:17 PDT 2011)
16:44:44:Unit 02:
16:44:44:Unit 02:Preparing to commence simulation
16:44:44:Unit 02:- Assembly optimizations manually forced on.
16:44:44:Unit 02:- Not checking prior termination.
16:44:44:Unit 02:- Expanded 1488373 -> 2408452 (decompressed 161.8 percent)
16:44:44:Unit 02:Called DecompressByteArray: compressed_data_size=1488373 data_size=2408452, decompressed_data_size=2408452 diff=0
16:44:44:Unit 02:- Digital signature verified
16:44:44:Unit 02:
16:44:44:Unit 02:Project: 3866 (Run 2360, Clone 0, Gen 4)
16:44:44:Unit 02:
16:44:45:Unit 02:Assembly optimizations on if available.
16:44:45:Unit 02:Entering M.D.
16:44:51:Unit 02:Mapping NT from 1 to 1 
16:44:55:Unit 02:Completed 0 out of 250000 steps  (0%)
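
To pull just the core exit statuses out of a log like the one above, something along these lines might help. It is a minimal sketch: `core_returns` and `RETURN_RE` are hypothetical names, and the pattern is inferred only from the "FahCore running Unit ... returned:" lines in this log, not from any documented log format.

```python
import re

# Matches client lines such as:
# 16:44:32:FahCore running Unit 01 returned: UNSTABLE_MACHINE (122)
RETURN_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):FahCore running Unit (?P<unit>\d+) "
    r"returned: (?P<status>\w+) \((?P<code>\d+)\)\s*$"
)

def core_returns(log_text):
    """Return (time, unit, status, code) for each FahCore return line."""
    results = []
    for line in log_text.splitlines():
        match = RETURN_RE.match(line.strip())
        if match:
            results.append((match["time"], match["unit"],
                            match["status"], int(match["code"])))
    return results

# Sample lines copied from the log above.
sample_log = """\
16:44:31:Unit 01:Folding@home Core Shutdown: UNSTABLE_MACHINE
16:44:32:FahCore running Unit 01 returned: UNSTABLE_MACHINE (122)
16:44:33:Unit 01:Error: Missing work file=<>
16:44:33:FahCore running Unit 01 returned: MISSING_WORK_FILES (116)
16:44:33:WARNING: Unit 01 Fatal error, dumping
"""

for entry in core_returns(sample_log):
    print(entry)
```

Run against the sample lines, this picks out the UNSTABLE_MACHINE (122) return followed one second later by MISSING_WORK_FILES (116), which is the sequence that led the client to dump Unit 01.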
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Post by PantherX »

Nothing in the WU Database yet:
No data back from query
I have marked it for a follow-up.