Core 17 has suddenly started crashing


bollix47
Posts: 2951
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Core 17 has suddenly started crashing

Post by bollix47 »

Any recent Windows updates? Try a clean install of the appropriate drivers from nvidia.
Eagle
Posts: 116
Joined: Sun Feb 17, 2008 1:06 am
Hardware configuration: AMD Ryzen ThreadRipper 2950X (3.5 GHz)
ASUS Prime X399-A
G.Skill 32 GB DDR4-RAM (3.2 GHz)
EVGA GeForce RTX 2080 Ti Black (1.8 / 7 GHz)
Samsung 970 Pro 1 TB, 850 Pro 512 GB, Crucial C300 256 GB
Western Digital Black 2 TB, Gold 4 TB
Location: » Earth » Europe » Germany
Contact:

Re: Core 17 has suddenly started crashing

Post by Eagle »

Yes, but mostly Windows Defender definition updates, a couple of Office updates and a handful of security updates.

Re-installing didn't help. At first, I believed the newer nVidia drivers were the cause, but even going back to 335.23 didn't resolve the issue (although that version worked back in March/April). When did FAH update from 0.0.49 to 0.0.52?
Michael Jordan: “I can accept failure — But I can’t accept not trying.”
Joe_H
Site Admin
Posts: 7900
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Core 17 has suddenly started crashing

Post by Joe_H »

Eagle wrote:Regarding the error, it doesn't happen at exactly 0%. It's mostly around 1% when the error happens - if that makes a difference.
From the log, the WU never reaches 1%. In fact, it never got beyond the setup phase. Do you have examples of WUs that fail at 1%? If you are going by the percentage displayed in FAHControl or Web Control, that can be off when a WU has just been started or restarted, and especially while errors are being processed.

As for the WU, it has no entries in the database yet. So it is too early to tell if it is a bad WU.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
P5-133XL
Posts: 2948
Joined: Sun Dec 02, 2007 4:36 am
Hardware configuration: Machine #1:

Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).

Machine #2:

Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.

Machine 3:

Dell Dimension 8400, 3.2GHz P4 4x512GB Ram, Video card GTX 460, Windows 7 X32

I am currently folding just on the 5x GTX 460's for aprox. 70K PPD
Location: Salem. OR USA

Re: Core 17 has suddenly started crashing

Post by P5-133XL »

There is no point in bigadv with a CPU limit of 7 when the minimum CPU cores are 24.

I do not know why you want to limit the number of logs to 5, but it is irrelevant to the issue.

OC'ing would be the obvious likelihood.
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona
Contact:

Re: Core 17 has suddenly started crashing

Post by 7im »

Eagle wrote:Yes, but mostly Windows Defender definition updates, a couple of Office updates and a handful of security updates.

Re-installing didn't help. At first, I believed the newer nVidia drivers were the cause, but even going back to 335.23 didn't resolve the issue (although that version worked back in March/April). When did FAH update from 0.0.49 to 0.0.52?
Windows Update, if left on automatic, will sometimes replace the NV driver with the MS version. The NV version is needed. For your card, the latest version is fine.

That core update was a while ago, but it may not have been required on your system, depending on the WUs you were folding. So for any given client, the update could have happened a few days ago or a few months ago.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.

Re: Core 17 has suddenly started crashing

Post by Eagle »

Joe_H wrote:From the log, the WU never reaches 1%. In fact, it never got beyond the setup phase. Do you have examples of WUs that fail at 1%? If you are going by the percentage displayed in FAHControl or Web Control, that can be off when a WU has just been started or restarted, and especially while errors are being processed.

As for the WU, it has no entries in the database yet. So it is too early to tell if it is a bad WU.
Thanks for the clarification. Indeed, I judged from the values displayed in FAHControl. I just watched it reach just above 2% there before failing, while the log shows no such percentage:

Code: Select all

17:57:45:WU00:FS01:Starting
17:57:45:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "X:/Folding At Home/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 00 -suffix 01 -version 704 -lifeline 5640 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
17:57:45:WU00:FS01:Started FahCore on PID 7480
17:57:45:WU00:FS01:Core PID:2444
17:57:45:WU00:FS01:FahCore 0x17 started
17:57:46:WU00:FS01:0x17:*********************** Log Started 2014-06-02T17:57:45Z ***********************
17:57:46:WU00:FS01:0x17:Project: 13000 (Run 1869, Clone 0, Gen 16)
17:57:46:WU00:FS01:0x17:Unit: 0x00000028538b3db75311ac46997470b8
17:57:46:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
17:57:46:WU00:FS01:0x17:Machine: 1
17:57:46:WU00:FS01:0x17:Reading tar file state.xml
17:57:46:WU00:FS01:0x17:Reading tar file system.xml
17:57:47:WU00:FS01:0x17:Reading tar file integrator.xml
17:57:47:WU00:FS01:0x17:Reading tar file core.xml
17:57:47:WU00:FS01:0x17:Digital signatures verified
17:57:47:WU00:FS01:0x17:Folding@home GPU core17
17:57:47:WU00:FS01:0x17:Version 0.0.52
17:58:29:WU02:FS00:0xa3:Completed 355000 out of 500000 steps  (71%)
18:01:37:WU00:FS01:0x17:Completed 0 out of 5000000 steps (0%)
18:01:37:WU00:FS01:0x17:Lost lifeline PID 7480, exiting
18:01:37:WU00:FS01:0x17:Lost lifeline PID 7480, exiting
18:01:37:WU00:FS01:0x17:ERROR:103: Lost client lifeline
18:01:37:WU00:FS01:0x17:Folding@home Core Shutdown: CLIENT_DIED
18:01:38:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
18:01:38:WARNING:WU00:FS01:FahCore returned: CLIENT_DIED (103 = 0x67)
18:01:38:WARNING:WU00:FS01:Too many errors, failing
18:01:38:WU00:FS01:Sending unit results: id:00 state:SEND error:FAILED project:13000 run:1869 clone:0 gen:16 core:0x17 unit:0x00000028538b3db75311ac46997470b8
18:01:38:WU00:FS01:Connecting to 140.163.4.231:8080
18:01:38:WU01:FS01:Connecting to 171.67.108.201:80
18:01:38:WU00:FS01:Server responded WORK_ACK (400)
18:01:38:WU00:FS01:Cleaning up
So, you're certainly right with 0%. I apologize for that.
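Since the percentage shown in FAHControl can differ from what the core actually logged, it can help to read the progress straight from the log. The following is a minimal sketch, assuming log lines in the format shown above. The function name `last_logged_progress` and the regular expression are illustrative and not part of FAHClient.

```python
import re

# Matches core progress lines such as:
#   18:01:37:WU00:FS01:0x17:Completed 0 out of 5000000 steps (0%)
# The spacing around the numbers varies between cores, hence the \s+.
PROGRESS = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WU\d+):(FS\d+):0x[0-9a-fA-F]+:"
    r"Completed\s+(\d+) out of (\d+) steps\s+\((\d+)%\)"
)


def last_logged_progress(lines):
    """Return {(wu, slot): (steps_done, total_steps, percent)}.

    Only the last progress line seen for each WU/slot pair is kept.
    """
    progress = {}
    for line in lines:
        match = PROGRESS.match(line.strip())
        if match:
            wu, slot, done, total, percent = match.groups()
            progress[(wu, slot)] = (int(done), int(total), int(percent))
    return progress
```

Feeding it the lines of a log file (for example `open("log.txt", encoding="utf-8")`, where the file name is an assumption) gives the last value the core itself reported for each slot, independent of what the GUI displayed.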

But if others are failing too and hence aren't able to fill the database with entries - how can the FAH staff differentiate between them?
P5-133XL wrote:There is no point in bigadv with a CPU limit of 7 when the minimum CPU cores are 24.

I do not know why you want to limit the number of logs to 5, but it is irrelevant to the issue.

OC'ing would be the obvious likelihood.
"bigadv" was once required to get the newer FAH core. I assume it can be safely removed then?
When I set up FAH 7, limiting the logs was suggested to prevent the log directory from filling up with dozens of log files.

As I never OC'ed the card and it worked just fine until a couple of days/weeks ago, this seems a bit strange to me.

Re: Core 17 has suddenly started crashing

Post by P5-133XL »

Check temps?

Re: Core 17 has suddenly started crashing

Post by Eagle »

Constantly being monitored, 51 °C right now, at heavy times around 67 °C.

Re: Core 17 has suddenly started crashing

Post by P5-133XL »

Those are not problem temps...
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: Core 17 has suddenly started crashing

Post by PantherX »

Eagle wrote:...When did FAH update from 0.0.49 to 0.0.52?
It was a while back, 7 November 2013 (https://folding.stanford.edu/home/chang ... -full-fah/). However, the minimum FahCore_17 version requirement is determined by the WU, so if you get a WU which requires version 0.0.52, FahCore_17 will be updated automatically.
Eagle wrote:..."bigadv" was once required to get the newer FAH core. I assume it can be safely removed then?...
Pretty sure that it was the advanced flag (not bigadv) when FahCore_17 was released for the first time (https://folding.stanford.edu/home/welco ... core-17-2/). You can now remove it. Moreover, the setting was in the CPU Slot so it wouldn't have affected the GPU Slot.
Eagle wrote:...When I set up FAH 7, this was suggested to prevent the log directory from getting filled with dozens of log files...
Humm, not sure why that is since, by default, there can be a maximum of 17 log files: the current log file in the FAHClient folder and the previous 16 log files in the logs folder. The oldest one is automatically deleted and a new log file is generated every time FAHClient is restarted.
Eagle wrote:...As I never OC'ed the card and it worked just fine until a couple of days/weeks ago, this seems a bit strange to me.
Apart from Windows Update, did anything else on the system change?

These error messages are uncommon for FahCore_17 and I don't remember seeing them recently:
Eagle wrote:...
18:01:37:WU00:FS01:0x17:Lost lifeline PID 7480, exiting
18:01:37:WU00:FS01:0x17:Lost lifeline PID 7480, exiting
18:01:37:WU00:FS01:0x17:ERROR:103: Lost client lifeline
....
However, a quick search later, and it seems that something is killing the client (viewtopic.php?p=164030#p164030). Maybe the updated security application is interfering with F@H files? Have you created an exception for F@H in your security application?
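One thing the posted log allows checking is which process the core treats as its lifeline. In the excerpt, the PID in `Lost lifeline PID 7480` equals the PID in `Started FahCore on PID 7480` and differs from the value passed as `-lifeline 5640`. The sketch below automates that comparison. It assumes the log format seen in this thread; the helper name and the category labels are made up for illustration, and the log itself does not state which executable owns each PID.

```python
import re

STARTED = re.compile(r":WU\d+:FS\d+:Started FahCore on PID (\d+)")
LOST = re.compile(r":WU\d+:FS\d+:0x[0-9a-fA-F]+:Lost lifeline PID (\d+)")
LIFELINE_ARG = re.compile(r"-lifeline (\d+)")


def classify_lost_lifelines(lines):
    """Map each PID reported as a lost lifeline to where it was seen before.

    "started-pid"  : the PID appeared in a "Started FahCore on PID" line
    "lifeline-arg" : the PID was passed on a command line as -lifeline
    "unknown"      : the PID was not seen earlier in the given lines
    """
    started_pids = set()
    lifeline_args = set()
    findings = {}
    for line in lines:
        arg = LIFELINE_ARG.search(line)
        if arg:
            lifeline_args.add(int(arg.group(1)))
        started = STARTED.search(line)
        if started:
            started_pids.add(int(started.group(1)))
        lost = LOST.search(line)
        if lost:
            pid = int(lost.group(1))
            if pid in started_pids:
                findings[pid] = "started-pid"
            elif pid in lifeline_args:
                findings[pid] = "lifeline-arg"
            else:
                findings[pid] = "unknown"
    return findings
```

A result of `"started-pid"` would suggest looking at whatever terminated the process that FAHClient launched for this slot, rather than at FAHClient itself, though that reading is an inference from the log and not a confirmed diagnosis.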
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues

Re: Core 17 has suddenly started crashing

Post by Eagle »

PantherX wrote:It was a while back, 7 November 2013 (https://folding.stanford.edu/home/chang ... -full-fah/). However, the minimum FahCore_17 version requirement is determined by the WU, so if you get a WU which requires version 0.0.52, FahCore_17 will be updated automatically.
Thanks for the information. :)
PantherX wrote:Pretty sure that it was the advanced flag (not bigadv) when FahCore_17 was released for the first time (https://folding.stanford.edu/home/welco ... core-17-2/). You can now remove it. Moreover, the setting was in the CPU Slot so it wouldn't have affected the GPU Slot.
Regarding the GPU slot, you are right. But when it comes to the CPU one, I was once told (here at the Folding Forum) to add bigadv to the CPU slot. That advice might be out of date as well.
Anyway, I removed both advanced and bigadv from the respective slots.
PantherX wrote:Humm, not sure why that is since, by default, there can be a maximum of 17 log files: the current log file in the FAHClient folder and the previous 16 log files in the logs folder. The oldest one is automatically deleted and a new log file is generated every time FAHClient is restarted.
Thanks for the clarification. I removed that setting as well. :)
New config-log:

Code: Select all

<config>
  <!-- Network -->
  <proxy v=':8080'/>

  <!-- Slot Control -->
  <power v='full'/>

  <!-- User Information -->
  <passkey v='********************************'/>
  <team v='34361'/>
  <user v='Eagle3386'/>

  <!-- Folding Slots -->
  <slot id='1' type='GPU'/>
  <slot id='0' type='CPU'>
    <cpus v='7'/>
  </slot>
</config>
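To double-check that no leftover option is still attached to a slot after editing, the config can be read programmatically. This is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the layout shown above (slot settings as child elements with a `v` attribute). The function name `slot_options` is hypothetical.

```python
import xml.etree.ElementTree as ET


def slot_options(config_xml):
    """Map each slot id to its type and to the options nested inside it.

    XML comments are skipped by the default parser, so only real
    option elements such as <cpus v='7'/> end up in the result.
    """
    root = ET.fromstring(config_xml)
    slots = {}
    for slot in root.findall("slot"):
        options = {child.tag: child.get("v") for child in slot}
        slots[slot.get("id")] = {"type": slot.get("type"), "options": options}
    return slots
```

For the config above, the GPU slot should come back with an empty `options` mapping and the CPU slot with only `cpus`, which would confirm that the removed settings are really gone.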
PantherX wrote:Apart from Windows Update, did anything else on the system change?
I updated Logitech G-Software, Miranda NG, Mozilla Firefox (I'm on the Nightly channel, so it updates almost every day..), Paint.NET (4.0 Beta), Skype and Steam. Apart from that, both, hard- and software, run on the same revision as months ago and 24/7.
PantherX wrote:These error messages are uncommon for FahCore_17 and I don't remember seeing them recently:
Eagle wrote:...
18:01:37:WU00:FS01:0x17:Lost lifeline PID 7480, exiting
18:01:37:WU00:FS01:0x17:Lost lifeline PID 7480, exiting
18:01:37:WU00:FS01:0x17:ERROR:103: Lost client lifeline
....
However, a quick search later, and it seems that something is killing the client (viewtopic.php?p=164030#p164030). Maybe the updated security application is interfering with F@H files? Have you created an exception for F@H in your security application?
That might have happened due to the frequent slot deaths, with the client eventually dying. To test again without waiting a couple of hours, I normally pause and un-pause the slot. That way, it retries working on WUs.
Right now, it is folding a WU from project 7623:

Code: Select all

12:36:25:FS01:Paused
12:36:26:FS01:Unpaused
12:36:26:WU00:FS01:Connecting to 171.67.108.201:80
12:36:27:WU00:FS01:Assigned to work server 171.64.65.105
12:36:27:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GK110 [GeForce GTX 780] from 171.64.65.105
12:36:27:WU00:FS01:Connecting to 171.64.65.105:8080
12:36:28:WU00:FS01:Downloading 122.32KiB
12:36:29:WU00:FS01:Download complete
12:36:29:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:7623 run:349 clone:0 gen:168 core:0x15 unit:0x000000de664f2dd14fe4fa6aca9dcf16
12:36:29:WU00:FS01:Starting
12:36:29:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "X:/Folding At Home/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe" -dir 00 -suffix 01 -version 704 -lifeline 5640 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
12:36:29:WU00:FS01:Started FahCore on PID 4432
12:36:29:WU00:FS01:Core PID:1820
12:36:29:WU00:FS01:FahCore 0x15 started
12:36:30:WU00:FS01:0x15:
12:36:30:WU00:FS01:0x15:*------------------------------*
12:36:30:WU00:FS01:0x15:Folding@Home GPU Core
12:36:30:WU00:FS01:0x15:Version                2.25 (Wed May 9 17:03:01 EDT 2012)
12:36:30:WU00:FS01:0x15:Build host             AmoebaRemote
12:36:30:WU00:FS01:0x15:Board Type             NVIDIA/CUDA
12:36:30:WU00:FS01:0x15:Core                   15
12:36:30:WU00:FS01:0x15:
12:36:30:WU00:FS01:0x15:Window's signal control handler registered.
12:36:30:WU00:FS01:0x15:Preparing to commence simulation
12:36:30:WU00:FS01:0x15:- Looking at optimizations...
12:36:30:WU00:FS01:0x15:DeleteFrameFiles: successfully deleted file=00/wudata_01.ckp
12:36:30:WU00:FS01:0x15:- Created dyn
12:36:30:WU00:FS01:0x15:- Files status OK
12:36:30:WU00:FS01:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
12:36:30:WU00:FS01:0x15:- Expanded 124739 -> 501826 (decompressed 402.3 percent)
12:36:30:WU00:FS01:0x15:Called DecompressByteArray: compressed_data_size=124739 data_size=501826, decompressed_data_size=501826 diff=0
12:36:30:WU00:FS01:0x15:- Digital signature verified
12:36:30:WU00:FS01:0x15:
12:36:30:WU00:FS01:0x15:Project: 7623 (Run 349, Clone 0, Gen 168)
12:36:30:WU00:FS01:0x15:
12:36:30:WU00:FS01:0x15:Assembly optimizations on if available.
12:36:30:WU00:FS01:0x15:Entering M.D.
12:36:32:WU00:FS01:0x15:Tpr hash 00/wudata_01.tpr:  3019852558 3487233801 375813301 3414467450 1681703861
12:36:32:WU00:FS01:0x15:GPU device id=0
12:36:32:WU00:FS01:0x15:Working on Protein
12:36:32:WU00:FS01:0x15:Client config unavailable.
12:36:32:WU00:FS01:0x15:Starting GUI Server
12:37:45:WU00:FS01:0x15:Setting checkpoint frequency: 400000
12:37:45:WU00:FS01:0x15:Completed         3 out of 40000000 steps (0%).
12:41:20:WU00:FS01:0x15:Completed    400000 out of 40000000 steps (1%).
12:45:00:WU00:FS01:0x15:Completed    800000 out of 40000000 steps (2%).
12:48:42:WU00:FS01:0x15:Completed   1200000 out of 40000000 steps (3%).
12:52:22:WU00:FS01:0x15:Completed   1600000 out of 40000000 steps (4%).
12:56:04:WU00:FS01:0x15:Completed   2000000 out of 40000000 steps (5%).
But then again, it's Core 15, not 17. IIRC, you told me about major differences between both cores in the past.

Re: Core 17 has suddenly started crashing

Post by 7im »

AV software updates itself all the time. Please confirm what version and type you have, and that it has an exception set for fah.

Re: Core 17 has suddenly started crashing

Post by Eagle »

I apologize for forgetting about that - I'm using Eset Smart Security 7 (AV, HIPS and firewall active; no anti-theft, gamer mode, child safety lock or mail/web protection) and I hereby confirm that proper exceptions were set just moments ago.
Core 15 will finish project 7623 in about 4.5 hours and I'll report back as soon as Core 17 works on a WU again.

Re: Core 17 has suddenly started crashing

Post by PantherX »

Humm, none of the additional updates would normally interfere with F@H. FahCore_15 is working, which means that at least the CUDA code is running properly. FahCore_17 uses OpenCL, so the next step might be a reinstallation of the Nvidia driver with the clean option, installing only the drivers and not the additional features if you aren't using them. Also, when reviewing your log file (viewtopic.php?p=265425#p265425) I noticed an anomaly in the message sequence:

Code: Select all

16:23:32:WU00:FS01:Connecting to 171.67.108.201:80
16:23:34:WU00:FS01:Assigned to work server 140.163.4.231
16:23:34:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GK110 [GeForce GTX 780] from 140.163.4.231
16:23:34:WU00:FS01:Connecting to 140.163.4.231:8080
16:23:35:WU00:FS01:Downloading 4.83MiB
16:23:40:WU00:FS01:Download complete
16:23:40:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13001 run:434 clone:1 gen:23 core:0x17 unit:0x00000028538b3db75328cabc362813c8
16:23:40:WU00:FS01:Downloading core from http://web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah
16:23:40:WU00:FS01:Connecting to web.stanford.edu:80
16:23:41:WU00:FS01:FahCore 17: Downloading 2.55MiB
16:23:47:WU00:FS01:FahCore 17: 36.76%
16:23:53:WU00:FS01:FahCore 17: 75.97%
16:23:56:WU00:FS01:FahCore 17: Download complete
16:23:56:WU00:FS01:Valid core signature
16:23:56:WARNING:WU00:FS01:FahCore has not changed since last download, aborting core update
16:23:56:WU00:FS01:Starting
16:23:56:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "X:/Folding At Home/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 00 -suffix 01 -version 704 -lifeline 5640 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
16:23:56:WU00:FS01:Started FahCore on PID 6964
16:23:57:WU00:FS01:Core PID:4132
16:23:57:WU00:FS01:FahCore 0x17 started
The message about aborting the FahCore update shouldn't occur after the download is over and the FahCore is verified. It should happen during the download. Maybe you can delete FahCore_17 from its folder, which will automatically force a fresh download of FahCore_17.

Re: Core 17 has suddenly started crashing

Post by Eagle »

PantherX wrote:Humm, none of the additional updates would normally interfere with F@H. FahCore_15 is working, which means that at least the CUDA code is running properly. FahCore_17 uses OpenCL, so the next step might be a reinstallation of the Nvidia driver with the clean option, installing only the drivers and not the additional features if you aren't using them.
I always install only the display, audio and PhysX drivers; no 3D stuff, and applications aren't offered (at least not via GeForce Experience).

Regarding a full re-installation: I did that when I switched back from the beta driver (to test whether that works). Though, if you suggest it, I'll do that right away.
PantherX wrote:Also, when reviewing your log file (viewtopic.php?p=265425#p265425) I noticed an anomaly in the message sequence:

The message about aborting the FahCore update shouldn't occur after the download is over and the FahCore is verified. It should happen during the download. Maybe you can delete FahCore_17 from its folder, which will automatically force a fresh download of FahCore_17.
I've paused the GPU slot, deleted FahCore_17 and un-paused the slot. The result is this:

Code: Select all

19:13:40:FS01:Unpaused
19:13:40:WU02:FS01:Downloading core from http://web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah
19:13:40:WU02:FS01:Connecting to web.stanford.edu:80
19:13:40:WU02:FS01:FahCore 17: Downloading 2.55MiB
19:13:46:WU02:FS01:FahCore 17: 26.96%
19:13:52:WU02:FS01:FahCore 17: 58.81%
19:13:58:WU02:FS01:FahCore 17: 90.67%
19:13:59:WU02:FS01:FahCore 17: Download complete
19:13:59:WU02:FS01:Valid core signature
19:13:59:WU02:FS01:Unpacked 8.60MiB to cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe
19:14:00:WU02:FS01:Starting
19:14:00:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "X:/Folding At Home/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 02 -suffix 01 -version 704 -lifeline 5640 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
19:14:00:WU02:FS01:Started FahCore on PID 5784
19:14:00:WU02:FS01:Core PID:4248
19:14:00:WU02:FS01:FahCore 0x17 started
19:14:01:WU02:FS01:0x17:*********************** Log Started 2014-06-03T19:14:00Z ***********************
19:14:01:WU02:FS01:0x17:Project: 9406 (Run 34, Clone 0, Gen 61)
19:14:01:WU02:FS01:0x17:Unit: 0x000000520a3b1e5c533dd2516a29146a
19:14:01:WU02:FS01:0x17:CPU: 0x00000000000000000000000000000000
19:14:01:WU02:FS01:0x17:Machine: 1
19:14:01:WU02:FS01:0x17:Reading tar file state.xml
19:14:02:WU02:FS01:0x17:Reading tar file system.xml
19:14:03:WU02:FS01:0x17:Reading tar file integrator.xml
19:14:03:WU02:FS01:0x17:Reading tar file core.xml
19:14:03:WU02:FS01:0x17:Digital signatures verified
19:14:03:WU02:FS01:0x17:Folding@home GPU core17
19:14:03:WU02:FS01:0x17:Version 0.0.52
19:19:59:WU02:FS01:0x17:Completed 0 out of 2000000 steps (0%)
19:19:59:WU02:FS01:0x17:Lost lifeline PID 5784, exiting
19:19:59:WU02:FS01:0x17:Lost lifeline PID 5784, exiting
19:19:59:WU02:FS01:0x17:ERROR:103: Lost client lifeline
19:19:59:WU02:FS01:0x17:Folding@home Core Shutdown: CLIENT_DIED
19:19:59:WARNING:WU02:FS01:FahCore returned an unknown error code which probably indicates that it crashed
19:19:59:WARNING:WU02:FS01:FahCore returned: CLIENT_DIED (103 = 0x67)
19:19:59:WU02:FS01:Starting
19:19:59:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "X:/Folding At Home/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 02 -suffix 01 -version 704 -lifeline 5640 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
19:19:59:WU02:FS01:Started FahCore on PID 4580
19:19:59:WU02:FS01:Core PID:6528
19:19:59:WU02:FS01:FahCore 0x17 started
19:20:00:WU02:FS01:0x17:*********************** Log Started 2014-06-03T19:20:00Z ***********************
19:20:00:WU02:FS01:0x17:Project: 9406 (Run 34, Clone 0, Gen 61)
19:20:00:WU02:FS01:0x17:Unit: 0x000000520a3b1e5c533dd2516a29146a
19:20:00:WU02:FS01:0x17:CPU: 0x00000000000000000000000000000000
19:20:00:WU02:FS01:0x17:Machine: 1
19:20:00:WU02:FS01:0x17:Reading tar file state.xml
19:20:01:WU02:FS01:0x17:Reading tar file system.xml
19:20:02:WU02:FS01:0x17:Reading tar file integrator.xml
19:20:02:WU02:FS01:0x17:Reading tar file core.xml
19:20:02:WU02:FS01:0x17:Digital signatures verified
19:20:02:WU02:FS01:0x17:Folding@home GPU core17
19:20:02:WU02:FS01:0x17:Version 0.0.52
So, the client still dies.. :(
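To see how often each work unit hit the same failure, the `FahCore returned: ...` warnings can be tallied from the log. A minimal sketch, assuming the warning format shown in the logs above; the function name `tally_core_errors` is illustrative and not part of FAHClient.

```python
import re
from collections import Counter

# Matches warnings such as:
#   19:19:59:WARNING:WU02:FS01:FahCore returned: CLIENT_DIED (103 = 0x67)
RETURNED = re.compile(
    r":(WU\d+):(FS\d+):FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)"
)


def tally_core_errors(lines):
    """Count core exit codes per (work unit, slot, error name, numeric code)."""
    counts = Counter()
    for line in lines:
        match = RETURNED.search(line)
        if match:
            wu, slot, name, code = match.groups()
            counts[(wu, slot, name, int(code))] += 1
    return dict(counts)
```

Run over a full log, this shows at a glance whether the failures are limited to one slot or one error code, which is easier to share in a support thread than pages of raw log output.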