Project: 7621 (Run 372, Clone 0, Gen 3) NANs detected on GPU

Moderators: Site Moderators, FAHC Science Team

GreyWhiskers
Posts: 660
Joined: Mon Oct 25, 2010 5:57 am
Hardware configuration: a) Main unit
Sandybridge in HAF922 w/200 mm side fan
--i7 2600K@4.2 GHz
--ASUS P8P67 DeluxeB3
--4GB ADATA 1600 RAM
--750W Corsair PS
--2Seagate Hyb 750&500 GB--WD Caviar Black 1TB
--EVGA 660GTX-Ti FTW - Signature 2 GPU@ 1241 Boost
--MSI GTX560Ti @900MHz
--Win7Home64; FAH V7.3.2; 327.23 drivers

b) 2004 HP a475c desktop, 1 core Pent 4 HT@3.2 GHz; Mem 2GB;HDD 160 GB;Zotac GT430PCI@900 MHz
WinXP SP3-32 FAH v7.3.6 301.42 drivers - GPU slot only

c) 2005 Toshiba M45-S551 laptop w/2 GB mem, 160GB HDD;Pent M 740 CPU @ 1.73 GHz
WinXP SP3-32 FAH v7.3.6 [Receiving Core A4 work units]
d) 2011 lappy-15.6"-1920x1080;i7-2860QM,2.5;IC Diamond Thermal Compound;GTX 560M 1,536MB u/c@700;16GB-1333MHz RAM;HDD:500GBHyb w/ 4GB SSD;Win7HomePrem64;320.18 drivers FAH 7.4.2ß
Location: Saratoga, California USA

Project: 7621 (Run 372, Clone 0, Gen 3) NANs detected on GPU

Post by GreyWhiskers »

After this UNSTABLE_MACHINE failure with NANs on one of the new 7621 WUs, I was immediately reissued the same WU. The second time around it completed successfully, one more WU has completed since then, and I am at 95% on the next. It should be more stable at the lower clock described below.

I have had the GTX 560Ti overclocked to a core clock of 953 MHz for months - after the NAN, I backed it off to 930 MHz. BTW, GPU temps have remained good - less than 70 deg C.
[03:28:41] Completed 36400000 out of 40000000 steps (91%).
[03:28:42] mdrun_gpu returned 52
[03:28:42] NANs detected on GPU
[03:28:42]
[03:28:42] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:28:45] CoreStatus = 7A (122)



--- Opening Log file [August 20 22:32:34 UTC] 


# Windows GPU Systray Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.41r2

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Users\Al\AppData\Roaming\Folding@home-gpu
Arguments: -verbosity 9 -advmethods 

[22:32:34] - Ask before connecting: No
[22:32:34] - User name: GreyWhiskers (Team 0)
[22:32:34] - User ID: 51EA5C9A7EF9D58E
[22:32:34] - Machine ID: 3
[22:32:34] 
[22:32:34] Gpu type=3 species=21.
[22:32:34] Loaded queue successfully.
[22:32:34] Initialization complete
[22:32:34] 
[22:32:34] + Processing work unit
[22:32:34] - Autosending finished units... [August 20 22:32:34 UTC]
[22:32:34] Trying to send all finished work units
[22:32:34] + No unsent completed units remaining.
[22:32:34] - Autosend completed
[22:32:35] Core required: FahCore_15.exe
[22:32:35] Core found.
[22:32:35] Working on queue slot 04 [August 20 22:32:35 UTC]
[22:32:35] + Working ...
[22:32:35] - Calling '.\FahCore_15.exe -dir work/ -suffix 04 -nice 19 -checkpoint 15 -verbose -lifeline 2608 -version 641'

[22:32:35] 
[22:32:35] *------------------------------*
[22:32:35] Folding@Home GPU Core
[22:32:35] Version                2.20 (Tue Aug 2 12:06:37 PDT 2011)
[22:32:35] Build host             SimbiosNvdWin7
[22:32:35] Board Type             NVIDIA/CUDA
[22:32:35] Core                   15
[22:32:35] 
[22:32:35] Window's signal control handler registered.
[22:32:35] Preparing to commence simulation
[22:32:35] - Ensuring status. Please wait.
[22:32:45] - Looking at optimizations...
[22:32:45] - Working with standard loops on this execution.
[22:32:45] - Previous termination of core was improper.
[22:32:45] - Files status OK
[22:32:45] sizeof(CORE_PACKET_HDR) = 512 file=<>
[22:32:45] - Expanded 124817 -> 501826 (decompressed 402.0 percent)
[22:32:45] Called DecompressByteArray: compressed_data_size=124817 data_size=501826, decompressed_data_size=501826 diff=0
[22:32:45] - Digital signature verified
[22:32:45] 
[22:32:45] Project: 7621 (Run 372, Clone 0, Gen 3)
[22:32:45] 
[22:32:45] Entering M.D.
[22:32:47] Will resume from checkpoint file work/wudata_04.ckp
[22:32:47] Tpr hash work/wudata_04.tpr:  135446047 2298027851 2561396342 2124944768 3566292881
[22:32:47] calling fah_main gpuDeviceId=0
[22:32:47] Working on Protein
[22:32:47] Client config found, loading data.
[22:32:47] Starting GUI Server
[22:33:58] Resuming from checkpoint
[22:33:58] fcCheckPointResume: retreived and current tpr file hash:
[22:33:58]    0    135446047    135446047
[22:33:58]    1   2298027851   2298027851
[22:33:58]    2   2561396342   2561396342
[22:33:58]    3   2124944768   2124944768
[22:33:58]    4   3566292881   3566292881
[22:33:58] fcCheckPointResume: file hashes same.
[22:33:58] fcCheckPointResume: state restored.
[22:33:58] fcCheckPointResume: name work/wudata_04.log Verified work/wudata_04.log
[22:33:58] fcCheckPointResume: name work/wudata_04.trr Verified work/wudata_04.trr
[22:33:58] fcCheckPointResume: name work/wudata_04.xtc Verified work/wudata_04.xtc
[22:33:58] fcCheckPointResume: name work/wudata_04.edr Verified work/wudata_04.edr
[22:33:58] fcCheckPointResume: state restored 2

[22:33:58] Resumed from checkpoint
[22:33:58] Setting checkpoint frequency: 400000
[22:33:58] Completed  12800001 out of 40000000 steps (32%).
[22:38:58] Completed  13200000 out of 40000000 steps (33%).
[22:43:57] Completed  13600000 out of 40000000 steps (34%).
[22:48:57] Completed  14000000 out of 40000000 steps (35%).
[22:53:57] Completed  14400000 out of 40000000 steps (36%).
[22:58:56] Completed  14800000 out of 40000000 steps (37%).
[23:03:56] Completed  15200000 out of 40000000 steps (38%).
[23:08:56] Completed  15600000 out of 40000000 steps (39%).
[23:13:56] Completed  16000000 out of 40000000 steps (40%).
[23:18:55] Completed  16400000 out of 40000000 steps (41%).
[23:23:55] Completed  16800000 out of 40000000 steps (42%).
[23:28:54] Completed  17200000 out of 40000000 steps (43%).
[23:33:55] Completed  17600000 out of 40000000 steps (44%).
[23:38:54] Completed  18000000 out of 40000000 steps (45%).
[23:43:54] Completed  18400000 out of 40000000 steps (46%).
[23:48:53] Completed  18800000 out of 40000000 steps (47%).
[23:53:53] Completed  19200000 out of 40000000 steps (48%).
[23:58:53] Completed  19600000 out of 40000000 steps (49%).
[00:03:53] Completed  20000000 out of 40000000 steps (50%).
[00:08:52] Completed  20400000 out of 40000000 steps (51%).
[00:13:52] Completed  20800000 out of 40000000 steps (52%).
[00:18:52] Completed  21200000 out of 40000000 steps (53%).
[00:23:52] Completed  21600000 out of 40000000 steps (54%).
[00:28:51] Completed  22000000 out of 40000000 steps (55%).
[00:33:51] Completed  22400000 out of 40000000 steps (56%).
[00:38:51] Completed  22800000 out of 40000000 steps (57%).
[00:43:50] Completed  23200000 out of 40000000 steps (58%).
[00:48:50] Completed  23600000 out of 40000000 steps (59%).
[00:53:50] Completed  24000000 out of 40000000 steps (60%).
[00:58:50] Completed  24400000 out of 40000000 steps (61%).
[01:03:49] Completed  24800000 out of 40000000 steps (62%).
[01:08:49] Completed  25200000 out of 40000000 steps (63%).
[01:13:49] Completed  25600000 out of 40000000 steps (64%).
[01:18:49] Completed  26000000 out of 40000000 steps (65%).
[01:23:48] Completed  26400000 out of 40000000 steps (66%).
[01:28:48] Completed  26800000 out of 40000000 steps (67%).
[01:33:48] Completed  27200000 out of 40000000 steps (68%).
[01:38:48] Completed  27600000 out of 40000000 steps (69%).
[01:43:47] Completed  28000000 out of 40000000 steps (70%).
[01:48:47] Completed  28400000 out of 40000000 steps (71%).
[01:53:46] Completed  28800000 out of 40000000 steps (72%).
[01:58:46] Completed  29200000 out of 40000000 steps (73%).
[02:03:46] Completed  29600000 out of 40000000 steps (74%).
[02:08:46] Completed  30000000 out of 40000000 steps (75%).
[02:13:45] Completed  30400000 out of 40000000 steps (76%).
[02:18:45] Completed  30800000 out of 40000000 steps (77%).
[02:23:45] Completed  31200000 out of 40000000 steps (78%).
[02:28:45] Completed  31600000 out of 40000000 steps (79%).
[02:33:44] Completed  32000000 out of 40000000 steps (80%).
[02:38:44] Completed  32400000 out of 40000000 steps (81%).
[02:43:44] Completed  32800000 out of 40000000 steps (82%).
[02:48:43] Completed  33200000 out of 40000000 steps (83%).
[02:53:43] Completed  33600000 out of 40000000 steps (84%).
[02:58:43] Completed  34000000 out of 40000000 steps (85%).
[03:03:42] Completed  34400000 out of 40000000 steps (86%).
[03:08:42] Completed  34800000 out of 40000000 steps (87%).
[03:13:42] Completed  35200000 out of 40000000 steps (88%).
[03:18:42] Completed  35600000 out of 40000000 steps (89%).
[03:23:41] Completed  36000000 out of 40000000 steps (90%).
[03:28:41] Completed  36400000 out of 40000000 steps (91%).
[03:28:42] mdrun_gpu returned 52
[03:28:42] NANs detected on GPU
[03:28:42] 
[03:28:42] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:28:45] CoreStatus = 7A (122)
[03:28:45] Sending work to server
[03:28:45] Project: 7621 (Run 372, Clone 0, Gen 3)
[03:28:45] - Read packet limit of 540015616... Set to 524286976.
[03:28:45] - Error: Could not get length of results file work/wuresults_04.dat
[03:28:45] - Error: Could not read unit 04 file. Removing from queue.

[03:28:51] 
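
For anyone who wants to pull the key facts out of a log like the one above without reading every line, here is a minimal Python sketch. It assumes the v6-style line format shown in this log (`[HH:MM:SS] message`); the function name `summarize_log` and the regular expressions are my own illustration, not part of the FAH client. It reports the last completed step, the average time between progress reports, and the core shutdown reason if there is one.

```python
import re
from datetime import datetime, timedelta

# Patterns are assumptions based on the log lines quoted in this thread.
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s?(.*)$")
STEP_RE = re.compile(r"Completed\s+(\d+) out of (\d+) steps")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def summarize_log(lines):
    """Return (last_step, total_steps, mean_seconds_between_reports, shutdown_reason)."""
    stamps = []
    last_step = total = shutdown = None
    prev = None
    day_offset = timedelta(0)
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # banner lines, blank lines, etc.
        clock, msg = m.groups()
        step = STEP_RE.search(msg)
        if step:
            t = datetime.strptime(clock, "%H:%M:%S") + day_offset
            if prev is not None and t < prev:
                # The clock wrapped past midnight; the log has no dates per line.
                day_offset += timedelta(days=1)
                t += timedelta(days=1)
            prev = t
            stamps.append(t)
            last_step, total = int(step.group(1)), int(step.group(2))
        halt = SHUTDOWN_RE.search(msg)
        if halt:
            shutdown = halt.group(1)
    gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    return last_step, total, mean_gap, shutdown


# A few lines copied from the log above, including the midnight rollover.
sample = [
    "[23:53:53] Completed  19200000 out of 40000000 steps (48%).",
    "[23:58:53] Completed  19600000 out of 40000000 steps (49%).",
    "[00:03:53] Completed  20000000 out of 40000000 steps (50%).",
    "[03:28:42] NANs detected on GPU",
    "[03:28:42] Folding@home Core Shutdown: UNSTABLE_MACHINE",
]
result = summarize_log(sample)
print(result)
```

On the full log, the progress lines are about five minutes apart per 400000 steps (1%), so the same function gives a rough frame time as well as the failure reason.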
geokilla
Posts: 64
Joined: Sun Mar 08, 2009 4:36 am
Hardware configuration: Intel Core i5-10600KF @ 4.9Ghz @ 1.25V
MSI Z490 Gaming Edge Wi-Fi BIOS v17
XPG D50 32GB DDR4-3200 16-19-9-36 2T (Samsung M-Die)
XPG S11 Pro 1TB and Western Digital WD140EDFZ 14TB
EVGA RTX 3060 XC
Corsair RM650x
Phantek P360A with Noctua Exhaust Fans
Location: Toronto, Canada

Re: Project: 7621 (Run 372, Clone 0, Gen 3) NANs detected on

Post by geokilla »

The new WUs are a lot more stressful than the old GPU WUs, so crashes are not uncommon. My GTX 460 used to fold fine at 62C with an 860 MHz core clock. Now I have to back it down to 840 MHz, and the temps are at 70C. Fan speed is on auto.
uncle fuzzy
Posts: 460
Joined: Sun Dec 02, 2007 10:15 pm
Location: Michigan

Re: Project: 7621 (Run 372, Clone 0, Gen 3) NANs detected on

Post by uncle fuzzy »

The higher the number on your card, the less likely you are to see a temperature rise. However, all cards seem to be equally prone to overclock-induced NANs. I have less capable cards, but they fold these fine at lower clocks.

GTX460- dropped from 850 to 825 (72C, max fan)
GTS450- dropped from 950 to 875 (75C, max fan)
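
To put the clock reductions reported so far in this thread side by side, here is a small Python sketch that works out each backoff as a percentage of the original core clock. The numbers are the ones posted above; the helper name `backoff_percent` is just for illustration, and I am assuming every figure is a core clock in MHz.

```python
# (card, clock before in MHz, clock after in MHz), as reported in this thread
reports = [
    ("GTX 560 Ti (GreyWhiskers)", 953, 930),
    ("GTX 460 (geokilla)", 860, 840),
    ("GTX 460 (uncle fuzzy)", 850, 825),
    ("GTS 450 (uncle fuzzy)", 950, 875),
]


def backoff_percent(before_mhz, after_mhz):
    """Reduction in core clock as a percentage of the original clock."""
    return 100.0 * (before_mhz - after_mhz) / before_mhz


for card, before, after in reports:
    pct = backoff_percent(before, after)
    print(f"{card}: {before} -> {after} MHz ({pct:.1f}% lower)")
```

The reductions work out to between roughly 2% and 8%, so a fairly small step down was enough for these cards.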
schwancr
Pande Group Member
Posts: 136
Joined: Wed Jun 01, 2011 9:45 pm

Re: Project: 7621 (Run 372, Clone 0, Gen 3) NANs detected on

Post by schwancr »

Thanks for your input here, everyone. It seems that turning down the clock rate may be necessary to finish these WUs.

-Christian
GreyWhiskers
Posts: 660
Joined: Mon Oct 25, 2010 5:57 am

Re: Project: 7621 (Run 372, Clone 0, Gen 3) NANs detected on

Post by GreyWhiskers »

BTW, another quick stat on folding the 7620/7621s. Both MSI Afterburner and GPU-Z show memory usage of 513 MB (out of the 1024 MB). This is significantly higher than I remember getting on the p680x WUs. I don't have the exact memory usage numbers for the p680x, but I do remember noting that it seemed very low the last time I looked.

One other observation, similar to others. The system seems pretty sluggish with the SMP and GPU together compared with before. It may be the video, it may be the CPU, it may be the wireless keyboard and mouse - but with SMP 8 and the 7620/7621s, the system is getting a real workout.
ra_alfaomega
Posts: 16
Joined: Tue Aug 23, 2011 9:00 am
Hardware configuration: Intel 3930k@4,5Ghz
Kingston HyperX 8Gb 1866@2000Mhz
Nvidia gtx460 Hawk

Re: Project: 7621 (Run 372, Clone 0, Gen 3) NANs detected on

Post by ra_alfaomega »

With 68xx projects, my 460 Hawk was overclocked to 925 MHz and the max temp was 77C. Now I have a 7621 project, and at 912 MHz my temps are around 88C. I am disappointed with the PPD, because it is about the same as with the 68xx projects. Considering the size of the project and the heat it produces, I think the points are not high enough, so I will not fold these projects until the points system is reconsidered. Does anyone else have the same opinion about the points for these projects?
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 7621 (Run 372, Clone 0, Gen 3) NANs detected on

Post by bruce »

The points are determined by the official benchmarking policy, which has not changed (and probably won't). The fact that FAH is now able to make more effective use of the GPU resources is considered a good thing, but it's unrelated to the benchmarking process. Others have discovered that although they were able to overclock their GPU when it was running only inefficient projects, they now know why the manufacturer established the standard clock rate.

If you choose to stop folding, that's your own personal decision, but please don't attempt to recruit others to do the same.
ra_alfaomega
Posts: 16
Joined: Tue Aug 23, 2011 9:00 am

Re: Project: 7621 (Run 372, Clone 0, Gen 3) NANs detected on

Post by ra_alfaomega »

Everybody has a choice, and I don't want anybody to follow my opinion. I just wanted to know if I am the only one who thinks that a much bigger project deserves more points. I don't want to quit folding, and I have been folding for 2 years (almost 4 million points). Since yesterday I have had better airflow in my case, and my GPU temps on the 7621 project have dropped almost 10C. Maybe I will reconsider folding 762X projects again :) all the more so as it is so important for researchers at this time. Thank you, Bruce, for your answer!