Project 13421

Moderators: Site Moderators, FAHC Science Team

Nuitari
Posts: 78
Joined: Sun Jun 09, 2019 4:03 am
Hardware configuration: 1x Nvidia 1050ti
1x Nvidia 1660Super
1x Nvidia GTX 660
1x Nvidia 1060 3gb
1x AMD rx570
2x AMD rx560
1x AMD Ryzen 7 PRO 1700
1x AMD Ryzen 7 3700X
1x AMD Phenom II
1x AMD A8-9600
1x Intel i5-4590S

Project 13421

Post by Nuitari »

I think this is more of an issue with the client than with the project itself. It seems that sometimes one of my GPUs receives a long string of faulty units from Project 13421 and the client marks the GPU as failed. That means I have to once again babysit 9 different computers that do folding...

The error is always ERROR:Discrepancy: Forces are blowing up! 23 2 and it is always before any steps seem to complete

This is the latest one I found, but I can dig up past ones.

Code:

18:45:28:WU02:FS01:0x22:Project: 13421 (Run 6192, Clone 19, Gen 0)
18:45:28:WU02:FS01:0x22:Unit: 0x0000000012bc7d9a5f224a095867c495
18:45:28:WU02:FS01:0x22:Reading tar file core.xml
18:45:28:WU02:FS01:0x22:Reading tar file integrator.xml
18:45:28:WU02:FS01:0x22:Reading tar file state.xml.bz2
18:45:28:WU02:FS01:0x22:Reading tar file system.xml.bz2
18:45:28:WU02:FS01:0x22:Digital signatures verified
18:45:28:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
18:45:28:WU02:FS01:0x22:Version 0.0.11
18:45:28:WU02:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
18:45:28:WU02:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
18:45:28:WU02:FS01:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
18:45:28:WU02:FS01:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
18:45:31:WU02:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 23 2
18:45:31:WU02:FS01:0x22:Saving result file ../logfile_01.txt
18:45:31:WU02:FS01:0x22:Saving result file science.log
18:45:31:WU02:FS01:0x22:Saving result file state.xml.bz2
18:45:31:WU02:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
I think the F@H client should have a way to know when a project is more at risk of faulty units and manage it appropriately.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 13421

Post by bruce »

The chances of a NaN error are higher for the MoonShot projects (p13400 series) than for other projects. I don't think there's any way to prevent them other than to work on other projects, and I don't think there's an easy way to exclude specific projects. Because of the "sprint" concept, JohnChodera is doing whatever he can to maximize the distribution of those projects.

@JohnChodera: How about excluding distribution to Advanced so donors like Nuitari can select Advanced and (hopefully) get a preponderance of other projects? (If there are no Advanced projects, the assignment will roll over to Full FAH, so he'll likely get one there.) :( Even removing the COVID preference would not guarantee a non-MoonShot assignment.
Nuitari

Re: Project 13421

Post by Nuitari »

I don't have any problem doing the p13400 projects; in fact I would prefer to focus on those. However, the client itself needs to handle errors much more gracefully than it does now.
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project 13421

Post by toTOW »

What GPU is it?

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
Nuitari

Re: Project 13421

Post by Nuitari »

One was a GeForce GTX 660, the other an APU (A8-9600).

And this morning, 2 different RX570s and an RX560.
I checked the 226 WUs of Project 13421 that the last rig got and found the following:

54 successes from me, with no other failures
86 faulty where I was the only one to report (these will probably be reassigned)
19 where I was faulty but someone else was OK
35 where I was OK but someone else reported as faulty
32 where everyone reported back faulty

Except for maybe 3 cases, all failures happened before the first report of "Completed 0 out of 1000000 steps".
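
For anyone who wants the breakdown as percentages, here is a minimal Python sketch of the arithmetic. The five counts are copied from the list in this post; the dictionary labels and the `share` helper are just illustrative names, not anything from the F@H client.

```python
# Counts copied from the post (226 WUs of Project 13421 on one rig).
# The labels are illustrative wording, not client or server terminology.
tally = {
    "ok_no_other_failures": 54,
    "faulty_only_my_report": 86,
    "faulty_me_ok_someone_else": 19,
    "ok_me_faulty_someone_else": 35,
    "faulty_everyone": 32,
}

total = sum(tally.values())


def share(count, whole):
    """Return count as a percentage of whole, rounded to one decimal."""
    return round(100.0 * count / whole, 1)


# Units this rig returned as faulty, regardless of what other donors saw.
my_faulty = (
    tally["faulty_only_my_report"]
    + tally["faulty_me_ok_someone_else"]
    + tally["faulty_everyone"]
)
my_ok = total - my_faulty

print(f"total WUs: {total}")
print(f"faulty on this rig: {my_faulty} ({share(my_faulty, total)}%)")
print(f"OK on this rig: {my_ok} ({share(my_ok, total)}%)")
```

The five categories add up to the 226 WUs mentioned, so nothing is double counted.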
Nuitari

Re: Project 13421

Post by Nuitari »

12 GPUs out of 16 were marked faulty this morning because of Project 13421.
All of those WUs failed even before step 0.
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: Project 13421

Post by PantherX »

Generally speaking, Nvidia GPUs do work quite well with FahCore_22 and Project 13421.

Your GPU is fully supported as it has OpenCL 1.2 and Double Precision support: https://www.techpowerup.com/gpu-specs/g ... x-660.c895

Can you please inform us:
What Driver version you're running?
Where did you download the Driver from (automatic via Windows Update or manually from Nvidia's site)?
What OS you're running?
Anything unique/special about the systems where the failures occur?
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Nuitari

Re: Project 13421

Post by Nuitari »

The more powerful Nvidia cards I have don't get Project 13421; the only one that does is a GeForce GTX 660.

In fact, it is the only one of my 4 Nvidia cards that gets that project.

Driver 384.130
Ubuntu 16.04
Drivers are the binaries that come from Ubuntu
Nothing special otherwise. The CPU is an older i5-2320, and it's also doing CPU folding.

On the AMD side:
2 systems run Ubuntu 16.04 with a mix of RX570 and RX560. The Carrizo core (APU) is also affected by the failure issue.
AMDGPU 18.30-641594 from AMD's website

1 system runs Ubuntu 18.04
AMDGPU 20.20-1089974 from AMD's website


Interesting log snippets from the GeForce GTX 660:

Code:

$ grep ERROR log.txt |grep -v NO_ERROR
05:56:48:WU01:FS01:0x22:ERROR:NaNs detected in forces. 8 0
05:56:55:WU02:FS01:0x22:ERROR:NaNs detected in forces. 8 0
05:57:01:WU01:FS01:0x22:ERROR:Force RMSE error of 415.735 with threshold of 5
05:57:05:WU02:FS01:0x22:ERROR:NaNs detected in forces. 8 0
08:33:40:ERROR:WU02:FS00:Exception: Server did not assign work unit
08:51:06:WU00:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 7 0
08:51:12:WU01:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 7 0
12:29:02:WU00:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 9 0
12:29:58:ERROR:WU00:FS00:Exception: Could not get an assignment
12:29:59:ERROR:WU00:FS00:Exception: Could not get an assignment
12:35:13:ERROR:WU00:FS00:Exception: Failed to connect to 3.21.157.11:80: Connection timed out
14:17:18:WU02:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 6 0
14:17:25:WU01:FS01:0x22:ERROR:Force RMSE error of 445.004 with threshold of 5
14:17:31:WU02:FS01:0x22:ERROR:Force RMSE error of 272.417 with threshold of 5
14:17:35:WU01:FS01:0x22:ERROR:Force RMSE error of 381.459 with threshold of 5
14:17:41:WU02:FS01:0x22:ERROR:Force RMSE error of 267.462 with threshold of 5
14:17:47:WU01:FS01:0x22:ERROR:Force RMSE error of 524.41 with threshold of 5
16:07:45:ERROR:WU01:FS00:Exception: Failed to connect to 3.21.157.11:80: Connection timed out
19:59:36:ERROR:WU00:FS00:Exception: Could not get an assignment
13:51:25:WU00:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 12 0
00:14:07:WU01:FS01:0x22:ERROR:Force RMSE error of 165.891 with threshold of 5
00:14:12:WU00:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 5 0
00:14:18:WU01:FS01:0x22:ERROR:Force RMSE error of 143.528 with threshold of 5
00:14:24:WU00:FS01:0x22:ERROR:Force RMSE error of 1416.69 with threshold of 5
13:29:23:WU00:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 12 0
15:05:35:WU01:FS01:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
01:14:20:WU01:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 12 0
01:14:24:WU03:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 7 0
01:14:27:WU01:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 7 0
01:14:31:WU03:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 10 2
01:14:36:WU01:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 25 0
01:14:40:WU03:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 7 0
01:14:44:WU01:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 25 0
01:23:40:WU03:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
09:12:20:ERROR:WU01:FS00:Exception: Failed to connect to 3.21.157.11:80: Connection timed out
20:41:57:WU02:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
20:42:04:WU03:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 15 2
20:42:07:WU02:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 21 0
06:19:54:ERROR:WU01:FS01:Exception: Transfer failed
06:21:45:WU01:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 21 0
09:26:20:WU00:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 7 0
09:26:27:WU02:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 7 0
09:26:31:WU00:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 7 0
09:26:35:WU02:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 7 0
09:26:39:WU00:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 7 0
09:26:44:WU02:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 7 0
09:26:48:WU00:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 7 0
09:26:53:WU02:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 7 0
09:26:57:WU00:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 7 0
09:27:02:WU02:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 7 0
12:44:31:ERROR:WU00:FS00:Exception: Failed to connect to 3.21.157.11:80: Connection timed out
13:57:49:WU01:FS01:0x22:ERROR:Force RMSE error of 38.2303 with threshold of 5
13:57:53:WU02:FS01:0x22:ERROR:Force RMSE error of 17.6938 with threshold of 5
13:57:57:WU01:FS01:0x22:ERROR:Force RMSE error of 15.6476 with threshold of 5
13:58:02:WU02:FS01:0x22:ERROR:Force RMSE error of 65.3317 with threshold of 5
13:58:07:WU01:FS01:0x22:ERROR:Force RMSE error of 69.6833 with threshold of 5
13:58:11:WU02:FS01:0x22:ERROR:Force RMSE error of 69.8878 with threshold of 5
13:58:19:WU01:FS01:0x22:ERROR:Force RMSE error of 21.2261 with threshold of 5
13:58:22:WU02:FS01:0x22:ERROR:Force RMSE error of 20.6909 with threshold of 5
14:06:58:ERROR:WU01:FS01:Exception: Transfer failed
14:07:01:WU01:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 15 0
14:07:04:WU02:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 15 0
14:07:37:WU01:FS01:0x22:ERROR:Force RMSE error of 596.999 with threshold of 5
14:07:41:WU02:FS01:0x22:ERROR:Force RMSE error of 639.224 with threshold of 5
00:29:32:WU02:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 6 1
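
The grep output above mixes three kinds of core errors (NaNs, force RMSE, and "Forces are blowing up") with client-side network errors. A rough Python sketch of how the core errors could be bucketed is below. The regular expression assumes the line layout seen in this log, and the bucket names (`nan`, `rmse`, `blow_up`, `other`) are labels made up for this example.

```python
import re
from collections import Counter

# Core errors in this log look like "HH:MM:SS:WUxx:FSxx:0x22:ERROR:<message>".
# Client errors ("HH:MM:SS:ERROR:WUxx:...") do not match and are skipped.
CORE_ERROR = re.compile(
    r"^\d\d:\d\d:\d\d:WU\d+:FS\d+:0x[0-9a-fA-F]+:ERROR:(.*)$"
)


def classify(message):
    """Map a core error message to a made-up bucket name."""
    if message.startswith("NaNs detected"):
        return "nan"
    if message.startswith("Force RMSE error"):
        return "rmse"
    if message.startswith("Discrepancy: Forces are blowing up"):
        return "blow_up"
    return "other"


def count_core_errors(lines):
    """Count core errors per bucket, ignoring lines that are not core errors."""
    counts = Counter()
    for line in lines:
        match = CORE_ERROR.match(line.strip())
        if match:
            counts[classify(match.group(1))] += 1
    return counts


# A few lines copied from the log above.
sample = [
    "05:56:48:WU01:FS01:0x22:ERROR:NaNs detected in forces. 8 0",
    "05:57:01:WU01:FS01:0x22:ERROR:Force RMSE error of 415.735 with threshold of 5",
    "08:33:40:ERROR:WU02:FS00:Exception: Server did not assign work unit",
    "08:51:06:WU00:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 7 0",
    "15:05:35:WU01:FS01:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.",
]

counts = count_core_errors(sample)
print(dict(counts))
```

To run it over a whole log, pass `open("log.txt")` instead of `sample`.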

Code:

05:56:48:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:3533 clone:15 gen:0 core:0x22 unit:0x0000000212bc7d9a5f20206f5b5d3eda
05:56:55:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:3553 clone:15 gen:0 core:0x22 unit:0x0000000212bc7d9a5f20206741780e8c
05:57:02:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:3558 clone:15 gen:0 core:0x22 unit:0x0000000212bc7d9a5f2020680636f542
05:57:05:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:3582 clone:15 gen:0 core:0x22 unit:0x0000000212bc7d9a5f202071a2a1087f
08:51:02:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:13421 run:3589 clone:15 gen:0 core:0x22 unit:0x0000000212bc7d9a5f202071b9531fe5
08:51:06:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13421 run:6428 clone:17 gen:0 core:0x22 unit:0x0000000112bc7d9a5f224a0aad570b6b
08:51:12:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:6588 clone:17 gen:0 core:0x22 unit:0x0000000112bc7d9a5f224a1169e6179a
10:40:35:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:13421 run:6600 clone:17 gen:0 core:0x22 unit:0x0000000112bc7d9a5f224a13be52964a
12:29:00:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:13421 run:4178 clone:19 gen:0 core:0x22 unit:0x0000000212bc7d9a5f20708f0b1912b9
12:29:02:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13421 run:2996 clone:21 gen:0 core:0x22 unit:0x0000000212bc7d9a5f1f4f716980b2e6
14:17:14:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:13421 run:3154 clone:21 gen:0 core:0x22 unit:0x0000000212bc7d9a5f1fc0d2cba70df7
14:17:19:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:1842 clone:23 gen:0 core:0x22 unit:0x0000000212bc7d9a5f1f4d0c48905fca
14:17:25:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:2023 clone:23 gen:0 core:0x22 unit:0x0000000212bc7d9a5f1f4d4188a0f371
14:17:31:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:2039 clone:23 gen:0 core:0x22 unit:0x0000000212bc7d9a5f1f4d6f62d34cdc
14:17:35:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:2071 clone:23 gen:0 core:0x22 unit:0x0000000212bc7d9a5f1f4d759e4e3649
14:17:42:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:2104 clone:23 gen:0 core:0x22 unit:0x0000000212bc7d9a5f1f4d859c7c6854
14:17:48:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:2133 clone:23 gen:0 core:0x22 unit:0x0000000212bc7d9a5f1f4d905c76a65e
04:12:19:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:13421 run:3338 clone:29 gen:0 core:0x22 unit:0x0000000212bc7d9a5f1fc0fa9dba1177
13:51:26:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13421 run:5651 clone:22 gen:0 core:0x22 unit:0x0000000212bc7d9a5f2249eb66dbcde5
10:57:44:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:13421 run:8173 clone:60 gen:0 core:0x22 unit:0x0000000012bc7d9a5f284d8ef58578e8
12:46:24:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:13421 run:3471 clone:64 gen:0 core:0x22 unit:0x0000000012bc7d9a5f1fc14ca6785ba0
00:14:03:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:13421 run:1485 clone:69 gen:0 core:0x22 unit:0x0000000012bc7d9a5f1e0dbf9e5b4300
00:14:07:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:7435 clone:69 gen:0 core:0x22 unit:0x0000000012bc7d9a5f224a38f66b7bc1
00:14:13:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13421 run:7502 clone:69 gen:0 core:0x22 unit:0x0000000012bc7d9a5f224a4078515492
00:14:19:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:7514 clone:69 gen:0 core:0x22 unit:0x0000000012bc7d9a5f224a40b385f85f
00:14:25:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13421 run:7521 clone:69 gen:0 core:0x22 unit:0x0000000012bc7d9a5f224a401cc8b4ba
11:42:23:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:13421 run:2077 clone:73 gen:0 core:0x22 unit:0x0000000012bc7d9a5f1f4d7f134a2e4b
13:29:19:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:13421 run:129 clone:74 gen:0 core:0x22 unit:0x0000000012bc7d9a5f1d0fff3d8580ef
13:29:24:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13421 run:6591 clone:74 gen:0 core:0x22 unit:0x0000000012bc7d9a5f224a156d3abbbd
10:35:49:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:13421 run:3660 clone:81 gen:0 core:0x22 unit:0x0000000012bc7d9a5f20206c40b32827
01:14:21:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:7215 clone:37 gen:1 core:0x22 unit:0x0000000212bc7d9a5f224a2e8b55170c
01:14:24:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:13421 run:7208 clone:45 gen:1 core:0x22 unit:0x0000000112bc7d9a5f224a30e9fd7089
01:14:28:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:7208 clone:49 gen:1 core:0x22 unit:0x0000000212bc7d9a5f224a31dbe82c13
01:14:32:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:13421 run:7208 clone:53 gen:1 core:0x22 unit:0x0000000112bc7d9a5f224a2f1734bee9
01:14:36:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:7208 clone:60 gen:1 core:0x22 unit:0x0000000112bc7d9a5f224a2f22013846
01:14:41:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:13421 run:7208 clone:65 gen:1 core:0x22 unit:0x0000000112bc7d9a5f224a2f654039ef
01:14:44:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:7208 clone:70 gen:1 core:0x22 unit:0x0000000112bc7d9a5f224a2e79f7e1ba
13:55:52:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:13421 run:6763 clone:54 gen:1 core:0x22 unit:0x0000000212bc7d9a5f224a1cfcfc3add
15:44:47:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:13421 run:6603 clone:66 gen:1 core:0x22 unit:0x0000000112bc7d9a5f224a10fc9e5c4c
01:23:40:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:13421 run:6070 clone:50 gen:1 core:0x22 unit:0x0000000112bc7d9a5f2249ffb817cf0c
20:41:58:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:5228 clone:21 gen:1 core:0x22 unit:0x0000000112bc7d9a5f2249e2e4f3496d
20:42:04:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:13421 run:5224 clone:19 gen:1 core:0x22 unit:0x0000000112bc7d9a5f2249db0e4136af
20:42:07:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:5224 clone:25 gen:1 core:0x22 unit:0x0000000112bc7d9a5f2249db0ca65d7e
06:21:46:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:4845 clone:2 gen:1 core:0x22 unit:0x0000000212bc7d9a5f2249ccacd50bb7
09:26:18:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:13421 run:4844 clone:42 gen:1 core:0x22 unit:0x0000000112bc7d9a5f2249cc5eb8be00
09:26:20:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13421 run:4718 clone:56 gen:1 core:0x22 unit:0x0000000112bc7d9a5f20bd4738e6cfd6
09:26:27:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:4708 clone:1 gen:1 core:0x22 unit:0x0000000312bc7d9a5f20bd465c1ef795
09:26:31:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13421 run:4708 clone:13 gen:1 core:0x22 unit:0x0000000312bc7d9a5f20bd4947b4ce16
09:26:35:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:4708 clone:15 gen:1 core:0x22 unit:0x0000000212bc7d9a5f20bd472fb0d1de
09:26:39:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13421 run:4708 clone:19 gen:1 core:0x22 unit:0x0000000112bc7d9a5f20bd473e69abc7
09:26:44:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:4708 clone:25 gen:1 core:0x22 unit:0x0000000412bc7d9a5f20bd4418671006
09:26:49:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13421 run:4708 clone:26 gen:1 core:0x22 unit:0x0000000212bc7d9a5f20bd4a67513fe6
09:26:53:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:4708 clone:31 gen:1 core:0x22 unit:0x0000000312bc7d9a5f20bd4b97a650e4
09:26:58:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13421 run:4708 clone:34 gen:1 core:0x22 unit:0x0000000212bc7d9a5f20bd4785cb695b
09:27:02:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:4708 clone:40 gen:1 core:0x22 unit:0x0000000112bc7d9a5f20bd4ad606d382
13:57:50:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:4506 clone:41 gen:1 core:0x22 unit:0x0000000312bc7d9a5f20709b1066bca2
13:57:53:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:4506 clone:44 gen:1 core:0x22 unit:0x0000000212bc7d9a5f20709da5e6d0ec
13:57:57:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:4506 clone:47 gen:1 core:0x22 unit:0x0000000212bc7d9a5f20709bf85a3e3b
13:58:03:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:4506 clone:56 gen:1 core:0x22 unit:0x0000000112bc7d9a5f20709b617a0b23
13:58:07:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:4506 clone:62 gen:1 core:0x22 unit:0x0000000112bc7d9a5f20709b69bc982f
13:58:11:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:4506 clone:67 gen:1 core:0x22 unit:0x0000000112bc7d9a5f20709b820e8c20
13:58:19:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:4505 clone:2 gen:1 core:0x22 unit:0x0000000112bc7d9a5f20709bb806f7d1
13:58:22:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:4505 clone:14 gen:1 core:0x22 unit:0x0000000112bc7d9a5f20709b3f0af7ee
14:07:01:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:4499 clone:52 gen:1 core:0x22 unit:0x0000000212bc7d9a5f20709bbce5d549
14:07:04:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:4499 clone:54 gen:1 core:0x22 unit:0x0000000112bc7d9a5f20709b62a529cc
14:07:38:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:4498 clone:24 gen:1 core:0x22 unit:0x0000000112bc7d9a5f20709bba692b0c
14:07:41:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:4498 clone:30 gen:1 core:0x22 unit:0x0000000312bc7d9a5f20709b863d1a62
00:29:32:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13421 run:4060 clone:46 gen:1 core:0x22 unit:0x0000000112bc7d9a5f207094e404bfd2
Project 16600 does run fine on it though.
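
To put a number on how often a project comes back FAULTY, the "Sending unit results" lines can be tallied per project. The Python sketch below is only an illustration of that idea: the regular expression assumes the line format shown in this log, and `fault_rates` is a hypothetical helper, not part of the F@H client.

```python
import re
from collections import defaultdict

# Matches the "Sending unit results" lines as they appear in this log.
RESULT = re.compile(r"Sending unit results: .*?error:(\w+) project:(\d+)")


def fault_rates(lines):
    """Return {project: fraction of returned WUs reported as FAULTY}."""
    returned = defaultdict(int)
    faulty = defaultdict(int)
    for line in lines:
        match = RESULT.search(line)
        if not match:
            continue
        error, project = match.group(1), int(match.group(2))
        returned[project] += 1
        if error == "FAULTY":
            faulty[project] += 1
    return {project: faulty[project] / returned[project] for project in returned}


# Three lines copied from the log above.
sample = [
    "05:56:48:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13421 run:3533 clone:15 gen:0 core:0x22 unit:0x0000000212bc7d9a5f20206f5b5d3eda",
    "08:51:02:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:13421 run:3589 clone:15 gen:0 core:0x22 unit:0x0000000212bc7d9a5f202071b9531fe5",
    "08:51:06:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13421 run:6428 clone:17 gen:0 core:0x22 unit:0x0000000112bc7d9a5f224a0aad570b6b",
]

rates = fault_rates(sample)
for project, rate in sorted(rates.items()):
    print(f"project {project}: {rate:.0%} of returned WUs were FAULTY")
```

A client-side policy could in principle use a rate like this before marking a GPU as failed, but that is speculation on how it might be done, not how the client works today.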
PantherX
Site Moderator

Re: Project 13421

Post by PantherX »

AFAIK, Project 13420 is restricted to high-end GPUs while Project 13421 is for low-end GPUs. One reason for the restriction is that Project 13421 doesn't scale that well on high-end GPUs but works very well on low end GPUs.

Here's a bit of comparison on Windows with GTX 1080 Ti:
Project 13420: Avg. Time / Frame : 00:02:55 - 812,856.48 PPD
Project 13421: Avg. Time / Frame : 00:01:14 - 155,868.67 PPD

Regarding the issue, have you installed OpenCL package (sudo apt-get install ocl-icd-opencl-dev)? Also, I do believe that the current Drivers for Nvidia GPUs are 4XX series so maybe you can try to update that too?
Nuitari

Re: Project 13421

Post by Nuitari »

That's the package for compiling OpenCL code, but I will install it.
I will check for the Nvidia drivers; I'm not sure what the latest one is for that very old card model.
The card works well otherwise, except for that project.
PantherX
Site Moderator

Re: Project 13421

Post by PantherX »

Regarding the OpenCL package, installing it is what I have seen donors on the Forum report as having worked. While I am not an expert on Linux, there are other members here who might be able to provide some guidance.

The 134XX projects use some of the latest features, which other projects haven't used yet. That could be one reason why this project fails while others work.

A quick search indicates that the latest Nvidia Driver on Linux for GTX 660 is 450.57: https://www.nvidia.com/Download/driverR ... 2107/en-us
Joe_H
Site Admin
Posts: 7936
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Project 13421

Post by Joe_H »

There are a few different theories on why installing the dev kit gets some OpenCL-related things to work on Linux systems. The one that seems to fit best is that some driver installers put the OpenCL runtime code in place but leave some unresolved links, and the dev kit install apparently patches those up. It might be something else, and the behavior is not exactly the same from one driver version to another.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
NormalDiffusion
Posts: 124
Joined: Sat Apr 18, 2020 1:50 pm

Re: Project 13421

Post by NormalDiffusion »

PantherX wrote:AFAIK, Project 13420 is restricted to high-end GPUs while Project 13421 is for low-end GPUs. One reason for the restriction is that Project 13421 doesn't scale that well on high-end GPUs but works very well on low end GPUs.

Here's a bit of comparison on Windows with GTX 1080 Ti:
Project 13420: Avg. Time / Frame : 00:02:55 - 812,856.48 PPD
Project 13421: Avg. Time / Frame : 00:01:14 - 155,868.67 PPD
Seeing these values, I feel the Radeon R9 290x should be removed from the 13421 list.
Similar situation as with the 1080Ti:
Project 13420: Avg. Time / Frame : 00:03:23 - 650,619.0 PPD
Project 13421: Avg. Time / Frame : 00:01:08 - 176,947.1 PPD
(sadly I could only get 13420 on 26th and 27th of July. Since then only 13421...)
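
To make the gap between the two projects easier to compare across cards, here is a small Python sketch that computes the PPD ratio from the figures posted in this thread. The numbers are copied from the two comparisons; the `results` layout and helper names are just for illustration.

```python
def tpf_seconds(tpf):
    """Convert an 'HH:MM:SS' time-per-frame string to seconds."""
    hours, minutes, seconds = (int(part) for part in tpf.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (time per frame, PPD) per project, as posted in this thread.
results = {
    "GTX 1080 Ti": {13420: ("00:02:55", 812856.48), 13421: ("00:01:14", 155868.67)},
    "R9 290X": {13420: ("00:03:23", 650619.0), 13421: ("00:01:08", 176947.1)},
}


def ppd_ratio(gpu):
    """PPD on Project 13420 divided by PPD on Project 13421 for one GPU."""
    return results[gpu][13420][1] / results[gpu][13421][1]


for gpu, projects in results.items():
    tpf_20 = tpf_seconds(projects[13420][0])
    tpf_21 = tpf_seconds(projects[13421][0])
    print(
        f"{gpu}: 13420 at {tpf_20}s/frame, 13421 at {tpf_21}s/frame, "
        f"PPD ratio {ppd_ratio(gpu):.2f}"
    )
```

On both cards 13420 earns several times the PPD of 13421 even though each 13421 frame finishes faster.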
JohnChodera
Pande Group Member
Posts: 467
Joined: Fri Feb 22, 2013 9:59 pm

Re: Project 13421

Post by JohnChodera »

> The error is always ERROR:Discrepancy: Forces are blowing up! 23 2 and it is always before any steps seem to complete

We have seen that some GPU/driver combinations are giving discrepancies, and we are working on adding code to the 0.0.12 core build that would run through some debugging tests whenever this happens, so we can identify what the issue is and fix it without needing a local machine that reproduces the exact GPU/driver combination. We're hoping to get this out next week.

Thanks so much for bearing with us here!

~ John Chodera // MSKCC
BobWilliams757
Posts: 519
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: Project 13421

Post by BobWilliams757 »

PantherX wrote:AFAIK, Project 13420 is restricted to high-end GPUs while Project 13421 is for low-end GPUs. One reason for the restriction is that Project 13421 doesn't scale that well on high-end GPUs but works very well on low end GPUs.

Here's a bit of comparison on Windows with GTX 1080 Ti:
Project 13420: Avg. Time / Frame : 00:02:55 - 812,856.48 PPD
Project 13421: Avg. Time / Frame : 00:01:14 - 155,868.67 PPD

Regarding the issue, have you installed OpenCL package (sudo apt-get install ocl-icd-opencl-dev)? Also, I do believe that the current Drivers for Nvidia GPUs are 4XX series so maybe you can try to update that too?
The complexity of how different projects work with various hardware and software is amazing. I had been wondering if 13420 had hardware restrictions, as I never picked any up on my machine. It was a really smart move for overall project throughput to segregate the project numbers and restrict them to GPUs that could do them efficiently. Since it seems like many smaller atom count projects are popping up, it just made sense.

As a comparison, my little Vega 11 onboard has probably averaged that 155k (give or take a couple K) through the project 13421 runs.
Fold them if you get them!