Project 16435 and RX Vega 56/64

If you think it might be a driver problem, see viewforum.php?f=79

Moderators: Site Moderators, FAHC Science Team

BobWilliams757
Posts: 523
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: Project 16435 and RX Vega 56/64

Post by BobWilliams757 »

Just another follow up....

PPD estimates in the FAH control still jump around. I've now run 11 instances of this WU, and all of them return about double the points expected based on the timeout. On my slow Vega 11 built-in graphics, it takes right about a full day to run one of these WUs. The timeout is 1 day, but actual returns are about twice the base, as if points were being awarded on a 2-day timeout.

Great suggestions on the CPU and HD notes, but neither seems to impact the WU any more than usual. If I'm CPU folding it drops the PPD slightly, but that is normal on this system since the CPU and graphics are integrated.

PPD is on the lowish side for my hardware, but so far no errors at all, other than one operator error when I hit "Pause slot" vs "Finish" the other day on the last WU I did. But even though that WU took 47 hours to complete due to the accidental pause, it awarded just shy of 47K points. Base points are 28.6K.
Fold them if you get them!
Joe_H
Site Admin
Posts: 7941
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Project 16435 and RX Vega 56/64

Post by Joe_H »

The QRB formula awards bonus based on the Deadline time, so up to the Timeout value base points plus a bonus are credited. After the Timeout, just base points, so there is a step in the function at that point. The fraction is:

Upload time - Download time
-------------------------
Deadline time
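
For anyone who wants to play with the numbers, here is a minimal Python sketch of the bonus calculation. It assumes the formula as I understand it from the points FAQ, final points = base points * max(1, sqrt(k * deadline / elapsed)), where elapsed is the upload time minus the download time. Note that this puts the deadline on top, i.e. the inverse of the elapsed/deadline ratio shown above. The function name and the k and deadline values in the example are made up for illustration; the real values are set per project. The Timeout cutoff is not modelled here.

```python
import math


def qrb_credit(base_points, k, deadline_days, elapsed_days):
    """Credit under the QRB formula as assumed from the points FAQ.

    elapsed_days is upload time minus download time, in days.
    k and deadline_days are per-project constants (hypothetical here).
    """
    if elapsed_days <= 0:
        raise ValueError("elapsed_days must be positive")
    # The bonus factor never drops below 1, so credit never drops below base.
    bonus_factor = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base_points * bonus_factor


if __name__ == "__main__":
    # Base points from this thread; k and the deadline are invented values.
    print(qrb_credit(28600, k=1.0, deadline_days=4.0, elapsed_days=1.0))
```

With these invented constants a one-day return earns double the base, but that is only because k and the deadline were picked to make the arithmetic easy.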

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3

Re: Project 16435 and RX Vega 56/64

Post by BobWilliams757 »

Joe_H wrote:The QRB formula awards bonus based on the Deadline time, so up to the Timeout value base points plus a bonus are credited. After the Timeout, just base points, so there is a step in the function at that point. The fraction is:

Upload time - Download time
-------------------------
Deadline time

Thanks for the breakdown of how it works. I've never seen that before, and since I usually have no issues completing WUs before the timeout, I had no idea how it worked.

In this case it sort of solidifies that something is strange though. I suspect that maybe there was just an error inputting the project info, and the timeout is actually two days yet displays as one.

As for the display in the advanced control, I think maybe the slight variance in speed (based on other system use) combined with the fact that my system is running this WU almost right at the timeout time is what is making the PPD estimate jump around. I had one of these WU's complete in 24 hours exactly based on HFM logs. All the others were 24 hours + 5-15 minutes. But it's possible that on that frame/checkpoint it could have been ahead of the curve or behind the curve just very slightly.

I was under the impression that QRB no longer existed after you passed the timeout time, thus why I was assuming the WU is awarding points based on a 2 day timeout.
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: Project 16435 and RX Vega 56/64

Post by PantherX »

BobWilliams757 wrote:...I was under the impression that QRB no longer existed after you passed the timeout time, thus why I was assuming the WU is awarding points based on a 2 day timeout.
That's correct...
Before Timeout: Base points + Bonus points
After Timeout (and before Expiration): Base points

Joe_H was just highlighting a part of the formula, not the entire formula. You can read about that here: https://foldingathome.org/support/faq/points/
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues

Re: Project 16435 and RX Vega 56/64

Post by BobWilliams757 »

PantherX wrote:
BobWilliams757 wrote:...I was under the impression that QRB no longer existed after you passed the timeout time, thus why I was assuming the WU is awarding points based on a 2 day timeout.
That's correct...
Before Timeout: Base points + Bonus points
After Timeout (and before Expiration): Base points

Joe_H was just highlighting a part of the formula, not the entire formula. You can read about that here: https://foldingathome.org/support/faq/points/
Thanks for the further follow up on this. Though with my modest hardware points are a rather insignificant thing, it's interesting to know how they set them up regardless.

As for the moving PPD I had, I did in fact confirm my above suspicion. Due to slight variance in frame times when using the computer, it couldn't decide frame to frame if I would make the timeout time or not, thus the points moving quite a bit. A slight bump in memory speed (which in my case impacts GPU speed) moved the PPD estimate in the advanced control up enough to allow it to show stable estimates through the WU.
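
To illustrate why the estimate can flip so sharply, here is a rough Python sketch. It assumes the estimator simply extrapolates the current frame time over 100 frames and applies the bonus only if the projected finish is within the timeout; I don't know the client's actual estimator, so treat this as a guess at the mechanism. The function name, k, and the deadline are hypothetical; only the 1-day timeout and the 28.6K base points come from this thread.

```python
import math

SECONDS_PER_DAY = 86400.0


def estimated_ppd(base_points, k, timeout_days, deadline_days,
                  frame_seconds, frames=100):
    """Projected points per day from a single observed frame time.

    Assumes 100 frames per WU and a bonus that vanishes past the timeout.
    """
    est_days = frame_seconds * frames / SECONDS_PER_DAY
    if est_days <= timeout_days:
        # Projected to finish in time: base points times the bonus factor.
        factor = max(1.0, math.sqrt(k * deadline_days / est_days))
    else:
        # Projected to miss the timeout: base points only.
        factor = 1.0
    return base_points * factor / est_days


if __name__ == "__main__":
    # Frame times a few seconds apart land on either side of a 1-day timeout.
    for tpf in (860.0, 870.0):
        ppd = estimated_ppd(28600, k=1.0, timeout_days=1.0,
                            deadline_days=4.0, frame_seconds=tpf)
        print(f"{tpf:.0f} s/frame -> about {ppd:,.0f} PPD")
```

A ten-second change in frame time is enough to move the projection across the timeout, which would make the displayed PPD swing by roughly a factor of two under these assumptions.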


I also did some digging by checking several of the 16435 WUs I had completed. Similar to the findings of Crawdaddy79, I'm seeing a solid trend of a lot of fault codes rather than completed work units. I finished one today, picked up another, and the one I just picked up had already failed for a couple of donors in the couple of hours before I got it. A couple of the specific PRCG reports showed 4 or 5 faulty returns and then maybe 1 or 2 good returns. I'm not sure if or how this should be reported, but I suspect the expected completion rate would usually be higher than the 20-25% range. Of the several I looked at, the highest percentage of completion with an "OK" status was in the 50% range. If anyone knows how, or to whom, this should be reported, I can provide the specific PRCG info or link them the way Crawdaddy79 did.

I had run a search in the beta forum early on trying to figure out the points jumping I had. Some upload bugs were reported, but most people seemed to have them execute and run fine. Other than figuring out the jumping estimates and the awarded points, they run just fine on my modest system; they just take almost a full day. This month has felt like Groundhog Day with these WUs, which account for over half the folding time the machine logged.

Re: Project 16435 and RX Vega 56/64

Post by PantherX »

Regarding failures of WUs, that would need to be investigated for your system to see if the underlying issue can be fixed. When a WU "fails" on the donor system, the Servers don't know if that is a bad WU or a system issue, so they assign additional copies to verify. There's an upper limit of 5 (IIRC) and if all 5 are failures, then the WU is marked as a bad one. Recently, we have had a large influx of new donors and some of them might be running overclocked GPUs which were stable for gaming but unstable for folding. That explains why some WUs are being assigned a few times.
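
The reassignment rule described above can be sketched in a few lines of Python. This is only an illustration of the logic as described, not the server's actual code: the limit of 5 is the value recalled in the post ("IIRC"), and the function name and status strings are hypothetical.

```python
MAX_FAILED_RETURNS = 5  # limit as recalled above; not a confirmed server setting


def classify_wu(returns):
    """Decide what to do with a WU given its return history.

    returns: list of status strings in the order received,
             e.g. ["FAULTY", "FAULTY", "OK"] (hypothetical labels).
    """
    if "OK" in returns:
        # At least one donor completed it, so the WU itself is fine.
        return "completed"
    failures = sum(1 for status in returns if status != "OK")
    if failures >= MAX_FAILED_RETURNS:
        # Every copy failed: treat it as a bad WU and flag it.
        return "bad"
    # Still ambiguous (bad WU or unstable donor system): hand out another copy.
    return "reassign"


if __name__ == "__main__":
    print(classify_wu(["FAULTY", "FAULTY"]))
    print(classify_wu(["FAULTY", "FAULTY", "OK"]))
    print(classify_wu(["FAULTY"] * 5))
```

Under this reading, a WU showing 4 faulty returns followed by an OK is not a bad WU; it more likely passed through several unstable systems before reaching a stable one.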

Re: Project 16435 and RX Vega 56/64

Post by BobWilliams757 »

PantherX,

It seems your memory is solid on the failed units. None had 5 or more failures, so they probably wouldn't be flagged as bad WUs. I looked at all the various runs, and in doing that found that several had no faulty returns and one or more returned OK, so my bad for assuming there were more problems than there actually were. And having seen all the threads with donors having unstable systems, it certainly makes sense to not flag them too early in the game.


Just for future reference though, is that to say that at some point (possibly the 5 faulty WU's) the system would somehow flag it and bring it to the attention of someone?

Re: Project 16435 and RX Vega 56/64

Post by PantherX »

BobWilliams757 wrote:...Just for future reference though, is that to say that at some point (possibly the 5 faulty WU's) the system would somehow flag it and bring it to the attention of someone?
Bad WUs from the Server's POV are automatically flagged and it is up to the researcher to investigate further. IME, this is done regularly as it allows researchers to detect any potential issues, make adjustments, stop simulations, etc. Even bad WUs are scientifically valuable, as they provide researchers with information they need for their research.
Crawdaddy79
Posts: 73
Joined: Sat Mar 21, 2020 3:56 pm

Re: Project 16435 and RX Vega 56/64

Post by Crawdaddy79 »

PantherX wrote:Regarding failures of WUs, that would need to be investigated for your system to see if the underlying issue can be fixed. When a WU "fails" on the donor system, the Servers don't know if that is a bad WU or a system issue, so they assign additional copies to verify. There's an upper limit of 5 (IIRC) and if all 5 are failures, then the WU is marked as a bad one. Recently, we have had a large influx of new donors and some of them might be running overclocked GPUs which were stable for gaming but unstable for folding. That explains why some WUs are being assigned a few times.
Because of what you say here, I've been looking at my other completed project numbers. The majority of them have a single return, mine, with OK. I did notice something that surprised me; that when there is more than one entry (when I am not the first to get assigned a WU), there are many many faulty returns. I would have never guessed that there were so many unstable systems out there (and gives credence to everyone pointing the finger at my PC in this thread); I was thinking I was bringing the average down with a ~60% success rate on Pr 16435 WUs. But it turns out almost everyone is having issues folding :D. My last send as an example: link.

I keep coming across nVidia's raplab - almost all of its returns are bad: https://apps.foldingathome.org/cpu?q=raplab
muziqaz
Posts: 951
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Project 16435 and RX Vega 56/64

Post by muziqaz »

You would be surprised what passes as stable these days to the average user (or even a serious gamer). Folding@home brings every instability out very quickly.
We know that nVidia and AMD have driver profiles for certain benchmarks, profiles which limit power and usage in them, however they do not have such things for F@H.
FAH Omega tester

Re: Project 16435 and RX Vega 56/64

Post by Joe_H »

Some of these average or gamer users can be convinced to check and reduce their overclock to get to a folding-stable setup. But many are convinced that the benchmarks they have run, or the temperatures they see, are a final indication of stability. The most common are video benchmarks or integer-based compute benchmarks, when floating point is what matters for F@h.

The best we have at this point is FAHBench for GPUs, but that is limited to code emulating Core_21. Core_22 had efficiency improvements that push cards harder. There is an unofficial build of FAHBench out there that substitutes later OpenMM code; eventually an official release should be finished and put up on the download site. Basically, a card stable on the current FAHBench may need its overclock reduced a bit to be stable on Core_22 projects. Some specific projects load GPUs even more than the test WU included in FAHBench, but there are ways to capture a WU's data and substitute it for the embedded one.