PRCG numbers for some abnormally long runtime WUs
In my case this is based on 1 x CPU core running 1 x FAHCore_22
13421 - 4444, 0, 0 - ETA 13 days
13421 - 3142, 11, 0 - ETA 11 days
I tested what makes them move forward at a normal speed: giving each of them 2 x CPU cores makes them rather happy, and giving them 3 x CPU cores makes them even happier, which in general is rather grabby on CPU resources for a GPU WU.
13421 WUs with abnormally long runtime
Moderators: Site Moderators, FAHC Science Team
Re: 13421 WUs with abnormally long runtime
Sparkly wrote: PRCG numbers for some abnormally long runtime WUs
In my case this is based on 1 x CPU core running 1 x FAHCore_22
Three questions:
* What GPU is doing the processing?
* How long were those assignments processing without interruption before you noted the estimated run-time?
* Are you running other projects that make heavy use of the GPU?
Posting FAH's log:
How to provide enough info to get helpful support.
Re: 13421 WUs with abnormally long runtime
Each WU ran on its own RX580 GPU for about 8-10 hours, reaching about 3%, when the remaining ETA was recorded. I then gave each of the WUs more cores to play with, which resulted in a more normal runtime and ETA of about 3h. These were the only two WUs running at the time, since everything else was turned off.
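As a rough sanity check on those numbers, here is a minimal sketch that linearly extrapolates the remaining time from the elapsed hours and the fraction completed. The function name `remaining_days` is made up for illustration, and the client's own ETA calculation may well work differently.

```python
def remaining_days(elapsed_hours: float, fraction_done: float) -> float:
    """Linearly extrapolate the remaining runtime, in days.

    Assumes progress continues at the same average rate as so far.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total_hours = elapsed_hours / fraction_done
    return (total_hours - elapsed_hours) / 24.0


# 3% done after 8 to 10 hours, as reported above
for hours in (8, 10):
    days = remaining_days(hours, 0.03)
    print(f"{hours} h at 3% -> about {days:.1f} days remaining")
```

With 3% done after 8 to 10 hours this gives roughly 11 to 13.5 days remaining, which is in line with the 11 and 13 day ETAs recorded for the two WUs.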
- Pande Group Member
- Posts: 467
- Joined: Fri Feb 22, 2013 9:59 pm
Re: 13421 WUs with abnormally long runtime
Thanks for the heads-up on these. We're still working to figure out why some RUNs for these are exceptionally long. Our best hypothesis so far is that this has to do with how constraints are handled, but more investigation is necessary. Hopefully we can dig into that this week now that the sprints are running.
As always, huge thanks for sticking with us despite the suboptimal situation right now!
~ John Chodera // MSKCC
Re: 13421 WUs with abnormally long runtime
How are you giving the gpu work units more cores? I have lowered the cpu slot to 6c out of 12, and bumped a22 priority on a wu per wu basis when I can, but only see 2 process threads per wu.
I would like to try this as well, as 13421 has some pretty low performance for me (rx470, ppd down to half, power use down 30% per card)
Re: 13421 WUs with abnormally long runtime
Yeroon wrote: How are you giving the gpu work units more cores?
I am using a process manager
https://www.bill2-software.com/processm ... load.shtml
to automatically reduce the number of cores a FAHCore_22 process gets access to in the first place. Unless you have already reduced the number of cores the process gets access to when it starts, it will just grab what it can from the available cores, so setting affinity might not make any difference for you in that case, unless you do it for manual load balancing purposes.
In Windows 10 you can set process affinity manually for each running process
https://thegeekpage.com/set-affinity-fo ... indows-10/
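For anyone scripting this rather than clicking through Task Manager: affinity is commonly expressed as a bitmask with one bit per logical core (for example, the Windows `start /affinity` option takes such a mask in hex). Below is a small sketch of how that mask is built. The helper name `affinity_mask` is just for illustration, and how you apply the mask depends on the tool you use.

```python
def affinity_mask(cores):
    """Build an affinity bitmask from zero-based logical core indices.

    Bit n set means the process may run on logical core n.
    """
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indices must be non-negative")
        mask |= 1 << core
    return mask


# Two cores (0 and 1) for one WU, three cores (2, 3, 4) for another
print(format(affinity_mask([0, 1]), "X"))
print(format(affinity_mask([2, 3, 4]), "X"))
```

Giving each FAHCore_22 process its own non-overlapping set of cores keeps the two WUs from competing for the same CPU time.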
Re: 13421 WUs with abnormally long runtime
FAHCore_22 has two functions that use a CPU thread. One process moves data to/from main RAM and the GPU. For NVidia, on Windows, this process uses a spin-wait (rather than an interruptible sleep) so the CPU always appears to be busy. For AMD, it doesn't use a spin-wait.
The other thing it does (for either GPU) is it runs a sanity check which periodically checks the current state of the WU. That generally uses CPU resources for a very short period of time at widely spaced intervals.
To the best of my knowledge, it doesn't add up to much (except for the spin-wait) so I don't really worry about it.
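To illustrate the difference between a spin-wait and an interruptible sleep, here is a generic sketch. It is not FAHCore_22's actual code, and all function names are made up for the example. Both waits take the same wall-clock time, but only the spin-wait burns CPU time while it waits.

```python
import time


def spin_wait(seconds):
    """Busy-loop until the deadline passes; keeps a CPU core occupied."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


def sleep_wait(seconds):
    """Let the OS suspend the thread; uses almost no CPU time."""
    time.sleep(seconds)


def cpu_seconds(wait, seconds):
    """Return the CPU time this process consumed while waiting."""
    start = time.process_time()
    wait(seconds)
    return time.process_time() - start


print("spin :", cpu_seconds(spin_wait, 0.2))
print("sleep:", cpu_seconds(sleep_wait, 0.2))
```

This is why a spin-waiting thread shows up as a fully busy core in a process monitor even though it is mostly just waiting on the GPU.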
- Posts: 520
- Joined: Fri Apr 03, 2020 2:22 pm
- Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X
Re: 13421 WUs with abnormally long runtime
Of the 20 WU's I've run on this project, only 2 seemed a bit out of range.
RCG 876, 79, 0 was a bit quicker, with a 56 second frame time vs 1:11 average. 215k PPD
RCG 6262, 31, 0 was the slow one, with a frame time of 2:47 and PPD of 42k
All the others with the average frame time of 1:11 netted approx 150k PPD
All except the one slow one are on the high side of normal for the Vega 11 I'm using. But projects with atom counts below 15k or so are always on the fast side with this little onboard GPU, so really nothing unexpected.
Passed for any use in troubleshooting only. I'll fold 'em if they all drop down to slow return times.
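The PPD figures above drop faster than the frame times rise. A possible explanation, offered here as an assumption rather than something stated in this thread, is the quick return bonus: if credit per WU scales with one over the square root of the elapsed time, PPD scales with frame time to the power of -1.5. A minimal sketch with made-up helper names:

```python
def to_seconds(frame_time: str) -> int:
    """Convert a 'm:ss' frame time such as '1:11' into seconds."""
    minutes, seconds = frame_time.split(":")
    return int(minutes) * 60 + int(seconds)


def scaled_ppd(ref_ppd: float, ref_frame_s: float, frame_s: float) -> float:
    """Scale a reference PPD to another frame time.

    Assumption: PPD is proportional to frame_time ** -1.5, i.e. the
    quick return bonus is active for both WUs.
    """
    return ref_ppd * (ref_frame_s / frame_s) ** 1.5


average = to_seconds("1:11")   # about 150k PPD reported
slow = to_seconds("2:47")      # 42k PPD reported
fast = 56                      # 215k PPD reported

print(round(scaled_ppd(150_000, average, slow)))
print(round(scaled_ppd(150_000, average, fast)))
```

Under that assumption the estimates come out at about 42k and 214k PPD, close to the 42k and 215k reported for the slow and quick WUs.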
Fold them if you get them!
Re: 13421 WUs with abnormally long runtime
Thanks.
Troubleshooting is focusing mostly on error reports (crashes, etc.). I don't think anybody (except donors like yourself) notices when a WU takes a longer or shorter time than "normal" unless you report it.
One or two have been extracted for special attention, though.
I suggest a concise report sort of like this one including the hardware involved.