As far as nVidia support goes, that's good. AMD is also listed as a contributor on the FAH website; maybe they could be motivated to provide such support, at least for the ROCm stack, the project they advertise as an open and scalable HPC solution. It can't be in their interest to have nVidia recognized as running without problems and AMD as consistently troublesome.

muziqaz wrote:
nVidia is doing that.
FAH dev creates fahcore > nVidia rep takes that core and runs it through their hardware in their lab with all their driver profilers and tools > driver team either optimises the drivers for the fahcore, or gives suggestions/submits patches of code to FAH devs to improve the fahcore.
A hardware vendor does not need to have the source code in order to optimise for the code.
I know how much nVidia is involved, and I just don't see the same involvement from AMD, not even close, which is a shame, as their hardware was always very strong in pure compute tasks.
Also, FAH devs mainly have nVidia hardware, as far as I know. I do not believe there are any AMD GPUs in their possession. At least we can be content that AMD CPUs punched through the Intel wall when it comes to FAH.
If FAH developers develop and test mostly on nVidia hardware and do not have AMD GPUs, then it's no wonder that FAH is having more trouble on AMD hardware in the wild. But in that case AMD should be asked why their logo appears as a supporter on the website.
As for the need for source code, there may be a difference between optimizing and debugging. For optimizing, it may be enough to profile which kernels are run at what frequency, without needing to understand the calculation flow. For debugging, however, it is very helpful to understand what is going on, what should be going on, and at what point and under which preconditions a failure occurs. All the more so because some of the failures seem to occur early, which might be caused by missing or invalid initialization. That understanding is best achieved by following the code through the logic of the source and observing the effects (following variable values, ...). If FAH developers can't follow that flow for lack of appropriate (AMD) systems and instead rely on feedback from a series of published core versions, that looks to me like a rather long turnaround time for debugging. To be effective, such turnaround times should be measured in minutes, not in the weeks between released core versions. Just my thoughts; I may well be missing something important.