clEnqueueNDRangeKernel (-5) error + GPU reset logged in Event log is typically the TDR issue.

JohnChodera wrote:
> This is the TDR issue ... now, is triggered by faulty hardware or by software?
Are you sure this is the same as the TDR? Project 11411 is not very large, so it would be surprising if one of the GPU kernels was exceeding the Windows timeout on a GTX 970.
But the TDR is not necessarily triggered by software misbehaviour. The purpose of TDR is to detect an unresponsive GPU and reset it, to avoid a frozen screen like we used to have on previous versions of Windows. There are two sources of freezes:
- when software sends too much work to the GPU. The first symptom is a sluggish UI; then, as the work gets even bigger, it leads to longer freezes. This is the form of TDR triggered by large WUs that you tried to work around in core21.
- when an unrecoverable error occurs in the GPU or VRAM. This one is usually triggered by too much overclocking, or by faulty/dying hardware. It is similar to the freezes that can happen on CPUs with an unstable overclock ...
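
To make the first case concrete, here is a rough sketch of the general idea of keeping each kernel launch short enough that the watchdog never fires. This is not how core21 handles it; the function name `split_launches`, the throughput estimate and the safety factor are all made up for illustration. The only hard number is the 2-second default for the Windows TDR delay.

```python
# Hypothetical sketch: split one big job into launches that each fit
# inside a fraction of the TDR delay. Not taken from core21.

DEFAULT_TDR_DELAY_S = 2.0  # Windows default TDR delay, in seconds


def split_launches(total_items, items_per_second,
                   tdr_delay_s=DEFAULT_TDR_DELAY_S, safety=0.5):
    """Return (offset, count) pairs covering total_items work items.

    items_per_second is an assumed throughput estimate for the GPU;
    safety is the fraction of the TDR delay one launch may use.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if items_per_second <= 0 or tdr_delay_s <= 0 or not 0 < safety <= 1:
        raise ValueError("throughput, delay and safety must be positive")

    # Time budget for a single launch, then the item count that fits in it.
    budget_s = tdr_delay_s * safety
    max_items = max(1, int(items_per_second * budget_s))

    launches = []
    offset = 0
    while offset < total_items:
        count = min(max_items, total_items - offset)
        launches.append((offset, count))
        offset += count
    return launches


if __name__ == "__main__":
    for offset, count in split_launches(10_000_000, 2_000_000):
        print(offset, count)
```

Note that this only helps with the first source of freezes. If the reset comes from an unstable overclock or dying hardware, smaller launches will not prevent it.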