Re: Core temperatures

Posted: Thu Jun 11, 2020 5:59 pm
by ajm
But why with one piece of software only, and not with others that use the same instruction sets? And why only with a certain processor, and not with others?

And why do I have those 997 errors only when using the Threadripper? Twice already, I went from an X299 system to a TRX40 system with the same Windows, on the same disk. Just place the system disk in the new kit, boot, let Windows load the drivers for the new CPU and chipset, and hit the ground running. Those errors appear when the only change in the whole set-up is the processor and the architecture it demands. All else stays the same.

And I don't have a temperature problem - I have that covered now. I could even overclock into instability without throttling or over-stressing the cooling. What I have is a one-dead-CPU problem, on a machine that is crucial for me. And I don't want a two-dead-CPUs one.

But that said, I also disable the boost during the night, when all the machine is doing is backups and index-updates. With the regedit tweak, it's a matter of 2-3 clicks.

EDIT: And the dead CPU failed while running (and folding) without any boost (or DOCP), at base speed only, because I did not have sufficient cooling then.

Re: Core temperatures

Posted: Fri Jun 12, 2020 5:17 am
by ajm
MeeLee wrote:I'm currently experiencing issues with my 3950X, where the scalar doesn't seem to work well anymore, and the CPU is throttling at 4.1 GHz (with threads going between 100% and 80% active).
Something is not working right now, which is why I looked online about a possible PBO overboost issue with Ryzen (and Threadrippers) that causes high temps and high wattages.
With what kind of load (software) and at which Package temperature are you experiencing these problems? Have you tried/compared Linpack Xtreme with FAH?

EDIT: And what kind of cooling are you using? https://youtu.be/QxEPye6mSsI

Re: Core temperatures

Posted: Fri Jun 12, 2020 10:26 am
by ajm
PantherX wrote:
ajm wrote:A question for @PantherX: What do you think of AIDA64 Extreme CPU stress test? As a complement to Linpack? ...
I haven't used AIDA64 so can't comment. While it seems that they have AVX2 support, I guess that the question is how they are using it in the benchmark.

(...)
From AIDA64 Technical Support:
The Stability test on your CPU is using Zen2 FMA with YMM registers instead of AVX2

Re: Core temperatures

Posted: Fri Jun 12, 2020 10:55 am
by PantherX
Humm... does that mean that AVX2 support is emulated (not native) in AIDA64? If my understanding is right (correct me if I am wrong), then it would make sense that the thermal output would be lower compared to folding, as folding would use native AVX2 support, which would generate more heat.

Re: Core temperatures

Posted: Fri Jun 12, 2020 11:05 am
by ajm
Or it means that AIDA64 adapts the stability test to the tested processor, and it didn't use AVX2 at all on the Threadripper.
Anyway, AIDA64 generated more heat than Linpack Xtreme.

But I asked the support. We'll see.

Re: Core temperatures

Posted: Fri Jun 12, 2020 11:10 am
by PantherX
ajm wrote:...But I asked the support. We'll see.
Appreciate that you're going the extra mile to see what's happening!

Out-of-the-box thinking: I wonder if you can log a support case pointing out that on a Threadripper, folding is more intensive than their stability test, so their product may not be fit for purpose, and see what their response is.

Re: Core temperatures

Posted: Fri Jun 12, 2020 11:27 am
by ajm
done!

Re: Core temperatures

Posted: Fri Jun 12, 2020 2:37 pm
by ajm
AMD Threadripper 3970X under heavy AVX2 load: Defective design? (No, but there is an issue)
https://forum.level1techs.com/t/amd-thr ... e/153883/5

https://www.anandtech.com/show/15044/th ... s-on-7nm/6
Image

COMMENT: If you want to fold, go Xeon or X299, and stay away from the Threadripper...

Re: Core temperatures

Posted: Fri Jun 12, 2020 3:27 pm
by MeeLee
ajm wrote:
MeeLee wrote:I'm currently experiencing issues with my 3950X, where the scalar doesn't seem to work well anymore, and the CPU is throttling at 4.1 GHz (with threads going between 100% and 80% active).
Something is not working right now, which is why I looked online about a possible PBO overboost issue with Ryzen (and Threadrippers) that causes high temps and high wattages.
With what kind of load (software) and at which Package temperature are you experiencing these problems? Have you tried/compared Linpack Xtreme with FAH?

EDIT: And what kind of cooling are you using? https://youtu.be/QxEPye6mSsI
Cooling is not an issue. 240mm water cooled. The CPU remains relatively cool.
The board doesn't have a sensor, but the cooling solution doesn't surpass 40 °C, which makes me believe the CPU runs at around 60-70 °C max.

I presume either I have one of the first 3950x versions, or my motherboard (ASUS X570 TUF) is pure garbage (also one of the first boards available supporting the 3000 series CPUs, with their inherent problems).

I knew the Infinity Fabric couldn't run at 1800 MHz, even though it should, but it seemed to run oddly at 1700 MHz.
For a while it ran fine though, but it started having problems (hangs, freezes) down the road that became more frequent.
I lowered it to 1666 MHz, which ran fine for a while, but the errors came back later. Then further to 1633 MHz, where it runs stable.
Then I adjusted the 3600 MHz DDR RAM, which was already running at 3400 MHz, down to 3266 MHz. That seemed to do the trick.
It's mostly running stable again now.
I'm starting to believe the board/CPU combination doesn't support anything faster than maybe 3200 MHz (3400 MHz possibly as a far overclock from factory).
Faster memory never ran well on this board.
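
For reference, the stable values above line up with the usual 1:1 rule of thumb: DDR memory transfers data twice per clock, so the real memory clock is half the advertised rate, and on Zen 2 the Infinity Fabric clock is normally set to match it. A minimal sketch of that arithmetic (the function name is only for illustration):

```python
def matched_fclk_mhz(ddr_rate_mts: int) -> int:
    """Return the memory clock in MHz for a DDR transfer rate in MT/s.

    DDR moves data twice per clock, so the clock is half the rate.
    Running the Infinity Fabric at this same value gives a 1:1 ratio.
    """
    return ddr_rate_mts // 2


# Rates mentioned in this post, plus DDR4-3200 for comparison.
for rate in (3600, 3400, 3266, 3200):
    print(f"DDR4-{rate}: memory clock / 1:1 fabric clock = {matched_fclk_mhz(rate)} MHz")
```

So a RAM setting of 3266 pairs with a fabric clock of 1633 MHz, which matches the combination that ended up stable here.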

It'll be replaced by the end of the week, whenever the new delivery comes in.
The seller already refunded the money, before I even shipped it back.
Means something...

Re: Core temperatures

Posted: Fri Jun 12, 2020 3:54 pm
by ajm
MeeLee wrote:(...)Cooling is not an issue. 240mm water cooled. The CPU remains relatively cool.
The board doesn't have a sensor, but the cooling solution doesn't surpass 40 °C, which makes me believe the CPU runs at around 60-70 °C max.
(...)
The board does not have a sensor for the CPU??? Incredible!

I don't think that you can deduce the CPU temp on the basis of the loop temp. There is a threshold above which the loop just can't dissipate more heat and that heat then stays in the CPU. Be careful. And glad you can get a new board!
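
To put rough numbers on that, here is a steady-state sketch: the die temperature is roughly the coolant temperature plus the package power multiplied by the thermal resistance between die and coolant. The power and resistance values below are made up for illustration and are not measurements of any particular CPU or water block.

```python
def die_temp_c(coolant_c: float, power_w: float, r_th_c_per_w: float) -> float:
    """Steady-state estimate: T_die = T_coolant + P * R_th."""
    return coolant_c + power_w * r_th_c_per_w


# Hypothetical cases: the same 40 °C loop reading, but different package
# power and different die-to-coolant thermal resistance.
for power_w, r_th in ((105.0, 0.25), (140.0, 0.25), (140.0, 0.40)):
    temp = die_temp_c(40.0, power_w, r_th)
    print(f"{power_w:.0f} W at {r_th:.2f} °C/W -> about {temp:.0f} °C")
```

With these assumed inputs, the same 40 °C loop reading fits a die temperature anywhere from the mid-60s to the mid-90s, so the loop temperature alone does not pin down the CPU temperature.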

Re: Core temperatures

Posted: Fri Jun 12, 2020 11:05 pm
by JimboPalmer
This is what I think I know about F@H Core_a7 0.0.18 and AVX.

Core_a7 can use either SSE2 or AVX_256. It will not use AVX2 or AVX_512 on any CPU, nor for that matter, other flavors of SSE. (GROMACS can, so this may change in the future)

Zen and Zen+ can do AVX_256 but they do so with two micro ops of 128 bits each, Zen 2 does AVX_256 in one micro op as does Intel. So there is a distinct performance gain with Zen2, and so a gain in temperature.

SSE2 does a vector of four 32 bit FP operations at once per thread, while AVX_256 does a vector of eight 32 bit FP operations at once, so for Intel and Zen2, expect a 2 to 1 jump in performance. In the above charts with some other program, it jumps from 9987 to 18,996. That is close to 2 to 1.
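
A quick sanity check of that arithmetic, as a sketch: lanes per register is the register width divided by the element width, and the two chart figures quoted above give the measured ratio. The function name is only for illustration.

```python
def fp_lanes(register_bits: int, element_bits: int = 32) -> int:
    """Number of floating-point values that fit in one SIMD register."""
    return register_bits // element_bits


sse2_lanes = fp_lanes(128)  # SSE2 works on 128-bit XMM registers
avx_lanes = fp_lanes(256)   # AVX works on 256-bit YMM registers
print(f"SSE2: {sse2_lanes} lanes, AVX_256: {avx_lanes} lanes, "
      f"ideal ratio {avx_lanes / sse2_lanes:.1f}")

# Chart figures quoted in this post (without AVX -> with AVX).
measured = 18996 / 9987
print(f"measured ratio: {measured:.2f}")
```

The measured ratio comes out a little under the ideal 2.0, which is what "close to 2 to 1" means here.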

Re: Core temperatures

Posted: Sat Jun 13, 2020 3:53 am
by ajm
JimboPalmer wrote:Zen and Zen+ can do AVX_256 but they do so with two micro ops of 128 bits each, Zen 2 does AVX_256 in one micro op as does Intel. So there is a distinct performance gain with Zen2, and so a gain in temperature.

SSE2 does a vector of four 32 bit FP operations at once per thread, while AVX_256 does a vector of eight 32 bit FP operations at once, so for Intel and Zen2, expect a 2 to 1 jump in performance. In the above charts with some other program, it jumps from 9987 to 18,996. That is close to 2 to 1.
But that is nothing compared to the Xeon W-3175X, which jumps from 6522 to 52889, that is, a factor of 8.1.
There IS a major difference between Intel and Zen2 with AVX, and it doesn't benefit Zen2: the 3970X is almost 3 times slower than the Xeon, despite having more cores, and almost twice as slow as an i9-7960X, despite having twice the number of cores. I know that it gets hotter than a 7940X, too.
Why? Probably a less than optimal implementation of AVX, no?
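
The ratios behind those statements can be checked with a few lines. This sketch uses only the chart figures quoted in this thread, and it assumes the 9987 and 18,996 figures belong to the 3970X.

```python
# Scores quoted in this thread: (without AVX, with AVX).
# Assumption: the 9987 -> 18996 pair is the 3970X result.
scores = {
    "Threadripper 3970X": (9987, 18996),
    "Xeon W-3175X": (6522, 52889),
}


def speedup(without_avx: int, with_avx: int) -> float:
    """Gain from enabling AVX on the same CPU."""
    return with_avx / without_avx


for cpu, (base, avx) in scores.items():
    print(f"{cpu}: {speedup(base, avx):.2f}x with AVX")

# Gap between the two CPUs once AVX is enabled.
gap = scores["Xeon W-3175X"][1] / scores["Threadripper 3970X"][1]
print(f"Xeon vs 3970X with AVX: {gap:.2f}x")
```

That gives roughly 1.9x for the 3970X, 8.1x for the Xeon, and a gap of about 2.8x between them with AVX enabled, consistent with "almost 3 times".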

Re: Core temperatures

Posted: Sat Jun 13, 2020 4:31 am
by bruce
The recent Intel line of CPUs all support AVX. (I've got some older AMD and Intel CPUs that only support SSE2.) I'm not sure about AMD's AVX, but Intel's AVX adds an appreciable amount of heat over just using SSE so the package is more likely to thermal-limit, especially if you're using a generic heat-sink. I suppose it will be even worse when somebody decides to permit the use of Intel's iGPU for folding.

Anyway, I'd get more production out of my CPU chips if I upgraded my heat-sinks. That may be what's going on in the With AVX / without charts above, too.

Re: Core temperatures

Posted: Sat Jun 13, 2020 9:24 am
by ajm
There is no "generic" heat-sink for the latest Threadrippers.
But it's true that hardly any of the cooling solutions AMD is proposing (https://www.amd.com/en/thermal-solutions-threadripper ) would allow for prolonged folding at boost speed.

Re: Core temperatures

Posted: Sat Jun 13, 2020 10:57 am
by Neil-B
… but by definition boost is not for sustained processing of any sort, let alone something as intensive as FAH folding, so this shouldn't come as a surprise? … my guess is that in order to claim high boosts AMD accepted that they would only be sustainable for short spiky bursts due to heat issues (whatever the cooling), and that is how they have defined boost.

tbh I use Intel server grade CPUs which always sound "slow" but tend to consistently overachieve their stated speeds even when folding FAH, without thermal issues … even if those stated speeds keep becoming less effective as Intel nerfs their CPUs' performance !!!

It all depends on where the various chip makers see their market and how they choose to design/sell their wares … It would appear that AMD and the MoBo manufacturers have possibly pushed Threadrippers to an extreme where power consumption and heat generation issues may be more of an issue than they would like for a larger part of their market sector (it is beginning to snowball a bit in the press/media).

Inadequate cooling is the "easy target" but there seems to be an equally if not more valid issue with significant heat generation at high frequency/power levels causing cooling to be inadequate … Quite what AMD have done within the Threadrippers to make this such a marked issue (especially for FAH folding) will probably never see the light of day - but I guess they might learn from this, and future CPUs may perform well across a slightly wider range of usage patterns.