Initial impressions for fast ARM hardware


calxalot
Site Moderator
Posts: 1091
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: Initial impressions for fast ARM hardware

Post by calxalot »

The efficiency cores just slow down the performance cores. You should set the slot to cpu:8.

You should get a passkey if you haven’t already. Once you complete 10 WUs, you should get more than 150k PPD.
vzim
Posts: 7
Joined: Wed Dec 29, 2021 3:36 pm

Re: Initial impressions for fast ARM hardware

Post by vzim »

I've managed to max out the M1 Pro (10-core) CPU.
Configure -> Slots -> 8 CPUs (= number of P-cores)
Configure -> Advanced -> Priority slightly higher (to avoid E-cores)

Output is 145k PPD at 20 W SoC power.

Some observations:
The P-cores run at 2.1 GHz and the fans are nearly inaudible at 1500 rpm, so it looks like the OS has throttled performance down.
On a cold start the cores ran at 3 GHz and the initial PPD estimate was around 300k.

P.S. it's still an emulated x86 process. When do we get native arm64 support?
Joe_H
Site Admin
Posts: 7922
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Initial impressions for fast ARM hardware

Post by Joe_H »

vzim wrote:P.S. it's still an emulated x86 process. When do we get native arm64 support?
My guess is sometime in 2022, but no idea on exactly when during the year that will happen.

For the client, it will probably be when they release the next version. Work on version 8 resumed recently; it had been halted in early 2020 when F@h started its response to COVID. I am also aware of some work being done towards an Apple Silicon version of the CPU folding core, but testing and distribution of that may take a while. Some changes on the server end may also be needed.

There is already an ARM64 version of the CPU folding core out for ARM systems running Linux.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
vzim
Posts: 7
Joined: Wed Dec 29, 2021 3:36 pm

Re: Initial impressions for fast ARM hardware

Post by vzim »

+ Macs Fan Control to force min 2500 rpm
Final WU score: 227k PPD @30W or 7566 PPD/W
Properly optimized native version will be around 10k PPD/W
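
For anyone who wants to compare their own numbers, here is a minimal sketch of the PPD/W arithmetic, using the two SoC-power figures reported in this thread. The function name and the run labels are only illustrative.

```python
def ppd_per_watt(ppd: float, watts: float) -> float:
    """Return points per day per watt for one measurement."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return ppd / watts


# Figures reported in this thread (SoC power, not power at the wall).
runs = {
    "8 P-cores, before forcing the fans": (145_000, 20),
    "8 P-cores, fans forced to min 2500 rpm": (227_000, 30),
}

for label, (ppd, watts) in runs.items():
    print(f"{label}: {ppd_per_watt(ppd, watts):.0f} PPD/W")
```

The second run rounds to 7567 PPD/W; the 7566 quoted above is the same value truncated.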
Joe_H
Site Admin
Posts: 7922
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Initial impressions for fast ARM hardware

Post by Joe_H »

vzim wrote:Properly optimized native version will be around 10k PPD/W
Maybe. Rosetta 2 is pretty good at translating Intel code to the ARM code used on Apple Silicon, and it caches the translated code so it is not continuously incurring the overhead of translating running code. So it remains to be seen just how much overhead gets removed by running a native folding core.
vzim
Posts: 7
Joined: Wed Dec 29, 2021 3:36 pm

Re: Initial impressions for fast ARM hardware

Post by vzim »

It's possible to run FAHClient exclusively on E-cores.

This involves some QoS magic and editing LaunchDaemons/org.foldingathome.fahclient.plist.
MacBook Air M1
PPD est: ~24k (most likely overestimated)
Package power: 465 mW

At this power level I can keep it running 24/7 even on battery power
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Initial impressions for fast ARM hardware

Post by Neil-B »

vzim wrote:At this power level I can keep it running 24/7 even on battery power
How well does it do with completing WUs within the timeout?
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
vzim
Posts: 7
Joined: Wed Dec 29, 2021 3:36 pm

Re: Initial impressions for fast ARM hardware

Post by vzim »

Neil-B wrote:
vzim wrote: At this power level I can keep it running 24/7 even on battery power
How well does it do with completing WUs within the timeout?
It hasn't completed a single WU yet. After running WU 16955 overnight it's at 30%, with an ETA of 1.26 days and 4.46 days left before the deadline.
Update on power and PPD: 3.5k PPD @ 0.33 W
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Initial impressions for fast ARM hardware

Post by Neil-B »

WU 16955 has a three-day timeout and a five-day expiration ... completing within the timeout rather than the expiration is preferable, as at timeout the WU is reassigned to another folder ... from your figures it looks like it will complete in 1.86 days, and so within the timeout, which is great :)
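
As a rough check on that estimate, here is a small sketch of the arithmetic. It assumes the "4.46 days left" figure is counted against the five-day expiration, so the time already spent is the difference; the function name is only illustrative.

```python
def projected_total_days(days_to_deadline: float,
                         deadline_days: float,
                         eta_days: float) -> float:
    """Time already spent on the WU plus the client's remaining ETA."""
    elapsed = deadline_days - days_to_deadline
    return elapsed + eta_days


EXPIRATION_DAYS = 5.0  # WU 16955 expiration, as stated above
TIMEOUT_DAYS = 3.0     # WU 16955 timeout, as stated above

# Assumption: 4.46 days remain until expiration, and the ETA is 1.26 days.
total = projected_total_days(4.46, EXPIRATION_DAYS, 1.26)
print(f"projected total: {total:.2f} days")
print("within timeout:", total <= TIMEOUT_DAYS)
```

With these inputs the projection comes out at roughly 1.8 days, in the same ballpark as the figure above and comfortably inside the three-day timeout.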
vzim
Posts: 7
Joined: Wed Dec 29, 2021 3:36 pm

Re: Initial impressions for fast ARM hardware

Post by vzim »

Recipe for cooking on E-cores
Edit /Library/LaunchDaemons/org.foldingathome.fahclient.plist so that FAHClient is launched through taskpolicy:

	<key>ProgramArguments</key>
	<array>
		<string>/usr/sbin/taskpolicy</string>
		<string>-c</string>
		<string>background</string>
		<string>/usr/local/bin/FAHClient</string>
	</array>

Configure the CPU slot in FAHControl to use the proper number of cores (4 in my case).
Restart the folding service:

sudo launchctl unload /Library/LaunchDaemons/org.foldingathome.fahclient.plist
sudo launchctl load /Library/LaunchDaemons/org.foldingathome.fahclient.plist
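
If you would rather script the plist edit than do it by hand, something like the following sketch could work. It uses Python's standard plistlib; the helper name is hypothetical, and the sample job below is a made-up minimal stand-in, not the real contents of the installed plist (the Label value is only assumed from the file name). Back up the real file before trying anything like this.

```python
import plistlib

# Prefix that launches the program under the "background" QoS clamp.
TASKPOLICY_PREFIX = ["/usr/sbin/taskpolicy", "-c", "background"]


def clamp_to_background(plist_bytes: bytes) -> bytes:
    """Return plist data whose ProgramArguments run via taskpolicy."""
    job = plistlib.loads(plist_bytes)
    args = list(job.get("ProgramArguments", []))
    if not args:
        raise ValueError("plist has no ProgramArguments to wrap")
    # Leave the job alone if it is already wrapped.
    if args[:3] != TASKPOLICY_PREFIX:
        job["ProgramArguments"] = TASKPOLICY_PREFIX + args
    return plistlib.dumps(job)


# Hypothetical minimal job definition standing in for the real file.
sample = plistlib.dumps({
    "Label": "org.foldingathome.fahclient",
    "ProgramArguments": ["/usr/local/bin/FAHClient"],
})

patched = plistlib.loads(clamp_to_background(sample))
print(patched["ProgramArguments"])
```

Any arguments already present after the FAHClient path are kept as they are, and running the helper twice does not add the prefix a second time.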
calxalot
Site Moderator
Posts: 1091
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: Initial impressions for fast ARM hardware

Post by calxalot »

Interesting.

Do you get the same result using ProcessType Background?
Does FAHControl work properly with the background clamped client?
What app do you use to get cpu core usage and clock speeds?

Thanks.
vzim
Posts: 7
Joined: Wed Dec 29, 2021 3:36 pm

Re: Initial impressions for fast ARM hardware

Post by vzim »

calxalot wrote: Do you get the same result using ProcessType Background?
I haven't tested it. I suppose it should be equivalent in terms of CPU core scheduling. I'm not sure whether the OS applies additional restrictions to "background" daemons.
calxalot wrote: Does FAHControl work properly with the background clamped client?
I see no difference in FAHControl operation
calxalot wrote: What app do you use to get cpu core usage and clock speeds?
sudo powermetrics
MeeLee
Posts: 1339
Joined: Tue Feb 19, 2019 10:16 pm

Re: Initial impressions for fast ARM hardware

Post by MeeLee »

vzim wrote:+ Macs Fan Control to force min 2500 rpm
Final WU score: 227k PPD @30W or 7566 PPD/W
Properly optimized native version will be around 10k PPD/W
Are you measuring SoC power or total system power?
For PPD/W, power should be measured at the wall socket, not read from a program like HWMonitor that reports SoC power consumption.

It would be interesting to see which GPU system would match it in terms of PPD/W.
I would guess that if the above watt rating is SoC and not total system power, you're closer to 8k PPD/W, which should be close to the PPD/W of a GTX 1660 paired with a 35 W Intel Celeron CPU, at a total system power of 150 W.
The desktop would be crunching out closer to 1.2M PPD, though.
vzim
Posts: 7
Joined: Wed Dec 29, 2021 3:36 pm

Re: Initial impressions for fast ARM hardware

Post by vzim »

I measure the SoC power impact with the FAH process active.
Power at the wall (with the screen asleep) is usually 2-5 W higher.
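
To put that on a wall-power basis, here is a quick sketch that applies the 2-5 W overhead to the 227k PPD @ 30 W result from earlier in the thread. The function name is only illustrative, and the overhead range is my rough observation rather than a calibrated measurement.

```python
def wall_efficiency_range(ppd: float, soc_watts: float,
                          overhead_low: float, overhead_high: float):
    """Return (worst, best) PPD/W once wall overhead is added to SoC power."""
    best = ppd / (soc_watts + overhead_low)
    worst = ppd / (soc_watts + overhead_high)
    return worst, best


# 227k PPD at 30 W SoC power, with 2-5 W of extra draw at the wall.
worst, best = wall_efficiency_range(227_000, 30, 2, 5)
print(f"{worst:.0f} to {best:.0f} PPD/W at the wall")
```

That works out to somewhere between about 6.5k and 7.1k PPD/W at the wall, versus about 7.6k PPD/W on SoC power alone.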