Setup Guide: Multi GPU Ubuntu Server AMD & nvidia

Post by BJMcGee »

First, thank you to all the others on this forum who shared their experience. I ran into multiple challenges getting my rigs up and running, so I wanted to share the path that finally worked for me. I did a fair amount of folding years ago mainly on Windows with my CPU. Getting GPUs to fold took a few extra steps. I'll write this a bit high level and add details as people ask questions.

Starting with the back story:
I built a few mining rigs during the ethereum bubble with the plans of turning them to folding when the bubble popped. Well the bubble popped a while ago and the rigs have been collecting dust. They double as the winter heat source for my garage, and I have some garage projects to do now. So, it was time to get these up and running. Some comments on the hardware selection ... I won't be running any CPU folding on these machines as you can see the CPU is woefully under spec for managing 6 GPUs and doing anything else. These rigs will focus on folding with the GPUs only.

Common Hardware:
  • Motherboard: Biostar H110M-BTC
  • CPU: Celeron G3930
  • RAM: 8 GB DDR4
  • HDD: 500 GB, 7200 RPM, SATA
  • Power Supply: 1000W ATX
  • Case: Open Frame
Video Cards: (6 for Each Rig)
  • PCIe: 1x to 16x Risers with Power Connections
  • AMD: RX 580
  • nvidia: GTX 1060
Additional Support Machine:
Since my rigs run with no display and no keyboard, I will be using a second machine to run FAHControl and to SSH into the terminal. For this write-up I'm using an iMac as my support machine. The specs on this machine are not critical as it won't be doing the heavy lifting.

Hardware/BIOS Pre-Installation Setup:
To ensure minimal challenges during setup, I installed all the Common Hardware, but no PCIe risers or GPUs. I am using the onboard HDMI connection to the integrated GPU (Intel HD Graphics) as the display connection for setup. This allows me to get into the BIOS and perform initial setup. I recommend resetting the BIOS to factory defaults before the start of any new build, then adjusting the settings needed to support your setup. For this board there are specific items that need to be set, and some optional items that may help as well. Biostar has a mining mode that needs to be enabled to properly allocate PCIe lanes to support more than 4 GPUs. I recommend setting the integrated graphics as the primary display and forcing it to stay enabled; this gives the OS a default display target without bothering your compute GPUs. Optionally, you can disable the onboard sound and serial port controller since we won't be needing them. You can also enable smart CPU fan control, so the fan isn't running at full speed all the time, and calibrate the speed controller.

Once the BIOS display settings are configured, you can add ONE PCIe riser and GPU to the system. This will give you all the hardware you need for a successful install, without any additional surprises. Once the machine is up and healthy, adding more GPUs is pretty easy. Don't forget to connect the network cable to your router before you proceed with the OS install; the installer will download updates if needed along the way.

OS Selection and Install:
Ubuntu has been my go-to Linux distro for years. It is also supported by the Folding@Home client. Additionally, AMD has drivers for Ubuntu 18.04. Since these machines don't typically have a monitor hooked up, the server version of Ubuntu will do nicely. There is a newer release of Ubuntu available, but it may not have the software and/or driver support needed. So, unless you just really enjoy a challenge, it's best to stick with the officially supported release.

Download Ubuntu Server 18.04 LTS: https://ubuntu.com/download/server
Put this onto a USB Flash drive to use it as the install media. Windows example: https://ubuntu.com/tutorials/tutorial-c ... 1-overview
  • During setup you can use the default options for pretty much everything. I would suggest installing OpenSSH so you can log in remotely and check on the rig later.
  • Make sure you remember the username and password you choose! You will need them multiple times throughout this process.
  • Ubuntu Server Installation Guide can be found here: https://ubuntu.com/tutorials/tutorial-i ... 1-overview
Once the machine is running Ubuntu Server, you can continue the rest of setup through SSH. This allows you to copy/paste commands instead of typing them out long hand.
You can find the IP address of your new Ubuntu Server by typing "ifconfig" at the terminal (if ifconfig is not installed, "ip addr" shows the same information). It will give you more information than you need. It should show two adapters: "lo" is the loopback and should be 127.0.0.1; the other adapter should be your ethernet connection. The inet address is the IP address you need to use to access the machine remotely. In the example below you can see mine is "192.168.1.101". You should see something like this come back:

Code: Select all

enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.101  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::ba97:5aff:fef9:c699  prefixlen 64  scopeid 0x20<link>
        ether b8:97:5a:f9:c6:99  txqueuelen 1000  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
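If you would rather not pick the address out of that output by eye, here is a minimal sketch that prints only the non-loopback IPv4 addresses. It assumes the iproute2 "ip" tool is present on the rig; the function name list_ipv4 is made up for this example.

```shell
# Print "<interface> <IPv4 address>" for every interface except loopback.
# Reads "ip -4 -o addr show" style lines on stdin, so it can be tried
# on saved output as well as on the live command.
list_ipv4() {
    awk '$2 != "lo" { split($4, a, "/"); print $2, a[1] }'
}

# Usage on the rig:
#   ip -4 -o addr show | list_ipv4
```

With the adapter shown above this should print a single line containing enp2s0 and 192.168.1.101.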
On my iMac I can access the rig by opening a terminal and typing:

Code: Select all

ssh <user>@192.168.1.101
Setup Temperature and Fan Monitoring:
We will need a way to keep an eye on temperatures. For AMD cards you can see core voltages and fan speeds in the sensors tool. For nvidia cards I will be using nvidia-smi (installed with the drivers later) to watch fan speeds and temperatures. Since my CPUs won't be folding, I'm not too worried about temperatures there.

Install lm-sensors and scan for hardware:

Code: Select all

sudo apt install lm-sensors
sudo sensors-detect
sensors
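Once the AMD driver is installed, the sensors output gets a section per card. As a rough sketch, the filter below pulls out just the GPU temperatures. It assumes each card shows up as a chip named "amdgpu-pci-XXXX" and that the temperature line is labelled "temp1" or "edge" (this varies with kernel version); amdgpu_temps is a made-up name.

```shell
# Print "<chip> <temperature>" for each amdgpu section in "sensors" output.
# Assumption: the GPU temperature line is labelled temp1: or edge:.
amdgpu_temps() {
    awk '
        /^$/           { chip = "" }      # a blank line ends a chip section
        /^amdgpu-pci-/ { chip = $1 }      # remember which card we are in
        chip != "" && /^(temp1|edge):/ {
            gsub(/[^0-9.]/, "", $2)       # strip the sign and unit, keep digits
            print chip, $2
        }'
}

# Usage on the rig:
#   sensors | amdgpu_temps
```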
nvidia Graphics Drivers Install:
nvidia drivers with OpenCL support are available from a few different sources. The method below worked for me on the first try, so I didn't try any of the others.
Add the PPA Graphics Drivers Repository:

Code: Select all

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install dkms build-essential
Find the latest version available:

Code: Select all

apt list 'nvidia-driver*'
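The list can be long, so here is a small sketch that picks out the highest-numbered driver package from it. The function name latest_nvidia_driver is hypothetical, and it assumes the packages are named nvidia-driver-NNN as in the install command used in this guide.

```shell
# Print the highest-numbered nvidia-driver-NNN package name found in
# "apt list" output supplied on stdin.
latest_nvidia_driver() {
    grep -oE '^nvidia-driver-[0-9]+' | sort -u -t- -k3,3n | tail -n 1
}

# Usage on the rig (stderr is silenced to hide apt's CLI warning):
#   apt list 'nvidia-driver*' 2>/dev/null | latest_nvidia_driver
```

The newest package is not always the best choice for every card, so treat the result as a starting point.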
Install the latest driver:

Code: Select all

sudo apt install nvidia-driver-440
AMD Graphics Drivers Install:
I used the official AMD drivers available here: https://www.amd.com/en/support/graphics ... eon-rx-580
AMD has a nice writeup to help guide the install: https://amdgpu-install.readthedocs.io/en/latest/
One challenge I ran into here was getting the file onto my Ubuntu Server rig. The path provided by the AMD site is not directly accessible. I downloaded the file and moved it to my rig using a flash drive.
Once the file is on the rig, you can unpack it as instructed by AMD using:

Code: Select all

tar -Jxvf amdgpu-pro-YY.XX-NNNNNN.tar.xz
This will create a folder with all the installer files. cd into the folder with the installer files. The command options below install the basic driver and openCL drivers only.

Code: Select all

./amdgpu-pro-install -y --opencl=legacy --headless
OpenCL Verification and Install:
You may be able to skip this step on some combinations of hardware and drivers, but I found it doesn't hurt anything to have it just in case.
clinfo will allow you to verify that the drivers installed for your cards can actually run OpenCL content.
Install and run clinfo:

Code: Select all

sudo apt install clinfo
clinfo
The response from clinfo is quite long in some cases.
To see just the platform version, you can run:

Code: Select all

clinfo | grep "Platform Version"
This is what my AMD rig responded with:

Code: Select all

Platform Version                                OpenCL 2.1 AMD-APP (3004.6)
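Later on, when more cards are added, it helps to confirm that OpenCL sees all of them. A minimal sketch, assuming the installed clinfo supports the -l (list) option, which prints one "Device #N" line per device; count_opencl_devices is a made-up name.

```shell
# Count the devices in "clinfo -l" output supplied on stdin.
# "|| true" keeps the exit status at 0 when the count is zero.
count_opencl_devices() {
    grep -c 'Device #' || true
}

# Usage on the rig:
#   clinfo -l | count_opencl_devices
```

On a fully populated rig from this guide the expected answer would be 6.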
Just to be sure we have what we need to make OpenCL work, I also installed the OpenCL development package.

Code: Select all

sudo apt install ocl-icd-opencl-dev
This is a good spot to restart the computer before we continue:

Code: Select all

sudo shutdown -r now
PCIe Errors; Severity: Corrected:
I was having some serious PCIe errors at this point in my setup. After doing some research, I decided the errors were harmless. However, I didn't want the additional overhead of logging thousands of errors.
I modified my grub to disable Advanced Error Reporting to hide these from the log file and reduce system overhead.
Be careful with these next steps. A typo here can stop your rig from booting at all.
You can edit your grub file by using:

Code: Select all

sudo nano /etc/default/grub
There may be other items in these lines already, you just need to add to them. Don't remove other content unless you need to for other reasons.
You will need to add "pci=noaer" to your linux boot lines as shown here:

Code: Select all

GRUB_CMDLINE_LINUX_DEFAULT="pci=noaer"
GRUB_CMDLINE_LINUX="pci=noaer"
Press Ctrl-X to exit, answer Y to save, and press Enter to confirm the file name.
Once the file is saved, you need to update GRUB with the new config:

Code: Select all

sudo update-grub
You will need to reboot for this to take effect:

Code: Select all

sudo shutdown -r now
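If hand-editing the grub file makes you nervous, the same change can be sketched as a small function that appends a parameter while keeping whatever is already on the line. This is only a sketch: add_kernel_param is a made-up name, it assumes the value is wrapped in double quotes as in the stock file, and the parameter must not contain a "/" character. Try it on a copy before touching the real file.

```shell
# Append kernel parameter $3 to variable $2 in file $1, unless it is
# already there. sed keeps the previous version of the file as FILE.bak.
add_kernel_param() {
    file=$1 var=$2 param=$3
    if grep "^$var=" "$file" | grep -qF -- "$param"; then
        return 0                              # already present, nothing to do
    fi
    sed -i.bak \
        -e "s/^\($var=\"[^\"]*\)\"/\1 $param\"/" \
        -e "s/^\($var=\"\) /\1/" \
        "$file"
}

# Usage on a copy first, then compare with the original:
#   cp /etc/default/grub /tmp/grub.test
#   add_kernel_param /tmp/grub.test GRUB_CMDLINE_LINUX_DEFAULT pci=noaer
#   add_kernel_param /tmp/grub.test GRUB_CMDLINE_LINUX pci=noaer
#   diff /etc/default/grub /tmp/grub.test
# After update-grub and the reboot, confirm the kernel saw the option:
#   grep -o 'pci=noaer' /proc/cmdline
```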
Download and Install the Folding@Home Client:
Download the most recent release:

Code: Select all

wget https://download.foldingathome.org/releases/public/release/fahclient/debian-stable-64bit/v7.5/fahclient_7.5.1_amd64.deb
Install as a Service:
Be sure to specify your Folding Username, Team, etc. during setup ...

Code: Select all

sudo dpkg -i fahclient_7.5.1_amd64.deb
To be sure you have the latest supported GPU list, you can update it manually:

Code: Select all

sudo wget https://apps.foldingathome.org/GPUs.txt -P /var/lib/fahclient/
I didn't install FAHControl or FAHViewer on my folding rig. If you used Ubuntu desktop on your rig, then these would be helpful for configuring and watching your rig fold.
I did install the Folding@Home package on my iMac so I could use the FAHControl panel to remotely monitor my rigs. For Windows and Mac these come as one package, unlike the Linux releases.
You can find all the available versions here for your support machine: https://foldingathome.org/alternative-downloads/

AMD Only: Fixing openCL Access:
The nvidia install seems happy to share its OpenCL drivers. However, the AMD drivers are restricted to a different user/group than fahclient.
For now, I worked around this by changing fahclient to run as root. This is acceptable to me on this dedicated folding rig. If you are installing this on a daily-use Ubuntu machine, I would look for other solutions. The gap here is just access to the OpenCL drivers during boot up. Once the rig is up it seems to be able to figure it out, but that requires other workarounds.
To run FAHClient as root you need to edit the init.d file:

Code: Select all

sudo nano /etc/init.d/FAHClient
Once the file is open you will see a full script that handles starting, stopping, etc. the fahclient. You only need to change the one line that says USER=fahclient to USER=root.
It should look like this:

Code: Select all

USER=root
Reboot the rig again to make this effective:

Code: Select all

sudo shutdown -r now
You can verify the user running the client using htop. The FAHClient should be near the top of the list. You can see the user who called the command in the second column.

Code: Select all

htop
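For a machine that is not dedicated to folding, one alternative that may be worth trying instead of running as root is adding the fahclient user to the group that owns the GPU device nodes. This is untested in this guide: the group name "video" is an assumption about how the AMD driver sets permissions on Ubuntu 18.04, and in_group is a made-up helper.

```shell
# Succeed if user $1 is a member of group $2.
in_group() {
    id -nG "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

# Hypothetical usage (group name "video" is an assumption, see above):
#   in_group fahclient video || sudo usermod -aG video fahclient
# Non-interactive alternative to htop for checking who runs the client:
#   ps -o user= -o comm= -C FAHClient
```

If the group change does not give the client access to the cards, running as root as described above is the path that is known to work here.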
Enabling Remote Access for FAHControl:
To make entering the PassKey easier and adjusting other settings simpler, you can enable remote access and remote web.
You need to edit the FAHClient config file first. Use this command to open your config file for editing:

Code: Select all

sudo nano /etc/fahclient/config.xml
This will allow you to connect from your local network only, which should be safe enough for most home networks. You will need to edit the IP address and username to match your situation. My whole home network uses addresses that start with 192.168.1.### so I use the IP filter 192.168.1.0/24 to allow anyone on my network to access the rig. If you want to restrict it further, you can put just the IP of your support machine in that spot, for example '192.168.1.44'.
Be sure to remove the line that says something like "gpu v='false'".
The file will have some items in it already, but you can remove/add lines so that it looks like this:

Code: Select all

<config>
  <!-- Client Control -->
  <fold-anon v='true'/>

  <!-- HTTP Server -->
  <allow v='192.168.1.0/24'/>

  <!-- Network -->
  <proxy v=':8080'/>

  <!-- Remote Command Server -->
  <command-allow-no-pass v='192.168.1.0/24'/>

  <!-- Slot Control -->
  <power v='full'/>

  <!-- User Information -->
  <user v='BJMcGee'/>

  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
</config>
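To make the /24 filter used in the allow lines above a bit more concrete, here is a small worked sketch of what the range check amounts to: mask both addresses down to their first 24 bits and compare. The function names ip_to_int and in_cidr are made up for illustration and are not part of the F@H client.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    (   # subshell, so the IFS change and "set --" do not leak out
        IFS=.
        set -- $1
        echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    )
}

# Succeed if address $1 falls inside CIDR range $2 (e.g. 192.168.1.0/24).
in_cidr() {
    net=${2%/*}                 # part before the slash
    bits=${2#*/}                # prefix length after the slash
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Usage:
#   in_cidr 192.168.1.44 192.168.1.0/24 && echo allowed
#   in_cidr 192.168.2.44 192.168.1.0/24 || echo blocked
```

So 192.168.1.0/24 covers 192.168.1.0 through 192.168.1.255, while a single address is equivalent to a /32.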
You will have to stop and start the FAHClient for the settings to take effect.
You can stop the client using:

Code: Select all

sudo /etc/init.d/FAHClient stop
And start it back up using:

Code: Select all

sudo /etc/init.d/FAHClient start
Now you should be able to connect FAHControl from your support machine.

Stop Folding on Your Support Machine:
My iMac is not very power efficient for folding, and spends most of its time off. When it is on, it's already running hot due to its age. So, to avoid any undue stress on this old machine, I've stopped the local machine from folding. If you have a gaming rig with a newer CPU and a powerful GPU, you can keep folding on your support rig and make an even bigger contribution.

Add Your Folding Rigs to your FAHControl:
Use the "+Add" button at the bottom of the client list window to add your remote rigs (your local client is already added). The port should be default. You can name them however you want to keep track of which is which. Just enter the correct ip address for the rig to get it connected. Once the rig is connected, you can click on it in the client list and click the "Configure" button on the menu bar. You should see Tabs for: Connection, Identity, Slots, Remote Access, Proxy, Advanced and Expert.
On the Identity Tab make sure your Name is correct. You can choose any name, but you might want to check the F@H site to make sure your name isn't taken already, or you'll be giving your points to someone else.
You can search for stats by Donor name here: https://stats.foldingathome.org/donors
If your name returns no data, then you should be good to use it for your folding.
You also want to enter your Passkey on this tab. You have to enter it twice, so it is best to copy paste directly from the email you receive.
You can request a Passkey here: https://apps.foldingathome.org/getpasskey
While a Passkey is optional, it is highly recommended to get the most credits for your work.
More information on Passkeys here: https://foldingathome.org/support/faq/points/passkey/

Enable Slots for Folding on GPUs:
The default config file included with the FAHClient starts folding on the CPU immediately, but it may ignore your GPU. Using FAHControl is the easiest way to add GPUs without worrying about typing mistakes.
Still in the Configure window ... On the Slots Tab you can see the list of hardware actively available for the client. You can click the "+Add" button at the bottom of the list to add new slots (CPUs or GPUs). For most setups you just need to select the radio button for GPU and click OK. I recommend adding GPUs one at a time, clicking save and making sure each one appears. (Since I only have one GPU in the rig at this point, I stop here until I finish the rest of the setup. Come back to this step as you add GPUs to your rig and need to enable them in the client.) Since my folding rigs have very weak CPUs, I actually remove the CPU slot. This allows the CPU to focus on loading WUs and sending/receiving data on the network. If you have a more powerful CPU, then you can leave it loaded as an active slot.
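If you prefer to edit config.xml by hand instead, the GPU slots follow the same pattern as the CPU slot line in the config shown earlier, with type='GPU'. A minimal sketch that prints the lines for a given number of cards, ready to paste into the Folding Slots section; gpu_slot_lines is a made-up name, and numbering from 0 assumes the CPU slot has been removed.

```shell
# Print one GPU slot line per card, numbered from 0.
gpu_slot_lines() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        printf "  <slot id='%d' type='GPU'/>\n" "$i"
        i=$((i + 1))
    done
}

# Usage for a six-card rig:
#   gpu_slot_lines 6
```

Stop the client before editing the file and start it again afterwards, the same as for the remote access settings.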

AMD Fan Control:
My RX580s would run for roughly 10 minutes with the default fan settings before overheating and rebooting the rig. I expected this because they did this when I was mining too.
While cranking up the fan speed is a bit wasteful on power, these cards run hot. So, no choice really.
I found amdgpu-pro-fans, a script that allows me to set the fan speeds for all cards in one step.
You can find more information about it here: https://github.com/DominiLux/amdgpu-pro-fans
To install you have to first make sure you have git:

Code: Select all

sudo apt-get install git
Then you can download and install amdgpu-pro-fans

Code: Select all

git clone https://github.com/dominilux/amdgpu-pro-fans
cd amdgpu-pro-fans
chmod +x amdgpu-pro-fans.sh
Now we just need to make this run at boot up.
If you have never used crontab before, it will ask you to pick an editor the first time; the default is fine.
You can run crontab using this:

Code: Select all

sudo crontab -e
Add the following line to the end of the file: (You will have to replace <user> with your rig username, or the path completely if you downloaded to a different location.)

Code: Select all

@reboot /home/<user>/amdgpu-pro-fans/amdgpu-pro-fans.sh -s 80
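The same line can also be added without opening an editor. Here is a minimal sketch of a filter that takes the existing crontab on stdin and adds the line only if it is not already there, so running it twice does no harm. The name with_cron_line is made up for this example.

```shell
# Echo the crontab read from stdin, adding line $1 if it is missing.
with_cron_line() {
    line=$1
    existing=$(cat)
    if printf '%s\n' "$existing" | grep -qxF -- "$line"; then
        printf '%s\n' "$existing"              # already there, unchanged
    elif [ -n "$existing" ]; then
        printf '%s\n%s\n' "$existing" "$line"  # append to existing entries
    else
        printf '%s\n' "$line"                  # crontab was empty
    fi
}

# Usage on the rig (replace <user> as described above):
#   sudo crontab -l 2>/dev/null \
#     | with_cron_line "@reboot /home/<user>/amdgpu-pro-fans/amdgpu-pro-fans.sh -s 80" \
#     | sudo crontab -
```

You can check the result afterwards with "sudo crontab -l".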
nvidia Fan Control:
This section is TBD. Currently the default fan control is keeping the rig at decent temps. I would like to come back to this to get the temps down a few degrees. It is important to balance power consumption for cooling vs. folding. Overcooling can be a waste of energy and doesn't really gain much in a steady-state application like this.

How is it running?
We can use the nvidia-smi command to check on nvidia card temps and fan speeds:

Code: Select all

nvidia-smi
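The full nvidia-smi table gets busy with six cards. As a rough sketch, its CSV query mode can be filtered down to just the cards that are running hot. The function name hot_gpus and the 80 degree threshold are arbitrary choices for this example.

```shell
# List GPUs at or above a temperature limit (degrees C, first argument).
# Expects "index, temperature, fan" CSV lines on stdin.
hot_gpus() {
    awk -F', ' -v limit="$1" \
        '$2 + 0 >= limit + 0 { print "GPU " $1 ": " $2 " C, fan " $3 "%" }'
}

# Usage on the rig:
#   nvidia-smi --query-gpu=index,temperature.gpu,fan.speed \
#              --format=csv,noheader,nounits | hot_gpus 80
```

No output means every card is below the limit.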
For AMD setups you can see all the cards and CPU/Motherboard temps using sensors:

Code: Select all

sensors
You can use sensors for nvidia setups, but it only shows the CPU/Motherboard temps.
And of course, you can use FAHControl on your support machine to watch the progress on each slot.

I'm sure I missed something somewhere, but I wanted to write down as much as possible so I can set up more rigs in the future and remember how I did it. I hope this helps some others with similar setups. If you have questions/suggestions, please respond and I'll do what I can to help explain further. I'm by no means an expert in this area, just a hobbyist. :)

Re: Setup Guide: Multi GPU Ubuntu Server AMD & nvidia

Post by Kebast »

Great guide! Coolbits for nvidia isn't terribly hard to use. Just have to be careful with the commands.
I'll say, I have some older motherboards that just won't run the 18.04 installer. I have to install 16.04 then do a release upgrade. Just mentioning in case anyone reading this runs into that problem.