ROCm GPU Server Driver Installation Guide for Linux
The ROCm Platform brings a rich foundation to advanced computing by seamlessly integrating the CPU and GPU with the goal of solving real-world problems.
ROCm started with support for only AMD’s FIJI Family of dGPUs. Starting with ROCm 1.3, we extended support to include the Polaris Family of ASICs. With ROCm 1.6, we added the Vega Family of products.
New to ROCm 1.7 is the DKMS driver installation:
- New driver installation uses Dynamic Kernel Module Support (DKMS)
- Only amdkfd and amdgpu kernel modules are installed to support AMD hardware
- Currently only Debian packages are provided for DKMS (no Fedora support available)
- See the ROCT-Thunk-Interface and ROCK-Kernel-Driver for additional documentation on driver setup
To use ROCm on your system you need the following:
- ROCm Capable CPU and GPU
- Supported Version of Linux with a specified GCC Compiler and ToolChain
Table 1. Native Linux Distribution Support in ROCm 1.7
Pre Install Directions
First make sure your system is up to date
sudo apt update
sudo apt dist-upgrade
sudo reboot
Verify You Have a ROCm Capable GPU Installed in the System
lspci | grep -i AMD
You will see a list of AMD GPUs.
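On systems with an AMD CPU, the grep above can also match host bridges and other non-GPU devices. The following is a minimal sketch of a narrower filter, assuming lspci prints the usual class names for graphics devices. The function name filter_amd_gpus is made up for this illustration.

```shell
# Keep only display-class devices from lspci-style text on stdin,
# then keep the ones whose vendor string contains AMD or ATI as a whole word.
# Assumption: GPUs are reported under one of the three class names below.
filter_amd_gpus() {
    grep -E 'VGA compatible controller|Display controller|3D controller' |
        grep -wE 'AMD|ATI'
}

# Run against the real system only if lspci is available.
if command -v lspci >/dev/null 2>&1; then
    lspci | filter_amd_gpus || echo "No AMD GPU found" >&2
fi
```

The second grep is case-sensitive and word-based on purpose, so that the letters "ati" inside "compatible" do not count as a match.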
Verify You Have a Supported Version of Linux
uname -m && cat /etc/*release
You will see something like the following for Ubuntu:
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"
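The same check can be scripted. This sketch assumes the target platform is the one in the sample output above (x86_64, Ubuntu 16.04) and that /etc/os-release provides VERSION_ID. The function name matches_guide_platform is hypothetical, and the expected values should be adjusted to whatever Table 1 lists for your distribution.

```shell
# Return 0 when the architecture ($1) and release ($2) match the example
# in this guide. Assumption: x86_64 and 16.04, taken from the sample output.
matches_guide_platform() {
    [ "$1" = "x86_64" ] && [ "$2" = "16.04" ]
}

arch=$(uname -m)
release=""
if [ -r /etc/os-release ]; then
    # Source the file in a subshell so its variables do not leak out.
    release=$(. /etc/os-release && printf '%s' "${VERSION_ID:-}")
fi

if matches_guide_platform "$arch" "$release"; then
    echo "Platform matches the guide's example: $arch, release $release"
else
    echo "Platform differs from the guide's example: $arch, release ${release:-unknown}"
fi
```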
Verify the version of GCC
gcc --version
You will see
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Choose an Installation Method
Package manager Based Install
A Package Manager Based Installation uses your Linux distribution’s package management service.
Ubuntu uses Debian packages and Fedora uses RPM packages.
Add the Repo Server
For Debian based systems, like Ubuntu, configure the Debian ROCm repository as follows:
wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
sudo sh -c 'echo deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main > /etc/apt/sources.list.d/rocm.list'
The GPG key might change, so it may need to be updated when installing a new release. The current rocm.gpg.key is not available in a standard key ring distribution, but has the following sha1sum hash:
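Once you have the published hash, a check along these lines can confirm that a downloaded copy of the key matches before it is handed to apt-key. This is a sketch, not the official procedure: sha1_matches is a made-up helper, and KEY_FILE and EXPECTED_SHA1 are placeholders you must fill in yourself.

```shell
# Compare a file's SHA-1 digest ($1 = path) with an expected hex value ($2).
# Returns 0 on match, non-zero otherwise.
sha1_matches() {
    actual=$(sha1sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$2" ]
}

# Placeholders (assumptions): local path of the key and the published hash.
# Download first, for example:
#   wget -q -O rocm.gpg.key http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key
KEY_FILE="rocm.gpg.key"
EXPECTED_SHA1="<published sha1sum goes here>"

if [ -f "$KEY_FILE" ]; then
    if sha1_matches "$KEY_FILE" "$EXPECTED_SHA1"; then
        sudo apt-key add "$KEY_FILE"
    else
        echo "rocm.gpg.key hash mismatch; not adding the key" >&2
    fi
fi
```

Saving the key to a file first, instead of piping wget straight into apt-key, is what makes it possible to inspect the hash before trusting it.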
Install or update ROCm
Next, update the apt-get repository list and install/update the ROCm package:
Warning: Before proceeding, make sure to completely uninstall any previous ROCm package:
sudo apt-get update
sudo apt-get install libnuma-dev
sudo apt-get install rocm-dkms rocm-opencl-dev
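The warning above asks for earlier ROCm packages to be removed first. A quick way to see whether any are still installed, before running the install commands, is to filter the output of dpkg -l. This sketch assumes the relevant package names start with "rocm"; packages from older releases may use other prefixes, so treat the result as a hint only. The function name installed_rocm_packages is hypothetical.

```shell
# Read `dpkg -l` style output on stdin and print the names of packages that
# are fully installed (state "ii") and whose name starts with "rocm".
# Assumption: the "rocm" prefix covers the packages you care about.
installed_rocm_packages() {
    awk '$1 == "ii" && $2 ~ /^rocm/ { print $2 }'
}

# Query the real package database only where dpkg exists.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | installed_rocm_packages
fi
```

An empty result suggests that no package with that prefix is currently installed.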
Setting Permissions to Use ROCm
With the move to upstreaming the KFD driver and the support of DKMS, all console (headless) users must be added to the “video” group by setting the Unix permissions.
Ensure that your user account is a member of the “video” group prior to using the ROCm driver. You can find which groups you are a member of with the following command:
groups
To add yourself to the video group you will need the sudo password and can use the following command:
sudo usermod -a -G video $LOGNAME
Once complete, reboot your system.
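After logging back in, the group membership can be confirmed with a small script. This is a sketch only: has_group is a helper invented for this example, and it falls back to id -un when LOGNAME is not set.

```shell
# Return 0 if group $2 appears in the space-separated group list $1.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

user=${LOGNAME:-$(id -un)}
# id -nG prints the names of all groups the given user belongs to.
if has_group "$(id -nG "$user")" video; then
    echo "$user is in the video group"
else
    echo "$user is NOT in the video group; run: sudo usermod -a -G video $user"
fi
```

Padding the list with spaces before matching prevents a group such as "videoeditors" from being mistaken for "video".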
We recommend you verify your installation to make sure everything completed successfully.
If You Plan to Run with X11: We Are Seeing X Freezes Under Load
In ROCm 1.7.1, the kernel parameter noretry has been set to 1 to improve overall system performance. However, this has been shown to cause instability in the graphics driver shipped with Ubuntu. This is an ongoing issue and we are looking into it.
Until it is resolved, please try changing the noretry bit to 0:
echo 0 | sudo tee /sys/module/amdkfd/parameters/noretry
Files under /sys are not preserved across reboots, so you will need to do this after every boot.
One way to keep noretry=0 is to edit /etc/modprobe.d/amdkfd.conf so that it contains:
options amdkfd noretry=0
Once that is done, run sudo update-initramfs -u. Reboot and verify that /sys/module/amdkfd/parameters/noretry stays at 0.
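The verification after reboot can be scripted as follows. This is a minimal sketch: param_equals is a hypothetical helper, and the script reports a failure both when the value differs and when the amdkfd module is not loaded, in which case the file is absent.

```shell
# Return 0 if the parameter file $1 is readable and contains exactly $2.
param_equals() {
    [ -r "$1" ] && [ "$(cat "$1")" = "$2" ]
}

PARAM=/sys/module/amdkfd/parameters/noretry
if param_equals "$PARAM" 0; then
    echo "noretry is 0"
else
    echo "noretry is not 0 (or amdkfd is not loaded)" >&2
fi
```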
Post install verification
Upon restart, test your OpenCL instance.
After installation, every user needs to be a member of the “video” group, so set your Unix permissions accordingly.
Build and run the Hello World OpenCL app.
wget https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloWorld.cpp
wget https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloWorld.cl
Build it using the default ROCm OpenCL include and library locations:
g++ -I /opt/rocm/opencl/include/ ./HelloWorld.cpp -o HelloWorld -L/opt/rocm/opencl/lib/x86_64 -lOpenCL
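The build and run steps can be wrapped in a small script that first checks that the expected inputs exist. This is a sketch under assumptions: require_paths is a made-up helper, the ROCm paths are the defaults quoted above, and the sample is assumed to read HelloWorld.cl from the current directory at run time.

```shell
# Print every path that does not exist and return non-zero if any is missing.
require_paths() {
    missing=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Build and run only when the headers, library directory, and sources exist.
if require_paths /opt/rocm/opencl/include /opt/rocm/opencl/lib/x86_64 \
                 HelloWorld.cpp HelloWorld.cl; then
    g++ -I /opt/rocm/opencl/include/ ./HelloWorld.cpp -o HelloWorld \
        -L/opt/rocm/opencl/lib/x86_64 -lOpenCL && ./HelloWorld
fi
```

If the program builds but the loader cannot find libOpenCL.so at run time, LD_LIBRARY_PATH may need to include the library directory.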
To uninstall the entire ROCm development package, execute:
- Ubuntu
sudo apt-get autoremove rocm-dkms
Installing development packages for cross compilation
It is often useful to develop and test on different systems. In this scenario, you may prefer to avoid installing the ROCm kernel driver on your development system.
In this case, install the development subset of packages:
sudo apt-get update
sudo apt-get install rocm-dev