Install CUDA and cuDNN on Ubuntu Server

We installed and set up JupyterHub in the previous post. To make use of the GPU card in the server, we are now going to install and configure CUDA and cuDNN from NVIDIA.

Set up CUDA and cuDNN

According to NVIDIA, CUDA is not just an API or a programming language:

CUDA is a parallel computing platform and programming model that makes using a GPU for general purpose computing simple and elegant. The developer still programs in the familiar C, C++, Fortran, or an ever expanding list of supported languages, and incorporates extensions of these languages in the form of a few basic keywords.

To let PyTorch access the power of the GPU, we first need to install the NVIDIA graphics driver and the CUDA driver and toolkit, through which the cuDNN library communicates with the GPU and provides neural-network primitives for PyTorch. The relationship is shown in the figure below:

Relationship between GPUs, CUDA driver and toolkit, and cuDNN that is one of the applications. source: NVIDIA

Step zero, clean up environment

Here we are going to remove any previously installed NVIDIA drivers and nouveau, if present. A clean environment should increase the chance of a successful installation.

  1. Clean previous installations

    $ sudo apt-get purge 'nvidia*'
    $ sudo apt-get autoremove
    $ sudo nvidia-uninstall
    

    Check whether any other NVIDIA packages are still installed, and remove them by name

    $ sudo dpkg -l | grep -i nvidia
    $ sudo dpkg --remove nvidia-{name}
    
  2. Clean nouveau

    Check whether nouveau is loaded: $ lsmod | grep nouveau. If it is, disable it by creating a file blacklist-nouveau.conf under /etc/modprobe.d with the following content:

    blacklist nouveau
    options nouveau modeset=0
    

    Regenerate the initramfs and reboot

    $ sudo update-initramfs -u
    $ sudo reboot
    

    Check that nouveau has been turned off; this command should now print nothing: lsmod | grep nouveau
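
The two checks in this step can also be scripted. Below is a minimal sketch, not part of the original procedure: the helper names `nvidia_packages`, `module_loaded` and `run` are made up for illustration. It assumes the usual column layout of `dpkg -l` and `lsmod`, and unlike `grep -i nvidia` it only looks at package names of fully installed packages (status `ii`).

```python
import shutil
import subprocess
from typing import List


def nvidia_packages(dpkg_output: str) -> List[str]:
    """Return names of installed packages (status 'ii') whose name mentions nvidia."""
    names = []
    for line in dpkg_output.splitlines():
        fields = line.split()
        # dpkg -l rows look like: "ii  <name>  <version>  <arch>  <description>"
        if len(fields) >= 2 and fields[0] == "ii" and "nvidia" in fields[1].lower():
            names.append(fields[1])
    return names


def module_loaded(lsmod_output: str, module: str = "nouveau") -> bool:
    """True if `module` appears in the first column of lsmod output."""
    return any(line.split()[:1] == [module] for line in lsmod_output.splitlines())


def run(cmd: List[str]) -> str:
    """Run a command and return its stdout, or '' if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        return ""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


if __name__ == "__main__":
    print("leftover NVIDIA packages:", nvidia_packages(run(["dpkg", "-l"])))
    print("nouveau loaded:", module_loaded(run(["lsmod"])))
```

An empty package list and `nouveau loaded: False` suggest the environment is clean enough to continue.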

Install drivers

  1. Install the NVIDIA graphics driver. There are two ways to install the driver: download it from NVIDIA and run the installer, or install it from the Ubuntu repository.

    • Option 1: Download driver from NVIDIA website
      1. Check whether your NVIDIA card has been detected: lspci | grep -i nvidia
      2. Go to the Download Drivers page and look for the driver that suits your environment. My case is shown below as an example:
        $ wget http://us.download.nvidia.com/tesla/450.51.06/NVIDIA-Linux-x86_64-450.51.06.run
        $ sudo sh NVIDIA-Linux-x86_64-450.51.06.run
        
    • Option 2: Install from Ubuntu repository
      $ sudo apt-get update
      $ sudo apt upgrade
      $ apt list | grep "^nvidia-driver"
      $ sudo apt install nvidia-driver-450
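
Whichever option you choose, it may be worth confirming that the driver is loaded before moving on to CUDA. The sketch below is an addition to the original steps: it assumes the `nvidia-smi` utility that ships with the driver is on the PATH and queries it in CSV mode. The helper names `parse_gpu_csv` and `query_gpus` are made up for illustration.

```python
import shutil
import subprocess
from typing import List, Tuple


def parse_gpu_csv(text: str) -> List[Tuple[str, str]]:
    """Parse 'name, driver_version' rows into (name, driver_version) pairs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a GPU name containing commas stays intact.
        name, _, version = line.rpartition(",")
        rows.append((name.strip(), version.strip()))
    return rows


def query_gpus() -> List[Tuple[str, str]]:
    """Ask nvidia-smi for GPU names and driver versions; [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    gpus = query_gpus()
    if gpus:
        for name, version in gpus:
            print(f"{name}: driver {version}")
    else:
        print("No GPU reported; is the driver installed and loaded?")
```

A reboot may be needed after installing the driver before `nvidia-smi` reports anything.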
      
  2. Install CUDA Toolkit

    1. Since we have some programs written in C that will need to be developed in this environment in the near future, we need to upgrade GCC as well. According to the documentation, GCC should be updated to 9.x in an Ubuntu 18.04 environment (Ref: install gcc-9 on Ubuntu 18.04?).

      $ sudo add-apt-repository ppa:ubuntu-toolchain-r/test
      $ sudo apt update
      $ sudo apt install gcc-9 g++-9
      
    2. Download and install the NVIDIA CUDA Toolkit from the NVIDIA website, e.g.

      $ wget https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda_11.0.3_450.51.06_linux.run
      $ sudo sh cuda_11.0.3_450.51.06_linux.run
      

      The installer may warn that you have already installed the driver through the Linux package manager. That is fine if it is what you did in the previous step, so choose Continue at this screen.

      Installer warns about a driver installed through the package manager; choose Continue here.

      Then deselect the Driver option, as we have installed the driver already and there is no need to install it again.

      After the installation finishes, as instructed, make sure PATH and LD_LIBRARY_PATH have been set properly:

      • PATH includes /usr/local/cuda-11.0/bin
      • LD_LIBRARY_PATH includes /usr/local/cuda-11.0/lib64, or, add /usr/local/cuda-11.0/lib64 to /etc/ld.so.conf and run ldconfig as root
        The installation summary reminds you to check the environment variables.
    3. Check whether the toolkit has been successfully installed: nvcc -V. Sample output:

      nvcc: NVIDIA (R) Cuda compiler driver
      Copyright (c) 2005-2020 NVIDIA Corporation
      Built on Wed_Jul_22_19:09:09_PDT_2020
      Cuda compilation tools, release 11.0, V11.0.221
      Build cuda_11.0_bu.TC445_37.28845127_0
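
The environment-variable and `nvcc -V` checks above can be automated in a few lines. This is only a sketch under the assumptions of this post: `CUDA_HOME` is the /usr/local/cuda-11.0 prefix used here, and the helper names `nvcc_release` and `path_contains` are made up for illustration.

```python
import os
import re
from typing import Optional, Tuple

CUDA_HOME = "/usr/local/cuda-11.0"  # install prefix used in this post; adjust to yours


def nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' line of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def path_contains(value: str, directory: str) -> bool:
    """True if a colon-separated path list contains `directory`."""
    wanted = os.path.normpath(directory)
    return any(os.path.normpath(p) == wanted for p in value.split(":") if p)


if __name__ == "__main__":
    print("PATH ok:", path_contains(os.environ.get("PATH", ""), CUDA_HOME + "/bin"))
    print(
        "LD_LIBRARY_PATH ok:",
        path_contains(os.environ.get("LD_LIBRARY_PATH", ""), CUDA_HOME + "/lib64"),
    )
```

If you registered the lib64 directory in /etc/ld.so.conf instead, LD_LIBRARY_PATH can legitimately be unset, so a `False` there is not necessarily a problem.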
      
  3. Download and install cuDNN for Linux

    To download cuDNN, you first need to register as an NVIDIA developer; you can then download the tar file (cuDNN Library for Linux (x86_64)) or the DEB files from the cuDNN download page.

    • Install from a tar file
      1. Extract the cuDNN package, which unpacks into a cuda directory: $ tar -xzvf cudnn-x.x-linux-x64-v8.x.x.x.tgz. (Replace x.x and 8.x.x.x with the versions you downloaded.)
      2. Copy the following files to the installation directory of the CUDA toolkit. In my case, the CUDA installation directory is /usr/local/cuda-11.0/
        $ sudo cp cuda/include/cudnn*.h /usr/local/cuda-11.0/include
        $ sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda-11.0/lib64
        $ sudo chmod a+r /usr/local/cuda-11.0/include/cudnn*.h /usr/local/cuda-11.0/lib64/libcudnn*
        
    • Install from DEB files
      Install the downloaded files in the order below:
      $ sudo dpkg -i libcudnn8_8.x.x.x-1+cudax.x_amd64.deb
      $ sudo dpkg -i libcudnn8-dev_8.x.x.x-1+cudax.x_amd64.deb
      $ sudo dpkg -i libcudnn8-samples_8.x.x.x-1+cudax.x_amd64.deb
      
    • Check if cuDNN has been installed: cat /usr/local/cuda-11.0/include/cudnn_version.h | grep "CUDNN_MAJOR" -A 2
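
The same version check can be done from Python by reading the header directly. This is a minimal sketch: it assumes the header sits at the tar-install location used above, and the helper name `cudnn_version` is made up for illustration.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Location used in this post for a tar-file install; adjust if yours differs.
HEADER = Path("/usr/local/cuda-11.0/include/cudnn_version.h")


def cudnn_version(header_text: str) -> Optional[Tuple[int, int, int]]:
    """Read CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL from the header text."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 8" with a numeric value only.
        match = re.search(
            rf"^\s*#\s*define\s+{macro}\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return (parts[0], parts[1], parts[2])


if __name__ == "__main__":
    if HEADER.exists():
        print("cuDNN version:", cudnn_version(HEADER.read_text()))
    else:
        print(f"{HEADER} not found; cuDNN may be installed elsewhere.")
```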

Verify CUDA is available in JupyterLab

Finally, if everything went well, we can check whether PyTorch is able to access the GPU.

import torch

# Pick the first GPU when CUDA is usable, otherwise fall back to the CPU.
if torch.cuda.is_available():
    print("cuda available!")
    device = torch.device("cuda:0")
else:
    print("cuda not available.")
    device = torch.device("cpu")

Output:

# cuda available!
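
For a slightly deeper check, the sketch below also reports the CUDA and cuDNN versions PyTorch was built with and runs a small matrix multiplication on the GPU. It is an optional extra, not part of the original steps: the helper name `describe_gpu_stack` is made up for illustration, and the function returns early if PyTorch is not installed in the current kernel.

```python
import importlib.util
from typing import Any, Dict


def describe_gpu_stack() -> Dict[str, Any]:
    """Collect what PyTorch can see of the CUDA and cuDNN installation."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}  # PyTorch is not installed in this environment

    import torch

    info: Dict[str, Any] = {
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "cuda_version": torch.version.cuda,  # None on CPU-only builds
        "cudnn_version": torch.backends.cudnn.version(),  # None if cuDNN is unusable
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
        # Run a small computation on the GPU to confirm it really executes work.
        x = torch.rand(1024, 1024, device="cuda")
        info["matmul_ok"] = bool(torch.isfinite(x @ x).all())
    return info


if __name__ == "__main__":
    for key, value in describe_gpu_stack().items():
        print(f"{key}: {value}")
```

Note that the versions reported here are the ones bundled with the PyTorch build, which can differ from the system-wide toolkit installed above.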

Hurray! The installation is not that difficult, but it is still error-prone, especially at the CUDA stage when the installer checks for the graphics driver. Failures may be due to the order of the steps or to an unclean environment. I also had difficulty installing CUDA the first time because of a previously installed driver. So take a look and experiment with different settings. See you next time. Ciao.

Leo Mak