Arch Linux, Docker, and NVIDIA

Alternate Title: Wading Through Massive Amounts of Incorrect and Obsolete Information to Get Docker and Docker-Compose Working With NVIDIA

Summary

If you've stumbled across this article, it's likely because you've been trying to get Docker, docker-compose, and NVIDIA all working together nicely. It's one of those rabbit holes: you think you're this close to finding the correct answer, realize those answers were for a previous version of Docker or docker-compose, modify your search, and then realize you've been reading replies on a GitHub issue that is two years old. Trust me, I've been there and done that already. This article will hopefully provide the answer that you're looking for.

That being said, I'll preface this with all the versions I'm using:

Arch Linux: 12-14-2020
Docker version 19.03.14, build 5eb3275d40
docker-compose version 1.27.4, build unknown
NVIDIA-SMI 455.45.01, Driver Version: 455.45.01, CUDA Version: 11.1
nvidia-container-runtime version 1.0.0-rc9

Please note these instructions are specific to Arch Linux, but similar package names should apply for most Linux distributions.

Yay

You'll need yay since we're installing packages from the AUR. Details on yay can be found here. Quick instructions:

sudo pacman -S --needed git base-devel
git clone https://aur.archlinux.org/yay.git
cd yay
makepkg -si

Docker

yay -S docker
sudo systemctl enable docker
sudo systemctl start docker

Docker-Compose

yay -S docker-compose

NVIDIA Drivers and Components

yay -S nvidia nvidia-utils nvidia-container-toolkit nvidia-container-runtime

Configuring the System

Loading Kernel Modules

This ensures the kernel modules that NVIDIA and Docker rely on are loaded at every boot.

sudo vi /etc/modules-load.d/custom.conf

# Add the following to the file
nvidia
nvidia-modeset
nvidia-drm
nvidia-uvm
aufs
overlay
macvlan
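
After the next boot, it can be handy to confirm which of these modules actually loaded. The following is a minimal sketch and not part of the original setup: missing_modules is a hypothetical helper name, and it assumes the list lives in /etc/modules-load.d/custom.conf as above. It compares that list against /proc/modules, where the kernel reports hyphens as underscores (nvidia-modeset shows up as nvidia_modeset).

```shell
#!/bin/sh
# missing_modules LIST_FILE [LOADED_FILE]
# Prints every module named in LIST_FILE that is not present in LOADED_FILE
# (default: /proc/modules). Hypothetical helper, for illustration only.
missing_modules() {
    list_file=$1
    loaded_file=${2:-/proc/modules}
    # Skip comments and blank lines; read one module name per line.
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$list_file" |
    while read -r module rest; do
        # The kernel reports module names with underscores, not hyphens.
        name=$(printf '%s' "$module" | sed 's/-/_/g')
        # The first field of each /proc/modules line is the module name.
        if ! awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$loaded_file"; then
            printf '%s\n' "$module"
        fi
    done
}

# Usage: missing_modules /etc/modules-load.d/custom.conf
```

Empty output means everything in the list is loaded. Keep in mind that a module compiled directly into the kernel never appears in /proc/modules, and that aufs is not part of the mainline kernel, so it may well be reported as missing on a stock kernel.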

Configuring NVIDIA Runtime for Docker

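This registers the NVIDIA runtime with the Docker daemon, which is what lets a docker-compose.yml file request it with runtime: nvidia. Setting default-runtime to nvidia also makes it the runtime for every container that doesn't ask for a different one.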

# This should print the path to the binary if it is installed.
which nvidia-container-runtime

# Edit the config file for Docker.
sudo vi /etc/docker/daemon.json

# Add the following to the file.
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
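
A typo in daemon.json can keep the Docker daemon from starting, so a quick sanity check before rebooting doesn't hurt. This is only a rough sketch: check_daemon_json is a hypothetical helper name, and it does plain text matching rather than real JSON parsing, so it assumes the file is laid out roughly like the one above.

```shell
#!/bin/sh
# check_daemon_json [FILE]
# Rough text check (not a JSON parser) that FILE names "nvidia" as the default
# runtime and that the runtime path it lists is an executable file.
check_daemon_json() {
    file=${1:-/etc/docker/daemon.json}
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    if ! grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$file"; then
        echo "default-runtime is not set to nvidia in $file" >&2
        return 1
    fi
    # Pull the first "path" value out of the file.
    runtime_path=$(sed -n 's/.*"path"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$file" | head -n 1)
    if [ -z "$runtime_path" ] || [ ! -f "$runtime_path" ] || [ ! -x "$runtime_path" ]; then
        echo "runtime path '$runtime_path' is missing or not executable" >&2
        return 1
    fi
    echo "ok: default runtime nvidia -> $runtime_path"
}

# Usage: check_daemon_json /etc/docker/daemon.json
```

If the path check fails, compare the "path" value with what which nvidia-container-runtime printed earlier.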

...and finally:

sudo reboot
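
Once the machine is back up, you can ask the Docker daemon which runtime it actually picked up. A small sketch, assuming your user is allowed to talk to the Docker daemon (prefix the call with sudo otherwise). default_runtime_is_nvidia is a hypothetical helper name, and the DOCKER variable only exists so the command can be swapped out.

```shell
#!/bin/sh
# DOCKER can be overridden for testing; it defaults to the real docker CLI.
DOCKER=${DOCKER:-docker}

# default_runtime_is_nvidia
# Succeeds when the running Docker daemon reports "nvidia" as its default runtime.
default_runtime_is_nvidia() {
    runtime=$("$DOCKER" info --format '{{.DefaultRuntime}}') || return 1
    echo "default runtime: $runtime"
    [ "$runtime" = "nvidia" ]
}

# Usage: default_runtime_is_nvidia && echo "daemon.json was picked up"
```

If this reports runc instead, the daemon most likely did not read the daemon.json from the previous step.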

Conclusion

Testing NVIDIA Drivers on the Host OS

First, you'll want to verify that your Linux distribution can see the video card as expected. You'll be running this from the host OS, not the Docker container. The nvidia-smi command should be able to display information on your card. You do not need to run this as root.

~ ❯ nvidia-smi
Mon Dec 14 23:34:46 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P620         Off  | 00000000:01:00.0 Off |                  N/A |
| 37%   50C    P0    N/A /  N/A |    227MiB /  1998MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    No running processes found                                               |
+-----------------------------------------------------------------------------+

Don't worry about the lack of processes. You just want to make sure the NVIDIA card is recognized, e.g. Quadro P620.
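
If you'd rather script this check than eyeball the table, nvidia-smi can print just the fields you ask for. A minimal sketch: list_gpus is a hypothetical helper name, and the NVIDIA_SMI variable is only there so the command can be swapped out.

```shell
#!/bin/sh
# NVIDIA_SMI can be overridden for testing; it defaults to the real tool.
NVIDIA_SMI=${NVIDIA_SMI:-nvidia-smi}

# list_gpus
# Prints one "name, driver version" line per GPU, or fails if none is reported.
list_gpus() {
    if ! command -v "$NVIDIA_SMI" >/dev/null 2>&1; then
        echo "$NVIDIA_SMI not found; is nvidia-utils installed?" >&2
        return 1
    fi
    gpus=$("$NVIDIA_SMI" --query-gpu=name,driver_version --format=csv,noheader) || return 1
    [ -n "$gpus" ] || return 1
    printf '%s\n' "$gpus"
}

# Usage: list_gpus || echo "no NVIDIA GPU visible on the host"
```

A non-zero exit status here means the problem is on the host, so there's no point debugging Docker yet.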

Testing NVIDIA in Docker

Based on the above output, we know that our card is working and detected. Next we'll try the same for Docker via a docker-compose.yml file.

cd ~
vi docker-compose.yml

# Add the following to the file.
version: '2.3'
services:
  nvidia-smi-test:
    runtime: nvidia
    image: nvidia/cuda:9.2-runtime-centos7

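Notice the runtime: nvidia line under the service definition? That's what tells docker-compose to start the container with the NVIDIA runtime. The runtime key was introduced in version 2.3 of the compose file format, which is why the file declares version: '2.3'. Next we'll run docker-compose up to bring up that container.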

~ ❯ docker-compose up -d     

Creating network "nlabadie_default" with the default driver
Pulling nvidia-smi-test (nvidia/cuda:9.2-runtime-centos7)...
9.2-runtime-centos7: Pulling from nvidia/cuda
75f829a71a1c: Pull complete
3bfd9bee7f23: Pull complete
e264677109d2: Pull complete
04be0f279c7b: Pull complete
c537f616fcbb: Pull complete
0e51dcda29db: Pull complete
Digest: sha256:ee19c7ccab11cc1df37b45417aae9077ac99a8fcee017012218dd57d6c27fe0d
Status: Downloaded newer image for nvidia/cuda:9.2-runtime-centos7
Creating nlabadie_nvidia-smi-test_1 ... done

Finally, we'll make sure that nvidia-smi also works in the Docker container.

~ ❯ docker-compose run nvidia-smi-test                                         
Creating nlabadie_nvidia-smi-test_run ... done
# Notice the next prompt is different? 
# That's because you're like IN THE CONTAINER, man.

[root@16da75e7c31d /]# nvidia-smi

Tue Dec 15 04:46:38 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P620         Off  | 00000000:01:00.0 Off |                  N/A |
| 37%   50C    P0    N/A /  N/A |    227MiB /  1998MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Finished

The output shows that NVIDIA drivers are still accessible even within the Docker container. That's it! You can now run NVIDIA-enabled Docker containers via docker-compose and whatever fancy docker-compose.yml files that you have sitting around.
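
The interactive check above can also be run as a one-shot command, which is easier to drop into a script. A minimal sketch, assuming it is run from the directory that holds the docker-compose.yml from this article. gpu_smoke_test is a hypothetical helper name, and the COMPOSE variable is only there so the command can be swapped out.

```shell
#!/bin/sh
# COMPOSE can be overridden for testing; it defaults to docker-compose.
COMPOSE=${COMPOSE:-docker-compose}

# gpu_smoke_test [SERVICE]
# Runs nvidia-smi once in a throwaway container of SERVICE and removes it after.
gpu_smoke_test() {
    service=${1:-nvidia-smi-test}
    "$COMPOSE" run --rm "$service" nvidia-smi
}

# Usage: gpu_smoke_test && echo "GPU is visible inside the container"
```

The --rm flag removes the one-off container afterwards, so repeated runs don't pile up stopped containers.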
