Alternate Title: Wading Through Massive Amounts of Incorrect and Obsolete Information to Get Docker and Docker-Compose Working With NVIDIA
Summary
If you've stumbled across this article, it's likely because you've been trying to get Docker, docker-compose, and NVIDIA all working together nicely. It's one of those rabbit holes where you think you're this close to finding the correct answer, realize those answers were for a previous version of Docker or docker-compose, modify your search, and then realize you've been reading replies on a GitHub issue that is two years old. Trust me, I've been there and done that already. This article will hopefully provide the answer that you're looking for.
That being said, I'll preface this with all the versions I'm using:
Arch Linux: 12-14-2020
Docker version 19.03.14, build 5eb3275d40
docker-compose version 1.27.4, build unknown
NVIDIA-SMI 455.45.01, Driver Version: 455.45.01, CUDA Version: 11.1
nvidia-container-runtime version 1.0.0-rc9
Please note these instructions are specific to Arch Linux, but similar package names should apply for most Linux distributions.
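To compare your own setup with the versions listed above, something like the following can help. It is only a sketch: report_version is a helper name made up for this example, and it assumes each tool accepts the version flag shown, which may differ on your distribution.

```shell
# Print the first line of a tool's version output, or note that it is missing.
# report_version is a hypothetical helper, not part of any package.
report_version() {
    local tool="$1"
    shift
    if command -v "$tool" >/dev/null 2>&1; then
        # Run the tool with the remaining arguments and keep only the first line.
        printf '%s: %s\n' "$tool" "$("$tool" "$@" 2>&1 | head -n 1)"
    else
        printf '%s: not installed\n' "$tool"
    fi
}

report_version docker --version
report_version docker-compose --version
report_version nvidia-container-runtime --version
report_version nvidia-smi --query-gpu=driver_version --format=csv,noheader
```

Anything reported as "not installed" is covered by the package steps that follow.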
Installing Related Packages
Paru
You'll need paru since we're using the AUR package repository. Quick instructions:
sudo pacman -S --needed git base-devel
git clone https://aur.archlinux.org/paru.git
cd paru
makepkg -si
Docker
paru -S docker
sudo systemctl enable docker
sudo systemctl start docker
Docker-Compose
paru -S docker-compose
NVIDIA Drivers and Components
paru -S nvidia nvidia-utils nvidia-container-toolkit nvidia-container-runtime
Configuring the System
Loading Kernel Modules
This ensures that the kernel modules Docker and NVIDIA need are loaded at boot.
sudo vi /etc/modules-load.d/custom.conf
# Add the following to the file
nvidia
nvidia-modeset
nvidia-drm
nvidia-uvm
aufs
overlay
macvlan
Configuring NVIDIA Runtime for Docker
This registers the NVIDIA runtime with Docker so that it can be selected in a docker-compose.yml file.
# This should return the file if it exists.
which nvidia-container-runtime
# Edit the config file for Docker.
sudo vi /etc/docker/daemon.json
# Add the following to the file.
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
...and finally:
sudo reboot
Conclusion
Testing NVIDIA Drivers on the Host OS
First, you'll want to verify that your Linux distribution can see the video card as expected. You'll be running this from the host OS, not the Docker container. The nvidia-smi command should be able to display information on your card. You do not need to run this as root.
~ ❯ nvidia-smi
Mon Dec 14 23:34:46 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01 Driver Version: 455.45.01 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro P620 Off | 00000000:01:00.0 Off | N/A |
| 37% 50C P0 N/A / N/A | 227MiB / 1998MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found
+-----------------------------------------------------------------------------+
Don't worry about the lack of processes. You just want to make sure the NVIDIA card is recognized, e.g. Quadro P620.
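While you're still on the host, it can also be worth confirming that the modules listed in /etc/modules-load.d/custom.conf were actually loaded after the reboot. The snippet below is a rough sketch: missing_modules is a made-up helper name, and it relies on the kernel listing modules in /proc/modules with underscores in place of hyphens (nvidia_modeset rather than nvidia-modeset).

```shell
# Print each expected module that is absent from a /proc/modules-style listing.
# missing_modules is a hypothetical helper; its argument defaults to /proc/modules.
missing_modules() {
    local listing="${1:-/proc/modules}"
    local mod
    for mod in nvidia nvidia_modeset nvidia_drm nvidia_uvm aufs overlay macvlan; do
        # Each line of the listing starts with the module name followed by a space.
        grep -q "^${mod} " "$listing" 2>/dev/null || printf '%s\n' "$mod"
    done
}

missing="$(missing_modules)"
if [ -n "$missing" ]; then
    printf 'Not loaded:\n%s\n' "$missing"
else
    echo "All expected modules are loaded."
fi
```

A module reported here is not necessarily a problem. It may be built into your kernel rather than loaded as a module, or not shipped with your kernel at all (aufs, for example, is not part of the mainline kernel).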
Testing NVIDIA in Docker
Based on the above output, we know that our card is working and detected. Next we'll try the same for Docker via a docker-compose.yml file.
cd ~
vi docker-compose.yml
# Add the following to the file.
version: '2.3'
services:
  nvidia-smi-test:
    runtime: nvidia
    image: nvidia/cuda:9.2-runtime-centos7
Notice the runtime: nvidia line under the service definition? That's what you'll need to add for a container to run with the NVIDIA runtime. Next we'll run docker-compose up to bring up that container.
~ ❯ docker-compose up -d
Creating network "nlabadie_default" with the default driver
Pulling nvidia-smi-test (nvidia/cuda:9.2-runtime-centos7)...
9.2-runtime-centos7: Pulling from nvidia/cuda
75f829a71a1c: Pull complete
3bfd9bee7f23: Pull complete
e264677109d2: Pull complete
04be0f279c7b: Pull complete
c537f616fcbb: Pull complete
0e51dcda29db: Pull complete
Digest: sha256:ee19c7ccab11cc1df37b45417aae9077ac99a8fcee017012218dd57d6c27fe0d
Status: Downloaded newer image for nvidia/cuda:9.2-runtime-centos7
Creating nlabadie_nvidia-smi-test_1 ... done
Finally, we'll make sure that nvidia-smi also works in the Docker container.
~ ❯ docker-compose run nvidia-smi-test
Creating nlabadie_nvidia-smi-test_run ... done
# Notice the next prompt is different?
# That's because you're like IN THE CONTAINER, man.
[root@16da75e7c31d /]# nvidia-smi
Tue Dec 15 04:46:38 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01 Driver Version: 455.45.01 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro P620 Off | 00000000:01:00.0 Off | N/A |
| 37% 50C P0 N/A / N/A | 227MiB / 1998MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Finished
The output shows that NVIDIA drivers are still accessible even within the Docker container. That's it! You can now run NVIDIA-enabled Docker containers via docker-compose and whatever fancy docker-compose.yml files that you have sitting around.
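If the container test does not behave as shown, the first thing I'd double-check is /etc/docker/daemon.json. Here is a rough sketch of that check: check_daemon_json is a made-up helper name, and because grep is not a JSON parser it only catches obvious omissions, not syntax errors.

```shell
# Rough check that a daemon.json sets nvidia as the default runtime and
# registers the runtime binary. check_daemon_json is a hypothetical helper.
check_daemon_json() {
    local file="${1:-/etc/docker/daemon.json}"
    [ -r "$file" ] || { echo "cannot read $file"; return 1; }
    grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$file" \
        || { echo "default-runtime is not set to nvidia"; return 1; }
    grep -Eq '"path"[[:space:]]*:[[:space:]]*"[^"]*nvidia-container-runtime"' "$file" \
        || { echo "no runtime path pointing at nvidia-container-runtime"; return 1; }
    echo "daemon.json looks right"
}

check_daemon_json || true

# Ask the running daemon which runtime it actually defaults to.
if command -v docker >/dev/null 2>&1; then
    docker info --format '{{.DefaultRuntime}}' 2>/dev/null \
        || echo "could not query the Docker daemon"
fi
```

If the file looks right but docker info still reports something other than nvidia, the daemon has most likely not been restarted since the file was edited.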