Alternate Title: Wading Through Massive Amounts of Incorrect and Obsolete Information to Get Docker and Docker-Compose Working With NVIDIA
Summary
If you've stumbled across this article, it's likely because you've been trying to get Docker, docker-compose, and NVIDIA all working together nicely. It's one of those rabbit holes where you think you're this close to finding the correct answer, realize those answers were for a previous version of Docker or docker-compose, modify your search, and then realize you've been reading replies on a GitHub issue that is two years old. Trust me, I've been there and done that already. This article will hopefully provide the answer that you're looking for.
That being said, I'll preface this with all the versions I'm using:
Arch Linux: 12-14-2020
Docker version 19.03.14, build 5eb3275d40
docker-compose version 1.27.4, build unknown
NVIDIA-SMI 455.45.01, Driver Version: 455.45.01, CUDA Version: 11.1
nvidia-container-runtime version 1.0.0-rc9
Please note these instructions are specific to Arch Linux, but similar package names should apply for most Linux distributions.
Installing Related Packages
Paru
You'll need paru since we're using the AUR package repository. Quick instructions:
sudo pacman -S --needed git base-devel
git clone https://aur.archlinux.org/paru.git
cd paru
makepkg -si
Docker
paru -S docker
sudo systemctl enable docker
sudo systemctl start docker
Docker-Compose
paru -S docker-compose
NVIDIA Drivers and Components
paru -S nvidia nvidia-utils nvidia-container-toolkit nvidia-container-runtime
Configuring the System
Loading Kernel Modules
This ensures that the kernel modules Docker and the NVIDIA driver rely on are loaded at boot.
sudo vi /etc/modules-load.d/custom.conf
# Add the following to the file
nvidia
nvidia-modeset
nvidia-drm
nvidia-uvm
aufs
overlay
macvlan
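If you want to confirm the modules actually got loaded (after the reboot further down), here is a minimal sketch that compares the file against /proc/modules. The function name check_modules and its optional second argument are my own inventions for illustration, not part of any package. Note that a module built directly into your kernel, or one your kernel doesn't ship at all, will be reported as missing.

```shell
#!/bin/sh
# Report which modules named in a modules-load.d style file are loaded.
# Usage: check_modules CONF_FILE [LOADED_FILE]
# LOADED_FILE defaults to /proc/modules; it is a parameter only so the
# function can be tried against a sample file (hypothetical helper).
check_modules() {
    conf="$1"
    loaded="${2:-/proc/modules}"
    missing=0
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip blank lines and comments.
        case "$line" in
            ''|'#'*|';'*) continue ;;
        esac
        # The kernel lists module names with underscores, so normalize.
        name=$(printf '%s' "$line" | tr '-' '_')
        if grep -q "^${name} " "$loaded"; then
            echo "loaded:  $line"
        else
            echo "missing: $line"
            missing=1
        fi
    done < "$conf"
    # Exit status 0 only when every listed module was found.
    return "$missing"
}
```

Typical use would be: check_modules /etc/modules-load.d/custom.conf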
Configuring NVIDIA Runtime for Docker
This registers the NVIDIA runtime with Docker, and makes it the default, so that it can be requested from a docker-compose.yml file.
# This should print the path to the binary if it is installed.
which nvidia-container-runtime
# Edit the config file for Docker.
sudo vi /etc/docker/daemon.json
# Add the following to the file.
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
...and finally:
sudo reboot
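Once the machine is back up, you can ask Docker which runtime it picked as the default. The sketch below is just a small comparison helper (check_default_runtime is a name I made up); the docker info call in the trailing comment is where the real value would come from, and it assumes the Docker daemon is running.

```shell
#!/bin/sh
# Compare the runtime name Docker reports with the one we configured.
# Usage: check_default_runtime REPORTED [EXPECTED]
# (hypothetical helper; EXPECTED defaults to "nvidia")
check_default_runtime() {
    reported="$1"
    expected="${2:-nvidia}"
    if [ "$reported" = "$expected" ]; then
        echo "ok: default runtime is $reported"
    else
        echo "mismatch: default runtime is '$reported', expected '$expected'"
        return 1
    fi
}

# With the Docker daemon running, feed it the live value:
#   check_default_runtime "$(docker info --format '{{.DefaultRuntime}}')"
```

If this reports runc instead of nvidia, the daemon.json edit most likely wasn't picked up.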
Conclusion
Testing NVIDIA Drivers on the Host OS
First, you'll want to verify that your Linux distribution can see the video card as expected. You'll be running this from the host OS, not the Docker container. The nvidia-smi command should be able to display information on your card. You do not need to run this as root.
~ ❯ nvidia-smi
Mon Dec 14 23:34:46 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P620         Off  | 00000000:01:00.0 Off |                  N/A |
| 37%   50C    P0    N/A /  N/A |    227MiB /  1998MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Don't worry about the lack of processes. You just want to make sure the NVIDIA card is recognized, e.g. Quadro P620.
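If you'd rather have something script-friendly than eyeball the table, nvidia-smi can also emit CSV. The sketch below assumes the --query-gpu fields index, name, and driver_version; the summarize_gpus function is a hypothetical helper of mine that turns each CSV row into one readable line.

```shell
#!/bin/sh
# Turn "index, name, driver_version" CSV rows into a one-line summary each.
# Usage:
#   nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader \
#       | summarize_gpus
# (summarize_gpus is a hypothetical helper, not part of nvidia-utils)
summarize_gpus() {
    count=0
    while IFS=, read -r idx name driver; do
        # Skip empty rows.
        [ -n "$idx" ] || continue
        # Fields after the first carry a leading space; trim it.
        name=${name# }
        driver=${driver# }
        echo "GPU $idx: $name (driver $driver)"
        count=$((count + 1))
    done
    # Fail when no GPU rows were seen at all.
    [ "$count" -gt 0 ]
}
```

A non-zero exit status here means no GPU rows came back, which points at the driver rather than at Docker.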
Testing NVIDIA in Docker
Based on the above output, we know that our card is working and detected. Next we'll try the same for Docker via a docker-compose.yml file.
cd ~
vi docker-compose.yml
# Add the following to the file.
version: '2.3'
services:
  nvidia-smi-test:
    runtime: nvidia
    image: nvidia/cuda:9.2-runtime-centos7
Notice the runtime: nvidia line under the services: section? That's what you need to add for a container to run with the NVIDIA runtime. Next we'll run docker-compose up to bring up that container.
~ ❯ docker-compose up -d
Creating network "nlabadie_default" with the default driver
Pulling nvidia-smi-test (nvidia/cuda:9.2-runtime-centos7)...
9.2-runtime-centos7: Pulling from nvidia/cuda
75f829a71a1c: Pull complete
3bfd9bee7f23: Pull complete
e264677109d2: Pull complete
04be0f279c7b: Pull complete
c537f616fcbb: Pull complete
0e51dcda29db: Pull complete
Digest: sha256:ee19c7ccab11cc1df37b45417aae9077ac99a8fcee017012218dd57d6c27fe0d
Status: Downloaded newer image for nvidia/cuda:9.2-runtime-centos7
Creating nlabadie_nvidia-smi-test_1 ... done
Finally, we'll make sure that nvidia-smi also works in the Docker container.
~ ❯ docker-compose run nvidia-smi-test
Creating nlabadie_nvidia-smi-test_run ... done
# Notice the next prompt is different?
# That's because you're like IN THE CONTAINER, man.
[root@16da75e7c31d /]# nvidia-smi
Tue Dec 15 04:46:38 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P620         Off  | 00000000:01:00.0 Off |                  N/A |
| 37%   50C    P0    N/A /  N/A |    227MiB /  1998MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
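The interactive check above can also be done in one shot, without dropping into a shell in the container. This is only a sketch: looks_like_nvidia_smi is a hypothetical helper that greps for the version banner, and the docker-compose line in the comment assumes the nvidia-smi-test service from the docker-compose.yml above, with -T used to turn off the pseudo-tty so the output pipes cleanly.

```shell
#!/bin/sh
# Succeeds if the text on stdin looks like a real nvidia-smi report,
# i.e. it contains the "NVIDIA-SMI <version>" banner.
# (hypothetical helper; an error such as "NVIDIA-SMI has failed ..."
# does not match because no version number follows the name)
looks_like_nvidia_smi() {
    grep -q 'NVIDIA-SMI [0-9]'
}

# Against the compose service (needs Docker and the pulled image):
#   docker-compose run --rm -T nvidia-smi-test nvidia-smi \
#       | looks_like_nvidia_smi && echo "GPU visible in container"
```

The --rm flag just cleans up the one-off container afterwards so they don't pile up.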
Finished
The output shows that NVIDIA drivers are still accessible even within the Docker container. That's it! You can now run NVIDIA-enabled Docker containers via docker-compose and whatever fancy docker-compose.yml files that you have sitting around.