I followed this guide to get NVIDIA drivers working on my Proxmox machine. However, when I tried to get them working in my container, I couldn’t see how to get nvidia-smi installed. Thankfully, this blog had what I needed.
The step I missed was copying the NVIDIA driver installer into the container and running it there with this flag:
--no-kernel-module
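For reference, here is a minimal sketch of that step as it might look from the Proxmox host, assuming an LXC container managed with `pct`. The container ID and installer path are placeholders you pass in (they do not come from the guides above), and the function only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch: print the commands that copy the NVIDIA .run installer into an LXC
# container and install only the user-space part of the driver there.
# Assumptions: a Proxmox host with `pct`; the container ID and installer path
# are placeholders supplied by the caller.
print_container_driver_steps() {
    ctid="$1"        # container ID (placeholder, e.g. one shown by `pct list`)
    installer="$2"   # path on the host to the .run installer
    name=$(basename "$installer")

    # Copy the installer into the container.
    echo "pct push $ctid $installer /root/$name"
    # Run it inside the container without building the kernel module;
    # the container shares the host's kernel and the module loaded there.
    echo "pct exec $ctid -- sh /root/$name --no-kernel-module"
    # Confirm the container can see the GPU.
    echo "pct exec $ctid -- nvidia-smi"
}
```

Calling `print_container_driver_steps <ctid> <path-to-installer>` prints three lines. The installer should be the same driver version that is installed on the host, since the user-space libraries in the container have to match the host's kernel module.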
That got me one step closer, but I could not spin up open-webui in a container. I kept getting this error:
Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]
The fix was to install the NVIDIA Container Toolkit:
Configure the production repository:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
Update the packages list from the repository:
sudo apt-get update
Install the NVIDIA Container Toolkit packages:
sudo apt-get install -y nvidia-container-toolkit
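Before retrying the container, it can help to confirm that the pieces are actually on the PATH. The helper below is a small sketch of my own (the function name is made up), and it only reports which of the named commands cannot be found.

```shell
#!/bin/sh
# Print each named command that is NOT found on PATH, one per line.
# Prints nothing when every command is available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, `missing_tools docker nvidia-smi nvidia-container-cli` prints nothing when all three are installed. If nothing is missing, a smoke test along the lines of `docker run --rm --gpus all ubuntu nvidia-smi` should show whether Docker can hand the GPU to a container.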
An additional hurdle I encountered was this error:
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: failed to add device rules: unable to find any existing device filters attached to the cgroup: bpf_prog_query(BPF_CGROUP_DEVICE) failed: operation not permitted: unknown
I found here that the fix is to change a line in /etc/nvidia-container-runtime/config.toml: uncomment no-cgroups and set it to true.
no-cgroups = true
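If you would rather script the change than edit the file by hand, something like the sketch below should work. It assumes the file already contains a no-cgroups line, commented out or not; the function name is my own, and it leaves a .bak copy next to the original.

```shell
#!/bin/sh
# Set `no-cgroups = true` in an nvidia-container-runtime config file,
# whether the existing line is commented out or not.
# Assumption: the file already contains a (possibly commented) no-cgroups line.
set_no_cgroups() {
    config="$1"
    cp "$config" "$config.bak"   # keep a backup next to the original
    sed -E 's/^[[:space:]]*#?[[:space:]]*no-cgroups[[:space:]]*=.*/no-cgroups = true/' \
        "$config.bak" > "$config"
}
```

Run as root, the call would be `set_no_cgroups /etc/nvidia-container-runtime/config.toml`. Trying it on a copy of the file first is a cheap way to check the result.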
Success.
Not working after reboot
I had a working config until I rebooted the host. It turns out that two programs need to run on the host after every boot:
nvidia-persistenced
nvidia-smi
I configured a cron job to run these on reboot:
/etc/cron.d/nvidia:
@reboot root /usr/bin/nvidia-smi
@reboot root /usr/bin/nvidia-persistenced
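The same file can be created from a script. This is a sketch with a made-up function name; the target path is an argument so it can be tried on a scratch file before pointing it at /etc/cron.d/nvidia.

```shell
#!/bin/sh
# Write the two @reboot entries to a cron.d-style file.
# The target path is passed in so it can be tested on a scratch file first.
write_nvidia_cron() {
    target="$1"
    cat > "$target" <<'EOF'
@reboot root /usr/bin/nvidia-smi
@reboot root /usr/bin/nvidia-persistenced
EOF
    # Debian's cron skips cron.d files that are writable by group or other.
    chmod 644 "$target"
}
```

On the host this would be `write_nvidia_cron /etc/cron.d/nvidia`, run as root so the file ends up owned by root.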