Add NVIDIA GPU to LXC container

I followed this guide to get NVIDIA drivers working on my Proxmox machine. However when I tried to get them working in my container I couldn’t see how to get nvidia-smi installed. Thankfully this blog had what I needed.

The step I missed was copying & installing the NVIDIA drivers into the container with this flag:

--no-kernel-module

That got me one step closer but I could not spin up open-webui in a container. I kept getting the error

Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]

The fix was to install the NVDIA Container Toolkit:

Configure the production repository:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Update the packages list from the repository:

sudo apt-get update

Install the NVIDIA Container Toolkit packages:

sudo apt-get install -y nvidia-container-toolkit

An additional hurtle I encountered was this error:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: failed to add device rules: unable to find any existing device filters attached to the cgroup: bpf_prog_query(BPF_CGROUP_DEVICE) failed: operation not permitted: unknown

I found here that the fix is to change a line in /etc/nvidia-container-runtime/config.toml. Uncomment and change no-cgroups to true.

no-cgroups = true

Success.

Not working after reboot

I had a working config until I rebooted the host. It turns out that two services need to run on the host:

nvidia-persistenced
nvidia-smi

Configured cron tab to run these on reboot:

/etc/cron.d/nvidia:
@reboot root /usr/bin/nvidia-smi
@reboot root /usr/bin/nvidia-persistenced