GPUs

Configure GPUs for a Kommander cluster

This section describes Nvidia GPU support on Kommander. DKP supports Nvidia driver version 470.x. Other GPUs, such as those made by AMD, are not currently supported. This document assumes familiarity with Kubernetes GPU support. For more information about GPUs in an AWS environment, see the Advanced AWS section.

Kommander GPU Overview

GPU support on Kommander uses the Nvidia container runtime. When Nvidia GPU support is enabled, Kommander configures the container runtime to run GPU containers and installs the components needed to use the Nvidia GPU devices.

The following components provide Nvidia GPU support on Kommander:

  • libnvidia-container and nvidia-container-runtime: GPU support in Kommander depends on the containerd runtime. libnvidia-container and nvidia-container-runtime sit between containerd and runc, simplifying the container runtime integration with the GPU.
  • Nvidia Device Plugin: Kommander uses this Kubernetes device plugin to make Nvidia GPUs available to workloads. It allows GPU-enabled containers to run on Kubernetes and tracks the number of available GPUs on each node and their health.
  • Nvidia Data Center GPU Manager: Contains a Prometheus exporter that provides Nvidia GPU metrics.

Kommander runs these components as DaemonSets, making them easier to manage and upgrade across all GPU nodes.
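
To illustrate how a workload might consume a GPU that the device plugin advertises, the following Python sketch builds a Pod manifest that requests GPUs through a resource limit. It assumes the plugin exposes GPUs under the `nvidia.com/gpu` extended resource name, which is the name used by the upstream Nvidia device plugin. The helper `gpu_pod_manifest`, the pod name, and the image reference are hypothetical placeholders and are not part of Kommander.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest whose single container requests `gpus` GPUs.

    Assumption: the device plugin advertises GPUs as the extended
    resource "nvidia.com/gpu". GPUs are requested as whole units
    in the container's resource limits.
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Run once and exit; this is a smoke test, not a service.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # nvidia-smi lists the GPUs visible inside the container.
                    "command": ["nvidia-smi"],
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image; substitute a CUDA-capable image you trust.
    manifest = gpu_pod_manifest("gpu-smoke-test", "registry.example.com/cuda-base:latest")
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped to `kubectl apply -f -` on a cluster with GPU nodes. If the pod stays in the Pending state, that may indicate that no node currently reports a free `nvidia.com/gpu` resource.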

The following procedures are described: