The benefits of graphics processing units (GPUs) have led enterprises to deploy machine learning and inference applications on GPU-enabled Kubernetes clusters. However, scheduling GPU workloads on Kubernetes clusters so that GPU usage is optimized is challenging and requires considerable effort and expertise.

Alibaba Cloud lets you create Kubernetes clusters whose nodes are equipped with NVIDIA GPUs, and it provides tools to help you optimize GPU usage on those nodes. You can deploy multiple workloads on a single GPU device to raise GPU utilization, and you can set up auto scaling so that the GPU capacity of a cluster adjusts automatically as workloads change. You can also monitor, in real time, the amount of GPU memory used by each workload.