GPUs are expensive and are often deployed at large scale to support workloads. Therefore, improving overall resource usage across a large GPU cluster, so that each GPU realizes its full potential, is the key challenge in GPU scheduling. Cluster administrators who want to improve the GPU utilization of their clusters need a more flexible scheduling strategy, while application developers need to run model training tasks on multiple GPUs at the same time.

cGPU is Alibaba's container-based GPU sharing solution. It improves GPU utilization by providing multiple isolated environments on a single GPU for different AI inference tasks. Conventionally, these tasks would each be deployed on a separate GPU, wasting GPU resources. This document describes how to deploy the cGPU component in an ACK (Alibaba Cloud Container Service for Kubernetes) environment.
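To illustrate the sharing model described above, the sketch below shows what a Kubernetes pod spec requesting a slice of a shared GPU might look like. The extended resource name `aliyun.com/gpu-mem`, the unit (GiB), and the image name are assumptions for illustration; verify the exact resource name and units against the current ACK documentation for your cluster.

```yaml
# Hypothetical pod spec: requests 3 GiB of memory on a shared GPU
# instead of claiming a whole device. The resource name
# aliyun.com/gpu-mem is assumed here; confirm it in your cluster.
apiVersion: v1
kind: Pod
metadata:
  name: inference-task
spec:
  containers:
  - name: inference
    image: registry.example.com/my-inference:latest  # placeholder image
    resources:
      limits:
        aliyun.com/gpu-mem: 3  # GPU memory in GiB, not whole GPUs
```

With this kind of request, the scheduler can place several inference pods on the same physical GPU, with cGPU enforcing memory and compute isolation between them.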