Scenario

GPUs are expensive and are often deployed at scale to support workloads. Therefore, improving the overall resource usage of a large GPU cluster so that each GPU realizes its full potential is a key challenge in GPU scheduling. Cluster administrators who want to improve the GPU utilization of their clusters need a more flexible scheduling strategy. Application developers need to run model training tasks across multiple GPUs at the same time.

cGPU is Alibaba's container-based GPU sharing solution. It aims to improve GPU utilization by providing multiple isolated environments on a single GPU for different AI inference tasks. Such tasks are conventionally deployed on separate GPUs, which wastes GPU resources. This document describes how to deploy the cGPU component in an ACK (Alibaba Cloud Container Service for Kubernetes) environment.
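To illustrate how a workload consumes a share of a GPU once the cGPU component is installed, the sketch below shows a pod that requests a slice of GPU memory instead of an entire device. This is a minimal, illustrative example: the extended resource name aliyun.com/gpu-mem, the pod name, and the image are assumptions and should be checked against the cGPU version and device plugin configured in your cluster.

```yaml
# Illustrative sketch only: a pod that requests 3 GiB of GPU memory rather than
# a whole GPU, so that several such pods can share one device.
# The resource name aliyun.com/gpu-mem is an assumption; verify it against your
# cluster's shared GPU scheduling configuration before use.
apiVersion: v1
kind: Pod
metadata:
  name: inference-demo                              # hypothetical pod name
spec:
  restartPolicy: Never
  containers:
  - name: inference
    image: registry.example.com/inference:latest    # placeholder image
    resources:
      limits:
        aliyun.com/gpu-mem: 3                       # GPU memory share, in GiB
```

With this kind of request, the scheduler places the pod on a node whose GPU still has enough unallocated memory, and the cGPU component isolates the pod from other containers sharing the same GPU.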