Create a GPU workload cluster
Accessing GPU Shapes
Some shapes are limited to specific regions and specific Availability Domains (ADs). To make sure the workload cluster comes up, check the region and AD for shape availability.
Check shape availability
Make sure the OCI CLI is installed. Then set the AD information if using multi-AD regions.
NOTE: Use the OCI Regions and Availability Domains page to figure out which regions have multiple ADs.
oci iam availability-domain list --compartment-id=<your compartment> --region=<region>
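The output is a JSON list of the ADs in that region. The names are prefixed with a tenancy-specific string, so the following is only illustrative:
{
  "data": [
    { "name": "Uocm:US-ASHBURN-AD-1", ... },
    { "name": "Uocm:US-ASHBURN-AD-2", ... },
    { "name": "Uocm:US-ASHBURN-AD-3", ... }
  ]
}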
Using the AD name from the output, start searching for GPU shape availability.
oci compute shape list --compartment-id=<your compartment> --profile=DEFAULT --region=us-ashburn-1 --availability-domain=<your AD ID> | grep GPU
"shape-name": "BM.GPU3.8"
"shape-name": "BM.GPU4.8"
"shape-name": "VM.GPU3.1"
"shape": "VM.GPU2.1"
NOTE: If the output is empty, then the compartment for that region/AD doesn't have GPU shapes. If you are unable to locate any shapes, you may need to submit a service limit increase request.
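Before filing a request, you can optionally check the current GPU limits for the compartment with the OCI CLI limits service. This is a hedged sketch; confirm the exact flags with oci limits value list --help:
oci limits value list --compartment-id=<your tenancy or compartment> --service-name=compute --region=<region> | grep -i gpu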
Create a new GPU workload cluster using an Ubuntu custom image
NOTE: Nvidia GPU drivers aren't supported for Oracle Linux at this time. Ubuntu is currently the only supported OS.
When launching in a multi-AD region, shapes are likely to be limited to a specific AD (example: US-ASHBURN-AD-2). To make sure the cluster comes up without issue, specifically target just that AD for the GPU worker nodes.
To do that, modify the released version of the cluster-template-failure-domain-spread.yaml template. Download the latest cluster-template-failure-domain-spread.yaml file and save it as cluster-template-gpu.yaml (see the example download command below). Make sure the modified template has only the MachineDeployment section(s) where there is GPU availability and remove all the others. See the full example file at the end of this page that targets only AD 2 (OCI calls them Availability Domains while Cluster-API calls them Failure Domains).
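For example, assuming the template is published as an asset on the CAPOCI GitHub releases page (the URL pattern below is an assumption; adjust it to the release you are using), the download and rename can be done in one step:
curl -L -o cluster-template-gpu.yaml \
  https://github.com/oracle/cluster-api-provider-oci/releases/latest/download/cluster-template-failure-domain-spread.yaml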
Virtual instances
The following command will create a workload cluster comprising a single control plane node and a single GPU worker node, using the default values as specified in the preceding Workload Cluster Parameters table:
NOTE: The OCI_NODE_MACHINE_TYPE_OCPUS value must match the OCPU count of the GPU shape. See the Compute Shapes page to get the OCPU count for the specific shape, or use the CLI lookup sketch after the command below.
OCI_COMPARTMENT_ID=<compartment-id> \
OCI_IMAGE_ID=<ubuntu-custom-image-id> \
OCI_SSH_KEY=<ssh-key> \
NODE_MACHINE_COUNT=1 \
OCI_NODE_MACHINE_TYPE=VM.GPU3.1 \
OCI_NODE_MACHINE_TYPE_OCPUS=6 \
OCI_CONTROL_PLANE_MACHINE_TYPE_OCPUS=1 \
OCI_CONTROL_PLANE_MACHINE_TYPE=VM.Standard3.Flex \
CONTROL_PLANE_MACHINE_COUNT=1 \
OCI_SHAPE_MEMORY_IN_GBS= \
KUBERNETES_VERSION=v1.24.4 \
clusterctl generate cluster <cluster-name> \
--target-namespace default \
--from cluster-template-gpu.yaml | kubectl apply -f -
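If you prefer to confirm the OCPU count from the CLI rather than the Compute Shapes page, the same shape listing reports it. This is a hedged sketch that assumes jq is installed and that the shape objects expose ocpus and memory-in-gbs fields:
oci compute shape list --compartment-id=<your compartment> --region=us-ashburn-1 --availability-domain=<your AD ID> \
  | jq '.data[] | select(.shape == "VM.GPU3.1") | {shape: .shape, ocpus: .ocpus, memory: .["memory-in-gbs"]}'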
Bare metal instances
The following command uses the OCI_NODE_MACHINE_TYPE parameter to specify a bare metal GPU shape for the worker nodes instead of CAPOCI's default virtual instance shape, while the control plane remains on a flexible VM shape. The OCI_NODE_PV_TRANSIT_ENCRYPTION parameter disables encryption of data in flight between the bare metal instance and the block storage resources.
NOTE: The OCI_NODE_MACHINE_TYPE_OCPUS value must match the OCPU count of the GPU shape. See the Compute Shapes page to get the OCPU count for the specific shape.
OCI_COMPARTMENT_ID=<compartment-id> \
OCI_IMAGE_ID=<ubuntu-custom-image-id> \
OCI_SSH_KEY=<ssh-key> \
OCI_NODE_MACHINE_TYPE=BM.GPU3.8 \
OCI_NODE_MACHINE_TYPE_OCPUS=52 \
OCI_NODE_PV_TRANSIT_ENCRYPTION=false \
OCI_CONTROL_PLANE_MACHINE_TYPE=VM.Standard3.Flex \
CONTROL_PLANE_MACHINE_COUNT=1 \
OCI_SHAPE_MEMORY_IN_GBS= \
KUBERNETES_VERSION=v1.24.4 \
clusterctl generate cluster <cluster-name> \
--target-namespace default \
--from cluster-template-gpu.yaml | kubectl apply -f -
Access workload cluster Kubeconfig
Execute the following command to list all the workload clusters present:
kubectl get clusters -A
Execute the following command to access the kubeconfig of a workload cluster:
clusterctl get kubeconfig <cluster-name> -n default > <cluster-name>.kubeconfig
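To verify that the kubeconfig works, list the workload cluster's nodes with it:
kubectl --kubeconfig=<cluster-name>.kubeconfig get nodes -o wide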
Install a CNI Provider, OCI Cloud Controller Manager and CSI in a self-provisioned cluster
To provision the CNI and Cloud Controller Manager, follow the Install a CNI Provider and Install OCI Cloud Controller Manager sections.
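For example, a CNI such as Calico can be applied straight from its published manifest; the version below is only an assumption and should be matched to your Kubernetes version:
kubectl --kubeconfig=<cluster-name>.kubeconfig apply -f \
  https://raw.githubusercontent.com/projectcalico/calico/v3.24.5/manifests/calico.yaml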
Install Nvidia GPU Operator
To set up the worker instances to use the GPUs, install the Nvidia GPU Operator.
For the most up-to-date install instructions, see the official install instructions. They lay out how to install the Helm tool and how to set up the Nvidia Helm repo.
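In short, that amounts to adding the Nvidia Helm repository and refreshing the index (repository URL as documented by Nvidia):
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update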
With Helm set up, you can now install the GPU Operator:
helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator
The pods will take a while to come up, but you can check their status:
kubectl --kubeconfig=<cluster-name>.kubeconfig get pods -n gpu-operator
Test GPU on worker node
Once all of the GPU Operator pods are running or completed, deploy the test pod:
cat <<EOF | kubectl --kubeconfig=<cluster-name>.kubeconfig apply -f -
apiVersion: v1
kind: Pod
metadata:
name: cuda-vector-add
spec:
restartPolicy: OnFailure
containers:
- name: cuda-vector-add
# https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
image: "registry.k8s.io/cuda-vector-add:v0.1"
resources:
limits:
nvidia.com/gpu: 1 # requesting 1 GPU
EOF
Then check the output logs of the cuda-vector-add test pod:
kubectl --kubeconfig=<cluster-name>.kubeconfig logs cuda-vector-add -n default
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
Example yaml file
This is an example file using a modified version of cluster-template-failure-domain-spread.yaml to target AD 2 (example: US-ASHBURN-AD-2).
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
labels:
cluster.x-k8s.io/cluster-name: "${CLUSTER_NAME}"
name: "${CLUSTER_NAME}"
namespace: "${NAMESPACE}"
spec:
clusterNetwork:
pods:
cidrBlocks:
- ${POD_CIDR:="192.168.0.0/16"}
serviceDomain: ${SERVICE_DOMAIN:="cluster.local"}
services:
cidrBlocks:
- ${SERVICE_CIDR:="10.128.0.0/12"}
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: OCICluster
name: "${CLUSTER_NAME}"
namespace: "${NAMESPACE}"
controlPlaneRef:
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
name: "${CLUSTER_NAME}-control-plane"
namespace: "${NAMESPACE}"
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: OCICluster
metadata:
labels:
cluster.x-k8s.io/cluster-name: "${CLUSTER_NAME}"
name: "${CLUSTER_NAME}"
spec:
compartmentId: "${OCI_COMPARTMENT_ID}"
---
kind: KubeadmControlPlane
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
metadata:
name: "${CLUSTER_NAME}-control-plane"
namespace: "${NAMESPACE}"
spec:
version: "${KUBERNETES_VERSION}"
replicas: ${CONTROL_PLANE_MACHINE_COUNT}
machineTemplate:
infrastructureRef:
kind: OCIMachineTemplate
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
name: "${CLUSTER_NAME}-control-plane"
namespace: "${NAMESPACE}"
kubeadmConfigSpec:
clusterConfiguration:
kubernetesVersion: ${KUBERNETES_VERSION}
apiServer:
certSANs: [localhost, 127.0.0.1]
dns: {}
etcd: {}
networking: {}
scheduler: {}
initConfiguration:
nodeRegistration:
criSocket: /var/run/containerd/containerd.sock
kubeletExtraArgs:
cloud-provider: external
provider-id: oci://{{ ds["id"] }}
joinConfiguration:
discovery: {}
nodeRegistration:
criSocket: /var/run/containerd/containerd.sock
kubeletExtraArgs:
cloud-provider: external
provider-id: oci://{{ ds["id"] }}
---
kind: OCIMachineTemplate
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
metadata:
name: "${CLUSTER_NAME}-control-plane"
spec:
template:
spec:
imageId: "${OCI_IMAGE_ID}"
compartmentId: "${OCI_COMPARTMENT_ID}"
shape: "${OCI_CONTROL_PLANE_MACHINE_TYPE=VM.Standard.E4.Flex}"
shapeConfig:
ocpus: "${OCI_CONTROL_PLANE_MACHINE_TYPE_OCPUS=1}"
metadata:
ssh_authorized_keys: "${OCI_SSH_KEY}"
isPvEncryptionInTransitEnabled: ${OCI_CONTROL_PLANE_PV_TRANSIT_ENCRYPTION=true}
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: OCIMachineTemplate
metadata:
name: "${CLUSTER_NAME}-md"
spec:
template:
spec:
imageId: "${OCI_IMAGE_ID}"
compartmentId: "${OCI_COMPARTMENT_ID}"
shape: "${OCI_NODE_MACHINE_TYPE=VM.Standard.E4.Flex}"
shapeConfig:
ocpus: "${OCI_NODE_MACHINE_TYPE_OCPUS=1}"
metadata:
ssh_authorized_keys: "${OCI_SSH_KEY}"
isPvEncryptionInTransitEnabled: ${OCI_NODE_PV_TRANSIT_ENCRYPTION=true}
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
name: "${CLUSTER_NAME}-md"
spec:
template:
spec:
joinConfiguration:
nodeRegistration:
kubeletExtraArgs:
cloud-provider: external
provider-id: oci://{{ ds["id"] }}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
name: "${CLUSTER_NAME}-fd-2-md-0"
spec:
clusterName: "${CLUSTER_NAME}"
replicas: ${NODE_MACHINE_COUNT}
selector:
matchLabels:
template:
spec:
clusterName: "${CLUSTER_NAME}"
version: "${KUBERNETES_VERSION}"
bootstrap:
configRef:
name: "${CLUSTER_NAME}-md"
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
infrastructureRef:
name: "${CLUSTER_NAME}-md"
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: OCIMachineTemplate
# Cluster-API calls them Failure Domains while OCI calls them Availability Domains
# In the example this would be targeting US-ASHBURN-AD-2
failureDomain: "2"