GPUs for AI
Oracle Backend for Microservices and AI provides an option during installation to provision a set of Kubernetes nodes with NVIDIA A10 GPUs that are suitable for running AI workloads. If you choose that option during installation, you can also specify how many nodes are provisioned. The GPU nodes are placed in a separate node pool from the normal CPU nodes, which allows you to scale them independently of the CPU nodes. They are also labeled so that you can target appropriate workloads to them using node selectors and/or affinity rules.
To view a list of nodes in your cluster with a GPU, you can use this command:
$ kubectl get nodes -l 'node.kubernetes.io/instance-type=VM.GPU.A10.1'
NAME          STATUS   ROLES   AGE     VERSION
10.22.33.45   Ready    node    2m44s   v1.30.1
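Because the GPU nodes carry that instance-type label, you can pin a workload to them with a node selector. The following is a minimal sketch of a Deployment targeted at the GPU node pool; the name and image are illustrative placeholders, and the nvidia.com/gpu resource limit assumes the NVIDIA device plugin is exposing the GPUs on those nodes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-gpu-workload                  # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-gpu-workload
  template:
    metadata:
      labels:
        app: my-gpu-workload
    spec:
      # schedule only onto the A10 GPU node pool using the label shown above
      nodeSelector:
        node.kubernetes.io/instance-type: VM.GPU.A10.1
      containers:
        - name: worker
          image: ghcr.io/example/my-ai-workload:latest   # illustrative image
          resources:
            limits:
              nvidia.com/gpu: 1          # request one GPU (assumes the NVIDIA device plugin)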
One very common use for GPU nodes is to run a self-hosted Large Language Model (LLM) such as llama3 for inferencing, or nomic-embed-text for embedding.
Companies often want to self-host an LLM to avoid sending private or sensitive data outside of their organization to a third-party provider, or to have more control over the costs of running the LLM and associated infrastructure.
One excellent way to self-host LLMs is to use Ollama.
To install Ollama on your GPU nodes, complete the following steps:
- Add the Ollama helm repository:
  helm repo add ollama-helm https://otwld.github.io/ollama-helm/
- Update your helm repositories:
  helm repo update
- Create an ollama-values.yaml file to configure how Ollama should be installed, including which node(s) to run it on. Here is an example that will run Ollama on a GPU node and pull the llama3 model:

  ollama:
    gpu:
      enabled: true
      type: 'nvidia'
      number: 1
    models:
      - llama3
  nodeSelector:
    node.kubernetes.io/instance-type: VM.GPU.A10.1
For more information on how to configure Ollama using the helm chart, refer to its documentation.
- Create a namespace to deploy Ollama in:
  kubectl create ns ollama
- Deploy Ollama using the helm chart:
  helm install ollama ollama-helm/ollama --namespace ollama --values ollama-values.yaml
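After the helm install completes, you can check that the Ollama pod was scheduled onto a GPU node and is running. This assumes the namespace and release name used above; the -o wide flag shows which node the pod landed on:

$ kubectl -n ollama get pods -o wide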
You can interact with Ollama using the provided command line tool, called ollama. For example, to list the available models, use the ollama ls command:
$ kubectl -n ollama exec svc/ollama -- ollama ls
NAME            ID              SIZE      MODIFIED
llama3:latest   365c0bd3c000    4.7 GB    2 minutes ago
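If you also want the embedding model mentioned earlier, you can pull it the same way. This sketch assumes the nomic-embed-text model named above:

$ kubectl -n ollama exec svc/ollama -- ollama pull nomic-embed-text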
To ask the LLM a question, you can use the ollama run command:
$ kubectl -n ollama exec svc/ollama -- ollama run llama3 "what is spring boot?"
Spring Boot is an open-source Java-based framework that simplifies the development of web applications and microservices. It's a subset of the larger Spring ecosystem, which provides a comprehensive platform for building enterprise-level applications.
...
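Other workloads in the cluster can also reach Ollama over its REST API. Here is a minimal sketch, assuming the chart created a Service named ollama in the ollama namespace listening on Ollama's default port 11434:

curl http://ollama.ollama.svc.cluster.local:11434/api/generate \
  -d '{"model": "llama3", "prompt": "what is spring boot?", "stream": false}'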
Our self-paced, hands-on CloudBank AI example shows how to build a simple chatbot using Spring AI and Ollama.
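For example, a Spring AI application can be pointed at the in-cluster Ollama instance through configuration similar to the following sketch, which assumes the Spring AI Ollama starter's properties and the same Service address and port used above:

spring:
  ai:
    ollama:
      base-url: http://ollama.ollama.svc.cluster.local:11434   # assumed in-cluster Ollama Service address
      chat:
        options:
          model: llama3                                        # model pulled by the helm chart above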