A simple ChatBot
In this module, you will learn how to build a simple chatbot using Spring AI and Ollama.
Note: The example below can also be run on a regular CPU if you are using an environment where a GPU is not available; it will just be slower.
Oracle Backend for Microservices and AI provides an option during installation to provision a set of Kubernetes nodes with NVIDIA A10 GPUs that are suitable for running AI workloads. If you choose that option during installation, you can also specify how many nodes are provisioned. The GPU nodes are placed in a separate node pool from the normal CPU nodes, which allows you to scale them independently of the CPU nodes. They are also labeled so that you can target appropriate workloads to them using node selectors and/or affinity rules.
To view a list of the nodes in your cluster that have a GPU, you can use this command:
$ kubectl get nodes -l 'node.kubernetes.io/instance-type=VM.GPU.A10.1'
NAME          STATUS   ROLES   AGE     VERSION
10.22.33.45   Ready    node    2m44s   v1.30.1
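If you want to confirm that Kubernetes can actually schedule GPU workloads on those nodes, a quick check like the following may help. This is a sketch; it assumes the NVIDIA device plugin has registered the nvidia.com/gpu resource on the nodes.

# Print each GPU node and the number of GPUs it reports as allocatable
kubectl get nodes -l 'node.kubernetes.io/instance-type=VM.GPU.A10.1' \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'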
Running a Large Language Model on your GPU nodes
One very common use for GPU nodes is to run a self-hosted Large Language Model (LLM) such as llama3 for inferencing, or nomic-embed-text for embedding.
Companies often want to self-host an LLM to avoid sending private or sensitive data outside their organization to a third-party provider, or to have more control over the costs of running the LLM and associated infrastructure.
One excellent way to self-host LLMs is to use Ollama.
To install Ollama on your GPU nodes, you can use the following commands:
- Add the Ollama helm repository:
helm repo add ollama-helm https://otwld.github.io/ollama-helm/
- Update your helm repositories:
helm repo update
- Create an ollama-values.yaml file to configure how Ollama should be installed, including which node(s) to run it on. Here is an example that will run Ollama on a GPU node and will pull the llama3 model:

ollama:
  gpu:
    enabled: true
    type: 'nvidia'
    number: 1
  models:
    - llama3
nodeSelector:
  node.kubernetes.io/instance-type: VM.GPU.A10.1
For more information on how to configure Ollama using the helm chart, refer to its documentation.
Note: If you are using an environment where no GPU is available, you can run this on a CPU by changing the ollama-values.yaml file to the following:

ollama:
  gpu:
    enabled: false
  models:
    - llama3
- Create a namespace to deploy Ollama in:
kubectl create ns ollama
- Deploy Ollama using the helm chart:
helm install ollama ollama-helm/ollama --namespace ollama --values ollama-values.yaml
- You can verify the deployment with the following command:
kubectl get pods -n ollama -w
When the pod has the status Running, the deployment is complete.

NAME                      READY   STATUS    RESTARTS   AGE
ollama-659c88c6b8-kmdb9   0/1     Running   0          84s
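Note that Ollama pulls the llama3 model the first time it starts, which can take a few minutes. If you want to watch the progress, you can follow the logs. This is a sketch; it assumes the helm release created a deployment named ollama, which matches the pod name shown above.

# Follow the Ollama logs while the model is being pulled
kubectl -n ollama logs -f deployment/ollama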
Test your Ollama deployment
You can interact with Ollama using the provided command line tool, called ollama.
For example, to list the available models, use the ollama ls command:
kubectl -n ollama exec svc/ollama -- ollama ls
NAME            ID             SIZE     MODIFIED
llama3:latest   365c0bd3c000   4.7 GB   2 minutes ago
To ask the LLM a question, you can use the ollama run command:
$ kubectl -n ollama exec svc/ollama -- ollama run llama3 "what is spring boot?"
Spring Boot is an open-source Java-based framework that simplifies the development
of web applications and microservices. It's a subset of the larger Spring ecosystem,
which provides a comprehensive platform for building enterprise-level applications.
...
Using LLMs hosted by Ollama in your Spring application
A Kubernetes service named ollama with port 11434 was created so that your applications can talk to the models hosted by Ollama.
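Spring AI will talk to that service over Ollama's HTTP API. If you want to confirm connectivity before writing any code, one quick check is to port-forward the service and call Ollama's standard /api/tags endpoint, which lists the installed models (a sketch, not a required step):

# Open a tunnel to the Ollama service
kubectl -n ollama port-forward svc/ollama 11434 &

# Ask Ollama which models it is serving; you should see llama3 in the response
curl http://localhost:11434/api/tags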
Now you will create a simple Spring AI application that uses Llama3 to implement a chatbot.
Note: The sample code used in this module is available here.
- Create a new Spring Boot project

In a new directory called chatbot, create a file called pom.xml with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.3.4</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.example</groupId>
    <artifactId>chatbot</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>chatbot</name>
    <description>A Simple ChatBot Application</description>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
            <version>3.3.4</version>
        </dependency>
    </dependencies>
    <repositories>
        <repository>
            <id>spring-milestones</id>
            <name>Spring Milestones</name>
            <url>https://repo.spring.io/milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>spring-snapshots</id>
            <name>Spring Snapshots</name>
            <url>https://repo.spring.io/snapshot</url>
            <releases>
                <enabled>false</enabled>
            </releases>
        </repository>
    </repositories>
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-bom</artifactId>
                <version>1.0.0-SNAPSHOT</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
Note that this is very similar to the Maven POM files you have created in previous modules. Spring AI is currently approaching its 1.0.0 release, so you need to enable access to the milestone and snapshot repositories to use it; the repositories section in the POM file above does that.

The spring-ai-bom was added in the dependencyManagement section to make it easy to select the correct versions of the various dependencies.

Finally, a dependency for spring-ai-ollama-spring-boot-starter was added. This provides access to the Spring AI Ollama functionality and autoconfiguration.
- Configure access to your Ollama deployment

To configure access to Ollama, create a Spring application configuration file called src/main/resources/application.yaml with the following content:

spring:
  application:
    name: chatbot
  ai:
    ollama:
      base-url: http://ollama.ollama.svc.cluster.local:11434
      chat:
        enabled: true
        options:
          model: llama3
Note that you are providing the URL of the Ollama instance that you just deployed in your cluster. You also need to tell Spring AI to enable chat and which model to use. These properties can also be overridden with environment variables; see the local test sketch at the end of these steps.
- Create the main Spring application class

Create a file called src/main/java/com/example/chatbot/ChatbotApplication.java with the following content:

package com.example.chatbot;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class ChatbotApplication {

    public static void main(String[] args) {
        SpringApplication.run(ChatbotApplication.class, args);
    }

}
This is a very standard main class which you are likely familiar with from previous modules.
- Create the Chat Controller

Create a file called ChatController.java in the directory src/main/java/com/example/chatbot/controller with the following content:

package com.example.chatbot.controller;

import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.ollama.api.OllamaModel;
import org.springframework.ai.ollama.api.OllamaOptions;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/chat")
public class ChatController {

    final ChatModel chatModel;

    public ChatController(ChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @PostMapping
    public String chat(@RequestBody String question) {
        ChatResponse response = chatModel.call(
            new Prompt(question, OllamaOptions.builder()
                .withModel(OllamaModel.LLAMA3)
                .withTemperature(0.4f)
                .build()
            ));
        return response.getResult().getOutput().getContent();
    }

}
In this class, you have a RestController, which you are likely familiar with from previous modules, and a PostMapping to create an HTTP POST endpoint.

The chat() method reads a question from the HTTP body, makes a call to Ollama using the Llama3 model to get a response, and then returns the text part of the response to the user in the HTTP body.

Notice that a ChatModel is injected into this controller to provide access to Ollama's chat interface.

You may also notice that we set the temperature to 0.4. Temperature is a parameter of LLMs that controls how creative the answer will be. Lower numbers (approaching zero) will tend to produce less creative answers, whereas higher numbers will tend to produce more creative answers.
LLMs are not deterministic, so you will tend to get a different answer each time you ask them the same question, unless the temperature is very low.
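If you want to see this for yourself before deploying, here is a minimal local test sketch. It assumes you have built the chatbot project above and relies on Spring Boot's relaxed binding, where the SPRING_AI_OLLAMA_BASE_URL environment variable overrides spring.ai.ollama.base-url, so the application can reach the in-cluster Ollama instance through a port-forward tunnel:

# Tunnel to the in-cluster Ollama service so the application can reach it locally
kubectl -n ollama port-forward svc/ollama 11434 &

# Run the application locally, pointing Spring AI at the tunnel instead of the
# in-cluster service name from application.yaml
SPRING_AI_OLLAMA_BASE_URL=http://localhost:11434 mvn spring-boot:run

# In another terminal, ask the same question twice; with the temperature set to 0.4
# you will usually get two similar, but not identical, answers
curl -X POST -d 'what is spring boot?' http://localhost:8080/chat
curl -X POST -d 'what is spring boot?' http://localhost:8080/chat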
Deploy your application
- Build a JAR file for deployment
Run the following command to build the JAR file (it will also remove any earlier builds).
$ mvn clean package
The service is now ready to deploy to the backend.
- Prepare the backend for deployment
The Oracle Backend for Microservices and AI admin service is not exposed outside the Kubernetes cluster by default. Oracle recommends using a kubectl port forwarding tunnel to establish a secure connection to the admin service.
Start a tunnel using this command in a new terminal window:
$ kubectl -n obaas-admin port-forward svc/obaas-admin 8080
Get the password for the obaas-admin user. The obaas-admin user is the equivalent of the admin or root user in the Oracle Backend for Microservices and AI backend.

$ kubectl get secret -n azn-server oractl-passwords -o jsonpath='{.data.admin}' | base64 -d
Start the Oracle Backend for Microservices and AI CLI (oractl) in a new terminal window using this command:
$ oractl
 _   _           __    _    ___
/ \ |_) _.  _.  (_    /  |   |
\_/ |_) (_| (_| __)   \_ |_ _|_
========================================================================================
Application Name: Oracle Backend Platform :: Command Line Interface
Application Version: (1.3.0)
:: Spring Boot (v3.3.3) ::

Ask for help:
- Slack: https://oracledevs.slack.com/archives/C03ALDSV272
- email: obaas_ww@oracle.com

oractl:>
Connect to the Oracle Backend for Microservices and AI admin service using the connect command. Enter obaas-admin as the username and use the password you collected earlier.

oractl:> connect
username: obaas-admin
password: **************
Credentials successfully authenticated! obaas-admin -> welcome to OBaaS CLI.
oractl:>
- Deploy the chatbot

You will now deploy your chatbot to the Oracle Backend for Microservices and AI using the CLI. You will deploy into the application namespace, and the service name will be chatbot. Run this command to deploy your service; make sure you provide the correct path to your JAR file. Note that this command may take 1-3 minutes to complete:

oractl:> deploy --app-name application --service-name chatbot --artifact-path /path/to/chatbot-0.0.1-SNAPSHOT.jar --image-version 0.0.1 --java-version ghcr.io/oracle/graalvm-native-image-obaas:21
uploading: /Users/atael/tmp/cloudbank/chatbot/target/chatbot-0.0.1-SNAPSHOT.jar
building and pushing image...
creating deployment and service...
obaas-cli [deploy]: Application was successfully deployed
oractl:>
What happens when you use the Oracle Backend for Microservices and AI CLI (oractl) deploy command? When you run it, the CLI does several things for you:
- Uploads the JAR file to the server side
- Builds a container image and pushes it to the OCI Registry
- Inspects the JAR file and looks for resources to bind (for example, JMS)
- Creates the Kubernetes deployment descriptor for the microservice with the resources supplied
- Applies the Kubernetes deployment and creates a Kubernetes service for the microservice
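If you would like to see the objects that were created, you can list them directly. This is a sketch; the namespace and service name match the values used in the deploy command above.

# List the Kubernetes objects the deploy command created for the chatbot
kubectl -n application get deployments,services,pods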
Use your application
The simplest way to verify the application is to use a kubectl tunnel to access it.
- Create a tunnel to access the application
Start a tunnel using this command:
kubectl -n application port-forward svc/chatbot 8080 &
- Test your application
Make a POST request through the tunnel to ask the chatbot a question:
$ curl -X POST -d 'what is spring boot?' http://localhost:8080/chat
A popular question! Spring Boot is an open-source Java-based framework that provides
a simple and efficient way to build web applications, RESTful APIs, and microservices.
It's built on top of the Spring Framework, but with a more streamlined and opinionated
approach.
...
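When you are finished testing, you can stop the background tunnel you started earlier. This is a sketch; %1 assumes the port-forward is the only background job in your current shell.

# Stop the kubectl port-forward running in the background
kill %1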