Improve availability using Rolling Updates
Self guided student - video introduction
This video is an introduction to the Kubernetes rolling upgrades lab. Depending on your browser settings it may open in this tab / window or in a new one. Once you've watched it please return to this page to continue the lab. [![Kubernetes rolling upgrades lab Introduction Video](https://img.youtube.com/vi/x2hXZrUWM0c/0.jpg)](https://youtu.be/x2hXZrUWM0c "Kubernetes rolling upgrades lab introduction video")

---

Introduction
This is one of the core Kubernetes labs
Estimated module duration 20 mins.
Objectives
This module demonstrates how the rolling update capabilities in Kubernetes can be used to update a microservice container or configuration with no service interruption. It also looks at how to undo an update if there is a problem with it.
Prerequisites
You need to complete the Auto Scaling module.
Task 1: Why rolling updates?
One of the problems when deploying an application is how to update it while still delivering service, and, perhaps more importantly (but usually given little consideration), how to revert the changes if the update fails to work in some way.
Changes come in multiple areas: they could be application code changes (Kubernetes sees these as a change in the container image) or a change in the configuration defining a deployment (and its replica sets / pods).
The update process for code changes involves using your development tooling to create a new container image based on the changed code (you presumably have separate processes for managing your source code versions). As part of your container creation process you must give the image a different version number, so it's easy to identify which container comes from which version of your source code, and so you can differentiate between releases at the image level. You'd then update the service definition to use the new image, either by editing and applying the yaml file or by using kubectl to directly change the container image.
The update process for configuration changes is to modify the yaml file that defines your service (for example defining different volume mounts) and then apply the change, or to issue a kubectl command that directly updates the configuration.
For both approaches Kubernetes will keep track of the changes and will undertake a rolling upgrade strategy.
As a general observation, it may be tempting to just go in and modify the configuration directly with kubectl. This is a bad idea: it is likely to lead to changes that are not recorded in your configuration management system, so if you ever had to do a complete restart of the system, changes made manually with kubectl are likely to be forgotten. It is strongly recommended that you make changes by modifying your yaml file, and that the yaml file itself has a versioning scheme so you can identify exactly what version of the service a given yaml file version provides. If you must make changes using kubectl (say you need to make a minor change in a test environment) then as soon as you decide the change should be permanent, make the corresponding change in the yaml file and do a rolling upgrade using the yaml file to ensure you are using the correct configuration (after all, you may have made a typo in either the kubectl command or the yaml file).
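One way to make the yaml file version visible in the cluster itself is to record it using the Kubernetes recommended labels. A sketch (this fragment is illustrative only and is not part of the lab's files; the label values are assumptions):

```yaml
# Illustrative only: recording the version with the standard
# app.kubernetes.io labels ties the running deployment back to
# the yaml file / image version that produced it
metadata:
  name: storefront
  labels:
    app.kubernetes.io/name: storefront
    app.kubernetes.io/version: "0.0.1"
```

You can then see at a glance which version a deployment came from with `kubectl get deployment --show-labels`.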
Task 2: How to do a rolling upgrade in our setup
So far we’ve been stopping our services (the undeploy.sh script deletes the deployments) and then creating new ones (the deploy.sh script applies the deployment configurations for us). This results in service downtime, and we don’t want that. But before we can switch to properly using rolling upgrades there are a few bits of configuration we should do.
Task 2a: Defining the rolling upgrade
Kubernetes aims to keep a service running during a rolling upgrade. It does this by starting new pods to run the service, then stopping old ones once the new ones are ready. Through the magic of services and using labels as selectors, the Kubernetes runtime adds and removes pods from the service. This works with a deployment whose replica set contains only a single pod (the new pod is started before the old one is stopped), but if your service contains multiple pods Kubernetes will use some configuration rules to try to manage the process in a more balanced manner, sticking reasonably closely to the number of pods you’ve asked for (or the auto scaler has).
We are going to once again edit the storefront-deployment.yaml file to give Kubernetes some rules to follow when doing a rolling upgrade. Importantly, however, we’re going to edit a copy of the file so we have a history.
- In the OCI Cloud Shell navigate to the folder $HOME/helidon-kubernetes
- Copy the storefront-deployment yaml file:
<copy>cp storefront-deployment.yaml storefront-deployment-v0.0.1.yaml</copy>
- Edit the new file storefront-deployment-v0.0.1.yaml. I’m using vi, but use any available editor you like:
<copy>vi storefront-deployment-v0.0.1.yaml</copy>
The relevant section of the file currently looks like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: storefront
spec:
  replicas: 1
  selector:
    matchLabels:
      app: storefront
  template:
    metadata:
      labels:
        app: storefront
- Set the number of replicas to 4:
  replicas: 4
We’re now going to tell Kubernetes to use a rolling upgrade strategy for any upgrades.
- After the replicas: 4 line, add
  strategy:
    type: RollingUpdate
Finally we’re going to tell Kubernetes what limits we want to place on the rolling upgrade.
- Under the type line above, and at the same indent, add the following
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
This limits the rollout process to no more than 1 additional pod above the requested replica count, and no more than 1 pod unavailable below it. So the roll out (in this case) allows us to have up to 5 pods running during the rollout, and requires that at least 3 are available.
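As a quick sanity check, those bounds fall directly out of the arithmetic:

```shell
# Rollout limits implied by the settings above
replicas=4
maxSurge=1
maxUnavailable=1
maxPods=$((replicas + maxSurge))             # most pods that may exist at once
minAvailable=$((replicas - maxUnavailable))  # fewest pods that must stay available
echo "during the rollout: between $minAvailable and $maxPods storefront pods"
```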
Note that unless you have very specific reasons you shouldn’t change the default settings for the strategy type and maxSurge / maxUnavailable. We are setting these for two reasons: first to show that the settings are available, and second, for the purposes of this lab, to slow things down so that the roll out happens in a way that lets us actually see what’s happening (of course in production you’d want it to run as fast as possible, so think about the settings used if you do override the defaults).
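For reference, a deployment with no explicit strategy behaves as if you had specified the percentage-based defaults, equivalent to:

```yaml
  # Default deployment behaviour when no strategy is specified:
  # percentages are resolved against the replica count (rounded
  # up for maxSurge, down for maxUnavailable)
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
```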
The section of the file after the changes will look like this
apiVersion: apps/v1
kind: Deployment
metadata:
  name: storefront
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app: storefront
  template:
    metadata:
      labels:
        app: storefront
- Save the changes
Task 2b: Applying the rollout strategy
To do the roll out we’re just going to apply the new file. Kubernetes will compare it to the old configuration and update appropriately.
- Apply the new config
<copy>kubectl apply -f storefront-deployment-v0.0.1.yaml</copy>
deployment.apps/storefront configured
- Kubernetes will automatically keep a record of the previous versions and the command used (e.g. kubectl apply -f storefront-deployment.yaml), but the command alone may not be that useful, and we may want to describe the change. To provide a meaningful description we’re going to apply an annotation to the deployment; when we then look at the deployment history we will see the change description.
<copy>kubectl annotate deployment/storefront kubernetes.io/change-cause="Changed rollout settings"</copy>
deployment.apps/storefront annotated
- We can have a look at the status of the rollout
<copy>kubectl rollout status deployment storefront</copy>
deployment "storefront" successfully rolled out
If you get a message along the lines of Waiting for deployment "storefront" rollout to finish: 3 of 4 updated replicas are available... this just means that the roll out is still in progress; once it’s complete you should see the success message.
- Let’s also look at the history of this deployment:
<copy>kubectl rollout history deployment storefront</copy>
deployment.apps/storefront
REVISION CHANGE-CAUSE
1 Changed rollout settings
The update has the change-cause of "Changed rollout settings" from the kubectl annotate command.
One point to note here: these changes only modified the deployment roll out configuration, so there was no need for Kubernetes to actually restart any pods as those were unchanged; however additional pods may have needed to be started to meet the replica count.
Making a change that updates the pods
Of course normally you would make a change to the application code, test and build it, then push the new image to the registry. You would probably use some form of CI/CD tooling to manage the process, for example a pipeline built using the Oracle Developer Cloud Service or Argo CD (other options include the open source tools Jenkins / Hudson and Ansible).
For this lab we are focusing on Helidon and Kubernetes, not the entire CI/CD chain, so like any good cooking show we’re going to use a v0.0.2 image that we created for you. For the purposes of this module the image is basically the same as the v0.0.1 version, except it reports its version as 0.0.2.
Applying our new image
To apply the new v0.0.2 image we need to upgrade the configuration again. As discussed above, following best practice we would normally do this by creating a new version of the deployment yaml file (say storefront-deployment-v0.0.2.yaml, to match the container and code versions).
However … for the purpose of showing how this can be done using kubectl, we are going to do this using the command line, not a configuration file change. This might be something you’d do in a test environment, but don’t do it in a production environment or your change management processes will almost certainly suffer.
- Edit the storefront-deployment-v0.0.1.yaml file and locate the image line. In my case it looks like image: fra.ocir.io/nrrwtjdl235/tg_labs_base_repo/storefront:0.0.1 but yours will be different. I’m using vi here but use the editor you prefer
<copy>vi storefront-deployment-v0.0.1.yaml</copy>
- Copy your image location details to a notepad or similar. In my case the image location is fra.ocir.io/nrrwtjdl235/tg_labs_base_repo/storefront:0.0.1 but yours will be different. Replace the version with micro release 2, so for me after making the change it will look like fra.ocir.io/nrrwtjdl235/tg_labs_base_repo/storefront:0.0.2
- In the OCI Cloud Shell execute the command below, replacing [image location] with the one you just got
kubectl set image deployment storefront storefront=[image location]
In my case the command is kubectl set image deployment storefront storefront=fra.ocir.io/nrrwtjdl235/tg_labs_base_repo/storefront:0.0.2 but of course yours will be different
deployment.apps/storefront image updated
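If you’d rather not edit the version by hand, the tag swap can be scripted with shell parameter expansion. A sketch (the image path is the example one from above; yours will differ):

```shell
# Example image path from this lab; substitute your own
OLD_IMAGE="fra.ocir.io/nrrwtjdl235/tg_labs_base_repo/storefront:0.0.1"
# Strip everything after the last ':' (the tag) and append the new tag
NEW_IMAGE="${OLD_IMAGE%:*}:0.0.2"
echo "$NEW_IMAGE"
```

You could then use $NEW_IMAGE directly in the kubectl set image command rather than pasting the string.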
- As is good practice we’ll update the history of the deployment so we know what the change is.
<copy>kubectl annotate deployment/storefront kubernetes.io/change-cause="Updated to v0.0.2 image"</copy>
- Let’s look at the status of our setup during the roll out
<copy>kubectl get all</copy>
NAME READY STATUS RESTARTS AGE
pod/stockmanager-6759d989bf-mtn76 1/1 Running 0 28m
pod/storefront-5f777cb4f5-7tlkb 1/1 Running 0 28m
pod/storefront-5f777cb4f5-8wnfm 1/1 Running 0 27m
pod/storefront-5f777cb4f5-gsbwd 1/1 Running 0 27m
pod/storefront-79d7d954d6-5g5ng 0/1 Running 0 5s
pod/storefront-79d7d954d6-m6qrg 0/1 Running 0 5s
pod/zipkin-88c48d8b9-r9vx2 1/1 Running 0 28m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/stockmanager ClusterIP 10.110.224.255 <none> 8081/TCP,9081/TCP 28m
service/storefront ClusterIP 10.99.139.139 <none> 8080/TCP,9080/TCP 28m
service/zipkin ClusterIP 10.104.158.61 <none> 9411/TCP 28m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/stockmanager 1/1 1 1 28m
deployment.apps/storefront 3/4 2 3 28m
deployment.apps/zipkin 1/1 1 1 28m
NAME DESIRED CURRENT READY AGE
replicaset.apps/stockmanager-6759d989bf 1 1 1 28m
replicaset.apps/storefront-6ha27g8ef4 0 0 0 35m
replicaset.apps/storefront-5f777cb4f5 3 3 3 28m
replicaset.apps/storefront-79d7d954d6 2 2 0 5s
replicaset.apps/zipkin-88c48d8b9 1 1 1 28m
We’re going to look at these in a different order to the output
Firstly the deployment info: we can see that 3 out of 4 pods are available. This is because we specified a maxUnavailable of 1, so as we have 4 replicas we must always have at least 3 of them available.
If we look at the replica sets we see something unusual. There are two replica sets for the storefront. The original replica set (storefront-5f777cb4f5) has 3 pods available and running; one of them was stopped as we allow a maxUnavailable of 1. There is however an additional storefront replica set, storefront-79d7d954d6. This has 2 pods in it; at the time the data was gathered neither of them was ready. But why 2 pods when we’d only specified a surge over the replica count of 1 pod? That’s because we have one pod count “available” to us from the surge, and another “available” because we’re allowed to kill off one pod below the replica count, making a total of two new pods that can be started.
Finally, if we look at the pods themselves we see that there are five storefront pods. A point on pod naming: the first part of the pod name is actually the replica set the pod is in, so the three pods starting storefront-5f777cb4f5- are actually in the replica set storefront-5f777cb4f5 (the old one) and the two pods starting storefront-79d7d954d6- are in the storefront-79d7d954d6 replica set (the new one).
Basically what Kubernetes has done is create a new replica set and start some new pods in it by adjusting the number of pod replicas in each set, maintaining the overall count of having at least 3 pods available at all times, and only one additional pod over the replica count set in the deployment. Over time, as the new pods come online in the new replica set and pass their readiness test, they can provide the service and the old replica set is reduced by one pod, allowing another new pod to be started. At all times at least 3 pods are available.
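The counting rules above can be sketched as a toy walk-through. This is an illustration only, not the real deployment controller: it assumes every starting pod passes its readiness check in one step, then scales the old set down and the new set up as far as the maxUnavailable and maxSurge budgets allow.

```shell
# Toy walk-through of the pod counts during this rollout (illustration only):
# replicas=4, maxSurge=1, maxUnavailable=1
replicas=4; surge=1; unavail=1
old=$replicas; new_ready=0; new_starting=0
while [ "$old" -gt 0 ] || [ "$new_ready" -lt "$replicas" ]; do
  # pods that were starting become ready
  new_ready=$((new_ready + new_starting)); new_starting=0
  # scale the old replica set down, keeping at least replicas - maxUnavailable available
  removable=$((old + new_ready - (replicas - unavail)))
  if [ "$removable" -gt "$old" ]; then removable=$old; fi
  if [ "$removable" -lt 0 ]; then removable=0; fi
  old=$((old - removable))
  # scale the new replica set up, staying within the surge budget
  room=$((replicas + surge - old - new_ready))
  want=$((replicas - new_ready))
  new_starting=$want
  if [ "$room" -lt "$want" ]; then new_starting=$room; fi
  echo "old=$old newReady=$new_ready newStarting=$new_starting"
done
```

The first two lines it prints (3 old pods with 2 new ones starting, then 1 old pod with a new replica set of 4 of which 2 are ready) match the two kubectl get all snapshots in this task.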
- Rerun the status command a few times to see the changes
<copy>kubectl get all</copy>
If we look at the output again we can see the progress (note that the exact results will vary depending on how long after the previous kubectl get all command you ran this one).
NAME READY STATUS RESTARTS AGE
pod/stockmanager-6759d989bf-mtn76 1/1 Running 0 29m
pod/storefront-5f777cb4f5-7tlkb 1/1 Running 0 29m
pod/storefront-79d7d954d6-5g5ng 1/1 Running 0 63s
pod/storefront-79d7d954d6-7z2df 0/1 Running 0 17s
pod/storefront-79d7d954d6-h6qv7 0/1 Running 0 16s
pod/storefront-79d7d954d6-m6qrg 1/1 Running 0 63s
pod/zipkin-88c48d8b9-r9vx2 1/1 Running 0 29m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/stockmanager ClusterIP 10.110.224.255 <none> 8081/TCP,9081/TCP 29m
service/storefront ClusterIP 10.99.139.139 <none> 8080/TCP,9080/TCP 29m
service/zipkin ClusterIP 10.104.158.61 <none> 9411/TCP 29m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/stockmanager 1/1 1 1 29m
deployment.apps/storefront 3/4 4 3 29m
deployment.apps/zipkin 1/1 1 1 29m
NAME DESIRED CURRENT READY AGE
replicaset.apps/stockmanager-6759d989bf 1 1 1 29m
replicaset.apps/storefront-6ha27g8ef4 0 0 0 36m
replicaset.apps/storefront-5f777cb4f5 1 1 1 29m
replicaset.apps/storefront-79d7d954d6 4 4 2 63s
replicaset.apps/zipkin-88c48d8b9 1 1 1 29m
- Kubectl provides an easier way to look at the status of our rollout
<copy>kubectl rollout status deployment storefront</copy>
Waiting for deployment "storefront" rollout to finish: 3 out of 4 new replicas have been updated...
Waiting for deployment "storefront" rollout to finish: 3 out of 4 new replicas have been updated...
Waiting for deployment "storefront" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "storefront" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "storefront" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "storefront" rollout to finish: 3 of 4 updated replicas are available...
deployment "storefront" successfully rolled out
Kubectl provides us with a monitor which updates over time; once the deployment is fully updated, kubectl returns.
During the rollout, if you had accessed the status page for the storefront (on /sf/status) you would sometimes have got version 0.0.1 in the response, and other times 0.0.2. This is because during the rollout there are instances of both versions running.
- If we look at the setup now we can see that the storefront is running only the new pods, and that there are 4 pods providing the service.
<copy>kubectl get all</copy>
NAME READY STATUS RESTARTS AGE
pod/stockmanager-6759d989bf-mtn76 1/1 Running 0 30m
pod/storefront-79d7d954d6-5g5ng 1/1 Running 0 108s
pod/storefront-79d7d954d6-7z2df 1/1 Running 0 62s
pod/storefront-79d7d954d6-h6qv7 1/1 Running 0 61s
pod/storefront-79d7d954d6-m6qrg 1/1 Running 0 108s
pod/zipkin-88c48d8b9-r9vx2 1/1 Running 0 30m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/stockmanager ClusterIP 10.110.224.255 <none> 8081/TCP,9081/TCP 30m
service/storefront ClusterIP 10.99.139.139 <none> 8080/TCP,9080/TCP 30m
service/zipkin ClusterIP 10.104.158.61 <none> 9411/TCP 30m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/stockmanager 1/1 1 1 30m
deployment.apps/storefront 4/4 4 4 30m
deployment.apps/zipkin 1/1 1 1 30m
NAME DESIRED CURRENT READY AGE
replicaset.apps/stockmanager-6759d989bf 1 1 1 30m
replicaset.apps/storefront-6ha27g8ef4 0 0 0 37m
replicaset.apps/storefront-5f777cb4f5 0 0 0 30m
replicaset.apps/storefront-79d7d954d6 4 4 4 108s
replicaset.apps/zipkin-88c48d8b9 1 1 1 30m
One important point: you’ll see that the old replica set is still around, even though it has no pods assigned to it. This is because it still holds the configuration that was in place before, in case we want to roll back (we’ll see this later).
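The number of old replica sets kept around for rollbacks is itself configurable in the deployment spec via revisionHistoryLimit (the default is 10); for example:

```yaml
spec:
  revisionHistoryLimit: 10   # how many old replica sets to retain for rollbacks
```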
- If we now look at the history we see that there have been two sets of changes
<copy>kubectl rollout history deployment storefront</copy>
deployment.apps/storefront
REVISION CHANGE-CAUSE
1 Changed rollout settings
2 Updated to v0.0.2 image
Note the change cause is what we set with the kubectl annotate command
- Let’s check on our deployment to make sure that the image is the v0.0.2 we expect
<copy>kubectl describe deployment storefront</copy>
Name: storefront
Namespace: tg-helidon
CreationTimestamp: Fri, 03 Jan 2020 11:58:05 +0000
Labels: app=storefront
Annotations: deployment.kubernetes.io/revision: 2
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"name":"storefront","namespace":"tg-helidon"},"spec":{...
Selector: app=storefront
Replicas: 4 desired | 4 updated | 4 total | 4 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: app=storefront
Containers:
storefront:
Image: fra.ocir.io/oractdemeabdmnative/h-k8s_repo/storefront:0.0.2
...
Lots of output
...
Normal ScalingReplicaSet 23m deployment-controller Scaled up replica set storefront-79d7d954d6 to 1
Normal ScalingReplicaSet 23m deployment-controller Scaled down replica set storefront-5f777cb4f5 to 3
Normal ScalingReplicaSet 23m deployment-controller Scaled up replica set storefront-79d7d954d6 to 2
Normal ScalingReplicaSet 22m deployment-controller Scaled down replica set storefront-5f777cb4f5 to 2
Normal ScalingReplicaSet 22m deployment-controller Scaled up replica set storefront-79d7d954d6 to 3
Normal ScalingReplicaSet 22m deployment-controller Scaled down replica set storefront-5f777cb4f5 to 1
Normal ScalingReplicaSet 22m deployment-controller Scaled up replica set storefront-79d7d954d6 to 4
Normal ScalingReplicaSet 21m deployment-controller Scaled down replica set storefront-5f777cb4f5 to 0
We see the usual deployment info; the Image is indeed the new one we specified (in this case fra.ocir.io/oractdemeabdmnative/h-k8s_repo/storefront:0.0.2) and the events log section shows us the various stages of rolling out the update.
If your cloud shell session is new or has been restarted then the shell variable $EXTERNAL_IP may be invalid. Expand this section if you think this may be the case, to check and reset it if needed.