Checkpoint in a Kubernetes Job
A Java application can be run on Kubernetes as a "canary" that creates a checkpoint for later runs. This document describes how to perform the checkpoint and restore end-to-end inside Kubernetes, using a minikube cluster, instead of running locally on a Linux machine or triggering the checkpoint in a container.
- Create a new namespace, example:
    minikube start
    eval $(minikube docker-env)
    kubectl create ns example
    kubectl config set-context --current --namespace=example
- Build a container image locally, based on this example Dockerfile:
    docker build -t azul-crac-example:k8s-spring-boot -f k8s/spring-boot/Dockerfile .
  The first stage builds the application and the second stage adds the netcat utility and two scripts:
  - checkpoint.sh starts the application with -XX:CRaCCheckpointTo=… and a netcat server listening on port 1111. When a client connects to this port, the checkpoint is triggered via jcmd.
  - restore-or-start.sh checks for checkpoint image files and either restores from that image or falls back to a regular application startup.
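The restore-or-start decision described above can be sketched roughly as follows. This is not the repository's actual script: the CRAC_DIR and APP_JAR names and their default paths are assumptions made for illustration. Only the -XX:CRaCRestoreFrom option comes from CRaC itself.

```shell
#!/bin/sh
# Sketch of a restore-or-start style launcher (not the repository's script).
# CRAC_DIR and APP_JAR are hypothetical names and paths for this illustration.
CRAC_DIR="${CRAC_DIR:-/var/crac/image}"
APP_JAR="${APP_JAR:-/app/example.jar}"

# Print "restore" if the directory exists and contains at least one entry,
# otherwise print "start".
launch_mode() {
    if [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
        echo restore
    else
        echo start
    fi
}

# Replace the shell with the JVM, restoring when an image is available.
launch() {
    if [ "$(launch_mode "$CRAC_DIR")" = restore ]; then
        exec java -XX:CRaCRestoreFrom="$CRAC_DIR"
    else
        exec java -jar "$APP_JAR"
    fi
}
```

A script built this way would end with a call to launch. The check only looks for the presence of files, so it does not detect a corrupt or incompatible image.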
- Create resources in the Kubernetes cluster, using the example k8s-file:
    kubectl apply -f k8s/spring-boot/k8s.yaml
  The following resources are created:
  - PersistentVolumeClaim representing storage (in minikube this is bound automatically to a PersistentVolume)
  - Deployment that will create the application using the restore-or-start.sh script
  - Job that will create the checkpoint image.
- Check that the pods are running:
    $ kubectl get po
    NAME                                 READY   STATUS    RESTARTS   AGE
    create-checkpoint-fsfs4              2/2     Running   0          4s
    example-spring-boot-68b69cc8-bbxnx   1/1     Running   0          4s
- Check that the application started normally.
  Explore the application logs (kubectl logs example-spring-boot-68b69cc8-bbxnx); the application starts normally because the checkpoint image has not been created yet. The other pod hosts two containers: one runs checkpoint.sh, and the other warms the application up using siege and then triggers the checkpoint by connecting to port 1111 (this is not a built-in feature; remember that netcat is listening in the background).
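The connection-triggered checkpoint can be pictured with a small sketch of the listening side. This is an illustration under assumptions, not the contents of checkpoint.sh: the function name is hypothetical, and the netcat listen syntax differs between netcat variants. JDK.checkpoint is the jcmd command that CRaC-enabled JDKs provide.

```shell
#!/bin/sh
# Sketch only: the repository's checkpoint.sh is not reproduced here.
# Wait for one TCP connection on the given port, then request a checkpoint
# from the JVM with the given PID.
checkpoint_on_connect() {
    pid="$1"
    port="$2"
    # Traditional netcat syntax; OpenBSD netcat expects: nc -l "$port"
    nc -l -p "$port" >/dev/null
    # JDK.checkpoint is the jcmd command provided by CRaC-enabled JDKs.
    jcmd "$pid" JDK.checkpoint
}
```

In a script of this shape, the application would be started in the background with -XX:CRaCCheckpointTo=… and the function would then be called with the PID of that process and port 1111.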
- After a while the job completes:
    $ kubectl get job
    NAME                STATUS     COMPLETIONS   DURATION   AGE
    create-checkpoint   Complete   1/1           19s        44m
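Instead of polling kubectl get job by hand, the wait can be scripted. The sketch below wraps kubectl wait in a helper; the function name and the default timeout of 120s are assumptions chosen for this example.

```shell
#!/bin/sh
# Wait until the named Job reports the Complete condition,
# or fail once the timeout (default 120s, an arbitrary choice) expires.
wait_for_job() {
    kubectl wait --for=condition=complete "job/$1" --timeout="${2:-120s}"
}
```

Calling wait_for_job create-checkpoint blocks until the checkpoint job finishes, which makes it a convenient gate before restarting the deployment.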
- Roll out a new deployment.
  This time the application should restore from the checkpoint image:
    kubectl rollout restart deployment/example-spring-boot
- After a short moment the application is back up:
    NAME                                   READY   STATUS      RESTARTS   AGE
    create-checkpoint-fsfs4                0/2     Completed   0          95s
    example-spring-boot-79b98966db-ml2pj   1/1     Running     0          15s
  In the logs you can check whether the restore was performed:
    INFO 129 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Restarting Spring-managed lifecycle beans after JVM restore
    INFO 129 --- [Attach Listener] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port 8080 (http) with context path ''
    INFO 129 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Spring-managed lifecycle restart completed (restored JVM running for 45 ms)
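The log check can also be automated. The following sketch scans log text on standard input for the restore message shown above; the function name is hypothetical, and the match relies on the exact Spring wording in these logs, which may differ between versions.

```shell
#!/bin/sh
# Succeeds if the log text on stdin contains Spring's post-restore message.
restored_from_checkpoint() {
    grep -q "after JVM restore"
}
```

One way to use it is to pipe the output of kubectl logs deployment/example-spring-boot into restored_from_checkpoint and act on the exit status.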
- Verify that the application responds to requests.
  You should get the "Greetings from Spring Boot!" reply:
    kubectl expose deployment example-spring-boot --type=NodePort --port=8080
    URL=$(minikube service example-spring-boot -n example --url)
    curl "$URL"
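Because the service may need a moment before it answers, the final check can be wrapped in a small retry loop. This is a sketch under assumptions: the function name, the default of 10 attempts, and the one-second pause are illustrative choices, and the expected body is the reply quoted above.

```shell
#!/bin/sh
# Poll the URL until the expected body is returned or the attempts run out.
# Returns 0 on success and 1 if the greeting never appears.
wait_for_greeting() {
    url="$1"
    attempts="${2:-10}"
    while [ "$attempts" -gt 0 ]; do
        # -f makes curl fail on HTTP errors; -sS hides progress but keeps errors.
        body=$(curl -fsS "$url" 2>/dev/null) || body=""
        if [ "$body" = "Greetings from Spring Boot!" ]; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}
```

With URL set as in the step above, wait_for_greeting "$URL" succeeds as soon as the restored application serves the expected reply.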