Checkpoint in a Kubernetes Job

A Java application can be run on Kubernetes as a "canary" that creates a checkpoint for later runs. This document describes how to perform checkpoint and restore end-to-end inside Kubernetes, using a minikube cluster, rather than running locally on a Linux machine or triggering the checkpoint in a standalone container.

  1. Start minikube and create a new namespace named example:

     
    minikube start
    eval $(minikube docker-env)
    kubectl create ns example
    kubectl config set-context --current --namespace=example
  2. Build a container image locally, based on this example Dockerfile.

     
    docker build -t azul-crac-example:k8s-spring-boot -f k8s/spring-boot/Dockerfile .

    The first stage builds the application; the second stage adds the netcat utility and two scripts:

    • checkpoint.sh starts the application with -XX:CRaCCheckpointTo=… and a netcat server listening on port 1111. When a client connects to this port, the checkpoint is triggered via jcmd.

    • restore-or-start.sh checks for checkpoint image files and either restores from that image or falls back to a regular application startup.
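
The decision made by restore-or-start.sh can be sketched roughly as follows. This is not the script shipped in the example image: the checkpoint directory /cr, the jar path, and the function names choose_mode and restore_or_start are assumptions made for illustration. -XX:CRaCRestoreFrom is the restore counterpart of -XX:CRaCCheckpointTo.

```shell
#!/bin/sh
# Hypothetical sketch, not the script from the example image.
# CR_DIR and APP_JAR are assumed locations.
CR_DIR="${CR_DIR:-/cr}"
APP_JAR="${APP_JAR:-/app/app.jar}"

# Print "restore" if the directory holds at least one entry, else "start".
choose_mode() {
    if [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
        echo restore
    else
        echo start
    fi
}

# Restore from the checkpoint image if present, otherwise start normally.
restore_or_start() {
    if [ "$(choose_mode "$CR_DIR")" = restore ]; then
        exec java -XX:CRaCRestoreFrom="$CR_DIR"
    else
        exec java -jar "$APP_JAR"
    fi
}
```

A container entrypoint would end by calling restore_or_start; the functions are kept separate so that the decision can be checked without starting a JVM.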

  3. Create the resources in the Kubernetes cluster, using the example Kubernetes file (k8s/spring-boot/k8s.yaml):

     
    kubectl apply -f k8s/spring-boot/k8s.yaml

    The following resources are created:

    • A PersistentVolumeClaim representing storage (in minikube it is bound automatically to a PersistentVolume).

    • A Deployment that runs the application using the restore-or-start.sh script.

    • A Job that creates the checkpoint image.

  4. Check that the pods are running:

     
    $ kubectl get po
    NAME                                 READY   STATUS    RESTARTS   AGE
    create-checkpoint-fsfs4              2/2     Running   0          4s
    example-spring-boot-68b69cc8-bbxnx   1/1     Running   0          4s
  5. Check that the application started normally

    Inspect the application logs (kubectl logs example-spring-boot-68b69cc8-bbxnx). The application performs a regular startup because the checkpoint image has not been created yet. The other pod hosts two containers: one runs checkpoint.sh, and the other warms the application up using siege and then triggers the checkpoint by connecting to port 1111. This trigger is not a built-in feature; it relies on netcat running in the background.
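
The trigger mechanism described above might look roughly like the following sketch. It is not the checkpoint.sh from the example image: the paths, variable names, and function names are hypothetical, and the netcat listen syntax is an assumption (traditional netcat uses nc -l -p PORT, while OpenBSD netcat uses nc -l PORT).

```shell
#!/bin/sh
# Hypothetical sketch of the checkpoint flow, not the script from the example image.
CR_DIR="${CR_DIR:-/cr}"              # assumed checkpoint directory
APP_JAR="${APP_JAR:-/app/app.jar}"   # assumed application jar
TRIGGER_PORT="${TRIGGER_PORT:-1111}"

# Block until a client connects to the trigger port.
# Assumption: traditional netcat syntax; OpenBSD netcat uses "nc -l PORT".
wait_for_trigger() {
    nc -l -p "$1" >/dev/null
}

# Ask the running JVM (by PID) to write a checkpoint image.
request_checkpoint() {
    jcmd "$1" JDK.checkpoint
}

# Start the application, wait for the trigger, then request the checkpoint.
run_checkpoint() {
    java -XX:CRaCCheckpointTo="$CR_DIR" -jar "$APP_JAR" &
    app_pid=$!
    wait_for_trigger "$TRIGGER_PORT"
    request_checkpoint "$app_pid"
    wait "$app_pid"
}
```

With this layout, the warm-up container only needs to open a TCP connection to port 1111 once it has generated enough load.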

  6. After a while the job completes:

     
    $ kubectl get job
    NAME                STATUS     COMPLETIONS   DURATION   AGE
    create-checkpoint   Complete   1/1           19s        44m
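
Instead of polling kubectl get job, a script can block until the job reports completion. The sketch below wraps kubectl wait; the function name, the KUBECTL override, and the 120s default timeout are assumptions for illustration, not part of the example.

```shell
#!/bin/sh
# Sketch: block until the checkpoint job finishes instead of polling.
# KUBECTL is overridable so the function can be dry-run with "echo".
KUBECTL="${KUBECTL:-kubectl}"

wait_for_checkpoint_job() {
    # $1: timeout such as 120s (an arbitrary default, not from the example)
    $KUBECTL wait --for=condition=complete job/create-checkpoint \
        --timeout="${1:-120s}"
}
```

Calling wait_for_checkpoint_job 60s returns a non-zero status if the job has not completed within the timeout, which makes it usable as a gate before the rollout in the next step.
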
  7. Roll out a new deployment

    This time the application should restore from the checkpoint image:

     
    kubectl rollout restart deployment/example-spring-boot
  8. After a short moment the application is back up:

     
    NAME                                   READY   STATUS      RESTARTS   AGE
    create-checkpoint-fsfs4                0/2     Completed   0          95s
    example-spring-boot-79b98966db-ml2pj   1/1     Running     0          15s

    In the logs you can check that the restore was performed:

     
    INFO 129 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Restarting Spring-managed lifecycle beans after JVM restore
    INFO 129 --- [Attach Listener] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port 8080 (http) with context path ''
    INFO 129 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Spring-managed lifecycle restart completed (restored JVM running for 45 ms)
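
The log check can also be scripted. The following sketch reads log text on standard input and looks for the "restored JVM running" marker that appears in the sample log lines; the function name was_restored is hypothetical.

```shell
#!/bin/sh
# Sketch: detect a restore from log text read on stdin.
# The marker string is taken from the sample Spring Boot log lines.
was_restored() {
    grep -q "restored JVM running"
}
```

A possible use is kubectl logs deployment/example-spring-boot | was_restored && echo "restored from checkpoint".
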
  9. Verify that the application responds to requests

    You should get the "Greetings from Spring Boot!" reply:

     
    kubectl expose deployment example-spring-boot --type=NodePort --port=8080
    URL=$(minikube service example-spring-boot -n example --url)
    curl $URL
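
To turn the final check into a pass/fail result, the response body can be compared with the expected greeting. This is a minimal sketch; the function name check_greeting is hypothetical.

```shell
#!/bin/sh
# Sketch: compare a response body with the expected greeting.
EXPECTED="Greetings from Spring Boot!"

check_greeting() {
    if [ "$1" = "$EXPECTED" ]; then
        echo "OK"
    else
        echo "unexpected response: $1" >&2
        return 1
    fi
}
```

For example, check_greeting "$(curl -s "$URL")" prints OK when the restored application answers as expected and returns a non-zero status otherwise.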