Alternative Images
In modern cloud environments, a single checkpoint image may not be enough. With Java portability, your application can run on different CPU architectures and generations. Sometimes the application might not be optimally warmed up before the checkpoint. The requirements might change over time. In all these situations it is useful to generate more checkpoint images and select the best one during restore.
Checkpoint Path
Traditionally, you use -XX:CRaCCheckpointTo=/path/to/image to set the destination for the checkpoint image. If the image already exists in that location, it is overwritten. Rather than setting such a fixed path, you can use placeholders that are automatically replaced when the image is created. For example, /path/to/my_app_v1.2.3/image_%a.%p becomes /path/to/my_app_v1.2.3/image_aarch64.4321.
The following placeholders are available:
- %%: Single % character.
- %a: Architecture, the same value as the system property os.arch.
- %f: CPU features hex string. Empty string if the architecture does not use optional CPU features.
- %u: UUID (version 4 = random).
- %g: Checkpoint generation (starting with 1; 0 is reserved for ‘unknown’).
- %t: Checkpoint date and time in the ISO-8601 format in UTC. This is the basic format (without separators) with second precision, e.g. 20250909T141711Z.
- %T: Checkpoint epoch time (second precision).
- %b and %B: Process boot time (generation 1), in the same format as %t or %T.
- %r and %R: Last restore time, in the same format as %t or %T. In case of generation 1 this is the same as the process boot time.
- %p: PID of the checkpointed process.
- %c: Number of CPU cores.
- %m: Maximum heap size (-Xmx) in a user-friendly format, using G or M suffix.
The numeric placeholders (%T, %B, %R, %p, %c, %m, and %g) support an optional minimum-width prefix. The value is padded with spaces, or with zeroes if the prefix starts with zero. For example, for generation 1, %3g becomes "  1" (two leading spaces) and %03g becomes 001.
CRaC doesn’t parse information from the path during restore. These placeholders serve to create a unique location for each image, and allow you to organize those images (e.g. remove those that are expired). The UUID (%u) is particularly suitable to ensure uniqueness. The application version and environment (JARs, system libraries, container image tag, etc.) must still be encoded in the path as usual.
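The substitution rules above can be illustrated with a small sketch. This is not CRaC's implementation; it only mimics the documented behavior for a subset of placeholders (%%, %a, %p, %g, %c, %u, %t, %T). The function name expand and its parameters are hypothetical, and leaving unsupported placeholders untouched is an assumption of this sketch, not documented behavior.

```python
import re
import uuid
from datetime import datetime, timezone

# Placeholders documented as numeric; only these honor a width prefix here.
# %m, %B and %R are numeric too but are not covered by this sketch.
NUMERIC = set("Tpcg")


def expand(template, *, arch, pid, generation, cores, checkpoint_time, image_uuid=None):
    """Expand a subset of the documented placeholders (illustrative only)."""
    utc_time = checkpoint_time.astimezone(timezone.utc)
    values = {
        "%": "%",
        "a": arch,
        "p": str(pid),
        "g": str(generation),
        "c": str(cores),
        "u": str(image_uuid or uuid.uuid4()),
        # ISO-8601 basic format, second precision, UTC
        "t": utc_time.strftime("%Y%m%dT%H%M%SZ"),
        "T": str(int(utc_time.timestamp())),
    }

    def replace(match):
        width, key = match.group(1), match.group(2)
        if key not in values or (width and key not in NUMERIC):
            # Assumption: anything this sketch does not understand stays as-is.
            return match.group(0)
        text = values[key]
        if width:
            fill = "0" if width.startswith("0") else " "
            text = text.rjust(int(width), fill)
        return text

    return re.sub(r"%(\d*)(.)", replace, template)


when = datetime(2025, 9, 9, 14, 17, 11, tzinfo=timezone.utc)
common = dict(arch="aarch64", pid=4321, generation=1, cores=4, checkpoint_time=when)
print(expand("/path/to/my_app_v1.2.3/image_%a.%p", **common))
print(expand("gen_%03g_%t", **common))
```

A sketch like this can be handy for predicting image locations in deployment scripts, for example when cleaning up expired images by timestamp.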
Image Selection
Note: This is available only in the Subscriber Availability (SA) builds of Zulu and Zing, when using the Warp engine with -XX:CRaCEngine=warp.
The CRaC image is always a directory. With the Image Selector, the paths used for -XX:CRaCCheckpointTo and -XX:CRaCRestoreFrom are asymmetric, because the restore option is handled differently:
- Regular Restore: the value points to a single image.
- With Image Selector (image_selector=true): the value points to a directory with several images.
This is an example of how to use the CRaCCheckpointTo and CRaCRestoreFrom values with Image Selector:
# Command line option to create the checkpoint
-XX:CRaCCheckpointTo=/path/to/my_app_v1.2.3/image_%a.%p
# Command line option to restore from the checkpoint
# This maps to the parent level of the CRaCCheckpointTo value
-XX:CRaCRestoreFrom=/path/to/my_app_v1.2.3 -XX:CRaCEngineOptions=image_selector=true
Image Selection Criteria
You need to ask two questions about each image when deciding which image to use:
- Can I use this image now?
- Is this the best image?
Provided that the application version matches, the first question usually boils down to the correct CPU architecture and features and hence can be decided automatically. The second one requires some insight into what "best" means. By default, CRaC selects an arbitrary checkpoint image (specifically, the first one that matches the criteria).
Decision Metadata
To provide data for the decision, CRaC stores a list of metadata in the image: pairs of metric name and floating-point value. On restore, you must provide a configuration file that maps these metrics into a score for each image. The image with the highest final score gets selected. The Image Selector can be configured with -XX:CRaCEngineOptions=image_selector.policy=policy.yaml. This file uses the following format:
# Comments allowed as usual
version: 1
score:
  my.metric:
    weight: 3
    required: true
  another.metric:
    weight: 0.001
    default: 1000
When an image is evaluated with my.metric=5 and another.metric=2000, the final score is the scalar product of the metrics and the configured weights:
score = 5 * 3 + 2000 * 0.001 = 17
(in the first term, 5 is the value of my.metric and 3 is its weight)
When a metric is not present in the image, its value defaults to 0 or to the value set through the default property. If you want to entirely exclude an image with a missing metric from the selection, you can use the required property as in the example above. It is possible to define additional constraints for the value:
version: 1
score:
  my.metric:
    constraints:
      - gte: 5
      - lt: 10
This configuration would only admit images where my.metric >= 5 && my.metric < 10 (the default value of 0 would reject images missing the metric). The operators used in the constraints are:
- gt: >
- gte/ge: >=
- lt: <
- lte/le: <=
- eq: ==
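The scoring rules described in this section (weights, default, required, and constraints) can be sketched as follows. This is an illustration of the documented semantics, not the Image Selector's actual code. The policy is written as a Python dictionary that mirrors the YAML layout, the function names score_image and select_best are hypothetical, and two behaviors are assumptions of this sketch: a metric without a weight contributes 0 to the score, and ties are resolved in favor of the first image encountered.

```python
import operator

# Constraint operators as listed above.
OPS = {
    "gt": operator.gt,
    "gte": operator.ge, "ge": operator.ge,
    "lt": operator.lt,
    "lte": operator.le, "le": operator.le,
    "eq": operator.eq,
}


def score_image(metrics, policy):
    """Return the weighted score of one image, or None if it is rejected."""
    total = 0.0
    for name, rule in policy["score"].items():
        if name not in metrics and rule.get("required", False):
            return None  # a required metric is missing
        value = metrics.get(name, rule.get("default", 0.0))
        for constraint in rule.get("constraints", []):
            for op, bound in constraint.items():
                if not OPS[op](value, bound):
                    return None  # a constraint is violated
        # Assumption: a missing weight means the metric only acts as a filter.
        total += value * rule.get("weight", 0.0)
    return total


def select_best(images, policy):
    """Pick the admissible image with the highest score (first one wins ties)."""
    best_name, best_score = None, None
    for name, metrics in images.items():
        score = score_image(metrics, policy)
        if score is not None and (best_score is None or score > best_score):
            best_name, best_score = name, score
    return best_name, best_score


policy = {
    "version": 1,
    "score": {
        "my.metric": {"weight": 3, "required": True},
        "another.metric": {"weight": 0.001, "default": 1000},
    },
}
images = {
    "image_a": {"my.metric": 5, "another.metric": 2000},
    "image_b": {"another.metric": 9000},  # rejected: my.metric is required
    "image_c": {"my.metric": 4},          # another.metric falls back to 1000
}
print(select_best(images, policy))
```

In this example image_a scores 5 * 3 + 2000 * 0.001, image_c scores 4 * 3 + 1000 * 0.001, and image_b is excluded, so image_a is selected.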
Inspecting and Recording Metrics
CRaC records a list of internal metrics by default. These include the number of loaded classes, compilations, JVM uptime, number of CRaC resources registered, etc. You can inspect the metrics available in an image with this command:
$JAVA_HOME/bin/warp info /path/to/image | jq .metrics
{
  "java.cls.loadedClasses": 115.000000,
  "java.cls.sharedLoadedClasses": 1090.000000,
  ...
}
User-defined metrics can likely provide a better estimate of future performance, though. For example, you can observe how many requests the application can handle or what the average latency is. On checkpoint, it is possible to record extra metrics using the jcmd command when requesting the checkpoint creation:
jcmd ... JDK.checkpoint metrics=my.metric=5,another.metric=2000
jcmd ... JDK.checkpoint metrics=@/path/to/metrics.txt
With the second option, we expect /path/to/metrics.txt to contain one metric per line, with the name and value separated by =, for example:
my.metric=5
another.metric=2000
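If the metrics are gathered by a script, both forms can be generated from one dictionary. The following is a minimal sketch under stated assumptions: the helper names metrics_file_text and metrics_inline_argument are hypothetical, values are written as floating-point literals (for example 5.0), and rejecting names that contain =, a comma, or whitespace is a precaution of this sketch rather than a documented rule.

```python
def _check_name(name):
    # Precaution (assumption): avoid characters that act as separators.
    if not name or any(ch in name for ch in "=,") or any(ch.isspace() for ch in name):
        raise ValueError(f"unsupported metric name: {name!r}")


def metrics_file_text(metrics):
    """Render metrics as file content: one name=value pair per line."""
    lines = []
    for name, value in metrics.items():
        _check_name(name)
        lines.append(f"{name}={float(value)!r}")
    return "\n".join(lines) + "\n"


def metrics_inline_argument(metrics):
    """Render metrics as the inline metrics=... argument for JDK.checkpoint."""
    pairs = []
    for name, value in metrics.items():
        _check_name(name)
        pairs.append(f"{name}={float(value)!r}")
    return "metrics=" + ",".join(pairs)


collected = {"my.metric": 5, "another.metric": 2000}
print(metrics_file_text(collected), end="")
print(metrics_inline_argument(collected))
```

The file content can be written to a path such as /path/to/metrics.txt and passed with metrics=@/path/to/metrics.txt, while the inline string can be appended directly to the jcmd invocation.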
In the future, we expect to introduce more ways to record metrics, either from the application itself or by updating the image after it has been created.
Refine CPU Features
With a single image, we attempt to restore from an image rather than starting from scratch, even if the image was created on a machine with fewer CPU features and therefore the image does not perform optimally. With image autoselection, in an environment with a few available CPU types, it might be more efficient to require an image that has been created on the very same CPU type, though. To enforce an exact CPU match (in terms of features, not CPU cores) you can set:
java -XX:CRaCRestoreFrom=/path/to/image -XX:CheckCPUFeatures=exact
As a result, a restore refuses an image created with a different set of CPU features. This setting is recognized by the Image Selector, too. See Transparent Restore or Startup for more options related to optimized deployment.