
Analyzing and Tuning Warm-up
Warm-up is the time taken for the Java application to reach the optimum compiled code performance. It is the task of the Just-in-Time (JIT) compiler to deliver optimal performance by producing optimized compiled code from application bytecode. This article will give you a basic understanding of how JIT compilation works and how to optimize warm-up using Azul Zulu Prime Builds of OpenJDK (Azul Zulu Prime JDK).
An Introduction to JIT Compilation
When people think of Java compilers, they usually think about javac, which turns your Java source code into Java bytecode. But equally important is JIT compilation, which turns your Java bytecode into optimized machine code for the specific hardware on which your Java program is running.
When you first start your Java program, the JVM takes the platform-independent bytecode and runs it in the interpreter, which uses more CPU resources and executes more slowly than compiled code. After a certain number of invocations (default 1K), the method is promoted to a profiling tier, known as the Tier 1 compiler or C1. Here the JVM monitors the method to build a profile of how many times it is called, which code paths are taken, and how they are executed. After the compile threshold is reached (default 10K), the JVM promotes the method to the Tier 2 compiler by putting it in the Tier 2 compile queue. The Tier 2 compiler uses the Tier 1 profile to compile methods into highly optimized machine code.

Because JIT compilation competes for the same resources that your program runs on, JIT compilers are usually very conservative in their operations. During the warm-up phase, while the JVM is identifying and compiling all hot methods, the performance of your application is lower and less stable. Eventually, compilation activity settles down and your code achieves its optimum stable performance.
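To make the warm-up curve concrete, the following minimal sketch times repeated batches of the same method call. The class WarmupCurveDemo, its method names, and the workload are hypothetical and chosen only for illustration. The absolute numbers depend entirely on your hardware, JVM, and flags, so treat the output as a rough picture rather than a measurement.

```java
import java.util.ArrayList;
import java.util.List;

/** Times repeated batches of identical work so the warm-up curve becomes visible. */
final class WarmupCurveDemo {

    /** Written after each batch so the JIT cannot discard the computed results. */
    static volatile long sink;

    /** Simple CPU-bound work: sum of (i * i) % 7 for i in [0, n). */
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += ((long) i * i) % 7;
        }
        return sum;
    }

    /** Runs the given number of batches and returns each batch's duration in nanoseconds. */
    static List<Long> timeBatches(int batches, int callsPerBatch, int n) {
        List<Long> durations = new ArrayList<>();
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            long acc = 0;
            for (int c = 0; c < callsPerBatch; c++) {
                acc += work(n);
            }
            long elapsed = System.nanoTime() - start;
            sink = acc;
            durations.add(elapsed);
        }
        return durations;
    }

    public static void main(String[] args) {
        // Each batch makes 2,000 calls, so the default thresholds (1K and 10K
        // invocations) are crossed within the first few batches.
        List<Long> durations = timeBatches(20, 2_000, 1_000);
        for (int i = 0; i < durations.size(); i++) {
            System.out.printf("batch %2d: %.2f ms%n", i, durations.get(i) / 1e6);
        }
    }
}
```

On most JVMs the first batches tend to be slower than later ones, because the method is still interpreted or running Tier 1 code. For real measurements, a benchmarking harness such as JMH is a better fit than hand-rolled timing.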
Note
|
JIT optimization is often only one part of what is commonly seen as "warm-up". Other parts of warm-up include initializing resources needed by the application, rehydrating data from caches, etc. If you are experiencing long warm-up times, make sure to analyze everything that is happening during warm-up. |
JIT compilers make speculations on the best way to optimize methods based on the usage seen in the life of the program so far. Sometimes those speculations turn out to be incorrect. When this happens, the JVM performs a de-optimization, or deopt, in which the compiled method is discarded and the method is run in the interpreter or in Tier 1 until the JIT compiler can provide a newly compiled method that matches the new usage patterns.
Azul Zulu Prime JDK and the Falcon JIT Compiler
Azul Zulu Prime JDK replaces OpenJDK’s HotSpot JIT compiler with Azul’s Falcon JIT compiler. Unlike HotSpot, Falcon has different levels of optimizations that you can use to balance eventual code speed against the time and compute resources you can commit to JIT warm-up.
Wherever you have enough CPU capacity and time to warm up using full Falcon optimizations running locally on your JVM, you should do so. Full Falcon optimizations deliver the best performance and infrastructure savings.
There are many reasons to be sensitive to long warm-up, even if it delivers higher eventual speed:
-
Long warm-up times make CI/CD rolling upgrades of a fleet of VMs take too long.
-
Your SLAs mean you can’t start accepting traffic on a newly started node until it can serve requests at a certain speed.
-
You have policies in place that throttle traffic or spin up new instances when CPU utilization goes over a certain percentage.
-
You have to reserve capacity on your machines for the spike in CPU activity during warm-up, even though you do not need those resources for the regular running of your application.
There are several ways you can affect the warm-up of Azul Zulu Prime JDK:
Tuning JIT Compilations
Tune the Delivery of Full Falcon Optimizations
The Falcon compiler can be tuned in several ways:
-
Give Falcon more threads. The normal heuristic is for Falcon to be allocated 2/3 of the total threads, but on a small machine, it can get rounded down to 1. You can give Falcon a specific number of threads using the following flag:
-XX:CIMaxCompilerThreads=3
(3 threads in this example).
Note
|
-XX:CIMaxCompilerThreads=3 will increase both Tier 1 (C1) and Tier 2 (C2) compiler threads. In case you want to distinguish between Tier 1 and Tier 2, the flags -XX:C1MaxCompilerThreads=3 and -XX:C2MaxCompilerThreads=3 can be used.
|
-
Lower the Falcon compile threshold. The default threshold is 10K, which means a method must be invoked 10K times before it is put in the compilation queue. Lowering this number improves the warm-up curve but increases compilation activity: with a threshold of 5K, for example, methods that are invoked 5K times but would never reach 10K invocations are now compiled as well. Set the compile threshold using -XX:Tier2CompileThreshold=5000.
Note
|
With Azul Zulu Prime JDK, -XX:Tier2CompileThreshold=5000 behaves the same way as -XX:FalconCompileThreshold=5000, because Falcon is the Tier 2 compiler, or C2, in Azul Prime. This means -XX:Tier2CompileThreshold works for both OpenJDK and Azul Prime.
|
-
Lower the C1 compile threshold using -XX:C1CompileThreshold=100. C1, or client compiler, generally uses less memory and compiles methods more quickly than C2, but at a cost: C1 compiled code is less optimized. Since C2 compiled code is better optimized, it is often worthwhile to use C2 compiled code rather than C1 compiled code, but only where total startup time is not a concern. If startup time is a concern and the goal is a shorter warm-up, it is better to get methods compiled by C1 sooner. This is easily achieved by lowering the C1 compile threshold. In Azul Prime, C1CompileThreshold is set to 1000 by default.
-
Give extra resources to the compiler for a set amount of time. Normally, the compiler must share resources with the executing application code. -XX:CompilerWarmupPeriodSeconds sets a timeframe during warm-up in which the compiler runs exclusively, and -XX:CompilerWarmupExtraThreads allocates an extra number of threads to the compiler during warm-up. Used together, these flags tell the JVM to give all available resources to the compiler for a set amount of time, after which the resources become available to the application. This can greatly speed up warm-up but also restricts the use of the application during warm-up.
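When experimenting with the flags above, it helps to confirm which options actually reached the JVM. The following is a small sketch that uses the standard java.lang.management API to list the -XX: options the running JVM was started with. The class name ShowJitFlags and the helper xxOptions are hypothetical. Note that this lists only options that were passed in, not the effective default values.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

/** Prints the -XX: options the current JVM was started with. */
final class ShowJitFlags {

    /** Keeps only the arguments that start with "-XX:". */
    static List<String> xxOptions(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith("-XX:"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Input arguments are the JVM options, not the application's own arguments.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        xxOptions(jvmArgs).forEach(System.out::println);
    }
}
```

For example, running java -XX:CIMaxCompilerThreads=3 -XX:Tier2CompileThreshold=5000 ShowJitFlags.java should print both options, which is a quick check that a wrapper script or container entry point did not drop them.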
Using Lower Optimization Levels
If you have tweaked the above settings and your warm-up time is still too long, you can lower the optimization level from Full Falcon down to the KestrelC2 compiler (light-weight Falcon). In most cases, each lower optimization level reduces the total compile time at the cost of slower compiled code than the next higher level.
Available levels of optimization are described below:
-
Falcon Optimization Level 2 - Full Falcon: the full set of super-optimizations that deliver on average 20-30% faster code than OpenJDK’s HotSpot compiler. This is the recommended approach and the default configuration.
-
Falcon Optimization Level 1 - code runs about 5% slower than at optimization level 2, and compile time is reduced by about 50%.
Enabled using the following option:
-XX:FalconOptimizationLevel=1
-
Falcon Optimization Level 0 - code runs about 30% slower than at optimization level 2, and compile time is reduced by about 70%. Optimization level 0 is roughly comparable to Zulu C2 performance.
Enabled using the following option:
-XX:FalconOptimizationLevel=0
-
KestrelC2: a limited set of optimizations designed to approximate the warm-up costs and eventual code speed of OpenJDK’s HotSpot compiler. Code compiled with KestrelC2 will have a lower eventual speed than code compiled with full Falcon optimizations, but will reach an optimal state faster and with fewer resources.
Enabled using the following option:
-XX:+UseKestrelC2
Different compiler options can be used to lower the optimization level and affect the amount of time and CPU Falcon uses to optimize your code. For more info on this topic, check Command Line Options > Falcon Compiler Options.
A solution for the slower optimized code in these circumstances is provided by Azul’s Cloud Native Compiler.
Setting Falcon to Prioritize Method Compilation Based on Hotness
As your compile queue builds up with methods reaching the compile threshold, it can contain many methods that were once hot but are now no longer being called, and therefore are not as critical to compile now. An example is an application platform that first performs many initialization operations and then has a different set of methods that are called once the application is initialized.
You may have a lot of methods in your compile queue for things that the application was doing when initializing but are no longer being called. Compiling these methods once initialization is already done is therefore wasted work and taking resources away from the running application.
One step to tell Falcon to focus only on compiling hot methods is to use the -XX:TopTierCompileThresholdTriggerMillis flag. When a method hits the compile threshold, it is promoted to the compile queue. This flag tells the compiler to consider the minimum amount of time between promotions, also known as min-time-between-promotions, after a method has been promoted but before triggering compilation. The min-time-between-promotions value is first set based on the interval between the method’s first call and its first time hitting the compile threshold. Afterwards, it is updated every time the method hits the compile threshold again, provided the new interval is shorter than the previous value.
Based on a method’s min-time-between-promotions value, Falcon also assigns a rank to each method: hot, warm, or cold. Hot methods are given priority over warm methods when resources are allocated for compilation, while cold methods are not given any resources for compilation. In other words, cold methods will never be sent for compilation even if they made it to the compile queue. Bounds for these ranks are set using the options TopTierHotCompileThresholdTriggerMillis and TopTierWarmCompileThresholdTriggerMillis.
In terms of method ranking, the previously mentioned option TopTierCompileThresholdTriggerMillis sets TopTierHotCompileThresholdTriggerMillis and TopTierWarmCompileThresholdTriggerMillis to the same value, creating a simple boundary between hot and cold ranks and eliminating the warm rank. Therefore, you cannot use TopTierCompileThresholdTriggerMillis together with the options TopTierHotCompileThresholdTriggerMillis and TopTierWarmCompileThresholdTriggerMillis.
In terms of CPU thread allocation, Falcon’s CPU budget is always 100% for hot methods and 0% for cold methods. The CPU budget for warm methods is variable and set using the option TopTierWarmCompileCpuPercent, which is set to 25 by default.
The bounds for min-time-between-promotions are set using the following flags (when TopTierCompileThresholdTriggerMillis is not set):
Option | Description | Default Value (in ms) |
---|---|---|
-XX:TopTierHotCompileThresholdTriggerMillis | Sets the upper bound for the hot rank and lower bound for the warm rank. The hot rank’s upper bound is 1ms less than the value set. | 60000 |
-XX:TopTierWarmCompileThresholdTriggerMillis | Sets the upper bound for the warm rank and lower bound for the cold rank. The warm rank’s upper bound is 1ms less than the value set. | 600000 |
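In other words, the range of min-time-between-promotions for each rank is set in the following way (when TopTierCompileThresholdTriggerMillis is not set):
Rank | Range |
---|---|
hot | min-time-between-promotions from 0 to (TopTierHotCompileThresholdTriggerMillis - 1) |
warm | min-time-between-promotions from TopTierHotCompileThresholdTriggerMillis to (TopTierWarmCompileThresholdTriggerMillis - 1) |
cold | min-time-between-promotions from TopTierWarmCompileThresholdTriggerMillis to infinity |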
For example, setting the options -XX:TopTierHotCompileThresholdTriggerMillis=100 -XX:TopTierWarmCompileThresholdTriggerMillis=1000 -XX:TopTierWarmCompileCpuPercent=50 results in the following ranks:
-
hot: 0-99ms min-time-between-promotions, cpu budget 100%
-
warm: 100-999ms min-time-between-promotions, cpu budget 50%
-
cold: 1000ms-inf min-time-between-promotions, cpu budget 0%
Note
|
The hot and warm bound values can also be set to 0 or -1. If both are 0, no compilation can get promoted to Tier 2. A value of -1 sets the boundary to infinity. |
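The ranking rules described above can be sketched as a small model. This is not Falcon’s implementation: the class PromotionRankModel and its methods are hypothetical, and the sketch only restates the documented bounds for ordinary positive values. The special values 0 and -1 are deliberately not modeled.

```java
/** Toy model of the hot/warm/cold ranking; not Falcon's actual code. */
final class PromotionRankModel {

    enum Rank { HOT, WARM, COLD }

    /**
     * Ranks a method by its min-time-between-promotions value.
     * hotBoundMs and warmBoundMs stand in for the values of
     * TopTierHotCompileThresholdTriggerMillis and
     * TopTierWarmCompileThresholdTriggerMillis.
     */
    static Rank rank(long minTimeBetweenPromotionsMs, long hotBoundMs, long warmBoundMs) {
        if (minTimeBetweenPromotionsMs < hotBoundMs) {
            return Rank.HOT;
        }
        if (minTimeBetweenPromotionsMs < warmBoundMs) {
            return Rank.WARM;
        }
        return Rank.COLD;
    }

    /** CPU budget per rank; warmCpuPercent stands in for TopTierWarmCompileCpuPercent. */
    static int cpuBudgetPercent(Rank rank, int warmCpuPercent) {
        switch (rank) {
            case HOT:
                return 100;
            case WARM:
                return warmCpuPercent;
            default:
                return 0;
        }
    }

    public static void main(String[] args) {
        // Same settings as the example above: hot bound 100, warm bound 1000, warm CPU 50.
        long[] samplesMs = {99, 100, 999, 1000};
        for (long ms : samplesMs) {
            Rank r = rank(ms, 100, 1000);
            System.out.println(ms + " ms -> " + r + ", cpu budget " + cpuBudgetPercent(r, 50) + "%");
        }
        // prints:
        // 99 ms -> HOT, cpu budget 100%
        // 100 ms -> WARM, cpu budget 50%
        // 999 ms -> WARM, cpu budget 50%
        // 1000 ms -> COLD, cpu budget 0%
    }
}
```

The boundary cases (99 vs. 100 and 999 vs. 1000) show why each rank’s upper bound is 1ms less than the configured value.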
Analyzing Java Warm-up
So how do you know if your application is warmed up? The best way is by measuring the performance of your program by whatever metric you would normally measure it with. This could be operations/second or service time. Run a very long test and see how long it takes to reach 99% of peak performance and remain steadily at that level for a long period of time.
Note that JIT compilers often keep performing optimizations long after you’ve reached your optimal peak performance. Therefore, you see higher CPU activity even after your code is running at optimal peak performance.
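One way to turn the 99%-of-peak criterion into a number is sketched below. It assumes you have throughput samples taken at a fixed interval and, as a simplification, treats the highest sample as peak performance. The class WarmupTime, the method steadyStateIndex, and the sample values are hypothetical. With noisy data you may prefer a smoothed peak, for example the median of the last samples.

```java
/** Finds the point after which throughput stays at or above a fraction of its peak. */
final class WarmupTime {

    /**
     * Returns the index of the first sample from which every later sample is
     * at least fraction * peak, or -1 if the final sample is below that level.
     */
    static int steadyStateIndex(double[] throughput, double fraction) {
        double peak = 0;
        for (double t : throughput) {
            peak = Math.max(peak, t);
        }
        double threshold = fraction * peak;
        int index = -1;
        // Walk backwards while samples stay above the threshold.
        for (int i = throughput.length - 1; i >= 0 && throughput[i] >= threshold; i--) {
            index = i;
        }
        return index;
    }

    public static void main(String[] args) {
        // Made-up operations/second samples, one every 10 seconds.
        double[] opsPerSecond = {100, 400, 900, 995, 700, 992, 1000, 996, 998};
        int index = steadyStateIndex(opsPerSecond, 0.99);
        System.out.println("steady from sample " + index + " (" + index * 10 + " s)");
        // prints: steady from sample 5 (50 s)
    }
}
```

Sample 3 briefly exceeds 99% of peak but is followed by a dip, so it does not count. Only the trailing run of samples that stays above the threshold marks the end of warm-up.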
When analyzing compiler behavior, the Garbage Collector (GC) log file is the first information source to look into. On Azul Platform Prime, this log file not only tracks GC information but also contains much information about compilation activity and general system resource utilization.
Add this flag to the Java command line to enable GC logging: -Xlog:gc,safepoint:gc.log::filecount=0. The filecount=0 setting disables log file rotation, resulting in one single log file, which simplifies the tuning workflow. For further details about GC logging see Unified Garbage Collection Logging Recommendations and Advanced tuning hints.
After your test, open the log file in the Azul GC Log Analyzer (download). With this tool you can check the following information:
-
Number of threads/cores available for JIT compilation
-
Total number of threads/cores and RAM for the process
-
Compiler Queues: shows the number of methods waiting for compilation over time. Large numbers of methods in the compiler queues mean that Falcon does not have enough resources to handle all incoming requests.
-
Compiler Threads: shows how many threads were used over time for the JIT compilation.
-
Tier 2 Compile Counts and Tier 2 Wait Time Distribution: shows the total number of compilations requested over the life of the process and how long it took to fulfill the requests.
Example Case
Let’s compare the results of running the same application with or without certain parameters to see the impact on the warm-up. We let the application run for the same duration, which is long enough to reach a stable state, with the same load to ensure the maximum benefit from the Falcon compiler is reached, and similar use cases are compared.
This test application is run on a small machine to see the impact of threads on warm-up. First, no parameters were used. In a second run of the same test, startup parameters were added: -XX:CIMaxCompilerThreads=3 to use more threads, and -XX:Tier2CompileThreshold=5000 for a lower compile threshold instead of the default 10K.
System Information
Let’s look at an example GC log. Open the log by running java -jar GCLogAnalyzer2.jar gc.log. Click the button to see the overall information about the process:

You can see that the process is running on 6 threads overall. Scrolling down to the bottom, you see the following for the test without additional parameters:

So there is only one thread for JIT compilation, which is generally not recommended for on-JVM JIT compilation on Azul Zulu Prime JDK.
Note
|
If you need to run on resource-constrained machines, consider off-loading JIT compilation to Cloud Native Compiler. |
Compiler Queues
Click "Compiler Statistics" > "Compiler Queues" to see the backlog of methods in the Tier 2 Falcon compile queue. The first run (left image) shows a large backlog. In the second run, the backlog is much smaller and is handled much more quickly.
Compiler Threads
Clicking "Compiler Statistics" > "Compiler Threads" shows that in the first run there is just one Tier 2 thread, which is getting maxed out. In the second run, with the additional flags for extra threads and a lower compile threshold, the three compiler threads are used more reasonably over time rather than one thread being constantly maxed out.
Compile Counts
Clicking "Compiler Statistics" > "Tier 2 Compile Counts" shows that in the first run a large number of methods are evicted from the queue before they can be compiled. In the second run, Compile Counts shows a much smaller number of methods getting evicted from the queue.
Note
|
A large number of evicted methods is not always a bad thing. It just shows that the application has phases and that some methods are not used for some period of time. For example, Falcon didn’t compile the incoming requests in time, the application switched to another phase, and certain methods are no longer needed. If the application executes those methods again later, they are enqueued again. |
Eviction From the Compiler Queue
The JVM enqueues a massive number of methods for compilation as your program starts. Most programs have different phases of execution. For example, your program could have an initialization phase followed by a steady run phase. The methods that are the hottest in the initialization phase may not be the same methods that are needed when you move to your steady run phase.
Azul Zulu Prime JDK optimizes for this situation by continuing to count invocations after the compilation threshold has been reached. Every time there are another 10K invocations, the JVM increments a counter on the method. If the counter hasn’t been incremented in 20s, meaning the method hasn’t been called 10K times in the last 20 seconds, the method is evicted from the compile queue. You can disable the eviction policy using -XX:TopTierCompileQueueEvictAfterMs=-1.
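The eviction rule can be illustrated with a toy model of a single queued method. This is a sketch of the behavior as described in this section, not the JVM’s internal bookkeeping: the class QueueEvictionModel and its methods are hypothetical, and the exact boundary handling (evicting at exactly 20 seconds) is an assumption.

```java
/** Toy model of compile-queue eviction for one queued method; not the JVM's actual code. */
final class QueueEvictionModel {

    /** The counter on the method ticks once per this many invocations (10K in the text). */
    static final long INVOCATIONS_PER_TICK = 10_000;

    private final long evictAfterMs;   // -1 disables eviction, like the flag value above
    private long pendingInvocations;   // invocations since the last counter tick
    private long lastTickMs;           // time of the last counter tick (or of enqueueing)

    QueueEvictionModel(long enqueuedAtMs, long evictAfterMs) {
        this.lastTickMs = enqueuedAtMs;
        this.evictAfterMs = evictAfterMs;
    }

    /** Records invocations observed at nowMs; the counter ticks when another 10K is reached. */
    void recordInvocations(long count, long nowMs) {
        pendingInvocations += count;
        if (pendingInvocations >= INVOCATIONS_PER_TICK) {
            pendingInvocations %= INVOCATIONS_PER_TICK;
            lastTickMs = nowMs;
        }
    }

    /** True if the counter has not ticked within the eviction window. */
    boolean shouldEvict(long nowMs) {
        return evictAfterMs >= 0 && nowMs - lastTickMs >= evictAfterMs;
    }

    public static void main(String[] args) {
        QueueEvictionModel method = new QueueEvictionModel(0, 20_000);
        method.recordInvocations(10_000, 5_000);        // counter ticks at t = 5 s
        System.out.println(method.shouldEvict(24_999)); // prints: false
        System.out.println(method.shouldEvict(25_000)); // prints: true
    }
}
```

A method that keeps being called at least 10K times within each window never becomes eligible for eviction in this model, which matches the intent of keeping only currently hot methods in the queue.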
Use Cloud Native Compiler
We often see cases where customers want to take advantage of full Falcon super-optimizations but are running on small machines whose resource constraints make it difficult. That’s why Azul has developed Cloud Native Compiler. Cloud Native Compiler provides a server-side optimization solution that offloads JIT compilation to dedicated hardware, providing more processing power to JIT compilation while freeing your client JVMs from the load of doing JIT compilation.
For more information, see the Cloud Native Compiler documentation.
Use ReadyNow Warm-Up Optimizer
ReadyNow is a feature of Azul Zulu Prime JDK that can dramatically reduce your warm-up time. ReadyNow persists the profiling information gathered during the run of the application so that subsequent runs do not have to learn again from scratch. On the next run, ReadyNow pre-compiles all the methods in the profile before launching the Main method.
For more information, see the ReadyNow documentation.
Advanced Tuning Hints
When problems have been identified from the log-file analysis, you can dive even deeper into this process by running your application with additional flags that will give you more information.
To get a full picture of JIT compilation, use the -XX:+PrintCompilation and -XX:+TraceDeoptimization flags to print info to the VM output. You can also redirect this output into a separate log file by using -XX:+LogVMOutput -XX:-DisplayVMOutput -XX:LogFile=vm.log -XX:+PrintCompilation -XX:+TraceDeoptimization.