Safepoint Profiler

Table of Contents

Safepoint Profiler Overview
Configuring Safepoint Profiler
Understanding Safepoint Profiler Output

Need help?

Schedule a consultation with an Azul performance expert.

The Safepoint Profiler is a lightweight profiling tool that is built into the Azul Zing Builds of OpenJDK (Zing). It reports information about threads that are slow to reach safepoint. This is referred to as Time To Safepoint (TTSP).

All JVMs use safepoints to bring all the application threads to a known state so the JVM can perform certain operations. Safepoints are used during Garbage Collection, during JIT compilation, for Thread Dumps, and many other operations. When a safepoint call is issued all the application threads should "come to safepoint" as fast as possible. Threads that have come to safepoint block until the JVM releases them. Once all the threads are at safepoint, the JVM performs the operation — GC, compile, thread dump, etc. — and then releases all the threads to run again. But when one or more application threads take a long time to come to safepoint, all the other threads, which are now blocked, have to wait for the tardy thread(s).

Zing’s Safepoint Profiler will identify those tardy threads and help you to figure out why they are slow to reach safepoint.

Safepoint Profiler Overview

Safepoint Profiler is an in-process profiling tool that collects stacks of the threads that were late to safepoint at frequent intervals asynchronously and also collects other system related information like the number of cores assigned to the thread. These logs can be collected and analyzed later to find the reason for the high TTSP issue.

Safepoint Profiler is designed to have the minimal possible impact on Zing performance. The tool uses Linux signals to collect asynchronous stack traces of “slow” threads. The waiting VMThread sends RealTime signals (the “TTSP signal”) to slow Java threads after exceeding the profiling threshold (tunable by flags). System level attributes are also recorded for the slow threads (wchan, thread state, etc). These attributes are stored in private buffers assigned to slow Java threads. When a slow JavaThread receives the TTSP signal, it records its stacks into its buffer. At the end of safepoint, the VMThread walks all the Java threads in the process and prints out the buffers of the corresponding thread if available.

To make TTSP overhead as lightweight as possible, there is no need to wait until slow Java threads respond to the signal that was sent by the VMThread. Since the Linux kernel sometimes delays delivering signals to a thread, it is entirely possible for a slow JavaThread to receive the signal after the whole safepoint is complete. An additional stack recording mechanism called "synchronous stack" serves as an extra stack that indicates the path from where the JavaThread gave up the jvm_lock (while reaching Safepoint). This way ensures there is at least one stack even if the signal delivery mechanism was not fast enough and didn’t record any asynchronous stack.

Synchronous stack can be identified among the other stacks of a thread by looking at the top two frames in the StackTrace. It would indicate the path through which a JavaThread gave up the jvm_lock.

 #0 0x00000000209f41d5 SafepointProfilerBuf::record_sync_stack(JavaThread*)
#1 0x0000000020a35d68 JavaThread::safepoint_profiler_record_sync_stack()
#2 0x000000002067a1c6 JavaThread::jvm_unlock_self()
#3 0x00000000206bdd62 JavaThread::poll_at_safepoint()
#4 0x0000000020a36fab JavaThread::poll_at_safepoint_static(JavaThread*)
#5 0x00000000300068ba StubRoutines::safepoint_handler
#6 0x000000003000b43a ContendedClass.run()V
#7 0x00000000319a65ca interpreter
#8 0x0000000030003035 StubRoutines::call_stub

Configuring Safepoint Profiler

Use the following flags to configure the Safepoint Profiler:

Configuring option	Description
-XX:+SafepointWaitTimeProfiler	This is the flag that controls the turning on and off whole profiling mechanism.
-XX:SafepointWaitTimeProfilerLog=<log file>	By default the profiling information is logged onto the stderr, this flag redirects the output to a file.
-XX:SafepointProfilerThreshold	This is a threshold in percentage of safepoint/ checkpoint that controls when to activate profiling. So, for the current default value of SafepointTimoutDelay 50000, we would start profiling the threads once we cross 1000 ms (2% of timeout). The default value of this flag is 2.
-XX:SafepointProfilerInterval	This is the interval in percentage of safepoint/ checkpoint that controls the frequency at which you would like to collect multiple stack samples for a TTSP issue. The default value is 10 and minimum value is 5%.
-XX:SafepointProfilerThresholdMS	Specifies the number of milliseconds, which the VMThread waits before sending the first, second, third, fourth, and fifth backtrace request for each laggard thread. Using this flag replaces the use of SafepointProfilerThreshold, which applies to both checkpoints and safepoints. After the fifth backtrace without reaching [safe/check] point, subsequent wait intervals are computed by using the SafepointProfilerInterval value.

Configuring option

Description

-XX:+SafepointWaitTimeProfiler

This is the flag that controls the turning on and off whole profiling mechanism.

-XX:SafepointWaitTimeProfilerLog=<log file>

By default the profiling information is logged onto the stderr, this flag redirects the output to a file.

-XX:SafepointProfilerThreshold

This is a threshold in percentage of safepoint/ checkpoint that controls when to activate profiling. So, for the current default value of SafepointTimoutDelay 50000, we would start profiling the threads once we cross 1000 ms (2% of timeout). The default value of this flag is 2.

-XX:SafepointProfilerInterval

This is the interval in percentage of safepoint/ checkpoint that controls the frequency at which you would like to collect multiple stack samples for a TTSP issue. The default value is 10 and minimum value is 5%.

-XX:SafepointProfilerThresholdMS

Specifies the number of milliseconds, which the VMThread waits before sending the first, second, third, fourth, and fifth backtrace request for each laggard thread. Using this flag replaces the use of SafepointProfilerThreshold, which applies to both checkpoints and safepoints. After the fifth backtrace without reaching [safe/check] point, subsequent wait intervals are computed by using the SafepointProfilerInterval value.

Understanding Safepoint Profiler Output

To review a sample output of the Safepoint Profiler, run the command:

 $taskset -c 0 java -XX:SafepointTimeoutDelay=10000 -XX:CheckpointTimeoutDelay=10000 -XX:SafepointProfilerThreshold=1 -XX:SafepointProfilerInterval=1 -XX:-DieOnSafepointTimeout <Test program>
This command prints the following details:

Detected TTSP issue: start: 31.259 wait: 17.526
Dumping stack for thread 0x0000440005e00000
"Thread-1" id: 30703 prio: 5 os_prio: 0 sched: SCHED_OTHER allowed_cpus: 000000000ffffff
lock_release: 31.276
last_cpu: 20 cpu_time: 17563
0 0x00000000209f41d5 SafepointProfilerBuf::record_sync_stack(JavaThread*)
1 0x0000000020a35d68 JavaThread::safepoint_profiler_record_sync_stack()
2 0x000000002067a1c6 JavaThread::jvm_unlock_self()
3 0x00000000206bdd62 JavaThread::poll_at_safepoint()
4 0x0000000020a36fab JavaThread::poll_at_safepoint_static(JavaThread*)
5 0x00000000300068ba StubRoutines::safepoint_handler
6 0x000000003000b43a ContendedClass.run()V
7 0x00000000319a65ca interpreter
8 0x0000000030003035 StubRoutines::call_stub
signal_sent: 31.271 signal_responded: 31.276
state: R wchan: 0
last_cpu: 20 cpu_time: 17563
0 0x000000003000b382 ContendedClass.run()V
1 0x00000000319a65ca interpreter
2 0x0000000030003035 StubRoutines::call_stub

The following table describes each field of the output.

Field	Description
Detected TTSP issue: start: 31.259 wait: 17.526	This is the header that indicates the start of the logging for specific TTSP issue: * start – The VM uptime, this time is used to match against the GC log file to find the corresponding SPS. * wait – The actual wait time in milliseconds as seen in the SPS line.
Dumping stack for thread 0x0000440005e00000	This is the header for each thread that had TTSP and whose stack will be logged.
"Thread-1" id: 30703 prio: 5 os_prio: 0 sched: SCHED_OTHER allowed_cpus: 000000000ffffff	Thread-1 – The thread name. id – LWP of the thread. prio – Java level priority os_prio – OS level priority sched – Scheduler chosen for the thread. allowed_cpus – The number of the CPUs that the thread is allowed to run. Each bit represents a CPU.
lock_release: 31.276	This line is printed only for the synchronously collected stack. The time here means the time this JavaThread actually gave up the lock. But this does not mean that all the threads are at safepoint at this point in time, since there could be other threads that would have taken longer to reach.
signal_sent: 31.271 signal_responded: 31.276	Applicable to only asynchronously collected stacks. signal sent – The time when VMThread issued a signal to this thread to record its stack. signal responded – The time at which the thread actually responded (received) the signal to dump the stack. If there is a huge difference in the time stamps of these fields, this could possibly indicate the following two among other unknown possibilities: The thread was not scheduled hence the kernel couldn’t deliver the signal immediately. This could possibly hint at starvation issues. The thread was in uninterrupted mode, so the signal was in the pending signal list. Examples: Page fault handler.
state: R wchan: 0	This line is applicable only for asynchronously collected stacks: state – State of the thread when the signal was sent from the VMThread. wchan – Waiting channel state if the thread was waiting for something.
last_cpu: 20 cpu_time: 17563	last cpu – The last CPU the thread was known to be running on. cpu time – Prints the clock CPU time of the thread.

Field

Description

Detected TTSP issue: start: 31.259 wait: 17.526

This is the header that indicates the start of the logging for specific TTSP issue: * start – The VM uptime, this time is used to match against the GC log file to find the corresponding SPS. * wait – The actual wait time in milliseconds as seen in the SPS line.

Dumping stack for thread 0x0000440005e00000

This is the header for each thread that had TTSP and whose stack will be logged.

"Thread-1" id: 30703 prio: 5 os_prio: 0 sched: SCHED_OTHER allowed_cpus: 000000000ffffff

Thread-1 – The thread name.

id – LWP of the thread.
prio – Java level priority
os_prio – OS level priority
sched – Scheduler chosen for the thread.
allowed_cpus – The number of the CPUs that the thread is allowed to run. Each bit represents a CPU.

lock_release: 31.276

This line is printed only for the synchronously collected stack. The time here means the time this JavaThread actually gave up the lock. But this does not mean that all the threads are at safepoint at this point in time, since there could be other threads that would have taken longer to reach.

signal_sent: 31.271 signal_responded: 31.276

Applicable to only asynchronously collected stacks.

signal sent – The time when VMThread issued a signal to this thread to record its stack.
signal responded – The time at which the thread actually responded (received) the signal to dump the stack.

If there is a huge difference in the time stamps of these fields, this could possibly indicate the following two among other unknown possibilities:

The thread was not scheduled hence the kernel couldn’t deliver the signal immediately. This could possibly hint at starvation issues.
The thread was in uninterrupted mode, so the signal was in the pending signal list. Examples: Page fault handler.

state: R wchan: 0

This line is applicable only for asynchronously collected stacks:

state – State of the thread when the signal was sent from the VMThread.
wchan – Waiting channel state if the thread was waiting for something.

last_cpu: 20 cpu_time: 17563

last cpu – The last CPU the thread was known to be running on.

cpu time – Prints the clock CPU time of the thread.