This is the header that indicates the start of the logging for specific TTSP issue: * start – The VM uptime, this time is used to match against the GC log file to find the corresponding SPS. * wait – The actual wait time in milliseconds as seen in the SPS line.
Safepoint Profiler
The Safepoint Profiler is a lightweight profiling tool that is built into the Azul Zing Builds of OpenJDK (Zing). It reports information about threads that are slow to reach safepoint. This is referred to as Time To Safepoint (TTSP).
All JVMs use safepoints to bring all the application threads to a known state so the JVM can perform certain operations. Safepoints are used during Garbage Collection, during JIT compilation, for Thread Dumps, and many other operations. When a safepoint call is issued all the application threads should "come to safepoint" as fast as possible. Threads that have come to safepoint block until the JVM releases them. Once all the threads are at safepoint, the JVM performs the operation — GC, compile, thread dump, etc. — and then releases all the threads to run again. But when one or more application threads take a long time to come to safepoint, all the other threads, which are now blocked, have to wait for the tardy thread(s).
Zing’s Safepoint Profiler will identify those tardy threads and help you to figure out why they are slow to reach safepoint.
Safepoint Profiler Overview
Safepoint Profiler is an in-process profiling tool that collects stacks of the threads that were late to safepoint at frequent intervals asynchronously and also collects other system related information like the number of cores assigned to the thread. These logs can be collected and analyzed later to find the reason for the high TTSP issue.
Safepoint Profiler is designed to have the minimal possible impact on Zing performance. The tool uses Linux signals to collect asynchronous stack traces of “slow” threads. The waiting VMThread
sends RealTime signals (the “TTSP signal”) to slow Java threads after exceeding the profiling threshold (tunable by flags). System level attributes are also recorded for the slow threads (wchan, thread state, etc). These attributes are stored in private buffers assigned to slow Java threads. When a slow JavaThread
receives the TTSP signal, it records its stacks into its buffer. At the end of safepoint, the VMThread walks all the Java threads in the process and prints out the buffers of the corresponding thread if available.
To make TTSP overhead as lightweight as possible, there is no need to wait until slow Java threads respond to the signal that was sent by the VMThread
. Since the Linux kernel sometimes delays delivering signals to a thread, it is entirely possible for a slow JavaThread
to receive the signal after the whole safepoint is complete. An additional stack recording mechanism called "synchronous stack" serves as an extra stack that indicates the path from where the JavaThread
gave up the jvm_lock
(while reaching Safepoint). This way ensures there is at least one stack even if the signal delivery mechanism was not fast enough and didn’t record any asynchronous stack.
Synchronous stack can be identified among the other stacks of a thread by looking at the top two frames in the StackTrace
. It would indicate the path through which a JavaThread
gave up the jvm_lock
.
#0 0x00000000209f41d5 SafepointProfilerBuf::record_sync_stack(JavaThread*)
#1 0x0000000020a35d68 JavaThread::safepoint_profiler_record_sync_stack()
#2 0x000000002067a1c6 JavaThread::jvm_unlock_self()
#3 0x00000000206bdd62 JavaThread::poll_at_safepoint()
#4 0x0000000020a36fab JavaThread::poll_at_safepoint_static(JavaThread*)
#5 0x00000000300068ba StubRoutines::safepoint_handler
#6 0x000000003000b43a ContendedClass.run()V
#7 0x00000000319a65ca interpreter
#8 0x0000000030003035 StubRoutines::call_stub
Configuring Safepoint Profiler
Use the following flags to configure the Safepoint Profiler:
Configuring option | Description |
---|---|
-XX:+SafepointWaitTimeProfiler |
This is the flag that controls the turning on and off whole profiling mechanism. |
-XX:SafepointWaitTimeProfilerLog=<log file> |
By default the profiling information is logged onto the stderr, this flag redirects the output to a file. |
-XX:SafepointProfilerThreshold |
This is a threshold in percentage of safepoint/ checkpoint that controls when to activate profiling. So, for the current default value of SafepointTimoutDelay 50000, we would start profiling the threads once we cross 1000 ms (2% of timeout). The default value of this flag is 2. |
-XX:SafepointProfilerInterval |
This is the interval in percentage of safepoint/ checkpoint that controls the frequency at which you would like to collect multiple stack samples for a TTSP issue. The default value is 10 and minimum value is 5%. |
-XX:SafepointProfilerThresholdMS |
Specifies the number of milliseconds, which the VMThread waits before sending the first, second, third, fourth, and fifth backtrace request for each laggard thread. Using this flag replaces the use of SafepointProfilerThreshold, which applies to both checkpoints and safepoints. After the fifth backtrace without reaching [safe/check] point, subsequent wait intervals are computed by using the SafepointProfilerInterval value. |
Understanding Safepoint Profiler Output
To review a sample output of the Safepoint Profiler, run the command:
$taskset -c 0 java -XX:SafepointTimeoutDelay=10000 -XX:CheckpointTimeoutDelay=10000 -XX:SafepointProfilerThreshold=1 -XX:SafepointProfilerInterval=1 -XX:-DieOnSafepointTimeout <Test program>
This command prints the following details:
Detected TTSP issue: start: 31.259 wait: 17.526
Dumping stack for thread 0x0000440005e00000
"Thread-1" id: 30703 prio: 5 os_prio: 0 sched: SCHED_OTHER allowed_cpus: 000000000ffffff
lock_release: 31.276
last_cpu: 20 cpu_time: 17563
0 0x00000000209f41d5 SafepointProfilerBuf::record_sync_stack(JavaThread*)
1 0x0000000020a35d68 JavaThread::safepoint_profiler_record_sync_stack()
2 0x000000002067a1c6 JavaThread::jvm_unlock_self()
3 0x00000000206bdd62 JavaThread::poll_at_safepoint()
4 0x0000000020a36fab JavaThread::poll_at_safepoint_static(JavaThread*)
5 0x00000000300068ba StubRoutines::safepoint_handler
6 0x000000003000b43a ContendedClass.run()V
7 0x00000000319a65ca interpreter
8 0x0000000030003035 StubRoutines::call_stub
signal_sent: 31.271 signal_responded: 31.276
state: R wchan: 0
last_cpu: 20 cpu_time: 17563
0 0x000000003000b382 ContendedClass.run()V
1 0x00000000319a65ca interpreter
2 0x0000000030003035 StubRoutines::call_stub
The following table describes each field of the output.
Field | Description |
---|---|
Detected TTSP issue: start: 31.259 wait: 17.526 |
|
Dumping stack for thread 0x0000440005e00000 |
This is the header for each thread that had TTSP and whose stack will be logged. |
"Thread-1" id: 30703 prio: 5 os_prio: 0 sched: SCHED_OTHER allowed_cpus: 000000000ffffff |
Thread-1 – The thread name.
|
lock_release: 31.276 |
This line is printed only for the synchronously collected stack. The time here means the time this JavaThread actually gave up the lock. But this does not mean that all the threads are at safepoint at this point in time, since there could be other threads that would have taken longer to reach. |
signal_sent: 31.271 signal_responded: 31.276 |
Applicable to only asynchronously collected stacks.
If there is a huge difference in the time stamps of these fields, this could possibly indicate the following two among other unknown possibilities:
|
state: R wchan: 0 |
This line is applicable only for asynchronously collected stacks:
|
last_cpu: 20 cpu_time: 17563 |
last cpu – The last CPU the thread was known to be running on. cpu time – Prints the clock CPU time of the thread. |