Back to Zing Documentation Home
Safepoint Profiler
The Safepoint Profiler is a lightweight profiling tool that is built into the Zing VM. It reports information about threads that are slow to reach safepoint. This is referred to as Time To Safepoint (TTSP).
All JVM's use safepoints to bring all of the application threads to a known state so the JVM can perform certain operations. Safepoints are used during Garbage Collection, during JIT compilation, for Thread Dumps, and many other operations. When a safepoint call is issued all of the application threads should "come to safepoint" as fast as possible. Threads that have come to safepoint block until the JVM releases them. Once all of the threads are at safepoint, the JVM performs the operation -- GC, compile, thread dump, etc. -- and then releases all the threads to run again. But when one or more application threads take a long time to come to safepoint, all of the other threads, which are now blocked, have to wait for the tardy thread(s).
Zing's Safepoint Profiler will identify those tardy threads and help you to figure out why they are slow to reach safepoint.
Safepoint Profiler Overview
Safepoint Profiler is an in-process profiling tool that collects stacks of the threads that were late to safepoint at frequent intervals asynchronously and also collects other system related information like the number of cores assigned to the thread. These logs can be collected and analyzed later to find the reason for the high TTSP issue.
Safepoint Profiler is designed to have the minimal possible impact on the ZVM performance. The tool uses Linux signals to collect asynchronous stack traces of “slow” threads. The waiting VMThread sends RealTime signals (the “TTSP signal”) to slow Java threads after exceeding the profiling threshold (tunable by flags). System level attributes are also recorded for the slow threads (wchan, thread state, etc). These attributes are stored in private buffers assigned to slow Java threads. When a slow JavaThread receives the TTSP signal, it records its stacks into its buffer. At the end of safepoint, the VMThread walks all the Java threads in the process and prints out the buffers of the corresponding thread if available.
To make TTSP overhead as lightweight as possible, there is no need to wait until slow Java threads respond to the signal that was sent by the VMThread. Since the Linux kernel sometimes delays delivering signals to a thread, it is entirely possible for a slow JavaThread to receive the signal after the whole safepoint is complete. An additional stack recording mechanism called "synchronous stack" serves as an extra stack that indicates the path from where the JavaThread gave up the jvm_lock (while reaching Safepoint). This way ensures there is at least one stack even if the signal delivery mechanism was not fast enough and didn't record any asynchronous stack.
Synchronous stack can be identified among the other stacks of a thread by looking at the top two frames in the StackTrace. It would indicate the path through which a JavaThread gave up the jvm_lock.
#0 0x00000000209f41d5 SafepointProfilerBuf::record_sync_stack(JavaThread*)
#1 0x0000000020a35d68 JavaThread::safepoint_profiler_record_sync_stack()
#2 0x000000002067a1c6 JavaThread::jvm_unlock_self()
#3 0x00000000206bdd62 JavaThread::poll_at_safepoint()
#4 0x0000000020a36fab JavaThread::poll_at_safepoint_static(JavaThread*)
#5 0x00000000300068ba StubRoutines::safepoint_handler
#6 0x000000003000b43a ContendedClass.run()V
#7 0x00000000319a65ca interpreter
#8 0x0000000030003035 StubRoutines::call_stub
Configuring Safepoint Profiler
Use the following flags to configure the Safepoint Profiler:
Configuring option |
Description |
---|---|
-XX:+SafepointWaitTimeProfiler |
This is the flag that controls the turning on and off whole profiling mechanism. |
-XX:SafepointWaitTimeProfilerLog=<log file> |
By default the profiling information is logged onto the stderr, this flag redirects the output to a file. |
-XX:SafepointProfilerThreshold |
This is a threshold in percentage of safepoint/ checkpoint that controls when to activate profiling. So, for the current default value of |
-XX:SafepointProfilerInterval |
This is the interval in percentage of safepoint/ checkpoint that controls the frequency at which you would like to collect multiple stack samples for a TTSP issue. The default value is 10 and minimum value is 5%. |
-XX:SafepointProfilerThresholdMS |
Specifies the
number of milliseconds, which the |
Understanding Safepoint Profiler Output
To review a sample output of the Safepoint Profiler, run the command:
$taskset -c 0 java -XX:SafepointTimeoutDelay=10000 -XX:CheckpointTimeoutDelay=10000 -XX:SafepointProfilerThreshold=1 -XX:SafepointProfilerInterval=1 -XX:-DieOnSafepointTimeout <Test program>
This command prints the following details:
Detected TTSP issue: start: 31.259 wait: 17.526
Dumping stack for thread 0x0000440005e00000
"Thread-1" id: 30703 prio: 5 os_prio: 0 sched: SCHED_OTHER allowed_cpus: 000000000ffffff
lock_release: 31.276
last_cpu: 20 cpu_time: 17563
0 0x00000000209f41d5 SafepointProfilerBuf::record_sync_stack(JavaThread*)
1 0x0000000020a35d68 JavaThread::safepoint_profiler_record_sync_stack()
2 0x000000002067a1c6 JavaThread::jvm_unlock_self()
3 0x00000000206bdd62 JavaThread::poll_at_safepoint()
4 0x0000000020a36fab JavaThread::poll_at_safepoint_static(JavaThread*)
5 0x00000000300068ba StubRoutines::safepoint_handler
6 0x000000003000b43a ContendedClass.run()V
7 0x00000000319a65ca interpreter
8 0x0000000030003035 StubRoutines::call_stub
signal_sent: 31.271 signal_responded: 31.276
state: R wchan: 0
last_cpu: 20 cpu_time: 17563
0 0x000000003000b382 ContendedClass.run()V
1 0x00000000319a65ca interpreter
2 0x0000000030003035 StubRoutines::call_stub
The following table describes each field of the output.
Field |
Description |
---|---|
Detected TTSP issue: start: 31.259 wait: 17.526 |
This is the header that indicates the start of the logging for specific TTSP issue:
|
Dumping stack for thread 0x0000440005e00000 |
This is the header for each thread that had TTSP and whose stack will be logged. |
"Thread-1" id: 30703 prio: 5 os_prio: 0 sched: SCHED_OTHER allowed_cpus: 000000000ffffff |
|
lock_release: 31.276 |
This line is printed only for the synchronously collected stack. The time here means the time this JavaThread actually gave up the lock. But this does not mean that all the threads are at safepoint at this point in time, since there could be other threads that would have taken longer to reach. |
signal_sent: 31.271 signal_responded: 31.276 |
Applicable to only asynchronously collected stacks.
If there is a huge difference in the time stamps of these fields, this could possibly indicate the following two among other unknown possibilities: a) The thread was not scheduled hence the kernel couldn't deliver the signal immediately. This could possibly hint at starvation issues. b) The thread was in uninterrupted mode, so the signal was in the pending signal list. Examples: Page fault handler. |
state: R wchan: 0 |
This line is applicable only for asynchronously collected stacks:
|
last_cpu: 20 cpu_time: 17563 |
|
© Azul Systems, Inc. 2020 All rights reserved. |
![]() ![]() ![]() |
|