Understanding a Garbage Collection Log Format

Table of Contents

Detailed GC Logging
Regular GC Logging
GC Log Format
Parsing GC Log Data


A garbage collection (GC) log is a text file, produced by the Java Virtual Machine (JVM), that records metrics about the garbage collector's work. The GC Log Analyzer extracts various JVM and system metrics from GC logs.

Unlike logs produced by other JVMs, Zing GC logs contain detailed information not only about the GC process but also about the state of the system as a whole.

GC logs can be very helpful in diagnosing CPU- and memory-related issues and in optimizing application performance.

GC log contents depend on the command-line options used to start Zing. Some details are available only when explicitly enabled with specific flags (see Garbage Collector Options in Using Zing Command-Line Options).

Zing uses an efficient, extensible text-based format to log events.

Detailed GC Logging

If detailed logging is enabled with -XX:+PrintGCDetails, verbose information is printed to the log in a human-readable format that is NOT documented and can change at any time without notice.

Reading a detailed GC log requires a deep understanding of Zing GC internals; this output is mostly intended for use by Azul engineers.

Regular GC Logging

The most common practice is to enable non-detailed GC logging with -XX:+PrintGC -XX:+PrintGCTimeStamps, which keeps the GC log file small while still capturing a lot of valuable information.

Zing writes GC logs very efficiently, so logging can be safely enabled even for low-latency production applications.

GC Log Format

All GC log lines share a common format. If you are familiar with it, you can parse a log without knowing the whole range of possible fields.

Example 1. Zing GC log line format

 <date>: <rel_time>: [<ID> <value> … ]

Where:

(optional) <date> is an absolute time stamp of the logged event, enabled with -XX:+PrintGCDateStamps
(optional) <rel_time> is the time passed since the start of Zing in seconds, enabled with -XX:+PrintGCTimeStamps
<ID> is the identifier of the line type; it defines how to interpret the data that follows
<value> … is the sequence of data fields, separated by a space, a colon, or both
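As an illustration of this line format, the following Python sketch splits one line into its optional stamps, the line-type identifier, and the data fields. The names parse_line and GcLogLine are hypothetical, not part of any Azul tooling, and the sketch assumes that the <date> stamp contains no spaces and that a lone numeric stamp is <rel_time>.

```python
import re
from typing import List, NamedTuple, Optional


class GcLogLine(NamedTuple):
    date: Optional[str]        # absolute stamp (-XX:+PrintGCDateStamps), if present
    rel_time: Optional[float]  # seconds since start (-XX:+PrintGCTimeStamps), if present
    line_id: str               # line-type identifier, e.g. "OBJCREAT" or "OBJCREATH"
    values: List[str]          # remaining fields, split on spaces and/or colons


def parse_line(text: str) -> Optional[GcLogLine]:
    """Split one GC log line into its parts; return None if it has no [...] body."""
    start, end = text.find("["), text.rfind("]")
    if start < 0 or end < start:
        return None
    # Everything before "[" is zero, one, or two "<stamp>:" prefixes.
    stamps = [s.rstrip(":") for s in text[:start].split()]
    date: Optional[str] = None
    rel_time: Optional[float] = None
    if len(stamps) == 2:
        date, rel_time = stamps[0], float(stamps[1])
    elif len(stamps) == 1:
        try:
            rel_time = float(stamps[0])  # assumption: a bare number is <rel_time>
        except ValueError:
            date = stamps[0]             # anything else is taken to be <date>
    # Inside the brackets: the ID, then fields separated by spaces, colons, or both.
    tokens = [t for t in re.split(r"[\s:]+", text[start + 1:end]) if t]
    if not tokens:
        return None
    return GcLogLine(date, rel_time, tokens[0], tokens[1:])
```

Applied to the line in Example 2, it yields a rel_time of 1.816, the ID OBJCREAT, and the four values 33892, 26764, 18432 and 1.815713; the colon is treated purely as a separator.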

Example 2. A GC log line

 1.816: [OBJCREAT 33892 26764 18432 : 1.815713 ]

The meaning of fields can vary in different releases of Zing.

To associate a field with its meaning, read the header line for that line type.

Header lines are printed near the beginning of the log and are repeated under some conditions (e.g., when GC log rotation is in use).

The format of a header line is very similar to that of a data line, except that the letter H is appended to <ID> and the data values are replaced by textual field names.

Example 3. A GC log header line

 0.074: [OBJCREATH newGen(kB) permGen(kB) oldGen(kB) : end(s)]
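A single-line header can be turned into a name-to-position lookup and then applied to data lines of the same type. The sketch below is a minimal illustration with hypothetical helper names (header_index, field_value); it pairs the header from Example 3 with the data line from Example 2 and treats spaces and colons purely as separators. If a header repeated a name, the last occurrence would win in this sketch.

```python
import re
from typing import Dict, List


def split_fields(body: str) -> List[str]:
    # Fields are separated by spaces, colons, or both.
    return [t for t in re.split(r"[\s:]+", body) if t]


def bracket_body(line: str) -> str:
    # Text between the first "[" and the last "]".
    return line[line.find("[") + 1 : line.rfind("]")]


def header_index(header_line: str) -> Dict[str, int]:
    """Map each field name in a single-line header to its position among the values."""
    tokens = split_fields(bracket_body(header_line))
    # tokens[0] is the header ID (e.g. "OBJCREATH"); the field names follow it.
    return {name: i for i, name in enumerate(tokens[1:])}


def field_value(data_line: str, index: Dict[str, int], name: str) -> str:
    values = split_fields(bracket_body(data_line))[1:]  # drop the line-type ID
    return values[index[name]]
```

For instance, looking up oldGen(kB) in the Example 2 line through the Example 3 header returns "18432", and end(s) returns "1.815713".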

Some headers consist of two consecutive lines.

Example 4. A GC log two-line header

 0.074: [TAGH Group 1     Group 2 : AnotherGroup]
 0.074: [TAGH field data  data    : data ]

When a header consists of two lines, the first line defines groups and the second line defines field names. The combination of group and name is guaranteed to be unique within the same line type.

Unlike names, groups can contain spaces. Groups are separated by either a colon or two or more spaces.

Example 4 above defines the following three groups and four fields:

 Group 1/field
 Group 1/data
 Group 2/data
 AnotherGroup/data

Although field names can repeat (as shown in the example above), in most cases they are unique within the same line type. Group names are guaranteed to be unique within the same line type.

Note how Group 2 is aligned with the second field named data.

Below is a general rule for finding the index of a named field in a two-line header:

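Find the group part of the field's group/name identifier in the first header line.
Find the name part in the second header line, starting from the column position found in the first step.
The position of that name among all names in the second line is the index of the data field.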

Example 5. Matching groups and fields

 [ USEH memory   disk    ]
 [ USEH min max  min max ]
 ...
 [ USE 10 20 30 40 ]

In Example 5, memory/min is the first data field (10), and disk/max is the fourth data field (40).
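The lookup rule can be sketched in Python as follows. The function field_index is hypothetical and rests on two assumptions: column positions are compared relative to the opening [ of each header line, and a group's names end where the next group begins. It returns a 0-based index, so memory/min in Example 5 maps to 0 and disk/max to 3.

```python
import re
from typing import List, Tuple

WORD_RE = re.compile(r"[^\s:]+")                # a single field name
GROUP_RE = re.compile(r"[^\s:]+(?: [^\s:]+)*")  # words joined by single spaces


def _body(line: str) -> str:
    """Text between the first "[" and the last "]" of a header line."""
    return line[line.find("[") + 1 : line.rfind("]")]


def _after_id(body: str) -> int:
    """Offset just past the header ID (e.g. "USEH") and the spaces after it."""
    m = re.match(r"\s*\S+\s*", body)
    return m.end() if m else 0


def field_index(group_line: str, name_line: str, group: str, name: str) -> int:
    """Return the 0-based data-field index of group/name in a two-line header."""
    g_body, n_body = _body(group_line), _body(name_line)
    # Groups are separated by a colon or by two or more spaces, so a group is a
    # run of words joined by single spaces; keep each group's starting column.
    groups: List[Tuple[str, int]] = [
        (m.group(), m.start()) for m in GROUP_RE.finditer(g_body, _after_id(g_body))
    ]
    names = list(WORD_RE.finditer(n_body, _after_id(n_body)))
    # Step 1: locate the group in the first header line.
    for i, (g, col) in enumerate(groups):
        if g != group:
            continue
        # Assumption: a group's names end where the next group starts.
        end = groups[i + 1][1] if i + 1 < len(groups) else len(n_body)
        # Step 2: the first matching name at or after the group's column.
        for idx, m in enumerate(names):
            if col <= m.start() < end and m.group() == name:
                # Step 3: its position among all names is the data-field index.
                return idx
    raise KeyError(f"{group}/{name}")
```

Because the sketch depends on column alignment, it only gives meaningful results on header lines whose original spacing has been preserved.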

Many fields, but not all of them, are named so that it is clear what units they are reported in (e.g., cgMem_max_usage_bytes or oldGen(kB)).
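Where a name does carry its unit, it can be recovered mechanically. The hypothetical helper below recognizes only the two naming styles mentioned in the text, a trailing (unit) and a trailing _bytes; any other name is reported as having no known unit rather than guessed.

```python
import re
from typing import Optional


def unit_of(field_name: str) -> Optional[str]:
    """Guess a field's unit from its name, or return None if the name does not say.

    Heuristic (an assumption covering only the two styles named in the text):
    a trailing "(unit)" as in oldGen(kB), or a trailing "_bytes" as in
    cgMem_max_usage_bytes.
    """
    m = re.search(r"\(([^()]+)\)$", field_name)
    if m:
        return m.group(1)
    if field_name.endswith("_bytes"):
        return "bytes"
    return None
```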

Parsing GC Log Data

Note that if you want to extract data from a Zing GC log programmatically, you cannot simply grep for the line-type identifier and take the N-th field.

The order and position of line fields can vary in different releases of Zing (e.g., a new field can be added to the middle of a log line).

To retrieve particular data, first read the GC log headers to find the position of the required field, then extract the value at that index from the data lines of that particular log.
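This header-driven approach can be sketched as a small Python generator that re-reads the field position from the most recent header of the matching type, so a repeated header (e.g., after log rotation) or a field added in a newer release is picked up without code changes. It is a minimal sketch under stated assumptions: the name extract is hypothetical, only single-line headers are handled, and each data line is assumed to carry a numeric <rel_time> stamp (-XX:+PrintGCTimeStamps) just before the opening bracket.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple


def _tokens(line: str) -> List[str]:
    """Tokens inside [...]: the line ID first, then fields split on spaces/colons."""
    body = line[line.find("[") + 1 : line.rfind("]")]
    return [t for t in re.split(r"[\s:]+", body) if t]


def extract(lines: Iterable[str], line_id: str, field: str) -> Iterator[Tuple[float, str]]:
    """Yield (rel_time, value) of `field` for every `line_id` data line.

    The field's position is re-read from the most recent "<line_id>H" header.
    Assumptions: single-line headers, and a numeric <rel_time> stamp as the
    last stamp before "[" on every data line.
    """
    header_id = line_id + "H"
    position: Optional[int] = None  # unknown until a matching header is seen
    for line in lines:
        if "[" not in line or "]" not in line:
            continue  # not a bracketed log line
        tokens = _tokens(line)
        if not tokens:
            continue
        if tokens[0] == header_id:
            names = tokens[1:]
            position = names.index(field) if field in names else None
        elif tokens[0] == line_id and position is not None and 1 + position < len(tokens):
            stamps = line[: line.find("[")].split()
            if not stamps:
                continue  # no <rel_time> stamp to report
            yield float(stamps[-1].rstrip(":")), tokens[1 + position]
```

For example, feeding it the header from Example 3 followed by the data line from Example 2, with line_id="OBJCREAT" and field="oldGen(kB)", yields the single pair (1.816, "18432").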

A script named gcLogScraper is provided with the Zing installation (see Using Zing GC Log Scraper for details). The gcLogScraper script can extract fields from a given log file, provided you know the field names.