Monitoring Optimizer Hub

You can monitor your Optimizer Hub using the standard Kubernetes monitoring tools (Prometheus and Grafana) and through log files.

Using Prometheus and Grafana

Optimizer Hub components are already configured to expose key metrics for scraping by Prometheus.

In your production systems, you will likely want to use your existing Prometheus and Grafana instances to monitor Optimizer Hub. If you are just evaluating Optimizer Hub, you may want to install a separate instance of Prometheus and Grafana to just monitor your test instance of Optimizer Hub.

The instructions in this section assume that you have Prometheus and Grafana available, either as existing instances or installed within your Kubernetes cluster.

Grafana Dashboard

You can find a Grafana configuration file cnc_dashboard.json in

Grafana Dashboard Overview

This dashboard expects the following labels to be attached to all application metrics:

  • cluster_id: The identifier of the Kubernetes cluster on which Optimizer Hub is installed. This allows you to switch between Optimizer Hub instances in different clusters.

  • kubernetes_namespace: The Kubernetes namespace on which Optimizer Hub is installed. This setting allows you to switch between Optimizer Hub instances in different namespaces of the same cluster.

  • kubernetes_pod_name: The Kubernetes pod name.

  • app: The value of the app label on the pod, which is provided by the labelmap action from the example Prometheus configuration mentioned below.

You need to manually edit the dashboard file if these labels are named differently in your environment.

The dashboard also relies on some infrastructure metrics from Kubernetes and cAdvisor, such as kube_pod_container_resource_requests and container_cpu_usage_seconds_total.
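As an illustration of how these labels combine with the infrastructure metrics, a query of the kind the dashboard panels build might look like the following. This is a sketch, not a query copied from the dashboard; the namespace value my-opthub is an assumption, so substitute your own:

```promql
# Per-pod CPU usage of Optimizer Hub containers in one namespace
sum by (kubernetes_pod_name) (
  rate(container_cpu_usage_seconds_total{kubernetes_namespace="my-opthub"}[5m])
)
```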

Prometheus Configuration Instructions

Optimizer Hub components expose their metrics on HTTP endpoints in a format compatible with Prometheus. Annotations are in place in the Helm chart with the details of the endpoint for every component. For example:

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
  prometheus.io/path: "/q/metrics"

The following snippet is an example for the Prometheus configuration to scrape the metrics based on the above annotations:

# Example scrape config for pods
#
# The relabeling allows the actual pod scrape endpoint to be configured via the
# following annotations:
#
# * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
#   pod's declared ports (default is a port-free target if none are declared).
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
    # mapping of labels, this handles the `app` label
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
  metric_relabel_configs:
    - source_labels:
        - namespace
      action: replace
      regex: (.+)
      target_label: kubernetes_namespace
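To make the address rewrite in this configuration concrete: Prometheus joins the listed source labels with `;` and then applies the regex/replacement pair, so a pod with no declared port still gets the annotated port attached. A minimal Python sketch of that substitution (regex copied from the config above; the helper name is illustrative):

```python
import re

# Same regex as the __address__ relabel rule in the scrape config
pattern = re.compile(r"([^:]+)(?::\d+)?;(\d+)")

def relabel_address(address: str, annotation_port: str) -> str:
    """Join __address__ and the prometheus.io/port annotation the way Prometheus does."""
    # Prometheus concatenates the source label values with ';' before matching
    joined = f"{address};{annotation_port}"
    return pattern.sub(r"\1:\2", joined)

print(relabel_address("10.42.0.7", "8080"))       # -> 10.42.0.7:8080 (port-free target)
print(relabel_address("10.42.0.7:9000", "8080"))  # -> 10.42.0.7:8080 (declared port replaced)
```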

Retrieving Optimizer Hub Logs

All Optimizer Hub components, including third-party ones, log some information to stdout. These logs are very important for diagnosing problems.

You can extract individual logs with the following command:

kubectl -n my-opthub logs {pod}

However, by default Kubernetes keeps only the last 10 MB of logs for each container, which means that in a cluster under load important diagnostic information can quickly be overwritten by subsequent logs.

You should configure log aggregation for all Optimizer Hub components, so that logs are moved to persistent storage and can be extracted when an issue needs to be analyzed. You can use any log aggregation tool; one suggested option is Loki. You can query the Loki logs using the logcli tool.

Here are some common commands you can run to retrieve logs:

  • Set the host and port where Loki is listening

    export LOKI_ADDR=http://{ip-address}:{port}
  • Get logs of all pods in the selected namespace

    logcli query --since 24h --forward --limit=10000 '{namespace="zvm-dev-3606"}'
  • Get logs of a single application in the selected namespace

    logcli query --since 24h --forward --limit=10000 '{namespace="zvm-dev-3606", app="compile-broker"}'
  • Get logs of a single pod in the selected namespace

    logcli query --since 24h --forward --limit=10000 '{namespace="zvm-dev-3606",pod="compile-broker-5fd956f44f-d5hb2"}'
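The selectors in these queries are plain LogQL stream selectors. If you generate such queries from a script, a small helper like the following keeps the quoting straight (a hypothetical convenience function, not part of logcli):

```python
def logql_selector(**labels: str) -> str:
    """Build a LogQL stream selector such as {namespace="...", app="..."}."""
    body = ", ".join(f'{key}="{value}"' for key, value in labels.items())
    return "{" + body + "}"

# Matches the single-application query shown above
print(logql_selector(namespace="zvm-dev-3606", app="compile-broker"))
# -> {namespace="zvm-dev-3606", app="compile-broker"}
```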

Extracting Compilation Artifacts

Optimizer Hub uploads compiler engine logs to the blob storage. By default, only logs from failed compilations are uploaded.

You can retrieve the logs from your blob storage, which uses the directory structure <compilationId>/<artifactName>. The <compilationId> starts with the VM-Id, which you can find in connected-compiler-%p.log:

# Log command-line option
-Xlog:concomp=info:file=connected-compiler-%p.log::filesize=500M:filecount=20

# Example:
[0.647s][info ][concomp] [ConnectedCompiler] received new VM-Id: 4f762530-8389-4ae9-b64a-69b1adacccf2
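If you need to script the lookup, the VM-Id can be extracted from that log line with a short regular expression. This is an illustrative sketch based only on the log format shown above:

```python
import re

# The VM-Id is a UUID at the end of the "received new VM-Id" log line
VM_ID_RE = re.compile(r"received new VM-Id: ([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})")

line = ("[0.647s][info ][concomp] [ConnectedCompiler] "
        "received new VM-Id: 4f762530-8389-4ae9-b64a-69b1adacccf2")
match = VM_ID_RE.search(line)
if match:
    # Blob storage directories for this VM's compilations start with this id
    print(match.group(1))  # -> 4f762530-8389-4ae9-b64a-69b1adacccf2
```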

Note About gw-proxy Metrics

The gw-proxy component in Optimizer Hub uses, by default, /stats/prometheus as the target HTTP endpoint for providing metrics, while most other Optimizer Hub components use /q/metrics. If you manually change the metrics configuration of individual Kubernetes Deployments in the Optimizer Hub installation, make sure that you don't use /q/metrics for the gw-proxy deployment. Doing so would lead to confusion when metrics are processed.
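Assuming the same prometheus.io annotation scheme used by the other components, the distinct gw-proxy path would look roughly like this (a sketch; the exact port is component-specific and omitted here):

```yaml
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/path: "/stats/prometheus"  # gw-proxy only; other components use /q/metrics
```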