Monitoring Optimizer Hub
You can monitor your Optimizer Hub using the standard Kubernetes monitoring tools (Prometheus and Grafana) and through log files.
Using Prometheus and Grafana
Optimizer Hub components are already configured to expose key metrics for scraping by Prometheus.
In your production systems, you likely want to use your existing Prometheus and Grafana instances to monitor Optimizer Hub. If you are just evaluating Optimizer Hub, you may want to install separate Prometheus and Grafana instances to monitor only your test instance of Optimizer Hub.
Note
Monitoring Optimizer Hub assumes you have Prometheus and Grafana available, or that you install them within your Kubernetes cluster.
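If you do not yet have a monitoring stack and only want to evaluate Optimizer Hub, one possible way to set one up is the community kube-prometheus-stack Helm chart; the release name monitoring and the monitoring namespace below are assumptions for illustration, not part of Optimizer Hub:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring --create-namespace
Note that with this chart you still need to add a scrape configuration such as the one shown below, for example through the chart's additionalScrapeConfigs setting, so that the annotation-based scraping of Optimizer Hub pods works.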
Grafana Dashboard
You can find a Grafana configuration file cnc_dashboard.json
in opthub-install-1.9.3.zip.
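After extracting cnc_dashboard.json from the archive, you can import it through the Grafana UI (Dashboards > Import) or, as a sketch, through the Grafana HTTP API; the Grafana host, port, API token, and the use of jq below are assumptions for illustration:
# Wrap the dashboard JSON in the import payload expected by the Grafana API
jq '{dashboard: ., overwrite: true}' cnc_dashboard.json > dashboard-import.json
curl -X POST "http://{grafana-host}:3000/api/dashboards/db" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer {grafana-api-token}" \
  --data-binary @dashboard-import.json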
This dashboard expects the following labels to be attached to all application metrics:
- cluster_id: The identifier of the Kubernetes cluster on which Optimizer Hub is installed. This allows you to switch between Optimizer Hub instances in different clusters.
- kubernetes_namespace: The Kubernetes namespace in which Optimizer Hub is installed. This allows you to switch between Optimizer Hub instances in different namespaces of the same cluster.
- kubernetes_pod_name: The Kubernetes pod name.
- app: The value of the app label on the pod, which is provided by the labelmap action from the example Prometheus configuration shown below.
You need to manually edit the dashboard file if these labels are named differently in your environment.
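For example, if your Prometheus setup attaches a plain namespace label instead of kubernetes_namespace, a simple rename in the dashboard file could look like the following sketch (adapt the label names to your environment):
sed -i 's/kubernetes_namespace/namespace/g' cnc_dashboard.json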
The dashboard also relies on some infrastructure metrics from Kubernetes and cAdvisor, such as kube_pod_container_resource_requests and container_cpu_usage_seconds_total.
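To verify that these infrastructure metrics are actually collected, you can run queries such as the following in the Prometheus expression browser; the namespace my-opthub is an assumption for illustration:
kube_pod_container_resource_requests{namespace="my-opthub"}
container_cpu_usage_seconds_total{namespace="my-opthub"}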
Prometheus Configuration Instructions
Optimizer Hub components expose their metrics on HTTP endpoints in a format compatible with Prometheus. Annotations are in place in the Helm chart with the details of the endpoint for every component. For example:
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
  prometheus.io/path: "/q/metrics"
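You can check which annotations a running pod actually carries, for example with the following command; the namespace my-opthub is an assumption:
kubectl -n my-opthub get pod {pod} -o jsonpath='{.metadata.annotations}'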
The following snippet is an example of a Prometheus configuration that scrapes the metrics based on the above annotations:
# Example scrape config for pods
#
# The relabeling allows the actual pod scrape endpoint to be configured via the
# following annotations:
#
# * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
# pod's declared ports (default is a port-free target if none are declared).
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
    # mapping of labels, this handles the `app` label
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
  metric_relabel_configs:
    - source_labels:
        - namespace
      action: replace
      regex: (.+)
      target_label: kubernetes_namespace
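To confirm that a component serves metrics on the annotated port and path, you can port-forward to one of its pods and fetch the endpoint directly; the namespace, pod, and port below are assumptions based on the annotations shown above:
# In one terminal: forward the metrics port of a pod
kubectl -n my-opthub port-forward {pod} 8080:8080
# In another terminal: fetch the first lines of the metrics output
curl -s http://localhost:8080/q/metrics | head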
Retrieving Optimizer Hub Logs
All Optimizer Hub components, including third-party ones, log information to stdout. These logs are very important for diagnosing problems.
You can extract individual logs with the following command:
kubectl -n my-opthub logs {pod}
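For components with multiple replicas, or after a container restart, variations such as the following can also be useful; the namespace and the app label value are assumptions:
kubectl -n my-opthub get pods
kubectl -n my-opthub logs -l app=compile-broker --since=1h
kubectl -n my-opthub logs {pod} --previous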
However, by default Kubernetes keeps only the last 10 MB of logs for every container, which means that in a cluster under load, important diagnostic information can quickly be overwritten by subsequent logs.
You should configure log aggregation for all Optimizer Hub components, so that logs are moved to persistent storage and can be extracted when an issue needs to be analyzed. You can use any log aggregation tool; one suggested option is Loki. You can query the Loki logs using the logcli tool.
Here are some common commands you can run to retrieve logs:
- Set the host and port where Loki is listening:
  export LOKI_ADDR=http://{ip-address}:{port}
- Get logs of all pods in the selected namespace:
  logcli query --since 24h --forward --limit=10000 '{namespace="zvm-dev-3606"}'
- Get logs of a single application in the selected namespace:
  logcli query --since 24h --forward --limit=10000 '{namespace="zvm-dev-3606", app="compile-broker"}'
- Get logs of a single pod in the selected namespace:
  logcli query --since 24h --forward --limit=10000 '{namespace="zvm-dev-3606",pod="compile-broker-5fd956f44f-d5hb2"}'
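When you collect logs for later analysis or for a support ticket, you can redirect the query output to a file; the file name is an assumption:
logcli query --since 24h --forward --limit=10000 '{namespace="zvm-dev-3606"}' > opthub-logs.txt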
Extracting Compilation Artifacts
Optimizer Hub uploads compiler engine logs to the blob storage. By default, only logs from failed compilations are uploaded.
You can retrieve the logs from your blob storage, which uses the directory structure <compilationId>/<artifactName>. The <compilationId> starts with the VM-Id, which you can find in connected-compiler-%p.log:
# Log command-line option
-Xlog:concomp=info:file=connected-compiler-%p.log::filesize=500M:filecount=20
# Example:
[0.647s][info ][concomp] [ConnectedCompiler] received new VM-Id: 4f762530-8389-4ae9-b64a-69b1adacccf2
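As a sketch, if your blob storage is an S3 bucket, you can locate and download the artifacts with the AWS CLI; the bucket name opthub-artifacts is an assumption, so adapt the commands to your actual storage backend:
# List compilation directories that belong to the VM-Id found in the log above
aws s3 ls s3://opthub-artifacts/ | grep 4f762530-8389-4ae9-b64a-69b1adacccf2
# Download all artifacts of one compilation
aws s3 cp s3://opthub-artifacts/{compilationId}/ ./artifacts/ --recursive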
Note About gw-proxy Metrics
The gw-proxy component in Optimizer Hub uses, by default, /stats/prometheus as the HTTP endpoint that provides its metrics. Most other Optimizer Hub components use /q/metrics. If you make manual changes to the metrics configuration of individual Kubernetes Deployments in the Optimizer Hub installation, make sure that you don't use /q/metrics for the gw-proxy deployment. Doing so would lead to confusion when metrics are processed.
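A quick way to see which metrics path and port each pod advertises, including gw-proxy, is to read the annotations directly; the namespace my-opthub is an assumption, and the annotation keys are the ones shown earlier in this section:
kubectl -n my-opthub get pods -o custom-columns='NAME:.metadata.name,PATH:.metadata.annotations.prometheus\.io/path,PORT:.metadata.annotations.prometheus\.io/port'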