
Prometheus/Grafana Patching Example

This document covers a high-level patching example for Prometheus/Grafana.

Prometheus/Grafana patches should be tested in sbox to avoid downtime.

It is important to understand what changes are in the version upgrade, especially if there are any breaking changes.

Usually, a Renovate pull request in the flux repo contains release notes that list the breaking changes.

Updating - start with sandbox

To allow patching of sbox only, a new directory was created for the updated CRD URLs.

apps/admin/kube-prometheus-stack-crds-upgrade-v56/kustomization.yaml

&

apps/admin/kube-prometheus-stack-crds-upgrade-v56/kustomize.yaml

See example PR containing these files

This lets us point specific environments at the new CRD version.

We can do this by pointing the desired cluster at this new directory via a patch in the base kustomization file.

Eg: clusters/sbox/base/kustomization.yaml

With the new CRD version now available, a version selector block can be added to the sbox 00 and 01 config.

Example:

chart:
  spec:
    chart: kube-prometheus-stack
    # Update kube-prometheus-stack-crds/kustomization.yaml when updating this
    version: 56.6.2
    sourceRef:
      kind: HelmRepository
      name: prometheus
      namespace: monitoring

Update the cluster kustomization file to point to the new CRD directory created earlier.

Raise a pull request to upgrade the version.

See Example PR

When you raise a PR, checks run to validate the kustomization.

These can be found in the tests folder.

Review the pipeline checks for errors. If there are no errors and the PR has been approved, merge the PR.

Checking that the upgrade worked correctly

Check the pods have come back up:

kubectl get pods -n monitoring | grep kube-prometheus-stack

The uptime should be fairly recent, i.e., the pods should have been redeployed in the last few minutes.
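As a rough aid for this check, the sketch below filters the pod listing for pods that are not in a healthy state. The helper name not_ready_pods is hypothetical and not part of any tooling mentioned on this page, and it assumes the default kubectl column order (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper (an assumption, not part of the flux repo): reads
# `kubectl get pods` output on stdin and prints the name and status of every
# kube-prometheus-stack pod whose STATUS is not Running or Completed.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
not_ready_pods() {
  awk '$1 ~ /kube-prometheus-stack/ && $3 != "Running" && $3 != "Completed" {
         print $1, $3
       }'
}

# Example usage (prints nothing when every matching pod is healthy):
#   kubectl get pods -n monitoring | not_ready_pods
```

This only inspects the STATUS column, so a pod that is Running but not fully ready (for example 0/3 in the READY column) still needs a manual look.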

Check pods have the correct chart version:

kubectl describe pod {pod-name} -n monitoring | grep helm.sh/chart=kube-prometheus-stack
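The version comparison can be scripted rather than read by eye. The sketch below is one possible approach: chart_label_matches is a hypothetical helper name, and it assumes the label value has the form kube-prometheus-stack-<version>, which should be confirmed against the actual describe output.

```shell
# Hypothetical helper (an assumption, not existing tooling): reads
# `kubectl describe pod` output on stdin and succeeds only if it contains the
# exact label helm.sh/chart=kube-prometheus-stack-<expected version>.
# The label format is assumed; check it against real output first.
chart_label_matches() {
  awk -v want="helm.sh/chart=kube-prometheus-stack-$1" '
    { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
    END { exit !found }'
}

# Example usage, with 56.6.2 taken from the version block earlier on this page:
#   kubectl describe pod {pod-name} -n monitoring | chart_label_matches 56.6.2 \
#     && echo "chart version OK" || echo "chart version mismatch"
```

The comparison is an exact field match rather than a substring match, so 56.6.2 does not accidentally match 56.6.20.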

You can also check the Helm release (HR) to confirm it reports the correct version:

kubectl get hr -n monitoring

Review pods for any new errors: kubectl logs {pod-name} -n monitoring -f

The stack's UI, Grafana, should also be checked.

Ensure sds-grafana.sandbox.platform.hmcts.net/ and grafana.sandbox.platform.hmcts.net/ are accessible on SDS.

Check grafana.sandbox.platform.hmcts.net/ on CFT.
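The accessibility checks can be approximated from a terminal as well as a browser. The sketch below is a minimal example: check_urls is a hypothetical helper name, the https:// scheme is an assumption, and a successful response only shows that the endpoint answers, not that dashboards load correctly behind any login.

```shell
# Hypothetical helper (an assumption, not existing tooling): requests each URL
# with curl and reports OK or FAIL. curl -f treats HTTP responses of 400 and
# above as failures; redirects (for example to a login page) count as OK.
check_urls() {
  failed=0
  for url in "$@"; do
    if curl -fsS -o /dev/null --max-time 10 "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
      failed=1
    fi
  done
  return "$failed"
}

# Example usage (https:// is assumed; network access to the sandbox is needed):
#   check_urls https://sds-grafana.sandbox.platform.hmcts.net/ \
#              https://grafana.sandbox.platform.hmcts.net/
```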

You can also delete the Prometheus HRs to confirm they come back up.

The example commands below apply to both CFT and SDS:

kubectl get hr -n monitoring
kubectl delete hr kube-prometheus-stack -n monitoring

Monitor the status of the HRs and pods to confirm they come back successfully.
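One way to monitor the HRs is to filter the listing for releases that are not yet ready. The sketch below is illustrative only: unready_helmreleases is a hypothetical helper name, and it assumes the kubectl get hr output has a READY column that shows True for a healthy release.

```shell
# Hypothetical helper (an assumption, not existing tooling): reads
# `kubectl get hr` output on stdin, finds the READY column from the header
# row, and prints the name of every release whose READY value is not "True".
unready_helmreleases() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "READY") col = i; next }
       col && $col != "True" { print $1 }'
}

# Example usage (re-run until it prints nothing):
#   kubectl get hr -n monitoring | unready_helmreleases
```

Locating READY from the header avoids depending on a fixed column position, provided READY appears before the free-text STATUS column.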

Prod environments

For Prod, there will be a Renovate PR that can be merged to apply the update.

Once the Renovate PR has been merged, remove the patches you previously added for sbox.

This page was last reviewed on 19 February 2024. It needs to be reviewed again on 19 February 2025 by the page owner platops-build-notices.