Prometheus/Grafana Patching Example
This document covers a high-level patching example for Prometheus/Grafana.
Prometheus/Grafana patches should be tested in sbox first to avoid downtime.
It is important to understand what changes are in the version upgrade, especially any breaking changes.
Usually, there will be a Renovate pull request in the flux repo containing release notes that list the breaking changes.
Updating - start with sandbox
To allow patching of sbox only, a new directory was created for the updated CRD URLs.
apps/admin/kube-prometheus-stack-crds-upgrade-v56/kustomization.yaml
and
apps/admin/kube-prometheus-stack-crds-upgrade-v56/kustomize.yaml
See example PR containing these files
This lets us point specific environments at the new CRD version.
We do this by pointing the desired cluster at the new directory via a patch in the base kustomization file.
E.g. clusters/sbox/base/kustomization.yaml
With the new CRD version available, a version selector block can be added to the sbox 00 and 01 config.
Example:
chart:
  spec:
    chart: kube-prometheus-stack
    # Update kube-prometheus-stack-crds/kustomization.yaml when updating this
    version: 56.6.2
    sourceRef:
      kind: HelmRepository
      name: prometheus
      namespace: monitoring
Update the cluster kustomization file to point to the new CRD directory created earlier.
Raise a pull request to upgrade the version.
See Example PR
When you raise a PR, checks run to confirm the kustomization is valid.
These can be found in the tests folder.
Review the pipeline checks for errors. If there are no errors and the PR has been approved, merge the PR.
Checks to see if the upgrade worked correctly
Check the pods have come back up:
kubectl get pods -n monitoring | grep kube-prometheus-stack
The AGE column should show a recent value, i.e. the pods should have been redeployed in the last few minutes.
Check pods have the correct chart version:
kubectl describe pod {pod-name} -n monitoring | grep helm.sh/chart=kube-prometheus-stack
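The two pod checks above can be combined into a single listing. The following is a minimal sketch, not part of the original procedure: the function name pods_with_chart is hypothetical, and it assumes the pods carry the helm.sh/chart label that the describe command above greps for. It sorts pods by creation time (newest last) and prints the label as an extra column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list pods oldest-first and show each pod's
# helm.sh/chart label as an extra column (blank when the label is absent).
pods_with_chart() {
  local namespace="${1:-monitoring}"
  kubectl get pods -n "$namespace" \
    --sort-by=.metadata.creationTimestamp \
    -L helm.sh/chart
}
```

Calling pods_with_chart with no arguments uses the monitoring namespace. Pods without the label show an empty column, so this does not replace checking an individual pod when in doubt.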
You can also check the Helm release (HR) to confirm it is ready and reports the correct version in its status output.
kubectl get hr -n monitoring
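To compare the version without reading the table by eye, the requested chart version can be read straight from the HelmRelease. This is a sketch under assumptions: the helper names are hypothetical, the release name kube-prometheus-stack is taken from the delete commands later in this document, and the field path mirrors the chart/spec/version block shown earlier. It reports the version the HR requests, so the READY column from the command above is still what confirms the upgrade was applied.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the chart version requested by a Flux HelmRelease.
hr_chart_version() {
  local name="$1" namespace="${2:-monitoring}"
  kubectl get hr "$name" -n "$namespace" \
    -o jsonpath='{.spec.chart.spec.version}'
}

# Hypothetical helper: succeed only when the requested version equals the expected one.
hr_version_matches() {
  local name="$1" expected="$2" namespace="${3:-monitoring}"
  local actual
  actual="$(hr_chart_version "$name" "$namespace")" || return 1
  if [ "$actual" = "$expected" ]; then
    echo "OK: $name is at chart version $actual"
  else
    echo "MISMATCH: $name is at '$actual', expected '$expected'" >&2
    return 1
  fi
}
```

For example: hr_version_matches kube-prometheus-stack 56.6.2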
Review pods for any new errors:
kubectl logs {pod-name} -n monitoring -f
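Following the log stream works for a live look, but a filtered snapshot can be quicker when checking several pods. The sketch below is an illustration only: the function name recent_errors, the 15 minute window and the error|fatal|panic pattern are all assumptions and may need adjusting to the log format of the component being checked.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print recent log lines that look like errors.
recent_errors() {
  local pod="$1" namespace="${2:-monitoring}" since="${3:-15m}"
  local logs
  # Fetch the logs first so a kubectl failure is not mistaken for "no errors".
  logs="$(kubectl logs "$pod" -n "$namespace" --since="$since" --all-containers)" || return 1
  # The pattern is an assumption; widen or narrow it as needed.
  printf '%s\n' "$logs" | grep -iE 'error|fatal|panic' \
    || echo "no matching log lines in the last $since"
}
```

For example: recent_errors {pod-name}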
Grafana, the UI deployed alongside Prometheus, should also be checked.
Ensure sds-grafana.sandbox.platform.hmcts.net/ and grafana.sandbox.platform.hmcts.net/ are accessible on SDS.
Check grafana.sandbox.platform.hmcts.net/ on CFT.
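The Grafana checks can also be done from a terminal. This is a rough sketch rather than an established check: the function name check_url is hypothetical, the https scheme is an assumption (the URLs above are given without one), and a 2xx or 3xx response is treated as reachable because a login redirect is a plausible response. It only confirms that something answers at the address, so opening the UI in a browser remains the real check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a URL answers with a 2xx or 3xx status.
check_url() {
  local url="$1" code
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
  # curl prints 000 when it cannot connect, which falls through to the failure branch.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")" || true
  case "$code" in
    2??|3??) echo "reachable ($code): $url" ;;
    *)       echo "NOT reachable ($code): $url" >&2; return 1 ;;
  esac
}
```

For example, assuming https: check_url https://grafana.sandbox.platform.hmcts.net/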
You could also delete the Prometheus HRs to make sure they come back up.
The example commands below apply to both CFT and SDS:
kubectl get hr -n monitoring
kubectl delete hr kube-prometheus-stack -n monitoring
Monitor the status of the HRs and pods to ensure they come back up successfully.
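One way to monitor the HR is to block until it reports Ready. The sketch below assumes the HelmRelease exposes a Ready condition; the function name wait_for_hr and the 10 minute default timeout are hypothetical choices. If the HR has only just been deleted and not yet recreated, kubectl wait may fail with a not-found error, in which case it can simply be re-run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until the HelmRelease reports Ready, or time out.
wait_for_hr() {
  local name="$1" namespace="${2:-monitoring}" timeout="${3:-10m}"
  kubectl wait "helmrelease/$name" -n "$namespace" \
    --for=condition=Ready --timeout="$timeout"
}
```

For example: wait_for_hr kube-prometheus-stack
Pods can be watched alongside with kubectl get pods -n monitoring -w.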
Prod environments
For Prod, there will be a Renovate PR that can be merged to apply the update:
Once the Renovate PR has been merged, remove the patches you added earlier for sbox: