
KEDA Patching Example

This document covers a high-level example of patching the KEDA app. See Patching AKS Apps - Traefik for a lower-level example with more information.

KEDA is deployed globally, rather than starting in sbox and working up through the environments, due to its limited use.

It is possible to patch only a specific environment by creating a patch file within the KEDA namespace in Flux and applying this patch to the specific environment you want to modify.

Example: Flux PR example of creating a patch file and applying to a single environment/cluster.

KEDA is used to autoscale pods on a cluster. A couple of examples of this are:

  • Azure DevOps agent pods: we use a KEDA scaled job to monitor for new pipeline builds in Azure DevOps and create a pod on the PTL or PTLSBOX clusters to run the pipeline.
  • Jenkins webhook agent: we use a KEDA scaled job to poll Azure Service Bus for new messages when an event is triggered from a repo in GitHub. A pod is then scheduled to process the message and send it to Jenkins to kick off a build.

Patching

Review the KEDA Releases and the Helm chart releases pages to check for breaking changes before updating.

Non-Production

As shown previously, it is possible to patch one or more environments without patching KEDA everywhere, even though it is deployed globally: example PR.

Using a patch file as shown in this sample PR you can patch one environment at a time until all non-production environments are completed.

Testing after the first environment is highly recommended before moving to any other environments.

ITHC or Demo are the best options to start with as both contain KEDA and also contain plum-recipe-receiver which is discussed in the checks section below.

Production

For production you should merge your patch file changes, the updates to KEDA, into the main keda.yaml file and remove the patch file you created for non-production.

You will also need to remove the patches to each non-production environment as the file will no longer exist: example PR.

SDS

Create a PR in sds-flux-config to patch KEDA in SDS: example PR.

CFT

Create a PR in cnp-flux-config to patch KEDA in CFT: example PR.

Post Patching Checks

All the checks in this section are now automated by the keda-patching-checks AI agent skill, which is available in the skills repository. Please read the skill's README file before first use.

Work through these checks in order: first confirm KEDA itself has upgraded and is healthy, then confirm the workloads that depend on it (ScaledJobs) are still reconciling.

1. Check the Helm release

KEDA is deployed into the keda namespace (the Helm release is named keda). Confirm the release has reconciled to the version you applied via Flux:

kubectl get hr keda -n keda

The STATUS column should show the upgrade succeeded with the new version, for example: Helm upgrade succeeded for release keda/keda.v2 with chart keda@2.20.0.
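
As an illustrative sketch only, the status message can also be checked in a script rather than by eye. The helper name hr_upgraded_to is hypothetical (it is not part of Flux, KEDA or the keda-patching-checks skill), and it assumes the message has the same shape as the example above and that the STATUS column is the message of the HelmRelease's Ready condition.

```shell
# Hypothetical helper: reads a HelmRelease status message on stdin and succeeds
# only if it reports success for the expected keda chart version.
hr_upgraded_to() {
  expected="$1"
  msg=$(cat)
  case "$msg" in
    *succeeded*"chart keda@${expected}"*) return 0 ;;
    *) echo "unexpected HelmRelease status: ${msg}" >&2; return 1 ;;
  esac
}

# Example (needs cluster access; 2.20.0 is the version from the example above):
#   kubectl get hr keda -n keda \
#     -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}' | hr_upgraded_to 2.20.0
```

A non-zero exit status means the release has not (yet) reconciled to the version you expected, so wait for Flux to reconcile or investigate before moving on.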

2. Check the KEDA pods

KEDA runs its own long-lived pods. The chart deploys three of them, so confirm they are all Running:

kubectl get pods -n keda -l app.kubernetes.io/part-of=keda-operator

You should see one pod for each of the following components:

  • keda-operator - watches ScaledObjects/ScaledJobs and performs the scaling
  • keda-operator-metrics-apiserver - serves external metrics to the Kubernetes metrics API
  • keda-admission-webhooks - validates KEDA resources
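
The pod check above could be scripted along these lines. This is a minimal sketch: keda_pods_healthy is a hypothetical name, and it assumes the default kubectl column order (NAME READY STATUS RESTARTS AGE) and that at least three pods are expected, one per component listed above.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE). Prints every pod whose STATUS is
# not Running and fails if any is found or if fewer pods than expected exist.
keda_pods_healthy() {
  awk -v want="${1:-3}" '
    $3 != "Running" { print $1 " is " $3; bad = 1 }
    END {
      if (NR < want) { print "expected at least " want " pods, found " NR; bad = 1 }
      exit bad + 0
    }'
}

# Example (needs cluster access):
#   kubectl get pods -n keda -l app.kubernetes.io/part-of=keda-operator --no-headers \
#     | keda_pods_healthy
```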

If a pod is not healthy, check its logs, for example:

kubectl logs -n keda -l app.kubernetes.io/name=keda-operator -f

3. Check the workloads that use KEDA

Once KEDA itself is healthy, confirm the ScaledJobs that depend on it are still reconciling.

Warning: When checking a ScaledJob, the READY column is the health signal, not ACTIVE. READY=True means KEDA has validated the ScaledJob and it is ready to scale; that is a pass. The ACTIVE column only tells you whether KEDA is scaling right now: ACTIVE=True means there is queued work at that instant, and ACTIVE=False simply means the trigger is idle (no queued jobs/messages). ACTIVE=False is normal and is NOT a failure. Equally, not seeing brand-new pods or recent KEDAJobsCreated events at the moment you check is expected when the queue is idle. Only treat a ScaledJob as failed when READY is not True (or it is missing on a cluster where it is expected).
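
The pass/fail rule in the warning can be expressed as a small filter. This is a sketch under stated assumptions: scaledjob_verdict is a hypothetical name, the input is one "NAME READY ACTIVE" line per ScaledJob, and the custom-columns expression in the comment assumes the ScaledJob status exposes conditions of type Ready and Active.

```shell
# Hypothetical helper: reads "NAME READY ACTIVE" lines on stdin.
# READY=True is a pass whether or not the trigger is active;
# anything else is a failure and makes the function exit non-zero.
scaledjob_verdict() {
  awk '{
    if ($2 == "True") {
      state = ($3 == "True") ? "scaling" : "idle"
      print $1 ": PASS (" state ")"
    } else {
      print $1 ": FAIL (READY=" $2 ")"
      bad = 1
    }
  } END { exit bad + 0 }'
}

# Example (needs cluster access; <namespace> is a placeholder):
#   kubectl get scaledjob -n <namespace> --no-headers \
#     -o custom-columns='NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,ACTIVE:.status.conditions[?(@.type=="Active")].status' \
#     | scaledjob_verdict
```

Note that ACTIVE only changes the label on a passing line; it never turns a pass into a failure.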

There are three workloads we use for this:

  • Recipe Receiver - deployed across most (but not all) clusters, so the best general check
  • Azure DevOps agents - Linux agents run widely in SDS; on CFT they are only on the integration service (intsvc) clusters. Windows agents are intsvc-only on both estates
  • Jenkins webhook relay - only on the integration service (intsvc) clusters

Recipe Receiver

Recipe Receiver is an app created for Platform Operations and deployed to both CFT and SDS AKS clusters, under a different namespace in each: cnp on CFT and toffee on SDS.

These are both deployed the same way and have the same setup, resources and testing steps.

It is not deployed to every cluster, so check the relevant environment before testing. The authoritative list is the set of environment overlays in each estate's Flux app folder (each <env>.yaml is a cluster it is deployed to). At time of writing this is:

| Estate | Namespace | Environments where Recipe Receiver is deployed |
|--------|-----------|------------------------------------------------|
| CFT    | cnp       | sbox, ithc, perftest, demo, aat, prod          |
| SDS    | toffee    | sbox, test, ithc, demo, stg, prod              |

Note: this is the standing deployment managed by Flux. The recipe-receiver app repo separately deploys ephemeral, per-PR resources, and only to CFT Preview and SDS Dev — so don’t rely on those clusters for post-patching checks.

If Recipe Receiver is missing from a cluster where it is expected (per the table above), treat that as a failure to investigate rather than skipping it — a missing ScaledJob can be a symptom of a broken KEDA upgrade (for example the KEDA CRDs failing to install/upgrade).
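
To decide quickly whether a missing ScaledJob is a failure or simply not deployed there, the deployment list above can be encoded in a lookup. This is only a sketch: recipe_receiver_expected is a hypothetical name, and the environment lists are copied from this page at time of writing, so the Flux environment overlays remain the authoritative source.

```shell
# Hypothetical helper: succeeds if Recipe Receiver is expected in the given
# estate (CFT or SDS) and environment, based on the list on this page.
recipe_receiver_expected() {
  estate="$1"
  env="$2"
  case "$estate" in
    CFT) envs="sbox ithc perftest demo aat prod" ;;
    SDS) envs="sbox test ithc demo stg prod" ;;
    *) echo "unknown estate: ${estate}" >&2; return 2 ;;
  esac
  case " $envs " in
    *" $env "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   recipe_receiver_expected CFT aat && echo "a missing ScaledJob here is a failure"
```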

The resources that matter for this testing are Azure Service Bus queues. Each environment has a Service Bus with a queue that recipe-receiver monitors for messages, and it scales out when messages are added to the queue. If KEDA is working correctly, the recipe-receiver ScaledJob creates pods that process the messages from the queue.

The service bus naming convention is <app>-servicebus-<environment> e.g. plum-servicebus-aat and the queue is called recipes.

Within the Recipe Receiver repository there is a Golang script that generates messages for a specific Service Bus queue and then monitors the queue until the message count reaches zero. This makes it well suited to testing both the recipe-receiver app and KEDA.

There are examples of how to use this script within the repository itself but the general usage is:

go run messageGenerator/main.go -service-bus <app>-servicebus-<environment>.servicebus.windows.net -queue recipes -messages 50 -watch

Note that the command uses go, so you will need Golang installed locally: instructions

The script can be reused across environments simply by changing the app or environment name to match the AKS cluster you are testing.
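
As a small convenience, the Service Bus naming convention can be wrapped in a helper so that only the app and environment change between runs. This is a sketch, not part of the Recipe Receiver repository: servicebus_fqdn is a hypothetical name, and the app and environment values you pass in are your responsibility to get right for the cluster under test.

```shell
# Hypothetical helper: builds the Service Bus hostname from the
# <app>-servicebus-<environment> naming convention.
servicebus_fqdn() {
  printf '%s-servicebus-%s.servicebus.windows.net\n' "$1" "$2"
}

# Example (needs Go and access to the Service Bus), using the plum/aat example from this page:
#   go run messageGenerator/main.go -service-bus "$(servicebus_fqdn plum aat)" \
#     -queue recipes -messages 50 -watch
```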

Steps to test KEDA updates:

  • Run the above script before the update to confirm Recipe Receiver is working.
  • Once confirmed, make your Flux updates and raise a PR with the patch to a specific environment e.g. ITHC or Demo.
  • Have your PR reviewed and merge when approved.
  • When the KEDA release has been updated and the new pods are running, you can re-run the test.
    • To check the KEDA release has updated you can run: kubectl get hr -n keda which will show the Helm Release and if updated correctly should show the new version you applied via Flux.
  • Run the script again and monitor the script output to see if the queue message count drops as expected.

Once you are happy that KEDA is working in this environment, you can deploy the patch to another non-production environment, repeating until you have completed all non-production environments. Remember to test each environment after the update.

Azure DevOps / Jenkins agents

These are the most heavily used ScaledJobs in the project: they run our self-hosted Azure DevOps agents and send webhook events to Jenkins to trigger builds. If they are not working they cause widespread problems for application teams and result in BAU tickets, so always verify them after patching.

These workloads are not deployed to every cluster KEDA runs on, so use the table below to know exactly where to run the checks. The Jenkins webhook relay only runs on the integration service (intsvc) clusters; the Azure DevOps agents run more widely in SDS:

| Workload                      | Namespace    | ScaledJob name(s)                   | CFT clusters                       | SDS clusters                     |
|-------------------------------|--------------|-------------------------------------|------------------------------------|----------------------------------|
| Jenkins webhook relay         | jenkins      | jenkins-webhook-relay-function      | cft-ptl-00-aks, cft-ptlsbox-00-aks | ss-ptl-00-aks, ss-ptlsbox-00-aks |
| Azure DevOps agents (Linux)   | azure-devops | azure-devops-agent-function         | cft-ptl-00-aks, cft-ptlsbox-00-aks | All SDS clusters except demo     |
| Azure DevOps agents (Windows) | azure-devops | azure-devops-agent-windows-function | cft-ptl-00-aks, cft-ptlsbox-00-aks | ss-ptl-00-aks, ss-ptlsbox-00-aks |

Run the checks below against the relevant cluster. Set the cluster once with:

# pick the cluster you are checking, e.g. cft-ptl-00-aks, cft-ptlsbox-00-aks, ss-ptl-00-aks ...
CTX=cft-ptl-00-aks

First confirm the agents’ Helm releases are healthy (READY should be True):

kubectl --context "$CTX" get hr azure-devops-agent -n azure-devops
kubectl --context "$CTX" get hr jenkins-webhook-relay -n jenkins

List the ScaledJobs and confirm READY is True:

kubectl --context "$CTX" get scaledjob -n azure-devops
kubectl --context "$CTX" get scaledjob -n jenkins

Remember READY=True is the pass criterion here — ACTIVE=False just means the agent pool / relay queue is currently idle and is not a problem.

Describe a ScaledJob to confirm KEDA is creating jobs for it - look for events with the reason KEDAJobsCreated:

kubectl --context "$CTX" describe scaledjob azure-devops-agent-function -n azure-devops
kubectl --context "$CTX" describe scaledjob jenkins-webhook-relay-function -n jenkins

Recent KEDAJobsCreated events and fresh agent pods are good confirmation that scaling works, but their absence when the trigger is idle (ACTIVE=False) is expected and does not indicate a failure — a READY=True ScaledJob with recently completed jobs is healthy.
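
If you want a number rather than scanning the describe output, the events can be counted with a short filter. This is a minimal sketch and count_jobs_created is a hypothetical name; it simply counts the lines that mention the KEDAJobsCreated reason.

```shell
# Hypothetical helper: counts KEDAJobsCreated events in
# `kubectl describe scaledjob` output read from stdin.
count_jobs_created() {
  # grep -c prints 0 but exits non-zero when nothing matches, so mask the exit status.
  grep -c 'KEDAJobsCreated' || true
}

# Example (needs cluster access):
#   kubectl --context "$CTX" describe scaledjob azure-devops-agent-function -n azure-devops \
#     | count_jobs_created
```

A count of 0 on a READY=True ScaledJob is not a failure; it usually just means the trigger has been idle for as long as the events have been retained.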

Finally, confirm new agent pods are being created. Sorting by creation time makes it easy to see fresh pods appearing after your patch (pods are named after the job, e.g. azure-devops-agent-function-<UNIQUE ID>):

kubectl --context "$CTX" get pods -n azure-devops --sort-by=.metadata.creationTimestamp
kubectl --context "$CTX" get pods -n jenkins --sort-by=.metadata.creationTimestamp
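
To pick out only the pods created after your patch merged, the creation timestamps can be compared against a cut-off time. This is a sketch: pods_created_after is a hypothetical name, and it relies on Kubernetes reporting creationTimestamp as a UTC RFC 3339 string (for example 2026-06-11T10:00:00Z), which sorts correctly as plain text.

```shell
# Hypothetical helper: reads "NAME CREATION_TIMESTAMP" lines on stdin and prints
# the names of pods created after the given UTC timestamp.
pods_created_after() {
  # Appending "" forces a string comparison, which is valid for same-format UTC timestamps.
  awk -v since="$1" '($2 "") > (since "") { print $1 }'
}

# Example (needs cluster access; the timestamp is a placeholder for your merge time):
#   kubectl --context "$CTX" get pods -n azure-devops --no-headers \
#     -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp \
#     | pods_created_after 2026-06-11T10:00:00Z
```

An empty result while the agent pool or relay queue is idle is expected and is not a failure on its own.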

Some application teams have their own ScaledJobs, but if the Azure DevOps agent and Jenkins webhook relay jobs are working we can expect the others to be working too.

Keep an eye on #platops-help for any tickets that may come in from application teams using scaled jobs.

You can find information on how the Jenkins webhook relay works here.

This page was last reviewed on 11 June 2026. It needs to be reviewed again on 11 June 2027 by the page owner platops-build-notices.