Skip to main content

XDR Agent Installation - Crime

We have developed a method of installing the XDR agent to Crime VMs. This differs from the HMCTS install method which utilised the VM Extension capability on Azure VMs and is delivered via Terraform.

For Crime, we are utilising Ansible as the delivery method and have developed an Ansible Role which will perform the installation.

To trigger the Ansible, we have created two pipelines in the main Crime Jenkins instance which can be scope limited by Ansible’s Limit functionality. This enables us to follow a rollout plan through environments and safely stage our delivery.

When the deploy pipeline is run, this calls an Ansible playbook that includes a Role we have developed which handles the installation.

This document details the procedure to do this.

Ansible Inventory/Variables

Before running the pipelines, two variables need declaring in either Ansible group_vars or host_vars. These are:

  • cortex_env
  • xdr_tags

cortex_env is a string which needs set to either ‘nonprod’ or ‘prod’. MoJ SoC provided a separate installation package for nonprod & prod, this defines which installation package is used.

xdr_tags is a comma-separated string of tags to provide to the Cortex Agent.

Example:

xdr_tags: "hmcts,server,idam"
cortex_env: "nonprod"

An MS Team channel exists called “HMCTS - Tagging Catch Up” with the MoJ SoC team as members. Please reach out to MoJ SoC if unsure of tags to use.

Where to set these?

These need setting in the automation.ansible repository. The vars live at /sp-ansible/group_vars & /sp-ansible/host_vars. Information on Ansible variables is available in the official documentation.

Its possible the level at which you wish to set these variables doesn’t have a pre-existing vars file. e.g. there may not be a host_vars/X file for your host or a group_vars/X file for the group level. If you find this, create the necessary vars file.

Other Role variables

Other variables are defaulted within the Role and do not need setting in automation.ansible repo inventory. The only exception to this is ‘sa_key’. This has been set on the ‘all’ group and is already usable by all hosts.

Deploying the XDR agents

Pipelines

The pipelines in Jenkins for deployment are:

  • xdr-ansible-inventory
  • xdr-ansible-deploy

xdr-ansible-deploy is the pipeline which does the actual installation. xdr-ansible-inventory queries Azure and brings back JSON data used to populate the parameter options for xdr-ansible-deploy.

xdr-ansible-inventory is set on a schedule to run each morning and update its backend JSON data. This schedule should be enough to ensure groups and host parameter options are up-to-date with Azure.

Caveat: If new VMs are deployed during the day and the rollout plan after this, the new VMs will not be included in the parameter option list. This isn't likely as new VMs are rarely deployed. If they are the inventory pipeline can be re-run ad-hoc without issue to get the latest data. Make sure this is only run when xdr-ansible-deploy is NOT running!

Using the Pipelines

Run the xdr-ansible-inventory pipeline first. There are no build parameters to this, simply click ‘Build now’.


Run the xdr-ansible-deploy pipeline, click ‘Build with Parameters’. Only run this when xdr-ansible-inventory pipeline has completed.

The parameters are reactive and will update based on the Subscription & Limit dropdown selection.

Select the subscription which contains the target VMs. Then select the chosen limit scope. When the Limit selection is chosen, the corresponding Group, ResourceGroup or Host dropdowns will populate.

Alternatively, choose Limit type of Custom Limit and provide a custom limit. Be careful with this and formulate your ansible limit correctly. Information on Ansible limiting is available in the official documentation.

Click ‘Build’ and the pipeline will run with the chosen Limit. Failure is possible, please review any error output thoroughly. This is likely to be caused by authentication/sudo escalation issues which will need resolving between PlatOps/the application team.

How it works

Pipeline Design

xdr-ansible-inventory saves JSON data locally on the master Jenkins server. It saves the files in:

/var/lib/jenkins/data/azure-inventory/

xdr-ansible-deploy reads one of the JSON files (one per subscription for nonlive & live) based on the selection of the Subscription dropdown. The Group, ResourceGroup & Host dropdown boxes are Jenkins Active Choices Reactive Parameter type from the Active Choices Jenkins plugin.

‘Reactive’ refers to its capability to dynamically update based on another input selection. ‘Active’ refers to the capability to programmatically return choices based on the output of a groovy script. All these parameters read the appropriate JSON file and applies filtering to populate itself correctly.

Jenkins

Crime have multiple Jenkins instances, some are application specific (such as build-idam) and these live as deployments in AKS. The main PlatOps Jenkins is deployed as a VM (active & standby pairing) & separate instances exist for non-live & live:

  • build[.]mdv[.]cpp[.]nonlive
  • build[.]mdv[.]cpp[.]live

The Jenkinsfile pipeline script of xdr-ansible-deploy is set to run on the Master Jenkins server and not a jenkins slave. This is by design.

The Role

All installation code is contained within the Role. The playbook xdr-ansible-deploy executes only calls the Role. The Role code is separated into its own repository

The Role makes use of the same Shared Access Signature token used in the HMCTS XDR install process for HMCTS. azcopy is installed and used to retrieve the installation package & config from cftptlintsvc storage account under the xdr-collectors blob.

The SAS token itself is stored in Vault and Ansible’s Vault lookup plugin is used to fetch this value.

The Role has the ability to update tag information in-place. Simply update xdr_tags variable and re-run the xdr-ansible-deploy pipeline.

Customization of Existing Components

Jenkins Library ansibleDeploy.groovy

Crime Jenkins uses a shared library like HMCTS. The main component from the Library is ansibleDeployX.groovy. This is located in the vars directory of automation.jenkins-groovy-libraries. This executes the ansible-playbook command and does pre-requisites to ensure it works.

There are multiple versions of ansibleDeploy, we have created a customized version called ansibleDeployV2xdr.groovy for our use. Currently this is in the xdr-ansible-deploy branch which is used by the Jenkinsfile pipeline script of xdr-ansible-deploy. This should be merged to master in the future.

automation.ansible Azure Inventory Script

Ansible in automation.ansible repo utilises a inventory script which dynamically queries Azure and fetches existing VMs. It does this on a subscription level and groups the VMs in the following way:

  • azure
  • location
  • resource_group
  • security_group
  • tag key_value

The most useful groupings for us are likely to be resource_group & tag key_value.

tag key_value groups by Azure tags. Each Azure tag key & value form its own group in the syntax:

key_value

e.g:
[environment_dev]
Host1
Host2
Host3

In Crime, VMs are tagged with useful info such as tier, stack, project, platform & application. The dynamic grouping from these tags, along with the resource group grouping can then be used in Ansible Limiting as the most effective way to target subsets of VMs when delivering the rollout strategy.

The inventory script is located in automation.ansible in /sp-ansible/inventory/azure_rm.py and config params are provided by /sp-ansible/inventory/azure_rm.ini. The inventory script looks for an .ini file with the same name within the same directory by default.

We have created a customized version of azure_rm.py called azure_rm_standalone.py & an associated .ini file, azure_rm_standalone.ini. This version has been altered to remove a dependency on a non-standard python library utilised for checking azure_compute minimum version for API compatability. In favour, standard library functionality is used. This allows the script to execute standalone outside of the ansible-playbook command without needing to mess with dependencies in the Jenkins environment.

This standalone version of the inventory script is used by xdr-ansible-inventory to create the backend JSON data that populates xdr-ansible-deploy parameters. This exists in the xdr-ansible-deploy branch of the automation.ansible repo.

automation.jenkins-dsl-jobs & The Mother Seed!

automation.jenkins-dsl-jobs contains pipeline Jenkinsfiles for Crime Jenkins pipelines. These are split across different directories in logical fashion.

Within the repo exists a file: /mother_seed_job.groovy

The purpose of the mother seed is to build the pipelines in Jenkins. While the Jenkinsfiles exist in their directory structure, they don’t do anything till they’re brought into a pipeline definition. This is what the mother seed does.

Currently the two xdr pipelines are defined through the UI in-place & NOT via the mother seed. Its likely these will need integrating into the mother seed in the future.

Verify Installation

Verify installation in the Cortex XSIAM Portal. Certain PlatOps members have access (Rees, Chirag, Jordan H). This can also be confirmed by MOJ SoC Engineers in the MS Teams chat.

Verification can also be made on the VM itself.

rpm -qa | grep -i cortex-agent

To check the agent is running on the VM:

systemctl status traps_pmd.service

Uninstall

If required, uninstall the XDR agent by doing the following:

Verify agent is installed

rpm -qa | grep -i cortex-agent

If installed, run the rpm erase command

rpm -e cortex-agent

If this fails to work, specify the full version (this will have been output from the first command).

For example

rpm —e cortex-agent-8.5.0.125392-1.x86_64

This will automatically remove the VM from the XSIAM Cortex portal

Detailed documentation.

Developing the Role

We have created a test environment repo which mimics the python & ansible setup used in Crime’s main PlatOps Jenkins instance.

The README details how to setup a local machine environment for development that is compatible with the Jenkins environment. It uses Vagrant & the Virtualbox driver to build a local VM and automate Ansible against it.

Troubleshooting

Put troubleshooting info here as needed

This page was last reviewed on 19 November 2024. It needs to be reviewed again on 19 November 2025 by the page owner platops-build-notices .
This page was set to be reviewed before 19 November 2025 by the page owner platops-build-notices. This might mean the content is out of date.