
Incident Bot

We currently have a Slack bot that people can use to report major incidents. It opens dedicated Slack channels so everyone can huddle and work together on a fix.

The incident bot is based on monzo-response, but we have since forked it into our own version so that we can update its dependencies.

The components of the incident bot are:

  • Slack app
  • PostgreSQL database
  • Kubernetes pods, service and ingress

The Slack app is called Incident Response.

The app has been configured according to the instructions set out in the monzo-response repo.

The app's main features are the slash command, interactivity and event subscriptions.

To use the bot, run /incident in the Slack desktop app. You will be presented with a form to complete.

More information about this can be found on Confluence.

Slack sends the slash command as an HTTP POST request to the app's request URL, which has been exposed externally via app-proxy.
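For illustration, this is roughly what arrives at that URL. Slack sends a slash command as an HTTP POST with a form-encoded body containing fields such as command, text, user_id, trigger_id and response_url (the trigger_id is what allows the app to open the form). The sketch below decodes such a body with the Python standard library; the function name parse_slash_command and the sample values are hypothetical and not taken from the incident bot's code.

```python
from urllib.parse import parse_qs


def parse_slash_command(body: bytes) -> dict[str, str]:
    """Decode a Slack slash-command request body into a flat dict.

    Slack posts slash commands as application/x-www-form-urlencoded data,
    so every field arrives as a URL-encoded key=value pair.
    """
    # keep_blank_values keeps fields such as an empty "text" in the result.
    fields = parse_qs(body.decode("utf-8"), keep_blank_values=True)
    # parse_qs returns a list per key; Slack sends each field once.
    return {key: values[0] for key, values in fields.items()}


if __name__ == "__main__":
    # Hypothetical sample body, trimmed to a few of the fields Slack sends.
    sample = b"command=%2Fincident&text=&user_id=U123&trigger_id=1337.42.abcd"
    cmd = parse_slash_command(sample)
    print(cmd["command"], cmd["user_id"])  # /incident U123
```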

This URL points to a Kubernetes ingress resource on the cft-ptl AKS cluster, which routes traffic to a service, which in turn forwards it to a set of pods.

Every Slack app has an OAuth token and a signing secret. The OAuth token is what the app uses to call the Slack API; the signing secret is what Slack uses to sign each request it sends to the app. When a request arrives via the slash command, the Python application running in the Kubernetes pod computes its own signature from the request using the same signing secret and compares it with the signature Slack sent.

If the two signatures match, the request is accepted and processing continues; otherwise it is rejected. This is the authentication mechanism that stops third parties from using the Response API.
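This check follows Slack's documented v0 signing scheme: Slack computes an HMAC-SHA256 of the string v0:<timestamp>:<raw body> using the signing secret and sends it in the X-Slack-Signature header, alongside an X-Slack-Request-Timestamp header. The following is a minimal standalone sketch of that check, not the incident bot's actual code (which may rely on a library helper instead); the function name and the five-minute replay window are assumptions for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_slack_signature(
    signing_secret: str,
    timestamp: str,
    body: bytes,
    signature: str,
    max_age_seconds: int = 300,
    now: Optional[float] = None,
) -> bool:
    """Check a request against Slack's v0 signature scheme.

    timestamp: value of the X-Slack-Request-Timestamp header.
    signature: value of the X-Slack-Signature header ("v0=<hex digest>").
    body: the raw, unparsed request body.
    """
    current = time.time() if now is None else now
    try:
        request_time = int(timestamp)
    except ValueError:
        return False
    # Reject old requests to limit replay attacks.
    if abs(current - request_time) > max_age_seconds:
        return False

    # Slack signs the string "v0:<timestamp>:<raw body>" with the signing secret.
    basestring = b"v0:" + timestamp.encode() + b":" + body
    digest = hmac.new(signing_secret.encode(), basestring, hashlib.sha256).hexdigest()
    expected = "v0=" + digest
    # Constant-time comparison so the check does not leak timing information.
    return hmac.compare_digest(expected.encode(), signature.encode())


if __name__ == "__main__":
    secret = "example-signing-secret"  # hypothetical value, not a real secret
    ts = str(int(time.time()))
    body = b"command=%2Fincident&text="
    # Simulate the signature Slack would send for this body.
    sig = "v0=" + hmac.new(secret.encode(), b"v0:" + ts.encode() + b":" + body, hashlib.sha256).hexdigest()
    print(verify_slack_signature(secret, ts, body, sig))          # True
    print(verify_slack_signature("wrong-secret", ts, body, sig))  # False
```

The body must be the raw bytes exactly as received: re-encoding parsed form fields can change the bytes and make a genuine request fail the comparison.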

When the user submits the completed form, the app processes it and creates a new Slack channel with a name in the form #inc-abcd-wxyz-1234.
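Submitting the form is an interactivity request: Slack posts it to the app's interactivity request URL as form-encoded data with a single payload field containing JSON. This page does not say whether the incident form is a legacy dialog (payload type dialog_submission) or a modal (payload type view_submission), so the hypothetical sketch below handles both. The function names and the summary field in the example are invented for illustration.

```python
import json
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode


def parse_interaction(body: bytes) -> dict:
    """Decode a Slack interactivity request body into its JSON payload."""
    # Slack sends form-encoded data with one field, "payload", holding JSON.
    fields = parse_qs(body.decode("utf-8"))
    return json.loads(fields["payload"][0])


def submitted_values(payload: dict) -> Dict[str, Optional[str]]:
    """Flatten the user's answers from a dialog or modal submission."""
    kind = payload.get("type")
    if kind == "dialog_submission":
        # Legacy dialogs: answers are a flat name -> value mapping.
        return dict(payload.get("submission", {}))
    if kind == "view_submission":
        # Modals: answers are nested as state.values[block_id][action_id].
        # Plain-text inputs carry "value"; other element types (such as
        # selects) use different keys and come back as None here.
        values: Dict[str, Optional[str]] = {}
        for actions in payload["view"]["state"]["values"].values():
            for action_id, element in actions.items():
                values[action_id] = element.get("value")
        return values
    raise ValueError(f"unsupported interaction type: {kind!r}")


if __name__ == "__main__":
    # Hypothetical modal submission containing one text input.
    example = {
        "type": "view_submission",
        "view": {"state": {"values": {
            "summary_block": {"summary": {"type": "plain_text_input", "value": "Example outage"}},
        }}},
    }
    body = urlencode({"payload": json.dumps(example)}).encode()
    print(submitted_values(parse_interaction(body)))  # {'summary': 'Example outage'}
```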

People can join the channel to huddle and troubleshoot the issue.

Certain events, such as a message being pinned, trigger the app to take action. This is handled by the event subscriptions.
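Event subscriptions use Slack's Events API, which works differently from the slash command: Slack posts JSON to the app's event request URL, starting with a one-off url_verification challenge, and then wraps each subscribed event (such as pin_added) in an event_callback envelope. The sketch below shows that dispatch pattern; the handler registry, the on decorator and the PINNED_CHANNELS list are hypothetical and do not describe what the incident bot actually does when a message is pinned.

```python
import json
from typing import Callable, Dict, List, Optional

EventHandler = Callable[[dict], None]

# Registry of handlers keyed by Slack event type, e.g. "pin_added".
HANDLERS: Dict[str, EventHandler] = {}

# Hypothetical sink so the example has an observable effect.
PINNED_CHANNELS: List[str] = []


def on(event_type: str) -> Callable[[EventHandler], EventHandler]:
    """Decorator that registers a handler for one Slack event type."""
    def register(func: EventHandler) -> EventHandler:
        HANDLERS[event_type] = func
        return func
    return register


def handle_event_request(body: bytes) -> Optional[str]:
    """Process one Events API request; return a response body if Slack needs one."""
    envelope = json.loads(body)
    if envelope.get("type") == "url_verification":
        # One-off handshake when the request URL is configured:
        # Slack expects the challenge value to be echoed back.
        return envelope["challenge"]
    if envelope.get("type") == "event_callback":
        event = envelope.get("event", {})
        handler = HANDLERS.get(event.get("type"))
        if handler is not None:
            handler(event)
    # Everything else just needs an empty HTTP 200 acknowledgement.
    return None


@on("pin_added")
def record_pin(event: dict) -> None:
    # Hypothetical action: remember which channel the pin happened in.
    PINNED_CHANNELS.append(event.get("channel_id", ""))
```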

All of the incident information is stored in the PostgreSQL database.

A frontend service also lets you view the stored information in a user-friendly way.

Here is an example incident you can review to see what it looks like.

Response infra

Response frontend

Response backend

Response fork

This page was last reviewed on 26 January 2024. It needs to be reviewed again on 26 January 2025 by the page owner platops-build-notices.