Availability Tests in App Insights
Availability tests are an AppInsights feature that can be used to measure the uptime of a service and trigger automatic alerts when downtime is detected. This document covers how to set up an availability test, how to set up automatic alerts for that availability test and how to assign an existing action group to an availability test. This document assumes you already have an AppInsights Azure Resource already configured and working.
Making an Availability Test
In the Azure Portal, navigate to your AppInsights resource. Then, in the left panel, navigate to Investigate > Availability
. Here you can see all currently configured availability tests for the resource as well as the resource’s availability over time.
To add an availability test, click the ‘Add Classic Test’ button near the top-left of the center panel. In the pop-up on the right, you will need to provide a name for your test and a URL for it to ping.
‘Name’ should read as the service or application being monitored by the test. Name this appropriately so the team immediately knows what part of the service is down when an alert is triggered by the test. Test names have to be unique across the project, so Azure will automatically prepend this name with an auto-generated test ID. This is normal.
‘URL’ should point to the health page of the application or service. This must be publicly-accessible or the test will fail.
‘Enable retries for availability test failures’ will ping the URL again after 20 seconds if the test fails. While this option is enabled, Azure will only record a failure if the test fails three times in a row. This can reduce the number of false positives generated by the test at the cost of potentially missing brief periods of downtime. If in doubt, leave this box checked.
Adjust the test frequency to your liking and the success criteria to your needs. If you want an alert to be raised when this test fails, leave the ‘Alerts’ slider on ‘Enabled’. Configuring alerts will be covered in the next step, so for now, just click the ‘Create’ button.
Configuring Alerting
Now that you have a test configured (and hopefully running green) we can configure an alert for it to monitor for when it fails. In the Availability
tab, find your test in the list, click the ...
icon to open a drop-down context, then click the ‘Open Rules (Alerts) page’ option. Here you can see all the alert rules for the test you have configured. These rules define when an alert should be triggered. If you enabled alerts when you created the availability test, you should already have an alert rule set up.
Clicking on the alert here will open up a detailed view where you can examine the availability test it’s monitoring and the condition that will cause the alert to be raised. By default, this should be something to the effect of “Whenever the average failed locations is greater than or equal to X”.
Here we can see the action groups assigned to this alert. When the alert is triggered, it will trigger all the actions in these action groups. In this case, the alert is assigned to only the ‘servicenow production’ group. This is the standard action group for all availability tests in prod and is configured to send an E-Mail to everyone in the Reform DevOps group and raise an incident in ServiceNow.
Scrolling to the bottom, we can edit the description and severity of the alert. If this alert is warning about potential downtime on production environments, ‘Severity’ should be set to ‘Critical’ so it is raised as a P0 incident in SNOW.
Configuring Action Groups
Opening up the action group, we can see two groups: Notifications and Actions. Notification are in effect, limited to sending an email to a group. Functionality does exist to send SMS messages and automated phone calls, however neither of these two options are in use by the project (automated phone calls for on-call members are handled through SNOW instead).
Actions, on the other hand are more versatile and can invoke anything from Azure functions to scure webhooks. Raising incidents in SNOW is handled through an ITSM action here. Multiple notifications and actions can be configured per action group and are covered more in depth in the Microsoft action groups documentation