Often, incident response begins with an alert sent to the on-call team. You can create alerts for your metric or span data by configuring thresholds on queries. When a threshold is crossed, the alert triggers.

When an alert is triggered, a message is sent to the configured destination, such as a Slack channel or PagerDuty. The message includes a link to the alert.

You create an alert by defining a notification destination and a threshold that determines when the alert will trigger.

For this step, let’s create an alert that triggers whenever the error rate on the android service goes above 5%. Let’s tell Lightstep Observability to send that alert to the on-call team’s Slack channel every 10 minutes until it’s resolved.
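
To make the moving parts concrete, here is a minimal sketch of the alert described above as plain data plus a resend check. It is an illustration only, not Lightstep Observability's API or implementation: the `AlertRule` class, its field names, and the `should_notify` function are all hypothetical, and it assumes "above 5%" means strictly greater than 5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical model of an alert; field names are illustrative only."""
    name: str
    service: str
    threshold_percent: float   # trigger when the error rate goes above this
    destination: str           # where notifications go, e.g. a Slack channel
    renotify_every_s: int      # resend interval while the alert is unresolved


def should_notify(rule: AlertRule, error_percent: float,
                  now_s: float, last_sent_s: Optional[float]) -> bool:
    """Return True if a notification is due at time now_s (in seconds)."""
    if error_percent <= rule.threshold_percent:
        return False           # threshold not crossed (or alert resolved)
    if last_sent_s is None:
        return True            # first notification after the threshold is crossed
    # Still above the threshold: resend once the interval has elapsed.
    return now_s - last_sent_s >= rule.renotify_every_s


# The alert from this step: android service, above 5%, resend every 10 minutes.
rule = AlertRule(
    name="Android Service Error Rate",
    service="android",
    threshold_percent=5.0,
    destination="#on-call",
    renotify_every_s=10 * 60,
)
```

With this model, a first breach notifies immediately, a check five minutes later stays quiet, and a check ten minutes after the last message notifies again.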

  1. We’ll start by creating a Slack notification destination. In Lightstep Observability, the Destinations tab of the Alerts view shows all current notification destinations that alerts can be sent to. Click New Message Destination to create a destination for the on-call Slack channel.

  2. In the dialog, use the dropdown to search for the channel you want to post the alerts to. In this case, we’ll search for #on-call. When you click Save, the new destination appears in the list.

Now that we have a destination to send the alert to, we can create the alert.

  1. From the navigation bar, click Alerts and click Create an alert.

  2. In the Name field, enter Android Service Error Rate, and for the description, enter Alerts when the error rate for the Android service goes above 5%.

    Descriptions can be written in basic markdown.

  3. Now let’s build the query.
    Enter the following in the query builder:

    • In the first line, search for the service that equals android.
    • For Aggregation, select Error% over a 2-minute period.

    This will search for span data from the Android service and plot the error rate over a two-minute window.

  4. We’ll define the threshold for sending the alert in the Alert Configuration section. Set the Critical threshold to above and 5%.

  5. To send the alert to Slack, in the Notification rules section, enter Slack for the destination type, and #on-call to send the notification to the #on-call Slack channel. Add a notification frequency of 2 minutes.
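
The query and threshold in steps 3 and 4 boil down to simple arithmetic: the share of spans in the last two minutes that are errors, compared against 5%. The sketch below shows that calculation under stated assumptions. It is not how Lightstep Observability evaluates alerts internally; the `Span` tuple shape and both function names are hypothetical, an empty window is treated as 0%, and "above" is taken as strictly greater than.

```python
from typing import Iterable, Tuple

# Hypothetical span record: (end time in seconds, whether the span is an error).
Span = Tuple[float, bool]


def error_percent(spans: Iterable[Span], now_s: float,
                  window_s: float = 120.0) -> float:
    """Percentage of spans ending in the last window_s seconds that are errors."""
    recent = [is_error for end_s, is_error in spans
              if now_s - window_s < end_s <= now_s]
    if not recent:
        return 0.0             # assumption: no traffic counts as a 0% error rate
    return 100.0 * sum(recent) / len(recent)


def is_critical(percent: float, threshold: float = 5.0) -> bool:
    """True when the error percentage is above the Critical threshold."""
    return percent > threshold


# 18 successful spans and 2 errors inside the window: 2 / 20 = 10%.
spans = [(10.0, False)] * 18 + [(20.0, True)] * 2
print(is_critical(error_percent(spans, now_s=120.0)))
```

A shorter window reacts faster to a burst of errors but is noisier on low-traffic services, which is one reason to pair it with a resend interval rather than alerting on every evaluation.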

That’s it! Now we wait for the threshold to be crossed and the alert to be sent.
Sure enough - it happened again! The on-call team can click on the alert link to see what’s going on.

Looks like it might be a 429 error code coming from the get-store-data operation on the store-server service.

In the next step, we’ll see how we can make it easy for the team to begin remediation by adding links from Lightstep Observability to other tools the team uses.

What did we learn?

  • You can create alerts on metric and span data. Lightstep Microsatellites continuously send 100% of your telemetry data, and the SaaS looks for instances where defined thresholds are crossed.
  • You create alerts by defining a destination for the alert, a threshold that should trigger the alert, and rules for when the alert should be sent.