Often, incident response begins with an alert sent to the on-call team. In Lightstep, you create alerts that trigger when a threshold set on a Stream is crossed. Thresholds can be set on error percentage, latency, or operations per second.

More about Streams

When you create a Stream based on a query or on a specific operation, Lightstep receives data from your Satellites and stores statistics and example traces for that Stream, so you always have data from 0 to p99.9, including outliers. The Stream view displays the statistical time series data and example traces, which are retained for as long as your Data Retention policy allows.

Learn more about Streams.

When an alert is triggered, a message is sent to the configured destination, like a Slack channel or PagerDuty. The message includes a link to the Stream that triggered the alert and links to example traces.

You create an alert by defining a destination, a condition that determines when the alert will trigger, and a rule that tells Lightstep when and how to send an alert.

For this step, let’s create an alert that will trigger whenever the error percentage on the android service goes above 5%. Let’s tell Lightstep to send that alert to the on-call team’s Slack channel every 10 minutes until the condition is resolved.
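To make the three parts of this alert concrete, here is a minimal Python sketch that models them as plain data. It is only an illustration of the concepts: the class names, field names, and the `describe` helper are hypothetical and are not part of any Lightstep API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Destination:
    # Hypothetical model of where an alert is delivered.
    integration: str  # for example "Slack"
    target: str       # for example "#on-call"


@dataclass(frozen=True)
class Condition:
    # Hypothetical model of the threshold set on a Stream.
    stream: str
    signal: str
    threshold_percent: float


@dataclass(frozen=True)
class AlertingRule:
    # Hypothetical model of where and how often to notify.
    destination: Destination
    interval_minutes: int


def describe(condition: Condition, rule: AlertingRule) -> str:
    """Summarize the alert in one sentence."""
    return (
        f"Alert when {condition.signal} on {condition.stream} is above "
        f"{condition.threshold_percent:g}%; notify {rule.destination.integration} "
        f"{rule.destination.target} every {rule.interval_minutes}m until resolved."
    )


on_call = Destination(integration="Slack", target="#on-call")
condition = Condition(stream="android", signal="Error Percentage", threshold_percent=5.0)
rule = AlertingRule(destination=on_call, interval_minutes=10)

print(describe(condition, rule))
# Alert when Error Percentage on android is above 5%; notify Slack #on-call every 10m until resolved.
```

In Lightstep itself, these values are entered through the UI, as the steps that follow show.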

  1. We’ll start by creating a Slack destination. In Lightstep, the Destinations tab of the Monitoring view shows all current destinations that alerts can be sent to. Click New Message Destination to create a destination for the on-call Slack channel.

  2. In the dialog, use the dropdown to search for the channel you want to post the alerts to. In this case, we’ll search for #on-call. When you click Save, the new destination appears in the list.

  3. Now that we have a destination to send the alert to, we can create the condition and rule. Conditions are set on a Stream, so we need to open the Stream that will use the condition. We have a Stream that monitors the android service, so we’ll use that.

  4. When we open the Stream view, we can see that there have been errors. Good thing we’re creating an alert!

  5. Click the Create Conditions button to create the condition and rule for the alert.

  6. We’ll define the threshold that triggers the alert in this dialog. Choose Error Percentage for the Signal, set the Threshold to above 5%, and set the Evaluation Window to 5 minutes, meaning that the alert won’t be sent until the condition lasts for 5 minutes.

  7. Now we’ll create the rule that determines where to send the alerts and how often. Click Add Alerting Rule and enter Slack for the Integration, the #on-call channel as the Destination, and set the Interval to 10m, meaning the alert will be sent every 10 minutes until resolved. Once you click Create, the condition appears on the Stream. A dotted grey line on the Stream shows the threshold and the name of the alert that will trigger when the threshold is crossed.
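As a rough mental model of how the Evaluation Window from step 6 and the Interval from step 7 interact, the following Python sketch simulates them over per-minute error percentages. It is a simplified illustration under stated assumptions, not Lightstep’s actual evaluation logic: the function name, the one-sample-per-minute input, and the rule that a single sample at or below the threshold counts as "resolved" are all assumptions made for the example.

```python
def notification_minutes(error_pct_by_minute, threshold=5.0, window=5, interval=10):
    """Return the minute indexes (0-based) at which a notification would be sent.

    Assumed model: a notification is first sent once the error percentage has
    stayed above `threshold` for `window` consecutive minutes, and is then
    repeated every `interval` minutes until a sample falls back to or below
    the threshold.
    """
    sent = []
    breach_run = 0     # consecutive minutes above the threshold
    last_sent = None   # minute of the most recent notification, if any
    for minute, pct in enumerate(error_pct_by_minute):
        if pct > threshold:
            breach_run += 1
        else:
            # Condition resolved: reset the breach counter and the repeat timer.
            breach_run = 0
            last_sent = None
        if breach_run >= window:
            if last_sent is None or minute - last_sent >= interval:
                sent.append(minute)
                last_sent = minute
    return sent


# Two healthy minutes, twenty minutes at 8% errors, then recovery.
samples = [2, 3] + [8] * 20 + [1] * 3
print(notification_minutes(samples))  # [6, 16]

# A four-minute spike never fills the 5-minute window, so nothing is sent.
print(notification_minutes([8, 8, 8, 8, 1]))  # []
```

In the first run, the breach starts at minute 2, so the window is satisfied at minute 6, and the repeat is sent 10 minutes later at minute 16.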

That’s it! Now we wait for the threshold to be crossed and the alert to be sent.
Sure enough, it happened again! The on-call team can click one of the example traces to see what’s going on. Looks like it might be a 429 error code coming from the get-store-data operation on the store-server service.

In the next step, we’ll see how we can make it easy for the team to begin remediation by adding links from Lightstep to other tools the team uses.


What Did We Learn?

  • You create alerts on Streams. Lightstep Satellites continuously monitor 100% of your telemetry data, looking for instances where defined thresholds are crossed.
  • You create alerts by defining a destination for the alert, a threshold that should trigger the alert, and rules for when the alert should be sent.