Detecting sub-optimal performance is only useful if you are notified when it occurs. You can create thresholds on Streams that, when crossed, send alerts to everyone who needs to know. Thresholds can be set for error percentage, latency, or operations per second on any operation within your distributed system.

Lightstep continually monitors the data in the Satellites and checks that data against the thresholds you configured. Alert messages indicate which threshold was crossed and include sample traces that illustrate the problem. This is especially useful during fire-fighting, when up-to-date trace data can quickly point to the source of an issue.

Lightstep integrates with Slack and PagerDuty to send alerts and also supports other tools and services through customizable webhooks.

You create alerts using conditions, rules, and destinations. Conditions describe the threshold. Rules determine what to do when that threshold is crossed. Destinations are the places where the alert is sent.

For example, the BEEMO: ios-client Stream has one condition for when the latency for the 99.9th percentile is over 7.5s. The alerting rule states that when that happens, an alert is sent and updated every two minutes. The alert is sent to the #team-ios Slack channel destination.
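
The relationship between the three pieces can be sketched as a small data model. This is only an illustration of the concepts: the class names, field names, and the "greater than" comparison are assumptions made for this sketch, not Lightstep API objects.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical types for illustration only; they do not mirror a Lightstep API.

@dataclass(frozen=True)
class Destination:
    kind: str    # e.g. "slack", "pagerduty", or "webhook"
    target: str  # e.g. a Slack channel name


@dataclass(frozen=True)
class Condition:
    stream: str
    metric: str                        # "latency", "error_percentage", or "ops_per_second"
    threshold: float                   # unit depends on the metric (seconds for latency here)
    percentile: Optional[float] = None # only meaningful for latency conditions

    def is_violated(self, observed: float) -> bool:
        # Assumes the condition fires when the observed value is over the threshold.
        return observed > self.threshold


@dataclass(frozen=True)
class AlertingRule:
    condition: Condition
    destination: Destination
    update_interval_s: int  # seconds between repeated alerts while the condition holds


# The example from the text: p99.9 latency over 7.5s, updated every two minutes.
ios_latency = Condition("BEEMO: ios-client", "latency", threshold=7.5, percentile=99.9)
team_ios = Destination(kind="slack", target="#team-ios")
rule = AlertingRule(ios_latency, team_ios, update_interval_s=120)

print(rule.condition.is_violated(8.2))  # True: 8.2s is over the 7.5s threshold
print(rule.condition.is_violated(6.0))  # False: 6.0s is under the threshold
```

Keeping the condition separate from the destination is what lets the same threshold be reused by several rules.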

Alert payloads include links to the project and Stream, plus direct links to traces that violate the condition, so you can resolve issues quickly. Click a trace link to open the Explorer view for real-time exploration. You might also want to check the Service Directory for latency or error rate changes on that service in the last hour.

Once the measured value falls back below the threshold, Lightstep sends an alert saying that the condition is resolved.
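
The alert lifecycle described here (an initial alert, periodic updates while the threshold stays crossed, and a final resolution message) can be sketched as a small state machine. This is a simplified model, not Lightstep's implementation: the function name, the event labels, and the choice to treat a value exactly at the threshold as not violating are all assumptions.

```python
def alert_events(samples, threshold, update_interval_s):
    """Return (timestamp, event) pairs for a series of (timestamp_s, value) samples.

    Hypothetical helper for illustration; event labels are made up for this sketch.
    """
    active = False    # True while the condition is currently in violation
    last_sent = None  # timestamp of the most recent alert or update
    events = []
    for ts, value in samples:
        violated = value > threshold
        if violated and not active:
            # Threshold just crossed: send the initial alert.
            events.append((ts, "alert"))
            active, last_sent = True, ts
        elif violated and active and ts - last_sent >= update_interval_s:
            # Still in violation and the update interval has elapsed.
            events.append((ts, "update"))
            last_sent = ts
        elif not violated and active:
            # Back under the threshold: report the condition as resolved.
            events.append((ts, "resolved"))
            active = False
    return events


# Latency samples in seconds, one per minute, against a 7.5s threshold
# with a two-minute update interval.
samples = [(0, 6.0), (60, 8.1), (120, 8.4), (180, 9.0), (240, 7.0)]
events = alert_events(samples, threshold=7.5, update_interval_s=120)
print(events)  # [(60, 'alert'), (180, 'update'), (240, 'resolved')]
```

The sample at 120 seconds produces no event because only one minute has passed since the initial alert.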

You create alerts by doing the following:

  • Create destinations: Destinations determine how to send the alert and to whom. Lightstep has built-in support for Slack and PagerDuty. You can also create destinations using webhooks to integrate with other alerting services. OpsGenie offers a plugin for Lightstep.
  • Create conditions: Conditions on Streams provide the parameters for triggering an alert.
  • Create an alerting rule: An alerting rule combines a condition with a destination and an update interval. The update interval determines how long the condition must remain true before a subsequent alert is sent.

You can have more than one rule per condition (that is, the alert can be sent to more than one destination) and you can have more than one condition per Stream. You should create your destinations before you create conditions.
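
The many-to-many relationship above can be pictured as a simple routing table. In this sketch the second condition, the PagerDuty target, and the string encoding of destinations are hypothetical values invented for illustration.

```python
from collections import defaultdict

# Hypothetical rule table: (stream, condition, destination) triples.
# One Stream has two conditions; the first condition has two rules.
rules = [
    ("BEEMO: ios-client", "p99.9 latency > 7.5s", "slack:#team-ios"),
    ("BEEMO: ios-client", "p99.9 latency > 7.5s", "pagerduty:ios-oncall"),
    ("BEEMO: ios-client", "error % > 5", "slack:#team-ios"),
]


def destinations_for(rules, stream, violated_conditions):
    """Map each violated condition on a stream to the destinations its rules name."""
    routed = defaultdict(list)
    for rule_stream, condition, destination in rules:
        if rule_stream == stream and condition in violated_conditions:
            routed[condition].append(destination)
    return dict(routed)


routed = destinations_for(rules, "BEEMO: ios-client", {"p99.9 latency > 7.5s"})
print(routed)
# {'p99.9 latency > 7.5s': ['slack:#team-ios', 'pagerduty:ios-oncall']}
```

Only the violated condition is routed, and it reaches both of its destinations; the error condition stays silent until its own threshold is crossed.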