Detecting sub-optimal performance is only useful if you are notified when it occurs. You can create thresholds on Streams that, when crossed, send alerts to everyone who needs to know. Thresholds can be set for error percentage, latency, or operations per second, on any operation within your distributed system.
Lightstep continually monitors the data sent from the Microsatellites and checks that data against the thresholds you configured. Alert messages indicate which threshold has been crossed and include sample traces that illustrate the problem. This is especially useful during an incident, when up-to-date trace data can quickly identify the source of an issue.
Lightstep integrates with Slack and PagerDuty to send alerts and also supports other tools and services through customizable webhooks.
You create alerts by setting thresholds against error percentage, latency, or operations per second on any Stream, and then adding a notification destination that routes the alert to the right channel.
For example, the Android Stream has an alert set for when the error percentage is above 5% for longer than 5 minutes. The alert is sent to the #team-android Slack channel every 5 minutes until it’s resolved.
Alert payloads include links to the project and Stream, as well as direct links to traces that violate the threshold, for quick resolution. Click a trace link to open the Explorer view for real-time exploration. You might also want to check the Service Directory for latency or error rate changes on that service in the last hour.
Once the system is back below the threshold, Lightstep sends a notification that the alert is resolved.
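The lifecycle in the Android example (a sustained violation, repeated notifications, then a resolution message) can be pictured as a small state machine. The following Python sketch is illustrative only: `AlertState`, `observe`, and the field names are hypothetical and are not part of any Lightstep API, and it assumes the first alert fires once the violation has lasted the full window.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    """Hypothetical model of one alert; not Lightstep's implementation."""
    threshold: float                        # e.g. 5.0 for a 5% error percentage
    sustain_s: float                        # how long the violation must last
    renotify_s: float                       # interval between repeated alerts
    violated_since: Optional[float] = None  # when the current violation began
    last_sent: Optional[float] = None       # when the last alert was sent

    def observe(self, now: float, value: float) -> Optional[str]:
        """Return "violated", "resolved", or None for a new measurement."""
        if value > self.threshold:
            if self.violated_since is None:
                self.violated_since = now
            # Alert only once the violation has lasted the full window.
            if now - self.violated_since >= self.sustain_s:
                due = (self.last_sent is None
                       or now - self.last_sent >= self.renotify_s)
                if due:
                    self.last_sent = now
                    return "violated"
            return None
        # Back below the threshold: resolve only if an alert was sent.
        was_alerting = self.last_sent is not None
        self.violated_since = None
        self.last_sent = None
        return "resolved" if was_alerting else None
```

With `AlertState(threshold=5.0, sustain_s=300, renotify_s=300)` and one measurement per minute, a brief spike produces no alert, while a sustained violation produces a "violated" event every 5 minutes, followed by a single "resolved" event when the value drops.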
You create alerts by doing the following:
- Create notification destinations: Notification destinations determine how to send the alert and to whom. Lightstep has built-in support for Slack and PagerDuty. You can also create destinations using webhooks to integrate with other alerting services. OpsGenie offers a plugin for Lightstep.
- Create thresholds: Thresholds provide the parameters for triggering an alert.
- Add a notification destination to the alert: You create an alert by combining the threshold with one or more notification destinations, along with an update interval to determine how long a threshold should remain violated before sending a subsequent alert.
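To make the relationship between these three pieces concrete, here is a minimal Python sketch of how a threshold, its destinations, and an update interval might be combined into one alert. All class, field, and metric names (`Threshold`, `Destination`, `Alert`, `notify`, `"error_percentage"`) are hypothetical and chosen for illustration; the sketch does not reflect Lightstep's configuration format or API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Threshold:
    """Hypothetical: the parameters that trigger an alert."""
    metric: str   # e.g. "error_percentage", "latency_ms", "ops_per_second"
    limit: float


@dataclass
class Destination:
    """Hypothetical: where an alert goes and how it is delivered."""
    name: str
    send: Callable[[str], None]  # e.g. a Slack, PagerDuty, or webhook sender


@dataclass
class Alert:
    """Hypothetical: a threshold plus destinations and an update interval."""
    stream: str
    threshold: Threshold
    destinations: List[Destination]
    update_interval_s: int  # wait between repeated alerts while violated

    def notify(self, value: float) -> int:
        """Send one message to every destination; return how many were sent."""
        message = (f"{self.stream}: {self.threshold.metric} is {value}, "
                   f"limit {self.threshold.limit}")
        for destination in self.destinations:
            destination.send(message)
        return len(self.destinations)
```

Keeping destinations separate from thresholds means the same destination, such as a team's Slack channel, can be reused by several alerts without being redefined.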