About Metrics Alerts

You can create alerts for your metric data by configuring thresholds against metric queries. When a threshold is crossed, the alert triggers. You can also create notification destinations that determine where an alert is sent when it triggers.

For example, say your customer Packing Kings is onboarding new clients. You want a warning if the request rate rises above 15 per second and a critical alert if it rises above 18 per second. You would configure the alert with a Warning threshold of 15 and a Critical threshold of 18.

When an alert threshold is crossed, a notification is sent to the configured destination.

An alert is also sent once the issue is resolved (that is, once the metric no longer crosses the threshold).

All alerts (for both metrics and Streams) are listed on the Alerts tab of the Alerts view. You can see the status of every alert and you can also delete alerts from here.

Once created, you can edit alerts and snooze them.

Build the Query

  1. From the navigation bar, click Alerts and click Create an alert.

  2. In the Metric-based alert box in the dialog, click Create an alert.

  3. Enter a name for the alert. Make the name descriptive enough to identify in the list of alerts on the Alerts tab.

  4. Enter a description for the alert. The description is useful for adding more information about the alert, such as teams responsible or links to playbooks.

  5. Build your query. The query results help you define the threshold for the alert.

    • Search for a metric to plot: Click into the search field. Lightstep displays all metrics that it’s currently ingesting. Choose one, or to search, start typing the name of a metric. Lightstep starts auto-completing to match available metrics.

      When you select a metric, Lightstep expands the query builder and begins to chart the metric.

    • Choose an operator for the data:

      • Latest: Graphs the latest value in a time series for a point in time.

      The latest operator can only be used with gauge-type metrics.

      • Count: Graphs the total number of counter increments as whole numbers. Counts are most useful for infrequent events and are best visualized as stacked bar charts.

      • Rate: Graphs the number of operations per second. Rates are most useful for ongoing operations and are best visualized as line charts.

        For example, this chart uses Count and shows there are a total of 364.474 requests at 11:45 am.

        This chart has the same query but uses Rate, and shows there are 8.099 requests per second at 11:45 am.

    • Filter the data: By default, all data for the metric is displayed. You can filter the data using metric tags found on the data. You can include or exclude data with a given tag and value. You can add more than one tag to a filter where it makes sense (Lightstep prunes the available list as you add tags).

    Multiple selections use AND to join filters.

    Change Intelligence works best when it can focus on a single service and its dependencies. If your query includes metrics coming from many services, use filters to choose one service to focus on.

    By default, Lightstep displays tags that it’s seen in the last three days. You can also type in a tag that isn’t in the dropdown and Lightstep will find it.

    • Group the data: By default, Lightstep aggregates the data from the metric into one line. Instead, you can show lines for each available tag value (group by). Select a tag to display a line for each of the tag’s values. For example, grouping by the method tag shows the metrics for each individual method value.

      Grouping isn’t available on big number charts.

    • Aggregation method: Choose how you want the data aggregated into the chart.

      • Count (non-null): The number of values found that are not null. For example, given the values of [10, 15, null, 50] the count is 3.
      • Count (non-zero): The number of values found that are not zero (null is counted). For example, given the values of [10, 15, null, 0, 50] the count is 4.
      • Mean: The average (sum of the data divided by the count) of the data.
        For example, given the values of [10, 15, 50] the mean is 25.
      • Min: The lowest point in the data.
        For example, given the values of [10, 15, 50] the min is 10.
      • Max: The highest point in the data.
        For example, given the values of [10, 15, 50] the max is 50.
      • Sum: The total of all points in the data.
        For example, given the values of [10, 15, 50] the sum is 75.

    The chart on the page reflects the results of your query. In this example, the query results show request rates for the customer PackingKings grouped by method name, aggregated by the mean of all values.
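As a rough illustration of the aggregation methods listed above, the following Python sketch reproduces the worked examples. It is not Lightstep’s implementation: the `aggregate` function and its method names are hypothetical, and ignoring null points for mean, min, max, and sum is an assumption that the text does not confirm.

```python
from typing import Optional, Sequence

Value = Optional[float]  # None stands in for a null data point


def aggregate(method: str, values: Sequence[Value]) -> float:
    """Aggregate a list of points with one of the methods described above."""
    if method == "count_non_null":
        return sum(1 for v in values if v is not None)
    if method == "count_non_zero":
        # Null is counted here, matching the example in the text.
        return sum(1 for v in values if v is None or v != 0)
    # Assumption: null points are ignored by the remaining methods.
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no non-null values to aggregate")
    if method == "mean":
        return sum(present) / len(present)
    if method == "min":
        return min(present)
    if method == "max":
        return max(present)
    if method == "sum":
        return sum(present)
    raise ValueError(f"unknown aggregation method: {method}")


print(aggregate("count_non_null", [10, 15, None, 50]))     # 3
print(aggregate("count_non_zero", [10, 15, None, 0, 50]))  # 4
print(aggregate("mean", [10, 15, 50]))                     # 25.0
```

The two count methods differ only in how they treat zero and null, which is why the same data can produce different counts.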

You can add multiple queries to the alert and then join them using a formula.

Add Multiple Queries to an Alert

You can add more than one query to an alert and then join them with a formula.

For example, if you wanted to calculate the percentage of api/get-transactions calls that come from mobile (iOS and Android), you might create multiple queries and then apply a formula.

To add a metric query, click Plot another metric and build your query as you did the first one.

When you have multiple queries, you can edit the chart so only certain time series display. For example, you can display only the time series for metrics from iOS.

What happens when you delete a query?

You can delete a metric by clicking the X for that row. When you do, the remaining metrics keep their letters (for example, if you delete b, the remaining metrics are still a and c). If you then add another metric, it takes the letter that was freed. If you continue to add metrics, the lettering continues down the alphabet from the “highest” letter. For example, suppose three metrics were originally plotted: a, b, and c. After b is deleted, the next metric plotted uses b, and the one after that uses d.
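One way to picture the lettering behavior described here is “take the first unused letter”. The Python sketch below follows that reading. The `next_label` helper is hypothetical, and the rule is an assumption that matches the a, b, c example but is not confirmed for cases where several letters have been freed.

```python
import string
from typing import Iterable


def next_label(in_use: Iterable[str]) -> str:
    """Return the first lowercase letter that no plotted metric is using."""
    taken = set(in_use)
    for letter in string.ascii_lowercase:
        if letter not in taken:
            return letter
    raise ValueError("no single-letter labels left")


labels = ["a", "b", "c"]
labels.remove("b")                  # the user deletes metric b
labels.append(next_label(labels))   # the new metric reuses b
labels.append(next_label(labels))   # the next one continues to d
print(sorted(labels))               # ['a', 'b', 'c', 'd']
```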

Now that you have more than one query, you need to join them with a formula to have a single set of data to base an alert on.

Add a Formula

You can perform arithmetic on a single time series or on multiple time series using Add a formula. For example, you might enter a/(a+b) to chart metric a as a proportion of the sum of metrics a and b.

Lightstep supports +, -, /, and *.

You can edit the chart so only the formula is shown. For example, you can display only the time series for the result of the formula.

The toggle display doesn’t affect when the alert is triggered. Alerts are triggered only on the result of the formula.

If you’re performing the arithmetic on multiple queries, they must all be grouped by the same tag.
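To make the a/(a+b) example concrete, here is a minimal Python sketch of a formula applied point by point to two grouped queries. It is an illustration only, not Lightstep’s evaluation logic. The function names, the sample values, and the choices to pair series by tag value and to return None where the denominator is zero are all assumptions.

```python
from typing import Dict, List, Optional

Series = List[float]
Grouped = Dict[str, Series]  # tag value -> points, one per time step


def ratio(a: Series, b: Series) -> List[Optional[float]]:
    """Pointwise a / (a + b); None where the denominator is zero."""
    if len(a) != len(b):
        raise ValueError("series must cover the same time steps")
    return [x / (x + y) if (x + y) != 0 else None for x, y in zip(a, b)]


def ratio_by_group(a: Grouped, b: Grouped) -> Dict[str, List[Optional[float]]]:
    """Apply the formula to each tag value that appears in both queries."""
    shared = sorted(set(a) & set(b))
    return {tag_value: ratio(a[tag_value], b[tag_value]) for tag_value in shared}


# Hypothetical data: both queries are grouped by the same tag (method).
query_a = {"GET": [3.0, 1.0], "POST": [2.0, 0.0]}
query_b = {"GET": [1.0, 3.0], "POST": [2.0, 0.0]}
print(ratio_by_group(query_a, query_b))
# {'GET': [0.75, 0.25], 'POST': [0.5, None]}
```

Pairing series by tag value is what makes the shared group-by tag necessary: without it there is no way to decide which line from one query lines up with which line from the other.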

Troubleshoot Your Query

If your chart doesn’t look as expected, it may be because of one of the following:

  • The No data found message displays when Lightstep can’t find a metric by that name. Ensure you are using the right name for the metric.

  • The No data found message also displays when you’re using the wrong time series operator for the metric type.

    The latest operator can only be used with gauge-type metrics.

  • If no data displays and there’s no No data found message, then Lightstep found the metric but had no data to display.

  • If adding a formula over multiple queries, they must all be grouped by the same tag.

Now that you can see the results of your query, you can set the threshold that, when crossed, triggers an alert.

Configure the Alert

  1. In the Alert configuration section, set the threshold.

    • Single or separate alerts: If you’ve grouped your results, you can choose to send a single alert when any one of the group crosses the threshold during the evaluation window, or separate alerts each time one of the group crosses the threshold in that window. For example, with a query grouped by method and a two-minute window, the single setting alerts you once when one of the methods crosses the threshold during the window. If another method also crosses within those two minutes, you won’t get another alert. With the separate setting, you are alerted each time one of the methods crosses the threshold within that two-minute window.

    • Above or below the threshold: Choose whether the alert should be sent when the metric goes above or below the given threshold.
    • Evaluation window: Set the amount of time that the threshold should remain crossed before triggering an alert. You can aggregate that time period using the second dropdown. For example, if you set the evaluation window to two minutes, you can select one of the following to fine tune the alert:
      • Always: Always send the alert if the threshold is crossed every minute over the entire window.
      • At least once: Send the alert if the threshold is crossed at least once in the window.
      • In total: Send the alert if the total value is over the threshold during the window.
      • On average: Send the alert if the average value is over the threshold during the window.
    • Threshold: You can set either a Critical or Warning level threshold, or both. Warning is less severe than Critical. In this example, a warning is set if the metric crosses 15 and a critical alert is set at 18. When set, the chart redraws to show the two levels so you can immediately see if any metric is crossing the threshold.
    • Notify if no data is reporting for this query: Select this option if you want to be notified if Lightstep is not collecting any data for the query.
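The evaluation window options and the Warning and Critical thresholds can be sketched together in Python. This is a simplified model rather than Lightstep’s alerting engine. The function names and mode strings are hypothetical, and the sketch assumes one data point per minute in the window.

```python
from typing import Optional, Sequence


def condition_met(points: Sequence[float], threshold: float,
                  mode: str, above: bool = True) -> bool:
    """Check one threshold against the points in an evaluation window."""
    if not points:
        return False  # missing data is covered by the separate no-data option

    def crossed(value: float) -> bool:
        return value > threshold if above else value < threshold

    if mode == "always":
        return all(crossed(v) for v in points)
    if mode == "at_least_once":
        return any(crossed(v) for v in points)
    if mode == "in_total":
        return crossed(sum(points))
    if mode == "on_average":
        return crossed(sum(points) / len(points))
    raise ValueError(f"unknown mode: {mode}")


def severity(points: Sequence[float], warning: Optional[float],
             critical: Optional[float], mode: str) -> Optional[str]:
    """Return 'critical', 'warning', or None, checking Critical first."""
    if critical is not None and condition_met(points, critical, mode):
        return "critical"
    if warning is not None and condition_met(points, warning, mode):
        return "warning"
    return None


window = [16.0, 19.0]  # hypothetical request rates over a two-minute window
print(severity(window, 15, 18, "always"))         # warning
print(severity(window, 15, 18, "at_least_once"))  # critical
print(severity(window, 15, 18, "on_average"))     # warning
```

With the same data, Always and On average yield only a warning because the first point stays below 18, while At least once escalates to critical as soon as a single point exceeds it.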

If the chart contains multiple queries and a formula, you can toggle their display on and off.

The toggle selection is for visualization purposes only and is not persisted to the alert.

If you want a team or person to be notified of an alert outside of Lightstep, you can add a destination.

Assign a Notification Destination

Lightstep can send notifications of an alert to PagerDuty or Slack, or use webhooks to send the alert to other third-party apps.

Destinations must already exist before you can assign them.

  1. Expand the Notification Destination section.

  2. Choose a destination type (either Slack, PagerDuty, or Webhook), then begin typing to find the destination you want.

  3. Enter how often a renotification should be sent while the threshold is still crossed.

You can add as many destinations as you want.

Be sure to click Save to save your configuration.
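The renotification period from step 3 can be pictured with a small simulation. The Python sketch below is illustrative only: `notification_minutes` is a hypothetical helper, and both the per-minute evaluation and the reset once the threshold is no longer crossed are assumptions about timing that the text does not spell out.

```python
from typing import List, Sequence


def notification_minutes(crossed: Sequence[bool], renotify_every: int) -> List[int]:
    """Return the minutes at which a notification would go out.

    crossed[i] is True when the threshold is crossed at minute i.
    """
    sent: List[int] = []
    last_sent = None
    for minute, is_crossed in enumerate(crossed):
        if not is_crossed:
            last_sent = None  # resolved, so the next crossing notifies right away
            continue
        if last_sent is None or minute - last_sent >= renotify_every:
            sent.append(minute)
            last_sent = minute
    return sent


timeline = [False, True, True, True, True, False, True]
print(notification_minutes(timeline, renotify_every=2))  # [1, 3, 6]
```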

Snooze an Alert

You can snooze an alert when needed, for example if you know a team is working on a fix and don’t need to be further notified.

To snooze an alert:

  1. From the Alerts view, click the alert to open it in the editor.

  2. Click Snooze, choose the amount of time to snooze the alert for, and click Save.

     The alert now displays in the Alerts view as snoozed. When you hover over the snooze icon, a tooltip displays the time when the alert will reactivate.

To un-snooze an alert:

Return to the editor and use the Snooze button to choose Off.

Delete an Alert

To delete an alert, go to the Alerts view and use the gear icon to choose Delete.

Take Action on an Alert

Once you have the alert open in the editor, you can use Lightstep’s Change Intelligence to determine what caused the change in performance. Change Intelligence links metric data with trace data to find components in your system whose performance changed at the same time as the metric change, allowing you to find the root cause without leaving Lightstep.

If you’ve made any edits to the alert, you need to save those changes before using Change Intelligence!