Lightstep’s Unified Query Builder allows you to query both your metric and span data using one tool. You use the same builder across notebooks, dashboards, and alerts to create charts that visualize your data. The builder allows you to quickly and easily ask questions of your data and see the results in one place.
Query metric data
-
Search for a metric to plot:
Click into the search field and begin typing the metric name. You can select Metric from the All telemetry dropdown to narrow your search to only metric data.
Change Intelligence works best when it can focus on a single service and its dependencies. To narrow your search to metrics emitted from a specific service, click Filter to service and select the service name from the dropdown.
When you select a metric, Lightstep expands the query builder and begins to chart the metric.
-
Filter the data:
Unless you’ve already filtered by a service, all data for the metric is displayed. You can filter the data using metric attributes found on the data. Enter an attribute key, select an operator or enter a regular expression, and enter one or more values. Multiple values are joined by OR.
You can add more than one filter to the query where it makes sense (the query builder prunes the available list as you add attributes). Multiple filters use AND to join filters.
By default, Lightstep Observability displays attributes that it’s seen in the last three days. But you can type in an attribute not in the dropdown and Lightstep will find it.
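The filter semantics described in this step (values within one filter joined by OR, separate filters joined by AND) can be sketched in Python. This is only an illustration of the matching logic, not how Lightstep evaluates queries; the `AttributeFilter` class, the `equals` and `regex` operator names, and the sample attributes are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AttributeFilter:
    key: str
    operator: str        # "equals" or "regex" (illustrative names, not Lightstep's)
    values: List[str]

    def matches(self, attributes: Dict[str, str]) -> bool:
        actual = attributes.get(self.key)
        if actual is None:
            return False
        if self.operator == "equals":
            # Multiple values in one filter are joined by OR.
            return any(actual == value for value in self.values)
        if self.operator == "regex":
            return any(re.fullmatch(pattern, actual) for pattern in self.values)
        raise ValueError(f"unsupported operator: {self.operator}")


def series_matches(attributes: Dict[str, str], filters: List[AttributeFilter]) -> bool:
    # Multiple filters are joined by AND.
    return all(f.matches(attributes) for f in filters)


filters = [
    AttributeFilter("host", "equals", ["web-1", "web-2"]),
    AttributeFilter("region", "regex", ["us-.*"]),
]
print(series_matches({"host": "web-1", "region": "us-east"}, filters))
print(series_matches({"host": "web-3", "region": "us-east"}, filters))
```

The first series passes both filters; the second fails the host filter, so it is excluded even though its region matches.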
-
Align the data by choosing an aggregation method. Alignment is the process by which time series data points are aggregated temporally to produce regular, periodic outputs. By aligning your data before plotting it, you can tell a clearer story and more easily see anomalies. You can also set an explicit rolling input window that determines the data points that will be used when computing the aggregation.
If you use the latest aggregation, no input window is needed.
Whenever you make a query, Lightstep determines the number of output points to display that will make the chart detailed without making it feel crowded or hard to read. This optimal distance between points is called the output period. Lightstep adjusts the output period depending on the amount of time being displayed on the chart. If you set the time picker to one week, the output period is 2 hours (meaning that the query produces a series of points which are all two hours apart). If you change the time picker to one hour, the output period is 30 seconds.
For example, say you are querying the requests metric and the data is coming from two sources. Using the delta aggregation, Lightstep combines the data from those sources to create individual time series points and then plots those on the chart, based on the output period. If you queried over the last 60 minutes, the output period (the space between each point on the graph) is 30 seconds.
The rolling input window is the duration of time that the data is pulled from before aggregating it. By default for charts on dashboards and notebooks, the input window is the same as the output period. That is, if the output period is 30 seconds, the data is aggregated over the last 30 seconds to produce the data point.
If your chart appears very noisy, or if you’re building a chart for an alert, it can be helpful to smooth the output data by using data from a wider rolling input window. For example, if you are querying on the requests metric and choose to aggregate using rate with a 10 minute input window, Lightstep computes the number of requests per second in 10 minute rolling time windows.
When querying for alerts, you must specifically set a rolling input window. Increasing the input window increases the time it takes before an underlying change in the data triggers an alert threshold. When it’s important to be notified immediately, use smaller input windows.
Choose one of the following temporal aggregation methods:
-
Delta: Computes the total number of increments in the input window as whole numbers. Deltas are most useful for infrequent events and are best visualized as stacked bar charts.
-
Rate: Computes the number of operations per second in the input window. Rates are most useful for ongoing operations and are best visualized as line charts.
Gauge metrics also allow the following aggregations:
-
Latest: Computes the latest value in the input window. When using latest, the input window is the same as the output period.
-
Mean: Computes the average value of the time series in the input window.
-
Max: Computes the maximum value of the time series in the input window.
-
Min: Computes the minimum value of the time series in the input window.
The query builder automatically configures the aggregation for distribution type metrics. If the distribution is a gauge, the aggregation is set to latest. If it is a delta or cumulative, the aggregation is set to rate by default, but you can change it to delta.
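To make the relationship between the output period and the rolling input window concrete, here is a minimal Python sketch of a rate alignment. It assumes the raw data arrives as `(timestamp_seconds, increment)` delta points; the function name `aligned_rate` and its parameters are hypothetical and do not reflect Lightstep’s internal implementation.

```python
def aligned_rate(points, start, end, output_period, input_window=None):
    """Return (timestamp, per-second rate) pairs, one per output period.

    points: list of (timestamp_seconds, increment) delta points.
    input_window: seconds of data feeding each output point; when omitted,
    it defaults to the output period, as charts do by default.
    """
    window = input_window if input_window is not None else output_period
    aligned = []
    t = start + output_period
    while t <= end:
        # Sum every increment that falls in the window (t - window, t].
        total = sum(value for ts, value in points if t - window < ts <= t)
        aligned.append((t, total / window))
        t += output_period
    return aligned


points = [(10, 30), (40, 60), (70, 90)]

# Input window equal to the 30 second output period.
print(aligned_rate(points, start=0, end=90, output_period=30))

# A wider 60 second input window smooths the same data.
print(aligned_rate(points, start=0, end=90, output_period=30, input_window=60))
```

With the default window the rates are 1.0, 2.0, and 3.0 per second; with the 60 second window they become 0.5, 1.5, and 2.5, showing how a wider window flattens point-to-point changes. A delta aggregation would return `total` without dividing by the window length.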
-
Set the rolling input window (unless you are using the latest aggregation; alerts require an input window):
By default for charts in notebooks and dashboards, the input window is the same as the output period (determined by the time period for the chart). By setting an explicit input window, you can smooth out your data to avoid noise.
-
Compute percentiles (distribution type metrics only):
When the metric data is a distribution type (a set of values for each point in time), Lightstep Observability can compute percentiles for you. Enter the value in the Percentile field (no % needed).
For existing Lightstep customers interested in tracking distribution metrics, please opt in here. For new customers to Lightstep, this feature is already enabled in your account.
When using distributions in an alert, you must select only one percentile to alert on.
-
Group the data:
By default, Lightstep Observability spatially aggregates the data from the metric into one line. Instead, you can show lines for each available attribute value (group by). Select an attribute to display lines for each of the attribute’s values.
In this example, by choosing to group by the host attribute, you can see the metrics for the individual hosts.
Grouping isn’t available on big number charts.
-
Choose how you want the data spatially aggregated into the chart.
- count of non-null values: The number of values found that are not null. For example, given the values of [10, 15, null, 50] the count is 3.
- count of non-zero values: The number of values found that are not zero (null is counted). For example, given the values of [10, 15, null, 0, 50] the count is 4.
- maximum value: The highest point in the data. For example, given the values of [10, 15, 50] the max is 50.
- mean of all values: The average (sum of the data divided by the count) of the data. For example, given the values of [10, 15, 50] the mean is 25.
- minimum value: The lowest point in the data. For example, given the values of [10, 15, 50] the min is 10.
- sum of all values: The total of all points in the data. For example, given the values of [10, 15, 50] the sum is 75.
Distribution type metrics are automatically summed and then aggregated into percentiles.
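The spatial aggregation examples above can be reproduced with a few lines of Python, using `None` to stand in for null. The helper names are hypothetical, and skipping nulls for the maximum, mean, minimum, and sum is an assumption made for this sketch rather than documented behavior.

```python
def count_non_null(values):
    return sum(1 for v in values if v is not None)


def count_non_zero(values):
    # None is not equal to 0, so nulls are counted, as in the example above.
    return sum(1 for v in values if v != 0)


def present(values):
    # Assumption for this sketch: nulls are skipped by the numeric aggregations.
    return [v for v in values if v is not None]


def mean_of(values):
    kept = present(values)
    return sum(kept) / len(kept)


print(count_non_null([10, 15, None, 50]))     # 3
print(count_non_zero([10, 15, None, 0, 50]))  # 4
print(max(present([10, 15, 50])))             # 50
print(mean_of([10, 15, 50]))                  # 25.0
print(min(present([10, 15, 50])))             # 10
print(sum(present([10, 15, 50])))             # 75
```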
- Click Save to save your chart.
When you hover over a point on the chart, you can see its value, along with the group-by value.
Clicking on a point allows you to start Change Intelligence to determine what caused the change in performance.
Below the chart, a table displays the data for each line in the chart.
Query span data
-
In the first field, select operation or service, or start typing another attribute key name.
You can select Spans with from the All telemetry dropdown to narrow your search to only span data.
By default, the query builder displays attribute keys that it’s seen in the last three days. But you can type in an attribute not in the dropdown and Lightstep will find it.
Select an operator or enter a regular expression (regex), and enter one or more values. Multiple values are joined by OR.
When using regex (especially a wildcard .*), your query may match more results than can be returned. When this happens, try updating the regex so it matches a smaller range of values.
Also note that if the query is saved as a Stream, as new values that match the regex become available, the cardinality may become too high for the Stream to record and save the data. For example, if a query filters by hosts=.* and a huge number of hosts are added, the Stream may stop saving the data. The UI shows a warning when this type of “cardinality explosion” happens.
-
Add an optional filter.
You can further refine your query by adding a service, operation, or attribute key and value(s) to a filter where it makes sense (Lightstep prunes the available list as you add filters).
Multiple filters use AND to join filters.
-
Choose an SLI for the chart (latency percentiles, error rate, operation rate, or count of spans).
-
Click Specify a time aggregation to optionally set an explicit rolling input window that aggregates the data points to be computed when displaying the chart.
Alerts require a specific input window to determine how far back to look for alert violations.
Whenever you make a query, Lightstep determines the number of output points to display that will make the chart detailed without making it feel crowded or hard to read. This optimal distance between points is called the output period. Lightstep adjusts the output period depending on the amount of time being displayed on the chart. If you set the time picker to one week, the output period is 2 hours (meaning that the query produces a series of points which are all two hours apart). If you change the time picker to one hour, the output period is 30 seconds.
The rolling input window is the duration of time that the data is pulled from. By default, the input window is the same as the output period. That is, if the output period is 30 seconds, the data is aggregated over the last 30 seconds to produce the data point.
Alerts require that you specifically set an input window.
If your chart appears very noisy, it can be helpful to smooth the output data by using data from a wider input window.
-
Optionally group the results (not supported for alerts).
The builder aggregates the data from the span’s performance into one line. You can show lines for each available attribute value (group by). Select an attribute to display lines for each of the attribute’s values. In this example, by choosing to group by the customer attribute, you can see the percentiles for the individual customers.
You must be using Microsatellites to add a group-by.
-
For latency charts, the 50th, 95th, 99th, and 99.9th percentiles are added by default. You can delete any you don’t want and add others by typing the value in the field (no % needed). For alerts, you can have only one value to alert on.
-
Click Save to save your chart.
The result of the query displays in a chart below the query builder.
By default, the chart shows lines for each series (group-by), and dots for sampled spans. Triangles represent spans that have errors.
Spans are sampled intelligently to bias towards high latency and errors.
Use the Show span samples toggle to turn these off.
With span samples displayed, when you hover over a point, you can see the value at the point along with the group-by value.
Clicking the point takes you to its full trace, where you can view the full request path that the span participated in. In this case, the error is coming from the GET operation on the store-server service.
Below the chart, a table displays details about the data.
The first tab shows details for each line in the chart.
With sample spans displayed, the Value column shows the latest value for that series.
The second tab allows you to view details about the exemplar spans shown in the chart, giving you an easier way to find the traces you’re interested in. Click on a row to open the span in the full trace view.
You can sort by any column in the tables.
Query Lightstep-specific attributes
Lightstep has attributes you can use to query for specific spans or traces.
-
Search for a specific span ID
lightstep.span_id
Example: Return span with the ID
1a2b902a0ff1a9e3
You can find the span ID in the Trace view
-
Search for spans from a specific trace
lightstep.trace_id
Example: Return spans that are not included in a trace with the ID
bd1285b6af0acd8d
You can find the trace ID in the Trace view
-
Search for spans sent by a specific tracer
lightstep.tracer_id
Example: Return spans from the tracer with the ID
cebd0875ab
-
Search for spans that have no parent
lightstep.is_root_span
:
Add a final time aggregation
Once you’ve filtered and grouped your data, or added a formula, you may find it necessary to include a final time aggregation. The final aggregation takes all the values into account over the specified time period, and further smoothes the data.
Choose the aggregation operation (min, max, or mean) and then set the rolling input window. The final input window must be larger than the input windows set on individual queries.
Another time you may want to use the final aggregation is for data that may cause flappy alerts. For example, say you set an alert to be sent when the rate of requests is over 2,300, and you set the initial input window to two minutes (because you want to smooth out very short spikes). There may be cases where, during a two minute period, the rate crosses the threshold but then drops below it immediately after, multiple times, leading to your alert notifications flapping. If you set the final aggregation window to 10 minutes, the alert will still trigger within 2 minutes and will remain open for at least 10 minutes. The alert remains open until there has been a 10 minute period where the metric has consistently been under the threshold.
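The flapping scenario can be sketched in Python. The sketch assumes one rate value per minute (already smoothed by the initial input window) and a final aggregation that uses max over a 10 point rolling window; the choice of max, the sample values, and the helper names are assumptions made for illustration only.

```python
def rolling(values, window, op):
    """Apply op to each trailing window of up to `window` points."""
    return [op(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


def count_openings(flags):
    """Count transitions from closed (False) to open (True)."""
    previous = False
    openings = 0
    for flag in flags:
        if flag and not previous:
            openings += 1
        previous = flag
    return openings


THRESHOLD = 2300

# Hypothetical per-minute request rates that hover around the threshold.
rates = [2200, 2350, 2250, 2400, 2280] + [2100] * 12

raw_open = [rate > THRESHOLD for rate in rates]
final_open = [value > THRESHOLD for value in rolling(rates, 10, max)]

print(count_openings(raw_open), count_openings(final_open))
```

Without the final aggregation the alert opens twice in quick succession. With it, the alert opens once, at the same minute as before, and closes only after ten consecutive minutes under the threshold.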
Add multiple queries to a chart
You can add more than one query to a chart. For example, you might want to show the request rate for iOS and Android on one chart.
For alerts, if you add more than one query, you must join them with a formula.
To add a query, click Plot another metric or Plot another span and build your query as you did the first one.
When you have multiple queries, you can edit the chart so only certain time series display. For example, in this chart, only the timeseries for metrics from the iOS service is displayed.
Once you save the chart, this display toggle is persisted to the chart in the dashboard.
You can delete a query by clicking the X for that row. When you do, the remaining queries retain their letters (for example, if you deleted b, the remaining queries are a and c). If you then add another query, it reuses the letter that was deleted. If you continue to add queries, the letters continue down the alphabet from the “highest” letter.
In the above example, three queries were originally plotted: a, b, and c. The user deleted b, so the next query plotted used b. When adding another query, the order continued to d.
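One way to model this naming behavior is to always assign the first letter of the alphabet that is not currently in use. The Python sketch below follows that assumption; `next_query_name` is a hypothetical helper, and the real builder may differ when several letters have been deleted.

```python
import string


def next_query_name(existing):
    """Return the first lowercase letter not used by an existing query."""
    for letter in string.ascii_lowercase:
        if letter not in existing:
            return letter
    raise ValueError("no free single-letter query names left")


queries = ["a", "b", "c"]
queries.remove("b")                      # the user deletes query b

queries.append(next_query_name(queries)) # the new query reuses b
queries.append(next_query_name(queries)) # the next one continues to d
print(queries)
```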
If you want to use a big number metric chart with more than one metric, you need to combine them using a formula (big number charts can only display a single value).
Compare current data to past data
The Unified Query Builder lets you visualize changes in your data over time. With the Compare to past option, you can create charts comparing current data to data from the last minutes, hours, days, or weeks. Use this option to monitor and track system-health trends over time.
Follow these steps to compare data over time:
- In Lightstep’s Unified Query Builder, click Compare to past.
- Next to Compare, enter the time range you want to compare your current data to. For example, data from the last 30 minutes or 1 week.
- View your data in the chart.
Lightstep plots two series in the chart:
- a is the solid line, showing recent data.
- b is the dotted line, showing data from the past.
You can also add formulas to do arithmetic on the two series. For example, the formula (abs((a-b)/b*100)) in the image below calculates the absolute percentage change between a and b. The chart visualizes the query results, showing all three series.
If you’re creating alerts, select the Percentage change template to alert on changes in your data over time. Visit Create alerts for more information.
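As a worked illustration of the formula abs((a-b)/b*100), the following Python sketch applies it point by point to two series of equal length, where `a` holds current values and `b` holds the values from the comparison period. The function name and sample values are hypothetical, and returning `None` when a past value is zero is an assumption of this sketch, not documented Lightstep behavior.

```python
def percentage_change(current, past):
    """Compute abs((a - b) / b * 100) for each pair of points."""
    changes = []
    for a, b in zip(current, past):
        if b == 0:
            # Assumption: the change is undefined when the past value is zero.
            changes.append(None)
        else:
            changes.append(abs((a - b) / b * 100))
    return changes


current = [150, 50, 100]   # series a: recent data
past = [100, 100, 0]       # series b: the same points from the past
print(percentage_change(current, past))
```

Both a rise from 100 to 150 and a drop from 100 to 50 yield 50.0, because the formula takes the absolute value.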
Add a formula to the query
For span queries, you must be using Microsatellites to add a formula.
You can perform arithmetic on a single time series or on multiple time series using Add a formula. For example, you might enter a/(a+b) if you want to chart the ratio of the a metric to the sum of the a and b metrics.
Lightstep Observability supports +, -, /, and *.
You must use * for multiplication. Implicit multiplication (for example, ab) is not allowed.
When using a metric that is a distribution type in a formula, you must select only one percentile.
If you’re performing the arithmetic on multiple queries, they must all be grouped by the same attribute.
You can edit the chart so only the formula is shown. For example, in this chart, only the timeseries for the result of the formula is displayed.
The display toggle doesn’t affect when the alert is triggered. Alerts are triggered only on the result of the formula.
Once you save the chart, this display toggle is persisted.
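To illustrate how a formula such as a/(a+b) combines series point by point, here is a small Python sketch that evaluates a formula limited to +, -, /, and * over series keyed by query letter. It uses Python’s `ast` module only as a convenient parser; `evaluate_formula` is a hypothetical helper and does not reflect how Lightstep parses or evaluates formulas.

```python
import ast
import operator

# Only the four arithmetic operators named in this section are allowed.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_formula(formula, series):
    """Evaluate formula for each point of equally long series keyed by query letter."""
    tree = ast.parse(formula, mode="eval")
    length = len(next(iter(series.values())))

    def evaluate(node, i):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left, i), evaluate(node.right, i))
        if isinstance(node, ast.Name):
            if node.id not in series:
                # "ab" parses as one unknown name, so implicit multiplication fails.
                raise ValueError(f"unknown query name: {node.id}")
            return series[node.id][i]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")

    return [evaluate(tree.body, i) for i in range(length)]


series = {"a": [1, 3], "b": [3, 1]}
print(evaluate_formula("a/(a+b)", series))
```

In this sketch, a point where the denominator is zero raises `ZeroDivisionError`; a real charting tool would need to decide how to display such points.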
Considerations for alert queries
When creating queries for alerts, keep the following in mind:
- You must set an input window (unless using latest for aggregation).
- When querying distribution metrics or latency on spans, you can select only one percentile to query on.
- If your query includes multiple sub-queries, you must use a formula to join them and create one output.
- Group-by isn’t supported in alerts.
- To include Regex in metric queries, you must be running this release or later of the Microsatellites.
- Consider adding a final time aggregation to prevent “noisy” alerts.
Troubleshoot query results
If your chart doesn’t look as expected, it may be because of one of the following:
-
The No data found message displays when Lightstep Observability can’t find a metric or span attribute key (service, operation, or attribute) by that name. Ensure you are using the right name in the query.
-
The No data found message also displays if you’re using the wrong time series operator for the metric kind.
The latest operator can only be used with gauge metrics.
-
If no data displays and there’s no No data found message, then Lightstep Observability found the metric or span attribute key, but had no data to display.
-
When adding a formula over multiple queries, they must all be grouped by the same attribute.
Visualize your data
Once you’ve completed your query, depending on the type of data, you can choose from a number of different visualizations.
Data retention
Data for metric queries is retained for 13 months.
For notebooks, span data is retained for the length of your retention window. For a chart on a dashboard, you can choose to retain the data for longer by creating it as a Stream. Data for alerts is always saved as a Stream.
Learn more about data retention.