Query real-time span data

We will be introducing new workflows to replace Explorer, and as a result, it will soon no longer be supported. Instead, use notebooks for your investigation where you can run ad-hoc queries, view data over a longer time period, and run Cloud Observability’s correlation feature. In your notebooks and dashboards, you can use the traces list panel to view a list of spans from a query and dependency maps to view a service diagram.

When you hear that a particular service is slow, or there’s a spike in errors, or a particular customer is having an issue, your first action is to see real-time data regarding the issue. Cloud Observability’s Explorer view allows you to query all span data from the past hour to see what’s going on.

Explorer consists of the following main components. Each of these components provides a different view into the data returned by your query:

  • Query: Lets you query all span data from the past hour. Every query you make is saved as a Snapshot, so you can revisit it at any time in the future and see the data exactly as it was when you ran the query. Results from the query are shown in the Latency Histogram, the Trace Analysis table, and the Service diagram.

  • Latency histogram: Shows the distribution of spans over latency periods. Spans are shown in latency buckets represented by the blue lines. Longer blue lines mean there are more spans in that latency bucket. Lines towards the left have lower latency and towards the right, higher latency.

  • Trace Analysis table: Shows data for spans matching your query. By default, you can see the service, operation, latency, and start time for each span. Click a span to view it in the context of the trace.

  • Correlations panel: For latency, shows services, operations, and attributes that are correlated with the latency you are seeing in your results. That is, they appear more often in spans that are experiencing higher latency. For errors, shows attributes that are more often on spans that have errors. The panel can show both positive and negative correlations, allowing you to drill down into attributes that may be contributing to latency, as well as ruling out ones that are not. Find out more about Correlations here.

  • Service diagram: Depicts the service based on your query in relation to all its dependencies and possible code paths, both upstream and downstream. Areas of latency (yellow) and errors (red) are shown. Learn more about the Service diagram here.

Using these tools, it’s possible to quickly form and validate hypotheses around the current issue.

Query data to create a snapshot

You can query the span data on any combination of a service, an operation, and any number of span attributes. Every time you run a query, the results are saved as a Snapshot so you can go back to data at that point in time and analyze it in the Explorer view.

Run a query

You run your query from the top of Explorer. You can use the Query Builder to ensure valid syntax, or you can enter the query manually.

When you run your query, Cloud Observability queries data from the past hour and returns all span data that matches your query.

To run a query using the Query Builder:

Click into the search bar to open the Query Builder. You can build queries for services, operations, and attributes. Use IN or NOT IN to build the query. When you click into the Service or Operation field, Cloud Observability displays valid values.

When you add multiple values to the Operation field, spans that match either value (OR operation) are returned.

To add attributes, click the Add an attribute filter button. You can add multiple attributes.

Attributes are added to the query as an AND operation, meaning only spans that match all attributes are returned.
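The matching rules described above (multiple values for one field act as OR, separate filters act as AND, and values are compared exactly) can be sketched in a few lines of Python. This is only an illustration of the semantics, not Cloud Observability's implementation: the Span class, the matches function, and the clause format are hypothetical, and treating a missing attribute as satisfying NOT IN is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical in-memory span; not a Cloud Observability type."""
    service: str
    operation: str
    attributes: dict = field(default_factory=dict)


def field_value(span, key):
    """Look up a query key on a span: service, operation, or an attribute."""
    if key == "service":
        return span.service
    if key == "operation":
        return span.operation
    return span.attributes.get(key)  # None when the attribute is absent


def matches(span, clauses):
    """clauses is a list of (key, negate, values) tuples.

    Every clause must hold (AND). Within a clause, any listed value may
    match (OR). Comparison is exact and case-sensitive.
    """
    for key, negate, values in clauses:
        hit = field_value(span, key) in values
        if hit == negate:  # IN needs a hit; NOT IN needs a miss
            return False
    return True


spans = [
    Span("iOS", "auth", {"aws-region": "east"}),
    Span("android", "auth", {"aws-region": "west"}),
    Span("ios", "auth", {"aws-region": "east"}),  # lowercase: not "iOS"
]
# service IN ("iOS", "android") AND "aws-region" IN ("east")
clauses = [
    ("service", False, {"iOS", "android"}),
    ("aws-region", False, {"east"}),
]
matched = [s for s in spans if matches(s, clauses)]
```

Only the first span is matched here: the second fails the region filter, and the third fails because "ios" is not an exact match for "iOS".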

To run a query manually:

Click into the Search bar and start typing your query. If you need help building your query, check out the Query Language Cheat Sheet. Reset your query by clicking the X icon on the right.

Values must match exactly; capitalization matters.

Supported keys

Key: service
Value: Service’s name
Examples:
    service IN ("iOS")
    Returns spans from the iOS service

    service NOT IN ("android")
    Returns spans from every service but android

Key: operation
Value: Operation's name
Example:
    operation IN ("/api/get-profile")
    Returns spans from the /api/get-profile operation

Key: "attribute_name"
Value: Custom attribute’s name, in quotes. For example, "customer" or "aws-region"
Example:
    "aws-region" IN ("east")
    Returns spans where the aws-region attribute value is east

Key: "lightstep.span_id", "lightstep.trace_id", "lightstep.tracer_id"
Value: Cloud Observability generated attributes. The ID for a span, trace, or tracer. Valid values are hex strings including only the following characters: 0123456789abcdef
Examples:
    "lightstep.span_id" IN ("ad5490bcd")
    Returns that specific span

    "lightstep.trace_id" NOT IN ("cebd0875ab")
    Returns spans in traces other than the cebd0875ab trace

    "lightstep.tracer_id" NOT IN ("cebd0875ab")
    Returns spans that were produced from tracers other than the cebd0875ab tracer
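If you generate ID queries from a script, you may want to check the IDs before you paste them into Explorer. The sketch below tests a value against the character set listed above. The helper name is_valid_id is hypothetical, and no length check is made because this page does not state one.

```python
import re

# Lowercase hex characters only (0123456789abcdef), one or more of them.
HEX_ID = re.compile(r"[0-9a-f]+")


def is_valid_id(value: str) -> bool:
    """Return True when value contains only the allowed hex characters."""
    return HEX_ID.fullmatch(value) is not None


candidates = ["ad5490bcd", "cebd0875ab", "AD5490BCD", "xyz123", ""]
valid = [c for c in candidates if is_valid_id(c)]
```

The uppercase, non-hex, and empty candidates are rejected, leaving only the first two values.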

Querying multiple keys and values

Use the following syntax rules to build complex queries:

  • Use a comma to query multiple values for a key. Multiple values are treated as OR operations.
    Example:
    service IN ("iOS","android")
    Returns spans that are from either the iOS or android service

    "customer" NOT IN ("smith","jones")
    Returns spans that do not have smith or jones as the value for the customer attribute.
     

  • Only one set of values per key is allowed.
    Example:
    Valid:
    service IN ("iOS", "android")

    Not valid:
    service IN ("iOS", "android") AND service IN ("web")
     

  • Use AND operations to build queries with multiple key/value sets (the OR operation is not supported).
    Example:
    service IN ("iOS","android") AND "aws_region" IN ("us_east")
    Returns spans that are in either the iOS or android service and are in the us_east AWS region.

    "lightstep.trace_id" NOT IN ("edcba347fe", "abc7584def") AND "error" IN ("true")
    Returns spans that are not in traces with the ID edcba347fe or abc7584def and have the value true for the error attribute.

Full example

service IN ("iOS", "android") AND operation IN ("auth", "transaction-db") AND "aws-region" NOT IN ("us-west") AND "lightstep.span_id" IN ("edcba347fe", "abc7584def")
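If you assemble query strings programmatically before pasting them into the search bar, a small helper can enforce the syntax rules above. The following is a minimal sketch, not part of any Cloud Observability SDK: build_query and its clause format are hypothetical, and it assumes values contain no double quotes (no escaping is attempted).

```python
# service and operation are written bare; every other key is an attribute
# name and is wrapped in double quotes.
UNQUOTED_KEYS = {"service", "operation"}


def quote(text: str) -> str:
    """Wrap text in straight double quotes (assumes text contains none)."""
    return '"' + text + '"'


def build_query(clauses):
    """Build a query string from (key, operator, values) tuples.

    Values for one key are comma-separated (OR). Clauses are joined with
    AND. Each key may appear only once.
    """
    seen = set()
    parts = []
    for key, operator, values in clauses:
        if operator not in ("IN", "NOT IN"):
            raise ValueError(f"unsupported operator: {operator}")
        if key in seen:
            raise ValueError(f"only one set of values per key: {key}")
        if not values:
            raise ValueError(f"no values given for key: {key}")
        seen.add(key)
        name = key if key in UNQUOTED_KEYS else quote(key)
        value_list = ", ".join(quote(v) for v in values)
        parts.append(f"{name} {operator} ({value_list})")
    return " AND ".join(parts)


query = build_query([
    ("service", "IN", ["iOS", "android"]),
    ("aws-region", "NOT IN", ["us-west"]),
])
print(query)
```

Rejecting a repeated key up front mirrors the "one set of values per key" rule, so an invalid query fails in your script rather than in the search bar.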

View snapshots

Cloud Observability saves every query you make as a Snapshot. Snapshots provide a view into saved data that you can share with other Cloud Observability users. When you share a Snapshot, the recipient can work with the data in the same way that you did. Snapshots are automatically created for you, and the data is saved for as long as your data retention policy allows. Snapshots are perfect for Slack messages, emails, post-mortem docs, and anywhere you need a definitive historical view of your span data.

To view your Snapshots:

  1. Click the gray dropdown that displays Today and a timestamp (this is the timestamp of your latest snapshot).

    Your Snapshots are listed in reverse chronological order.
  2. Select the Snapshot to view. Explorer rerenders using the data from the Snapshot.

Share a snapshot

You can share a Snapshot with another Cloud Observability user using a URL. When the user clicks the link, the same query is run using the data from the Snapshot (instead of live data).

To share a snapshot:

  1. Click Share.

    The URL is copied to your clipboard.
  2. Paste the URL wherever you want someone to access the data.

Share a snapshot in Slack

When you integrate Cloud Observability with Slack, you can share a preview of the query histogram in any channel of your workspace. Paste the URL from the Share button into a Slack channel. Other Slack members can see the histogram, and Cloud Observability users can click View Explorer to jump right to that page.

Add a query to a notebook

Add an Explorer query to a notebook when, during an investigation, you want to run ad hoc queries, take notes, and save your analysis for use in postmortems or runbooks. Notebooks let you view logs, metrics, and traces from different places in Cloud Observability together, in one place. While Explorer queries show you the last hour of data, notebooks let you view the data in your retention window.

To add to a notebook, click Add to notebook and search to choose an existing notebook or create a new notebook.


When you add to a notebook, a panel is created using the same query. You can see the latency for multiple percentiles and view exemplar traces. The annotation is a link back to the original, so you can quickly return to the origin of your investigation.


Learn more about notebooks.

Save your query and monitor the data going forward

When you want to monitor the data from a query going forward, instead of coming back to Explorer and running the query, you can create a Stream. When you create a Stream, Cloud Observability looks at the data from the Microsatellites every minute and persists example traces from different buckets of distribution for that query to ensure you always have data from 0 to p99.9, including outliers.

Learn more about Streams here.

View latency histogram

Once you run a query, the Latency Histogram is generated from 100% of the span data collected from the Microsatellites that matches the query. The X axis at the bottom of the histogram shows latency, and the height of each blue line (Y axis) shows the number of spans that fall into that latency bucket.

For example, in the histogram below, you can see that around 1k spans fall into the 4.42s-4.77s time range.
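To make the bucket idea concrete, here is a minimal Python sketch that counts spans per latency bucket. It assumes fixed-width buckets purely for simplicity. Explorer's actual bucket boundaries are not described on this page and may differ, and the function name and sample latencies are hypothetical.

```python
from collections import Counter


def latency_histogram(latencies_ms, bucket_width_ms):
    """Count spans per fixed-width latency bucket, keyed by bucket start."""
    counts = Counter()
    for latency in latencies_ms:
        bucket_start = int(latency // bucket_width_ms) * bucket_width_ms
        counts[bucket_start] += 1
    # Sort so lower-latency buckets come first, like the histogram's X axis.
    return dict(sorted(counts.items()))


# Hypothetical span latencies in milliseconds.
latencies = [120, 180, 450, 4500, 4600, 4700]
histogram = latency_histogram(latencies, bucket_width_ms=500)
```

Each key is the lower edge of a bucket and each value is the height of the corresponding line in the histogram.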

View percentile markers

By default, a marker shows the 95th percentile. You can change that using the Show percentile dropdown.
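A percentile marker sits at the latency below which that percentage of spans fall. The sketch below uses the nearest-rank method, which is one common definition. Whether Explorer uses this method or an interpolating one is not stated here, so treat it as an assumption. The function name is hypothetical.

```python
import math


def percentile(latencies_ms, pct):
    """Nearest-rank percentile.

    Returns the smallest observed latency such that at least pct percent
    of the samples are at or below it.
    """
    if not latencies_ms:
        raise ValueError("at least one latency is required")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(latencies_ms)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies: 1 ms through 100 ms, one span each.
latencies = list(range(1, 101))
p95 = percentile(latencies, 95)
p50 = percentile(latencies, 50)
```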

Compare with historical data

You can compare the current histogram with histograms from the past by choosing 1 hour, 1 day, or 1 week prior. An overlay of that prior data displays on top of the histogram. This overlay represents data from the same time window exactly 1 hour, 1 day, or 1 week ago. If selecting a historical time period results in “No matches found,” then no spans matched the query during that historical window.
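The "same time window" is the query window shifted back by the chosen offset, with its length unchanged. A minimal sketch of that arithmetic, using hypothetical names and an arbitrary example timestamp:

```python
from datetime import datetime, timedelta, timezone

# The three comparison choices offered in Explorer.
OFFSETS = {
    "1 hour": timedelta(hours=1),
    "1 day": timedelta(days=1),
    "1 week": timedelta(weeks=1),
}


def historical_window(start, end, prior):
    """Shift a query window back by the chosen offset, keeping its length."""
    offset = OFFSETS[prior]
    return start - offset, end - offset


# Arbitrary example: a one-hour window ending at noon UTC.
end = datetime(2023, 4, 26, 12, 0, tzinfo=timezone.utc)
start = end - timedelta(hours=1)
prior_start, prior_end = historical_window(start, end, "1 day")
```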

If your query returns results that differ significantly from the past, the overlay displays automatically, alerting you to a potential issue.

Now that you have an overview of latency for spans, you can get more detailed information from the Trace Analysis table.

Analyze span data

The Trace Analysis table shows information from a sample of the spans used to create the histogram. By default, the table shows the service, the operation reporting the span, the span’s duration, and the start time. You can add other columns as needed.

Cloud Observability analyzes 100% of the data used to create the histogram and then selects span data that represents all ranges of performance or is otherwise statistically interesting, ensuring outliers and other anomalies are well represented.

When you click on a span, you’re taken to the Trace view where you can see the span in the context of the full trace.

Add more columns to the Trace Analysis table

You can add columns that show the span’s attribute data to the right of the table by clicking the + icon. As you start typing, Cloud Observability finds attributes matching your search.

Filter data

You can filter the data in the Trace Analysis table in several ways.

Filter by latency

You can filter to see a certain range of latencies by clicking and dragging the area to filter to in the histogram.

Use the Show percentile dropdown to mark where a certain percentile starts.

The Trace Analysis table refreshes to show spans only in the selected percentile range.

Filter by service, operation, or attribute

You can also filter to show only specific services, operations or attributes.

Group results

You can group your results to see interesting aggregate data about your spans. You can group by service, operation, or attribute.

When you group your results, Cloud Observability organizes the table by those groups.

Click on a group to see the spans belonging to that group. Cloud Observability shows you the average latency, the error percentage, and the number of spans in that group.
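The three per-group figures can be reproduced from raw span data with a simple aggregation. The sketch below is illustrative only: the span dictionaries and their field names (latency_ms, error) are hypothetical and are not a Cloud Observability export format.

```python
from collections import defaultdict


def group_spans(spans, key):
    """Summarize spans by the value of key.

    Each span is a dict with "latency_ms" (number) and "error" (bool),
    plus the field named by key.
    """
    groups = defaultdict(list)
    for span in spans:
        groups[span.get(key)].append(span)

    summary = {}
    for name, members in groups.items():
        count = len(members)
        errors = sum(1 for s in members if s["error"])
        summary[name] = {
            "count": count,
            "avg_latency_ms": sum(s["latency_ms"] for s in members) / count,
            "error_pct": 100 * errors / count,
        }
    return summary


spans = [
    {"service": "auth-service", "latency_ms": 100, "error": True},
    {"service": "auth-service", "latency_ms": 300, "error": False},
    {"service": "api-server", "latency_ms": 50, "error": False},
]
by_service = group_spans(spans, "service")
```

Sorting the resulting groups by error_pct gives the same ordering you get by sorting the grouped table by error percentage.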

You can also filter and group from the Correlations panel.

Add all spans from the trace to the table

By default, Cloud Observability only shows spans that match your query. For example, if you queried on a service, you’ll only see spans from that service. But often, you’ll want to see spans that participated in the same traces as the spans returned by your query. If you’re trying to form a hypothesis, you may want a broader view of what is going on in a trace, without having to open a bunch of traces yourself.

To see all spans that took part in a trace with your query results, select Show all Spans in Traces.

The table refreshes to show spans that participate in the same traces that your results spans do. Use the group by or filter buttons to then filter data and still include spans from outside your initial query.
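Conceptually, this option widens the result set from "spans that match the query" to "all spans that share a trace with a matching span." A minimal sketch of that expansion, assuming each span carries a trace ID (the dictionaries and field names are hypothetical):

```python
def spans_in_matching_traces(all_spans, matched_spans):
    """Return every span whose trace_id appears among the matched spans."""
    trace_ids = {s["trace_id"] for s in matched_spans}
    return [s for s in all_spans if s["trace_id"] in trace_ids]


all_spans = [
    {"trace_id": "a1", "service": "api-server"},
    {"trace_id": "a1", "service": "auth-service"},
    {"trace_id": "b2", "service": "web"},
]
# The original query: spans from the api-server service only.
matched = [s for s in all_spans if s["service"] == "api-server"]
expanded = spans_in_matching_traces(all_spans, matched)
```

The expanded list now includes the auth-service span from trace a1, while the web span in trace b2 stays excluded because no matching span belongs to that trace.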

Example: Use Explorer to validate a hypothesis

The Explorer view is useful for validating (or invalidating) hypotheses about a system regression. Let’s say you’re an on-call engineer and need to investigate high error rates in api-server. You suspect a downstream service caused the errors, but it is not clear which service is causing the issue or why.

Here’s how you can use Explorer to debug the issue:

  1. Run a query for service: api-server.
  2. Check Show all Spans in Traces so you can see spans outside of the api-server service.
  3. Group by service so you can quickly spot the service with issues.
  4. Sort by Error % to find the services with a high error rate.

    This will show you the upstream and downstream services of the api-server, sorted by error rate.

    Notice that the auth-service (called by api-server) has a high error rate. You’re aware that there was a recent canary release for this service, so you dig deeper by doing the following:

  5. Filter by service: auth-service.
  6. Group by canary.

Notice that canary=true has an 86% error rate (illustrating that the canary release for the auth-service is experiencing errors). You can then click on that canary=true row to see example traces and debug further.

Validating hypotheses is even easier when you can see your services in a diagram that shows service relationships and performance. Read View Service Hierarchy and Performance for more information about the Service diagram, which does just that!

See also

Find correlated areas of latency and errors

Retain queries with Streams

Set Slack preview preferences

Updated Apr 26, 2023