Let’s use the Trace view to verify that the write-cache operation on the inventory service is the main source of latency in the critical path, and that it contains the large_batch=true tag.

Use the Trace View to Get a Look at a Single Request

The Trace view shows you the path that a single request took through your system, including the time each operation spent in the critical path. You can then drill down on a single operation and view its metadata, including tags, logs, and other attributes, to validate the root cause.

Back in the Compare Operations table on the RCA page, use the More ( ⋮ ) icon to view a trace.

When you choose a trace from here, you can be sure that it shows a request with the latency issue, because the table is built using only data from the Service Diagram, where we saw the large latency halo.

Clicking one of these examples opens the Trace view.

Lightstep automatically detects the critical path in the request and marks it with a black line. Looking at the map at the top of the page, you can immediately see that one span contributes most of the latency in the critical path.

Learn more about Trace View

You use the Trace view to see a full trace from beginning to end of a request. The Trace view shows you a flame graph of the full trace (each service a different color), and below that, each span is shown in a hierarchy, allowing you to see the parent-child relationship of all the spans in the trace. Errors are shown in red.

Clicking a span shows details in the right panel, based on the span’s metadata. Whenever you view a trace in Lightstep, it’s persisted for the length of your Data Retention policy so it can be bookmarked, shared, and reviewed by your team at any time. The selected span is part of the URL, so it will remain selected when the URL is visited.

You can learn more about the Trace view here.


Clicking on that span expands the detailed trace below and automatically selects the span you clicked.

Sure enough, it’s the write-cache operation!

To the right, Lightstep shows you information about the span in a panel. As we suspected, this span has the tag large_batch with a true value, validating what we saw when comparing tags.

Let’s see if we can get more info.

View Span Metadata

The panel contains three tabs, each with information specific to this operation’s span. We’ve already seen the tags on the span, so let’s look at more data. Clicking the Details tab shows you everything else Lightstep knows about this span. Right away, you can see in numbers what the trace visualized for us: the write-cache operation contributed over 90% of the request’s latency.
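To make the idea of a span’s share of the critical path concrete, here is a minimal Python sketch. It is not Lightstep’s implementation. It assumes a simplified span model (the `Span` class and `critical_path` function are hypothetical), child spans that lie within their parent’s time range, and made-up timings. Only the write-cache operation name comes from this walkthrough; the other operation names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    # Hypothetical, simplified span: an operation name, start/end times in
    # milliseconds, and child spans. Real spans carry much more metadata.
    operation: str
    start_ms: float
    end_ms: float
    children: list = field(default_factory=list)


def critical_path(span, out=None):
    """Return {operation: milliseconds spent on the critical path}."""
    if out is None:
        out = {}
    cursor = span.end_ms  # walk backward from the end of the span
    # Visit children from the latest-ending to the earliest-ending.
    for child in sorted(span.children, key=lambda c: c.end_ms, reverse=True):
        if child.end_ms > cursor:
            # Ran in parallel with a child already on the path, so it did
            # not hold up the request.
            continue
        # The gap between this child's end and the cursor is the parent's own work.
        out[span.operation] = out.get(span.operation, 0.0) + (cursor - child.end_ms)
        critical_path(child, out)
        cursor = child.start_ms
    # Whatever remains before the earliest blocking child is also the parent's.
    out[span.operation] = out.get(span.operation, 0.0) + max(0.0, cursor - span.start_ms)
    return out


# Illustrative timings only; they are not taken from a real trace.
root = Span("update-inventory", 0, 1000, [
    Span("validate", 10, 40),
    Span("write-cache", 50, 990),
])

path = critical_path(root)
share = path["write-cache"] / (root.end_ms - root.start_ms)
print(f"write-cache: {share:.0%} of the critical path")
# prints: write-cache: 94% of the critical path
```

In this made-up trace, the write-cache span accounts for 940 ms of a 1000 ms request. A figure like this is what the Details tab lets you read directly, without computing it yourself.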

That’s great! Even though you didn’t know the inventory service was deployed (or even what that service does), you validated that it’s having trouble writing large batches to the cache and you have the data to prove it!

Once you find an issue, you can share it with the responsible party using shareable URLs and the Slack integration.


What Did We Learn?

  • You can go directly from performing root cause analysis on the RCA page to hypothesis validation in the Trace view, confident that the trace you view contains the issue you found, because the example traces come from the same data used in the analysis.
  • Lightstep automatically shows you the critical path in a request, all the way through your system. You can immediately see where the bottleneck is.
  • The information panel displays all the information collected from every span in the request, allowing you to verify your hypothesis with data.