Let’s use the Trace view to verify that the write-cache operation on the inventory service is the main source of latency in the critical path, and that it contains the large_batch=true attribute.

Use the Trace view to get a look at a single request

The Trace view shows you the path that a single request took through your system, including how much time each operation spent in the critical path. You can then drill down into a single operation and view its metadata, including attributes and event logs, to validate the root cause.

Back in the Compare Operations table on the RCA page, use the More ( ⋮ ) icon to view a trace.

When you choose a trace from here, you can be confident that it shows a request with the latency issue, because the table is built using only data from the Service Diagram, where we saw the large latency halo.

Clicking one of these examples opens the Trace view.

Lightstep Observability automatically detects the critical path in the request and marks it with a black line. Looking at the map at the top of the page, you can immediately see that one span contributes most of the latency in the critical path.
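To make the idea of a critical path concrete, here is a minimal Python sketch of one common way to derive it from span timings. It is not Lightstep Observability's implementation. The `Span` fields, the `api-gateway` service, every operation name other than write-cache, and all timings are hypothetical. The sketch walks backward from the end of each span and treats a child that overlaps a later-finishing sibling as off the path.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]
    service: str
    operation: str
    start_ms: float
    end_ms: float


def critical_path_ms(spans):
    """Return {operation: milliseconds spent on the critical path}."""
    children = defaultdict(list)
    root = None
    for span in spans:
        if span.parent_id is None:
            root = span
        else:
            children[span.parent_id].append(span)

    on_path = defaultdict(float)

    def walk(span, cursor):
        # Move backward in time from `cursor`. Each child that finished
        # before the cursor blocked its parent, so it is on the path.
        by_end = sorted(children[span.span_id], key=lambda c: c.end_ms, reverse=True)
        for child in by_end:
            if child.end_ms <= cursor and child.start_ms >= span.start_ms:
                on_path[span.operation] += cursor - child.end_ms  # parent's own time
                walk(child, child.end_ms)
                cursor = child.start_ms
        on_path[span.operation] += cursor - span.start_ms

    walk(root, root.end_ms)
    return dict(on_path)


# Hypothetical trace: only "inventory" and "write-cache" come from the tutorial.
trace = [
    Span("a", None, "api-gateway", "/checkout", 0, 1000),
    Span("b", "a", "inventory", "update-inventory", 50, 980),
    Span("c", "b", "inventory", "read-stock", 55, 75),
    Span("d", "b", "inventory", "write-cache", 80, 970),
]

path = critical_path_ms(trace)
print(max(path, key=path.get))  # the operation that dominates the critical path
```

With these made-up timings, the per-operation times add up to the full request duration, which is the property that lets a single long span stand out in the map.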

Learn more about Trace View

You use the Trace view to see a full trace from beginning to end of a request. The Trace view shows you a flame graph of the full trace (each service a different color), and below that, each span is shown in a hierarchy, allowing you to see the parent-child relationship of all the spans in the trace. Errors are shown in red.
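The parent-child hierarchy described above can be sketched in a few lines of Python. This is only an illustration of how spans that reference a parent ID form a tree. The dictionary keys, the `api-gateway` service, and the operation names other than write-cache are assumptions, not the product's data model.

```python
from collections import defaultdict


def render_hierarchy(spans):
    """Return one indented text line per span, children nested under parents."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)

    lines = []

    def visit(span, depth):
        flag = " [error]" if span.get("error") else ""
        lines.append(f"{'  ' * depth}{span['service']}: {span['operation']}{flag}")
        # Show siblings in the order they started.
        for child in sorted(children[span["span_id"]], key=lambda s: s["start_ms"]):
            visit(child, depth + 1)

    for root in children[None]:
        visit(root, 0)
    return lines


# Hypothetical spans for one request.
spans = [
    {"span_id": "a", "parent_id": None, "service": "api-gateway",
     "operation": "/checkout", "start_ms": 0},
    {"span_id": "d", "parent_id": "b", "service": "inventory",
     "operation": "write-cache", "start_ms": 80},
    {"span_id": "b", "parent_id": "a", "service": "inventory",
     "operation": "update-inventory", "start_ms": 50},
    {"span_id": "c", "parent_id": "b", "service": "inventory",
     "operation": "read-stock", "start_ms": 55, "error": True},
]

hierarchy = render_hierarchy(spans)
print("\n".join(hierarchy))
```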

Clicking a span shows details in the right panel, based on the span’s metadata. Whenever you view a trace in Lightstep Observability, it’s persisted for the length of your Data Retention policy so it can be bookmarked, shared, and reviewed by your team at any time. The selected span is part of the URL, so it will remain selected when the URL is visited.

You can learn more about the Trace view here.

Clicking on that span expands the detailed trace below and automatically selects the span you clicked.

Sure enough, it’s the write-cache operation!

To the right, Lightstep Observability shows you information about the span in a panel. As we suspected, this span has the attribute large_batch with a true value, validating what we saw when comparing attributes.
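As a rough illustration of why the attribute matters, the following Python sketch groups sampled write-cache spans by the value of large_batch and compares median latency. The sample durations and the dictionary layout are invented for the example. In the product, this comparison is what the attribute analysis on the RCA page does for you.

```python
from statistics import median


def median_latency_by_attribute(spans, key):
    """Group span durations by the value of one attribute and take the median."""
    groups = {}
    for span in spans:
        value = span["attributes"].get(key)
        groups.setdefault(value, []).append(span["duration_ms"])
    return {value: median(durations) for value, durations in groups.items()}


# Hypothetical samples of the write-cache operation.
samples = [
    {"operation": "write-cache", "duration_ms": 890, "attributes": {"large_batch": True}},
    {"operation": "write-cache", "duration_ms": 910, "attributes": {"large_batch": True}},
    {"operation": "write-cache", "duration_ms": 870, "attributes": {"large_batch": True}},
    {"operation": "write-cache", "duration_ms": 40, "attributes": {"large_batch": False}},
    {"operation": "write-cache", "duration_ms": 55, "attributes": {"large_batch": False}},
]

medians = median_latency_by_attribute(samples, "large_batch")
print(medians)
```

If the group with large_batch set to true is consistently much slower, as in this made-up data, a single trace carrying that attribute is good supporting evidence for the hypothesis.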

Let’s see if we can get more info.

View span metadata

The panel contains three tabs, each with information specific to this operation’s span. We’ve already seen the attributes on the span, so let’s look at more data. Clicking the Details tab shows you everything else Lightstep Observability knows about this span. Right away, you can see in numbers what the trace visualized for us: the write-cache operation contributed almost 90% of the request’s latency.
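The percentage is simple arithmetic: the time an operation spends on the critical path divided by the total request latency. Here is a small Python sketch with hypothetical numbers chosen to land near the "almost 90%" figure. The operation names other than write-cache and all millisecond values are assumptions.

```python
def latency_shares(critical_path_ms):
    """Return (operation, percent of request latency) pairs, largest first."""
    total = sum(critical_path_ms.values())
    shares = {op: 100 * ms / total for op, ms in critical_path_ms.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Hypothetical critical-path times in milliseconds for one request.
example = {
    "/checkout": 70,
    "update-inventory": 20,
    "read-stock": 20,
    "write-cache": 890,
}

ranked = latency_shares(example)
top_operation, top_share = ranked[0]
print(f"{top_operation}: {top_share:.0f}% of request latency")
```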

That’s great! Even though you didn’t know the inventory service was deployed (or even what that service does), you validated that it’s having trouble writing large batches to the cache and you have the data to prove it!

Once you find an issue, you can share it with the responsible party using shareable URLs and the Slack integration.

What did we learn?

  • You can go directly from performing root cause analysis on the RCA page to hypothesis validation in the Trace view, with 100% confidence that the trace you’ll be viewing contains the issue you’ve found.
  • The Trace view automatically shows you the critical path in a request, all the way through your system. You can immediately see where the bottleneck is.
  • The information panel displays all the information collected from every span in the request, allowing you to verify your hypothesis with data.