Let’s see if other tools on the page can verify the errors and where they’re coming from.

See Error Messages with Logs Analysis

Below the Operations diagram is the Log Event Analysis. A graph shows you the rate of log event messages, filtered by default to show only events from spans with errors. The table below shows the log messages, sorted by the number of logs in the regression.

These are logs that your tracing instrumentation has appended to the spans. If you used tracing libraries that do not support event appending, you won’t see Log Event Analysis.

Most of the events appear to be payload messages, which isn’t all that helpful.

Since we know that the errors were introduced with the deployment of version v10.8.157, let’s filter the data to show only log messages from spans with that attribute. In the Attributes with Errors table, clicking that row further filters the log table.

You can now see that the error message error.message: Too many requests for store ID was definitely in the v10.8.157 deploy, and it’s from the get-store-data operation. Ta da!
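To make the filtering step concrete, here is a minimal sketch of the idea: count the log messages attached to error spans, then narrow the count to spans that carry a given attribute. It is an illustration only, not how Lightstep computes the table. The Span class, the count_log_messages function, the attribute key "version", and the sample data (including v10.8.156) are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical in-memory span: an operation, its attributes, and its log events."""
    service: str
    operation: str
    attributes: dict
    error: bool = False
    events: list = field(default_factory=list)  # log messages appended by instrumentation


def count_log_messages(spans, attribute=None, value=None, errors_only=True):
    """Count log messages on matching spans, most frequent first."""
    counts = Counter()
    for span in spans:
        if errors_only and not span.error:
            continue  # default view: only events from spans with errors
        if attribute is not None and span.attributes.get(attribute) != value:
            continue  # optional filter on a single span attribute
        counts.update(span.events)
    return counts.most_common()


# Made-up sample data for illustration only.
TOO_MANY = "error.message: Too many requests for store ID"
spans = [
    Span("store-server", "get-store-data", {"version": "v10.8.157"}, True, [TOO_MANY]),
    Span("store-server", "get-store-data", {"version": "v10.8.157"}, True, [TOO_MANY, "payload"]),
    Span("store-server", "get-store-data", {"version": "v10.8.156"}, True, ["payload", "payload"]),
    Span("store-server", "get-store-data", {"version": "v10.8.156"}, False, ["payload"]),
]

unfiltered = count_log_messages(spans)
filtered = count_log_messages(spans, "version", "v10.8.157")
```

With this sample data, the unfiltered count is dominated by payload messages, while filtering on the version attribute moves the "Too many requests" message to the top.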

Now let’s view an individual trace for confirmation.

View a Trace

The Trace Analysis table below the Logs Analysis shows span information from the filtered data set.

Clicking one of the spans opens it in the Trace View.

The Trace view shows you the path that a single request took through your system. You can then drill down on a single operation and view its metadata, including attributes and log events, to validate the root cause.

Learn more about Trace View

You use the Trace view to see a full trace from beginning to end of a request. The Trace view shows you a flame graph of the full trace (each service a different color), and below that, each span is shown in a hierarchy, allowing you to see the parent-child relationship of all the spans in the trace. Errors are shown in red.
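The parent-child hierarchy described above can be sketched as a small tree walk over spans that each record their parent's ID. This is a rough illustration, not Lightstep's implementation. The render_trace function, the dictionary field names, and the "frontend" / "load-store" span are hypothetical; only store-server and get-store-data come from this walkthrough.

```python
from collections import defaultdict


def render_trace(spans):
    """Return indented lines showing each span beneath its parent."""
    # Group spans by the ID of their parent; root spans have parent_id None.
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        # Visit siblings in start-time order, then recurse into their children.
        for span in sorted(children[parent_id], key=lambda s: s["start_ms"]):
            flag = " [error]" if span["error"] else ""
            lines.append(f"{'  ' * depth}{span['service']}: {span['operation']}{flag}")
            walk(span["id"], depth + 1)

    walk(None, 0)
    return lines


# Made-up two-span trace for illustration only.
trace = [
    {"id": "b", "parent_id": "a", "service": "store-server",
     "operation": "get-store-data", "start_ms": 5, "error": True},
    {"id": "a", "parent_id": None, "service": "frontend",
     "operation": "load-store", "start_ms": 0, "error": True},
]

lines = render_trace(trace)
print("\n".join(lines))
```

The indentation plays the role of the hierarchy in the Trace view, and the [error] marker stands in for the red highlighting.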

Clicking a span shows details in the right panel, based on the span’s metadata. Whenever you view a trace in Lightstep, it’s persisted for the length of your Data Retention policy so it can be bookmarked, shared, and reviewed by your team at any time. The selected span is part of the URL, so it will remain selected when the URL is visited.

You can learn more about the Trace view here.

In the panel, you can see that the span for get-store-data has the log message we were just viewing, as well as the error codes we saw in the Attributes with Errors table in the previous step. We now have the data we need to validate that the get-store-data operation on the store-server service is unable to process requests, and that this error is causing errors all the way up the stack.

What Did We Learn?

  • Logs Analysis shows you logs from spans in the regression, and you can compare their count to the count in the baseline.
  • You can filter the data by service, operation, or attribute, making it easier to pinpoint the issue.
  • The Trace view verifies your hypothesis by showing data from an actual span.