We’ve made some big changes to our root cause analysis functionality that let you quickly assess the impact of a regression and then isolate its root cause.

Time Shifted Deployment Comparison

When you notice that performance has regressed after a deployment, it can help to compare the current deploy to a previous one before you start digging into latency or error analysis. The comparison shows whether the change is an anomaly or whether this type of regression is common after a deploy (often the case with canary deployments). You can now overlay the shape of a previous deploy on the current performance to see whether a performance issue is an actual regression.

In this example, you can see that the selected version has the same shape as the current version and can conclude that the current version is behaving as expected.

In the image below, you can see that the selected version did not have the same issue: there are no gray lines showing a regression.
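To make the idea of a time-shifted overlay concrete, here is a minimal Python sketch. It is not Lightstep's implementation or API. The data shape (lists of timestamp and latency pairs), the function names, and the sample values are all assumptions made for illustration. The sketch re-keys each series by the time elapsed since its own deploy so the two shapes line up, then measures how far apart they are.

```python
from statistics import mean

def shift_to_deploy(points, deploy_ts):
    """Re-key (timestamp, value) points by seconds elapsed since the deploy."""
    return {ts - deploy_ts: value for ts, value in points if ts >= deploy_ts}

def mean_abs_gap(current, previous):
    """Mean absolute difference over the offsets present in both series."""
    shared = sorted(current.keys() & previous.keys())
    if not shared:
        raise ValueError("no overlapping offsets to compare")
    return mean(abs(current[offset] - previous[offset]) for offset in shared)

# Hypothetical p95 latency samples (ms), one per minute after each deploy.
previous_deploy = [(1000, 120), (1060, 180), (1120, 140), (1180, 125)]
current_deploy = [(5000, 118), (5060, 185), (5120, 138), (5180, 127)]

prev = shift_to_deploy(previous_deploy, deploy_ts=1000)
curr = shift_to_deploy(current_deploy, deploy_ts=5000)
gap = mean_abs_gap(curr, prev)
```

A small gap suggests the current deploy follows the same shape as the earlier one, which is the "behaving as expected" case described above. A large gap points to a change worth investigating.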

Learn more here.

Metrics Now Displayed on the Service Health View

Back in April, Lightstep added the ability to view machine metrics when you compare performance of a service over two different time periods (metrics are available when your instrumentation uses Go, Java, Node.js, or Python).

Now you can view the machine metrics directly from the Service Health view.

More here.

Filter Root Cause Analysis Data to Narrow Your Investigation

When you use Lightstep’s RCA view to compare performance over two different time periods (for example, before and after a deploy), you can now filter the data to home in on the cause of the regression. When you apply filters, the Operation diagram, Log analysis, and Trace Analysis tables all repopulate with data that match the filters.

You can filter by service, operation, or tags for both latency and error rate increases.
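As a rough illustration of what filtering by service, operation, and tags means for the underlying span data, here is a minimal Python sketch. It does not use Lightstep's API. The span dictionaries, the field names, and the matches helper are hypothetical and exist only for this example.

```python
def matches(span, service=None, operation=None, tags=None):
    """Return True if the span satisfies every filter that was supplied."""
    if service is not None and span["service"] != service:
        return False
    if operation is not None and span["operation"] != operation:
        return False
    span_tags = span.get("tags", {})
    # Every requested tag must be present on the span with the same value.
    return all(span_tags.get(key) == value for key, value in (tags or {}).items())

# Hypothetical spans from the regression time period.
spans = [
    {"service": "checkout", "operation": "charge",
     "tags": {"region": "us-east", "error": True}},
    {"service": "checkout", "operation": "charge",
     "tags": {"region": "eu-west", "error": False}},
    {"service": "inventory", "operation": "reserve",
     "tags": {"region": "us-east", "error": False}},
]

filtered = [s for s in spans
            if matches(s, service="checkout", tags={"region": "us-east"})]
```

Filters combine with AND semantics in this sketch, so each additional filter narrows the result. Whether the product combines filters the same way is an assumption.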

Trace Analysis Table on RCA Views Now Shows All Span Data

The Trace Analysis table on both the latency and error rate RCA views shows span data, allowing you to see the service, operation, duration, and start time of spans from both the baseline and regression time periods. Previously, the table showed by default only the span data from the service and operation under investigation. Now the table shows data aggregated from all spans that participated in the same trace as the service and operation under investigation.
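The difference between the old and new behavior can be sketched in a few lines of Python. This is an illustration under assumed data shapes, not Lightstep's implementation. The trace_id field, the sample spans, and the function name are hypothetical.

```python
def spans_sharing_traces(spans, service, operation):
    """Return every span from any trace that contains the given service and operation."""
    trace_ids = {
        s["trace_id"] for s in spans
        if s["service"] == service and s["operation"] == operation
    }
    return [s for s in spans if s["trace_id"] in trace_ids]

# Hypothetical spans; t2 never touches the operation under investigation.
all_spans = [
    {"trace_id": "t1", "service": "checkout", "operation": "charge", "duration_ms": 240},
    {"trace_id": "t1", "service": "payments", "operation": "authorize", "duration_ms": 190},
    {"trace_id": "t2", "service": "inventory", "operation": "reserve", "duration_ms": 35},
    {"trace_id": "t3", "service": "checkout", "operation": "charge", "duration_ms": 80},
]

table_rows = spans_sharing_traces(all_spans, "checkout", "charge")
```

Under the earlier behavior, only the two checkout spans would appear. With the trace-wide view, the payments span from trace t1 is included as well, which can surface a slow downstream call that the operation under investigation depends on.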