Lightstep combines metric and tracing data to give you full observability into your system in one tool. Your metric dashboards and alerts not only show you when there’s a problem, they become actionable tools that find the source of that problem for you. Changes in metric behavior are mapped automatically to changes across service boundaries. You don’t need to know the dependencies in your system: Lightstep understands them and can find issues deep in your stack.

Getting started with Lightstep is easy! Sign up for a free account, import your system’s infrastructure, application, and cloud metrics, instrument your services and libraries using our OpenTelemetry Launchers, and use Change Intelligence to obtain full observability.

Sign Up for a Free Account

Lightstep offers a free Community Tier account. If you don’t already have an account, you can sign up here.

If your company already has an account, click Join an existing team.

Send Metric Data to Lightstep

Lightstep supports infrastructure monitoring and application metrics with automatic migration from Datadog, Prometheus, and AWS. Once you have that data in Lightstep, you can build dashboards and charts to monitor it, and you can build corresponding alerts to be notified when meaningful deviations occur.

But more than telling you whether any given part of your system is unhealthy, your metrics, traces, and Lightstep’s Change Intelligence help you answer the question “what caused that change?”

Instrument Your Services

To take full advantage of Change Intelligence, your services need to be instrumented for distributed tracing. Lightstep supports OpenTelemetry to get telemetry data (traces, logs, and metrics) from your app as requests travel through its many services and other infrastructure.

If you’ve never instrumented for observability, read Understand Distributed Tracing for some background knowledge.

Lightstep provides Launchers that install OpenTelemetry and capture telemetry from popular libraries and frameworks already installed in your system, with only one line of code needed. This type of instrumentation provides observability into a request as it travels from service to service. Change Intelligence uses that telemetry to understand when performance changes correlate with metric deviations. Attributes on the traces help you pinpoint where the change occurred.

OpenTelemetry (currently in Beta, with GA expected in November 2020) is the unified initiative that takes the best of both OpenTracing and OpenCensus forward. Think of OpenTelemetry as the next evolution of OpenTracing and OpenCensus.

Configure and install the launchers to instrument your app for tracing.

Understand Lightstep Microsatellites

Lightstep uses Microsatellites to collect 100% of the performance data that your tracing instrumentation generates. Microsatellites collect telemetry data generated by instrumented clients and servers, and then send that data to the Lightstep SaaS platform. The SaaS platform records aggregate information about the spans, directs the trace assembly process, and then stores traces durably, all for display in the Lightstep UI.

Learn more about Microsatellites.

Microsatellites communicate with the instrumentation using an access token. Access tokens are project specific. You’ll need this access token when you install the launcher.
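As a rough illustration of keeping the project access token out of source code, the sketch below reads launcher settings from environment variables before handing them to a launcher. The helper name `load_launcher_settings`, the fallback service name, and the variable names `LS_ACCESS_TOKEN` and `LS_SERVICE_NAME` are assumptions made for this example; check the launcher documentation for your language for the exact names and call.

```python
import os
from typing import Mapping


def load_launcher_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the settings a launcher needs (hypothetical helper).

    The variable names LS_ACCESS_TOKEN and LS_SERVICE_NAME are assumed
    here; confirm them against the launcher docs for your language.
    """
    token = env.get("LS_ACCESS_TOKEN", "").strip()
    if not token:
        # Access tokens are project specific, so fail fast rather than
        # silently sending no data.
        raise ValueError("LS_ACCESS_TOKEN is not set")
    return {
        "service_name": env.get("LS_SERVICE_NAME", "unknown-service"),
        "access_token": token,
    }


# With the Python launcher installed, the settings would then be passed
# to its configure call, along the lines of (assumed usage, not run here):
#
#   from opentelemetry.launcher import configure_opentelemetry
#   configure_opentelemetry(**load_launcher_settings())
```

Reading the token from the environment means the same build can report to different Lightstep projects just by changing its deployment configuration.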

Understand What’s Changed Using Lightstep

Once you’ve sent metrics to Lightstep and instrumented your services for tracing, you can use Change Intelligence to find the cause of a change in system behavior.

In your dashboard, you can click on a chart and ask Change Intelligence What caused this change?

In this example, it seems the update-catalog operation on the warehouse service experienced latency and rate regressions at the same time as the metric deviation. Change Intelligence tells you that over 35% of traces with that operation (taken from the time of the deviation) contained the attribute customer:ProWool, and so traces with that attribute may have the answer.

Expanding the analysis, you can see that the customer:ProWool attribute is coming from the iOS service. Traces with that attribute have almost 5x the latency and almost 10x the rate. Traces without that attribute didn’t experience that regression.
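The arithmetic behind this kind of comparison can be sketched in a few lines. The example below is only an illustration of splitting traces by an attribute and comparing the two groups; it is not Lightstep’s actual algorithm, and the function name `compare_by_attribute`, the trace dictionary shape, and the sample values are all made up for the example.

```python
from statistics import mean
from typing import Any, Dict, List, Optional


def compare_by_attribute(
    traces: List[Dict[str, Any]], key: str, value: str
) -> Dict[str, Optional[float]]:
    """Split traces by one attribute and compare the two groups.

    Each trace is assumed to look like
    {"latency_ms": <number>, "attributes": {<key>: <value>, ...}}.
    """
    with_attr = [t for t in traces if t["attributes"].get(key) == value]
    without_attr = [t for t in traces if t["attributes"].get(key) != value]

    # Fraction of all sampled traces that carry the attribute.
    share = len(with_attr) / len(traces) if traces else 0.0

    # Ratio of mean latencies; undefined if either group is empty
    # or the baseline latency is zero.
    ratio = None
    if with_attr and without_attr:
        baseline = mean(t["latency_ms"] for t in without_attr)
        if baseline > 0:
            ratio = mean(t["latency_ms"] for t in with_attr) / baseline

    return {"share": share, "latency_ratio": ratio}


# Illustrative, invented sample: two slow ProWool traces, three others.
sample = [
    {"latency_ms": 500, "attributes": {"customer": "ProWool"}},
    {"latency_ms": 500, "attributes": {"customer": "ProWool"}},
    {"latency_ms": 100, "attributes": {"customer": "Other"}},
    {"latency_ms": 100, "attributes": {"customer": "Other"}},
    {"latency_ms": 100, "attributes": {}},
]
result = compare_by_attribute(sample, "customer", "ProWool")
```

For this invented sample, 40% of the traces carry the attribute and their mean latency is five times that of the rest, which is the kind of signal that marks an attribute as worth investigating.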

You can view the traces used in the analysis to confirm the cause.

The Trace view shows that there seems to be an issue writing from the cache to the database client, likely caused by an increase in traffic from the ProWool customer!

When you send your metric and trace data to Lightstep, Change Intelligence quickly pinpoints the root cause for you. Instead of looking through numerous dashboards and charts, each time seeing just what changed, you can now let Change Intelligence tell you why!