Lightstep analyzes 100% of unsampled transaction data from highly distributed, deep systems to produce complete end-to-end traces and robust metrics that explain performance behaviors and accelerate root cause analysis. Four main components make that happen: instrumentation that creates telemetry data, Satellites that collect and analyze that data, the Hypothesis Engine that assembles and further analyzes the data, and the web UI where you can view that data to investigate and resolve performance issues.

Instrumentation

You instrument your services to create and collect the telemetry data that describes distributed traces and metrics in your system. This instrumentation lives in your microservices, functions, and web and mobile clients, anywhere your system accesses functionality.

If your services use Java, Node.js, Python, or Go, you can quickly instrument them using our OpenTelemetry Launchers. OpenTelemetry provides APIs, libraries, and instrumentation resources to capture telemetry data from your applications. With a Launcher, any supported frameworks, protocols, libraries, and data stores are instrumented automatically with just one line of code. You can then add more targeted instrumentation in areas of your system where additional data would prove helpful.
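To make "telemetry data" more concrete, here is a minimal, hypothetical sketch of what instrumentation records for each unit of work. It is not the OpenTelemetry API or a Lightstep Launcher: the `Span` fields, the `start_span` helper, and the in-memory `FINISHED` list are assumptions made for illustration. The key idea is that every span carries a trace ID shared by the whole request, its own span ID, and its parent's span ID.

```python
import contextvars
import secrets
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One timed unit of work, e.g. an HTTP handler or a database call."""
    trace_id: str             # shared by every span in the same request
    span_id: str              # unique to this span
    parent_id: Optional[str]  # None for the first span of a request
    name: str
    start: float
    end: float = 0.0
    error: bool = False
    tags: dict = field(default_factory=dict)


# Finished spans. A real SDK would batch these and export them to a Satellite.
FINISHED: List[Span] = []

# Tracks the active span so that nested calls become its children.
_current_span = contextvars.ContextVar("current_span", default=None)


@contextmanager
def start_span(name: str, **tags):
    """Hypothetical helper: time a block of code and record it as a span."""
    parent = _current_span.get()
    span = Span(
        trace_id=parent.trace_id if parent else secrets.token_hex(16),
        span_id=secrets.token_hex(8),
        parent_id=parent.span_id if parent else None,
        name=name,
        start=time.time(),
        tags=dict(tags),
    )
    token = _current_span.set(span)
    try:
        yield span
    except Exception:
        span.error = True  # mark the span, then let the exception propagate
        raise
    finally:
        span.end = time.time()
        _current_span.reset(token)
        FINISHED.append(span)


# Example: a checkout handler that calls a payment step.
with start_span("checkout", user="u-123"):
    with start_span("charge-card"):
        pass
```

Because the trace and parent IDs travel with each span, spans recorded in different places can be stitched back together later. In a real deployment, OpenTelemetry also propagates these IDs between services in request headers.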

Lightstep also supports instrumentation for many other languages using OpenTracing.

Lightstep can also ingest data from Jaeger Agents or Zipkin, so if you’ve already instrumented your app to work with one of those, it will work with Lightstep too!

Satellites

Satellites are Lightstep components that communicate with your instrumentation to collect 100% of the telemetry data. They store and analyze that data for a period of time called the recall window, and send examples of interesting events on to the Hypothesis Engine for further analysis. Satellites analyze the performance of each segment against historical performance, error rates, and throughput.
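A recall window can be pictured as a time-bounded buffer: new spans go in, spans older than the window fall out, and statistics are computed over whatever remains. The sketch below is a hypothetical illustration of that idea, not Lightstep's Satellite implementation; the `RecallWindow` class, its per-operation metrics, and the choice of median latency are all assumptions.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from statistics import median
from typing import Deque, Dict


@dataclass(frozen=True)
class Span:
    operation: str    # e.g. "GET /cart"
    end: float        # time the span finished, in seconds
    duration: float   # span latency, in seconds
    error: bool = False


class RecallWindow:
    """Hypothetical buffer that keeps only spans finished in the last `window_seconds`."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        # Assumes spans arrive roughly in order of end time.
        self._spans: Deque[Span] = deque()

    def add(self, span: Span) -> None:
        self._spans.append(span)
        self._evict(now=span.end)

    def _evict(self, now: float) -> None:
        # Drop spans that have aged out of the recall window.
        cutoff = now - self.window_seconds
        while self._spans and self._spans[0].end < cutoff:
            self._spans.popleft()

    def stats(self) -> Dict[str, Dict[str, float]]:
        """Throughput, error rate, and median latency per operation."""
        by_op = defaultdict(list)
        for span in self._spans:
            by_op[span.operation].append(span)
        return {
            op: {
                "throughput_per_s": len(spans) / self.window_seconds,
                "error_rate": sum(s.error for s in spans) / len(spans),
                "median_latency_s": median(s.duration for s in spans),
            }
            for op, spans in by_op.items()
        }


window = RecallWindow(window_seconds=60)
window.add(Span("GET /cart", end=0.0, duration=0.120))
window.add(Span("GET /cart", end=30.0, duration=0.080, error=True))
window.add(Span("GET /cart", end=90.0, duration=0.100))  # evicts the span that ended at 0.0
print(window.stats())
```

This sketch only discards spans that age out; it does not model how a Satellite chooses which examples to forward to the Hypothesis Engine.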

Lightstep offers three types of Satellites: Developer Satellites, which run locally and speed up instrumentation while you code and test on your own machine; Public Satellites, which run remotely and let development environments quickly observe your full system before production; and On-premise Satellites, which you configure and maintain to meet the specific requirements of your production environment.

Lightstep Hypothesis Engine

Once Satellites have analyzed 100% of the unsampled data, they send examples of application errors, high latency, and other interesting events to the Hypothesis Engine. The engine further analyzes this data, builds complete traces and dynamic service diagrams, deduces correlations within the data, and monitors for changes in performance after deploys. Along with trace data, the engine monitors metrics and logs to provide full observability into your system’s performance.
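Spans arrive from many services independently, so assembling a trace means grouping them by trace ID and linking each one to its parent. The sketch below is a hypothetical illustration, not the Hypothesis Engine's implementation. It also shows how a service diagram can fall out of the same data: whenever a parent span and its child belong to different services, that pair is an edge between those services. `Span`, `assemble_traces`, `render`, and `service_edges` are names invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of a trace
    service: str
    operation: str


def assemble_traces(spans: List[Span]) -> Dict[str, Dict[Optional[str], List[Span]]]:
    """Group spans by trace ID, then index each trace's spans by parent span ID."""
    traces = defaultdict(lambda: defaultdict(list))
    for span in spans:
        traces[span.trace_id][span.parent_id].append(span)
    return traces


def render(children: Dict[Optional[str], List[Span]],
           parent_id: Optional[str] = None, depth: int = 0) -> List[str]:
    """Walk one trace's parent/child index depth-first, starting at the root."""
    lines = []
    for span in children.get(parent_id, []):
        lines.append("  " * depth + f"{span.service}: {span.operation}")
        lines.extend(render(children, span.span_id, depth + 1))
    return lines


def service_edges(spans: List[Span]) -> Set[Tuple[str, str]]:
    """One caller -> callee edge for every parent/child pair that crosses services."""
    by_id = {(s.trace_id, s.span_id): s for s in spans}
    edges = set()
    for span in spans:
        parent = by_id.get((span.trace_id, span.parent_id))
        if parent is not None and parent.service != span.service:
            edges.add((parent.service, span.service))
    return edges


spans = [
    Span("t1", "a", None, "web", "GET /checkout"),
    Span("t1", "b", "a", "checkout", "PlaceOrder"),
    Span("t1", "c", "b", "payments", "Charge"),
    Span("t1", "d", "b", "checkout", "db.query"),
]
traces = assemble_traces(spans)
print("\n".join(render(traces["t1"])))
print(sorted(service_edges(spans)))
```

Because spans are linked by ID rather than by arrival time, the same tree comes out whatever order they arrive in. Spans whose parent is missing are not rendered by this sketch.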

The Hypothesis Engine durably stores this data for as long as your Data Retention policy allows. Historical comparisons let you quickly see when behavior deviates from normal, and post-mortems can include real data that shows exactly what happened and when.
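At its simplest, a historical comparison summarizes the same metric over two time ranges and flags meaningful differences. Here is a minimal sketch, assuming you already have latency samples and error flags for one operation over a baseline period and a current period. The 95th-percentile statistic, the 1.5x latency ratio, and the 2-point error-rate threshold are illustrative choices, not Lightstep's detection logic, and `Summary`, `summarize`, and `compare` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Summary:
    p95_latency_ms: float
    error_rate: float


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms: List[float], errors: List[bool]) -> Summary:
    return Summary(percentile(latencies_ms, 95), sum(errors) / len(errors))


def compare(baseline: Summary, current: Summary,
            latency_ratio: float = 1.5, error_delta: float = 0.02) -> List[str]:
    """Flag regressions; both thresholds are arbitrary illustrative defaults."""
    findings = []
    if current.p95_latency_ms > baseline.p95_latency_ms * latency_ratio:
        findings.append(
            f"p95 latency rose from {baseline.p95_latency_ms:.0f} ms "
            f"to {current.p95_latency_ms:.0f} ms"
        )
    if current.error_rate - baseline.error_rate > error_delta:
        findings.append(
            f"error rate rose from {baseline.error_rate:.1%} to {current.error_rate:.1%}"
        )
    return findings


# Baseline: a normal period; current: the period after a deploy (illustrative data).
baseline = summarize([100.0] * 19 + [150.0], [False] * 20)
current = summarize([100.0] * 10 + [300.0] * 10, [True] * 2 + [False] * 18)
for finding in compare(baseline, current):
    print(finding)
```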

Lightstep Web UI

Here’s where observability is fully realized. You can view complete traces, from web and mobile clients down to low-level services and back, with the critical path (the operations where latency or errors are affecting performance) detected for you. The Service Health view lets you compare performance across two time periods (for example, before and after a deploy) so you can catch issues before they affect your customers. The Explorer view makes it easy to discover and isolate distinct real-time performance behaviors. Streams and dashboards let you monitor any facet of the system without limits on cardinality, and you can share your findings through interactive, detailed views of the system.
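Lightstep detects the critical path for you, and its exact algorithm is not described here. As a rough, hypothetical illustration of the idea: a parent span cannot finish until its last child returns, so following the child that finished last at each level yields a chain of spans that bound end-to-end latency. This simplified heuristic ignores a parent's own processing time and partially overlapping children; `Span` and `critical_path` are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str
    start_ms: float
    end_ms: float


def critical_path(spans: List[Span]) -> List[str]:
    """Simplified heuristic: from the root, repeatedly follow the child that finished last."""
    children: Dict[Optional[str], List[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)
    roots = children.get(None, [])
    if not roots:
        return []
    current = roots[0]  # assumes one root span per trace
    path = []
    while True:
        path.append(current.name)
        kids = children.get(current.span_id)
        if not kids:
            return path
        # The parent could not finish before this child did, so it bounds the parent's latency.
        current = max(kids, key=lambda child: child.end_ms)


trace = [
    Span("a", None, "GET /checkout", 0, 250),
    Span("b", "a", "auth", 5, 20),
    Span("c", "a", "PlaceOrder", 20, 240),
    Span("d", "c", "inventory", 25, 110),
    Span("e", "c", "recommendations", 25, 90),  # runs in parallel with inventory
    Span("f", "c", "Charge", 110, 230),
]
print(" -> ".join(critical_path(trace)))
```

A fuller treatment would also attribute each parent's own processing time (the gaps between its children) and handle children that overlap only partially; this sketch omits both.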


What Did We Learn?

  • Lightstep uses four main components. Instrumentation creates telemetry data and sends it to Satellites. Satellites analyze that data and, when polled, send examples of application errors, high latency, or other interesting events to the Hypothesis Engine. The engine further analyzes the data and builds the visualizations that the web UI displays for you.
  • There are three types of Satellites. Developer Satellites let you test your service’s performance locally, during development. Public Satellites let you view end-to-end telemetry through all the services and frameworks in your app. On-premise Satellites provide the full data fidelity and security that production environments need.
  • The Hypothesis Engine and web UI let you easily see changes in performance and error rates, and drill down into the details to form hypotheses and then confirm them.