Lightstep analyzes 100% of unsampled transaction data from highly distributed, deep systems to produce complete end-to-end traces and robust metrics that explain performance behaviors and accelerate root cause analysis. Four main components make that happen: tracers that create telemetry data, Satellites that collect and analyze that data, the Hypothesis Engine that assembles and further analyzes the data, and the web UI where you can view that data to investigate and resolve performance issues.

Tracers

You implement tracers in the language of your service’s code to create and collect the span data that describes distributed traces in your system. This instrumentation lives wherever your system accesses functionality: in your microservices, functions, and web and mobile clients. Lightstep tracers are built on top of the OpenTracing library and are fully open source.
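To make "span data" concrete, here is a minimal sketch of what a tracer records for each operation. It is not the Lightstep or OpenTracing API: `Span`, `ToyTracer`, and `start_span` are hypothetical names used only for illustration, and finished spans are kept in memory rather than reported to a Satellite.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Span:
    """One timed operation within a trace (hypothetical model, not a real tracer API)."""
    trace_id: str              # shared by every span in the same trace
    span_id: str               # unique to this span
    parent_id: Optional[str]   # None for the root span
    operation: str
    start: float
    end: Optional[float] = None
    tags: Dict[str, object] = field(default_factory=dict)


class ToyTracer:
    """Collects finished spans in memory; a real tracer would send them onward."""

    def __init__(self) -> None:
        self.finished: List[Span] = []

    @contextmanager
    def start_span(self, operation: str, parent: Optional[Span] = None) -> Iterator[Span]:
        span = Span(
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
            operation=operation,
            start=time.time(),
        )
        try:
            yield span
        except Exception:
            # Mark the span as failed, then let the error propagate.
            span.tags["error"] = True
            raise
        finally:
            span.end = time.time()
            self.finished.append(span)


tracer = ToyTracer()
with tracer.start_span("checkout") as root:
    root.tags["http.method"] = "POST"
    with tracer.start_span("charge-card", parent=root) as child:
        child.tags["db.type"] = "mysql"
```

The parent and child share a `trace_id`, and the child's `parent_id` points at the root span. That linkage is what lets separate spans be assembled into one end-to-end trace later.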

For broad coverage, you can use one of the OpenTracing auto-installers, which instrument frameworks (such as Django and Spring), common protocols (HTTP, gRPC), and data store drivers (MySQL, MongoDB). Alternatively, integrate the Lightstep tracer with Istio to instrument the full service mesh. You can then add more targeted instrumentation in areas of your system where additional data would prove helpful.

Lightstep tracers can also ingest data from Jaeger Agents or Zipkin, so if you’ve already instrumented your app to work with one of those, it will work with Lightstep too!

Want to use OpenTelemetry instead? Read these docs to get started!

Satellites

Satellites are Lightstep components that communicate with the tracers to collect 100% of the created telemetry data. Satellites store that data for a period of time called the recall window before it’s sent to the Hypothesis Engine for further analysis, and then deleted to make room for new data. Satellites analyze the performance of each segment against historical performance, error rates, and throughput.
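The recall window can be pictured as a time-bounded buffer. The sketch below is an illustration only, not how Satellites are implemented: `RecallBuffer` is a hypothetical class, the fixed window in seconds is an assumption, and it models only retention and eviction, not the hand-off to the Hypothesis Engine.

```python
from collections import deque
from typing import Deque, List, Tuple


class RecallBuffer:
    """Keeps spans for `window_seconds`, then evicts them (hypothetical sketch)."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._items: Deque[Tuple[float, dict]] = deque()

    def add(self, span: dict, now: float) -> None:
        """Store a span with its arrival time; `now` must never go backwards."""
        self._items.append((now, span))
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Arrival times are non-decreasing, so the oldest entries sit on the left.
        while self._items and now - self._items[0][0] > self.window_seconds:
            self._items.popleft()

    def snapshot(self, now: float) -> List[dict]:
        """Return the spans that are still inside the window at time `now`."""
        self._evict(now)
        return [span for _, span in self._items]


buf = RecallBuffer(window_seconds=300)
buf.add({"op": "a"}, now=0)
buf.add({"op": "b"}, now=200)
recent = buf.snapshot(now=400)  # span "a" is now older than 300 s and is gone
```

Passing `now` explicitly, rather than reading the clock inside the class, keeps the example deterministic.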

Lightstep offers three types of Satellites: locally run Developer Satellites that developers use during individual coding and testing to speed up instrumentation, Public Satellites that development environments use to quickly observe your full system pre-production, and On-premise Satellites that you configure and maintain to meet your specific production environment requirements.

Lightstep Hypothesis Engine

Once Satellites analyze 100% of the unsampled data, they send any data that serves as examples of application errors, high latency, or other interesting events to the Hypothesis Engine. The engine further analyzes the data, builds complete traces and dynamic service diagrams, deduces correlations among the data, and monitors for changes in performance after deploys. Along with trace data, the engine also monitors metrics and logs to provide full observability into your system’s performance.
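As a rough illustration of picking out "examples of application errors, high latency", the sketch below filters a batch of spans. The selection rule, the `select_examples` function, and the `duration_ms` and `error` fields are all assumptions made for this example; the actual criteria Satellites apply are not described here.

```python
from typing import Dict, List


def select_examples(spans: List[Dict], latency_threshold_ms: float) -> List[Dict]:
    """Return spans that carry an error flag or exceed the latency threshold."""
    selected = []
    for span in spans:
        is_error = bool(span.get("error", False))
        is_slow = span["duration_ms"] > latency_threshold_ms
        if is_error or is_slow:
            selected.append(span)
    return selected


spans = [
    {"op": "list-items", "duration_ms": 20},
    {"op": "checkout", "duration_ms": 900},                      # slow
    {"op": "charge-card", "duration_ms": 15, "error": True},     # failed
]
examples = select_examples(spans, latency_threshold_ms=500)
```

Here `examples` contains the `checkout` and `charge-card` spans, while the fast, successful `list-items` span is left out.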

The Hypothesis Engine durably stores the data for as long as your Data Retention policy allows. Historical comparisons allow you to quickly see when things are not normal. Post-mortems can contain real data to show exactly what happened and when.

Lightstep Web UI

Here’s where observability is fully realized. You can view complete traces, from web and mobile clients down to low-level services and back, with the critical path (areas where latency or error rate is affecting performance) detected for you. The Service Health view allows you to compare performance from two time periods (for example before and after a deploy) so you can proactively catch issues before they become problematic to your customers. The Explorer view makes it easy to discover and isolate distinct real-time performance behaviors. Streams and dashboards allow you to monitor any facet of the system without limits on cardinality. Share your findings using interactive detailed views of the system.
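The before-and-after comparison can be sketched with two simple statistics. This is a simplified illustration, not the Service Health view's actual computation: `summarize`, `compare`, and the span fields are hypothetical, and median latency plus error rate stand in for whatever measures the UI reports.

```python
from statistics import median
from typing import Dict, List


def summarize(spans: List[Dict]) -> Dict[str, float]:
    """Median latency and error rate for one time period (spans must be non-empty)."""
    durations = [s["duration_ms"] for s in spans]
    errors = sum(1 for s in spans if s.get("error"))
    return {"median_ms": median(durations), "error_rate": errors / len(spans)}


def compare(before: List[Dict], after: List[Dict]) -> Dict[str, float]:
    """Change in each statistic from the first period to the second."""
    b, a = summarize(before), summarize(after)
    return {key: a[key] - b[key] for key in b}


before_deploy = [{"duration_ms": d} for d in (100, 110, 120, 130)]
after_deploy = [
    {"duration_ms": 150},
    {"duration_ms": 160},
    {"duration_ms": 170},
    {"duration_ms": 400, "error": True},
]
delta = compare(before_deploy, after_deploy)
```

In this made-up data, `delta` shows the median latency rising by 50 ms and the error rate rising by 0.25 after the deploy, the kind of regression such a comparison is meant to surface.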

If you already have an account, and don’t need to learn about creating other Lightstep users, you can sign in to Lightstep and go on to Step 3.


What Did We Learn?

  • Lightstep has four main components. Tracers create and collect telemetry data and send it to Satellites. Satellites analyze that data and send examples of application errors, high latency, or other interesting events to the Hypothesis Engine when polled. The engine further analyzes the data and builds the visualizations that the web UI displays for you.
  • There are three different types of Satellites: Developer Satellites let you test your service’s performance locally, during development. Public Satellites allow you to view end-to-end telemetry through all the services and frameworks in your app. On-premise Satellites provide the full data fidelity and security needed by production environments.
  • The Hypothesis Engine and web UI allow you to easily see changes in performance and error rates, and drill down into details to make and then confirm hypotheses.