Understand Distributed Tracing

Firefighting with traditional solutions is hard. Scrolling through multitudes of dashboards only shows you there is a problem, not where, and not what caused it. Sifting through logs takes time, and it's hard to find that needle in a haystack. That's where distributed tracing comes in.

Distributed tracing provides a view of the life of a request as it travels across multiple hosts and services communicating over various protocols. Here's an example of a request from a client through a load balancer into several backend systems. With distributed tracing implemented, you have a window into performance at every step in the request.

Distributed tracing relies on instrumentation of the system you're trying to observe. You can use libraries such as OpenTracing to provide a consistent interface across a variety of languages to write this instrumentation code. Some systems may require custom instrumentation at the service level, while others may only need instrumentation of the framework. Often, you'll need to use a combination of these approaches.

Before you start that instrumentation, read on to learn about the different components that make up a distributed trace, and how the data from that instrumentation makes its way into LightStep, where you can view and work with it.

Spans and Traces

In distributed tracing, a trace is a view into a request as it moves through a distributed system. A span is a named, timed operation that represents a piece of that workflow. Multiple spans, each representing a different part of the workflow, are pieced together to create a trace.

In LightStep, you view traces as a "tree" of spans that reflects the time that each span started and completed. It also shows you the relationship between spans. Here's a simplified view of a trace, as it relates to the request above.

A trace starts with a root span where the request starts. This root span can have one or more child spans, and each one of those child spans can have child spans.

Child spans don't always finish before their parent when the two are asynchronous. For example, an RPC call might time out, and so the parent span finishes before the "hanging" child span.
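
The tree structure described above can be sketched in a few lines of plain Java. This is only an illustrative model, not LightStep or OpenTracing code: the `TraceTreeSketch` class, its `Span` type, the span names, and the timings are all assumptions made up for the example. It shows a root span with nested child spans, including a child that finishes after its parent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a trace tree; not LightStep or OpenTracing API.
class TraceTreeSketch {

    /** A named, timed operation with zero or more child spans. */
    static class Span {
        final String name;
        final long startMillis;
        final long finishMillis;
        final List<Span> children = new ArrayList<>();

        Span(String name, long startMillis, long finishMillis) {
            this.name = name;
            this.startMillis = startMillis;
            this.finishMillis = finishMillis;
        }

        Span addChild(Span child) {
            children.add(child);
            return this;
        }

        long durationMillis() {
            return finishMillis - startMillis;
        }
    }

    /** Renders the tree depth-first, indenting each level by two spaces. */
    static String render(Span span, int depth) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(span.name)
           .append(" [").append(span.startMillis)
           .append("-").append(span.finishMillis).append(" ms]\n");
        for (Span child : span.children) {
            out.append(render(child, depth + 1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The billing span is asynchronous and outlives its parent, the auth span.
        Span auth = new Span("auth", 10, 40)
            .addChild(new Span("billing", 35, 150));
        Span root = new Span("client request", 0, 120)
            .addChild(new Span("load balancer", 5, 115).addChild(auth));
        System.out.print(render(root, 0));
    }
}
```

Running `main` prints one line per span, indented by its depth in the tree, which is roughly the shape of the "tree" view described above.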

As the trace illustration shows, there can be two types of child spans. A ChildOf span is one where the parent depends on that child span's result (like the relationship of the load balancer and the auth span). Spans doing concurrent (perhaps distributed) work may each be a ChildOf span of a single parent span that merges the results of all its children.

The second is the FollowsFrom relationship, where the parent span is not dependent on the child (like the auth span and the billing span). These often represent "fire-and-forget" operations, for example, an opportunistic write to cache or a message that doesn't care about its consumer.
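
One way to see the difference between the two relationships is to ask when a parent's result is complete. The sketch below is a hypothetical model (the `ReferenceTypesSketch` class, its enum, and the timings are assumptions, not the OpenTracing API): the parent waits for its ChildOf children but ignores FollowsFrom children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of span relationships; not the OpenTracing or LightStep API.
class ReferenceTypesSketch {

    enum ReferenceType { CHILD_OF, FOLLOWS_FROM }

    static class Span {
        final String name;
        final ReferenceType referenceToParent; // null for a root span
        final long finishMillis;               // when this span's own work ends
        final List<Span> children = new ArrayList<>();

        Span(String name, ReferenceType referenceToParent, long finishMillis) {
            this.name = name;
            this.referenceToParent = referenceToParent;
            this.finishMillis = finishMillis;
        }

        Span addChild(Span child) {
            children.add(child);
            return this;
        }
    }

    /**
     * Time at which the span's result is complete: its own finish time or the
     * latest CHILD_OF descendant result, whichever is later. FOLLOWS_FROM
     * children are skipped because the parent does not depend on them.
     */
    static long resultReadyAt(Span span) {
        long ready = span.finishMillis;
        for (Span child : span.children) {
            if (child.referenceToParent == ReferenceType.CHILD_OF) {
                ready = Math.max(ready, resultReadyAt(child));
            }
        }
        return ready;
    }

    public static void main(String[] args) {
        Span auth = new Span("auth", ReferenceType.CHILD_OF, 40)
            .addChild(new Span("billing", ReferenceType.FOLLOWS_FROM, 150));
        Span loadBalancer = new Span("load balancer", null, 20).addChild(auth);
        // The fire-and-forget billing span does not delay the load balancer's result.
        System.out.println(resultReadyAt(loadBalancer));
    }
}
```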

SpanContext

For the trace tree to be built with these relationships intact, each span needs to propagate its SpanContext to its children. SpanContext tells the child span who its parent is (parent span ID) and what trace it belongs to (trace ID). The child span creates its own ID and then propagates both that ID (as the parent span ID) and the trace ID in the SpanContext to its own child spans.

There can be other components in SpanContext, but the parent span ID and trace ID are what allow a trace tree to be built. Read the OpenTracing docs for more info.
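
The hand-off of IDs from parent to child can be sketched as follows. The types here are hypothetical and carry only the two fields discussed above; a real tracer's SpanContext may hold more and will generate IDs differently (random UUIDs are used here purely for convenience).

```java
import java.util.UUID;

// Hypothetical types for illustration; real tracers define their own SpanContext.
class SpanContextSketch {

    /** The two fields the text calls out: the trace ID and the parent span ID. */
    static class SpanContext {
        final String traceId;
        final String parentSpanId;

        SpanContext(String traceId, String parentSpanId) {
            this.traceId = traceId;
            this.parentSpanId = parentSpanId;
        }
    }

    static class Span {
        final String traceId;
        final String spanId = UUID.randomUUID().toString(); // each span creates its own ID
        final String parentSpanId;                          // null for the root span

        private Span(String traceId, String parentSpanId) {
            this.traceId = traceId;
            this.parentSpanId = parentSpanId;
        }

        /** A root span starts a new trace and has no parent. */
        static Span startRoot() {
            return new Span(UUID.randomUUID().toString(), null);
        }

        /** A child span learns its trace and its parent from the context it receives. */
        static Span startChild(SpanContext fromParent) {
            return new Span(fromParent.traceId, fromParent.parentSpanId);
        }

        /** The context handed to children: same trace ID, this span's ID as the parent ID. */
        SpanContext contextForChild() {
            return new SpanContext(traceId, spanId);
        }
    }

    public static void main(String[] args) {
        Span root = Span.startRoot();
        Span child = Span.startChild(root.contextForChild());
        Span grandchild = Span.startChild(child.contextForChild());
        System.out.println(grandchild.traceId.equals(root.traceId));      // same trace
        System.out.println(grandchild.parentSpanId.equals(child.spanId)); // linked to its parent
    }
}
```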

Tags

A span may also have zero or more key/value tags. Tags allow you to create metadata about the span. For example, you might create tags that hold a customer ID, information about the environment that the request is operating in, or an app's release. Tags do not reflect any time-based event (logs handle events). The OpenTracing spec defines several standard tags. For example, here are the tags available using the Java-based tracer. You can also implement your own tags.
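
As a rough sketch, tags behave like a key/value map attached to a span. The `SpanTagsSketch` class below is a hypothetical stand-in rather than a real tracer; `http.status_code` is one of the standard OpenTracing tag names, while `customer.id` and `release` are made-up custom tags.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a span that carries tags; not a real tracer API.
class SpanTagsSketch {

    static class Span {
        final String operationName;
        private final Map<String, Object> tags = new LinkedHashMap<>();

        Span(String operationName) {
            this.operationName = operationName;
        }

        /** Sets a tag; setting the same key again replaces the earlier value. */
        Span setTag(String key, Object value) {
            tags.put(key, value);
            return this;
        }

        /** Read-only view of the tags, in the order they were first set. */
        Map<String, Object> tags() {
            return Collections.unmodifiableMap(tags);
        }
    }

    public static void main(String[] args) {
        Span span = new Span("checkout")
            .setTag("http.status_code", 200) // standard OpenTracing tag name
            .setTag("customer.id", "c-123")  // custom tag, invented for this example
            .setTag("release", "v1.4.2");    // custom tag, invented for this example
        System.out.println(span.tags());
    }
}
```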

Span Logs

Span logs contain time-stamped information. A span can have zero or more logs. Each is a time-stamped event name, optionally accompanied by a structured data payload of arbitrary size.

You can add logs to any span where the additional context would add value and the information included would be unique to an individual trace.

Like tags, OpenTracing defines recommended conventions for log fields.
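
Where tags describe the span as a whole, each log is tied to a moment in time. The following is a minimal sketch with hypothetical types (`SpanLogsSketch`, `LogEntry`) showing time-stamped events with an optional structured payload; the event names, the `message` field, and the timestamps are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of span logs; not a real tracer API.
class SpanLogsSketch {

    /** A time-stamped event name with an optional structured payload. */
    static class LogEntry {
        final long timestampMicros;
        final String event;
        final Map<String, Object> payload;

        LogEntry(long timestampMicros, String event, Map<String, Object> payload) {
            this.timestampMicros = timestampMicros;
            this.event = event;
            // Copy the payload so later changes by the caller do not alter the log.
            this.payload = Collections.unmodifiableMap(new LinkedHashMap<String, Object>(payload));
        }
    }

    static class Span {
        private final List<LogEntry> logs = new ArrayList<>();

        Span log(long timestampMicros, String event, Map<String, Object> payload) {
            logs.add(new LogEntry(timestampMicros, event, payload));
            return this;
        }

        /** Logs an event with no payload. */
        Span log(long timestampMicros, String event) {
            return log(timestampMicros, event, Collections.<String, Object>emptyMap());
        }

        List<LogEntry> logs() {
            return Collections.unmodifiableList(logs);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("message", "upstream timed out after 3 attempts");

        Span span = new Span()
            .log(1_000L, "cache miss")
            .log(2_500L, "error", payload);

        for (LogEntry entry : span.logs()) {
            System.out.println(entry.timestampMicros + " " + entry.event + " " + entry.payload);
        }
    }
}
```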

Propagating Across the Wire

The inject and extract methods provided by OpenTracing help you propagate span context between services. When creating a request, you inject the span context into the RPC, and when receiving that request, you extract the span context. Here's an example of injecting parent span context into a carrier using message headers.

public class TracingMessageProducer extends ForwardingMessageProducer {
    void startTracing(final Message message) {
        // ...
        // Create a child of the span returned by Tracing.peekOrCreate().
        final SpanId parent = Tracing.peekOrCreate();
        final SpanId spanId = parent.createChild();

        // Inject the child's span context into the outgoing message headers.
        addToHeaders(message, TracingHeaders.TRACE_ID, spanId.getTraceId().toString());
        addToHeaders(message, TracingHeaders.SPAN_ID, spanId.getSpanId());
        addToHeaders(message, TracingHeaders.PARENT_SPAN_ID, spanId.getParentId());
        // ...
    }
}

You then extract the context from the message headers:

public class TracingMessageConsumer extends ForwardingMessageConsumer {
    void startTracing(final Message message) {
        // Rebuild the sender's span context from the message headers.
        final SpanId parentSpanId = SpanId.create(
                message.getStringProperty(TracingHeaders.TRACE_ID, null),
                message.getStringProperty(TracingHeaders.SPAN_ID, null),
                message.getStringProperty(TracingHeaders.PARENT_SPAN_ID, null));
        // ...
    }
}

And then create a new child span:

public class TracingMessageConsumer extends ForwardingMessageConsumer {
    void startTracing(final Message message) {
        final SpanId parentSpanId = SpanId.create(
                message.getStringProperty(TracingHeaders.TRACE_ID, null),
                message.getStringProperty(TracingHeaders.SPAN_ID, null),
                message.getStringProperty(TracingHeaders.PARENT_SPAN_ID, null));

        // Start a child of the extracted span and record details about the message.
        final SpanId spanId = parentSpanId.createChild();
        final SpanId previousSpanId = Tracing.push(spanId);
        Tracing.setOperation(OPERATION_NAME);
        Tracing.setAttribute(TracingLogEntryKeys.EXCHANGE, message.getExchangeName());
        Tracing.setAttribute(TracingLogEntryKeys.ROUTING_KEY, message.getRoutingKey());
        // Re-push the span ID that Tracing.push returned earlier.
        Tracing.push(previousSpanId);
        // ...
    }
}

Read Instrument Your Code to learn more details about instrumentation, such as how to prioritize what to trace.
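
The inject and extract steps in this section can also be shown as a self-contained round trip. The sketch below does not use the OpenTracing API (whose inject and extract calls also take a format argument); it uses a plain `Map<String, String>` as the carrier, and the `PropagationSketch` class and header names are assumptions invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical inject/extract pair over a string map; not the OpenTracing API.
class PropagationSketch {

    // Made-up header names; real tracers define their own.
    static final String TRACE_ID_KEY = "sketch-trace-id";
    static final String SPAN_ID_KEY = "sketch-span-id";

    static class SpanContext {
        final String traceId;
        final String spanId;

        SpanContext(String traceId, String spanId) {
            this.traceId = traceId;
            this.spanId = spanId;
        }
    }

    /** Sender side: copy the span context into the outgoing headers. */
    static void inject(SpanContext context, Map<String, String> carrier) {
        carrier.put(TRACE_ID_KEY, context.traceId);
        carrier.put(SPAN_ID_KEY, context.spanId);
    }

    /** Receiver side: rebuild the span context, or return null if the headers are absent. */
    static SpanContext extract(Map<String, String> carrier) {
        String traceId = carrier.get(TRACE_ID_KEY);
        String spanId = carrier.get(SPAN_ID_KEY);
        if (traceId == null || spanId == null) {
            return null;
        }
        return new SpanContext(traceId, spanId);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        inject(new SpanContext("trace-1", "span-7"), headers);

        // On the receiving service, the extracted span ID becomes the new span's parent ID.
        SpanContext parent = extract(headers);
        System.out.println(parent.traceId + " " + parent.spanId);
    }
}
```

Returning null when the headers are missing is one possible design choice; it lets the receiver fall back to starting a new root span for requests that arrive without any trace context.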

Sending Span Data to LightStep with Tracers

Once you've done your instrumentation, you instantiate tracers that know how to create the spans and their associated tags, logs, and context. LightStep tracers collect 100% of that data and send it to the LightStep Satellites, which piece together the spans into traces. The Satellites then send any data that serves as examples of application errors, high latency, or other interesting events in real time to the LightStep Engine. You use the LightStep web application to view the actual traces, along with all the associated metadata from tags and logs. Read How LightStep Works for more info.

Of course, that's not all there is to distributed tracing or LightStep!
Here are more resources that can help you get started:

