LightStep [𝑥]PM Documentation

How to Use This Guide and Glossary

Welcome!

If you're new to LightStep, we suggest reading the articles in the left-hand navbar in order. If you have a question about a specific feature, the search in the upper-right should provide relevant results (e.g., "Streams", "Satellite").

Below, a Glossary is provided for reference should you encounter an unfamiliar term within the documentation.

Finally, we genuinely value your feedback! If you came here and didn't find what you were looking for, please let us know by sending your feedback to support@lightstep.com. Another way to provide feedback is by directly suggesting edits to the articles themselves using the Suggest Edits link in the top right-hand corner of any page. Thanks!


Traces and Spans

Span

Represents a logical unit of work in the system that has a start time and a duration. Spans may be nested and ordered to model causal relationships. There are three main types of Spans:

  • Root Span is the starting point for a trace.
  • Parent Span is a span that creates other spans.
  • Child Span is a span that is created by a Parent Span.
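
As a minimal sketch of these definitions, the Python below models a span with a start time, a duration, and an optional parent. The `Span` class, its field names, and the operation names are illustrative assumptions for this glossary, not the LightStep or OpenTracing API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Span:
    """Hypothetical span model: a named unit of work with a start and a duration."""
    operation: str
    start: float                      # seconds since an arbitrary epoch
    duration: float                   # seconds
    parent: Optional["Span"] = field(default=None, repr=False)
    children: List["Span"] = field(default_factory=list)

    def child(self, operation: str, start: float, duration: float) -> "Span":
        """Create a Child Span; calling this makes `self` a Parent Span."""
        span = Span(operation, start, duration, parent=self)
        self.children.append(span)
        return span

    @property
    def is_root(self) -> bool:
        """A Root Span has no parent and is the starting point of a trace."""
        return self.parent is None


# A Root Span that becomes a Parent Span by creating one Child Span.
root = Span("http_request", start=0.0, duration=0.120)
query = root.child("db_query", start=0.010, duration=0.045)
```

Here `root` is both the Root Span and a Parent Span, while `query` is a Child Span nested inside it.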

Logs

Every Span has zero or more Logs, each of which is a timestamped event name, optionally accompanied by a structured data payload of arbitrary size.

Logs should be added to any span where additional context would add value and the information included would be unique to an individual trace.

OpenTracing has some recommended conventions around log fields that LightStep [x]PM follows as well.
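
To make the shape of a Log concrete, here is a small sketch in Python. The `LogRecord` and `Span` classes and the `log` method are hypothetical stand-ins, not a real tracer API; the payload keys `error.kind` and `message` follow the OpenTracing log field conventions mentioned above.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class LogRecord:
    """One timestamped event with an optional structured payload."""
    timestamp: float
    event: str
    payload: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Span:
    """Hypothetical span that only tracks its logs."""
    operation: str
    logs: List[LogRecord] = field(default_factory=list)

    def log(self, event: str, payload: Optional[Dict[str, Any]] = None,
            timestamp: Optional[float] = None) -> None:
        # Default to the current wall-clock time when no timestamp is given.
        ts = time.time() if timestamp is None else timestamp
        self.logs.append(LogRecord(ts, event, dict(payload or {})))


span = Span("charge_card")
# A log whose payload is specific to this one trace.
span.log("error",
         {"error.kind": "TimeoutError", "message": "gateway did not respond"},
         timestamp=1000.0)
# A log with an event name only; the payload is optional.
span.log("retry")
```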

Span Reference

A Span may reference zero or more Spans that are causally related. LightStep recognizes the two types of references defined by OpenTracing: ChildOf and FollowsFrom. Both reference types specifically model direct causal relationships between a child Span and a parent Span.

  • ChildOf: A parent-child relationship in which the parent depends on the child Span in some way (as opposed to FollowsFrom). All of the following would constitute ChildOf relationships:
      • A Span representing the server side of an RPC may be the ChildOf a Span representing the client side of that RPC
      • A Span representing a SQL insert may be the ChildOf a Span representing an ORM save method
      • Many Spans doing concurrent (perhaps distributed) work may all individually be the ChildOf a single parent Span that merges the results for all children that return within a deadline
  • FollowsFrom: Some parent Spans do not depend in any way on the result of their child Spans. In these cases, we say merely that the child Span FollowsFrom the parent Span in a causal sense. There are many distinct FollowsFrom reference sub-categories, and in future versions of OpenTracing they may be distinguished more formally.

[Figure: timing diagrams for child Spans that FollowsFrom a parent Span; each arrangement shown is valid.]
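
The distinction between the two reference types can be sketched as follows. The `ReferenceType` enum, the `Reference` class, and the span names are assumptions made for illustration only; the `cache_refresh` case in particular is a made-up example of work the parent does not wait on.

```python
from dataclasses import dataclass
from enum import Enum


class ReferenceType(Enum):
    CHILD_OF = "child_of"          # parent depends on the child in some way
    FOLLOWS_FROM = "follows_from"  # parent does not depend on the child's result


@dataclass(frozen=True)
class Reference:
    """A causal link from a child span to the parent span it references."""
    type: ReferenceType
    child: str
    parent: str


def parent_depends_on_child(ref: Reference) -> bool:
    """Only ChildOf implies that the parent depends on the child."""
    return ref.type is ReferenceType.CHILD_OF


# The server side of an RPC is a ChildOf the client side of that RPC.
rpc = Reference(ReferenceType.CHILD_OF, child="rpc_server", parent="rpc_client")
# Hypothetical fire-and-forget work that merely FollowsFrom its parent.
refresh = Reference(ReferenceType.FOLLOWS_FROM, child="cache_refresh", parent="http_request")
```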

SpanContext

Represents Span state that must propagate to child Spans and across process boundaries (e.g., a <trace_id, span_id, sampled> tuple). SpanContext is used when propagating traces across process boundaries and when creating edges in the trace graph.
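
As a rough illustration of propagation across a process boundary, the sketch below writes a `<trace_id, span_id, sampled>` tuple into a dictionary of request headers and reads it back on the receiving side. The header names and the `inject` and `extract` helpers are hypothetical and chosen for this example; real tracers define their own wire formats.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SpanContext:
    trace_id: int
    span_id: int
    sampled: bool


# Hypothetical header names, used only in this sketch.
TRACE_ID_HEADER = "x-example-trace-id"
SPAN_ID_HEADER = "x-example-span-id"
SAMPLED_HEADER = "x-example-sampled"


def inject(ctx: SpanContext, carrier: Dict[str, str]) -> None:
    """Write the context into outgoing headers (caller side)."""
    carrier[TRACE_ID_HEADER] = format(ctx.trace_id, "x")
    carrier[SPAN_ID_HEADER] = format(ctx.span_id, "x")
    carrier[SAMPLED_HEADER] = "true" if ctx.sampled else "false"


def extract(carrier: Dict[str, str]) -> SpanContext:
    """Rebuild the context from incoming headers (callee side)."""
    return SpanContext(
        trace_id=int(carrier[TRACE_ID_HEADER], 16),
        span_id=int(carrier[SPAN_ID_HEADER], 16),
        sampled=carrier[SAMPLED_HEADER] == "true",
    )


outgoing = SpanContext(trace_id=0xABC123, span_id=0x42, sampled=True)
headers: Dict[str, str] = {}
inject(outgoing, headers)
incoming = extract(headers)
```

The callee would use `incoming` as the parent context for the spans it creates, which is what produces an edge in the trace graph.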

Sub-Trace

A Sub-Trace is the portion of an overall end-to-end trace that consists of a root span and all of its descendants. If a trace is thought of as a directed acyclic graph (DAG) of Spans, then a sub-trace is simply a subgraph of the overall DAG.
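
Viewed as a graph problem, a sub-trace is the set of spans reachable from a chosen root. The sketch below assumes spans are identified by short string IDs and that the trace is given as a list of (parent, child) edges; both are simplifications for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def sub_trace(edges: List[Tuple[str, str]], root: str) -> Set[str]:
    """Return `root` plus every span reachable from it via parent -> child edges."""
    children: Dict[str, List[str]] = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    seen = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in seen:      # a DAG node may be reachable by two paths
                seen.add(child)
                stack.append(child)
    return seen


# A is the root of the whole trace; B and C are its children.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E")]
whole_trace = sub_trace(edges, "A")
from_b = sub_trace(edges, "B")
```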

Tags

Every Span may also have zero or more key:value Tags, which do not have timestamps and simply annotate the spans. As is the case with Logs, if certain known tag key:values are used for common application scenarios, tracers can choose to pay special attention to them.

There are also two ways to create traces using special tag prefixes:

  • GUID Tag - if your system has implemented a Globally Unique ID (GUID), you can add a prefix on tag keys (set via the setTag() OpenTracing API). The value of the GUID can then be used as a unique identifier in the LightStep service to associate spans.
  • Join Tag - a prefix on tag keys (set via the setTag() OpenTracing API). Any spans with the same value for a given "join key" and overlapping in time will automatically be considered part of the same trace. Join tags work for cross-service traces in a system that is already maintaining a request ID or transaction ID.
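
The join rule described above (same join key value, overlapping in time) can be sketched as an interval-merging step. This is an illustration of the rule only, not LightStep's implementation. The `JOIN_PREFIX` value, the `Span` class, and the `request_id` key are placeholders assumed for this sketch; consult the LightStep client documentation for the actual prefix.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

JOIN_PREFIX = "join:"  # assumed placeholder prefix for this sketch


@dataclass
class Span:
    name: str
    start: float
    end: float
    tags: Dict[str, str] = field(default_factory=dict)


def group_by_join_tag(spans: List[Span]) -> List[List[Span]]:
    """Group spans that share a join tag key and value and overlap in time."""
    buckets: Dict[Tuple[str, str], List[Span]] = defaultdict(list)
    for span in spans:
        for key, value in span.tags.items():
            if key.startswith(JOIN_PREFIX):
                buckets[(key, value)].append(span)

    traces: List[List[Span]] = []
    for bucket in buckets.values():
        bucket.sort(key=lambda s: s.start)
        current, current_end = [bucket[0]], bucket[0].end
        for span in bucket[1:]:
            if span.start <= current_end:          # overlaps the running group
                current.append(span)
                current_end = max(current_end, span.end)
            else:                                  # gap in time: start a new trace
                traces.append(current)
                current, current_end = [span], span.end
        traces.append(current)
    return traces


spans = [
    Span("frontend", 0.0, 1.0, {"join:request_id": "r-1"}),
    Span("backend", 0.2, 0.8, {"join:request_id": "r-1"}),
    Span("frontend", 5.0, 6.0, {"join:request_id": "r-1"}),  # same ID, much later
    Span("billing", 0.1, 0.3, {"join:request_id": "r-2"}),
]
traces = group_by_join_tag(spans)
```

In this example the first two spans are joined, while the later `frontend` span and the `billing` span each end up in their own group.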

Trace

A Trace represents the potentially distributed, potentially concurrent data/execution path in a (potentially distributed, potentially concurrent) system. A Trace can be thought of as a directed acyclic graph (DAG) of Spans.

Trace Assembly

Trace Assembly is the process whereby the individual spans reported by the LightStep client libraries are connected into a single, logical trace of the top-most operation. This process is managed by LightStep outside of the host application to ensure the minimal overhead of tracing instrumentation.


LightStep Specific Usage

Alert

An Alert is a notification that a value being monitored has gone outside of an assigned threshold for an assigned duration.

LightStep can be configured to alert via PagerDuty as well as Slack.
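
The "threshold for a duration" condition in the definition can be sketched as below. The function name, the sample format, and the choice of an upper bound as the threshold are assumptions for illustration; this is not how LightStep evaluates alerts internally.

```python
from typing import Iterable, Optional, Tuple


def should_alert(samples: Iterable[Tuple[float, float]],
                 threshold: float, duration: float) -> bool:
    """Return True if the value stays above `threshold` for at least `duration` seconds.

    `samples` are (timestamp_seconds, value) pairs in ascending time order.
    """
    breach_start: Optional[float] = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp        # first sample of this breach
            if timestamp - breach_start >= duration:
                return True
        else:
            breach_start = None                 # value recovered; reset the clock
    return False


# Latency in milliseconds, sampled once per minute.
samples = [(0, 100), (60, 250), (120, 260), (180, 270), (240, 90)]
fires_at_two_minutes = should_alert(samples, threshold=200, duration=120)
fires_at_three_minutes = should_alert(samples, threshold=200, duration=180)
```

With these samples the value stays above 200 ms from t=60 to t=180, so a 120-second duration triggers the alert and a 180-second duration does not.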

Collector

The Collector is the former name for Satellite.

Component

A Component is a logical service (or client) in a distributed system. The component usually represents a particular process or script in the distributed system. During initialization of the LightStep client library, the Component is assigned a name via the component_name field (the spelling of this field will vary depending on the programming language being used).

Dashboard

A Dashboard is a user-created, high-level view of the operations of interest.

Operation

An Operation is the work represented by a Span.

Organization

Administrative features within LightStep are built around the concept of an Organization. Organizations allow teams to more easily manage projects and per-user access controls. Functionality includes:

  • Centrally manage users and projects for the entire organization
  • New users get access to ALL projects owned by the organization
  • Single Sign-on with Google for one-click sign-in and sign-up
  • Whitelisted Domains allow JIT account provisioning for verified users

Project

An organization can have one or more Projects. A Project encapsulates all LightStep data for a particular environment such as dev or production, spanning team boundaries, languages, clients, servers, and physical locations. Things to note:

  • New Projects will immediately be accessible to everyone in the organization
  • Deleting a project will remove access for everyone

Satellite

The Satellite (formerly called the "Collector") is a LightStep service that manages the collection and aggregation of trace data reported by the LightStep client libraries. Satellites are run on-premises in a customer datacenter or VPC. LightStep also provides a shared Satellite pool (SaaS) for initial product testing and evaluation.

Saved Searches

The former name for Streams.

Streams

Streams (formerly "Saved Searches") are persistent time-series trace data matching a predicate such as a combination of service (component) name, operation, and tag values. They allow analysis of specific facets of the generated tracing data.
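
A Stream's predicate can be pictured as a filter over spans. The sketch below is a hypothetical model with made-up class names, component names, and tag values; it only illustrates how component, operation, and tag conditions combine.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Span:
    component: str
    operation: str
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class StreamPredicate:
    """All specified conditions must hold; unspecified ones match anything."""
    component: Optional[str] = None
    operation: Optional[str] = None
    tags: Dict[str, str] = field(default_factory=dict)

    def matches(self, span: Span) -> bool:
        if self.component is not None and span.component != self.component:
            return False
        if self.operation is not None and span.operation != self.operation:
            return False
        return all(span.tags.get(k) == v for k, v in self.tags.items())


predicate = StreamPredicate(component="checkout", tags={"region": "us-east"})
spans = [
    Span("checkout", "charge_card", {"region": "us-east"}),
    Span("checkout", "charge_card", {"region": "eu-west"}),
    Span("search", "query", {"region": "us-east"}),
]
matching = [s for s in spans if predicate.matches(s)]
```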

SLA

A Service Level Agreement, or SLA, is a contract between a service provider (either internal or external) and the end user that defines the level of service expected from the service provider. SLAs are output-based in that their purpose is specifically to define what the customer will receive.


General Statistical Terms

Cardinality

The number of elements in a set or other grouping, as a property of that grouping.

Tag cardinality - adding tags to your spans can provide additional correlation across components or services. To get the most value out of this additional data it is important to consider what values make good tags.

Tag                                    Guidance
account_id                             Too general
account_id=792_2016-09-27_08-23-15     Too specific
account_id=792                         Good
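
One way to check a tag before adopting it is to count how many distinct values it takes across a sample of spans. The sketch below assumes each span's tags are available as a plain dictionary; the `request_stamp` key is a made-up example of a value that is unique per request.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def tag_cardinality(spans_tags: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Return the number of distinct values observed for each tag key."""
    values: Dict[str, Set[str]] = defaultdict(set)
    for tags in spans_tags:
        for key, value in tags.items():
            values[key].add(value)
    return {key: len(seen) for key, seen in values.items()}


sample = [
    {"account_id": "792", "request_stamp": "792_2016-09-27_08-23-15"},
    {"account_id": "792", "request_stamp": "792_2016-09-27_08-23-16"},
    {"account_id": "801", "request_stamp": "801_2016-09-27_08-23-17"},
]
cardinality = tag_cardinality(sample)
```

Here `account_id` repeats across spans and so can correlate them, while `request_stamp` has a different value on every span and adds little beyond what the individual trace already provides.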

p99 (or pXX)

p99 is an abbreviation for the 99th percentile of a (histogram) distribution. It represents the upper bound of the latencies experienced by 99% of traces. In other words, 99% of traces have a latency at or below the p99 value.
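
As a worked example, the sketch below computes a percentile with the nearest-rank method, which is one of several common definitions and is assumed here for simplicity; production systems typically estimate percentiles from histograms instead of sorting raw samples.

```python
import math
from typing import Sequence


def percentile(latencies: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not latencies:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[max(rank, 1) - 1]


# 100 samples with latencies of 1 ms, 2 ms, ..., 100 ms.
latencies_ms = list(range(1, 101))
p99 = percentile(latencies_ms, 99)
at_or_below = sum(1 for latency in latencies_ms if latency <= p99)
```

For this data set `p99` is 99 ms, and 99 of the 100 samples are at or below it.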