In distributed tracing, a trace is a view into a request as it moves through a distributed system. A span is the building block of a trace and is a named, timed operation that represents a piece of the workflow in the distributed system. Multiple spans are pieced together to create a trace.

In Lightstep (and other observability tools), you view traces as a “tree” of spans that reflects the time that each span started and completed. It also shows you the relationship between spans. Here’s a simplified view of a trace, as it relates to the request above.

A trace starts with a root span where the request starts. This root span can have one or more child spans, and each one of those child spans can have child spans.
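The parent/child structure described above can be sketched in a few lines of Python. This is only an illustration: the flat `(span_id, parent_id, name)` records, the operation names, and the `build_tree` and `render` helpers are assumptions made for this example, not a Lightstep or OpenTelemetry API.

```python
from collections import defaultdict

# Hypothetical flat span records: (span_id, parent_span_id, operation name).
# A parent of None marks the root span.
spans = [
    ("a", None, "HTTP GET /checkout"),
    ("b", "a", "auth.verify"),
    ("c", "a", "cart.load"),
    ("d", "c", "db_query"),
]


def build_tree(spans):
    """Index spans by parent ID and return (root_id, children, names)."""
    children = defaultdict(list)
    names = {}
    root = None
    for span_id, parent_id, name in spans:
        names[span_id] = name
        if parent_id is None:
            root = span_id
        else:
            children[parent_id].append(span_id)
    return root, children, names


def render(span_id, children, names, depth=0):
    """Return the tree as indented lines, one line per span."""
    lines = ["  " * depth + names[span_id]]
    for child in children[span_id]:
        lines.extend(render(child, children, names, depth + 1))
    return lines


root, children, names = build_tree(spans)
tree_lines = render(root, children, names)
print("\n".join(tree_lines))
```

The root span is the only record without a parent; every other span is attached under the span whose ID it names as its parent.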

The purpose of a span is to provide information to observability tools about the execution of a program, so it should contain details about the work.

The components of an individual span include:

  • Operation name
  • Start and finish timestamps
  • SpanContext
  • Set of Attributes
  • Ordered list of Events
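To make the list above concrete, here is one possible way to model those components as plain Python data classes. The class and field names are assumptions chosen for readability, and the sample values are made up; real tracing libraries define their own types.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SpanContext:
    trace_id: str  # identifies the trace the span belongs to
    span_id: str   # identifies this span within the trace


@dataclass
class Event:
    name: str
    timestamp: float
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Span:
    operation_name: str
    start_time: float
    context: SpanContext
    finish_time: Optional[float] = None  # None while the span is still open
    attributes: Dict[str, Any] = field(default_factory=dict)
    events: List[Event] = field(default_factory=list)

    def duration(self) -> Optional[float]:
        """Elapsed time, or None if the span has not finished."""
        if self.finish_time is None:
            return None
        return self.finish_time - self.start_time


# Illustrative values only.
span = Span("render_page", start_time=2.0, context=SpanContext("t-1", "s-1"))
span.attributes["http.method"] = "GET"
span.events.append(Event("cache-miss", timestamp=3.5))
span.finish_time = 9.0
print(span.duration())
```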

SpanContext

For the trace tree to be built with these relationships intact, each span needs to propagate its SpanContext to its children. The SpanContext may be sent along with a request to a remote system, or passed to another span generated by the same system. It tells the child span who its parent is (via the span_id) and what trace it belongs to (via the trace_id).
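A minimal sketch of that hand-off, assuming randomly generated hex IDs and two made-up header names (`x-trace-id` and `x-span-id`). Production systems normally rely on a standard propagation format supplied by their tracing library rather than hand-rolled headers like these.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str


def start_span(parent: Optional[SpanContext] = None) -> Tuple[SpanContext, Optional[str]]:
    """Create a context for a new span and return it with its parent's span ID."""
    span_id = secrets.token_hex(8)
    if parent is None:
        # Root span: begins a new trace and has no parent.
        return SpanContext(secrets.token_hex(16), span_id), None
    # Child span: inherits the trace ID and remembers who its parent is.
    return SpanContext(parent.trace_id, span_id), parent.span_id


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Copy the context into outgoing request headers (hypothetical names)."""
    headers["x-trace-id"] = ctx.trace_id
    headers["x-span-id"] = ctx.span_id


def extract(headers: Dict[str, str]) -> SpanContext:
    """Rebuild the caller's context on the receiving side."""
    return SpanContext(headers["x-trace-id"], headers["x-span-id"])


# The calling service starts the root span and sends its context along.
root_ctx, root_parent = start_span()
headers: Dict[str, str] = {}
inject(root_ctx, headers)

# The remote service continues the same trace as a child of the caller's span.
child_ctx, child_parent = start_span(extract(headers))
print(child_ctx.trace_id == root_ctx.trace_id, child_parent == root_ctx.span_id)
```

The same `start_span(parent)` call covers the in-process case: pass the parent's context directly instead of going through headers.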

Attributes

Attributes are key-value pairs that provide detail about the span. They apply to the whole span and don’t include timestamps (use an Event for information about events that happen at a specific time). Attributes allow you to query, group, or otherwise analyze traces and spans.
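As a rough illustration of querying and grouping by attribute, the sketch below filters and counts a handful of made-up spans held as dictionaries. The span records and the `spans_where` and `count_by` helpers are assumptions for this example; an observability tool would run this kind of query for you.

```python
from collections import Counter

# Made-up spans, each with a name and a set of attributes.
spans = [
    {"name": "load_cart", "attributes": {"db.type": "cassandra"}},
    {"name": "load_user", "attributes": {"db.type": "mysql"}},
    {"name": "save_cart", "attributes": {"db.type": "cassandra"}},
    {"name": "render", "attributes": {}},
]


def spans_where(spans, key, value):
    """Query: return the spans whose attribute `key` equals `value`."""
    return [s for s in spans if s["attributes"].get(key) == value]


def count_by(spans, key):
    """Group: count spans per value of attribute `key`, skipping spans without it."""
    return Counter(s["attributes"][key] for s in spans if key in s["attributes"])


cassandra_spans = spans_where(spans, "db.type", "cassandra")
by_db_type = count_by(spans, "db.type")
print([s["name"] for s in cassandra_spans])
print(dict(by_db_type))
```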

The OpenTelemetry Spec defines standard attributes that you should use (for example, StatusCode and SpanKind).

Status Code

Status Code is a special, standardized attribute for a span. It may be set to values like OK, Cancelled, and Permission Denied.

SpanKind

SpanKind is another standardized attribute. It provides useful performance context in a trace: does this span call a remote system? Does it serve requests from remote systems? Does it do work asynchronously from a queue? All of this information is useful in performance analysis. The supported values of SpanKind are CLIENT, SERVER, PRODUCER, CONSUMER, and INTERNAL.
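One way to picture the five values is as an enumeration, with a short note on the question each value answers. The descriptions and the `is_asynchronous` helper below are this example's own reading of the paragraph above, not text from the OpenTelemetry Spec.

```python
from enum import Enum


class SpanKind(Enum):
    CLIENT = "CLIENT"
    SERVER = "SERVER"
    PRODUCER = "PRODUCER"
    CONSUMER = "CONSUMER"
    INTERNAL = "INTERNAL"


# Informal descriptions (assumed for this sketch).
DESCRIPTION = {
    SpanKind.CLIENT: "calls a remote system",
    SpanKind.SERVER: "serves a request from a remote system",
    SpanKind.PRODUCER: "puts work on a queue for later processing",
    SpanKind.CONSUMER: "processes work taken from a queue",
    SpanKind.INTERNAL: "does work that stays inside the application",
}


def is_asynchronous(kind: SpanKind) -> bool:
    """True for the kinds that hand off or pick up queued work."""
    return kind in (SpanKind.PRODUCER, SpanKind.CONSUMER)


for kind in SpanKind:
    print(f"{kind.value}: {DESCRIPTION[kind]}")
```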

User-Defined

You can also create your own attribute key/value pairs so that the information you know you’ll need to understand your system is available to you.

Here are a few examples:

  • db.type: cassandra
  • db.url: mysql://db.example.com:3306
  • net.transport: IP.TCP
  • net.peer.ip: 127.0.0.1

When you follow these conventions, your observability tools can give you more useful information.

Events

Events contain a name, a timestamp, and an optional set of Attributes. Each Event represents something that occurred at a specific time within a span’s workload.

Here are some examples of events:

  • t:3, name:log, message:“retrieved 400 records”
  • t:5, name:image-generated, image.x:408, image.y:552, image.size:2055 KB
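The two sample events can be recorded as follows. This is a sketch under assumptions: the `Event` and `EventLog` classes are invented for this example, and timestamps are passed in explicitly rather than read from a clock.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Event:
    timestamp: float
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)


class EventLog:
    """Keeps a span's events ordered by timestamp."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def add(self, timestamp: float, name: str,
            attributes: Optional[Dict[str, Any]] = None) -> None:
        self._events.append(Event(timestamp, name, attributes or {}))
        # Re-sort so the list stays ordered even if events arrive out of order.
        self._events.sort(key=lambda e: e.timestamp)

    def names(self) -> List[str]:
        return [e.name for e in self._events]

    def between(self, start: float, end: float) -> List[Event]:
        """Events whose timestamp falls within [start, end]."""
        return [e for e in self._events if start <= e.timestamp <= end]


log = EventLog()
log.add(5, "image-generated",
        {"image.x": 408, "image.y": 552, "image.size": "2055 KB"})
log.add(3, "log", {"message": "retrieved 400 records"})
print(log.names())
```

Unlike an Attribute, each Event carries its own timestamp, so you can ask what happened during a particular slice of the span with `log.between(start, end)`.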

An example

Here’s an example of a span:

    t=0            operation name: db_query               t=10

     +-----------------------------------------------------+
     | · · · · · · · · · ·    Span     · · · · · · · · · · |
     +-----------------------------------------------------+

Status: Unavailable

Attributes:
- db.instance:"customers"
- db.statement:"SELECT * FROM mytable WHERE foo='bar'"
- peer.address:"mysql://127.0.0.1:3306/customers"

Events:
- (t=4): message:"Can't connect to mysql server on '127.0.0.1'(10061)"

Parent:
  SpanContext:
  - trace_id:"abc123"
  - span_id:"xyz654"

SpanContext:
- trace_id:"abc123"
- span_id:"xyz789"

And here’s an example of a trace made of spans: