Lightstep Microsatellites generate several helpful StatsD-based metrics that you can send to any compliant monitoring system. If you already use Datadog as your monitoring system, you can add tags to provide more context for your metrics.
Enabling StatsD Microsatellite metrics
You can turn on metrics reporting when you configure your Satellites. Here are examples of that configuration using StatsD or Datadog, for both AWS/Debian and Docker.
StatsD example
Docker
# Required
COLLECTOR_STATSD_HOST=127.0.0.1
COLLECTOR_STATSD_PORT=8125
COLLECTOR_STATSD_EXPORT_STATSD=true
# Recommended
COLLECTOR_STATSD_PREFIX=lightstep.prod.us-west-1
# Optional
COLLECTOR_STATSD_SATELLITE_PREFIX=satellite-canary
COLLECTOR_STATSD_CLIENT_PREFIX=client-via-canary
AWS or Debian
statsd:
  # Required
  host: 127.0.0.1
  port: 8125
  export_statsd: true
  # Recommended
  prefix: "lightstep.prod.us-west-1"
  # Optional
  satellite_prefix: "satellite-canary"
  client_prefix: "client-via-canary"
Datadog example
Docker
# Required
COLLECTOR_STATSD_HOST=127.0.0.1
COLLECTOR_STATSD_PORT=8125
COLLECTOR_STATSD_EXPORT_DOGSTATSD=true
# Recommended
COLLECTOR_STATSD_PREFIX=lightstep.prod.us-west-1
# Optional
COLLECTOR_STATSD_SATELLITE_PREFIX=satellite-canary
COLLECTOR_STATSD_CLIENT_PREFIX=client-via-canary
COLLECTOR_STATSD_DOGSTATSD_TAGS="env:prod,pool:us-west-1,canary:true"
AWS or Debian
statsd:
  # Required
  host: 127.0.0.1
  port: 8125
  export_dogstatsd: true
  # Recommended
  prefix: "lightstep.prod.us-west-1"
  # Optional
  satellite_prefix: "satellite-canary"
  client_prefix: "client-via-canary"
  dogstatsd_tags: "env:prod,pool:us-west-1,canary:true"
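The dogstatsd_tags value in the Datadog example is a comma-separated list of key:value pairs. As a minimal sketch, the following Python helper splits such a string into a dictionary so you can sanity-check a value before putting it in your configuration. The function name is hypothetical (it is not part of any Lightstep or Datadog tooling), and it assumes every tag has the key:value form used in the example.

```python
def parse_dogstatsd_tags(raw: str) -> dict[str, str]:
    """Split a 'key:value,key:value' tag string into a dict.

    Hypothetical helper for checking a tag string; it assumes every
    tag is a key:value pair, as in the example configuration.
    """
    tags: dict[str, str] = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        key, sep, value = item.partition(":")
        if not sep or not key or not value:
            raise ValueError(f"expected key:value, got {item!r}")
        tags[key] = value
    return tags


print(parse_dogstatsd_tags("env:prod,pool:us-west-1,canary:true"))
```

A malformed entry (for example, a tag with no colon) raises ValueError instead of being silently dropped.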
Available metrics
The following are the metrics that Microsatellites report. Metrics that are important to Microsatellite and Lightstep Observability health are noted, with advice on when to alert and how to resolve the issue.
A note about project names in metrics:
* Many of these metrics are automatically labeled with a Lightstep Observability project name, so the resulting time series can be grouped by project, if desired.
* For basic StatsD metrics, the project becomes part of the metric name itself, for example: satellite.spans.received.my_lightstep_project_name
* For Datadog metrics, the project name is attached using a tag called lightstep_project on the relevant metrics. In the metric listings below, a tag is indicated with the syntax {tag_name}.
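The two naming schemes in the note above can be sketched as follows. This is an illustrative Python helper, not Lightstep code: the function name and its parameters are assumptions, and it only mirrors the name templates shown for each metric on this page. Metrics listed without a project segment (such as bytes.received.thrift) are not covered.

```python
from typing import Optional


def expected_metric(prefix: str, component_prefix: str, metric: str,
                    project: str, dogstatsd: bool) -> tuple[str, Optional[str]]:
    """Return (metric_name, tag) as they should appear in your backend.

    component_prefix is the satellite_prefix or client_prefix setting.
    """
    base = f"{prefix}.{component_prefix}.{metric}"
    if dogstatsd:
        # Datadog: the project travels as a tag, not in the name.
        return base, f"lightstep_project:{project}"
    # Basic StatsD: the project is the last segment of the name.
    return f"{base}.{project}", None


print(expected_metric("lightstep.prod.us-west-1", "satellite-canary",
                      "spans.received", "my_lightstep_project_name", False))
```

Knowing the expected name ahead of time makes it easier to find the series in your monitoring system after you enable reporting.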
client.spans.dropped
The number of spans dropped at the client because its outgoing queue is full while it is still trying to send earlier spans to a Microsatellite.
Values are cumulative and can be aggregated across Microsatellites and projects.
Consider monitoring this metric
Why monitor: The value of this metric represents how many spans the client can’t send to Microsatellites because its outgoing queue is full. When tracer clients can’t send spans to Microsatellites, the product experience may be compromised due to incomplete traces and incomplete statistics.
Alert Thresholds: Any value above 0 indicates some amount of data loss. We recommend setting alerts for when the value remains above 0 for an extended period.
Remediations: First try tuning the buffer size of the tracer client library by following these instructions. If the problem persists, audit your instrumentation to ensure you aren’t “over-instrumenting” by sending too many low-value (or accidental) spans.
Type: Count
Since: 2018-10-03_18-47-12Z
StatsD
<prefix>.<client_prefix>.spans.dropped.<lightstep_project>
Datadog
<prefix>.<client_prefix>.spans.dropped
{lightstep_project}
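The "remains above 0 for an extended period" rule used in the alert thresholds can be sketched as below. This is a minimal illustration in Python, assuming you already have a list of per-interval values for the metric from your monitoring system. The function name and the default of five consecutive intervals are arbitrary choices, not Lightstep recommendations.

```python
def sustained_above_zero(values: list[float], min_consecutive: int = 5) -> bool:
    """True if the most recent min_consecutive samples are all above 0."""
    if min_consecutive <= 0:
        raise ValueError("min_consecutive must be positive")
    if len(values) < min_consecutive:
        return False  # not enough history to call it sustained
    return all(v > 0 for v in values[-min_consecutive:])


# A brief burst of drops, followed by a sustained run of drops.
print(sustained_above_zero([0, 3, 0, 0, 0, 0]))
print(sustained_above_zero([0, 2, 4, 1, 7, 3]))
```

Requiring several consecutive non-zero intervals avoids paging on a single short burst while still catching ongoing data loss.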
satellite.access_tokens.invalid
The number of reports (i.e., batches of spans) that have been rejected by the Microsatellite due to an invalid access token.
Values are cumulative and can be aggregated across Microsatellites and projects.
Type: Count
Since: 2018-11-19_17-15-06Z
StatsD
<prefix>.<satellite_prefix>.access_tokens.invalid.<lightstep_project>
Datadog
<prefix>.<satellite_prefix>.access_tokens.invalid
{lightstep_project}
satellite.bytes.received.thrift
The total bytes of Thrift span traffic received over the network by the Microsatellite. You can use this metric to tune your tracer if you’re seeing dropped spans from the client.
Values are cumulative and can be aggregated across Microsatellites and projects.
Type: Count
Since: 2018-10-03_18-47-12Z
StatsD
<prefix>.<satellite_prefix>.bytes.received.thrift
satellite.bytes.received.grpc
The total bytes of gRPC span traffic received by the Microsatellite over the network. You can use this metric to tune your tracer if you’re seeing dropped spans from the client.
Values are cumulative and can be aggregated across Microsatellites and projects.
Type: Count
Since: 2018-10-03_18-47-12Z
StatsD
<prefix>.<satellite_prefix>.bytes.received.grpc
satellite.spans.received
The total number of spans that the Microsatellite received and decoded. This value is counted before any sampling you may have configured (the post-sampling count is reported by <satellite_prefix>.spans.indexed) and also includes any spans that the Microsatellite may yet drop due to insufficient resources (<satellite_prefix>.spans.dropped).
Values are cumulative and can be aggregated across Microsatellites and projects.
Type: Count
Since: 2018-10-03_18-47-12Z
StatsD
<prefix>.<satellite_prefix>.spans.received.<lightstep_project>
Datadog
<prefix>.<satellite_prefix>.spans.received
{lightstep_project}
satellite.spans.dropped
The total number of spans that the Microsatellite dropped due to insufficient resources (after being received and decoded). These spans are not indexed or added to the statistics for streams.
Values are cumulative and can be aggregated across Microsatellites and projects.
Consider monitoring this metric
Why monitor: The value of this metric represents how many spans the Microsatellite is unable to process due to insufficient resources. When spans cannot be processed, the product experience may be compromised due to incomplete traces and incomplete statistics.
Alert Thresholds: Any value above 0 indicates some amount of data loss. We recommend setting alerts for when the value remains above 0 for an extended period. It might also be helpful to alert when the percentage of received spans that are subsequently dropped exceeds 2% (configurable given your tolerance), that is, when satellite.spans.dropped / satellite.spans.received > 0.02.
Remediations: If the problem persists, try adding more Microsatellites.
Type: Count
Since: 2018-10-03_18-47-12Z
StatsD
<prefix>.<satellite_prefix>.spans.dropped.<lightstep_project>
Datadog
<prefix>.<satellite_prefix>.spans.dropped
{lightstep_project}
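The percentage-based threshold for satellite.spans.dropped can be sketched as a small check. This Python example is illustrative only: the function name is hypothetical, and it assumes both inputs are totals for the same time window, already fetched from your monitoring system.

```python
def drop_ratio_exceeded(dropped: float, received: float,
                        threshold: float = 0.02) -> bool:
    """True if dropped / received is above threshold for one window."""
    if received <= 0:
        # No traffic in the window: nothing to compare against.
        return False
    return dropped / received > threshold


print(drop_ratio_exceeded(dropped=30, received=1000))  # 3% dropped
print(drop_ratio_exceeded(dropped=10, received=1000))  # 1% dropped
```

The guard for an empty window avoids a division by zero when a Microsatellite receives no spans during the interval.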
satellite.index.queue.length
The number of reports (i.e., batches of spans) that have been read from the network and are currently waiting to be indexed.
This value is instantaneous (non-cumulative).
Type: Gauge
Since: 2018-10-03_18-47-12Z
StatsD
<prefix>.<satellite_prefix>.index.queue.length.<lightstep_project>
Datadog
<prefix>.<satellite_prefix>.index.queue.length
{lightstep_project}
satellite.index.queue.bytes
The number of bytes worth of reports that are currently waiting to be indexed (size of index.queue.length in bytes).
This value is instantaneous (non-cumulative).
Type: Gauge
Since: 2018-10-03_18-47-12Z
StatsD
<prefix>.<satellite_prefix>.index.queue.bytes.<lightstep_project>
Datadog
<prefix>.<satellite_prefix>.index.queue.bytes
{lightstep_project}
satellite.spans.indexed
The number of spans that are successfully ingested by the Microsatellite and can be viewed in Lightstep Observability or assembled into traces.
If Microsatellites are configured to use the sample_one_in_n parameter, this metric represents the number of spans that remain after down-sampling. See spans.received for pre-sampled counts.
Values are cumulative and can be aggregated across instances and projects.
Aggregate statistics in Streams and Histograms will be scaled up automatically to account for the sampling ratio.
Type: Count
Since: 2021-01-26_23-02-36Z
StatsD
<prefix>.<satellite_prefix>.spans.indexed.<lightstep_project>
Datadog
<prefix>.<satellite_prefix>.spans.indexed
{lightstep_project}
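Lightstep scales aggregate statistics for you, so the following Python sketch only illustrates the arithmetic behind a sampling ratio. It assumes that sample_one_in_n keeps roughly one span out of every n, which is inferred from the parameter name rather than stated on this page. The function name is hypothetical.

```python
def estimate_presampled_spans(indexed: int, sample_one_in_n: int) -> int:
    """Estimate the pre-sampling span count from the indexed count."""
    if sample_one_in_n < 1:
        raise ValueError("sample_one_in_n must be at least 1")
    return indexed * sample_one_in_n


print(estimate_presampled_spans(indexed=2500, sample_one_in_n=4))
```

This is only an estimate. For the measured pre-sampling count, use spans.received, keeping in mind that it also includes spans that are later dropped.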
satellite.bytes.indexed
The total bytes for spans that are successfully ingested by the Microsatellite and can be viewed in Lightstep Observability or assembled into traces.
If Microsatellites are configured to use the sample_one_in_n parameter, this metric represents the total size in bytes that remain after down-sampling. See spans.received for pre-sampled counts.
Values are cumulative and can be aggregated across instances and projects.
Aggregate statistics in Streams and Histograms will be scaled up automatically to account for the sampling ratio.
Type: Count
Since: 2021-01-26_23-02-36Z
StatsD
<prefix>.<satellite_prefix>.bytes.indexed.<lightstep_project>
Datadog
<prefix>.<satellite_prefix>.bytes.indexed
{lightstep_project}
satellite.starts
The number of times this Microsatellite has been restarted (including the initial start). Increments by one for each restart.
Type: Count
Since: 2021-01-26_23-02-36Z
StatsD
<prefix>.<satellite_prefix>.starts.<lightstep_project>
Datadog
<prefix>.<satellite_prefix>.starts
{lightstep_project}
forward_spans.dropped
The total number of spans that were dropped while being forwarded from the Microsatellite to the Lightstep Observability platform. These spans won’t be available for trace assembly.
Values are cumulative and can be aggregated across Microsatellites and projects.
Consider monitoring this metric
Why monitor: The value of this metric represents how many spans the Microsatellite is unable to forward to the Lightstep SaaS for analysis.
Alert Thresholds: Any value above 0 indicates some amount of data loss. We recommend setting alerts for when the value remains above 0 for an extended period.
Remediations: If the problem persists, try increasing the Microsatellite’s memory or adding more Microsatellites.
Type: Count
Since: 2021-03-22_13-16-05z
StatsD
<prefix>.<satellite_prefix>.forward_spans.dropped.<lightstep_project>
Datadog
<prefix>.<satellite_prefix>.forward_spans.dropped
{lightstep_project}
forward_spans.dropped.size_exceeded
Spans dropped when being sent to Lightstep Observability because they exceed the maximum span size (128 KB).
Values are cumulative and can be aggregated across Microsatellites and projects.
Consider monitoring this metric
Why monitor: The value of this metric represents how many spans the Microsatellite is unable to process because the span size is over 128 KB. When spans cannot be processed, the product experience may be compromised due to incomplete traces and incomplete statistics.
Alert Thresholds: Any value above 0 indicates some amount of data loss. We recommend setting alerts for when the value remains above 0 for an extended period. It might also be helpful to alert when the percentage of received spans that are subsequently dropped exceeds 2% (configurable given your tolerance), that is, when forward_spans.dropped.size_exceeded / satellite.spans.received > 0.02.
Remediations: If the problem persists, try reducing the size of the spans.
Type: Count
Since: 2021-03-22_13-16-05z
StatsD
<prefix>.<satellite_prefix>.forward_spans.dropped.size_exceeded.<lightstep_project>
Datadog
<prefix>.<satellite_prefix>.forward_spans.dropped.size_exceeded
{lightstep_project}
forward_spans.request.compressed_bytes.sum
Total bytes (compressed) emitted from the Microsatellite.
Values are cumulative and can be aggregated across Microsatellites and projects.
Consider monitoring this metric
Why monitor: The value of this metric represents how many compressed bytes of span data the Microsatellite emits, which can be useful when looking at network egress costs.
Type: Count
Since: 2022-04-22_21-58-06Z
StatsD
<prefix>.<satellite_prefix>.forward_spans.request.compressed_bytes.sum.<lightstep_project>
Datadog
<prefix>.<satellite_prefix>.forward_spans.request.compressed_bytes.sum
{lightstep_project}
forward_spans.request.compressed_bytes.failed
The number of compressed span bytes that failed to send from the Microsatellite to the Lightstep SaaS.
Values are cumulative and can be aggregated across Microsatellites and projects.
Consider monitoring this metric
Why monitor: The value of this metric represents how many bytes of compressed span data the Microsatellite is unable to send due to errors. The metric includes the code tag, whose value is the error code received.
Alert Thresholds: Any value above 0 indicates some amount of data loss. We recommend setting alerts for when the value remains above 0 for an extended period. It might also be helpful to alert when the percentage of compressed bytes that fail to send exceeds 10% (configurable given your tolerance), that is, when forward_spans.request.compressed_bytes.failed / forward_spans.request.compressed_bytes.sum > 0.10.
Type: Count
Since: 2022-04-22_21-58-06Z
StatsD
<prefix>.<satellite_prefix>.forward_spans.request.compressed_bytes.failed.<lightstep_project>
Datadog
<prefix>.<satellite_prefix>.forward_spans.request.compressed_bytes.failed
{lightstep_project}
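Because the failed-bytes metric carries a code tag, the failure ratio can be broken down per error code. The Python sketch below is one way to do that, assuming you have exported (code, bytes) samples and the compressed_bytes.sum total for the same time window. The function name is hypothetical, and the code values in the sample data ("503", "429") are made up for illustration; the actual values depend on the errors your Microsatellites receive.

```python
from collections import defaultdict


def failed_ratio_by_code(failed: list[tuple[str, float]],
                         total_bytes: float) -> dict[str, float]:
    """Map each error code to its share of total compressed bytes.

    failed holds (code, bytes) samples for one time window.
    """
    if total_bytes <= 0:
        return {}
    per_code: dict[str, float] = defaultdict(float)
    for code, nbytes in failed:
        per_code[code] += nbytes
    return {code: nbytes / total_bytes for code, nbytes in per_code.items()}


# Made-up sample data: (code tag value, failed compressed bytes).
samples = [("503", 4096), ("503", 2048), ("429", 1024)]
ratios = failed_ratio_by_code(samples, total_bytes=40960)
# Keep only the codes above the 10% threshold.
print({code: r for code, r in ratios.items() if r > 0.10})
```

Splitting the ratio by code shows whether a single class of error accounts for most of the failed bytes, which can narrow down where to look first.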