Satellite Diagnostics

The Satellite diagnostics service is a sidecar process that runs alongside the Satellite, tracks its health, and provides diagnostic information.

The service runs by default on port 8000 at the /diagnostics endpoint. In Docker, the port is set with the COLLECTOR_BABYSITTER_PORT environment variable; in the collector.yaml file, it is set with babysitter_port. For more information, see the Satellite Setup article as well as Reporting Errors.

To access the diagnostics page, go to satellite-host:8000/diagnostics. You will see a list of checks and logs for the given collector.
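As a rough illustration, the diagnostics page can also be fetched from a script. The sketch below assumes plain HTTP and the default port 8000 described above; the function names diagnostics_url and fetch_diagnostics are hypothetical helpers for this example, not part of any LightStep tooling.

```python
from urllib.parse import urlunsplit
from urllib.request import urlopen


def diagnostics_url(host: str, port: int = 8000, scheme: str = "http") -> str:
    """Build the diagnostics page URL for a Satellite host.

    The default port (8000) and the /diagnostics path come from the text above;
    the scheme is an assumption and may need to be "https" in your deployment.
    """
    return urlunsplit((scheme, f"{host}:{port}", "/diagnostics", "", ""))


def fetch_diagnostics(host: str, port: int = 8000, timeout: float = 5.0) -> str:
    """Return the raw body of the diagnostics page.

    Requires network access to the Satellite; raises urllib.error.URLError
    if the host cannot be reached.
    """
    with urlopen(diagnostics_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, diagnostics_url("satellite-host") yields http://satellite-host:8000/diagnostics, and passing a different port covers Satellites where babysitter_port has been changed.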

Health/Readiness and Liveness Checks

Health/Readiness Checks are used by load balancers to determine whether a Satellite is currently healthy and available to handle incoming span traffic.

Liveness endpoints are used by orchestration frameworks like Kubernetes to determine when a Satellite is unable to respond and needs to be restarted. This is a lower-confidence indication of Satellite health than the readiness checks.

Each check has a mark next to its name: ✓ means that the check is okay and ✘ means there is an issue.

  • ready / healthy checks fail when the Satellite has too many queued span reports (load balancers should try a different Satellite)

The ready/health check endpoint can also be reached at http(s)://{satellite host}:{admin port}/_ready. Load balancers can use this endpoint for health checks. A 200 (OK) response indicates that the Satellite is both responding and healthy (able to handle incoming span traffic).

  • alive (formerly: healthy) checks fail when a Satellite is not able to respond to HTTP requests at all (orchestrators should restart the instance)

The alive check endpoint can also be reached at http(s)://{satellite host}:{admin port}/_live. Orchestration tools (e.g. Kubernetes) can use this endpoint for liveness checks. A 200 (OK) response indicates that the Satellite is alive and responding, and that the instance should not be terminated. However, it makes no promises about Satellite health: the Satellite could be temporarily overloaded and not accepting spans.
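The difference between the two endpoints can be sketched as a small decision routine. This is only an illustration of the semantics described above: the helper names (probe, recommended_action, check_satellite) and the action labels are hypothetical, the admin port is left as a required argument because its value depends on your configuration, and anything other than a 200 response is treated as a failed check.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True only if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers non-2xx responses (HTTPError), refused connections, and timeouts.
        return False


def recommended_action(ready: bool, live: bool) -> str:
    """Map the two check results to what a caller would typically do."""
    if not live:
        # Not responding to HTTP at all: an orchestrator should restart it.
        return "restart"
    if not ready:
        # Alive but overloaded: a load balancer should try another Satellite.
        return "route-elsewhere"
    return "serve"


def check_satellite(host: str, admin_port: int, scheme: str = "http") -> str:
    """Probe /_live and /_ready on one Satellite and return the suggested action."""
    base = f"{scheme}://{host}:{admin_port}"
    live = probe(f"{base}/_live")
    # There is no point asking about readiness if the instance is not alive.
    ready = live and probe(f"{base}/_ready")
    return recommended_action(ready, live)
```

In practice the load balancer and the orchestrator each perform only their own half of this logic; combining both here is just a way to show that a failed liveness check is the more severe signal.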

Configuration

The Satellite diagnostics service keeps track of whether the configuration has been parsed and shows the current Satellite configuration. The API key is omitted from this output in order to preserve secrets.

Connection Status

Satellites need to communicate with two different LightStep endpoints to work:

  • api.lightstep.com
  • api-grpc.lightstep.com

Note that these will report as "not connected" if no data has been sent to the Satellite yet.
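When the diagnostics page shows an endpoint as "not connected" even though spans have been sent, one possible first step is to confirm that the Satellite host can open a network connection to both endpoints. The sketch below is a hypothetical helper, not part of the Satellite; it assumes the endpoints accept TCP connections on port 443, which is not stated in this article and may differ in your environment. A successful TCP connection only shows basic reachability and is not the same as the "connected" status reported by the diagnostics page.

```python
import socket
from typing import Callable, Dict, Iterable

# The two endpoints named in the text above.
LIGHTSTEP_ENDPOINTS = ("api.lightstep.com", "api-grpc.lightstep.com")


def tcp_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Port 443 is an assumption made for this example.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Includes DNS failures, refused connections, and timeouts.
        return False


def connection_report(
    hosts: Iterable[str] = LIGHTSTEP_ENDPOINTS,
    check: Callable[[str], bool] = tcp_reachable,
) -> Dict[str, bool]:
    """Run the reachability check against each endpoint and collect the results."""
    return {host: check(host) for host in hosts}
```

Calling connection_report() from the machine that runs the Satellite returns a dictionary keyed by endpoint name; the check argument is injectable so that the function can be exercised without network access.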

Diagnostics Bundle

If the information shown isn't enough to make a quick diagnosis, the diagnostics service can generate a tarball with additional information. The tarball includes the checks and logs shown above plus various profiles from the Satellite.