Troubleshoot missing data in Cloud Observability

Telemetry is missing or incomplete

Data can appear incomplete or not appear in Cloud Observability for several reasons. Here are some troubleshooting steps for the most common causes.

Check your access token

Cloud Observability uses access tokens to authorize customers to send telemetry. If the access token is expired, disabled, or typed incorrectly, Cloud Observability rejects the data.

Confirm that your access token is enabled in Organization settings, that the value in your configuration matches the value in the UI, and that the token hasn’t expired.
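A mistyped token is often a copy-and-paste problem such as stray whitespace or literal quote characters. The following is a minimal sketch of that comparison, not an official tool: the function name `token_problems` and both arguments are hypothetical, and it assumes you have the configured value and the value shown in the UI available as strings. It does not check whether the token is enabled or expired; confirm those in Organization settings.

```python
from __future__ import annotations


def token_problems(configured: str, expected: str) -> list[str]:
    """Return reasons why a configured token may not match the UI value.

    Hypothetical helper: `configured` is the string read from your own
    configuration, `expected` is the value copied from the UI.
    """
    problems = []

    # Trailing newlines and spaces are easy to pick up from files or env vars.
    if configured != configured.strip():
        problems.append("leading or trailing whitespace")

    stripped = configured.strip()

    # A value such as '"abc"' usually means the quotes were taken literally.
    if len(stripped) >= 2 and stripped[0] == stripped[-1] and stripped[0] in "\"'":
        problems.append("wrapped in literal quote characters")

    # After removing the cosmetic problems, the value itself must still match.
    if stripped.strip("\"'") != expected:
        problems.append("value differs from the token shown in the UI")

    return problems
```

An empty list means the two strings match exactly. Avoid printing or logging the token itself when running a check like this.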

Verify the time window in the UI

Data displayed in the Cloud Observability UI is scoped to a specific start and end time. Use the date and time picker to double-check that the time window set on charts and queries is correct.

For very long windows, older data may no longer be available because of retention policies.
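The interaction between a query window and retention can be reasoned about with simple date arithmetic. The sketch below is illustrative only: `window_coverage` is a hypothetical helper, and `retention_days` is whatever retention period applies to your data, not a value taken from this page.

```python
from datetime import datetime, timedelta, timezone


def window_coverage(start, end, retention_days, now=None):
    """Classify a query window as 'full', 'partial', or 'none'.

    'full'    - the whole window is newer than the retention cutoff
    'partial' - the window starts before the cutoff but ends after it
    'none'    - the whole window is older than the retention cutoff
    """
    if start >= end:
        raise ValueError("start must be before end")

    now = now or datetime.now(timezone.utc)
    oldest_retained = now - timedelta(days=retention_days)

    if start >= oldest_retained:
        return "full"
    if end <= oldest_retained:
        return "none"
    return "partial"
```

A result of "partial" or "none" suggests that gaps at the start of a chart come from retention rather than from a reporting problem.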

Check the Reporting Status page and metric details page

If you’re using Microsatellites to send data, the Reporting Status page provides a summary of services actively sending traces to Cloud Observability.

If you’re using OpenTelemetry Collectors, check the Collector health page to ensure your Collectors are up to date and sending data.

For metrics, you can visit the metric details page to see actively reporting metrics.

Verify configuration and health of data pipeline components

Customers typically send data to Cloud Observability through one or more intermediate components, such as the OpenTelemetry Collector, Cloud Observability Microsatellites, or AWS CloudWatch Metric Streams. The health and configuration of these components are frequent causes of missing or dropped telemetry.

Understanding how data flows end to end into Cloud Observability is critical for troubleshooting. For each “hop” of telemetry data through a Collector, Microsatellite, or other component in your data pipeline, it’s important to verify the following:

  • How is the telemetry being ingested into this component?
  • How is the telemetry being modified (e.g., sampling or redacting) by this component?
  • How is the telemetry being exported from this component?
  • What format is the telemetry in?
  • How is the next hop configured?
  • Are there any network policies that prevent data from getting in or out?
  • Are there error messages in the logs of this component?
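For the network question in the list above, a basic TCP connection test from the host running one component to the next hop can rule out firewalls or network policies. This is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper, and the host and port you pass are whatever your own pipeline uses. It only proves that a TCP connection can be opened. It does not verify TLS, authorization, or that the next hop accepts the telemetry format.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and attempts the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_connect("collector.example.internal", 4317)` would test a placeholder Collector host on 4317, the default OTLP gRPC port. Run the check from the same host or container as the sending component so that it is subject to the same network policies.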

OpenTelemetry Collectors and Microsatellites are also constrained by hardware resources, including memory, network bandwidth, and CPU. For more on health, monitoring, and troubleshooting of specific components, see Verify and test microsatellite setup and the Collector health page, as well as the OpenTelemetry Collector Troubleshooting Guide on GitHub.

Verify your data retention and sampling policies

Sampling intentionally drops data for performance or cost reasons. If sampling is set to a low value in Cloud Observability, the OpenTelemetry Collector, or Cloud Observability Microsatellites, data is intentionally not sent to (or processed by) Cloud Observability.
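When more than one component samples, the reductions compound. As a rough illustration, and assuming each hop applies independent probabilistic sampling (which may not describe your actual configuration), the fraction of data that survives is the product of the per-hop rates. The function name and the example rates below are hypothetical.

```python
from math import prod


def effective_sampling_rate(rates):
    """Multiply per-hop sampling rates (each between 0.0 and 1.0).

    Assumes every hop samples independently, so the fraction of data
    that survives the whole pipeline is the product of the rates.
    """
    rates = list(rates)
    for rate in rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"sampling rate out of range: {rate}")
    return prod(rates)


# Example: a Collector keeping 50% followed by a second hop keeping 10%.
overall = effective_sampling_rate([0.5, 0.1])
expected_traces = round(1000 * overall)  # out of 1,000 traces emitted
```

Under these assumptions, only about 5% of the original traces reach Cloud Observability, so a low trace count can be expected behavior rather than data loss.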

Cloud Observability data retention policies allow customers to configure how long data is retained in Cloud Observability. If you’re trying to retrieve data from many days or weeks ago, confirm your data retention policies.

For more information, see the documentation on data retention policies.

Check Cloud Observability’s status page

Cloud Observability updates the status page if its systems are experiencing an outage or having trouble processing data.

The status page is available at https://status.lightstep.com/

Contact Cloud Observability

Include as much information as possible when contacting Cloud Observability for missing data issues, particularly if you are running OpenTelemetry Collectors or Microsatellites.

End-to-end architecture information and Collector or Microsatellite logs are especially helpful.

Services are missing or stale in the Service Directory

Services listed in the Service Directory are generated from traces. Below are some steps to troubleshoot services that don’t appear.

Confirm traces with service metadata are being ingested

The Reporting Status page also provides an active view of services sending data to Cloud Observability. If the expected service name does not appear, follow the steps in the Telemetry is missing or incomplete section.

Wait 7 days for non-reporting services to be removed from the Service Directory

You may see old services in the Service Directory. After a service is decommissioned, it may take up to 7 days for it to be removed from the Service Directory.

Updated Jan 17, 2024