Lightstep Observability has a number of tools that help you in all your observability flows, whether it’s continual monitoring, triaging an incident, root cause analysis, viewing overall service health, or managing your team’s observability practices.
Monitoring
Monitoring your resources and transactions is a key part of observability. At a glance, you need to know whether transactions through your system are performant and whether the resources those transactions consume (services, virtual memory) are healthy. Unified dashboards let you view both transactional performance (from trace data) and resource health (usually from metric data) in one place. After a deployment, even a partial one, you can use Lightstep Observability to confirm that things stay on track.
Unified dashboards
Using the unified dashboard experience, you can monitor both metric and span data charts in one place.
As a starting point, you can use our pre-built dashboards for AWS CloudWatch Metric Streams metrics or for a metric integration that uses the OTel Collector. Once the dashboard is built, you can edit it to add charts, change chart queries, rearrange the charts, and more.
You create charts for a dashboard using a query builder that works for both metric and span data. Use filters and groupings to see just the data you want.
Instead of the builder, you can use the Unified Query Language (UQL) in the editor to build more fine-grained queries.
For span data, exemplars are mapped in the chart, providing direct access to traces. A table below the chart provides a quick view into the data.
You can click into a chart and immediately start your investigation using Change Intelligence.
Using Terraform? You can use the Lightstep Terraform provider to create and manage your dashboards and charts. You can also use it to export existing dashboards into the Terraform format.
Read more:
- Create and manage unified dashboards
- Create and manage charts
- Learn about UQL
- Use Change Intelligence
Set up alerts
You create alerts by setting thresholds on a query against your metric or span data. You can set both a warning threshold and a critical threshold.
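To illustrate how a warning threshold and a critical threshold relate to each other, here is a minimal Python sketch. It is not Lightstep code or the Lightstep API: the `AlertState` enum, the `classify` function, and the choice of strict comparisons are all assumptions made for illustration.

```python
from enum import Enum


class AlertState(Enum):
    """Hypothetical alert states, named for this illustration only."""
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


def classify(value: float, warning: float, critical: float,
             above: bool = True) -> AlertState:
    """Map one query result to an alert state.

    above=True  -> the alert fires when the value rises past a threshold
                   (for example, latency or error rate).
    above=False -> the alert fires when the value drops below a threshold
                   (for example, availability).
    Strict comparison is an assumption; a real system may be inclusive.
    """
    def breached(threshold: float) -> bool:
        return value > threshold if above else value < threshold

    # Check the critical threshold first so it takes precedence.
    if breached(critical):
        return AlertState.CRITICAL
    if breached(warning):
        return AlertState.WARNING
    return AlertState.OK
```

For example, with a warning threshold of 1 and a critical threshold of 2 on an error-rate query, a value of 1.5 would classify as a warning in this sketch, and a value of 2.5 as critical.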
Notification destinations determine where the alert should be sent. Lightstep supports PagerDuty, Slack, and BigPanda out-of-the-box. You can use webhooks to integrate with other third-party destinations.
Investigate root causes
Lightstep Observability offers several ways to help you find the root cause of performance and error issues. It can correlate spikes in metric performance or errors with changes in span data that occurred at the same time to determine what caused the change in performance. Using span data, Lightstep Observability analyzes traces to determine service dependencies that may be causing latency or errors in services further up or down the stack.
Triage incidents using notebooks
When you begin an investigation, you often need to run a number of queries to reach a hypothesis about the origin of an issue. Notebooks allow you to query both your metric and span data in one place to reach that hypothesis and then share those findings with other team members.
Once you mitigate the issue, you can transfer your learnings from notebooks to begin deeper root cause analysis.
Find the cause of change
Lightstep’s Change Intelligence correlates metric and span data to help find the cause of metric deviations. It determines the service that emitted a metric, searches for performance changes on Key Operations from that service at the same time as the deviation, and then uses trace data to determine what caused the change.
You access Change Intelligence from any chart on a dashboard, notebook, or alert. A side panel displays attributes on spans that experienced a change in performance at the same time as a deviation on the chart. You can copy queries for these attributes and paste them into a notebook, where you can continue your investigation.
Learn more: Investigate a deviation
View a full trace
From any tool that you use in your investigation, you can click through to a full-stack trace of a request. A side panel provides details of each span in the trace, allowing you to view its attributes, logs, and other details. The Trace view illustrates the critical path (the time when an operation in a trace is actually doing something) so you can immediately see bottlenecks in the request. The Trace view is where you can prove out your hypotheses.
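One rough way to think about "the time when an operation is actually doing something" is a span's self time: its duration minus the time covered by its child spans. The Python sketch below computes that approximation. It is not Lightstep's critical-path algorithm; the `Span` type, its fields, and the `self_times` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    """Hypothetical minimal span record for this illustration."""
    span_id: str
    parent_id: Optional[str]
    start: float  # start time, in any consistent unit
    end: float    # end time, same unit as start


def merged_length(intervals: List[Tuple[float, float]]) -> float:
    """Total length covered by the intervals, counting overlaps once."""
    total = 0.0
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap found: close the previous merged interval, open a new one.
            if cur_start is not None and cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_start is not None and cur_end is not None:
        total += cur_end - cur_start
    return total


def self_times(spans: List[Span]) -> Dict[str, float]:
    """Return, per span, its duration minus time covered by direct children."""
    children: Dict[str, List[Span]] = {}
    for span in spans:
        if span.parent_id is not None:
            children.setdefault(span.parent_id, []).append(span)

    result: Dict[str, float] = {}
    for span in spans:
        # Clip each child to the parent's window and skip non-overlapping ones.
        clipped = [
            (max(child.start, span.start), min(child.end, span.end))
            for child in children.get(span.span_id, [])
            if child.end > span.start and child.start < span.end
        ]
        result[span.span_id] = (span.end - span.start) - merged_length(clipped)
    return result
```

In this sketch, a span with a large self time is a candidate bottleneck, since that time is not explained by any downstream call. A true critical-path analysis also accounts for which of several parallel children actually determines when the parent can finish, which this approximation does not attempt.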
View service health
The Service Directory view shows you at a glance how services reporting to Lightstep Observability are performing, including changes to performance on a service’s operations.
You can also see how well a service is instrumented for tracing, and where you can make improvements.