Cloud Observability offers a way to quickly see how all your services and their operations are performing in one place: the Service Directory view.
You can also use our pre-built service dashboards or the Service health panel to view service health.
From here, you can:
When you first open Cloud Observability, you’re taken to the Service Directory. You can also access it from the navigation bar.
Your services are listed in alphabetical order. To make finding services easier, you can “favorite” a service so it always appears at the top of the list.
To find a service:
To favorite a service:
The Service Health view on the Deployments tab shows the latency, error rate, and operation rate of the key operations (operations whose performance is strategic to the health of your system) for the selected service.
Key operations are listed by the magnitude of their change in performance, largest first. On each chart, Cloud Observability displays a shaded yellow bar to indicate the magnitude of the change.
Cloud Observability measures two aspects of change: size and continuity. A full bar indicates that a large, sustained change has happened. A smaller bar indicates either a smaller change or one that did not last for the full time period. Size is measured in relative rather than absolute terms: for example, a change from 10 ms to 500 ms is ranked higher than a change from 1 s to 2 s.
The yellow bar means that an SLI had an objectively large change, regardless of service or operation. Cloud Observability's algorithm runs on each SLI independently. For example, when the bar displays for an operation's latency, it means that latency has changed, not that its change was greater than the changes in the other SLIs.
When determining change, Cloud Observability compares a baseline SLI time series to a comparison SLI time series. Both time periods are determined by the data currently visible in the charts.
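To illustrate why relative size matters, here is a minimal sketch that scores only the size aspect of a change as an absolute log ratio and ranks operations by it. This is not Cloud Observability's actual algorithm; the scoring function, the `(baseline, comparison)` tuple layout, and the operation names are all hypothetical, and continuity is not modeled.

```python
import math


def relative_change(baseline: float, comparison: float) -> float:
    """Size of a change as an absolute log ratio, so 2x up and 2x down score the same."""
    if baseline <= 0 or comparison <= 0:
        raise ValueError("SLI values must be positive")
    return abs(math.log(comparison / baseline))


def rank_by_relative_change(changes: dict[str, tuple[float, float]]) -> list[str]:
    """Order operation names from largest to smallest relative change."""
    return sorted(changes, key=lambda op: relative_change(*changes[op]), reverse=True)


if __name__ == "__main__":
    # (baseline, comparison) latency in milliseconds; operation names are hypothetical.
    changes = {
        "checkout": (1000.0, 2000.0),  # 1 s -> 2 s: 2x, +1000 ms
        "login": (10.0, 500.0),        # 10 ms -> 500 ms: 50x, +490 ms
    }
    print(rank_by_relative_change(changes))  # ['login', 'checkout']
```

Even though the 1 s to 2 s change is larger in absolute milliseconds, the 50x jump ranks first because only the ratio is scored.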
You can change the amount of time displayed using the time period dropdown at the top right of the page.
The baseline and comparison time periods are determined as follows:
If one or more deployment markers are visible:
If there are no deployment markers visible:
Cloud Observability compares the performance of the first half of the time period to the second half.
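The no-marker case can be sketched as splitting the visible time range at its midpoint and summarizing each half. This is a hypothetical illustration only: the `Point` type and the use of a simple mean to summarize each half are assumptions, not Cloud Observability's documented change-detection method.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Point:
    timestamp: float  # seconds; hypothetical time-series sample
    value: float      # SLI value, e.g. latency in ms


def split_halves(points: list[Point], start: float, end: float) -> tuple[list[Point], list[Point]]:
    """Split points in [start, end) into a baseline (first half) and a comparison (second half)."""
    midpoint = start + (end - start) / 2
    visible = [p for p in points if start <= p.timestamp < end]
    baseline = [p for p in visible if p.timestamp < midpoint]
    comparison = [p for p in visible if p.timestamp >= midpoint]
    return baseline, comparison


def compare_halves(points: list[Point], start: float, end: float) -> tuple[float, float]:
    """Return (baseline mean, comparison mean) for the visible range."""
    baseline, comparison = split_halves(points, start, end)
    if not baseline or not comparison:
        raise ValueError("need data in both halves of the time range")
    return mean(p.value for p in baseline), mean(p.value for p in comparison)


if __name__ == "__main__":
    # A 60-minute window with one point per minute: 100 ms, then 200 ms.
    points = [Point(t * 60.0, 100.0 if t < 30 else 200.0) for t in range(60)]
    print(compare_halves(points, 0.0, 3600.0))  # (100.0, 200.0)
```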
When you see performance changes, you can use the Span Explorer to begin your investigation.
When you implement an attribute to display versions of your service, a deployment marker displays at the time the deployment occurred. These markers let you quickly correlate a deployment with a possible regression. By default, performance from all versions is shown. Use the View version dropdown to show data from just one version.
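As a rough sketch of how a version attribute can produce deployment markers, the code below places a marker at the first span seen for each new version value and filters spans to one version, similar to the View version dropdown. The span dictionary layout is hypothetical, and the key `service.version` is the OpenTelemetry semantic-convention name used here only as an example; the attribute your project uses may differ.

```python
VERSION_KEY = "service.version"  # assumption: your project may recognize a different key


def deployment_markers(spans: list[dict]) -> list[tuple[float, str]]:
    """Return (timestamp, version) for the first span seen with each new version."""
    markers = []
    seen = set()
    for span in sorted(spans, key=lambda s: s["timestamp"]):
        version = span["attributes"].get(VERSION_KEY)
        if version is not None and version not in seen:
            seen.add(version)
            markers.append((span["timestamp"], version))
    return markers


def filter_by_version(spans: list[dict], version: str) -> list[dict]:
    """Keep only spans from one version, like choosing it in the View version dropdown."""
    return [s for s in spans if s["attributes"].get(VERSION_KEY) == version]


if __name__ == "__main__":
    spans = [
        {"timestamp": 10.0, "attributes": {"service.version": "1.0.0"}},
        {"timestamp": 20.0, "attributes": {"service.version": "1.0.0"}},
        {"timestamp": 30.0, "attributes": {"service.version": "1.1.0"}},
        {"timestamp": 40.0, "attributes": {}},  # no version attribute: ignored
    ]
    print(deployment_markers(spans))  # [(10.0, '1.0.0'), (30.0, '1.1.0')]
    print(len(filter_by_version(spans, "1.0.0")))  # 2
```

Marking only the first appearance of each version keeps a rolling deployment, where old and new versions report side by side, from producing a marker on every alternation.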
If a version attribute hasn’t been instrumented for the service, or you haven’t configured Cloud Observability to recognize the version attribute, an Instrument version attributes button displays.
By default, Cloud Observability dynamically determines key operations as the ingress or root operations that have the highest rate for that service. You can also manually select key operations so that they always display in this view.
By default, each project has 30 key operations. For example, if you manually select 10 key operations, Cloud Observability dynamically chooses the 20 highest-rate remaining ingress operations to display as well.
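The selection rule described above can be sketched as: keep the manually selected operations, then fill the remaining slots with the highest-rate ingress operations. This is a hypothetical illustration; the `Operation` fields and the function name are assumptions, and the limit of 30 is taken from the default stated above.

```python
from dataclasses import dataclass

DEFAULT_KEY_OPERATION_LIMIT = 30  # default from the documentation; can be changed per project


@dataclass(frozen=True)
class Operation:
    name: str
    rate: float       # operations per second (hypothetical field)
    is_ingress: bool  # True for ingress/root operations (hypothetical field)


def select_key_operations(
    operations: list[Operation], pinned: list[str], limit: int = DEFAULT_KEY_OPERATION_LIMIT
) -> list[str]:
    """Pinned operations first, then the highest-rate ingress operations up to the limit."""
    if len(pinned) > limit:
        raise ValueError(f"at most {limit} key operations can be selected")
    selected = list(pinned)
    candidates = sorted(
        (op for op in operations if op.is_ingress and op.name not in pinned),
        key=lambda op: op.rate,
        reverse=True,
    )
    selected.extend(op.name for op in candidates[: limit - len(selected)])
    return selected


if __name__ == "__main__":
    ops = [Operation(f"op{i}", float(i), True) for i in range(40)]
    ops.append(Operation("internal", 1e6, False))  # high rate, but not an ingress operation
    pinned = [f"op{i}" for i in range(10)]
    result = select_key_operations(ops, pinned)
    print(len(result), result[10], result[-1])  # 30 op39 op20
```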
You can change the default number of key operations. Contact your Customer Success representative for more information.
To manually set key operations:
On the Deployments tab of the Service Directory, click Edit next to Key Operations.
In the dialog, select the operations you always want to be key operations, up to 30 per service. If you select fewer than 30, Cloud Observability dynamically determines the remaining key operations for you.
By default, the operations are sorted by the amount of detected change (largest to smallest). Use the dropdown to change the sort.
Also by default, for each operation only the latency percentile with the largest amount of change displays. You can change the charts to show more percentiles using the More ( ⋮ ) icon.
By default, the data shown is from the last 60 minutes. You can change that time period using the time picker. Use the < > controls to move backwards and forwards through time. You can view data from your retention window (default is three days).
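The time controls can be modeled as a fixed-width window that the < and > controls shift through time, clamped so it never leaves the retention window or moves into the future. This sketch assumes each click moves the window by its own width; the actual step size in the UI is not documented here, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

DEFAULT_WINDOW = timedelta(minutes=60)  # default time period from the documentation
DEFAULT_RETENTION = timedelta(days=3)   # default retention window from the documentation


def shift_window(
    start: datetime, end: datetime, direction: int, now: datetime,
    retention: timedelta = DEFAULT_RETENTION,
) -> tuple[datetime, datetime]:
    """Move [start, end) back (direction=-1) or forward (+1) by its own width."""
    if direction not in (-1, 1):
        raise ValueError("direction must be -1 or 1")
    width = end - start
    new_start = start + direction * width
    earliest = now - retention
    # Clamp so the window stays inside [now - retention, now].
    new_start = max(earliest, min(new_start, now - width))
    return new_start, new_start + width


if __name__ == "__main__":
    now = datetime(2024, 7, 1, 12, 0)
    print(shift_window(now - DEFAULT_WINDOW, now, -1, now))  # 10:00 to 11:00
```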
You can also zoom in on a time period by clicking and dragging over the time period you want a closer look at. The charts redraw to report on just the period you selected.
To start an investigation into a performance change, click on an operation to open the Span Explorer view.
By default, a query is built for the service, filtered to the selected operation. You can add other filters and group-bys as needed using the mini query bar. Charts for latency, error rate, and operation rate show a time series graph and example spans. You can change the reported latency using the dropdown. If configured, deployment markers display with a tooltip that shows version information.
Spans without errors are green dots, spans with errors are red triangles. You can control the display of spans using the View dropdown. Click on a span in a chart to view it in the Trace view.
The Aggregates table below the charts shows the average latency, error percentage, and rate, as well as the number of spans currently in the database. If you've added a group-by, each row represents an entity in that group. Hover over a row to filter the charts and the Span samples table to just that entity.
The Span samples table shows example spans returned by the query. If you've added a group-by, colored dots represent the group that the span is from. You can sort the spans or search for specific spans. Click a row to view the span in the Trace view.
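The Aggregates table can be approximated as a per-group summary of the returned spans. In this hypothetical sketch, the `Span` fields are assumptions, and rate is assumed to mean spans per second over the queried window; the product's exact definitions may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    latency_ms: float
    is_error: bool
    attributes: dict


def aggregate(spans: list[Span], window_seconds: float, group_by: Optional[str] = None) -> dict:
    """Average latency, error percentage, rate, and span count per group-by value."""
    groups = defaultdict(list)
    for span in spans:
        key = span.attributes.get(group_by) if group_by else "all"
        groups[key].append(span)
    table = {}
    for key, members in groups.items():
        count = len(members)
        table[key] = {
            "avg_latency_ms": sum(s.latency_ms for s in members) / count,
            "error_pct": 100.0 * sum(s.is_error for s in members) / count,
            "rate_per_s": count / window_seconds,
            "span_count": count,
        }
    return table


if __name__ == "__main__":
    spans = [
        Span(100.0, False, {"region": "us"}),
        Span(300.0, True, {"region": "us"}),
        Span(50.0, False, {"region": "eu"}),
    ]
    print(aggregate(spans, 60.0, group_by="region")["us"])
```

Without a group-by, every span falls into a single row, which matches the table's ungrouped view.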
Use the More ( ⋮ ) icon to add this view to a notebook. The three charts are added to a notebook where you can do more advanced querying like group-by or time comparisons.
To view the relationships of the selected service and operation to upstream and downstream services and operations, click Create dependency map to add it to a notebook.
Use the More ( ⋮ ) icon and select Share to create a link to this Span Explorer view.
The Operations tab on the Service Directory view shows the selected service’s operations currently reporting to Cloud Observability in alphabetical order, along with performance metrics aggregated over the selected time period.
The table provides several useful performance metrics for each operation:
To see if other services are affecting an operation, view the operation in a notebook or dashboard and use the dependency map to view upstream and downstream services and their performance.
As with the Deployments tab, you can use Span Explorer to view latency, rate, and error percentages, and to view example spans.
Click an operation’s row to view its data in Span Explorer.
Streams are retained span queries that continuously collect latency, error rate and operation rate data. By default, data from span queries are persisted for three days. When you save a query as a Stream, the data is collected and persisted for a longer period of time.
To view all Streams for a service, click the Streams tab. The number on the tab tells you how many Streams exist for this service.
Create a Stream from the Operations tab by clicking Create Stream for an operation.
You can add a trichart that shows the Stream’s performance to either a notebook or a dashboard by clicking View Stream.
Add a Stream's query to a notebook when, during an investigation, you want to run ad hoc queries, take notes, and save your analysis for use in postmortems or runbooks. Add the query to a dashboard when you want to monitor performance over a period of time.
Click the Dashboards tab to view dashboards that include charts or a Stream for this service. The number on the tab tells you how many dashboards exist for this service.
Only dashboards that have charts that contain a filter for the service are shown.
Click a dashboard to view it.
Read Create and manage dashboards to learn more.
The data you can view and use in Cloud Observability depends on the quality of your tracing instrumentation. The better and more comprehensive your instrumentation is, the better Cloud Observability can collect and analyze your data to provide highly actionable information.
Cloud Observability analyzes the instrumentation on your services and determines how you can improve it to make your Cloud Observability experience even better. It can determine whether your instrumentation:
hostname attributes to help find performance issues in different environments.
Click the Instrumentation Quality tab to learn how well your instrumentation measures up. The number on the tab gives your score (out of 100%).
Learn more about what your score means and how to fix it.
Updated Jul 1, 2024