Cloud Observability ingests metrics from a number of sources as time series and saves them to a time series database. When you use Cloud Observability to view that data, it transforms it into data points plotted on a chart. Different sources send different data. Some may roll up to a 60-second average. Others might report multiple individual points and timestamps, or send an array of points. Some may even aggregate the data before sending it. Cloud Observability stores the data in the form in which it was sent. When you query that data, Cloud Observability aligns it in time, based on the metric kind and the time series operator chosen for the chart, so that all of the data is transformed uniformly. Once that happens, the aggregation method you select for the chart determines the values to display.
A time series is a sequence of measurements over a period of time. For example, if you recorded the temperature outside at every hour, the time series would be each of the values at every hour: 8:00 am - 55°, 9:00am - 57°, 10:00 am - 58°, and so on. Time series allow you to easily see changes over time when displayed as a chart.
Here’s an example of a time series for the http.request metric being reported by two agents (the first number in each array is a timestamp):
[1607986123, 50.0] [1607986125, 100.0]
When displaying time series data, you may want the visualization of the data to be different depending on what you’re measuring. For example, if you’re measuring the temperature, you want to see the actual temperature at a particular point in time. Seeing just the change between the points in time would not be very helpful. A 5° change between 50° and 55° is not a big deal, but it might be if it were between 95° and 100°. But if you were measuring HTTP requests to an endpoint, then you might only want to know about the change, or even the rate of change, since the last point in time. Did the request rate go up or down? And by how much?
Because of these differences, Cloud Observability supports gauges and two types of counters - deltas and cumulative.
Metrics must be labeled with their kind before they are ingested. If an ingested metric doesn’t match the kind recorded for that metric name, it is rejected and an error is sent back to the client. If you want to change a metric’s kind, contact Customer Success at email@example.com.
Gauges represent an observed value at a specific point in time or over a specified range of time. Temperature readings are an example of a gauge metric. CPU usage is another example; you want to know exactly how much of an available resource is being used at a given point in time. Gauges are best when you don’t much care about the degree of change over time - you want the actual number at that time.
Deltas show how the values change from one reporting period (point on the graph) to the next. HTTP requests is an example of a delta metric. You want to see if requests are going up or down, and by how much.
Cumulative metrics add their value to the last value. They count the total number of things at a specific point in time, but as opposed to deltas, each value uses the same “start” timestamp to determine the value. An example of a cumulative metric is total web page hits. The value at each point in time increases from the last value, and you want to know how many of something you have accumulated at a given point in time.
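To make the difference between the two counter kinds concrete, here is a minimal Python sketch that derives per-period changes from cumulative totals. The function name and the page-hit numbers are hypothetical and chosen for illustration; this is not how Cloud Observability itself performs the conversion.

```python
def cumulative_to_deltas(totals):
    """Return the change between each pair of consecutive cumulative totals."""
    return [current - previous for previous, current in zip(totals, totals[1:])]


# Hypothetical cumulative page-hit totals, all counted from the same start time.
page_hits = [100, 130, 190, 200]

# Each delta is how much the total grew during one reporting period.
deltas = cumulative_to_deltas(page_hits)
print(deltas)  # [30, 60, 10]
```

Adding the deltas back to the first total recovers the last total, which is why the two kinds carry the same information in different shapes.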
A metric type represents the value type being reported. Cloud Observability supports the following metric types:
For existing Cloud Observability customers interested in tracking distribution metrics, please opt-in here. For new customers, this feature is already enabled in your account.
Aggregation is the process of taking the metrics stored in the database that may be irregularly spaced and converting these into the data points shown on the chart that are spaced evenly. That spacing (also called output period) is calculated by Cloud Observability such that, regardless of the query duration, each chart contains roughly 120 data points.
For example, if a chart is displaying an hour of data, there will be 2 data points per minute, or one every 30 seconds. What exact values are displayed every 30 seconds depends on the aggregation method you choose.
The minimum output period is 30 seconds.
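The spacing rule can be approximated with a short Python sketch. It assumes the output period is simply the query duration divided by 120, floored at the 30-second minimum; the exact rounding Cloud Observability applies is not documented here, and the names are hypothetical.

```python
TARGET_POINTS = 120       # each chart contains roughly 120 data points
MIN_PERIOD_SECONDS = 30   # the minimum output period


def output_period_seconds(query_duration_seconds):
    """Approximate the spacing between plotted points for a query duration."""
    return max(MIN_PERIOD_SECONDS, query_duration_seconds / TARGET_POINTS)


print(output_period_seconds(60 * 60))       # one hour: a point every 30 seconds
print(output_period_seconds(24 * 60 * 60))  # one day: a point every 720 seconds
print(output_period_seconds(10 * 60))       # ten minutes: held at the 30-second minimum
```

Under this assumption, queries shorter than an hour show fewer than 120 points because the period cannot drop below 30 seconds.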
Below, we will describe three different ways of performing aggregation.
The latest operator displays the last value reported to the database for a time period. For example, if Cloud Observability receives a gauge time series array of [71, 71, 71.5, 72] for a time period, Cloud Observability displays the value 72 and drops the others.
The latest operator can be used only with gauge-type metrics.
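As an illustration of keeping only the last reported value per time period, here is a small Python sketch. The function name, the timestamps, and the 30-second period are assumptions made for the example, not part of the product.

```python
def latest_per_period(points, period_seconds):
    """Keep the last reported value in each time period.

    points: (timestamp_seconds, value) pairs for one gauge time series.
    Returns a dict mapping each period start to its latest value.
    """
    latest = {}
    for timestamp, value in sorted(points, key=lambda point: point[0]):
        period_start = timestamp - timestamp % period_seconds
        latest[period_start] = value  # later points overwrite earlier ones
    return latest


# The gauge values from the text, with hypothetical timestamps in one period.
readings = [(0, 71), (10, 71), (20, 71.5), (29, 72)]
print(latest_per_period(readings, 30))  # {0: 72}
```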
The delta operator computes the difference since the previous value. If Cloud Observability ingests a time series array for a delta metric kind of [1,1,1,2,2,2,3,3] and the time series operator is set to delta, Cloud Observability computes the value 15, the total amount that value changed since it was last reported. Deltas are useful when you want to know how much something has changed over a period of time. If you use delta for something like HTTP requests, you’d be able to see how the number of requests has changed at a particular point in time.
Rather than telling you how much something has changed at a particular point in time, rate tells you how quickly it changed over a time period. Think of a car that went 50 miles in an hour (you’re interested in the number of miles) versus the fact that it went 50 miles per hour (you’re interested in how fast it goes).
In the requests time series above ([1,1,1,2,2,2,3,3]), if the time series operator is set to rate and the time period is 10 seconds, then instead of using 15 (as for delta), Cloud Observability computes the value 1.5 (15 divided by 10 seconds).
When using counter metrics (delta or cumulative), you can choose between displaying that metric as a delta or as a rate when you create your chart.
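The arithmetic behind the two operators can be sketched in a few lines of Python. This assumes delta is the sum of the reported values in a period and rate is that sum divided by the period length in seconds; the function names are hypothetical.

```python
def delta(values):
    """Total amount the counter changed across the reported values."""
    return sum(values)


def rate(values, period_seconds):
    """Change per second over the period."""
    return delta(values) / period_seconds


requests = [1, 1, 1, 2, 2, 2, 3, 3]
print(delta(requests))     # 15
print(rate(requests, 10))  # 1.5
```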
Metric data often uses attributes to annotate the data with descriptions that help in getting your data to tell a more exact story. For example, metrics might use the service attribute to show what service a metric was emitted from, or the customer attribute to show which customer made the request. You can use attributes from your metric data to filter your query (to explicitly include or exclude metrics with certain attribute/value pairs) and to break out a single line of data points into separate data points for each value of an attribute.
For a metric like HTTP requests, one chart probably isn’t going to be very helpful. You’d see large numbers and if there’s a troubling spike, probably wouldn’t be able to pinpoint where the issue is coming from or who it’s affecting. By using attributes to filter your query, you can create concise charts that include only the data that’s important for this chart.
When you filter a query, Cloud Observability retrieves only the data from the metric database that matches the attribute/value pair (include) or only the data that doesn’t match it (exclude).
For example, as part of your service level agreements with your VIP accounts, you need to keep a close eye on performance for them. You might use the customer attribute to create a chart for each VIP customer.
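Include and exclude filtering can be illustrated with a minimal Python sketch over an in-memory list. The data layout, the attribute values, and the filter_series helper are all hypothetical and stand in for the metric database and query builder.

```python
# Hypothetical stand-in for stored time series; every value here is made up.
SERIES = [
    {"attributes": {"service": "iOS", "customer": "customer-a"}, "values": [5, 7]},
    {"attributes": {"service": "iOS", "customer": "customer-b"}, "values": [2, 3]},
    {"attributes": {"service": "web", "customer": "customer-a"}, "values": [9, 1]},
]


def filter_series(series, key, value, include=True):
    """Keep series whose attribute matches (include) or does not match (exclude)."""
    return [s for s in series if (s["attributes"].get(key) == value) == include]


vip_only = filter_series(SERIES, "customer", "customer-a")
everyone_else = filter_series(SERIES, "customer", "customer-a", include=False)
print(len(vip_only), len(everyone_else))  # 2 1
```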
Like filtering, grouping gives you more insight into your data by allowing you to “split apart” the metrics into groups. When you choose to group by an attribute, Cloud Observability creates separate data points (separate lines on a line chart or different color boxes on a bar chart) for each value of the attribute. Now you can get a sense of the distribution of the metric across the different attribute values.
Say you’ve added the filter service: iOS so that the query only returns HTTP requests from the iOS service. But seeing all requests made by that service might not give you the details you need to find an issue. If you group by the customer attribute, then you can see how the service is performing for each customer.
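Combining a filter with a group-by might look like the following Python sketch. The data, the attribute values, and the group_totals helper are hypothetical; the sketch sums values per group only to show how one line becomes one line per attribute value.

```python
from collections import defaultdict

# Hypothetical stand-in for stored time series; every value here is made up.
SERIES = [
    {"attributes": {"service": "iOS", "customer": "customer-a"}, "values": [5, 7]},
    {"attributes": {"service": "iOS", "customer": "customer-b"}, "values": [2, 3]},
    {"attributes": {"service": "web", "customer": "customer-a"}, "values": [9, 1]},
]


def group_totals(series, key):
    """Total the values of all series that share the same attribute value."""
    totals = defaultdict(int)
    for s in series:
        totals[s["attributes"].get(key)] += sum(s["values"])
    return dict(totals)


# Filter to the iOS service first, then break the result out by customer.
ios_only = [s for s in SERIES if s["attributes"]["service"] == "iOS"]
print(group_totals(ios_only, "customer"))  # {'customer-a': 12, 'customer-b': 5}
```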
sum of all values: The total of all points in the data.
For example, given the values of [10, 15, 50] the sum is 75.
Distribution type metrics are automatically summed and then aggregated into percentiles.
Let’s look at an example metric query starting with the time series reported by the agent and finishing with a useful chart in Cloud Observability. We’ll use requests, which is a delta metric kind.
Here are four time series flushed from two different agents for the requests metric, shown in a table. Both agents report every 30 seconds.
For simplicity in this example, we’ll assume Cloud Observability is only charting the first three minutes. Instead of showing a valid timestamp, we use a time relative to when Cloud Observability queried the metric database (now = the time of the request).
|agent1||[now-15, 100], [now+15, 100]||method: post|
|agent1||[now-15, 150], [now+15, 125]||method: get|
|agent2||[now-15, 50], [now+15, 50]||method: post|
|agent2||[now-15, 100], [now+15, 100]||method: get|
Cloud Observability ingests these time series and stores them in the metric database. When you build a chart, the metric requests is now available to select in the query builder.
Now Cloud Observability considers the time series operator. Because requests is a delta metric, you can choose to display either the delta (the number of requests for that point in time) or the rate (the number of requests per second).
For example, this chart uses delta and shows there are a total of 364.474 requests at 11:45 am.
This chart has the same query but uses rate, and shows there are 8.099 requests per second at 11:45 am.
For this example, the operator is delta. Now Cloud Observability knows to count the number of requests reported for each time series.
Remember that for this example, we’re just charting the first three minutes. Cloud Observability requests time buckets of 60 seconds, so it asks for metric values at 60-second boundaries, including now+60 secs and now+120 secs, for each method value.
Since those times differ from the timestamps stored in the database, Cloud Observability needs to align the time. The first time series in the database is:
[now-15, 100], [now+15, 100]
Cloud Observability aligns that by interpolating it to:
[-60, 0], [0, 100], [60, 0], [120, 0]
This alignment allows the data points from all four time series to line up on unified points. Here’s the alignment of all four time series:
|agent1||[-60, 0], [0, 100], [60, 0], [120, 0]||method: post|
|agent1||[-60, 0], [0, 125], [60, 0], [120, 0]||method: get|
|agent2||[-60, 0], [0, 50], [60, 0], [120, 0]||method: post|
|agent2||[-60, 0], [0, 100], [60, 0], [120, 0]||method: get|
Let’s say you want the chart to display a line for each value of the method attribute. Now Cloud Observability knows it needs to create separate data points for each of those values.
Let’s say we want the chart to display the maximum value at each time point.
We end up with the following data points:
For the method post, we get the following:
[-60, 0], [0, 100], [60, 0], [120, 0]
Cloud Observability drops the value 50 from agent2, because it only reports the maximum value at any time.
For the method get, we get:
[-60, 0], [0, 125], [60, 0], [120, 0]
Cloud Observability drops the value 100 from agent2.
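The final grouping and aggregation step of this walkthrough can be reproduced with a short Python sketch. It takes the aligned values from the table above as input, groups them by the method attribute, and keeps the maximum at each time point. The data structure and function name are hypothetical; this shows only the arithmetic, not Cloud Observability’s implementation.

```python
from collections import defaultdict

# Aligned series from the walkthrough: (agent, method, [(time, value), ...]).
ALIGNED = [
    ("agent1", "post", [(-60, 0), (0, 100), (60, 0), (120, 0)]),
    ("agent1", "get", [(-60, 0), (0, 125), (60, 0), (120, 0)]),
    ("agent2", "post", [(-60, 0), (0, 50), (60, 0), (120, 0)]),
    ("agent2", "get", [(-60, 0), (0, 100), (60, 0), (120, 0)]),
]


def max_by_method(aligned_series):
    """Group by method, then keep the maximum value at each aligned time."""
    grouped = defaultdict(dict)
    for _agent, method, points in aligned_series:
        for time, value in points:
            seen = grouped[method].get(time)
            grouped[method][time] = value if seen is None else max(seen, value)
    return {method: sorted(points.items()) for method, points in grouped.items()}


result = max_by_method(ALIGNED)
print(result["post"])  # [(-60, 0), (0, 100), (60, 0), (120, 0)]
print(result["get"])   # [(-60, 0), (0, 125), (60, 0), (120, 0)]
```

Swapping max for sum in the helper would instead total the requests from both agents at each time point.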
Updated Sep 23, 2021