Cloud Observability’s Unified Query Language (UQL) can be used to retrieve and process your metric data. This guide will help you understand how you can use UQL to operate on histogram metrics.
For more details on specific operations, see the UQL Reference. We also have a UQL Cheatsheet to help you build queries.
A distribution metric (sometimes also called a histogram) is a metric type that samples value observations, allowing you to approximate individual values for aggregations and calculations in a way that is cheaper than actually storing every individual value. Distribution metrics do this by measuring the frequency of value observations that fall into specific, pre-defined buckets.
For example, if you wish to measure the median latency of requests to a particular service, you can use a distribution metric. Instead of storing all the durations for every single request to the service, you can accurately approximate the true median by storing the frequency of requests that fall into particular duration buckets.
Distribution metrics are useful when you’re not bothered about having the exact values for a time series, but want to be able to perform percentile calculations on a time series without having to capture every value.
Storing distribution metrics is a significantly cheaper and faster way to store and query large quantities of data than storing a scalar metric with a very high number of points.
Querying distribution metrics over a wide time window may take slightly longer than querying scalar metrics over the same time window, because of the size of the distribution metrics.
Cloud Observability Public APIs currently only support returning scalar time series (made up of int or float values). This means you either must use a percentile
, dist_count
, or dist_sum
point operator or a group_by
stage with an appropriate reducer to convert the distribution values into scalar values.
If you are using the Cloud Observability web application with a Heatmap chart type, then you are able to query a distribution metric without transforming the distribution values into scalar values.
UQL supports latest
aligners for gauge kind distribution metrics and delta
aligners for delta kind distribution metrics.
The latest
operator will take the latest (or most recent) distribution in the input window.
The delta
operator performs a delta alignment across each bucket for each distribution time series in the metric, such that each point in the aligned output distribution metric has a population that does not overlap with other points. Just like with scalar metric time series, you can provide an input window for the delta
operator.
The percentile point operator operates on the value
column of the input time series, calculating the n
th percentile for the distribution input, where n
is a float value between 0 and 100. You must use the keyword value
for the percentile point operator to work on the input distribution metric.
The 90th percentile of response size, summed up by service
1
2
3
4
metric response.size.bytes
| delta
| group_by [service], sum
| point percentile(value, 90)
You can also calculate multiple percentiles from a distribution, producing multiple scalar time series in response.
The 90th percentile, 99th percentile, and 99.9th percentile of response size, summed up by service
1
2
3
4
5
6
metric response.size.bytes
| delta
| group_by [service], sum
| point percentile(value, 90),
percentile(value, 99),
percentile(value, 99.9)
Like the percentile point operator, the distribution sum and distribution count point operators operate on the value
column of the input time series. The dist_sum
point operator calculates the sum of all values within the distribution. The dist_count
point operator calculates the total number of values within the population of the distribution.
The average request latency, summed up by service
1
2
3
4
metric request.latency.millis
| delta
| group_by [service], sum
| point dist_sum(value) / dist_count(value)
There are two reducers to be used with reduce
and group_by
stages that UQL supports: count
and sum
.
count
returns the count of distribution time series as input (not the total number of values within the population of the distribution)
sum
will sum up distribution input time series (not the sum of all values within a distribution). The sum
reducer used in a group_by
adds together distribution points with the same label value.
Where one places the group_by
statement when querying a distribution is important. The below queries are not equivalent: the first query sums up the distribution points with the same service
label, and the second query sums up the 90th percentile of each distribution with the same service
label.
1
2
3
4
metric response.size.bytes
| delta
| group_by [service], sum
| point percentile(value, 90)
1
2
3
4
metric response.size.bytes
| delta
| point percentile(value, 90)
| group_by [service], sum
Get started with spans queries in UQL
Updated Oct 4, 2022