Summary

Named Join Syntax

1
2
3
with
  <bound-input-query>...
join <join-expression> <default>...

Unnamed Join Syntax

1
2
(<left-query>; <right-query>) 
| join <join-expression> [left-default [, right-default]]

The UQL join stage, much like a SQL JOIN, merges two tables into a single table. It then combines timeseries values together using an arithmetic expression. You can build the input tables from any telemetry types; join can combine tables built from logs, metrics, spans, or a mix of these types.

There are two different syntaxes for joining: named join and unnamed join. Regardless of which syntax you use the underlying join semantics will be the same.


Supported Pipelines: spans timeseries, metrics, logs timeseries

Arguments

Named join

Name Type Required/Optional Description
bound-input-query string=query 2 or more A query assigned a name
join-expression value expression Required An expression that combines points from all input tables into a single output table
defaults string=number 0 or more Default value for points in a particular input table

Unnamed join

Name Type Required/Optional Description
left-query string Required Query that produces the left input table
right-query string Required Query that produces the right input table
join-expression value expression Required An expression that combines points from the left and right input tables
left-default number Optional Default value for points in the left table
right-default number Optional Default value for points in the right table

Details

Two-way joins

Say you want to chart “% CPU usage”. You might do this using two metrics: cpu.requests and CPU cpu.usage. You need to combine these metrics somehow, dividing usage / requests to get percent usage. In UQL, you do this with a join:

1
2
3
4
with
  requests = metric cpu.requests | latest | group_by [], sum;
  usage = metric cpu.usage | delta | group_by [], sum;
join usage / requests * 100

Now, let’s walk through the produced tables to see how they’re joined:

Requests table:

start_timestamp limit_timestamp value
0 0 10
60 60 10

Usage table:

start_timestamp limit_timestamp value
0 0 8
60 60 7

Table after join:

start_timestamp limit_timestamp value
0 0 80
60 60 70

Label columns

At its core, join has two parts. First, it merges rows from the input tables together to create a new combined table. This combined table will contain the union of all label columns that appear in either input table, plus two value columns. Then, the join stage combines the two value columns using the provided arithmetic expression.

How does the system decide which rows to merge? Joins in UQL are “natural joins”. A natural join “joins on” label columns that exist in both tables. “Joining on” a set of columns means that the query engine will merge rows where those columns have the same values. In summary, the query engine combines rows from both tables where the values of all shared columns match.

An example may be helpful. Imagine joining these two tables using the arithmetic a+b:

host service start_timestamp limit_timestamp value
abc foo 0 0 10
456 bar 0 0 10
123 baz 0 0 10
method service start_timestamp limit_timestamp value
GET foo 0 0 10
GET bar 0 0 10
POST bar 0 0 10

The resulting table looks like this:

host method service start_timestamp limit_timestamp value
abc GET foo 0 0 20
456 GET bar 0 0 20
456 POST bar 0 0 20

Defaults

Let’s revisit the “% CPU Usage” chart from earlier in the topic. In some cases when a process doesn’t use any CPU for a period of time no cpu.usage point is emitted. This happens because often metrics pipelines drop zero-valued delta points. In the earlier iteration of this example, you would see a blank area on the chart in this situation. If you want to see 0% CPU usage when there is no cpu.usage point, you have to use a default.

You can rewrite the query, adding usage=0:

1
2
3
4
with
  requests = metric cpu.requests | latest | group_by [], sum;
  usage = metric cpu.usage | delta | group_by [], sum;
join usage / requests * 100, usage=0

Now, let’s walk through the produced tables and how they’re joined:

Requests Table:

start_timestamp limit_timestamp value
0 0 10
60 60 10
120 120 10

Usage Table:

start_timestamp limit_timestamp value
0 0 8
60 60 7

Table after Join:

start_timestamp limit_timestamp value
0 0 80
60 60 70
120 120 0

Notice that the joined table has a row with the timestamp 120 even though this timestamp only appeared in one input table. This happened because the above query uses the default usage=0. This default says that if a row in the requests table has no corresponding row in the usage table, the join is performed as if there was a matching row in the usage table with the value 0.

You can also use defaults if the input tables have label columns. For example, you could write the same query as above, this time plotting % CPU Usage per-service:

1
2
3
4
with
  requests = metric cpu.requests | latest | group_by [service], sum;
  usage = metric cpu.usage | delta | group_by [service], sum;
join usage / requests * 100, usage=0

This works exactly as you’d expect.

However, there is one situation where you can’t use defaults. A table containing label columns that don’t appear in the table you’re joining it with can’t have a default. For example, this query attempting to find the fraction of global CPU requests that each service uses is illegal:

1
2
3
4
with
  requests = metric cpu.requests | latest | group_by [], sum;
  usage = metric cpu.usage | delta | group_by [service], sum;
join usage / requests * 100, usage=0

To understand why this query is illegal, imagine that the usage table is empty and the requests table has a single entry:

Usage:

service start_timestamp limit_timestamp value
       

Requests:

start_timestamp limit_timestamp value
0 0 30

Because the above query has the default usage=0, the system will try to synthesize a row in the usage table with [start_timestamp=0, limit_timestamp=0, value=0] to match the row in the requests table. However, here it runs into a problem. What value should the service column have in this synthesized entry? Instead of guessing the value of service, UQL prohibits this type of query.

Unnamed joins

You can also write 2-way joins using an alternate “unnamed join” syntax. The semantics of join are identical, but things look a bit different.

1
2
3
4
(
  metric cpu.usage | delta | group_by [], sum;
  metric cpu.requests | latest | group_by [], sum;
) | join left / right * 100, 0

The name of the first join input is left and the name of the second is right. The 0 at the end of the join stage means that the left table has a default value of 0. A third argument (e.g. left / right * 100, 0, 0) would mean that both left and right tables have default values of 0. There is no way to specify a default only for the right side.

In unnamed joins, the parentheses which enclose the two input queries are called the table group. You can write stages after the table group and before the join keyword that will apply to both input queries before they are joined. This behavior is specific to unnamed joins; you cannot do this with a named join. For example, these two queries are equivalent because the delta and group_by stages are distributed into the table group.

1
2
3
4
5
6
7
8
9
10
11
# original query
(
  metric requests | delta 1m | group_by [hostname], sum;
  metric errors | delta 1m | group_by [hostname], sum
) | join right / left

# equivalent query
(
  metric requests;
  metric errors
) | delta 1m | group_by [hostname], sum | join right / left

Multi-telemetry joins

You can use both named an unnamed joins to combine timeseries derived from different telemetry types. For example, this join finds the “efficiency” of each pod at processing requests:

1
2
3
4
with
  requests = spans count | filter operation == "GET" && kube_app == "foo" | rate | group_by [kube_pod], sum;
  cpu = metric cpu.usage | latest | filter kube_app == "foo" | group_by [kube_pod], sum;
join requests / cpu

N-way joins

N-way joins allow you to join more than two tables in a single join statement. For example, you can use an n-way join to calculate the difference in error rate between yesterday and today:

1
2
3
4
5
6
with
  errors_today     = spans count | filter error == true | delta | group_by [], sum;
  total_today      = spans count | delta | group_by [], sum;
  errors_yesterday = spans count | time_shift 1d | filter error == true | delta | group_by [], sum;
  total_yesterday  = spans count | time_shift 1d | delta | group_by [], sum;
join (errors_today / total_today * 100) - (errors_yesterday / total_yesterday * 100)

N-way joins are syntactic sugar for a series of 2-way joins performed in the order of arithmetic precedence. For example, the query above performs joins in this order: (errors_today JOIN total_today) JOIN (errors_yesterday JOIN total_yesterday).

The arithmetic section of the join must use all inputs defined in the with section.

Also note that there is no unnamed n-way join syntax.

Defaults

Defaults for n-way joins resemble defaults in 2-way joins, but there is one important difference to call out. Consider the phony query:

1
2
3
4
5
with
  a = ...;
  b = ...;
  c = ...;
join a+b+c, a=0, b=0

Since an n-way join is a series of 2-way joins ordered by arithmetic precedence, the query engine first joins a+b with the defaults a=0, b=0. Next, the query engine joins the result of a+b (let’s call this ab) with c. Since a and b both have defaults, the compiler can actually find a default for ab by substituting a=0 and b=0 into the expression a+b. This tells us that the default for ab is 0 as well!

Finally, the query engine joins ab+c with the default ab=0.

Notice that the default ab=0 changes the query output in exactly one way. When this default is set and a point in c has no corresponding points in a or b the query will still output a point. If ab had no default the query wouldn’t output a point in this situation.

See also

UQL reference

UQL cheat sheet

Updated Jan 3, 2024