UQL: join

Summary

Named Join Syntax

with
  <bound-input-query>...
join <join-expression> <default>...

Unnamed Join Syntax

(<left-query>; <right-query>) 
| join <join-expression> [left-default [, right-default]]

The UQL join stage, much like a SQL JOIN, merges two tables into a single table. It then combines timeseries values together using an arithmetic expression. You can build the input tables from any telemetry types; join can combine tables built from logs, metrics, spans, or a mix of these types.

There are two different syntaxes for joining: named join and unnamed join. Regardless of which syntax you use the underlying join semantics will be the same.

Supported Pipelines: spans timeseries, metrics, logs timeseries

Arguments

Named join

Name	Type	Required/Optional	Description
bound-input-query	string=query	2 or more	A query assigned a name
join-expression	value expression	Required	An expression that combines points from all input tables into a single output table
defaults	string=number	0 or more	Default value for points in a particular input table

Unnamed join

Name	Type	Required/Optional	Description
left-query	string	Required	Query that produces the left input table
right-query	string	Required	Query that produces the right input table
join-expression	value expression	Required	An expression that combines points from the left and right input tables
left-default	number	Optional	Default value for points in the left table
right-default	number	Optional	Default value for points in the right table

Details

Two-way joins

Say you want to chart “% CPU usage”. You might do this using two metrics: cpu.requests and CPU cpu.usage. You need to combine these metrics somehow, dividing usage / requests to get percent usage. In UQL, you do this with a join:

with
  requests = metric cpu.requests | latest | group_by [], sum;
  usage = metric cpu.usage | delta | group_by [], sum;
join usage / requests * 100

Now, let’s walk through the produced tables to see how they’re joined:

Requests table:

start_timestamp	limit_timestamp	value
0	0	10
60	60	10

Usage table:

start_timestamp	limit_timestamp	value
0	0	8
60	60	7

Table after join:

start_timestamp	limit_timestamp	value
0	0	80
60	60	70

Label columns

At its core, join has two parts. First, it merges rows from the input tables together to create a new combined table. This combined table will contain the union of all label columns that appear in either input table, plus two value columns. Then, the join stage combines the two value columns using the provided arithmetic expression.

How does the system decide which rows to merge? Joins in UQL are “natural joins”. A natural join “joins on” label columns that exist in both tables. “Joining on” a set of columns means that the query engine will merge rows where those columns have the same values. In summary, the query engine combines rows from both tables where the values of all shared columns match.

An example may be helpful. Imagine joining these two tables using the arithmetic a+b:

host	service	value
abc	foo	10
456	bar	10
123	baz	10

method	service	value
GET	foo	10
GET	bar	10
POST	bar	10

The resulting table looks like this:

host	method	service	value
abc	GET	foo	20
456	GET	bar	20
456	POST	bar	20

Defaults

Let’s revisit the “% CPU Usage” chart from earlier in the topic. In some cases when a process doesn’t use any CPU for a period of time no cpu.usage point is emitted. This happens because often metrics pipelines drop zero-valued delta points. In the earlier iteration of this example, you would see a blank area on the chart in this situation. If you want to see 0% CPU usage when there is no cpu.usage point, you have to use a default.

You can rewrite the query, adding usage=0:

with
  requests = metric cpu.requests | latest | group_by [], sum;
  usage = metric cpu.usage | delta | group_by [], sum;
join usage / requests * 100, usage=0

Now, let’s walk through the produced tables and how they’re joined:

Requests Table:

start_timestamp	limit_timestamp	value
0	0	10
60	60	10
120	120	10

Usage Table:

start_timestamp	limit_timestamp	value
0	0	8
60	60	7

Table after Join:

start_timestamp	limit_timestamp	value
0	0	80
60	60	70
120	120	0

Notice that the joined table has a row with the timestamp 120 even though this timestamp only appeared in one input table. This happened because the above query uses the default usage=0. This default says that if a row in the requests table has no corresponding row in the usage table, the join is performed as if there was a matching row in the usage table with the value 0.

You can also use defaults if the input tables have label columns. For example, you could write the same query as above, this time plotting % CPU Usage per-service:

with
  requests = metric cpu.requests | latest | group_by [service], sum;
  usage = metric cpu.usage | delta | group_by [service], sum;
join usage / requests * 100, usage=0

This works exactly as you’d expect.

However, there is one situation where you can’t use defaults. A table containing label columns that don’t appear in the table you’re joining it with can’t have a default. For example, this query attempting to find the fraction of global CPU requests that each service uses is illegal:

with
  requests = metric cpu.requests | latest | group_by [], sum;
  usage = metric cpu.usage | delta | group_by [service], sum;
join usage / requests * 100, usage=0

To understand why this query is illegal, imagine that the usage table is empty and the requests table has a single entry:

Usage:

service	start_timestamp	limit_timestamp	value

Requests:

start_timestamp	limit_timestamp	value
0	0	30

Because the above query has the default usage=0, the system will try to synthesize a row in the usage table with [start_timestamp=0, limit_timestamp=0, value=0] to match the row in the requests table. However, here it runs into a problem. What value should the service column have in this synthesized entry? Instead of guessing the value of service, UQL prohibits this type of query.

Unnamed joins

You can also write 2-way joins using an alternate “unnamed join” syntax. The semantics of join are identical, but things look a bit different.

(
  metric cpu.usage | delta | group_by [], sum;
  metric cpu.requests | latest | group_by [], sum;
) | join left / right * 100, 0

The name of the first join input is left and the name of the second is right. The 0 at the end of the join stage means that the left table has a default value of 0. A third argument (e.g. left / right * 100, 0, 0) would mean that both left and right tables have default values of 0. There is no way to specify a default only for the right side.

In unnamed joins, the parentheses which enclose the two input queries are called the table group. You can write stages after the table group and before the join keyword that will apply to both input queries before they are joined. This behavior is specific to unnamed joins; you cannot do this with a named join. For example, these two queries are equivalent because the delta and group_by stages are distributed into the table group.

# original query
(
  metric requests | delta 1m | group_by [hostname], sum;
  metric errors | delta 1m | group_by [hostname], sum
) | join right / left

# equivalent query
(
  metric requests;
  metric errors
) | delta 1m | group_by [hostname], sum | join right / left

Multi-telemetry joins

You can use both named an unnamed joins to combine timeseries derived from different telemetry types. For example, this join finds the “efficiency” of each pod at processing requests:

with
  requests = spans count | filter operation == "GET" && kube_app == "foo" | rate | group_by [kube_pod], sum;
  cpu = metric cpu.usage | latest | filter kube_app == "foo" | group_by [kube_pod], sum;
join requests / cpu

N-way joins

N-way joins allow you to join more than two tables in a single join statement. For example, you can use an n-way join to calculate the difference in error rate between yesterday and today:

with
  errors_today     = spans count | filter error == true | delta | group_by [], sum;
  total_today      = spans count | delta | group_by [], sum;
  errors_yesterday = spans count | time_shift 1d | filter error == true | delta | group_by [], sum;
  total_yesterday  = spans count | time_shift 1d | delta | group_by [], sum;
join (errors_today / total_today * 100) - (errors_yesterday / total_yesterday * 100)

N-way joins are syntactic sugar for a series of 2-way joins performed in the order of arithmetic precedence. For example, the query above performs joins in this order: (errors_today JOIN total_today) JOIN (errors_yesterday JOIN total_yesterday).

The arithmetic section of the join must use all inputs defined in the with section.

Also note that there is no unnamed n-way join syntax.

Defaults

Defaults for n-way joins resemble defaults in 2-way joins, but there is one important difference to call out. Consider the phony query:

with
  a = ...;
  b = ...;
  c = ...;
join a+b+c, a=0, b=0

Since an n-way join is a series of 2-way joins ordered by arithmetic precedence, the query engine first joins a+b with the defaults a=0, b=0. Next, the query engine joins the result of a+b (let’s call this ab) with c. Since a and b both have defaults, the compiler can actually find a default for ab by substituting a=0 and b=0 into the expression a+b. This tells us that the default for ab is 0 as well!

Finally, the query engine joins ab+c with the default ab=0.

Notice that the default ab=0 changes the query output in exactly one way. When this default is set and a point in c has no corresponding points in a or b the query will still output a point. If ab had no default the query wouldn’t output a point in this situation.

UQL: join

Summary

Arguments

Named join

Unnamed join

Details

Two-way joins

Label columns

Defaults

Unnamed joins

Multi-telemetry joins

N-way joins

Defaults

See also