Named Join Syntax
1
2
3
with
<bound-input-query>...
join <join-expression> <default>...
Unnamed Join Syntax
1
2
(<left-query>; <right-query>)
| join <join-expression> [left-default [, right-default]]
The UQL join stage, much like a SQL JOIN
, merges two tables into a single table. It then combines timeseries values together using an arithmetic expression. You can build the input tables from any telemetry types; join can combine tables built from logs, metrics, spans, or a mix of these types.
There are two different syntaxes for joining: named join and unnamed join. Regardless of which syntax you use the underlying join semantics will be the same.
Supported Pipelines: spans timeseries, metrics, logs timeseries
Name | Type | Required/Optional | Description |
---|---|---|---|
bound-input-query | string=query | 2 or more | A query assigned a name |
join-expression | value expression | Required | An expression that combines points from all input tables into a single output table |
defaults | string=number | 0 or more | Default value for points in a particular input table |
Name | Type | Required/Optional | Description |
---|---|---|---|
left-query | string | Required | Query that produces the left input table |
right-query | string | Required | Query that produces the right input table |
join-expression | value expression | Required | An expression that combines points from the left and right input tables |
left-default | number | Optional | Default value for points in the left table |
right-default | number | Optional | Default value for points in the right table |
Say you want to chart “% CPU usage”. You might do this using two metrics: cpu.requests
and CPU cpu.usage
. You need to combine these metrics somehow, dividing usage / requests
to get percent usage. In UQL, you do this with a join:
1
2
3
4
with
requests = metric cpu.requests | latest | group_by [], sum;
usage = metric cpu.usage | delta | group_by [], sum;
join usage / requests * 100
Now, let’s walk through the produced tables to see how they’re joined:
Requests table:
start_timestamp | limit_timestamp | value |
---|---|---|
0 | 0 | 10 |
60 | 60 | 10 |
Usage table:
start_timestamp | limit_timestamp | value |
---|---|---|
0 | 0 | 8 |
60 | 60 | 7 |
Table after join:
start_timestamp | limit_timestamp | value |
---|---|---|
0 | 0 | 80 |
60 | 60 | 70 |
At its core, join
has two parts. First, it merges rows from the input tables together to create a new combined table. This combined table will contain the union of all label columns that appear in either input table, plus two value columns. Then, the join
stage combines the two value columns using the provided arithmetic expression.
How does the system decide which rows to merge? Joins in UQL are “natural joins”. A natural join “joins on” label columns that exist in both tables. “Joining on” a set of columns means that the query engine will merge rows where those columns have the same values. In summary, the query engine combines rows from both tables where the values of all shared columns match.
An example may be helpful. Imagine joining these two tables using the arithmetic a+b
:
host | service | start_timestamp | limit_timestamp | value |
---|---|---|---|---|
abc | foo | 0 | 0 | 10 |
456 | bar | 0 | 0 | 10 |
123 | baz | 0 | 0 | 10 |
method | service | start_timestamp | limit_timestamp | value |
---|---|---|---|---|
GET | foo | 0 | 0 | 10 |
GET | bar | 0 | 0 | 10 |
POST | bar | 0 | 0 | 10 |
The resulting table looks like this:
host | method | service | start_timestamp | limit_timestamp | value |
---|---|---|---|---|---|
abc | GET | foo | 0 | 0 | 20 |
456 | GET | bar | 0 | 0 | 20 |
456 | POST | bar | 0 | 0 | 20 |
Let’s revisit the “% CPU Usage” chart from earlier in the topic. In some cases when a process doesn’t use any CPU for a period of time no cpu.usage
point is emitted. This happens because often metrics pipelines drop zero-valued delta points. In the earlier iteration of this example, you would see a blank area on the chart in this situation. If you want to see 0% CPU usage when there is no cpu.usage
point, you have to use a default.
You can rewrite the query, adding usage=0
:
1
2
3
4
with
requests = metric cpu.requests | latest | group_by [], sum;
usage = metric cpu.usage | delta | group_by [], sum;
join usage / requests * 100, usage=0
Now, let’s walk through the produced tables and how they’re joined:
Requests Table:
start_timestamp | limit_timestamp | value |
---|---|---|
0 | 0 | 10 |
60 | 60 | 10 |
120 | 120 | 10 |
Usage Table:
start_timestamp | limit_timestamp | value |
---|---|---|
0 | 0 | 8 |
60 | 60 | 7 |
Table after Join:
start_timestamp | limit_timestamp | value |
---|---|---|
0 | 0 | 80 |
60 | 60 | 70 |
120 | 120 | 0 |
Notice that the joined table has a row with the timestamp 120 even though this timestamp only appeared in one input table. This happened because the above query uses the default usage=0
. This default says that if a row in the requests table has no corresponding row in the usage table, the join is performed as if there was a matching row in the usage table with the value 0
.
You can also use defaults if the input tables have label columns. For example, you could write the same query as above, this time plotting % CPU Usage per-service:
1
2
3
4
with
requests = metric cpu.requests | latest | group_by [service], sum;
usage = metric cpu.usage | delta | group_by [service], sum;
join usage / requests * 100, usage=0
This works exactly as you’d expect.
However, there is one situation where you can’t use defaults. A table containing label columns that don’t appear in the table you’re joining it with can’t have a default. For example, this query attempting to find the fraction of global CPU requests that each service uses is illegal:
1
2
3
4
with
requests = metric cpu.requests | latest | group_by [], sum;
usage = metric cpu.usage | delta | group_by [service], sum;
join usage / requests * 100, usage=0
To understand why this query is illegal, imagine that the usage table is empty and the requests table has a single entry:
Usage:
service | start_timestamp | limit_timestamp | value |
---|---|---|---|
Requests:
start_timestamp | limit_timestamp | value |
---|---|---|
0 | 0 | 30 |
Because the above query has the default usage=0
, the system will try to synthesize a row in the usage table with [start_timestamp=0, limit_timestamp=0, value=0]
to match the row in the requests table. However, here it runs into a problem. What value should the service column have in this synthesized entry? Instead of guessing the value of service, UQL prohibits this type of query.
You can also write 2-way joins using an alternate “unnamed join” syntax. The semantics of join are identical, but things look a bit different.
1
2
3
4
(
metric cpu.usage | delta | group_by [], sum;
metric cpu.requests | latest | group_by [], sum;
) | join left / right * 100, 0
The name of the first join input is left
and the name of the second is right
. The 0 at the end of the join
stage means that the left table has a default value of 0. A third argument (e.g. left / right * 100, 0, 0
) would mean that both left
and right
tables have default values of 0. There is no way to specify a default only for the right side.
In unnamed joins, the parentheses which enclose the two input queries are called the table group. You can write stages after the table group and before the join
keyword that will apply to both input queries before they are joined. This behavior is specific to unnamed joins; you cannot do this with a named join. For example, these two queries are equivalent because the delta
and group_by
stages are distributed into the table group.
1
2
3
4
5
6
7
8
9
10
11
# original query
(
metric requests | delta 1m | group_by [hostname], sum;
metric errors | delta 1m | group_by [hostname], sum
) | join right / left
# equivalent query
(
metric requests;
metric errors
) | delta 1m | group_by [hostname], sum | join right / left
You can use both named an unnamed joins to combine timeseries derived from different telemetry types. For example, this join finds the “efficiency” of each pod at processing requests:
1
2
3
4
with
requests = spans count | filter operation == "GET" && kube_app == "foo" | rate | group_by [kube_pod], sum;
cpu = metric cpu.usage | latest | filter kube_app == "foo" | group_by [kube_pod], sum;
join requests / cpu
N-way joins allow you to join more than two tables in a single join statement. For example, you can use an n-way join to calculate the difference in error rate between yesterday and today:
1
2
3
4
5
6
with
errors_today = spans count | filter error == true | delta | group_by [], sum;
total_today = spans count | delta | group_by [], sum;
errors_yesterday = spans count | time_shift 1d | filter error == true | delta | group_by [], sum;
total_yesterday = spans count | time_shift 1d | delta | group_by [], sum;
join (errors_today / total_today * 100) - (errors_yesterday / total_yesterday * 100)
N-way joins are syntactic sugar for a series of 2-way joins performed in the order of arithmetic precedence. For example, the query above performs joins in this order: (errors_today JOIN total_today) JOIN (errors_yesterday JOIN total_yesterday)
.
The arithmetic section of the join must use all inputs defined in the with
section.
Also note that there is no unnamed n-way join syntax.
Defaults for n-way joins resemble defaults in 2-way joins, but there is one important difference to call out. Consider the phony query:
1
2
3
4
5
with
a = ...;
b = ...;
c = ...;
join a+b+c, a=0, b=0
Since an n-way join is a series of 2-way joins ordered by arithmetic precedence, the query engine first joins a+b
with the defaults a=0, b=0
. Next, the query engine joins the result of a+b
(let’s call this ab
) with c
. Since a
and b
both have defaults, the compiler can actually find a default for ab
by substituting a=0
and b=0
into the expression a+b
. This tells us that the default for ab
is 0 as well!
Finally, the query engine joins ab+c
with the default ab=0
.
Notice that the default ab=0
changes the query output in exactly one way. When this default is set and a point in c
has no corresponding points in a
or b
the query will still output a point. If ab
had no default the query wouldn’t output a point in this situation.
Updated Jan 3, 2024