Joins
Overview
JOIN combines rows from two tables or other table-like sources (views, table functions, etc.) into a new combined row or rows.
Joins are particularly useful when data in tables is organized relationally and the data in the different tables is linked together.
WITH
planets AS (SELECT 1 AS id, 'Jupiter' AS name, 142984 as diameter, 'km' as m_units
UNION ALL
SELECT 2 AS id, 'Saturn' AS name, 120500 as diameter, 'km' as m_units
UNION ALL
SELECT 3 AS id, 'Uranus' AS name, 51118 as diameter, 'km' as m_units
UNION ALL
SELECT 4 as id, 'Earth' AS name, 12742 as diameter, 'km' as m_units
UNION ALL
SELECT 5 as id, 'Venus' AS name, 12104 as diameter, 'km' as m_units
),
top2_moons AS ( SELECT 1 AS planet_id, 'Ganymede' AS name, 5268.2 AS diameter, 'km' as m_units
UNION ALL
SELECT 1 AS planet_id, 'Callisto' AS name, 4820.6 as diameter, 'km' as m_units
UNION ALL
SELECT 2 AS planet_id, 'Titan' AS name, 5149.46 AS diameter, 'km' as m_units
UNION ALL
SELECT 2 AS planet_id, 'Rhea' AS name, 1527.6 as diameter, 'km' as m_units
UNION ALL
SELECT 3 AS planet_id, 'Titania' AS name, 1576.8 AS diameter, 'km' as m_units
UNION ALL
SELECT 3 AS planet_id, 'Oberon' AS name, 1522.8 as diameter, 'km' as m_units
UNION ALL
SELECT 4 AS planet_id, 'Moon' AS name, 3474.8 as diameter, 'km' as m_units
UNION ALL
SELECT 6 as planet_id, 'Phobos' AS name, 22.2 as diameter, 'km' as m_units
UNION ALL
SELECT 6 as planet_id, 'Deimos' AS name, 12.6 as diameter, 'km' as m_units
)
SELECT p.name, p.diameter as planet_diameter, m.name as moon_name, m.diameter as moon_diameter
FROM planets p INNER JOIN top2_moons m
ON p.id= m.planet_id
ORDER BY 2 DESC, 4 DESC;+---------+-----------------+-----------+---------------+
| NAME | PLANET_DIAMETER | MOON_NAME | MOON_DIAMETER |
+---------+-----------------+-----------+---------------+
| Jupiter | 142984 | Ganymede | 5268.20 |
| Jupiter | 142984 | Callisto | 4820.60 |
| Saturn | 120500 | Titan | 5149.46 |
| Saturn | 120500 | Rhea | 1527.60 |
| Uranus | 51118 | Titania | 1576.80 |
| Uranus | 51118 | Oberon | 1522.80 |
| Earth | 12742 | Moon | 3474.80 |
+---------+-----------------+-----------+---------------+The related tables (or table-like objects) (in our example these are Common Table Expression (CTE)s named planets and top2_moons) can contain one or more columns (in our example these are column id in the table planets which contains values that match the column planet_id in the table top2_moons) whose values are related.
These columns can be used to join rows from one object to rows from the other object.
The JOIN statement mostly contains information about which criterion or criteria are decisive for which rows from one object are linked to the rows from the other object by specifying the names of the common columns (in our example it is ON p.id = m.planet_id). See CROSS JOIN for an example of a join without an ON clause and explicitly specified join condition.
A single JOIN statement can only link 2 table-like objects. However, several JOIN statements can be chained together to create complex linked rows that can contain information from several objects. See Multiple JOIN example for an example of possible syntax.
The result of a JOIN statement is a table-like object.
For more information about the JOIN syntax, see JOIN in the 'Query Syntax' section.
JOIN compatible Objects
Any table-like objects can be joined with the JOIN statement. For example:
- tables
- views
- an expression such as a result set returned by a subquery or table function. It is important that the expression evaluates to the equivalent of a table.
Types of Joins
The following types of joins are supported by MetaKraftwerk:
- INNER join
- OUTER (FULL, RIGHT or LEFT) join
- CROSS join
We create the objects planets and top2_moons as tables and use them in the further examples:
CREATE TABLE IF NOT EXISTS planets (id INTEGER, name STRING, diameter DECIMAL, m_units STRING);
INSERT INTO planets VALUES
(1,'Jupiter',142984, 'km'),
(2,'Saturn', 120500, 'km'),
(3,'Uranus', 51118 , 'km'),
(4,'Earth', 12742 , 'km'),
(5,'Venus', 12104 , 'km');
SELECT * FROM planets;+--------+---------+----------+---------+
| ID | NAME | DIAMETER | M_UNITS |
+--------+---------+----------+---------+
| 1 | Jupiter | 142984 | km |
| 2 | Saturn | 120500 | km |
| 3 | Uranus | 51118 | km |
| 4 | Earth | 12742 | km |
| 5 | Venus | 12104 | km |
+--------+---------+----------+---------+CREATE TABLE IF NOT EXISTS top2_moons (planet_id INTEGER, name STRING, diameter DECIMAL, m_units STRING);
INSERT INTO top2_moons VALUES
(1, 'Ganymede', 5268.2 , 'km'),
(1, 'Callisto', 4820.6 , 'km'),
(2, 'Titan', 5149.46, 'km'),
(2, 'Rhea', 1527.6 , 'km'),
(3, 'Titania', 1576.8 , 'km'),
(3, 'Oberon', 1522.8 , 'km'),
(4, 'Moon', 3474.8 , 'km'),
(6, 'Phobos', 22.2 , 'km'),
(6, 'Deimos', 12.6 , 'km');
SELECT * FROM top2_moons;+-----------+----------+----------+---------+
| PLANET_ID | NAME | DIAMETER | M_UNITS |
+-----------+----------+----------+---------+
| 1 | Ganymede | 5268 | km |
| 1 | Callisto | 4821 | km |
| 2 | Titan | 5149 | km |
| 2 | Rhea | 1528 | km |
| 3 | Titania | 1577 | km |
| 3 | Oberon | 1523 | km |
| 4 | Moon | 3475 | km |
| 6 | Phobos | 22 | km |
| 6 | Deimos | 13 | km |
+-----------+----------+----------+---------+INNER Join
An INNER JOIN joins each row in first object with the matching row or rows in the second object.
In our example above we used an INNER JOIN to join the CTE's by planet id and each row from planets was "extended" by one or more rows from top2_moons (compare Earth and other three planets).
The output of an INNER JOIN includes only rows that match the join condition in both joined objects. The example above does not contain rows for the moons of Venus (contained in in plantets) and Mars (contained in top2_moons with planet_id = 6) in the output because no corresponding rows were found in the other CTE.
See INNER JOIN description and INNER JOIN example for more information.
OUTER Joins
An outer join lists all rows in the specified table, in contrast to INNER JOIN.
See OUTER join descriptions and OUTER join examples for more information.
LEFT OUTER Join
LEFT OUTER JOIN retains all rows of the "left" object and fill the the row attributes of the "right" object, that has no related rows in the "left" object with NULLs:
Compare this results with the results of INNER JOIN:
SELECT p.name, p.diameter as planet_diameter, m.name as moon_name, m.diameter as moon_diameter
FROM planets p LEFT OUTER JOIN top2_moons m
ON p.id= m.planet_id
ORDER BY 2 DESC, 4 DESC;+---------+-----------------+-----------+---------------+
| NAME | PLANET_DIAMETER | MOON_NAME | MOON_DIAMETER |
+---------+-----------------+-----------+---------------+
| Jupiter | 142984 | Ganymede | 5268 |
| Jupiter | 142984 | Callisto | 4821 |
| Saturn | 120500 | Titan | 5149 |
| Saturn | 120500 | Rhea | 1528 |
| Uranus | 51118 | Titania | 1577 |
| Uranus | 51118 | Oberon | 1523 |
| Earth | 12742 | Moon | 3475 |
| Venus | 12104 | null | null |
+---------+-----------------+-----------+---------------+RIGHT OUTER join
RIGHT OUTER JOIN retains all rows of the right object and fill the the row attributes of the left object, that has no related rows in the right object with NULLs:
You can compare the results with the results of the INNER JOIN and LEFT OUTER JOIN.
SELECT p.name, p.diameter as planet_diameter, m.name as moon_name, m.diameter as moon_diameter
FROM planets p RIGHT OUTER JOIN top2_moons m
ON p.id= m.planet_id
ORDER BY 2 DESC, 4 DESC;+---------+-----------------+-----------+---------------+
| NAME | PLANET_DIAMETER | MOON_NAME | MOON_DIAMETER |
+---------+-----------------+-----------+---------------+
| null | null | Phobos | 22 |
| null | null | Deimos | 13 |
| Jupiter | 142984 | Ganymede | 5268 |
| Jupiter | 142984 | Callisto | 4821 |
| Saturn | 120500 | Titan | 5149 |
| Saturn | 120500 | Rhea | 1528 |
| Uranus | 51118 | Titania | 1577 |
| Uranus | 51118 | Oberon | 1523 |
| Earth | 12742 | Moon | 3475 |
+---------+-----------------+-----------+---------------+FULL OUTER join
FULL OUTER join causes all rows of the "right" and "left" object to be displayed in the result. As with LEFT or RIGHT OUTER JOIN, the missing information in the rows is completed with NULL.
SELECT p.name, p.diameter as planet_diameter, m.name as moon_name, m.diameter as moon_diameter
FROM planets p FULL OUTER JOIN top2_moons m
ON p.id= m.planet_id
ORDER BY 2 DESC, 4 DESC;+---------+-----------------+-----------+---------------+
| NAME | PLANET_DIAMETER | MOON_NAME | MOON_DIAMETER |
+---------+-----------------+-----------+---------------+
| null | null | Phobos | 22 |
| null | null | Deimos | 13 |
| Jupiter | 142984 | Ganymede | 5268 |
| Jupiter | 142984 | Callisto | 4821 |
| Saturn | 120500 | Titan | 5149 |
| Saturn | 120500 | Rhea | 1528 |
| Uranus | 51118 | Titania | 1577 |
| Uranus | 51118 | Oberon | 1523 |
| Earth | 12742 | Moon | 3475 |
| Venus | 12104 | null | null |
+---------+-----------------+-----------+---------------+CROSS JOIN
A CROSS JOIN creates a "Cartesian product" of the rows of both objects. These are all possible combinations of the rows.
Such results often occur when the join condition is not specified and cannot be implicitly inferred. Results of CROSS JOIN can contain a very large number of rows and place a heavy load on the system.
The number of resulting rows of CROSS JOIN can be calculated using the formula "N x M", where N is the number of rows in the first object and M is the number of rows in the second object.
See CROSS join description and CROSS join example for more information.
For example:
WITH
left_t AS (SELECT 1 AS id, 'one' AS name
UNION ALL
SELECT 2 AS id, 'two' AS name
UNION ALL
SELECT 3 AS id, 'three' AS name
),
right_t AS (SELECT 1 AS id, 'ten' AS name
UNION ALL
SELECT 2 AS id, 'eleven' AS name
UNION ALL
SELECT 3 AS id, 'twelve' AS name
)
SELECT * FROM left_t CROSS JOIN right_t;
-- OR
WITH
left_t AS (SELECT 1 AS id, 'one' AS name
UNION ALL
SELECT 2 AS id, 'two' AS name
UNION ALL
SELECT 3 AS id, 'three' AS name
),
right_t AS (SELECT 1 AS id, 'ten' AS name
UNION ALL
SELECT 2 AS id, 'eleven' AS name
UNION ALL
SELECT 3 AS id, 'twelve' AS name
)
SELECT * FROM left_t JOIN right_t; Without ON-clause the following table results:
+----+-------+----+--------+
| ID | NAME | ID | NAME |
+----+-------+----+--------+
| 1 | one | 1 | ten |
| 1 | one | 2 | eleven |
| 1 | one | 3 | twelve |
| 2 | two | 1 | ten |
| 2 | two | 2 | eleven |
| 2 | two | 3 | twelve |
| 3 | three | 1 | ten |
| 3 | three | 2 | eleven |
| 3 | three | 3 | twelve |
+----+-------+----+--------+CROSS JOIN can be useful and helpful if, for example, you use the WHERE clause in conjunction with it to filter the results.
The output in this case is equal to an INNER JOIN with an ON clause.
WITH
left_t AS (SELECT 1 AS id, 'one' AS name
UNION ALL
SELECT 2 AS id, 'two' AS name
UNION ALL
SELECT 3 AS id, 'three' AS name
),
right_t AS (SELECT 1 AS id, 'ten' AS name
UNION ALL
SELECT 2 AS id, 'eleven' AS name
UNION ALL
SELECT 3 AS id, 'twelve' AS name
)
SELECT * FROM left_t JOIN right_t
WHERE left_t.id = right_t.id; +----+-------+----+--------+
| ID | NAME | ID | NAME |
+----+-------+----+--------+
| 1 | one | 1 | ten |
| 2 | two | 2 | eleven |
| 3 | three | 3 | twelve |
+----+-------+----+--------+Compare with:
WITH
left_t AS (SELECT 1 AS id, 'one' AS name
UNION ALL
SELECT 2 AS id, 'two' AS name
UNION ALL
SELECT 3 AS id, 'three' AS name
),
right_t AS (SELECT 1 AS id, 'ten' AS name
UNION ALL
SELECT 2 AS id, 'eleven' AS name
UNION ALL
SELECT 3 AS id, 'twelve' AS name
)
SELECT * FROM left_t INNER JOIN right_t
ON left_t.id = right_t.id; +----+-------+----+--------+
| ID | NAME | ID | NAME |
+----+-------+----+--------+
| 1 | one | 1 | ten |
| 2 | two | 2 | eleven |
| 3 | three | 3 | twelve |
+----+-------+----+--------+Important Notes
Using JOINS in conjunction with WHERE and FROM... ON... clauses can produce different outputs under certain conditions. Thus, the procedures are not identical.
Example 1
For example, WHERE clause rows that contain NULL values in the columns relevant to the WHERE clause are filtered out:
Compare:
WITH
left_t AS (SELECT 1 AS id, 'one' AS name
UNION ALL
SELECT 2 AS id, 'two' AS name
UNION ALL
SELECT NULL AS id, 'three' AS name
),
right_t AS (SELECT 1 AS id, 'ten' AS name
UNION ALL
SELECT 2 AS id, 'eleven' AS name
UNION ALL
SELECT 3 AS id, 'twelve' AS name
)
SELECT * FROM left_t LEFT OUTER JOIN right_t
ON left_t.id = right_t.id; +------+-------+------+--------+
| ID | NAME | ID | NAME |
+------+-------+------+--------+
| 1 | one | 1 | ten |
| 2 | two | 2 | eleven |
| null | three | null | null |
+------+-------+------+--------+And
WITH
left_t AS (SELECT 1 AS id, 'one' AS name
UNION ALL
SELECT 2 AS id, 'two' AS name
UNION ALL
SELECT NULL AS id, 'three' AS name
),
right_t AS (SELECT 1 AS id, 'ten' AS name
UNION ALL
SELECT 2 AS id, 'eleven' AS name
UNION ALL
SELECT 3 AS id, 'twelve' AS name
)
SELECT * FROM left_t JOIN right_t
WHERE left_t.id = right_t.id; +------+-------+------+--------+
| ID | NAME | ID | NAME |
+------+-------+------+--------+
| 1 | one | 1 | ten |
| 2 | two | 2 | eleven |
+------+-------+------+--------+Example 2
If the rows of the “right” table after the join contain only NULL values, the WHERE clause filters out such rows from the output:
Compare:
WITH
left_t AS (SELECT 1 AS id, 'one' AS name
UNION ALL
SELECT 2 AS id, 'two' AS name
UNION ALL
SELECT 3 AS id, 'three' AS name
),
right_t AS (SELECT 1 AS id, 'ten' AS name
UNION ALL
SELECT 3 AS id, 'twelve' AS name
)
SELECT * FROM left_t LEFT OUTER JOIN right_t
ON left_t.id = right_t.id; +----+-------+------+--------+
| ID | NAME | ID | NAME |
+----+-------+------+--------+
| 1 | one | 1 | ten |
| 2 | two | null | null |
| 3 | three | 3 | twelve |
+----+-------+------+--------+And
WITH
left_t AS (SELECT 1 AS id, 'one' AS name
UNION ALL
SELECT 2 AS id, 'two' AS name
UNION ALL
SELECT 3 AS id, 'three' AS name
),
right_t AS (SELECT 1 AS id, 'ten' AS name
UNION ALL
SELECT 3 AS id, 'twelve' AS name
)
SELECT * FROM left_t JOIN right_t
WHERE left_t.id = right_t.id; +----+-------+----+--------+
| ID | NAME | ID | NAME |
+----+-------+----+--------+
| 1 | one | 1 | ten |
| 3 | three | 3 | twelve |
+----+-------+----+--------+Syntax spelling
There are in general two ways to join tables:
... FROM left_obj JOIN right_obj ... ON ...: this form is more flexible and has no problem with accidentally filtering rows that containNULLvalues.... FROM left_obj JOIN right_obj ... WHERE ....
The following additional spellings are allowed:
left_obj LEFT OUTER JOIN right_obj ON ...;=left_obj LEFT JOIN ... ON ;(the same applies toRIGHT ...andFULL OUTER JOIN)left_obj INNER JOIN right_obj ON ...;=left_obj JOIN right_obj ON ...;left_obj CROSS JOIN right_obj ...;=left_obj JOIN right_obj ...;left_obj CROSS JOIN right_obj WHERE...;=left_obj JOIN right_obj WHERE ...;left_obj CROSS JOIN right_obj WHERE ...;=left_obj, right_obj WHERE...;