Subqueries
A subquery, as the name suggests, is a query within another query. A subquery can be used to provide additional data that can be useful for operating on the data in the query.
A subquery can be scalar or non-scalar.
A scalar subquery is one that returns exactly one value: that is, exactly one row and one column. So can scalar subquery only contain one item in the SELECT list.
In contrast, there are non-scalar subqueries that can return multiple rows with one column, multiple columns with one row, or multiple rows with multiple columns.
If you want to try the examples yourself, please create the test table with this query.
create table if not exists my_folder.products (product_name STRING, price DECIMAL, product_category STRING);
INSERT INTO my_folder.products VALUES
('Quick Notebook', 20, 'notebook'),
('Nice Notebook', 28, 'notebook'),
('Small Notebook', 10, 'notebook'),
('Light Notebook', 44, 'notebook'),
('Big Notebook', 36, 'notebook'),
('Expensive Notebook', 60, 'notebook'),
('Quick Oven', 10, 'oven'),
('Nice Oven', 14, 'oven'),
('Small Oven', 5, 'oven'),
('Light Oven', 22, 'oven'),
('Big Oven', 18, 'oven'),
('Expensive Oven', 30, 'oven'),
('Quick Brush', 5, 'toothbrush'),
('Nice Brush', 7, 'toothbrush'),
('Small Brush', 3, 'toothbrush'),
('Light Brush', 11, 'toothbrush'),
('Big Brush', 9, 'toothbrush'),
('Expensive Brush', 15, 'toothbrush');Correlated subquery
If a subquery references one or more columns outside of itself, it is called a correlated subquery. A correlated subquery "filters" the table it references using informations from outer query. The return values of such a query depend on the row values of the outer query and vary from row to row.
For Example:
SELECT product_name, price
FROM my_folder.products p1
WHERE price > (
SELECT AVG(price)
FROM my_folder.products p2
WHERE p2.product_category = p1.product_category
);The outer query (p1) selects the product_name and price of the products. The inner query (p2) calculates the average price for the product category to which the product belongs. The condition WHERE p2.product_category = p1.product_category ensures that the average price is calculated only for the correct product_category, to which the product belongs in the subquery. The correlated query appears to be executed once per row of the outer query and returns a value for the defined "group" of that row.
If a correlated subquery returns (potentially) more than one row, a runtime error is generated.
Only subqueries, that by design ensures to return one row, are supported as correlated subquery by Metakraftwerk.
The GROUP BY clause is not allowed in a correlated subquery.
So, even though the aggregation function (such a AVG() in our example) is not strictly necessary to calculate the correct values (e.g. if the value to be returned by the subquery is already aggregated), this is necessary due to the requirement that a subquery can always return at most one row (and one column) for each row in the outer query.
This is thus an example of invalid correlated subquery:
SELECT product_name, price
FROM my_folder.products p1
WHERE product_name = 'Nice Brush' and price > (
SELECT price
FROM my_folder.products p2
WHERE p2.product_name = p1.product_name
);Unsupported subquery type cannot be evaluatedThe comparizon to Uncorrelated scalar subquery in WHERE clause once again illustrates the differences between correlated and uncorrelated subqueries.
Although this subquery only has one row and column as a result (is "scalar"):
SELECT price
FROM my_folder.products p2
WHERE p2.product_name ='Nice Brush';+-------+
| PRICE |
+-------+
| 7 |
+-------+Support and limitations
In WHERE clause MetaKraftwerk currently supports:
- scalar subqueries as correlated subqueries
- usage of Query operators
EXISTS,ANY | ALL, andINwith correlated subqueries.
Limitations:
- The
LIMITclauses are not allowed in correlated scalar subqueries. - Limitations of
EXISTS,ANY | ALL, andINoperators.
Uncorrelated subquery
An uncorrelated subquery is independent of the outer query. The results are returned to the outer query only once.
For example:
SELECT product_name, price
FROM my_folder.products p1
WHERE price >
(
SELECT AVG(price)
FROM my_folder.products p2
WHERE p2.product_category = 'notebook');This example selects products that are more expensive than the average price of the product in the Notebook category in this table. The subquery is independent of the outer query and does not need to be executed separately for each row of the outer query.
Uncorrelated scalar subqueries & WHERE clause
If a uncorrelated subquery is used in WHERE clause and returns more than one row, a runtime error is generated.
Since in following example the subquery returns only one row, it is a valid query:
SELECT product_name, price
FROM my_folder.products p1
WHERE price >
(SELECT price
FROM my_folder.products p2
WHERE p2.product_name = 'Quick Notebook'); +--------------------+-------+
| PRODUCT_NAME | PRICE |
+--------------------+-------+
| Nice Notebook | 28 |
| Light Notebook | 44 |
| Big Notebook | 36 |
| Expensive Notebook | 60 |
| Light Oven | 22 |
| Expensive Oven | 30 |
+--------------------+-------+But this one caused an error:
SELECT product_name, price
FROM my_folder.products p1
WHERE price >
( SELECT price
FROM my_folder.products p2
WHERE p2.product_category = 'notebook'); Single-row subquery returns more than one row.The comparizon to Correlated scalar subquery in WHERE clause once again illustrates the differences between correlated and uncorrelated subqueries.
Uncorrelated subqueries anywhere else.
MetaKraftwerk supports currently scalar and non-scalar subqueries as uncorrelated subqueries anywhere else as WHERE clause (see above) in the outer query where the value of an expression can be used.
For Example such query is valid query:
SELECT p1.product_name, p1.price, p1.product_category, p2.price as price_avg
FROM my_folder.products AS p1
LEFT OUTER JOIN (SELECT DISTINCT product_category, AVG(price) as price
FROM my_folder.products
GROUP BY product_category) AS p2
ON p1.product_category= p2.product_category;+--------------------+-------+------------------+-----------+
| PRODUCT_NAME | PRICE | PRODUCT_CATEGORY | PRICE_AVG |
+--------------------+-------+------------------+-----------+
| Quick Notebook | 20 | notebook | 33 |
| Nice Notebook | 28 | notebook | 33 |
| Small Notebook | 10 | notebook | 33 |
| Light Notebook | 44 | notebook | 33 |
| Big Notebook | 36 | notebook | 33 |
| Expensive Notebook | 60 | notebook | 33 |
| Quick Oven | 10 | oven | 16.5 |
| Nice Oven | 14 | oven | 16.5 |
| Small Oven | 5 | oven | 16.5 |
| Light Oven | 22 | oven | 16.5 |
| Big Oven | 18 | oven | 16.5 |
| Expensive Oven | 30 | oven | 16.5 |
| Quick Brush | 5 | toothbrush | 8.3 |
| Nice Brush | 7 | toothbrush | 8.3 |
| Small Brush | 3 | toothbrush | 8.3 |
| Light Brush | 11 | toothbrush | 8.3 |
| Big Brush | 9 | toothbrush | 8.3 |
| Expensive Brush | 15 | toothbrush | 8.3 |
+--------------------+-------+------------------+-----------+The subquery here returns 2 columns and multiple rows.
Support and limitations
MetaKraftwerk currently supports:
- scalar subqueries as uncorrelated subqueries in
WHEREclause - non-scalar subqueries as uncorrelated subqueries anywhere else in the outer query where the value of an expression can be used.
- usage of Query operators
EXISTS,ANY | ALL, andINwith uncorrelated subqueries.
Limitations:
See limitations of EXISTS, ANY | ALL, and IN operators.
Notes to returned values
In MetaKraftwerk:
- If no values can be returned from a scalar subquery, the subquery returns
NULL. - If no value can be returned in the column of the non-scalar subquery, the subquery returns
NULLas the value for the column. - If no rows are returned from the non-scalar subquery, the subquery returns 0 rows and not one row filled with
NULLs.
Subquery Operators
Subquery operators are operators, that operate on nested query expressions.
They can be used to operate with values that are:
- Returned in a
SELECTlist. - Grouped in a GROUP BY clause.
- Compared with other expressions in the WHERE or HAVING clause.
See Subqueries in Query Operators section for more information