Skip to content
Guides>SQL Guide>Subqueries

Subqueries

A subquery, as the name suggests, is a query within another query. A subquery can be used to provide additional data that can be useful for operating on the data in the query.

A subquery can be scalar or non-scalar.

A scalar subquery is one that returns exactly one value: that is, exactly one row and one column. So can scalar subquery only contain one item in the SELECT list.

In contrast, there are non-scalar subqueries that can return multiple rows with one column, multiple columns with one row, or multiple rows with multiple columns.

If you want to try the examples yourself, please create the test table with this query.
sql
create table if not exists my_folder.products (product_name STRING, price DECIMAL, product_category STRING); 
INSERT INTO my_folder.products VALUES 
('Quick Notebook',     20, 'notebook'), 
('Nice Notebook',      28, 'notebook'), 
('Small Notebook',     10, 'notebook'), 
('Light Notebook',     44, 'notebook'), 
('Big Notebook',       36, 'notebook'), 
('Expensive Notebook', 60, 'notebook'), 
('Quick Oven',         10, 'oven'), 
('Nice Oven',          14, 'oven'), 
('Small Oven',         5,  'oven'), 
('Light Oven',         22, 'oven'), 
('Big Oven',           18, 'oven'), 
('Expensive Oven',     30, 'oven'), 
('Quick Brush',        5,  'toothbrush'), 
('Nice Brush',         7,  'toothbrush'), 
('Small Brush',        3,  'toothbrush'), 
('Light Brush',        11, 'toothbrush'), 
('Big Brush',          9,  'toothbrush'), 
('Expensive Brush',    15, 'toothbrush');

Correlated subquery

If a subquery references one or more columns outside of itself, it is called a correlated subquery. A correlated subquery "filters" the table it references using informations from outer query. The return values ​​of such a query depend on the row values ​​of the outer query and vary from row to row.

For Example:

sql
SELECT product_name, price
FROM my_folder.products p1
WHERE price > (
    SELECT AVG(price)
    FROM my_folder.products p2
    WHERE p2.product_category = p1.product_category
    );

The outer query (p1) selects the product_name and price of the products. The inner query (p2) calculates the average price for the product category to which the product belongs. The condition WHERE p2.product_category = p1.product_category ensures that the average price is calculated only for the correct product_category, to which the product belongs in the subquery. The correlated query appears to be executed once per row of the outer query and returns a value for the defined "group" of that row.

If a correlated subquery returns (potentially) more than one row, a runtime error is generated.

Only subqueries, that by design ensures to return one row, are supported as correlated subquery by Metakraftwerk.

The GROUP BY clause is not allowed in a correlated subquery.

So, even though the aggregation function (such a AVG() in our example) is not strictly necessary to calculate the correct values ​​(e.g. if the value to be returned by the subquery is already aggregated), this is necessary due to the requirement that a subquery can always return at most one row (and one column) for each row in the outer query.

This is thus an example of invalid correlated subquery:

sql
SELECT product_name, price
FROM my_folder.products p1
WHERE product_name = 'Nice Brush' and price > (
      SELECT price                                   
      FROM my_folder.products p2
      WHERE p2.product_name = p1.product_name
      );
txt
Unsupported subquery type cannot be evaluated

The comparizon to Uncorrelated scalar subquery in WHERE clause once again illustrates the differences between correlated and uncorrelated subqueries.

Although this subquery only has one row and column as a result (is "scalar"):

sql
SELECT price
  FROM my_folder.products p2
  WHERE p2.product_name ='Nice Brush';
txt
+-------+
| PRICE |
+-------+
| 7     |
+-------+

Support and limitations

In WHERE clause MetaKraftwerk currently supports:

Limitations:

Uncorrelated subquery

An uncorrelated subquery is independent of the outer query. The results are returned to the outer query only once.

For example:

sql
SELECT product_name, price
FROM my_folder.products p1
WHERE price > 
    (
      SELECT AVG(price) 
      FROM my_folder.products p2
      WHERE p2.product_category = 'notebook');

This example selects products that are more expensive than the average price of the product in the Notebook category in this table. The subquery is independent of the outer query and does not need to be executed separately for each row of the outer query.

Uncorrelated scalar subqueries & WHERE clause

If a uncorrelated subquery is used in WHERE clause and returns more than one row, a runtime error is generated.

Since in following example the subquery returns only one row, it is a valid query:

sql
SELECT product_name, price
FROM my_folder.products p1
WHERE price > 
    (SELECT price 
      FROM my_folder.products p2
      WHERE p2.product_name = 'Quick Notebook'); 
txt
+--------------------+-------+
| PRODUCT_NAME       | PRICE |
+--------------------+-------+
| Nice Notebook      | 28    |
| Light Notebook     | 44    |
| Big Notebook       | 36    |
| Expensive Notebook | 60    |
| Light Oven         | 22    |
| Expensive Oven     | 30    |
+--------------------+-------+

But this one caused an error:

sql
SELECT product_name, price
FROM my_folder.products p1
WHERE price > 
    ( SELECT price 
      FROM my_folder.products p2
      WHERE p2.product_category = 'notebook'); 
txt
Single-row subquery returns more than one row.

The comparizon to Correlated scalar subquery in WHERE clause once again illustrates the differences between correlated and uncorrelated subqueries.

Uncorrelated subqueries anywhere else.

MetaKraftwerk supports currently scalar and non-scalar subqueries as uncorrelated subqueries anywhere else as WHERE clause (see above) in the outer query where the value of an expression can be used.

For Example such query is valid query:

sql
SELECT p1.product_name, p1.price, p1.product_category, p2.price as price_avg
    FROM my_folder.products AS p1 
    LEFT OUTER JOIN (SELECT DISTINCT product_category, AVG(price) as price
                            FROM my_folder.products
                            GROUP BY product_category) AS p2
    ON p1.product_category= p2.product_category;
txt
+--------------------+-------+------------------+-----------+
| PRODUCT_NAME       | PRICE | PRODUCT_CATEGORY | PRICE_AVG |
+--------------------+-------+------------------+-----------+
| Quick Notebook     | 20    | notebook         | 33        |
| Nice Notebook      | 28    | notebook         | 33        |
| Small Notebook     | 10    | notebook         | 33        |
| Light Notebook     | 44    | notebook         | 33        |
| Big Notebook       | 36    | notebook         | 33        |
| Expensive Notebook | 60    | notebook         | 33        |
| Quick Oven         | 10    | oven             | 16.5      |
| Nice Oven          | 14    | oven             | 16.5      |
| Small Oven         | 5     | oven             | 16.5      |
| Light Oven         | 22    | oven             | 16.5      |
| Big Oven           | 18    | oven             | 16.5      |
| Expensive Oven     | 30    | oven             | 16.5      |
| Quick Brush        | 5     | toothbrush       | 8.3       |
| Nice Brush         | 7     | toothbrush       | 8.3       |
| Small Brush        | 3     | toothbrush       | 8.3       |
| Light Brush        | 11    | toothbrush       | 8.3       |
| Big Brush          | 9     | toothbrush       | 8.3       |
| Expensive Brush    | 15    | toothbrush       | 8.3       |
+--------------------+-------+------------------+-----------+

The subquery here returns 2 columns and multiple rows.

Support and limitations

MetaKraftwerk currently supports:

  • scalar subqueries as uncorrelated subqueries in WHERE clause
  • non-scalar subqueries as uncorrelated subqueries anywhere else in the outer query where the value of an expression can be used.
  • usage of Query operators EXISTS, ANY | ALL, and IN with uncorrelated subqueries.

Limitations:

See limitations of EXISTS, ANY | ALL, and IN operators.

Notes to returned values

In MetaKraftwerk:

  • If no values can be returned from a scalar subquery, the subquery returns NULL.
  • If no value can be returned in the column of the non-scalar subquery, the subquery returns NULL as the value for the column.
  • If no rows are returned from the non-scalar subquery, the subquery returns 0 rows and not one row filled with NULLs.

Subquery Operators

Subquery operators are operators, that operate on nested query expressions.

They can be used to operate with values that are:

  • Returned in a SELECT list.
  • Grouped in a GROUP BY clause.
  • Compared with other expressions in the WHERE or HAVING clause.

See Subqueries in Query Operators section for more information