Querying Hierarchical Data
In MetaKraftwerk you can store and query hierarchical data using JOINs or CONNECT BY clause.
See CONNECT BY in Query Syntax section and Common Table Expressions for more information.
Storing Hierarchical Data
Hierarchical data can be stored in a hierarchy of tables or a single table with one (or more) columns representing the hierarchy (e.g. indicating category of products).
Storing hierarchical data across multiple tables
Hierarchical data often stored in Relational databases by using different tables, from wich one table can serve as 'parent' table for an another table with 'child' data and at the same time, contains child data for another 'parent' data table. Data of each hierarchielevel can be stored in separated table.
For example, information about planets and they moons can be orgenized in a tables hierarchie, where planet is "parent table " for "table of moons".
Like here (please unfold):
CREATE TABLE IF NOT EXISTS planets (id INTEGER, name STRING, diameter DECIMAL, m_units STRING);
INSERT INTO planets VALUES
(1,'Jupiter',142984, 'km'),
(2,'Saturn', 120500, 'km'),
(3,'Uranus', 51118 , 'km'),
(4,'Earth', 12742 , 'km'),
(5,'Venus', 12104 , 'km');
SELECT * FROM planets;
CREATE TABLE IF NOT EXISTS top2_moons (planet_id INTEGER, name STRING, diameter DECIMAL, m_units STRING);
INSERT INTO top2_moons VALUES
(1, 'Ganymede', 5268.2 , 'km'),
(1, 'Callisto', 4820.6 , 'km'),
(2, 'Titan', 5149.46, 'km'),
(2, 'Rhea', 1527.6 , 'km'),
(3, 'Titania', 1576.8 , 'km'),
(3, 'Oberon', 1522.8 , 'km'),
(4, 'Moon', 3474.8 , 'km'),
(6, 'Phobos', 22.2 , 'km'),
(6, 'Deimos', 12.6 , 'km');
SELECT * FROM planets;
SELECT * FROM top2_moons;Storing Hierarchical Data in a Single Table
The same hierarchy can be stored in a table. This has the advantage that such data structure is more scalable. For example, the family tree, which was given a new level with each new generation. It is not known how many tables a genealogical tree needs to represent all generations.
In such cases, the data structure of a table can be designed in such a way that it reliably represents hierarchical data. For example, by using a column that contains the parent ID of each generation, so that the derivation of the origin is possible by determining the parent ID for the current ID.
We can thus rebuild one part of our example in one table:
Create space table (please unfold).
CREATE TABLE IF NOT EXISTS space(id INTEGER,
parent_id INTEGER,
name STRING,
object_type STRING);
INSERT INTO space VALUES
(1, NULL, 'universe', 'all'),
(10, 1, 'Milky Way galaxy','galaxy'),
(100, 10, 'Solar System', 'planetsystem'),
(1000, 100, 'Jupiter', 'planet'),
(2000, 100, 'Saturn', 'planet'),
(3000, 100, 'Uranus', 'planet'),
(4000, 100, 'Earth', 'planet'),
(5000, 100, 'Venus', 'planet'),
(6000, 100, 'Mars', 'planet'),
(1001, 1000, 'Ganymede', 'moon'),
(2001, 1000, 'Callisto', 'moon'),
(1001, 2000, 'Titan', 'moon'),
(2001, 2000, 'Rhea', 'moon'),
(1001, 3000, 'Titania', 'moon'),
(2001, 3000, 'Oberon', 'moon'),
(1001, 4000, 'Moon', 'moon'),
(1001, 6000, 'Phobos', 'moon'),
(2001, 6000, 'Deimos', 'moon');Storing hierarchical data in this form works best when data of all hierarchy levels have the same structure or even similar attributes. For example, in our case, these are "name" and "object_type", both of which are assigned STRING values.
If the data structures are very different, this type of storage may not be useful.
For example, to store numerical parameters for the objects, a new table would be recommended that would only contain IDs of the objects.
Query Hierarchical Data with JOIN
Let us find out, wich (to save space only 2 biggest moons for selected planets) moons has planets from the table:
Please create this table if you want to try the examples yourself: "Create space table".
SELECT
pls.name as parent_name,
mns.name AS moon_name
FROM space as pls
LEFT OUTER JOIN space as mns
ON mns.parent_id = pls.id
WHERE pls.object_type = 'planet';+-------------+-----------+
| PARENT_NAME | MOON_NAME |
+-------------+-----------+
| Jupiter | Ganymede |
| Jupiter | Callisto |
| Saturn | Titan |
| Saturn | Rhea |
| Uranus | Titania |
| Uranus | Oberon |
| Earth | Moon |
| Mars | Phobos |
| Mars | Deimos |
| Venus | null |
+-------------+-----------+The disadvantage of this approach is that the query has to be changed every time new levels are added. However, such a query can be extended to include new levels as often as required.
Using CONNECT BY to Query Hierarchical Data
CONNECT BY clause allows to design a query that allows flexible data retrieval from hierarchical data structures regardless of the number of levels.
CONNECT BY clause processes hierarchical data level by level and allows each level to access the data of the previous level by joining the table to itself. The CONNECT BY subclause of the FROM clause iterates to process the data.
For example, wie query the moons of a planets from our space table and create path column using SYS_CONNECT_BY_PATH() system function:
Please create this table if you want to try the examples yourself: "Create space table".
SELECT SYS_CONNECT_BY_PATH(name,'>') AS path, name
FROM space WHERE object_type = 'moon'
START WITH id = 1
CONNECT BY
parent_id = prior id
ORDER BY parent_id
;+----------------------------------------------------------+----------+
| PATH | NAME |
+----------------------------------------------------------+----------+
| >universe>Milky Way galaxy>Solar System>Jupiter>Ganymede | Ganymede |
| >universe>Milky Way galaxy>Solar System>Jupiter>Callisto | Callisto |
| >universe>Milky Way galaxy>Solar System>Saturn>Titan | Titan |
| >universe>Milky Way galaxy>Solar System>Saturn>Rhea | Rhea |
| >universe>Milky Way galaxy>Solar System>Uranus>Titania | Titania |
| >universe>Milky Way galaxy>Solar System>Uranus>Oberon | Oberon |
| >universe>Milky Way galaxy>Solar System>Earth>Moon | Moon |
| >universe>Milky Way galaxy>Solar System>Mars>Phobos | Phobos |
| >universe>Milky Way galaxy>Solar System>Mars>Deimos | Deimos |
+----------------------------------------------------------+----------+The disadvantage of this approach is, that CONNECT BY allows only self-joins and doesn’t allow the code to specify additional columns beyond those in the table itself, the pseudocolumn LEVEL and columns generated by the systemfunctions SYS_CONNECT_BY_PATH and CONNECT_BY_ROOT.
For syntax and other examples, see CONNECT BY in the Query Syntax section.
Non-Contiguous Hierarchies
So far we have only dealt with connected hierarchies. Disconnected hierarchies are those in which one of the levels can be missing: between the parent and the grandchild, the child level is also missing.
In the following example, the interstellar object Oumuamua does not belong to any planetary system in the table and "breaks" the hierarchy that is normal for all other objects.
Please create this table if you want to try the examples yourself: Create space2 table (please unfold).
CREATE TABLE IF NOT EXISTS space2(id INTEGER,
parent_id INTEGER,
name STRING,
object_type STRING);
INSERT INTO space2 VALUES
(1, NULL, 'universe', 'all'),
(10, 1, 'Milky Way galaxy', 'galaxy'),
(100, 10, 'Solar System', 'planetsystem'),
(200, 10, 'Alpha Centauri System', 'planetsystem'),
(3000, 300, 'Oumuamua (interstellar)','comet'),
(1000, 100, 'Sun', 'star'),
(2001, 200, 'Proxima Centauri', 'star'),
(2002, 200, 'Alpha Centauri A', 'star'),
(2003, 200, 'Alpha Centauri B', 'star'),
(2004, 200, 'Proxima Centauri b', 'planet'),
(2005, 200, 'Proxima Centauri c', 'planet'),
(2006, 200, 'Proxima Centauri d', 'planet'),
(300, 20, 'Solar System', 'planetsystem'),
(1001, 100, 'Jupiter', 'planet'),
(1002, 100, 'Saturn', 'planet'),
(1003, 100, 'Uranus', 'planet'),
(1004, 100, 'Earth', 'planet'),
(1005, 100, 'Venus', 'planet'),
(1006, 100, 'Mars', 'planet'),
(10001, 1001, 'Ganymede', 'moon'),
(10002, 1001, 'Callisto', 'moon'),
(10001, 1002, 'Titan', 'moon'),
(10002, 1002, 'Rhea', 'moon'),
(10001, 1003, 'Titania', 'moon'),
(10002, 1003, 'Oberon', 'moon'),
(10001, 1004, 'Moon', 'moon'),
(10001, 1006, 'Phobos', 'moon'),
(10002, 1006, 'Deimos', 'moon');SELECT SYS_CONNECT_BY_PATH(name,'>') AS path, name, object_type, level
FROM space2 WHERE object_type NOT IN ('planet', 'moon')
START WITH id = 1
CONNECT BY
parent_id = prior id
ORDER BY parent_id NULLS FIRST, id;
;+-------------------------------------------------------------------+-----------------------+--------------+
| PATH | NAME | OBJECT_TYPE |
+-------------------------------------------------------------------+-----------------------+--------------+
| >universe | universe | all |
| >universe>Milky Way galaxy | Milky Way galaxy | galaxy |
| >universe>Milky Way galaxy>Solar System | Solar System | planetsystem |
| >universe>Milky Way galaxy>Alpha Centauri System | Alpha Centauri System | planetsystem |
| >universe>Milky Way galaxy>Solar System>Sun | Sun | star |
| >universe>Milky Way galaxy>Alpha Centauri System>Proxima Centauri | Proxima Centauri | star |
| >universe>Milky Way galaxy>Alpha Centauri System>Alpha Centauri A | Alpha Centauri A | star |
| >universe>Milky Way galaxy>Alpha Centauri System>Alpha Centauri B | Alpha Centauri B | star |
+-------------------------------------------------------------------+-----------------------+--------------+As you can see, the interstellar object Oumuamua is not shown in the results, due to unknown parent_id = 300.
In order to correctly query hierarchical data using CONNECT BY, the data in the table must represent a single, contiguous hierarchy. Although it is possible to use CONNECT BY on a table that contains multiple (even independent) hierarchies, only one tree can be queried at a time and that tree must be contiguous.
To correct the space2 picture in this case, we add new category 'Interstellar Objects' at the second Level of the hierarchie:
INSERT INTO space2 VALUES
(300, 1, 'Interstellar Objects', 'interstellar');And then use the unchanged query:
SELECT SYS_CONNECT_BY_PATH(name,'>') AS path, name, object_type
FROM space2 WHERE object_type NOT IN ('planet', 'moon')
START WITH id = 1
CONNECT BY
parent_id = prior id
ORDER BY parent_id NULLS FIRST, id;
;+-------------------------------------------------------------------+-------------------------+--------------+
| PATH | NAME | OBJECT_TYPE |
+-------------------------------------------------------------------+-------------------------+--------------+
| >universe | universe | all |
| >universe>Milky Way galaxy | Milky Way galaxy | galaxy |
| >universe>Interstellar Objects | Interstellar Objects | interstellar |
| >universe>Milky Way galaxy>Solar System | Solar System | planetsystem |
| >universe>Milky Way galaxy>Alpha Centauri System | Alpha Centauri System | planetsystem |
| >universe>Milky Way galaxy>Solar System>Sun | Sun | star |
| >universe>Milky Way galaxy>Alpha Centauri System>Proxima Centauri | Proxima Centauri | star |
| >universe>Milky Way galaxy>Alpha Centauri System>Alpha Centauri A | Alpha Centauri A | star |
| >universe>Milky Way galaxy>Alpha Centauri System>Alpha Centauri B | Alpha Centauri B | star |
| >universe>Interstellar Objects>Oumuamua (interstellar) | Oumuamua (interstellar) | comet |
+-------------------------------------------------------------------+-------------------------+--------------+