Skip to content

Subquery

This document introduces you to the subquery function of MO Intelligence.

Overview

A subquery is a SQL expression nested in another query. With the help of a subquery, the query results of another query can be used in one query.

Generally speaking, from the SQL statement structure, subquery statements generally have the following forms:

  • Scalar Subquery, such as SELECT (SELECT s1 FROM t2) FROM t1.
  • Derived Tables, such as SELECT t1.s1 FROM (SELECT s1 FROM t2) t1.
  • Existential Test, such as WHERE NOT EXISTS(SELECT ... FROM t2), WHERE t1.a IN (SELECT ... FROM t2).
  • Quantified Comparison, such as WHERE t1.a = ANY(SELECT ... FROM t2).
  • As a subquery for the operand of the comparison operator, such as WHERE t1.a > (SELECT ... FROM t2).

For subquery SQL statements, see SUBQUERY.

In addition, in terms of SQL statement execution, subquery statements generally have the following two forms:

  1. Correlated Subquery: Inner queries and outer queries are not independent of each other in database nested queries, and inner queries also rely on outer queries. The execution order is:

  2. First query a record from the outer query.

  3. Then put the query records into the records that meet the conditions in the inner layer query, and then put them into the outer layer for query.

  4. Repeat the above steps

    For example: select * from tableA where tableA.cloumn < (select column from tableB where tableA.id = tableB.id))

  5. Self-contained Subquery: Inner queries in database nested queries are completely independent of outer queries. The execution order is:

  6. Execute the inner layer query first.

  7. Get the results of the inner layer query and bring them into the outer layer, and then execute the outer layer query.

    For example: select * from tableA where tableA.column = (select tableB.column from tableB )

The role of subquery:

  • Subqueries allow structured queries so that each part of a query statement can be separated.
  • Subqueries provide another way to perform some operations that require complex JOIN and UNION.

We will give you a simple example to help you understandassociated subquery andunrelated subquery.

Example

Prepare before starting

You need to confirm that before you start, the following tasks have been completed:

Data preparation

  1. Download the dataset:

    https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/tpch/tpch-1g.zip
    
  2. Create database and data table:

    create database d1;
    use d1;
    CREATE TABLE NATION ( N_NATIONKEY INTEGER NOT NULL,
                       N_NAME CHAR(25) NOT NULL,
                       N_REGIONKEY INTEGER NOT NULL,
                       N_COMMENT VARCHAR(152),
                       PRIMARY KEY (N_NATIONKEY));
    
    CREATE TABLE REGION ( R_REGIONKEY INTEGER NOT NULL,
                       R_NAME CHAR(25) NOT NULL,
                       R_COMMENT VARCHAR(152),
                       PRIMARY KEY (R_REGIONKEY));
    
    CREATE TABLE PART ( P_PARTKEY INTEGER NOT NULL,
                     P_NAME VARCHAR(55) NOT NULL,
                     P_MFGR CHAR(25) NOT NULL,
                     P_BRAND CHAR(10) NOT NULL,
                     P_TYPE VARCHAR(25) NOT NULL,
                     P_SIZE INTEGER NOT NULL,
                     P_CONTAINER CHAR(10) NOT NULL,
                     P_RETAILPRICE DECIMAL(15,2) NOT NULL,
                     P_COMMENT VARCHAR(23) NOT NULL,
                     PRIMARY KEY (P_PARTKEY));
    
    CREATE TABLE SUPPLIER ( S_SUPPKEY INTEGER NOT NULL,
                        S_NAME CHAR(25) NOT NULL,
                        S_ADDRESS VARCHAR(40) NOT NULL,
                        S_NATIONKEY INTEGER NOT NULL,
                        S_PHONE CHAR(15) NOT NULL,
                        S_ACCTBAL DECIMAL(15,2) NOT NULL,
                        S_COMMENT VARCHAR(101) NOT NULL,
                        PRIMARY KEY (S_SUPPKEY));
    
    CREATE TABLE PARTSUPP ( PS_PARTKEY INTEGER NOT NULL,
                        PS_SUPPKEY INTEGER NOT NULL,
                        PS_AVAILQTY INTEGER NOT NULL,
                        PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
                        PS_COMMENT VARCHAR(199) NOT NULL,
                        PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY));
    
    CREATE TABLE CUSTOMER ( C_CUSTKEY INTEGER NOT NULL,
                        C_NAME VARCHAR(25) NOT NULL,
                        C_ADDRESS VARCHAR(40) NOT NULL,
                        C_NATIONKEY INTEGER NOT NULL,
                        C_PHONE CHAR(15) NOT NULL,
                        C_ACCTBAL DECIMAL(15,2) NOT NULL,
                        C_MKTSEGMENT CHAR(10) NOT NULL,
                        C_COMMENT VARCHAR(117) NOT NULL,
                        PRIMARY KEY (C_CUSTKEY));
    
    CREATE TABLE ORDERS ( O_ORDERKEY BIGINT NOT NULL,
                       O_CUSTKEY INTEGER NOT NULL,
                       O_ORDERSTATUS CHAR(1) NOT NULL,
                       O_TOTALPRICE DECIMAL(15,2) NOT NULL,
                       O_ORDERDATE DATE NOT NULL,
                       O_ORDERPRIORITY CHAR(15) NOT NULL,
                       O_CLERK CHAR(15) NOT NULL,
                       O_SHIPPRIORITY INTEGER NOT NULL,
                       O_COMMENT VARCHAR(79) NOT NULL,
                       PRIMARY KEY (O_ORDERKEY));
    
    CREATE TABLE LINEITEM ( L_ORDERKEY BIGINT NOT NULL,
                        L_PARTKEY INTEGER NOT NULL,
                        L_SUPPKEY INTEGER NOT NULL,
                        L_LINENUMBER INTEGER NOT NULL,L_QUANTITY    DECIMAL(15,2) NOT NULL,
                        L_EXTENDEDPRICE  DECIMAL(15,2) NOT NULL,
                        L_DISCOUNT    DECIMAL(15,2) NOT NULL,
                        L_TAX         DECIMAL(15,2) NOT NULL,
                        L_RETURNFLAG  CHAR(1) NOT NULL,
                        L_LINESTATUS  CHAR(1) NOT NULL,
                        L_SHIPDATE    DATE NOT NULL,
                        L_COMMITDATE  DATE NOT NULL,
                        L_RECEIPTDATE DATE NOT NULL,
                        L_SHIPINSTRUCT CHAR(25) NOT NULL,
                        L_SHIPMODE     CHAR(10) NOT NULL,
                        L_COMMENT      VARCHAR(44) NOT NULL,
                        PRIMARY KEY (L_ORDERKEY, L_LINENUMBER));
    
  3. Import the data into the data table:

    load data local infile '/YOUR_TPCH_DATA_PATH/nation.tbl' into table NATION FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    
    load data local infile '/YOUR_TPCH_DATA_PATH/region.tbl' into table REGION FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    
    load data local infile '/YOUR_TPCH_DATA_PATH/part.tbl' into table PART FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    
    load data local infile '/YOUR_TPCH_DATA_PATH/supplier.tbl' into table SUPPLIER FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    
    load data local infile '/YOUR_TPCH_DATA_PATH/partsupp.tbl' into table PARTSUPP FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    
    load data local infile '/YOUR_TPCH_DATA_PATH/orders.tbl' into table ORDERS FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    
    load data local infile '/YOUR_TPCH_DATA_PATH/customer.tbl' into table CUSTOMER FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    
    load data local infile '/YOUR_TPCH_DATA_PATH/lineitem.tbl' into table LINEITEM FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    

Now you can use this data to query.

Uncorrelated subqueries

For uncorrelated subqueries that use subqueries as operands of comparison operators (>/ >=/ < / <= / = / !=), the inner subquery only needs to be queried once, and MO Intelligence will rewrite the inner subquery into a constant when generating the execution plan.

mysql> select p.p_name from (select * from part where p_brand='Brand#21' and p_retailprice between 1100 and 1200) p, partsupp ps where p.p_partkey=ps.ps_partkey and p.p_name like '%pink%' limit 10;

When MO Intelligence executes the above query, it will first execute an inner subquery:

mysql> select * from part where p_brand='Brand#21' and p_retailprice between 1100 and 1200;

The result is:

+-----------------------------------+
| p_name                            |
+-----------------------------------+
| olive chartreuse smoke pink tan   |
| olive chartreuse smoke pink tan   |
| olive chartreuse smoke pink tan   |
| olive chartreuse smoke pink tan   |
| pink sienna dark bisque turquoise |
| pink sienna dark bisque turquoise |
| pink sienna dark bisque turquoise |
| pink sienna dark bisque turquoise |
| honeydew orchid cyan magenta pink |
| honeydew orchid cyan magenta pink |
+-----------------------------------+
10 rows in set (0.06 sec)

For the non-correlated column subqueries in the existence test and set comparison cases, MO Intelligence will rewrite and replace them with equivalents to obtain better execution performance.

Correlated subqueries

For correlated subqueries, since the inner subquery references the columns of the outer query, the subquery needs to be executed once for each row obtained by the outer query. That is to say, if the outer query obtains 10 million results, the subquery will also be executed 10 million times, which will cause the query to consume more time and resources.

Therefore, during the processing, MO Intelligence will try to disassociate the correlated subqueries to improve the query efficiency from the execution plan level.

mysql> select p_name from part where P_PARTKEY in (select PS_PARTKEY from PARTSUPP where PS_SUPPLYCOST>=500) and p_name like '%pink%' limit 10;

When processing this SQL statement, MO Intelligence will rewrite it into an equivalent JOIN query:

select p_name from part join partsupp on P_PARTKEY=PS_PARTKEY where PS_SUPPLYCOST>=500 and p_name like '%pink%' limit 10;

Operation results:

+------------------------------------+
| p_name                             |
+------------------------------------+
| papaya red almond hot pink         |
| turquoise hot smoke green pink     |
| purple cornsilk red pink floral    |
| pink cyan purple white burnished   |
| sandy dark pink indian cream       |
| powder cornsilk chiffon slate pink |
| rosy light black pink orange       |
| pink white goldenrod ivory steel   |
| cornsilk dim pink tan sienna       |
| lavender navajo steel sandy pink   |
+------------------------------------+
10 rows in set (0.23 sec)

As a best practice, in actual development, to improve computing efficiency, try to choose equivalent calculation methods for queries and avoid using associated subqueries for queries.