Skip to content

Multi-table join query

In some scenarios where databases are used, you need to use multiple tables in a query. You can combine the data of two or more tables together through the JOIN statement.

Prepare before starting

You need to confirm that before you start, the following tasks have been completed:

Data preparation

  1. Download the dataset:

    https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/tpch/tpch-1g.zip
    
  2. Create database and data table:

    create database d1;
    use d1;
    CREATE TABLE NATION ( N_NATIONKEY INTEGER NOT NULL,
                       N_NAME CHAR(25) NOT NULL,
                       N_REGIONKEY INTEGER NOT NULL,
                       N_COMMENT VARCHAR(152),
                       PRIMARY KEY (N_NATIONKEY));
    
    CREATE TABLE REGION ( R_REGIONKEY INTEGER NOT NULL,
                       R_NAME CHAR(25) NOT NULL,
                       R_COMMENT VARCHAR(152),
                       PRIMARY KEY (R_REGIONKEY));
    
    CREATE TABLE PART ( P_PARTKEY INTEGER NOT NULL,
                     P_NAME VARCHAR(55) NOT NULL,
                     P_MFGR CHAR(25) NOT NULL,
                     P_BRAND CHAR(10) NOT NULL,
                     P_TYPE VARCHAR(25) NOT NULL,
                     P_SIZE INTEGER NOT NULL,
                     P_CONTAINER CHAR(10) NOT NULL,
                     P_RETAILPRICE DECIMAL(15,2) NOT NULL,
                     P_COMMENT VARCHAR(23) NOT NULL,
                     PRIMARY KEY (P_PARTKEY));
    
    CREATE TABLE SUPPLIER ( S_SUPPKEY INTEGER NOT NULL,
                        S_NAME CHAR(25) NOT NULL,
                        S_ADDRESS VARCHAR(40) NOT NULL,
                        S_NATIONKEY INTEGER NOT NULL,
                        S_PHONE CHAR(15) NOT NULL,
                        S_ACCTBAL DECIMAL(15,2) NOT NULL,
                        S_COMMENT VARCHAR(101) NOT NULL,
                        PRIMARY KEY (S_SUPPKEY));
    
    CREATE TABLE PARTSUPP ( PS_PARTKEY INTEGER NOT NULL,
                        PS_SUPPKEY INTEGER NOT NULL,
                        PS_AVAILQTY INTEGER NOT NULL,
                        PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
                        PS_COMMENT VARCHAR(199) NOT NULL,
                        PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY));
    
    CREATE TABLE CUSTOMER ( C_CUSTKEY INTEGER NOT NULL,
                        C_NAME VARCHAR(25) NOT NULL,
                        C_ADDRESS VARCHAR(40) NOT NULL,
                        C_NATIONKEY INTEGER NOT NULL,
                        C_PHONE CHAR(15) NOT NULL,
                        C_ACCTBAL DECIMAL(15,2) NOT NULL,
                        C_MKTSEGMENT CHAR(10) NOT NULL,
                        C_COMMENT VARCHAR(117) NOT NULL,
                        PRIMARY KEY (C_CUSTKEY));
    
    CREATE TABLE ORDERS ( O_ORDERKEY BIGINT NOT NULL,
                       O_CUSTKEY INTEGER NOT NULL,
                       O_ORDERSTATUS CHAR(1) NOT NULL,
                       O_TOTALPRICE DECIMAL(15,2) NOT NULL,
                       O_ORDERDATE DATE NOT NULL,
                       O_ORDERPRIORITY CHAR(15) NOT NULL,
                       O_CLERK CHAR(15) NOT NULL,
                       O_SHIPPRIORITY INTEGER NOT NULL,
                       O_COMMENT VARCHAR(79) NOT NULL,
                       PRIMARY KEY (O_ORDERKEY));
    
    CREATE TABLE LINEITEM ( L_ORDERKEY BIGINT NOT NULL,
                        L_PARTKEY INTEGER NOT NULL,
                        L_SUPPKEY INTEGER NOT NULL,
                        L_LINENUMBER INTEGER NOT NULL,
                        L_QUANTITY DECIMAL(15,2) NOT NULL,
                        L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
                        L_DISCOUNT DECIMAL(15,2) NOT NULL,
                        L_TAX DECIMAL(15,2) NOT NULL,
                        L_RETURNFLAG CHAR(1) NOT NULL,
                        L_LINESTATUS CHAR(1) NOT NULL,
                        L_SHIPDATE DATE NOT NULL,
                        L_COMMITDATE DATE NOT NULL,
                        L_RECEIPTDATE DATE NOT NULL,
                        L_SHIPINSTRUCT CHAR(25) NOT NULL,
                        L_SHIPMODE CHAR(10) NOT NULL,
                        L_COMMENT VARCHAR(44) NOT NULL,
                        PRIMARY KEY (L_ORDERKEY, L_LINENUMBER));
    
  3. Import the data into the data table:

    load data local infile '/YOUR_TPCH_DATA_PATH/nation.tbl' into table NATION FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    
    load data local infile '/YOUR_TPCH_DATA_PATH/region.tbl' into table REGION FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';load data local infile '/YOUR_TPCH_DATA_PATH/part.tbl' into table PART FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    
    load data local infile '/YOUR_TPCH_DATA_PATH/supplier.tbl' into table SUPPLIER FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    
    load data local infile '/YOUR_TPCH_DATA_PATH/partsupp.tbl' into table PARTSUPP FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    
    load data local infile '/YOUR_TPCH_DATA_PATH/orders.tbl' into table ORDERS FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    
    load data local infile '/YOUR_TPCH_DATA_PATH/customer.tbl' into table CUSTOMER FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    
    load data local infile '/YOUR_TPCH_DATA_PATH/lineitem.tbl' into table LINEITEM FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    

Now you can use this data to query.

Join Types

Inner Join

The join result of an inner join only returns rows that match the join condition.

Syntax Diagram
SELECT FROM TableA A INNER JOIN TableB B ON A.Key=B.Key innerjoin

There are two ways to write an inner join, and the results are completely equivalent:

mysql> SELECT   
    l_orderkey,
    SUM(l_extendedprice * (1 - l_discount)) AS revenue,
    o_orderdate,
    o_shippriority
FROM
    CUSTOMER,
    ORDERS,
    LINEITEM
WHERE
    c_mktsegment = 'BUILDING'
        AND c_custkey = o_custkey
        AND l_orderkey = o_orderkey
        AND o_orderdate < DATE '1995-03-15'
        AND l_shipdate > DATE '1995-03-15'
GROUP BY l_orderkey , o_orderdate , o_shippriority
ORDER BY revenue DESC , o_orderdate
LIMIT 10;
+------------+---------------------+-------------+----------------+
| l_orderkey | revenue             | o_orderdate | o_shippriority |
+------------+---------------------+-------------+----------------+
|    2456423 | 406181.011100000000 | 1995-03-05  |              0 |
|    3459808 | 405838.698900000000 | 1995-03-04  |              0 |
|     492164 | 390324.061000000000 | 1995-02-19  |              0 |
|    1188320 | 384537.935900000000 | 1995-03-09  |              0 |
|    2435712 | 378673.055800000000 | 1995-02-26  |              0 |
|    4878020 | 378376.795200000000 | 1995-03-12  |              0 |
|    5521732 | 375153.921500000000 | 1995-03-13  |              0 |
|    2628192 | 373133.309400000000 | 1995-02-22  |              0 |
|     993600 | 371407.459500000000 | 1995-03-05  |              0 |
|    2300070 | 367371.145200000000 | 1995-03-13  |              0 |
+------------+---------------------+-------------+----------------+
10 rows in set (0.20 sec)

Written in the form of Join, the syntax is as follows:

mysql> SELECT   
    l_orderkey,
    SUM(l_extendedprice * (1 - l_discount)) AS revenue,
    o_orderdate,
    o_shippriority
FROM
    CUSTOMER
    join ORDERS on c_custkey = o_custkey
    join LINEITEM on l_orderkey = o_orderkey
WHERE
    c_mktsegment = 'BUILDING'
        AND o_orderdate < DATE '1995-03-15'
        AND l_shipdate > DATE '1995-03-15'
GROUP BY l_orderkey , o_orderdate , o_shippriority
ORDER BY revenue DESC , o_orderdate
LIMIT 10;
+------------+---------------------+-------------+----------------+
| l_orderkey | revenue             | o_orderdate | o_shippriority |
+------------+---------------------+-------------+----------------+
|    2456423 | 406181.011100000000 | 1995-03-05  |              0 |
|    3459808 | 405838.698900000000 | 1995-03-04  |              0 |
|     492164 | 390324.061000000000 | 1995-02-19  |              0 |
|    1188320 | 384537.935900000000 | 1995-03-09  |              0 |
|    2435712 | 378673.055800000000 | 1995-02-26  |              0 |
|    4878020 | 378376.795200000000 | 1995-03-12  |              0 |
|    5521732 | 375153.921500000000 | 1995-03-13  |              0 |
|    2628192 | 373133.309400000000 | 1995-02-22  |              0 |
|     993600 | 371407.459500000000 | 1995-03-05  |              0 |
|    2300070 | 367371.145200000000 | 1995-03-13  |              0 |
+------------+---------------------+-------------+----------------+
10 rows in set (0.20 sec)

Outer join

Outer joins are divided intoleft join andright join, and the two can achieve equivalent semantics:

  • LEFT JOIN

Left outer join returns all data rows in the left table and values ​​in the right table that match the join condition. If no matching row is found in the right table, NULL is used to fill it. |Grammar | Illustration | |---|---| |SELECT FROM TableA A LEFT JOIN TableB B ON A.Key=B.Key|leftjoin| |SELECT FROM TableA A LEFT JOIN TableB B ON A.Key=B.Key WHERE B.Key IS NULL|leftjoinwhere|

  • RIGHT JOINThe right outer join returns all records in the right table, as well as the values ​​in the left table that can match the join conditions. If there is no matching value, use NULL to fill it.
Grammar Illustration
SELECT FROM TableA A RIGHT JOIN TableB B ON A.Key=B.Key leftjoinwhere
SELECT FROM TableA A RIGHT JOIN TableB B ON A.Key=B.Key WHERE A.Key IS NULL leftjoinwhere

The statement examples are as follows:

SELECT
        c_custkey, COUNT(o_orderkey) AS c_count
    FROM
        CUSTOMER
    LEFT OUTER JOIN ORDERS ON (c_custkey = o_custkey
        AND o_comment NOT LIKE '%special%requests%')
    GROUP BY c_custkey limit 10;

+------------------------+
| c_custkey | c_count |
+------------------------+
| 147457 | 16 |
| 147458 | 7 |
| 147459 | 0 |
| 147460 | 16 |
| 147461 | 7 |
| 147462 | 0 |
| 147463 | 14 |
| 147464 | 11 |
| 147465 | 0 |
| 147466 | 17 |
+------------------------+
10 rows in set (0.93 sec)

or:

SELECT
        c_custkey, COUNT(o_orderkey) AS c_count
    FROM
        ORDERS
    RIGHT OUTER JOIN CUSTOMER ON (c_custkey = o_custkey
        AND o_comment NOT LIKE '%special%requests%')
    GROUP BY c_custkey limit 10;

+------------------------+
| c_custkey | c_count |
+------------------------+
| 147457 | 16 |
| 147458 | 7 |
| 147459 | 0 |
| 147460 | 16 |
| 147461 | 7 |
| 147462 | 0 |
| 147463 | 14 |
| 147464 | 11 |
| 147465 | 0 |
| 147466 | 17 |
+------------------------+
10 rows in set (0.93 sec)

Full connection

Full connections are union of left and right connections. The join table contains all records of the connected table, which is populated with NULL if matching records are missing.

SELECT
        c_custkey, COUNT(o_orderkey) AS c_count
    FROM
        CUSTOMER
    FULL JOIN ORDERS ON (c_custkey = o_custkey
        AND o_comment NOT LIKE '%special%requests%')
    GROUP BY c_custkey limit 10;

+------------------------+
| c_custkey | c_count |
+------------------------+
| 1 | 6 |
| 2 | 7 |
| 4 | 20 |
| 5 | 4 |
| 7 | 16 |
| 8 | 13 |
| 10 | 20 |
| 11 | 13 |
| 13 | 18 |
| 14 | 9 |
+------------------------+
10 rows in set (0.77 sec)

Full connections can also obtain the same semantics through rewriting:

SELECT
        c_custkey, COUNT(o_orderkey) AS c_count
    FROM
        CUSTOMER
    LEFT OUTER JOIN ORDERS ON (c_custkey = o_custkey
        AND o_comment NOT LIKE '%special%requests%')
    GROUP BY c_custkey
UNION
SELECT
        c_custkey, COUNT(o_orderkey) AS c_count
    FROM
        CUSTOMER
    LEFT OUTER JOIN ORDERS ON (c_custkey = o_custkey
        AND o_comment NOT LIKE '%special%requests%')
    WHERE c_custkey IS NULL
    GROUP BY c_custkey
limit 10;

+------------------------+
| c_custkey | c_count |
+------------------------+
| 147457 | 16 |
| 147458 | 7 |
| 147459 | 0 |
| 147460 | 16 |
| 147461 | 7 |
| 147462 | 0 |
| 147463 | 14 |
| 147464 | 11 |
| 147465 | 0 |
| 147466 | 17 |
+------------------------+
10 rows in set (1.09 sec)

Implicit connection

In SQL statements, in addition to using JOIN, two or more tables can be joined through the FROM t1, t2 clause, and the connection conditions can be specified through the WHERE t1.id = t2.id clause.