多表连接查询
一些使用数据库的场景中,需要一个查询当中使用到多张表的数据,你可以通过 JOIN 语句将两张或多张表的数据组合在一起。
开始前准备
你需要确认在开始之前,已经完成了以下任务:
已完成单机部署 MatrixOne。
数据准备
- 
下载数据集:
https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/tpch/tpch-1g.zip - 
创建数据库和数据表:
create database d1; use d1; CREATE TABLE NATION ( N_NATIONKEY INTEGER NOT NULL, N_NAME CHAR(25) NOT NULL, N_REGIONKEY INTEGER NOT NULL, N_COMMENT VARCHAR(152), PRIMARY KEY (N_NATIONKEY)); CREATE TABLE REGION ( R_REGIONKEY INTEGER NOT NULL, R_NAME CHAR(25) NOT NULL, R_COMMENT VARCHAR(152), PRIMARY KEY (R_REGIONKEY)); CREATE TABLE PART ( P_PARTKEY INTEGER NOT NULL, P_NAME VARCHAR(55) NOT NULL, P_MFGR CHAR(25) NOT NULL, P_BRAND CHAR(10) NOT NULL, P_TYPE VARCHAR(25) NOT NULL, P_SIZE INTEGER NOT NULL, P_CONTAINER CHAR(10) NOT NULL, P_RETAILPRICE DECIMAL(15,2) NOT NULL, P_COMMENT VARCHAR(23) NOT NULL, PRIMARY KEY (P_PARTKEY)); CREATE TABLE SUPPLIER ( S_SUPPKEY INTEGER NOT NULL, S_NAME CHAR(25) NOT NULL, S_ADDRESS VARCHAR(40) NOT NULL, S_NATIONKEY INTEGER NOT NULL, S_PHONE CHAR(15) NOT NULL, S_ACCTBAL DECIMAL(15,2) NOT NULL, S_COMMENT VARCHAR(101) NOT NULL, PRIMARY KEY (S_SUPPKEY)); CREATE TABLE PARTSUPP ( PS_PARTKEY INTEGER NOT NULL, PS_SUPPKEY INTEGER NOT NULL, PS_AVAILQTY INTEGER NOT NULL, PS_SUPPLYCOST DECIMAL(15,2) NOT NULL, PS_COMMENT VARCHAR(199) NOT NULL, PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY)); CREATE TABLE CUSTOMER ( C_CUSTKEY INTEGER NOT NULL, C_NAME VARCHAR(25) NOT NULL, C_ADDRESS VARCHAR(40) NOT NULL, C_NATIONKEY INTEGER NOT NULL, C_PHONE CHAR(15) NOT NULL, C_ACCTBAL DECIMAL(15,2) NOT NULL, C_MKTSEGMENT CHAR(10) NOT NULL, C_COMMENT VARCHAR(117) NOT NULL, PRIMARY KEY (C_CUSTKEY)); CREATE TABLE ORDERS ( O_ORDERKEY BIGINT NOT NULL, O_CUSTKEY INTEGER NOT NULL, O_ORDERSTATUS CHAR(1) NOT NULL, O_TOTALPRICE DECIMAL(15,2) NOT NULL, O_ORDERDATE DATE NOT NULL, O_ORDERPRIORITY CHAR(15) NOT NULL, O_CLERK CHAR(15) NOT NULL, O_SHIPPRIORITY INTEGER NOT NULL, O_COMMENT VARCHAR(79) NOT NULL, PRIMARY KEY (O_ORDERKEY)); CREATE TABLE LINEITEM ( L_ORDERKEY BIGINT NOT NULL, L_PARTKEY INTEGER NOT NULL, L_SUPPKEY INTEGER NOT NULL, L_LINENUMBER INTEGER NOT NULL, L_QUANTITY DECIMAL(15,2) NOT NULL, L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL, L_DISCOUNT DECIMAL(15,2) NOT NULL, L_TAX DECIMAL(15,2) NOT NULL, L_RETURNFLAG CHAR(1) NOT NULL, L_LINESTATUS CHAR(1) NOT NULL, L_SHIPDATE DATE NOT NULL, L_COMMITDATE DATE NOT NULL, L_RECEIPTDATE DATE NOT NULL, L_SHIPINSTRUCT CHAR(25) NOT NULL, L_SHIPMODE CHAR(10) NOT NULL, L_COMMENT VARCHAR(44) NOT NULL, PRIMARY KEY (L_ORDERKEY, L_LINENUMBER)); - 
把数据导入到数据表中:
load data infile '/YOUR_TPCH_DATA_PATH/nation.tbl' into table NATION FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n'; load data infile '/YOUR_TPCH_DATA_PATH/region.tbl' into table REGION FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n'; load data infile '/YOUR_TPCH_DATA_PATH/part.tbl' into table PART FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n'; load data infile '/YOUR_TPCH_DATA_PATH/supplier.tbl' into table SUPPLIER FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n'; load data infile '/YOUR_TPCH_DATA_PATH/partsupp.tbl' into table PARTSUPP FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n'; load data infile '/YOUR_TPCH_DATA_PATH/orders.tbl' into table ORDERS FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n'; load data infile '/YOUR_TPCH_DATA_PATH/customer.tbl' into table CUSTOMER FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n'; load data infile '/YOUR_TPCH_DATA_PATH/lineitem.tbl' into table LINEITEM FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n'; 
现在你可以使用这些数据进行查询。
Join 类型
内连接
内连接的连接结果只返回匹配连接条件的行。
| 语法 | 图示 | 
|---|---|
| SELECT  | 
![]()  | 
内连接有两种书写方式,在结果上是完全等价的:
mysql> SELECT   
    l_orderkey,
    SUM(l_extendedprice * (1 - l_discount)) AS revenue,
    o_orderdate,
    o_shippriority
FROM
    CUSTOMER,
    ORDERS,
    LINEITEM
WHERE
    c_mktsegment = 'BUILDING'
        AND c_custkey = o_custkey
        AND l_orderkey = o_orderkey
        AND o_orderdate < DATE '1995-03-15'
        AND l_shipdate > DATE '1995-03-15'
GROUP BY l_orderkey , o_orderdate , o_shippriority
ORDER BY revenue DESC , o_orderdate
LIMIT 10;
+------------+---------------------+-------------+----------------+
| l_orderkey | revenue             | o_orderdate | o_shippriority |
+------------+---------------------+-------------+----------------+
|    2456423 | 406181.011100000000 | 1995-03-05  |              0 |
|    3459808 | 405838.698900000000 | 1995-03-04  |              0 |
|     492164 | 390324.061000000000 | 1995-02-19  |              0 |
|    1188320 | 384537.935900000000 | 1995-03-09  |              0 |
|    2435712 | 378673.055800000000 | 1995-02-26  |              0 |
|    4878020 | 378376.795200000000 | 1995-03-12  |              0 |
|    5521732 | 375153.921500000000 | 1995-03-13  |              0 |
|    2628192 | 373133.309400000000 | 1995-02-22  |              0 |
|     993600 | 371407.459500000000 | 1995-03-05  |              0 |
|    2300070 | 367371.145200000000 | 1995-03-13  |              0 |
+------------+---------------------+-------------+----------------+
10 rows in set (0.20 sec)
写成 Join 的形式,语法如下:
mysql> SELECT   
    l_orderkey,
    SUM(l_extendedprice * (1 - l_discount)) AS revenue,
    o_orderdate,
    o_shippriority
FROM
    CUSTOMER
    join ORDERS on c_custkey = o_custkey
    join LINEITEM on l_orderkey = o_orderkey
WHERE
    c_mktsegment = 'BUILDING'
        AND o_orderdate < DATE '1995-03-15'
        AND l_shipdate > DATE '1995-03-15'
GROUP BY l_orderkey , o_orderdate , o_shippriority
ORDER BY revenue DESC , o_orderdate
LIMIT 10;
+------------+---------------------+-------------+----------------+
| l_orderkey | revenue             | o_orderdate | o_shippriority |
+------------+---------------------+-------------+----------------+
|    2456423 | 406181.011100000000 | 1995-03-05  |              0 |
|    3459808 | 405838.698900000000 | 1995-03-04  |              0 |
|     492164 | 390324.061000000000 | 1995-02-19  |              0 |
|    1188320 | 384537.935900000000 | 1995-03-09  |              0 |
|    2435712 | 378673.055800000000 | 1995-02-26  |              0 |
|    4878020 | 378376.795200000000 | 1995-03-12  |              0 |
|    5521732 | 375153.921500000000 | 1995-03-13  |              0 |
|    2628192 | 373133.309400000000 | 1995-02-22  |              0 |
|     993600 | 371407.459500000000 | 1995-03-05  |              0 |
|    2300070 | 367371.145200000000 | 1995-03-13  |              0 |
+------------+---------------------+-------------+----------------+
10 rows in set (0.20 sec)
外连接
外连接又分为左连接、右连接,两者之间是可以实现等价语义的:
LEFT JOIN
左外连接会返回左表中的所有数据行,以及右表当中能够匹配连接条件的值,如果在右表当中没有找到能够匹配的行,则使用 NULL 填充。
| 语法 | 图示 | 
|---|---|
| SELECT  | 
![]()  | 
| SELECT  | 
![]()  | 
RIGHT JOIN
右外连接返回右表中的所有记录,以及左表当中能够匹配连接条件的值,没有匹配的值则使用 NULL 填充。
| 语法 | 图示 | 
|---|---|
| SELECT  | 
![]()  | 
| SELECT  | 
![]()  | 
语句示例如下:
SELECT
        c_custkey, COUNT(o_orderkey) AS c_count
    FROM
        CUSTOMER
    LEFT OUTER JOIN ORDERS ON (c_custkey = o_custkey
        AND o_comment NOT LIKE '%special%requests%')
    GROUP BY c_custkey limit 10;
+-----------+---------+
| c_custkey | c_count |
+-----------+---------+
|    147457 |      16 |
|    147458 |       7 |
|    147459 |       0 |
|    147460 |      16 |
|    147461 |       7 |
|    147462 |       0 |
|    147463 |      14 |
|    147464 |      11 |
|    147465 |       0 |
|    147466 |      17 |
+-----------+---------+
10 rows in set (0.93 sec)
或者:
SELECT
        c_custkey, COUNT(o_orderkey) AS c_count
    FROM
        ORDERS
    RIGHT OUTER JOIN CUSTOMER ON (c_custkey = o_custkey
        AND o_comment NOT LIKE '%special%requests%')
    GROUP BY c_custkey limit 10;
+-----------+---------+
| c_custkey | c_count |
+-----------+---------+
|    147457 |      16 |
|    147458 |       7 |
|    147459 |       0 |
|    147460 |      16 |
|    147461 |       7 |
|    147462 |       0 |
|    147463 |      14 |
|    147464 |      11 |
|    147465 |       0 |
|    147466 |      17 |
+-----------+---------+
10 rows in set (0.93 sec)
全连接
全连接是左右外连接的并集。连接表包含被连接的表的所有记录,如果缺少匹配的记录,即以 NULL 填充。
SELECT
        c_custkey, COUNT(o_orderkey) AS c_count
    FROM
        CUSTOMER
    FULL JOIN ORDERS ON (c_custkey = o_custkey
        AND o_comment NOT LIKE '%special%requests%')
    GROUP BY c_custkey limit 10;
+-----------+---------+
| c_custkey | c_count |
+-----------+---------+
|         1 |       6 |
|         2 |       7 |
|         4 |      20 |
|         5 |       4 |
|         7 |      16 |
|         8 |      13 |
|        10 |      20 |
|        11 |      13 |
|        13 |      18 |
|        14 |       9 |
+-----------+---------+
10 rows in set (0.77 sec)
全连接同样可以通过改写的方式获得相同的语义:
SELECT
        c_custkey, COUNT(o_orderkey) AS c_count
    FROM
        CUSTOMER
    LEFT OUTER JOIN ORDERS ON (c_custkey = o_custkey
        AND o_comment NOT LIKE '%special%requests%')
    GROUP BY c_custkey
UNION
SELECT
        c_custkey, COUNT(o_orderkey) AS c_count
    FROM
        CUSTOMER
    LEFT OUTER JOIN ORDERS ON (c_custkey = o_custkey
        AND o_comment NOT LIKE '%special%requests%')
    WHERE c_custkey IS NULL
    GROUP BY c_custkey
limit 10;
+-----------+---------+
| c_custkey | c_count |
+-----------+---------+
|    147457 |      16 |
|    147458 |       7 |
|    147459 |       0 |
|    147460 |      16 |
|    147461 |       7 |
|    147462 |       0 |
|    147463 |      14 |
|    147464 |      11 |
|    147465 |       0 |
|    147466 |      17 |
+-----------+---------+
10 rows in set (1.09 sec)
隐式连接
在 SQL 语句当中,除了使用 JOIN,也可以通过 FROM t1, t2 子句来连接两张或多张表,通过 WHERE t1.id = t2.id 子句来指定连接的条件。




