DATA BRANCH DIFF
语法说明
DATA BRANCH DIFF 语句用于比较两个表之间的数据差异。该功能类似于 Git 的 diff 命令,可以显示两个数据分支之间的插入、删除和更新操作。
系统会自动识别两个表之间的最近公共祖先(LCA,Lowest Common Ancestor),并基于此计算差异。差异结果包括:
- INSERT:目标表中存在但基准表中不存在的行
- DELETE:基准表中存在但目标表中不存在的行
- UPDATE:两个表中主键相同但其他列值不同的行
语法结构
DATA BRANCH DIFF target_table [{ SNAPSHOT = 'snapshot_name' }]
AGAINST base_table [{ SNAPSHOT = 'snapshot_name' }]
[COLUMNS ( col1 [, col2 ...] )]
[OUTPUT output_option]
COLUMNS 投影(v3.0.10 起支持)
COLUMNS (col1, col2, ...) 将差异行输出里的值列限定到列表中的子集。主键列始终会出现在输出中(用来定位行);不在列表中的非主键列会被忽略。在宽表上只关心少量列的差异时,用它可以显著减少返回数据量。
输出选项
output_option:
COUNT -- 仅返回差异行数
| LIMIT number -- 限制返回的差异行数
| FILE 'directory_path' -- 将差异导出为 SQL 文件
| AS table_name -- 将差异保存到表中(暂不支持)
| SUMMARY -- 返回聚合的 INSERT / DELETE / UPDATE 计数
语法释义
参数说明
| 参数 | 说明 |
|---|---|
target_table |
目标表(要比较的表) |
base_table |
基准表(作为比较基准的表) |
SNAPSHOT = 'snapshot_name' |
可选参数,指定使用某个快照时刻的数据进行比较 |
OUTPUT COUNT |
仅返回差异的行数统计 |
OUTPUT LIMIT number |
限制返回的差异行数 |
OUTPUT FILE 'path' |
将差异导出为 SQL 文件到指定目录,支持本地路径或 Stage 路径(如 stage://stage_name/) |
OUTPUT SUMMARY |
返回聚合的 INSERT / DELETE / UPDATE 计数,替代逐行差异 |
COLUMNS (col1, col2, ...) |
v3.0.10 起支持。把差异输出中的非主键列限定到列表子集,主键列始终返回 |
输出列说明
默认输出包含以下列:
| 列名 | 说明 |
|---|---|
diff target against base |
显示比较的表名 |
flag |
差异类型:INSERT、DELETE 或 UPDATE |
| 其他列 | 表的所有可见列 |
使用说明
LCA(最近公共祖先)
系统会自动检测两个表之间的分支关系:
- 无 LCA:两个表没有共同的祖先,直接比较所有数据
- 有 LCA:两个表有共同的祖先,基于祖先计算增量差异
- 自身为 LCA:一个表是另一个表的祖先
LCA 解析和变更范围计算支持任意 DAG 深度的分支,不仅限于直接的父子关系。多叉树中具有复杂分支历史的结构可以在任意节点对(兄弟节点、表兄弟节点、跨子树节点)之间产生正确的差异。
支持的表类型
- 带主键的表(推荐)
- 复合主键的表
- 无主键的表(使用隐藏的 fake primary key)
示例
示例 1:基本差异比较
比较两个没有共同祖先的表:
-- Expected-Rows: 0
CREATE DATABASE test;
-- Expected-Rows: 0
USE test;
-- Expected-Rows: 0
CREATE TABLE test.t1 (a INT PRIMARY KEY, b VARCHAR(10));
-- Expected-Rows: 0
INSERT INTO test.t1 VALUES (1, '1'), (2, '2'), (3, '3');
-- Expected-Rows: 0
CREATE TABLE test.t2 (a INT PRIMARY KEY, b VARCHAR(10));
-- Expected-Rows: 0
INSERT INTO test.t2 VALUES (1, '1'), (2, '2'), (4, '4');
-- Expected-Rows: 2
DATA BRANCH DIFF test.t2 AGAINST test.t1;
+-------------------+--------+------+------+
| diff t2 against t1 | flag | a | b |
+-------------------+--------+------+------+
| t2 | INSERT | 4 | 4 |
| t1 | INSERT | 3 | 3 |
+-------------------+--------+------+------+
-- Expected-Rows: 2
DATA BRANCH DIFF test.t1 AGAINST test.t2;
+-------------------+--------+------+------+
| diff t1 against t2 | flag | a | b |
+-------------------+--------+------+------+
| t1 | INSERT | 3 | 3 |
| t2 | INSERT | 4 | 4 |
+-------------------+--------+------+------+
-- Expected-Rows: 0
DROP TABLE test.t1;
-- Expected-Rows: 0
DROP TABLE test.t2;
示例 2:比较分支表(有共同祖先)
-- Expected-Rows: 0
CREATE TABLE test.t0 (a INT PRIMARY KEY, b INT);
-- Expected-Rows: 0
INSERT INTO test.t0 VALUES (1, 1), (2, 2), (3, 3);
-- 从 t0 创建两个分支
-- Expected-Rows: 0
DATA BRANCH CREATE TABLE test.t1 FROM test.t0;
-- Expected-Rows: 0
INSERT INTO test.t1 VALUES (4, 4);
-- Expected-Rows: 0
DATA BRANCH CREATE TABLE test.t2 FROM test.t0;
-- Expected-Rows: 0
INSERT INTO test.t2 VALUES (5, 5);
-- Expected-Rows: 2
DATA BRANCH DIFF test.t2 AGAINST test.t1;
+-------------------+--------+------+------+
| diff t2 against t1 | flag | a | b |
+-------------------+--------+------+------+
| t2 | INSERT | 5 | 5 |
| t1 | INSERT | 4 | 4 |
+-------------------+--------+------+------+
-- Expected-Rows: 0
DROP TABLE test.t0;
-- Expected-Rows: 0
DROP TABLE test.t1;
-- Expected-Rows: 0
DROP TABLE test.t2;
示例 3:使用快照比较
-- Expected-Rows: 0
CREATE TABLE test.t1 (a INT PRIMARY KEY, b INT);
-- Expected-Rows: 0
INSERT INTO test.t1 VALUES (1, 1), (2, 2);
-- Expected-Rows: 0
CREATE SNAPSHOT sp1 FOR TABLE test t1;
-- Expected-Rows: 0
INSERT INTO test.t1 VALUES (3, 3);
-- Expected-Rows: 1
UPDATE test.t1 SET b = 10 WHERE a = 1;
-- Expected-Rows: 0
CREATE SNAPSHOT sp2 FOR TABLE test t1;
-- Expected-Rows: 1
DATA BRANCH DIFF test.t1{SNAPSHOT='sp2'} AGAINST test.t1{SNAPSHOT='sp1'};
+--------------------+--------+------+------+
| diff t1 against t1 | flag | a | b |
+--------------------+--------+------+------+
| t1 | INSERT | 3 | 3 |
+--------------------+--------+------+------+
-- Expected-Rows: 0
DROP SNAPSHOT sp1;
-- Expected-Rows: 0
DROP SNAPSHOT sp2;
-- Expected-Rows: 0
DROP TABLE test.t1;
Note
当比较同一张表的两个快照时,系统会基于快照之间的增量变化进行比较。在此示例中,虽然 UPDATE 操作修改了 a=1 的行,但由于快照比较的特殊机制,只显示了新增的行。
示例 4:仅获取差异数量
-- Expected-Rows: 0
CREATE TABLE test.t1 (a INT PRIMARY KEY, b INT);
-- Expected-Rows: 0
INSERT INTO test.t1 SELECT result, result FROM generate_series(1, 1000) g;
-- Expected-Rows: 0
DATA BRANCH CREATE TABLE test.t2 FROM test.t1;
-- Expected-Rows: 0
INSERT INTO test.t2 SELECT result, result FROM generate_series(1001, 2000) g;
-- Expected-Rows: 100
DELETE FROM test.t2 WHERE a <= 100;
-- Expected-Rows: 1
DATA BRANCH DIFF test.t2 AGAINST test.t1 OUTPUT COUNT;
+----------+
| COUNT(*) |
+----------+
| 1100 |
+----------+
-- Expected-Rows: 0
DROP TABLE test.t1;
-- Expected-Rows: 0
DROP TABLE test.t2;
示例 5:限制返回行数
-- Expected-Rows: 0
CREATE TABLE test.t1 (a INT PRIMARY KEY, b INT);
-- Expected-Rows: 0
INSERT INTO test.t1 SELECT result, result FROM generate_series(1, 100) g;
-- Expected-Rows: 0
DATA BRANCH CREATE TABLE test.t2 FROM test.t1;
-- Expected-Rows: 0
INSERT INTO test.t2 SELECT result, result FROM generate_series(101, 200) g;
-- Expected-Rows: 5
DATA BRANCH DIFF test.t2 AGAINST test.t1 OUTPUT LIMIT 5;
+--------------------+--------+------+------+
| diff t2 against t1 | flag | a | b |
+--------------------+--------+------+------+
| t2 | INSERT | 106 | 106 |
| t2 | INSERT | 107 | 107 |
| t2 | INSERT | 117 | 117 |
| t2 | INSERT | 124 | 124 |
| t2 | INSERT | 156 | 156 |
+--------------------+--------+------+------+
-- Expected-Rows: 0
DROP TABLE test.t1;
-- Expected-Rows: 0
DROP TABLE test.t2;
Note
OUTPUT LIMIT 返回的行不一定是按主键顺序排列的,而是按照内部存储顺序返回前 N 行差异。
示例 6:导出差异为 SQL 文件
-- Expected-Rows: 0
CREATE TABLE test.t1 (a INT PRIMARY KEY, b INT);
-- Expected-Rows: 0
INSERT INTO test.t1 VALUES (1, 1), (2, 2);
-- Expected-Rows: 0
DATA BRANCH CREATE TABLE test.t2 FROM test.t1;
-- Expected-Rows: 0
INSERT INTO test.t2 VALUES (3, 3);
-- Expected-Rows: 1
UPDATE test.t2 SET b = 10 WHERE a = 1;
-- Expected-Rows: 1
DELETE FROM test.t2 WHERE a = 2;
-- Expected-Success: false
DATA BRANCH DIFF test.t2 AGAINST test.t1 OUTPUT FILE '/tmp/diff_output/';
+------------------------------------------+------------------------------------------+
| FILE SAVED TO | HINT |
+------------------------------------------+------------------------------------------+
| /tmp/diff_output/diff_t2_t1_20241225.sql | DELETE FROM test.t1, REPLACE INTO test.t1 |
+------------------------------------------+------------------------------------------+
-- Expected-Rows: 0
DROP TABLE test.t1;
-- Expected-Rows: 0
DROP TABLE test.t2;
输出文件类型说明
根据差异类型,系统会生成不同格式的文件:
| 场景 | 文件格式 | 说明 |
|---|---|---|
| 增量同步(目标表非空) | .sql |
包含 DELETE FROM ... 和 REPLACE INTO ... 语句 |
| 全量同步(目标表为空) | .csv |
CSV 格式,可通过 LOAD DATA 导入 |
回放补丁文件
回放 SQL 文件(增量同步):
mysql -h <mo_host> -P <mo_port> -u <user> -p <db_name> < diff_t2_t1_20241225.sql
导入 CSV 文件(全量同步):
-- Expected-Success: false
LOAD DATA LOCAL INFILE '/tmp/diff_output/diff_xxx.csv'
INTO TABLE test.t1
FIELDS ENCLOSED BY '"' ESCAPED BY '\\' TERMINATED BY ','
LINES TERMINATED BY '\n';
示例 6b:导出差异到 Stage(对象存储)
Stage 是 MatrixOne 用于连接外部存储(如 S3、HDFS)的逻辑对象,可以将差异文件直接输出到对象存储,便于跨集群/跨环境同步数据。
-- Expected-Rows: 0
CREATE STAGE my_stage URL = 's3://my-bucket/diff-output/?region=us-east-1&access_key_id=xxx&secret_access_key=yyy';
-- Expected-Success: false
DATA BRANCH DIFF test.t2 AGAINST test.t1 OUTPUT FILE 'stage://my_stage/';
+-------------------------------------------------+------------------------------------------+
| FILE SAVED TO | HINT |
+-------------------------------------------------+------------------------------------------+
| stage://my_stage/diff_t2_t1_20241225.sql | DELETE FROM test.t1, REPLACE INTO test.t1 |
+-------------------------------------------------+------------------------------------------+
-- Expected-Success: false
SELECT load_file(CAST('stage://my_stage/diff_t2_t1_20241225.sql' AS DATALINK));
-- Expected-Rows: 0
DROP STAGE my_stage;
使用 Stage 的优势:
- 安全:无需在每次 SQL 中暴露 AK/SK,管理员配置一次即可
- 便捷:将复杂的 URL 路径封装成简单的对象名
- 跨集群同步:源端写到对象存储,目标端直接读取执行
示例 7:检测更新操作
-- Expected-Rows: 0
CREATE TABLE test.t0 (a INT PRIMARY KEY, b INT, c INT);
-- Expected-Rows: 0
INSERT INTO test.t0 SELECT result, result, result FROM generate_series(1, 100) g;
-- Expected-Rows: 0
DATA BRANCH CREATE TABLE test.t1 FROM test.t0;
-- Expected-Rows: 3
UPDATE test.t1 SET c = c + 1 WHERE a IN (1, 50, 100);
-- Expected-Rows: 0
DATA BRANCH CREATE TABLE test.t2 FROM test.t0;
-- Expected-Rows: 3
UPDATE test.t2 SET c = c + 2 WHERE a IN (1, 25, 75);
-- 比较差异,将检测到冲突的更新
-- Expected-Rows: 6
DATA BRANCH DIFF test.t2 AGAINST test.t1;
+--------------------+--------+------+------+------+
| diff t2 against t1 | flag | a | b | c |
+--------------------+--------+------+------+------+
| t2 | UPDATE | 1 | 1 | 3 |
| t1 | UPDATE | 1 | 1 | 2 |
| t2 | UPDATE | 25 | 25 | 27 |
| t1 | UPDATE | 50 | 50 | 51 |
| t2 | UPDATE | 75 | 75 | 77 |
| t1 | UPDATE | 100 | 100 | 101 |
+--------------------+--------+------+------+------+
-- Expected-Rows: 0
DROP TABLE test.t0;
-- Expected-Rows: 0
DROP TABLE test.t1;
-- Expected-Rows: 0
DROP TABLE test.t2;
示例 8:复合主键表的差异比较
-- Expected-Rows: 0
CREATE TABLE test.orders (
tenant_id INT,
order_code VARCHAR(8),
amount DECIMAL(12,2),
PRIMARY KEY (tenant_id, order_code)
);
-- Expected-Rows: 0
INSERT INTO test.orders VALUES
(100, 'A100', 120.50),
(100, 'A101', 80.00),
(101, 'B200', 305.75);
-- Expected-Rows: 0
DATA BRANCH CREATE TABLE test.orders_branch FROM test.orders;
-- 在分支上进行修改
-- Expected-Rows: 1
UPDATE test.orders_branch SET amount = 130.50 WHERE tenant_id = 100 AND order_code = 'A100';
-- Expected-Rows: 1
DELETE FROM test.orders_branch WHERE tenant_id = 100 AND order_code = 'A101';
-- Expected-Rows: 0
INSERT INTO test.orders_branch VALUES (102, 'C300', 512.25);
-- 比较差异
-- Expected-Rows: 3
DATA BRANCH DIFF test.orders_branch AGAINST test.orders;
+-----------------------------------+--------+-----------+------------+--------+
| diff orders_branch against orders | flag | tenant_id | order_code | amount |
+-----------------------------------+--------+-----------+------------+--------+
| orders_branch | UPDATE | 100 | A100 | 130.50 |
| orders_branch | DELETE | 100 | A101 | 80.00 |
| orders_branch | INSERT | 102 | C300 | 512.25 |
+-----------------------------------+--------+-----------+------------+--------+
-- Expected-Rows: 0
DROP TABLE test.orders;
-- Expected-Rows: 0
DROP TABLE test.orders_branch;
-- Expected-Rows: 0
DROP DATABASE test;
示例 9:COLUMNS 投影(v3.0.10)
COLUMNS (col1, ..., colN) 把差异输出中的非主键列限定到列表子集,主键列始终返回。
DROP DATABASE IF EXISTS data_branch_diff_columns_demo;
CREATE DATABASE data_branch_diff_columns_demo;
USE data_branch_diff_columns_demo;
CREATE TABLE c1 (
id INT PRIMARY KEY,
name VARCHAR(30),
balance DECIMAL(12,2),
created_at TIMESTAMP,
birthday DATE
);
INSERT INTO c1 VALUES
(1, 'alice', 1000.50, '2024-01-01 10:00:00', '1990-03-15'),
(2, 'bob', 2000.75, '2024-01-02 11:00:00', '1985-07-20'),
(3, 'carol', 3000.00, '2024-01-03 12:00:00', '1992-11-08');
DATA BRANCH CREATE TABLE c1_br FROM c1;
UPDATE c1_br SET balance = 1500.50, name = 'alice_v2' WHERE id = 1;
DELETE FROM c1_br WHERE id = 2;
INSERT INTO c1_br VALUES (4, 'dave', 4000.00, '2024-02-01 09:00:00', '1988-12-25');
-- 全量 diff:所有列。
DATA BRANCH DIFF c1_br AGAINST c1;
-- 仅投影 name 列:输出里只有主键 + name。
DATA BRANCH DIFF c1_br AGAINST c1 COLUMNS (name);
-- 投影两个非主键列。
DATA BRANCH DIFF c1_br AGAINST c1 COLUMNS (name, balance);
-- 把主键与一个非主键列一起列出;主键列始终会返回。
DATA BRANCH DIFF c1_br AGAINST c1 COLUMNS (id, balance);
-- COLUMNS 可以和 OUTPUT LIMIT 组合。
DATA BRANCH DIFF c1_br AGAINST c1 COLUMNS (balance) OUTPUT LIMIT 5;
DROP TABLE c1_br;
DROP TABLE c1;
DROP DATABASE data_branch_diff_columns_demo;
注意事项
-
表结构一致性:进行差异比较的两个表必须具有相同的表结构(列名、列类型)。
-
主键要求:虽然支持无主键的表,但建议使用带主键的表以获得更准确的差异结果。
-
性能考虑:对于大表,差异比较可能需要较长时间。建议使用
OUTPUT COUNT先了解差异规模。 -
快照有效性:使用快照比较时,确保快照存在且有效。
-
输出文件:使用
OUTPUT FILE时,确保目标目录存在且有写入权限。生成的 SQL 文件可以直接执行以同步数据。 -
LCA 检测:系统会自动检测 LCA,无需手动指定。LCA 的存在会影响差异计算的方式。