Using CTAS to replicate tables
What is CTAS
CTAS (Create Table As Select), is a SQL statement used to quickly create a new table (replicate table) based on an existing table or query result. When the CTAS statement is executed, a new table is created directly from the data generated by the SELECT clause, and the column structure and data type of the new table are consistent with the result set in the SELECT clause.
Application scenarios
CTAS has a wide range of application scenarios, including:
-
Data Migration: Using CTAS, you can quickly migrate data from one table to another, while changing the storage structure and distribution strategy of tables to accommodate different queries and storage needs.
-
Data backup: CTAS can be used to create backup copies of data, which is useful for data recovery and historical data analysis.
-
Table structure changes: When you need to modify the table structure (such as adding or removing columns, changing data types, etc.), CTAS can create a new table to reflect these changes without affecting the original table.
-
Data Science and Machine Learning: In data science projects, CTAS can be used to prepare data sets and create clean, formatted data tables suitable for training machine learning models.
CTAS is an efficient SQL operation that dramatically improves the efficiency of data processing and analysis by simplifying data management processes and enhancing operational flexibility. However, when applying CTAS, the level of support for CTAS by the target database system and its potential impact on system performance need to be taken into account to ensure data synchronization and operational accuracy and effectiveness.
Prepare before you start
Completed standalone deployment of MatrixOne.
How to use CTAS
Grammar
CTAS
statements typically take the following form:
sql CREATE [TEMPORARY] TABLE table_name as select
See chapter Create Table As Select for more syntax descriptions
Cases
Suppose we have an e-commerce platform and we want to create a data table to analyze the details of each order including order number, customer ID, order date, product ID, product quantity and product price.
CREATE TABLE orders(
order_id int auto_increment PRIMARY KEY,
customer_id int,
order_date date,
product_id int,
quantity int,
price float
);
INSERT INTO orders(customer_id,order_date,product_id,quantity,price) values(30,"2023-04-01",5001,2,19.99);
INSERT INTO orders(customer_id,order_date,product_id,quantity,price) values(40,"2023-04-02",5002,1,29.99);
INSERT INTO orders(customer_id,order_date,product_id,quantity,price) values(30,"2023-04-03",5001,1,19.99);
mysql> select * from orders;
+----------+-------------+------------+------------+----------+-------+
| order_id | customer_id | order_date | product_id | quantity | price |
+----------+-------------+------------+------------+----------+-------+
| 1 | 30 | 2023-04-01 | 5001 | 2 | 19.99 |
| 2 | 40 | 2023-04-02 | 5002 | 1 | 29.99 |
| 3 | 30 | 2023-04-03 | 5001 | 1 | 19.99 |
+----------+-------------+------------+------------+----------+-------+
3 rows in set (0.00 sec)
--For analysis purposes, we want to calculate the total price for each order and create a new table containing the order number, customer ID, order date, and total order price.
CREATE TABLE orders_analysis AS
SELECT
order_id,
customer_id,
order_date,
product_id,
quantity,
price,
CAST((quantity * price) AS float) AS total_price
FROM
orders;
mysql> select * from orders_analysis;
+----------+-------------+------------+------------+----------+-------+-------------+
| order_id | customer_id | order_date | product_id | quantity | price | total_price |
+----------+-------------+------------+------------+----------+-------+-------------+
| 1 | 30 | 2023-04-01 | 5001 | 2 | 19.99 | 39.98 |
| 2 | 40 | 2023-04-02 | 5002 | 1 | 29.99 | 29.99 |
| 3 | 30 | 2023-04-03 | 5001 | 1 | 19.99 | 19.99 |
+----------+-------------+------------+------------+----------+-------+-------------+
3 rows in set (0.00 sec)
In this example, the CTAS statement not only copies the columns in the original table, but also adds a new calculated column, total_price, which calculates the total price of the order line item by multiplying the number of products per order by the price. This gives us a new table suitable for sales analysis that can be used directly to generate reports or for further data analysis.
This example demonstrates the power of CTAS in data conversion and preparation, which facilitates data analysis by allowing us to clean and convert data while creating new tables.