LOAD DATA

Description

The LOAD DATA statement reads rows from a text file into a table at a very high speed. The file can be read from the server host or a S3 compatible object storage. LOAD DATA is the complement of SELECT ... INTO OUTFILE. To write data from a table to a file, use SELECT ... INTO OUTFILE. To read the file back into a table, use LOAD DATA. The syntax of the FIELDS and LINES clauses is the same for both statements.

Syntax

> LOAD DATA [LOCAL]
    INFILE 'file_name'
    INTO TABLE tbl_name
    [CHARACTER SET charset_name]
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ENCASPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number {LINES | ROWS}]
    [SET column_name_1=nullif(column_name_1, expr1), column_name_2=nullif(column_name_2, expr2)...]
    [PARALLEL {'TRUE' | 'FALSE'}]
    [STRICT {'TRUE' | 'FALSE'}]

Input File Location

LOAD DATA INFILE 'file_name': Indicates that the data file to be loaded is on the same machine as the MatrixOne host server. file_name can be the relative path name of the storage location of the file, or it can be the absolute path name.
LOAD DATA LOCAL INFILE 'file_name': indicates that the data file to be loaded is not on the same machine as the MatrixOne host server; that is, the data file is on the client server. file_name can be the relative path name of the storage location of the file, or it can be the absolute path name.

CHARACTER SET

If the file content uses a different character set than the default, you can use CHARACTER SET to specify the character set. For example, you can use CHARACTER SET utf8 to specify the character set of the imported content as utf8:

LOAD DATA INFILE '/tmp/test.txt' INTO TABLE table1 IGNORE 1 LINES;

Note

In addition to utf8, LOAD DATA supports specifying character sets such as utf_8, UTF_16, UTF_xx, gbk, abcd, and so on. Character sets with - (e.g. utf-8,UTF-16) are not supported.

IGNORE LINES

The IGNORE number LINES clause can be used to ignore lines at the start of the file. For example, you can use IGNORE 1 LINES to skip an initial header line containing column names:

LOAD DATA INFILE '/tmp/test.txt' INTO TABLE table1 IGNORE 1 LINES;

Field and Line Handling

Use the FIELDS and LINES parameters to specify how to handle data formats.

For LOAD DATA and SELECT ... INTO OUTFILE statements, the syntax of the FIELDS and LINES clauses is the same. Both clauses are optional, but if both are specified, FIELDS must precede LINES.

If the FIELDS clause is specified, then each of the clauses of FIELDS (TERMINATED BY, [OPTIONALLY] ENCLOSED BY) is also optional, unless you must specify at least one of them.

LOAD DATA also supports the use of hexadecimal ASCII character expressions or binary ASCII character expressions as arguments to FIELDS ENCLOSED BY and FIELDS TERMINATED BY.

If you do not specify a parameter for processing data, the following default values are used:

FIELDS TERMINATED BY '\t' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n'

Note

FIELDS TERMINATED BY '\t': with and only \t as delimiters.
ENCLOSED BY '"': with and only " as the included character.
ESCAPED BY '\\':use \ as an escape character, and only as an escape character.
LINES TERMINATED BY '\n': Use and only use \n or \r\n as the line separator.

FIELDS TERMINATED BY

FIELDS TERMINATED BY specifies the delimiter for a field. The FIELDS TERMINATED BY values can be more than one character.

Examples

LOAD DATA INFILE 'data.txt' INTO TABLE table1
  FIELDS TERMINATED BY ',';

FIELDS ENCLOSED BY

FIELDS TERMINATED BY option specifies the character enclose the input values. ENCLOSED BY value must be a single character. If the input values are not necessarily enclosed within quotation marks, use OPTIONALLY before the ENCLOSED BY option.

Examples

LOAD DATA INFILE 'data.txt' INTO TABLE table1
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"';

FIELDS ESCAPED BY

FIELDS ESCAPED BY allows you to specify an escape character, with a default value of \\, which means that \ is an escape symbol that is deleted if the FIELDS ESCAPED BY character is not null and subsequent characters will be used literally as part of the field value.

However, some two-character sequences have special meanings, as shown in the table below:

escape sequence	Sequentially represented characters
\0	A space character
\b	A backspace character
\n	A newline (linefeed) character
\r	A carriage return character
\t	A tab character
\z	ASCII 26 (Control+Z)

Examples

Examples 1

The contents of data.txt are as follows:

(base) admin@admindeMacBook-Pro case % cat data.txt 
1   a\\b

Connect mo than execute the following statement to import the data.txt contents to t1:

create table t1(n1 int,n2 varchar(255));
load data infile 'Users/admin/test/case/data.txt' into table t1;

mysql> select * from t1;
+------+------+
| n1   | n2   |
+------+------+
|    1 | a\b  |
+------+------+
1 row in set (0.00 sec)

The result of n2 is a\b because the first \ was an escape character and was deleted.

Examples 2

The contents of data.txt are as follows:

(base) admin@admindeMacBook-Pro case % cat data.txt 
1   a\\b

Connect mo than execute the following statement to import the data.txt contents to t2:

create table t2(n1 int,n2 varchar(255));
load data infile 'Users/admin/test/case/data.txt' into table t2 fields escaped by 'a';

mysql> select * from t2;
+------+------+
| n1   | n2   |
+------+------+
|    1 | \\b  |
+------+------+
1 row in set (0.00 sec)

The result of n2 is \\b, because here we specified the escape character as a, so a is removed.

Examples 3

The contents of data.txt are as follows:

(base) admin@admindeMacBook-Pro case % cat data.txt 
1   a\\b

Connect mo than execute the following statement to import the data.txt contents to t3:

create table t3(n1 int,n2 varchar(255));
load data infile 'Users/admin/test/case/data.txt' into table t3 fields escaped by '';

mysql> SELECT * FROM t3;
+------+------+
| n1   | n2   |
+------+------+
|    1 | a\\b |
+------+------+
1 row in set (0.01 sec)

The result of n2 is a\\b, and when ESCAPED BY is empty, it is read as is without escaping the characters.

Examples 4

The contents of data.txt are as follows:

(base) admin@admindeMacBook-Pro case % cat data.txt 
1   a\0b
2   c\bd
3   a\nb
4   a\rb
5   a\tb
6   a\Zb

Connect mo than execute the following statement to import the data.txt contents to t4:

create table t4(n1 int,n2 varchar(255));
load data infile 'Users/admin/test/case/data.txt' into table t4;

mysql> select * from t1;
+------+------+
| n1   | n2   |
+------+------+
|    1 | a b  |
|    2 | d  |
|    3 | a
b  |
b  | 4 | a
|    5 | a  b  |
|    6 | ab  |
+------+------+
6 rows in set (0.01 sec)

when n1=1, the result of n2 is a b because \0 is a space character;

When n1=2, the result of n2 is d because \b is a backspace and a is deleted;

When n1=3, the result of n2 is a plus a newline b because \n is a newline;

When n1=4, the result of n2 is a plus b after a carriage return, because \r is a carriage return;

When n1=5, the result of n2 is b for a b because \t is a tab;

When n1=6, the result of n2 is ab, because \z is a terminator.

LINES TERMINATED BY

LINES TERMINATED BY specifies the delimiter for the a line. The LINES TERMINATED BY values can be more than one character.

Examples

For example, to read a file separated by comma, the syntax is:

LOAD DATA INFILE 'data.txt' INTO TABLE table1
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\r\n';

LINE STARTING BY

If all the input lines have a common prefix that you want to ignore, you can use LINES STARTING BY 'prefix_string' to skip the prefix and anything before it. If a line does not include the prefix, the entire line is skipped. Suppose that you issue the following statement:

LOAD DATA INFILE '/tmp/test.txt' INTO TABLE table1
  FIELDS TERMINATED BY ','  LINES STARTING BY 'xxx';

If the data file looks like this:

xxx"abc",1
something xxx"def",2
"ghi",3

The resulting rows are ("abc",1) and ("def",2). The third row in the file is skipped because it does not contain the prefix.

SET

MatrixOne only supports SET column_name=nullif(column_name,expr). That is, when column_name = expr, it returns NULL; otherwise, it returns the original value of column_name. For example, SET a=nullif(a,1), if a=1, returns NULL; otherwise, it returns the original value of column a.

By setting the parameter, you can use SET column_name=nullif(column_name,"null") to return the NULL value in the column when loading the file.

Example

The details of the local file test.txt are as follows:

id,user_name,sex
1,"weder","man"
2,"tom","man"
null,wederTom,"man"

Create a table named user in MatrixOne:

create database aaa;
use aaa;
CREATE TABLE `user` (`id` int(11) ,`user_name` varchar(255) ,`sex` varchar(255));

Load test.txt into the table user:

LOAD DATA INFILE '/tmp/test.txt' INTO TABLE user SET id=nullif(id,"null");

The result of the talbe is as below:

select * from user;
+------+-----------+------+
| id   | user_name | sex  |
+------+-----------+------+
|    1 | weder     | man  |
|    2 | tom       | man  |
| null | wederTom  | man  |
+------+-----------+------+

PARALLEL

For a sizeable well-formed file, such as a JSOLLines file or a CSV file with no line breaks in a line of data, you can use PARALLEL to load the file in parallel to speed up the loading.

For example, for a large file of 2 G, use two threads to load; the second thread first splits and locates the 1G position, then reads and loads backward. In this way, two threads can read large files at the same time, and each thread can read 1G of data.

Enable/Disable Parallel Loading Command Line Example:

--  Enable Parallel Loading
load data infile 'file_name' into table tbl_name FIELDS TERMINATED BY '|' ENCLOSED BY '\"' LINES TERMINATED BY '\n' IGNORE 1 LINES PARALLEL 'TRUE';

--  Disable Parallel Loading
load data infile 'file_name' into table tbl_name FIELDS TERMINATED BY '|' ENCLOSED BY '\"' LINES TERMINATED BY '\n' IGNORE 1 LINES PARALLEL 'FALSE';

--  Parallel loading is disabled by default
load data infile 'file_name' into table tbl_name FIELDS TERMINATED BY '|' ENCLOSED BY '\"' LINES TERMINATED BY '\n' IGNORE 1 LINES;

Note

[PARALLEL {'TRUE' | 'FALSE'}] currently only support TRUE or FALSE and are not case-sensitive.

Note: If the PARALLEL field is not added in the LOAD statement, for CSV files, parallel loading is disabled by default; for JSOLLines files, parallel loading is enabled by default. If there is a line terminator in the CSV file, such as '\n', otherwise it may cause data errors when the file is loaded. If the file is too large, manually splitting the file from the '\n' as the starting and ending point is recommended, then enabling parallel loading.

STRICT

MatrixOne supports the use of the STRICT parameter to specify the way to cut the file in parallel, which is only valid when PARALLEL is TRUE. The default value of STRICT is TRUE, which indicates that the cut is done using read-ahead detection, which not only relies on the line breaks for the segmentation, but also performs a read-ahead to verify that it matches the column definitions of the table, and will process it as a valid segmentation point only if the data meets the column definitions. Only if the data matches the column definition will it be processed as a valid split point. When the parameter is FALSE, it will use the newline character (default is '\n') to cut in the parallel import of the cut file, and in the case of data with newline character, there may be an error in the cut.

Example:

-- Enable pre-reading mode
load data infile 'file_name' into table tbl_name PARALLEL 'TRUE' STRICT 'TRUE';

-- Turn off pre-reading mode
load data infile 'file_name' into table tbl_name PARALLEL 'TRUE' STRICT 'FALSE';

Supported file formats

In MatrixOne's current release, LOAD DATA supports CSV(comma-separated values) format and JSONLines format file. See full tutorials for loading csv and jsonline.

Note

LOAD DATA supports importing lz4, gz, bz2, zlib, flate, and does not support importing compressed files ending with .tar or .tar.xx.

CSV format standard description

The CSV format loaded by MatrixOne conforms to the RFC4180 standard, and the CSV format is specified as follows:

Each record is on a separate line, separated by a newline character (CRLF):
```
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
```
Imported into the table as follows:

+---------+---------+---------+ | col1 | col2 | col3 | +---------+---------+---------+ | aaa | b bb | ccc | | zzz | yyy | xxx | +---------+---------+---------+
The last record in the file can have a terminating newline or no terminating newline (CRLF):
```
aaa,bbb,ccc CRLF
zzz,yyy,xxx
```
Imported into the table as follows:

+---------+---------+---------+ | col1 | col2 | col3 | +---------+---------+---------+ | aaa | b bb | ccc | | zzz | yyy | xxx | +---------+---------+---------+
An optional header line appears as the first line of the file and has the same format as a standard record line. For example:
```
field_name,field_name,field_name CRLF
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
```
Imported into the table as follows:

+------------+------------+------------+ | field_name | field_name | field_name | +------------+------------+------------+ | aaa | bbb | ccc | | zzz | yyy | xxx | +------------+------------+------------+
In the header and each record, there may be one or more fields separated by commas. Whitespace within a field is part of the field and should not be ignored. A comma cannot follow the last field in each record. For example:
```
aaa,bbb,ccc
```
Or:
```
a aa, bbb,cc c
```
Both examples are correct.

Imported into the table as follows:

+---------+---------+---------+ | col1 | col2 | col3 | +---------+---------+---------+ | aaa | bbb | ccc | +---------+---------+---------+

Or:

+---------+---------+---------+ | col1 | col2 | col3 | +---------+---------+---------+ | a aa | bbb | cc c | +---------+---------+---------+
Each field can be enclosed in double quotes or not. Double quotes cannot appear inside a field if the field is not enclosed in double-quotes. For example:
```
"aaa","bbb","ccc" CRLF
zzz,yyy,xxx
```
Or:
```
"aaa","bbb",ccc CRLF
zzz,yyy,xxx
```
Both examples are correct.

Imported into the table as follows:

+---------+---------+---------+ | col1 | col2 | col3 | +---------+---------+---------+ | aaa | bbb | ccc | | zzz | yyy | xxx | +---------+---------+---------+
Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:
```
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
```
Imported into the table as follows:

+---------+---------+---------+ | col1 | col2 | col3 | +---------+---------+---------+ | aaa | b bb | ccc | | zzz | yyy | xxx | +---------+---------+---------+
If double quotation marks are used to enclose the field, then multiple double quotation marks appearing in the field must also be enclosed in double quotation marks; otherwise, the first quotation mark of two double quotation marks in the field will be parsed as an escape character, thus keep a single, double quote. For example:
```
"aaa","b","bb","ccc"
```
The above CSV will parse "b""bb" into b"bb; if the correct field is b""bb, then it should be written as:
```
"aaa","b""""bb","ccc"
```
Or:
```
"aaa",b""bb,"ccc"
```
Imported into the table as follows:

+---------+---------+---------+ | col1 | col2 | col3 | +---------+---------+---------+ | aaa | b""bb | ccc | +---------+---------+---------+

Examples

The SSB Test is an example of LOAD DATA syntax. Complete a SSB Test with MatrixOne

> LOAD DATA INFILE '/ssb-dbgen-path/lineorder_flat.tbl ' INTO TABLE lineorder_flat;

The above statement means: load the lineorder_flat.tbl data set under the directory path /ssb-dbgen-path/ into the MatrixOne data table lineorder_flat.

You can also refer to the following syntax examples to quickly understand LOAD DATA:

Example 1: LOAD CSV

Simple example

The data in the file locally named char_varchar.csv is as follows:

a|b|c|d
"a"|"b"|"c"|"d"
'a'|'b'|'c'|'d'
"'a'"|"'b'"|"'c'"|"'d'"
"aa|aa"|"bb|bb"|"cc|cc"|"dd|dd"
"aa|"|"bb|"|"cc|"|"dd|"
"aa|||aa"|"bb|||bb"|"cc|||cc"|"dd|||dd"
"aa'|'||aa"|"bb'|'||bb"|"cc'|'||cc"|"dd'|'||dd"
aa"aa|bb"bb|cc"cc|dd"dd
"aa"aa"|"bb"bb"|"cc"cc"|"dd"dd"
"aa""aa"|"bb""bb"|"cc""cc"|"dd""dd"
"aa"""aa"|"bb"""bb"|"cc"""cc"|"dd"""dd"
"aa""""aa"|"bb""""bb"|"cc""""cc"|"dd""""dd"
"aa""|aa"|"bb""|bb"|"cc""|cc"|"dd""|dd"
"aa""""|aa"|"bb""""|bb"|"cc""""|cc"|"dd""""|dd"
|||
||||
""|""|""|
""""|""""|""""|""""
""""""|""""""|""""""|""""""

Create a table named t1 in MatrixOne:

mysql> drop table if exists t1;
Query OK, 0 rows affected (0.01 sec)

mysql> create table t1(
    -> col1 char(225),
    -> col2 varchar(225),
    -> col3 text,
    -> col4 varchar(225)
    -> );
Query OK, 0 rows affected (0.02 sec)

Load the data file into table t1:

load data infile '<your-local-file-path>/char_varchar.csv' into table t1 fields terminated by'|';

The query result is as follows:

mysql> select * from t1;
+-----------+-----------+-----------+-----------+
| col1      | col2      | col3      | col4      |
+-----------+-----------+-----------+-----------+
| a         | b         | c         | d         |
| a         | b         | c         | d         |
| 'a'       | 'b'       | 'c'       | 'd'       |
| 'a'       | 'b'       | 'c'       | 'd'       |
| aa|aa     | bb|bb     | cc|cc     | dd|dd     |
| aa|       | bb|       | cc|       | dd|       |
| aa|||aa   | bb|||bb   | cc|||cc   | dd|||dd   |
| aa'|'||aa | bb'|'||bb | cc'|'||cc | dd'|'||dd |
| aa"aa     | bb"bb     | cc"cc     | dd"dd     |
| aa"aa     | bb"bb     | cc"cc     | dd"dd     |
| aa"aa     | bb"bb     | cc"cc     | dd"dd     |
| aa""aa    | bb""bb    | cc""cc    | dd""dd    |
| aa""aa    | bb""bb    | cc""cc    | dd""dd    |
| aa"|aa    | bb"|bb    | cc"|cc    | dd"|dd    |
| aa""|aa   | bb""|bb   | cc""|cc   | dd""|dd   |
|           |           |           |           |
|           |           |           |           |
|           |           |           |           |
| "         | "         | "         | "         |
| ""        | ""        | ""        | ""        |
+-----------+-----------+-----------+-----------+
20 rows in set (0.00 sec)

Add conditional Example

Following the example above, you can modify the LOAD DATA statement and add LINES STARTING BY 'aa' ignore 10 lines; at the end of the statement to experience the difference:

delete from t1;
load data infile '<your-local-file-path>/char_varchar.csv' into table t1 fields terminated by'|' LINES STARTING BY 'aa' ignore 10 lines;

The query result is as follows:

mysql> select * from t1;
+---------+---------+---------+---------+
| col1    | col2    | col3    | col4    |
+---------+---------+---------+---------+
| aa"aa   | bb"bb   | cc"cc   | dd"dd   |
| aa""aa  | bb""bb  | cc""cc  | dd""dd  |
| aa""aa  | bb""bb  | cc""cc  | dd""dd  |
| aa"|aa  | bb"|bb  | cc"|cc  | dd"|dd  |
| aa""|aa | bb""|bb | cc""|cc | dd""|dd |
|         |         |         |         |
|         |         |         |         |
|         |         |         |         |
| "       | "       | "       | "       |
| ""      | ""      | ""      | ""      |
+---------+---------+---------+---------+
10 rows in set (0.00 sec)

As you can see, the query result ignores the first line and and ignores the common prefix aa.

For more information on loding csv, see Import the .csv data.

Example 2: LOAD JSONLines

Simple example

The data in the file locally named jsonline_array.jl is as follows:

[true,1,"var","2020-09-07","2020-09-07 00:00:00","2020-09-07 00:00:00","18",121.11,["1",2,null,false,true,{"q":1}],"1qaz",null,null]
["true","1","var","2020-09-07","2020-09-07 00:00:00","2020-09-07 00:00:00","18","121.11",{"c":1,"b":["a","b",{"q":4}]},"1aza",null,null]

Create a table named t1 in MatrixOne:

mysql> drop table if exists t1;
Query OK, 0 rows affected (0.01 sec)

mysql> create table t1(col1 bool,col2 int,col3 varchar(100), col4 date,col5 datetime,col6 timestamp,col7 decimal,col8 float,col9 json,col10 text,col11 json,col12 bool);
Query OK, 0 rows affected (0.03 sec)

Load the data file into table t1:

load data infile {'filepath'='<your-local-file-path>/jsonline_array.jl','format'='jsonline','jsondata'='array'} into table t1;

The query result is as follows:

mysql> select * from t1;
+------+------+------+------------+---------------------+---------------------+------+--------+---------------------------------------+-------+-------+-------+
| col1 | col2 | col3 | col4       | col5                | col6                | col7 | col8   | col9                                  | col10 | col11 | col12 |
+------+------+------+------------+---------------------+---------------------+------+--------+---------------------------------------+-------+-------+-------+
| true |    1 | var  | 2020-09-07 | 2020-09-07 00:00:00 | 2020-09-07 00:00:00 |   18 | 121.11 | ["1", 2, null, false, true, {"q": 1}] | 1qaz  | NULL  | NULL  |
| true |    1 | var  | 2020-09-07 | 2020-09-07 00:00:00 | 2020-09-07 00:00:00 |   18 | 121.11 | {"b": ["a", "b", {"q": 4}], "c": 1}   | 1aza  | NULL  | NULL  |
+------+------+------+------------+---------------------+---------------------+------+--------+---------------------------------------+-------+-------+-------+
2 rows in set (0.00 sec)

Add conditional Example

Following the example above, you can modify the LOAD DATA statement and add ignore 1 lines at the end of the statement to experience the difference:

delete from t1;
load data infile {'filepath'='<your-local-file-path>/jsonline_array.jl','format'='jsonline','jsondata'='array'} into table t1 ignore 1 lines;

The query result is as follows:

mysql> select * from t1;
+------+------+------+------------+---------------------+---------------------+------+--------+-------------------------------------+-------+-------+-------+
| col1 | col2 | col3 | col4       | col5                | col6                | col7 | col8   | col9                                | col10 | col11 | col12 |
+------+------+------+------------+---------------------+---------------------+------+--------+-------------------------------------+-------+-------+-------+
| true |    1 | var  | 2020-09-07 | 2020-09-07 00:00:00 | 2020-09-07 00:00:00 |   18 | 121.11 | {"b": ["a", "b", {"q": 4}], "c": 1} | 1aza  | NULL  | NULL  |
+------+------+------+------------+---------------------+---------------------+------+--------+-------------------------------------+-------+-------+-------+
1 row in set (0.00 sec)

As you can see, the query result ignores the first line.

For more information on loding JSONLines, see Import the JSONLines data.

Constraints

The REPLACE and IGNORE modifiers control handling of new (input) rows that duplicate existing table rows on unique key values (PRIMARY KEY or UNIQUE index values) are not supported in MatrixOne yet.
Input pre-pressing with SET is supported very limitedly. Only SET columns_name=nullif(expr1,expr2) is supported.
When enabling the parallel loading, it must be ensured that each row of data in the file does not contain the specified line terminator, such as '\n'; otherwise, it will cause data errors during file loading.
The parallel loading of files requires that the files be in uncompressed format, and parallel loading of files in compressed form is not currently supported.
When you use load data local, you need to use the command line to connect to the MatrixOne service host: mysql -h <mo-host -ip> -P 6001 -uroot -p111 --local-infile.