Insert jsonlines file
This document will guide you how to import JSONLines format data (i.e. jl or jsonl files) into MatrixOne Intelligence through the LOAD DATA LOCAL command.
About JSONLines format
JSON (JavaScript Object Notation) is a lightweight data exchange format. You can refer to the [official documentation] (https://www.json.org/json-en.html) for more information about JSON.
JSONLines Text format, also known as newline-delimited JSON, is a more convenient storage structured data format that can process one record at a time. Each line is a complete, legal JSON value; it takes \n as the line separator. Each line of JSONLines is independent, so no comma is required for the beginning or end of the line. The entire content of JSONLines does not need to be enclosed with [] or {}.
JSONLines is more friendly to data streams, because each new line means a separate entry, so files in JSON line format can be streamed. It does not require a custom parser, it just reads one line, parses it into JSON, and then reads one line, parses it into JSON, until it is finished.
The JSONLines format has the following three requirements:
*UTF-8 Encoding: JSON allows encoding Unicode strings using only ASCII escape sequences, but in text editors, these escapes are difficult to read. The author of a JSON Lines file can choose to escape characters to handle pure ASCII files.
*Each line is a legal JSON value: The most common value is an object or an array, and any JSON value is legal.
*Line delimiter is \n: Because surrounding spaces are implicitly ignored when parsing JSON values, it supports \r\n while also \r\n.
Valid JSONLines format for MatrixOne Intelligence
The JSONLines format only requires that each line has a valid JSON value. But MatrixOne Intelligence requires a more structured JSONLines format, and only JSON objects or JSON arrays with the same type of value and normal structure are allowed in MatrixOne Intelligence. If your JSONLines file has a nested structure, MatrixOne Intelligence does not support loading it for the time being.
An valid object JSONLines example:
{"id":1,"father":"Mark","mother":"Charlotte"}
{"id":2,"father":"John","mother":"Ann"}
{"id":3,"father":"Bob","mother":"Monika"}
Invalid object JSONLines example (with nested structure):
{"id":1,"father":"Mark","mother":"Charlotte","children":["Tom"]}
{"id":2,"father":"John","mother":"Ann","children":["Jessika","Antony","Jack"]}
{"id":3,"father":"Bob","mother":"Monika","children":["Jerry","Karol"]}
A valid array JSONLines example, which is more like the csv format.
["Name", "Session", "Score", "Completed"]
["Gilbert", "2013", 24, true]
["Alexa", "2013", 29, true]
["May", "2012B", 14, false]
["Deloise", "2012A", 19, true]
Invalid array JSONLines example (the reason for invalidity is because the data type and column number do not match):
["Gilbert", "2013", 24, true, 100]
["Alexa", "2013", "twenty nine", true]
["May", "2012B", 14, "no"]
["Deloise", "2012A", 19, true, 40]
Syntax Structure
LOAD DATA LOCAL INFILE
{'filepath'='FILEPATH', 'compression'='COMPRESSION_FORMAT', 'format'='FILE_FORMAT', 'jsondata'='object'/'array'} INTO TABLE table_name [IGNORE x LINES/ROWS]
[PARALLEL {'TRUE' | 'FALSE'}];
Particle description
| Parameters | Values | Must/Optional | Description |
|---|---|---|---|
| filepath | String | Must | File path |
| compression | auto/none/bz2/gzip/lz4 | Optional | Compression format |
| format | csv/jsonline | Optional | Load file format, default .csv |
| jsondata | object/array | Optional | JSON data format. If format is jsonline,must specify jsondata |
| table_name | String | Must | Table name to load data into the table |
| x | Number | Optional | Lines to be ignored when loading |
DDL Guide for JSONLines Format Data
Before loading JSONLines data into MatrixOne Intelligence, you need to create a table first.
Since the JSON data type is different from the data type of MatrixOne Intelligence, see the table below to view the data type of JSON data type corresponding to MatrixOne Intelligence:
| JSON Types | Data Types in MatrixOne |
|---|---|
| String | VARCHAR (fixed length string) |
| String | TEXT (Long Text Data) |
| String | DATETIME or TIMESTAMP (format is "YYYY-MM-DD HH:MM:SS.XXXXXX") |
| String | DATE (format is "YYYY-MM-DD") |
| String | TIME (format is "HH-MM-SS.XXXXXX") |
| Number | INT (Integer) |
| Number | FLOAT or DOUBLE (float number) |
| Boolean | BOOL(true/false) |
| Object | Json Type |
| Array | Json Type |
| Null | Supports all types |
For example, you can first create a data table for a JSONLines format file using SQL statements, as shown below:
mysql> create table t1 (name varchar(100), session varchar(100), score int, completed bool);
["Name", "Session", "Score", "Completed"]
["Gilbert", "2013", 24, true]
["Alexa", "2013", 29, true]
["May", "2012B", 14, false]
["Deloise", "2012A", 19, true]
Example
The following snippet is a complete SQL example of loading a JSONLines file into MatrixOne Intelligence.
#Load a BZIP2 compressed jsonline object file
load data local infile {'filepath'='data.bzip2', 'compression'='bz2','format'='jsonline','jsondata'='object'} into table db.a;
#Load a plain jsonline array file
load data local infile {'filepath'='data.jl', 'format'='jsonline','jsondata'='array'} into table db.a;
#Load a gzip compressed jsonline array file and ignore the first line
load data local infile {'filepath'='data.jl.gz', 'compression'='gzip','format'='jsonline','jsondata'='array'} into table db.a ignore 1 lines;
Tutorial Example
In this tutorial, you will be guided on how to load two jsonline files with object and array json format.
-
Prepare the data. You can also download and use our prepared jl file, for example: jsonline_object.jl
-
Open the terminal, enter the directory where the jl file is located, enter the following command line to display the specific contents in the file:
> cd /$filepath > head jsonline_object.jl {"col1":true,"col2":1,"col3":"var","col4":"2020-09-07","col5":"2020-09-07 00:00:00","col6":"2020-09-07 00:00:00","col7":"18","col8":121.11} {"col1":"true","col2":"1","col3":"var","col4":"2020-09-07","col5":"2020-09-07 00:00:00","col6":"2020-09-07 00:00:00","col7":"18","col8":"121.11"} {"col6":"2020-09-07 00:00:00","col7":"18","col8":"121.11","col4":"2020-09-07","col5":"2020-09-07 00:00:00","col1":"true","col2":"1","col3":"var"} {"col2":1,"col3":"var","col1":true,"col6":"2020-09-07 00:00:00","col7":"18","col4":"2020-09-07","col5":"2020-09-07 00:00:00","col8":121.11} ```3. Start the MySQL client and connect to MatrixOne Intelligence, for example:mysql -h freetier-01.cn-hangzhou.cluster.matrixonecloud.cn -P 6001 -u 585b49fc_852b_4bd1_b6d1_d64bc1d8xxxx:admin:accountadmin -p --local-infile
```
-
Create tables in MatrixOne Intelligence:
create database db1; use db1; drop table if exists t1; create table t1(col1 bool,col2 int,col3 varchar, col4 date,col5 datetime,col6 timestamp,col7 decimal,col8 float); -
Execute
LOAD DATA LOCALon the corresponding file path in the MySQL client and import the jsonline_object.jl and jsonline_array.jl files:load data local infile {'filepath'='$filepath/jsonline_object.jl','format'='jsonline','jsondata'='object'} into table t1; -
After the import is successful, use the following SQL statement to view the import results:
select * from t1; col1 col2 col3 col4 col5 col6 col7 col8 true 1 var 2020-09-07 2020-09-07 00:00:00 2020-09-07 00:00:00 18 121.11 true 1 var 2020-09-07 2020-09-07 00:00:00 2020-09-07 00:00:00 18 121.11 true 1 var 2020-09-07 2020-09-07 00:00:00 2020-09-07 00:00:00 18 121.11 true 1 var 2020-09-07 2020-09-07 00:00:00 2020-09-07 00:00:00 18 121.11