Skip to content

Insert jsonlines file

This document will guide you how to import JSONLines format data (i.e. jl or jsonl files) into MatrixOne Intelligence through the LOAD DATA LOCAL command.

About JSONLines format

JSON (JavaScript Object Notation) is a lightweight data exchange format. You can refer to the [official documentation] (https://www.json.org/json-en.html) for more information about JSON.

JSONLines Text format, also known as newline-delimited JSON, is a more convenient storage structured data format that can process one record at a time. Each line is a complete, legal JSON value; it takes \n as the line separator. Each line of JSONLines is independent, so no comma is required for the beginning or end of the line. The entire content of JSONLines does not need to be enclosed with [] or {}.

JSONLines is more friendly to data streams, because each new line means a separate entry, so files in JSON line format can be streamed. It does not require a custom parser, it just reads one line, parses it into JSON, and then reads one line, parses it into JSON, until it is finished.

The JSONLines format has the following three requirements:

*UTF-8 Encoding: JSON allows encoding Unicode strings using only ASCII escape sequences, but in text editors, these escapes are difficult to read. The author of a JSON Lines file can choose to escape characters to handle pure ASCII files.

*Each line is a legal JSON value: The most common value is an object or an array, and any JSON value is legal.

*Line delimiter is \n: Because surrounding spaces are implicitly ignored when parsing JSON values, it supports \r\n while also \r\n.

Valid JSONLines format for MatrixOne Intelligence

The JSONLines format only requires that each line has a valid JSON value. But MatrixOne Intelligence requires a more structured JSONLines format, and only JSON objects or JSON arrays with the same type of value and normal structure are allowed in MatrixOne Intelligence. If your JSONLines file has a nested structure, MatrixOne Intelligence does not support loading it for the time being.

An valid object JSONLines example:

{"id":1,"father":"Mark","mother":"Charlotte"}
{"id":2,"father":"John","mother":"Ann"}
{"id":3,"father":"Bob","mother":"Monika"}

Invalid object JSONLines example (with nested structure):

{"id":1,"father":"Mark","mother":"Charlotte","children":["Tom"]}
{"id":2,"father":"John","mother":"Ann","children":["Jessika","Antony","Jack"]}
{"id":3,"father":"Bob","mother":"Monika","children":["Jerry","Karol"]}

A valid array JSONLines example, which is more like the csv format.

["Name", "Session", "Score", "Completed"]
["Gilbert", "2013", 24, true]
["Alexa", "2013", 29, true]
["May", "2012B", 14, false]
["Deloise", "2012A", 19, true]

Invalid array JSONLines example (the reason for invalidity is because the data type and column number do not match):

["Gilbert", "2013", 24, true, 100]
["Alexa", "2013", "twenty nine", true]
["May", "2012B", 14, "no"]
["Deloise", "2012A", 19, true, 40]

Syntax Structure

LOAD DATA LOCAL INFILE
    {'filepath'='FILEPATH', 'compression'='COMPRESSION_FORMAT', 'format'='FILE_FORMAT', 'jsondata'='object'/'array'} INTO TABLE table_name [IGNORE x LINES/ROWS]
    [PARALLEL {'TRUE' | 'FALSE'}];

Particle description

Parameters Values ​​ Must/Optional Description
filepath String Must File path
compression auto/none/bz2/gzip/lz4 Optional Compression format
format csv/jsonline Optional Load file format, default .csv
jsondata object/array Optional JSON data format. If format is jsonline,must specify jsondata
table_name String Must Table name to load data into the table
x Number Optional Lines to be ignored when loading

DDL Guide for JSONLines Format Data

Before loading JSONLines data into MatrixOne Intelligence, you need to create a table first.

Since the JSON data type is different from the data type of MatrixOne Intelligence, see the table below to view the data type of JSON data type corresponding to MatrixOne Intelligence:

JSON Types Data Types in MatrixOne
String VARCHAR (fixed length string)
String TEXT (Long Text Data)
String DATETIME or TIMESTAMP (format is "YYYY-MM-DD HH:MM:SS.XXXXXX")
String DATE (format is "YYYY-MM-DD")
String TIME (format is "HH-MM-SS.XXXXXX")
Number INT (Integer)
Number FLOAT or DOUBLE (float number)
Boolean BOOL(true/false)
Object Json Type
Array Json Type
Null Supports all types

For example, you can first create a data table for a JSONLines format file using SQL statements, as shown below:

mysql> create table t1 (name varchar(100), session varchar(100), score int, completed bool);
["Name", "Session", "Score", "Completed"]
["Gilbert", "2013", 24, true]
["Alexa", "2013", 29, true]
["May", "2012B", 14, false]
["Deloise", "2012A", 19, true]

Example

The following snippet is a complete SQL example of loading a JSONLines file into MatrixOne Intelligence.

#Load a BZIP2 compressed jsonline object file
load data local infile {'filepath'='data.bzip2', 'compression'='bz2','format'='jsonline','jsondata'='object'} into table db.a;

#Load a plain jsonline array file
load data local infile {'filepath'='data.jl', 'format'='jsonline','jsondata'='array'} into table db.a;

#Load a gzip compressed jsonline array file and ignore the first line
load data local infile {'filepath'='data.jl.gz', 'compression'='gzip','format'='jsonline','jsondata'='array'} into table db.a ignore 1 lines;

Tutorial Example

In this tutorial, you will be guided on how to load two jsonline files with object and array json format.

  1. Prepare the data. You can also download and use our prepared jl file, for example: jsonline_object.jl

  2. Open the terminal, enter the directory where the jl file is located, enter the following command line to display the specific contents in the file:

    > cd /$filepath
    > head jsonline_object.jl
    {"col1":true,"col2":1,"col3":"var","col4":"2020-09-07","col5":"2020-09-07 00:00:00","col6":"2020-09-07 00:00:00","col7":"18","col8":121.11}
    {"col1":"true","col2":"1","col3":"var","col4":"2020-09-07","col5":"2020-09-07 00:00:00","col6":"2020-09-07 00:00:00","col7":"18","col8":"121.11"}
    {"col6":"2020-09-07 00:00:00","col7":"18","col8":"121.11","col4":"2020-09-07","col5":"2020-09-07 00:00:00","col1":"true","col2":"1","col3":"var"}
    {"col2":1,"col3":"var","col1":true,"col6":"2020-09-07 00:00:00","col7":"18","col4":"2020-09-07","col5":"2020-09-07 00:00:00","col8":121.11}
    ```3. Start the MySQL client and connect to MatrixOne Intelligence, for example:
    

    mysql -h freetier-01.cn-hangzhou.cluster.matrixonecloud.cn -P 6001 -u 585b49fc_852b_4bd1_b6d1_d64bc1d8xxxx:admin:accountadmin -p --local-infile

    ```

  3. Create tables in MatrixOne Intelligence:

    create database db1;
    use db1;
    drop table if exists t1;
    create table t1(col1 bool,col2 int,col3 varchar, col4 date,col5 datetime,col6 timestamp,col7 decimal,col8 float);
    
  4. Execute LOAD DATA LOCAL on the corresponding file path in the MySQL client and import the jsonline_object.jl and jsonline_array.jl files:

    load data local infile {'filepath'='$filepath/jsonline_object.jl','format'='jsonline','jsondata'='object'} into table t1;
    
  5. After the import is successful, use the following SQL statement to view the import results:

    select * from t1;
    col1 col2 col3 col4 col5 col6 col7 col8
    true 1 var 2020-09-07 2020-09-07 00:00:00 2020-09-07 00:00:00 18 121.11
    true 1 var 2020-09-07 2020-09-07 00:00:00 2020-09-07 00:00:00 18 121.11
    true 1 var 2020-09-07 2020-09-07 00:00:00 2020-09-07 00:00:00 18 121.11
    true 1 var 2020-09-07 2020-09-07 00:00:00 2020-09-07 00:00:00 18 121.11