Skip to content

Importing Files from Object Storage

Overview

S3 (Simple Storage Service) object storage refers to Amazon's simple storage service. You can also use S3-compatible object storage to store almost any type and size of data, including data lakes, cloud-native applications, and mobile applications. If you are new to S3 object services, you can find a basic introduction in AWS.

AWS S3 has been highly successful for over a decade, making it the standard for object storage. As a result, almost all major public cloud vendors offer S3-compatible object storage services.

MatrixOne Intelligence supports loading files from S3-compatible object storage services into the database. MatrixOne Intelligence supports AWS and mainstream domestic cloud vendors (Alibaba Cloud, Tencent Cloud, etc.).

In MatrixOne Intelligence, there are two ways to import data from S3-compatible object storage:

  • Use Load data with s3option to load files into MatrixOne Intelligence. This method will load the data into MatrixOne Intelligence, and all subsequent queries will be performed within MatrixOne Intelligence.
  • Create an "external table" with s3option mapped to S3 files and query this external table directly. This method allows data access through the S3-compatible object storage service; network latency for each query will be calculated.

Method 1: LOAD DATA

Syntax Structure

LOAD DATA
    | URL s3options {"endpoint"='<string>', "access_key_id"='<string>', "secret_access_key"='<string>', "bucket"='<string>', "role_arn"='xxxx', "external_id"='yyy', "filepath"='<string>', "region"='<string>', "compression"='<string>'}
    INTO TABLE tbl_name
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [IGNORE number {LINES | ROWS}]
    [PARALLEL {'TRUE' | 'FALSE'}]

Parameter Description

Parameter Description
endpoint The URL to connect to the object storage service. For example: s3.us-west-2.amazonaws.com
access_key_id Access key ID
secret_access_key Secret access key
bucket The S3 bucket to access
role_arn
external_id
filepath Relative file path. Supports regular expressions like /files/*.csv.
region Object storage service region
compression Compression format of the S3 file. If empty or "none", it indicates an uncompressed file. Supported fields or compression formats are "auto", "none", "gzip", "bz2", and "lz4".

Other parameters are the same as general LOAD DATA parameters. For more information, refer to [移除了无效网址].

Syntax Examples:

# LOAD a csv file from AWS S3 us-east-1 region, test-load-mo bucket, without compression
LOAD DATA URL s3option{"endpoint"='s3.us-east-1.amazonaws.com', "access_key_id"='XXXXXX', "secret_access_key"='XXXXXX', "bucket"='test-load-mo', "filepath"='test.csv', "region"='us-east-1', "compression"='none'} INTO TABLE t1 FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n';

# LOAD all csv files from Alibaba Cloud OSS Shanghai region, test-load-data bucket, without compression
LOAD DATA URL s3option{"endpoint"='oss-cn-shanghai.aliyuncs.com', "access_key_id"='XXXXXX', "secret_access_key"='XXXXXX', "bucket"='test-load-data', "filepath"='/test/*.csv', "region"='oss-cn-shanghai', "compression"='none'} INTO TABLE t1 FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n';

# LOAD a csv file from Tencent Cloud COS Shanghai region, test-1252279971 bucket, with bz2 compression
LOAD DATA URL s3option{"endpoint"='cos.ap-shanghai.myqcloud.com', "access_key_id"='XXXXXX', "secret_access_key"='XXXXXX', "bucket"='test-1252279971', "filepath"='test.csv.bz2', "region"='ap-shanghai', "compression"='bz2'} INTO TABLE t1 FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n';

!!! note MatrixOne Intelligence provides security assurance for S3 authentication information. Sensitive information such as access_key_id and secret_access_key will be hidden in system table records (statement_info) to ensure your account security.

Tutorial: Loading Files from AWS S3

This tutorial will guide you through the process of loading a.csv file from AWS S3.

If you already have an AWS account and have prepared your data files in your S3 service, please continue reading this tutorial section.

If you have not yet prepared your data files, please register and upload your data files first; you can also check the AWS S3 official tutorial. If you want to upload your data files to Alibaba Cloud OSS or Tencent Cloud COS, the operation process is similar to AWS S3.


!!! note Due to account privacy, this code example does not display account information such as access_key_id and secret_access_key. You can read this document to understand the main steps; specific data and account information will not be displayed.

  1. Download the data file. Go toAWS S3 > buckets, create a bucket namedtest-loading with public access, and upload the file char_varchar_1.csv.

    public block

  2. Get or create your AWS Access key. Go toYour Account Name > Security Credentials, and retrieve your existing access keys or create a new one.

    Access Key

    You can obtain the Access key and Secret access key from the downloaded credentials or this webpage. Retrieve Access Key 3. Create a table in MatrixOne Intelligence. An SQL example is as follows:

    create database db;
    use db;
    drop table if exists t1;
    create table t1(col1 char(225), col2 varchar(225), col3 text, col4 varchar(225));
    
  3. Import the file into MatrixOne Intelligence:

    LOAD DATA URL s3option{"endpoint"='s3.us-east-1.amazonaws.com', "access_key_id"='XXXXXX', "secret_access_key"='XXXXXX', "bucket"='test-loading', "filepath"='char_varchar_1.csv', "region"='us-east-1', "compression"='none'} INTO TABLE t1;
    
  4. After the import is complete, you can run the SQL statement to check if the file import was successful:

    mysql> select * from t1;
    +-----------+-----------+-----------+-----------+
    | col1      | col2      | col3      | col4      |
    +-----------+-----------+-----------+-----------+
    | a         | b         | c         | d         |
    | a         | b         | c         | d         |
    | 'a'       | 'b'       | 'c'       | 'd'       |
    | 'a'       | 'b'       | 'c'       | 'd'       |
    | aa,aa     | bb,bb     | cc,cc     | dd,dd     |
    | aa,       | bb,       | cc,       | dd,       |
    | aa,,,aa   | bb,,,bb   | cc,,,cc   | dd,,,dd   |
    | aa',',,aa | bb',',,bb | cc',',,cc | dd',',,dd |
    | aa"aa     | bb"bb     | cc"cc     | dd"dd     |
    | aa"aa     | bb"bb     | cc"cc     | dd"dd     |
    | aa"aa     | bb"bb     | cc"cc     | dd"dd     |
    | aa""aa    | bb""bb    | cc""cc    | dd""dd    |
    | aa""aa    | bb""bb    | cc""cc    | dd""dd    |
    | aa",aa    | bb",bb    | cc",cc    | dd",dd    |
    | aa"",aa   | bb"",bb   | cc"",cc   | dd"",dd   |
    |           |           |           |           |
    |           |           |           |           |
    | NULL      | NULL      | NULL      | NULL      |
    |           |           |           |           |
    | "         | "         | "         | "         |
    | ""        | ""        | ""        | ""        |
    +-----------+-----------+-----------+-----------+
    21 rows in set (0.03 sec)
    

Method 2: Specify S3 Files to External Tables

Syntax Structure

create external table t(...) URL s3option{"endpoint"='<string>', "access_key_id"='<string>', "secret_access_key"='<string>', "bucket"='<string>', "filepath"='<string>', "region"='<string>', "compression"='<string>'}     
[{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
]
[IGNORE number {LINES | ROWS}];

!!! note MatrixOne Intelligence currently only supports select operations on external tables and does not support delete, insert, or update at this time.

Parameter Description

Parameter Description
endpoint The URL to connect to the object storage service. For example: s3.us-west-2.amazonaws.com
access_key_id Access key ID
secret_access_key Secret access key
bucket The S3 bucket to access
filepath Relative file path. Supports regular expressions like /files/*.csv.
region Object storage service region
compression Compression format of the S3 file. If empty or "none", it indicates an uncompressed file. Supported fields or compression formats are "auto", "none", "gzip", "bz2", and "lz4".

Other parameters are the same as general LOAD DATA parameters. For more information, refer to [移除了无效网址].

For more information about external tables, refer to [移除了无效网址].

Syntax Examples:

## Create an external table for a .csv file from AWS S3
create external table t1(col1 char(225)) url s3option{"endpoint"='s3.us-east-1.amazonaws.com', "access_key_id"='XXXXXX', "secret_access_key"='XXXXXX', "bucket"='test-loading', "filepath"='test.csv', "region"='us-east-1', "compression"='none'} fields terminated by ',' enclosed by '\"' lines terminated by '\n';

## Create an external table for a .csv file compressed with BZIP2 from Tencent Cloud
create external table t1(col1 char(225)) url s3option{"endpoint"='cos.ap-shanghai.myqcloud.com', "access_key_id"='XXXXXX', "secret_access_key"='XXXXXX', "bucket"='test-1252279971', "filepath"='test.csv.bz2', "region"='ap-shanghai', "compression"='bz2'} fields terminated by ',' enclosed by '\"' lines terminated by '\n' ignore 1 lines;

Tutorial: Creating External Tables with S3 Files

This tutorial will guide you through the entire process of creating an external table using a.csv file from AWS S3.


!!! note Due to account privacy, this code example does not display account information such as access_key_id and secret_access_key. You can read this document to understand the main steps; specific data and account information will not be displayed.

  1. Download the data file. Go toAWS S3 > buckets, create a bucket namedtest-loading with public access, and upload the file char_varchar_1.csv.

    public block 2. Get or create your AWS Access key. Go toYour Account Name > Security Credentials, and retrieve your existing access keys or create a new one.

    Access Key You can obtain the Access key and Secret access key from the downloaded credentials or this webpage. Retrieve Access Key 3. In MatrixOne Intelligence, specify the S3 file to the external table:

    create database db;
    use db;
    drop table if exists t1;
    create external table t1(col1 char(225), col2 varchar(225), col3 text, col4 varchar(225)) url s3option{"endpoint"='s3.us-east-1.amazonaws.com', "access_key_id"='XXXXXX', "secret_access_key"='XXXXXX', "bucket"='test-loading', "filepath"='char_varchar_1.csv', "region"='us-east-1', "compression"='none'} fields terminated by ',' enclosed by '\"' lines terminated by '\n';
    
  2. After successful import, you can run the following SQL statement to view the import results. You will notice that the query speed is significantly slower than querying from a local table.

    select * from t1;
    +-----------+-----------+-----------+-----------+
    | col1      | col2      | col3      | col4      |
    +-----------+-----------+-----------+-----------+
    | a         | b         | c         | d         |
    | a         | b         | c         | d         |
    | 'a'       | 'b'       | 'c'       | 'd'       |
    | 'a'       | 'b'       | 'c'       | 'd'       |
    | aa,aa     | bb,bb     | cc,cc     | dd,dd     |
    | aa,       | bb,       | cc,       | dd,       |
    | aa,,,aa   | bb,,,bb   | cc,,,cc   | dd,,,dd   |
    | aa',',,aa | bb',',,bb | cc',',,cc | dd',',,dd |
    | aa"aa     | bb"bb     | cc"cc     | dd"dd     |
    | aa"aa     | bb"bb     | cc"cc     | dd"dd     |
    | aa"aa     | bb"bb     | cc"cc     | dd"dd     |
    | aa""aa    | bb""bb    | cc""cc    | dd""dd    |
    | aa""aa    | bb""bb    | cc""cc    | dd""dd    |
    | aa",aa    | bb",bb    | cc",cc    | dd",dd    |
    | aa"",aa   | bb"",bb   | cc"",cc   | dd"",dd   |
    |           |           |           |           |
    |           |           |           |           |
    | NULL      | NULL      | NULL      | NULL      |
    |           |           |           |           |
    | "         | "         | "         | "         |
    | ""        | ""        | ""        | ""        |
    +-----------+-----------+-----------+-----------+
    21 rows in set (1.32 sec)
    
  3. (Optional) If you need to import external table data into a data table in MatrixOne Intelligence, use the following SQL statement:

    Create a new table t2 in MatrixOne Intelligence:

    create table t2(col1 char(225), col2 varchar(225), col3 text, col4 varchar(225));
    

    Import external table t1 into t2:

    insert into t2 select * from t1;