HTAP
With the growth of enterprise scale and the explosive increase in data volume, traditional transactional databases are beginning to be stretched when dealing with advanced applications such as data analysis, and cannot meet the multi-dimensional needs of enterprises for data analysis. The proliferation of business systems and the increasing complexity of business logic have forced many companies to split their databases. However, this split architecture is difficult to implement complex correlation statistics and real-time analysis across shards due to the physical isolation between instances, which greatly limits the application potential of the database. To address these challenges, real-time data warehouse technology emerged. However, although real-time data warehouses provide solutions, the complexity and high cost of their architecture require enterprises to invest a lot of human resources in maintaining and building complex data synchronization links, which is not necessary in many cases. MatrixOne's HTAP Mixed Workload Engine provides a breakthrough solution. It seamlessly integrates online transaction processing and real-time data analysis capabilities through a single database core, thereby eliminating the need for data synchronization in traditional architectures. This integrated architecture design not only greatly simplifies the technical architecture and improves the efficiency of business processing, but also injects new vitality into the company's continuous innovation and efficiency improvement. The implementation of MatrixOne HTAP enables enterprises to respond to the dual challenges of data analysis and transaction processing more agilely, achieving instant analysis of data and rapid decision-making, while avoiding additional maintenance costs and technical complexity.
What is HTAP?
HTAP, or Hybrid Transactional/Analytical Processing, is a database architecture that can handle both transactional (OLTP) and analytical (OLAP) workloads within a single database system. The advantage of this architecture is that it can reduce data storage redundancy, improve query performance, and support real-time data analysis to help enterprises make quick decisions.
Application scenarios
The application scenarios of HTAP database are mainly concentrated in business environments that need to handle high concurrent transactions and complex analysis queries at the same time. The following are some specific application scenarios:
-
Real-time risk control and anti-fraud in the financial industry: The financial industry has an urgent need for real-time monitoring and analysis of transaction data. The HTAP database can realize the synchronization of transactions and risk analysis, helping financial institutions to detect and block fraud in a timely manner. behavior to reduce losses.
-
Real-time data processing in IoT systems: The amount of data generated by IoT devices is huge, and the HTAP database can process and analyze these data in real time to provide support for intelligent decision-making, such as in industrial automation and smart city construction.
-
Data service platform or data middle platform: HTAP database can be used as the underlying support of the data service platform, providing unified data views and real-time data analysis capabilities, supporting the data needs of different departments within the enterprise, and promoting data sharing and business Innovation.
-
Network management and optimization in the telecommunications industry: Telecom operators can use the HTAP database to monitor network status in real time, analyze user behavior patterns, optimize network resource allocation, and improve network service quality.
-
Real-time analysis in enterprise resource planning (ERP) systems: In ERP systems, HTAP databases can provide real-time business insights and decision support, helping companies quickly respond to market changes and optimize resource allocation.
Through its row-column coexistence storage architecture, the HTAP database effectively combines the advantages of OLTP and OLAP, supports real-time transaction processing and complex data analysis, and meets the needs of modern enterprises for data real-time, consistency and analysis capabilities. With the development of technology and the deepening of application, HTAP database will play an important role in more fields.
Preparation before starting
Environment configuration
Before you begin, make sure you have downloaded and installed the following software:
-
Confirm that you have completed Stand-alone Deployment of MatrixOne.
-
Make sure you have installed Python 3.8(or plus). Use the following code to check the Python version to confirm the installation was successful:
#To check with Python installation and its version python3 -V
-
Confirm that you have completed installing the MySQL client.
Application cases
We create a real-time risk control and anti-fraud application example in the financial industry based on Scenario 1, mainly by simulating concurrent transactions and regularly checking whether the total transaction amount of any user exceeds their account balance within 24 hours. The system will report these risky transactions.
Create table
Prepare a user table and transaction table
drop table transactions;
drop table users;
CREATE TABLE IF NOT EXISTS Users (
user_id INT AUTO_INCREMENT PRIMARY KEY, --user id
username VARCHAR(255) NOT NULL, --username
account_balance DECIMAL(10, 2) NOT NULL --account balance
);
CREATE TABLE IF NOT EXISTS Transactions (
transaction_id INT AUTO_INCREMENT PRIMARY KEY, --transaction id
user_id INT NOT NULL, --user id
amount DECIMAL(10, 2) NOT NULL, --transaction amount
transaction_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP, --transaction time
FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
Create database connection pool
import mysql.connector
def create_connection_pool(max_connections=10):
return mysql.connector.pooling.MySQLConnectionPool(
pool_name="my_connection_pool",
pool_size=max_connections,
host='127.0.0.1',
database = 'test',
user = 'root',
password = '111',
port = 6001,
autocommit=True
)
Prepare user information data
Here, we will programmatically generate 100 pieces of random user account data.
import random
def insert_users(pool):
conn = pool.get_connection()
try:
cursor = conn.cursor()
for _ in range(100): # Suppose we want to insert 100 users
username = f'user{_}'
account_balance = round(random.uniform(100, 10000), 2)
sql = "INSERT INTO Users (username, account_balance) VALUES (%s, %s)"
cursor.execute(sql, (username, account_balance))
except Exception as e:
print(e)
Prepare transaction information data
Transaction information is randomly generated through the program.
import secrets
def insert_transactions(pool):
conn = pool.get_connection()
try:
cursor = conn.cursor()
for _ in range(10): # Assume we want to insert 10 pieces of data
user_id = random.randint(1, 100) # Assume user ID from 1 to 100
amount = round(secrets.randbelow(1000 *100) /100, 2)
sql = """
INSERT INTO Transactions (user_id, amount)
VALUES (%s, %s)
"""
cursor.execute(sql, (user_id, amount))
except Exception as e:
print(e)
Define risk control analysis rules
We define risk control rules as: summarizing all transactions of each user within a day and calculating the total transaction amount. If it is found that the cumulative transaction amount of a user on the same day exceeds the current balance in his account, the system will automatically mark the user as a suspicious user, thus triggering a risk warning.
def analyze_transactions(pool):
try:
conn = pool.get_connection()
cursor = conn.cursor()
sql = """
SELECT
t.user_id,
SUM(t.amount) AS total_transaction_amount,
t.transaction_time,
u.account_balance
FROM
Transactions t
JOIN
Users u ON t.user_id = u.user_id
WHERE
t.transaction_time > DATE_SUB(NOW(), INTERVAL 1 DAY)
GROUP BY
t.user_id, t.transaction_time,u.account_balance
HAVING
SUM(t.amount) > u.account_balance;
"""
cursor.execute(sql)
result = cursor.fetchall()
for row in result:
print("suspicious account id",row[0],"total transaction amount of the day",row[1],"transaction time",row[2],"account balance",row[3])
except Exception as e:
print(e)
Trading and Analysis
Here, we created five threads to simulate a parallel processing trading environment. After each transaction operation is successfully completed, the system will automatically start the risk control analysis process.
import threading
def thread_insert():
#Create connection pool
pool = create_connection_pool()
if pool:
#Create multiple threads
threads = []
for _ in range(5): # Create 5 threads
thread = threading.Thread(target=insert_transactions, args=(pool,))
threads.append(thread)
thread.start()
for thread in threads:
thread.join() # Wait for all threads to complete
analyze_transactions(pool)
Create a scheduled task
By arranging scheduled tasks, the simulation generates a continuous stream of real-time transaction data.
import time
from apscheduler.schedulers.background import BackgroundScheduler
def main():
pool = create_connection_pool()
insert_users(pool)
while True:
#Create background scheduler
scheduler = BackgroundScheduler()
# Execute the data insertion task every minute and conduct risk control analysis
scheduler.add_job(thread_insert, 'interval', minutes=1)
# Start scheduler
scheduler.start()
# Prevent script from exiting
try:
while True:
time.sleep(2)
except (KeyboardInterrupt, SystemExit):
# Close the scheduler
scheduler.shutdown()
View results
if __name__ == '__main__':
main()
Console output:
Suspicious account id 56 Total transaction amount for the day 754.52 Transaction time 2024-07-12 15:57:33 Account balance 516.29
Suspicious account id 82 Total transaction amount for the day 817.18 Transaction time 2024-07-12 15:57:33 Account balance 461.46
Suspicious account id 40 Total transaction amount for the day 502.32 Transaction time 2024-07-12 15:57:33 Account balance 174.15
Suspicious account id 82 Total transaction amount for the day 525.94 Transaction time 2024-07-12 15:57:33 Account balance 461.46
...