Transaction General Concepts

Why Are Transactions Needed?

In many large, critical applications, computers execute a vast number of tasks every second. More often than not, it's not the tasks themselves, but combining these tasks to fulfill a business requirement that forms atransaction. What happens if one task successfully executes, but an error occurs in a second or third related task? This error could very likely leave the system data in an inconsistent state. This is where transactions become crucially important; they can rescue system data from such an inconsistent state.

How can we understand transactions? Consider a banking system. Without transactions to control and manage data, it's highly probable that customer A could withdraw money from a corporate account while customers B and C are simultaneously withdrawing money from the same account. Each transfer involves changes to at least two account details (e.g., A's account credited, corporate account debited; B's account credited, corporate account debited; C's account credited, corporate account debited). Without transactions, the exact balance of the account would be indeterminate. By introducing transactions, the fundamental characteristics (ACID) ensure that financial operations in the banking ledger areatomic (indivisible), that other users see amounts withisolation, that each operation maintainsconsistency, and that all operations aredurable. This guarantees that the bank system's incoming and outgoing transactions remain consistent.

What Is a Transaction?

Adatabase transaction (or simply, atransaction) is a logical unit within a database management system's execution process, consisting of a finite sequence of database operations.

A transaction is essentially a series of SQL statements that are eithercommitted orrolled back as a single logical unit.

Characteristics of Transactions

Transactions typically possess four characteristics, collectively known asACID:

Atomicity*: The atomicity of a transaction means that it is anindivisible unit**. Operations within a transaction either all occur or none of them occur.

For example, consider the following transaction:

```sql
start transaction;
insert into t1 values(1,2,3),(4,5,6);
update t2 set c1='b' where c1='B';
commit;
```

If either the `insert` into `t1` or the `update` of `t2` fails, the entire transaction will be rolled back. The transaction will only commit successfully if both statements succeed; it will not allow one operation to succeed while the other fails.

Consistency*: The consistency of a transaction means that before and after the transaction, the data must remaincorrect** and adhere to all relevant data constraints.

For example, if we create a new table in the database:

```sql
create table t1(a int primary key,b varchar(5) not null);
```

To ensure data consistency when inserting data, you must ensure that the data types and ranges for columns `a` and `b` are correct, and also satisfy the primary key constraint for column `a` and the non-null constraint for column `b`:

```sql
insert into t1 values(1,'abcde'),(2,'bcdef');
```

*Isolation: The isolation of a transaction means that when multiple users access the database concurrently, transactions must adhere to predefined isolation levels. Within the scope of a determined isolation level, one transaction should not be interfered with by another.

For example, consider the following transaction example, where the session isolation level is Read Committed. In session 1, the data visible is as follows:

```sql
select * from t1;
+------+------+
| a    | b    |
+------+------+
|    1 | a    |
|    2 | b    |
|    3 | c    |
+------+------+
```

Now, in session 2, perform the following operation:

```sql
begin;
delete from t1 where a=3;
```

In session 1, the data you see will still not have changed:

```sql
select * from t1;
+------+------+
| a    | b    |
+------+------+
|    1 | a    |
|    2 | b    |
|    3 | c    |
+------+------+
```

It's only after committing the current transaction in session 2:

```sql
commit;
```

that session 1 will see the result of the committed transaction:

```sql
select * from t1;
+------+------+
| a    | b    |
+------+------+
|    1 | a    |
|    2 | b    |
+------+------+
```

Durability*: The durability of a transaction means that once a transaction iscommitted in the database, its changes to the data arepermanent**, regardless of whether the database software restarts.

Types of Transactions

In databases, transactions are classified into the following categories:

Based on whether their start and end are explicitly defined, they are divided intoexplicit transactions andimplicit transactions.
Based on the stage of resource lock usage, they are divided intooptimistic transactions andpessimistic transactions.

These two major categories of transactions are independent of each other. An explicit transaction can be optimistic or pessimistic, and a pessimistic transaction can be explicit or implicit.

Explicit Transactions and Implicit Transactions

*Explicit Transactions: Generally, a transaction can be explicitly started by executing a BEGIN statement. It can be explicitly ended by executing COMMIT or ROLLBACK. The way MatrixOne's explicit transactions start and execute is slightly different; refer to MatrixOne Intelligence's explicit transactions.

*Implicit Transactions: This means transactions can implicitly begin and end without the need for BEGIN TRANSACTION, COMMIT, or ROLLBACK statements. Implicit transactions behave the same way as explicit transactions. However, the rules for determining when an implicit transaction starts differ from those for explicit transactions. MatrixOne's implicit transactions start and execute differently; refer to MatrixOne Intelligence's implicit transactions.

Optimistic Transactions and Pessimistic Transactions

Regardless of whether a transaction is optimistic or pessimistic, their execution results are the same. That is, the requirements for ACID levels for operations within a transaction are entirely identical. There is no situation where optimistic transactions are more lenient or pessimistic transactions are stricter regarding atomicity, consistency, isolation, or durability.

The difference between optimistic and pessimistic transactions lies solely in their different execution strategies for two-phase commit based on the state of the pending business. The choice is based on the executor's judgment, and their efficiency depends on the actual state of the business being processed (the frequency of write conflicts in concurrent transactions). Specifically, they make different assumptions about the state of transaction-related resources, thus placing write locks at different stages.

At the start of anoptimistic transaction, it's assumed that the related tables are in a state where write conflicts will not occur. Data insertions, modifications, or deletions are buffered in memory, and no locks are applied to the data during this phase. Locks are applied to the corresponding data tables or rows only at the time of data submission, and then unlocked after submission is complete.

Conversely, in apessimistic transaction, it's assumed at the start of the transaction that write conflicts are highly likely to occur in the related tables. Therefore, locks are applied to the relevant tables or rows proactively. Only then are the related data inserted, modified, or deleted in memory and committed. The data is unlocked only after the commit or rollback is complete.

Optimistic and pessimistic transactions have the following advantages and disadvantages during use:

Optimistic transactions* are more friendly to systems with fewer write operations and more read operations, avoiding deadlocks. Optimistic transactions, when submitting larger transactions, might repeatedly retry and ultimately fail due to conflicts. Pessimistic transactions are more friendly to systems with more write operations, preventing write-write conflicts at the database level. *Pessimistic transactions, in highly concurrent scenarios, if a long-running transaction occurs, might lead to system blockage and impact throughput.

For details on MatrixOne Intelligence's optimistic transactions, refer to MatrixOne Intelligence's optimistic transactions.

For details on MatrixOne Intelligence's pessimistic transactions, refer to MatrixOne Intelligence's pessimistic transactions.

Transaction Isolation

One of the characteristics of transactions is isolation, often referred to astransaction isolation.

Among the four ACID properties of database transactions, isolation is the most flexible constraint. To achieve higher isolation levels, database systems typically use locking mechanisms or multi-version concurrency control mechanisms. Application software also requires additional logic to function correctly.

Many Database Management Systems (DBMS) define different "transaction isolation levels" to control the degree of locking. In many database systems, most transactions avoid high isolation levels (like Serializable) to reduce the overhead of locking. Programmers need to carefully analyze the database access code to ensure that lowering the isolation level doesn't cause subtle code errors. Conversely, higher isolation levels increase the likelihood of deadlocks, which also need to be avoided during programming.

Since a higher isolation level prohibits no operations that a lower isolation level allows, a DBMS is permitted to use a higher isolation level than the one requested.

The ANSI/ISO SQL standard defines four standard isolation levels:

Isolation Level	Dirty Write	Dirty Read	Non-Repeatable Read (Fuzzy Read)	Phantom Read
READ UNCOMMITTED	Not Possible	Possible	Possible	Possible
READ COMMITTED	Not Possible	Not Possible	Possible	Possible
REPEATABLE READ	Not Possible	Not Possible	Not Possible	Possible
SERIALIZABLE	Not Possible	Not Possible	Not Possible	Not Possible

Read Uncommitted*: Read Uncommitted is thelowest isolation level**. It allows "dirty reads," meaning a transaction can see modifications made by other transactions that have not yet been committed.

*Read Committed: In the Read Committed level, DBMSs based on lock-based concurrency control need to hold write locks on selected objects until the end of the transaction, but read locks are released immediately after the SELECT operation completes. Like the previous isolation level, it does not require "range locks."

*Repeatable Read: In the Repeatable Read isolation level, DBMSs based on lock-based concurrency control need to hold read locks and write locks on selected objects until the end of the transaction, but it does not require "range locks," so "phantom reads" may occur. MatrixOne Intelligence implements Snapshot Isolation. To maintain consistency with MySQL's isolation levels, MatrixOne Intelligence's Snapshot Isolation is also referred to as Repeatable Read.

Serializable*: Serializable is thehighest isolation level**. On DBMSs based on lock-based concurrency control, Serializable requires that read and write locks on selected objects are not released until the transaction ends. A "range-lock" should be acquired when a SELECT query uses a WHERE clause to describe a range.

By requiring more restrictions than lower isolation levels, higher levels provide stronger isolation. The standard allows transactions to run at a stronger transaction isolation level.

!!! note MatrixOne Intelligence's transaction isolation differs slightly from the common definitions of isolation levels. Refer to MatrixOne Intelligence's Isolation Levels.