SAMPLE sampling function
The SAMPLE sampling function function is a key tool for handling large amounts of data analysis, mainly used to quickly reduce the scope of query.
- Syntax Structure
SELECT SAMPLE(<column_list>, <N ROWS>/<K PERCENT>) FROM <table> [WHERE ...] [GROUP BY ...] [ORDER BY ...] [LIMIT ...] [OFFSET ...]
<column_list>: Selected column names list.-
<N ROWS>/<K PERCENT>: Specifies the number of samples returned (N rows) or percentage (K%). -
Functional Features
-
The SAMPLE function will filter the table and then perform sampling.
- Returns N random samples in the table, or K% random samples.
- When N rows are specified, N is a positive integer of 1-1000.
- When K% is specified, the value range of K is 0.01-99.99, which represents the probability that each row is selected. The result may be different each time and the number of rows is not fixed. For example, if the table has 10,000 rows, execute SAMPLE(a, 50 PERCENT); Since each row has a 50% probability of being selected, it is similar to throwing a 10,000 coin, the probability of the front and back sides is 50% each time, but the final result may be 350 front and 650 reverse sides.
- Supports multi-column sampling, such as SELECT SAMPLE(a,b,c, 100 ROWS) FROM t1;.
-
Can be used in combination with WHERE clause, GROUP BY clause, etc.
-
Application example
SELECT SAMPLE(a, 100 ROWS) FROM t1; -- Return 100 random samples
SELECT SAMPLE(a, 0.2 PERCENT) FROM t1; -- Returns about 0.2% of the sample
SELECT SAMPLE(a, 100 ROWS) FROM t1 WHERE a > 1; -- Filter first and then sample
SELECT a, SAMPLE(b, 100 ROWS) FROM t1 GROUP BY a; -- Post-group sampling