BITMAP Function
Function description
The BITMAP function is a set of built-in functions for processing bitmaps, bitmap is a continuous memory fragment stored as binary data types. These functions are particularly suitable for counting different values when dealing with hierarchical aggregation (such as multiple grouping sets), and the return result is consistent with count(distinct) but is more efficient.
We can only use one bit to identify whether an element exists or not. If it exists, it is 0. Use the nth bit of bitmap to record whether this element exists.
We stipulate that the maximum width of bitmap is 32768(2^15 = 4K). For non-negative integer n, the lower 15 bits (binary) are taken as the position in the bitmap, and the other high bits are the number of bitmap buckets. The following figure is the logic diagram of bitmap:

Each bucket is a bitmap. Since each bucket is orthogonal, each bucket performs operations (or, bit_count) only in the current bucket without having to care about other buckets.
Here are some commonly used BITMAP functions and their usage:
BITMAP_BUCKET_NUMBER
The purpose of the BITMAP_BUCKET_NUMBER() function is to determine the number of the bucket to which the given value belongs. A bucket is a larger set of bits that can contain multiple bits, each representing a specific value in the dataset. This function returns the bucket number. A bucket number is usually used to group bitmaps when performing an aggregation operation.
grammar
> BITMAP_BUCKET_NUMBER(numeric_expr)
Parameter definition
| Parameters | Description |
|---|---|
| numeric_expr | Required. Can cast into non-negative expressions. |
Example
mysql> SELECT bitmap_bucket_number(0);-- Return 0, indicating the location of the first bucket, the first bucket record 0-32767
+------------------------------+
| bitmap_bucket_number(0) |
+------------------------------+
| 0 |
+------------------------------+
1 row in set (0.00 sec)
mysql> SELECT bitmap_bucket_number(32767);-- Return 0 because 32767 belongs to the end position of the first bucket
+----------------------------------+
| bitmap_bucket_number(32767) |
+----------------------------------+
| 0 |
+----------------------------------+
1 row in set (0.00 sec)
mysql> SELECT bitmap_bucket_number(32768);-- Return 1 because 32768 belongs to the starting position of the second bucket
+----------------------------------+
| bitmap_bucket_number(32768) |
+----------------------------------+
| 1 |
+----------------------------------+
1 row in set (0.00 sec)
BITMAP_BIT_POSITION
The BITMAP_BIT_POSITION() function returns the relative bit position of the given value in the bucket (indexed from 0 to end with 32767). Used with BITMAP_BUCKET_NUMBER() to uniquely identify any number in a bitmap. Because the actual BITMAP_BIT_POSITION() marks the low 15 bits of the parameter (in binary), BITMAP_BUCKET_NUMBER() marks the high bits of the parameter.
grammar
BITMAP_BIT_POSITION(numeric_expr)
Parameter definition
| Parameters | Description |
|---|---|
| numeric_expr | Required. Can cast into non-negative expressions. |
Example
mysql> SELECT bitmap_bit_position(0);-- Return 0 because 0 is in the first position of the first bucket
+-----------------------------+
| bitmap_bit_position(0) |
+-----------------------------+
| 0 |
+-----------------------------+
1 row in set (0.00 sec)
mysql> SELECT bitmap_bit_position(32767);-- Return 32767 because the position of 32767 in the first bucket is the last
+----------------------------------+
| bitmap_bit_position(32767) |
+----------------------------------+
| 32767 |
+----------------------------------+
1 row in set (0.00 sec)
mysql> SELECT bitmap_bit_position(32768);-- Return 0 because 32768 is in the first position of the second bucket
+----------------------------------+
| bitmap_bit_position(32768) |
+----------------------------------+
| 0 |
+----------------------------------+
1 row in set (0.00 sec)
--40000's binary is: 1001110001000000, bitmap_bit_position record low 15 bits: 001110001000000, bitmap_bucket_number record high 1: 1
mysql> select bin(bitmap_bucket_number(40000)), bin(bitmap_bit_position(40000)), bin(40000);
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| bin(bitmap_bucket_number(40000)) | bin(bitmap_bit_position(40000)) | bin(40000) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | 1110001000000 | 1001110001000000 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 row in set (0.01 sec)
BITMAP_COUNT
The BITMAP_COUNT() function is used to calculate the number of bits set to 1 in bitmap, thereby obtaining the total number of different values. This is equivalent to performing a COUNT(DISTINCT) operation on a bitmap, but is usually faster than the traditional COUNT(DISTINCT) query.
The BITMAP_COUNT() function is generally used in conjunction with the following BITMAP_CONSTRUCT_AGG() and BITMAP_OR_AGG() functions.
BITMAP_CONSTRUCT_AGG
BITMAP_CONSTRUCT_AGG() is an aggregate function that is used in the database to build bitmap.
The BITMAP_CONSTRUCT_AGG() function is useful when counting a dense set of non-repetitive integer values, because it efficiently converts these values into bitmap form.
grammar
BITMAP_CONSTRUCT_AGG( <bit_position> )
Parameter definition
| Parameters | Description |
|---|---|
| bit_position | Required. Position in bitmap (returned by the BITMAP_BIT_POSITION function) |
Example
CREATE TABLE t1 ( n1 int);
INSERT INTO t1 VALUES(0),(1),(1),(32767);--Insert the data in [0,32767]
mysql> select * from t1;
+--------+
| n1 |
+--------+
| 0 |
| 1 |
| 1 |
| 32767 |
+--------+
4 rows in set (0.01 sec)
mysql> SELECT BITMAP_CONSTRUCT_AGG(BITMAP_BIT_POSITION(n1)) AS bitmap FROM t1;
+-----------------------------+
| bitmap |
+-----------------------------+
| :0 ? |
+-----------------------------+
1 row in set (0.00 sec)
Note
The bitmap column contains the physical representation of bitmap and is not readable. To determine which bits are set, we should use the BITMAP function in conjunction with the BITMAP function (rather than checking the binary value yourself).
mysql> SELECT bitmap_count(BITMAP_CONSTRUCT_AGG(BITMAP_BIT_POSITION(n1))) AS n1_discnt FROM t1;----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+---------------+
| n1_discnt |+---------------+
| 3 |
+---------------+
1 row in set (0.00 sec)
mysql> SELECT count(DISTINCT n1) AS n1_discnt FROM t1;--Return consistent
+---------------+
| n1_discnt |
+---------------+
| 3 |
+---------------+
1 row in set (0.01 sec)
INSERT INTO t1 VALUES(32768),(32769),(65535);--Insert data greater than 32767
mysql> select * from t1;
+--------+
| n1 |
+--------+
| 0 |
| 1 |
| 1 |
| 32767 |
| 32768 |
| 32769 |
| 65535 |
+--------+
7 rows in set (0.01 sec)
--The result is the same as the first insertion, because bucket_bit_position = n1 % 32768, the data inserted for the second time is located in the same position as the data inserted for the first time, so it is deduplicated.
mysql> SELECT bitmap_count(BITMAP_CONSTRUCT_AGG(BITMAP_BIT_POSITION(n1))) AS n1_discnt FROM t1;
+---------------+
| t1_bitmap |
+---------------+
| 3 |
+---------------+
1 row in set (0.00 sec)
mysql> SELECT bitmap_bit_position(0), bitmap_bit_position(1), bitmap_bit_position(32767), bitmap_bit_position(32768), bitmap_bit_position(65535);
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| bitmap_bit_position(0) | bitmap_bit_position(1) | bitmap_bit_position(32767) | bitmap_bit_position(32768) | bitmap_bit_position(65535) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | 1 | 32767 | 0 | 32767 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 row in set (0.00 sec)
Therefore, if you want to derepeat data greater than 32767, you need to combine the BITMAP_BUCKET_NUMBER() function.
--Grouped by buckets, the first bucket has three non-repetitive numbers (0,1,32767), and the second bucket has three non-repetitive numbers (32768,32769,65535).
mysql> SELECT bitmap_count(BITMAP_CONSTRUCT_AGG(BITMAP_BIT_POSITION(n1))) AS t1_bitmap FROM t1 GROUP BY BITMAP_BUCKET_NUMBER(n1);
+---------------+
| t1_bitmap |
+---------------+
| 3 |
| 3 |
+---------------+
2 rows in set (0.01 sec)
--Combined with the sum() function to calculate the non-repetitive value of n1
mysql> SELECT SUM(t1_bitmap) FROM (
-> SELECT bitmap_count(BITMAP_CONSTRUCT_AGG(BITMAP_BIT_POSITION(n1))) AS t1_bitmap
-> FROM t1
-> GROUP BY BITMAP_BUCKET_NUMBER(n1)
-> );
+---------------------+
| sum(t1_bitmap) |
+---------------------+
| 6 |
+---------------------+
1 row in set (0.01 sec)
BITMAP_OR_AGG
The BITMAP_OR_AGG() function is used to calculate the bitwise or (OR) results of multiple bitmap. It is usually used to merge multiple bitmaps to represent the combined information of all input bitmaps in one bitmap.
BITMAP_OR_AGG() is very useful when it is necessary to perform set concatenation operations on data of different dimensions, especially in data warehouses and analytical queries.
grammar
BITMAP_OR_AGG( bitmap )
Parameter definition
| Parameters | Description |
|---|---|
| bitmap | Required. All bitmaps bitmaps are bitwise or merged. |
Example
--Create a table to store information about the author's published books, including the author's name, year of publication and book id
CREATE TABLE book_table(
id int auto_increment primary key,
author varchar(100),
pub_year varchar(100),
book_id int
);
INSERT INTO book_table(author,pub_year,book_id) VALUES
('A Author','2020',1),('A Author','2020',1),('A Author','2020',1),('A Author','2020',32768),
('A Author','2021',32767),('A Author','2021',32768),('A Author','2021',65536),
('B Author','2020',2),('B Author','2020',10),('B Author','2020',32769),
('B Author','2021',5),('B Author','2021',65539);
mysql> select * from book_table;
+------+--------------+--------------+
| id | author | pub_year | book_id |
+------+--------------+--------------+
| 1 | A Author | 2020 | 1 |
| 2 | A Author | 2020 | 1 |
| 3 | A Author | 2020 | 32768 |
| 4 | A Author | 2021 | 32767 |
| 5 | A Author | 2021 | 32768 |
| 6 | A Author | 2021 | 65536 |
| 7 | B Author | 2020 | 2 |
| 8 | B Author | 2020 | 10 |
| 9 | B Author | 2020 | 32769 |
| 10 | B Author | 2021 | 5 |
| 11 | B Author | 2021 | 65539 |
+------+--------------+--------------+
11 rows in set (0.00 sec)
--Define a precalculation table and save the coarse-grained calculation results in the table. Subsequent aggregation of various different dimensions can use the results in the precalculation table. After simple calculations, the results can be obtained and the query can be accelerated.
CREATE TABLE precompute AS
SELECT
author,
pub_year,
BITMAP_BUCKET_NUMBER(book_id) as bucket,
BITMAP_CONSTRUCT_AGG(BITMAP_BIT_POSITION(book_id)) as bitmap
FROM book_table
GROUP BY author,pub_year,bucket;
mysql> select * from precompute;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| author | pub_year | bucket | bitmap |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Author A | 2020 | 0 | :0 |
| Author A | 2020 | 1 | :0 |
| Author A | 2021 | 0 | :0 ? |
| Author A | 2021 | 1 | :0 |
| Author A | 2021 | 2 | :0 |
| Author B | 2020 | 0 | :0 |
| Author B | 2020 | 1 | :0 |
| Author B | 2021 | 0 | :0 |
| Author B | 2021 | 2 | :0 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Calculate the number of deduplications of book_id under the aggregation of the author and publication year, which reflects the number of books published by the author in different years.
The --sum() function accumulates the number of 1 in the bitmap of different buckets.
--For example, when author=A author, pub_year=2020, book_id=(1,1,32768), it is book_id=(1,32768), but 1 is located in the first bucket,32768 is located in the second bucket, so sum is needed for accumulation.
mysql>SELECT
-> author,
-> pub_year,
-> SUM(BITMAP_COUNT(bitmap))
-> FROM precompute
-> GROUP BY author,pub_year;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| author | pub_year | sum(bitmap_count(bitmap)) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Author A | 2020 | 2 |
| Author A | 2021 | 3 |
| Author B | 2020 | 3 |
| Author B | 2021 | 2 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
4 rows in set (0.00 sec)
mysql> SELECT author,pub_year,count( DISTINCT book_id) FROM book_table group by author,pub_year;--Return consistent
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| author | pub_year | count(distinct book_id) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| A Author | 2020 | 2 |
| A Author | 2021 | 3 |
| B Author | 2020 | 3 |
| B Author | 2021 | 2 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
4 rows in set (0.00 sec)
-- Calculate the number of deduplication of book_id in the case of author aggregation, which reflects the number of books published by the author in total.
--BITMAP_OR_AGG() function combines bitmap with different dimensions (same author and different years).
--For example, when author=A author, pub_date=2020, book_id deduplication is (1,32768), pub_date=2021, book_id deduplication is (32767,32768,65536), BITMAP_OR_AGG does or operates the bitmap of two different years to obtain book_id=(1,32767,32768,65536), and finally sum() accumulates book_id of different bucktets.
mysql> SELECT author, SUM(cnt) FROM (
-> SELECT
-> author,
-> BITMAP_COUNT(BITMAP_OR_AGG(bitmap)) cnt
-> FROM precompute
-> GROUP BY author,bucket
-> )
-> GROUP BY author;
+-------------------------+
| author | sum(cnt) |
+-------------------------+
| Author A | 4 |
| Author B | 5 |
+-------------------------+
2 rows in set (0.01 sec)
mysql> SELECT author,count(DISTINCT book_id) FROM book_table GROUP BY author;--Return to the same
+------------------------------------------------------------------------------------------------------------------------------
| author | count(distinct book_id) |
+------------------------------------------------------------------------------------------------------------------------------
| A Author | 4 |
| B Author | 5 |
+------------------------------------------------------------------------------------------------------------------------------
2 rows in set (0.00 sec)