MySQL Percona Compression: Code Analysis

赵明寰 https://whoiami.github.io/

Introduction

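Column compression is a compression strategy for individual columns whose values are very long; the compressible column types are typically VARCHAR, BLOB, and similar. It was first implemented by 印风 in AliSQL and contributed to the Percona community. Percona refactored that code and combined it with the preset-dictionary feature of zlib, so that a column can be compressed using a dictionary defined by the user. Percona officially calls the feature Per-column Compression.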
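The effect of a preset dictionary can be illustrated with a minimal sketch using Python's standard `zlib` module. This is not Percona's C++ code; the dictionary contents, the sample value, and the helper names `compress_with_dict` and `decompress_with_dict` are all hypothetical, chosen only to show that the same dictionary must be supplied for both compression and decompression.

```python
import zlib

# Hypothetical dictionary: fragments expected to recur in the column's values.
DICTIONARY = (
    b'{"user_id": , "first_name": "", "last_name": "", '
    b'"email": "", "country": "", "created_at": ""}'
)

# Hypothetical column value that shares most of its structure with the dictionary.
VALUE = (
    b'{"user_id": 42, "first_name": "Ada", "last_name": "Lovelace", '
    b'"email": "ada@example.com", "country": "UK", "created_at": "2020-01-01"}'
)


def compress_with_dict(value: bytes, dictionary: bytes) -> bytes:
    """Deflate `value` with `dictionary` preloaded into the compressor."""
    compressor = zlib.compressobj(level=6, zdict=dictionary)
    return compressor.compress(value) + compressor.flush()


def decompress_with_dict(blob: bytes, dictionary: bytes) -> bytes:
    """Inflate `blob`; the dictionary used for compression must be supplied again."""
    decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(blob) + decompressor.flush()


plain = zlib.compress(VALUE, 6)                     # no dictionary
with_dict = compress_with_dict(VALUE, DICTIONARY)   # preset dictionary

print("raw bytes:         ", len(VALUE))
print("zlib, no dict:     ", len(plain))
print("zlib, with dict:   ", len(with_dict))
```

Short values gain the most from a dictionary, because on their own they contain too little repetition for deflate to exploit. The cost is that the compressed bytes cannot be read back without the dictionary.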


Percona introduced two views under INFORMATION_SCHEMA: COMPRESSION_DICTIONARY and COMPRESSION_DICTIONARY_TABLES.

COMPRESSION_DICTIONARY stores the details of each user-defined dictionary: its name, its version, and the dictionary string itself.

INFORMATION_SCHEMA.COMPRESSION_DICTIONARY

Column Name                          Description
BIGINT(21)_UNSIGNED dict_version     dictionary version
VARCHAR(64) dict_name                dictionary name
BLOB dict_data                       compression dictionary string


COMPRESSION_DICTIONARY_TABLES records which columns are associated with which dictionary.

INFORMATION_SCHEMA.COMPRESSION_DICTIONARY_TABLES

Column Name                          Description
BIGINT(21)_UNSIGNED table_schema     table schema
BIGINT(21)_UNSIGNED table_name       table ID from INFORMATION_SCHEMA.INNODB_SYS_TABLES
BIGINT(21)_UNSIGNED column_name      column position (starts from 0 as in INFORMATION_SCHEMA.INNODB_SYS_COLUMNS)
BIGINT(21)_UNSIGNED dict_name        dictionary ID


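A dictionary is created with the following commands. MySQL concatenates adjacent quoted string literals, so the dictionary data here is the single string 'onetwothreefour'.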

mysql> SET @dictionary_data = 'one' 'two' 'three' 'four';
mysql> CREATE COMPRESSION_DICTIONARY numbers (@dictionary_data);