Loading...
 

MySQLPartitioning

For large tables partitioning them can significantly improve performance. If the table has a date field a good way to partition would be to use MONTH-HASH partitioning. The date field needs to be the primary key or at least part of the primary key

Creating a table with this partitioning

CREATE TABLE `stats` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `time` datetime,
  `client` int(11),
  `api` int(11),
  `num_requests` int(255),
  `avg_time` float,
  `90_percentile` float,
  `95_percentile` float,
  PRIMARY KEY (`id,time`)
) ENGINE=MyINNODB 
  AUTO_INCREMENT=0 
  DEFAULT CHARSET=utf8 
  COLLATE=utf8_unicode_ci 
  PARTITION BY RANGE( MONTH(time) )   
  PARTITIONS 12   
  SUBPARTITION BY HASH( TO_DAYS(time) )   
  SUBPARTITIONS 2 (   
    PARTITION pJAN VALUES LESS THAN (2),   
    PARTITION pFEB VALUES LESS THAN (3),   
    PARTITION pMAR VALUES LESS THAN (4),   
    PARTITION pAPR VALUES LESS THAN (5),   
    PARTITION pMAY VALUES LESS THAN (6),   
    PARTITION pJUN VALUES LESS THAN (7),   
    PARTITION pJUL VALUES LESS THAN (8),   
    PARTITION pAUG VALUES LESS THAN (9),   
    PARTITION pSEP VALUES LESS THAN (10),   
    PARTITION pOCT VALUES LESS THAN (11),   
    PARTITION pNOV VALUES LESS THAN (12),   
    PARTITION pDEC VALUES LESS THAN MAXVALUE   
  )
;

Modifying an old table

Add the date field into the primary key

MariaDB [api_perf]> alter table stats DROP PRIMARY KEY, ADD  PRIMARY KEY (id,time);
Query OK, 0 rows affected (0.75 sec)
Records: 0  Duplicates: 0  Warnings: 0

Alter the table

MariaDB [api_perf]> alter table stats 
    ->   PARTITION BY RANGE( MONTH(time) )
    ->   PARTITIONS 12
    ->   SUBPARTITION BY HASH( TO_DAYS(time) )
    ->   SUBPARTITIONS 2 (
    ->   PARTITION pJAN VALUES LESS THAN (2),
    ->   PARTITION pFEB VALUES LESS THAN (3),
    ->   PARTITION pMAR VALUES LESS THAN (4),
    ->   PARTITION pAPR VALUES LESS THAN (5),
    ->   PARTITION pMAY VALUES LESS THAN (6),
    ->   PARTITION pJUN VALUES LESS THAN (7),
    ->   PARTITION pJUL VALUES LESS THAN (8),
    ->   PARTITION pAUG VALUES LESS THAN (9),
    ->   PARTITION pSEP VALUES LESS THAN (10),
    ->   PARTITION pOCT VALUES LESS THAN (11),
    ->   PARTITION pNOV VALUES LESS THAN (12),
    ->   PARTITION pDEC VALUES LESS THAN MAXVALUE
    ->   )
    -> ;
Query OK, 0 rows affected (2.95 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [api_perf]>

Testing

This function within MySQL allows the data to be recycled back into the month partition regardless of the year. It saves having to create new partitions all the time but will still need to have a scheduled job to clear up old data.

By performing inserts over different years and checking the number of TABLE_ROWS per partition, we can see the reuse of the same partition.

MariaDB [api_perf]> insert into stats (time,client,api,num_requests,avg_time) values("2013-02-01 00:00:00",1,1,30,12.2);
Query OK, 1 row affected (0.00 sec)

MariaDB [api_perf]> insert into stats (time,client,api,num_requests,avg_time) values("2014-02-01 00:00:00",1,1,30,12.2);
Query OK, 1 row affected (0.05 sec)

MariaDB [api_perf]> insert into stats (time,client,api,num_requests,avg_time) values("2015-02-01 00:00:00",1,1,30,12.2);
Query OK, 1 row affected (0.00 sec)

MariaDB [api_perf]> 
MariaDB [api_perf]> 
MariaDB [api_perf]> 
MariaDB [api_perf]> 
MariaDB [api_perf]> 
MariaDB [api_perf]> 
MariaDB [api_perf]> 
MariaDB [api_perf]> select * FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME='stats' and PARTITION_NAME="pFEB"\G;
*************************** 1. row ***************************
                TABLE_CATALOG: NULL
                 TABLE_SCHEMA: api_perf
                   TABLE_NAME: stats
               PARTITION_NAME: pFEB
            SUBPARTITION_NAME: pFEBsp0
   PARTITION_ORDINAL_POSITION: 2
SUBPARTITION_ORDINAL_POSITION: 1
             PARTITION_METHOD: RANGE
          SUBPARTITION_METHOD: HASH
         PARTITION_EXPRESSION:  MONTH(time)
      SUBPARTITION_EXPRESSION:  TO_DAYS(time)
        PARTITION_DESCRIPTION: 3
                   TABLE_ROWS: 1
               AVG_ROW_LENGTH: 16384
                  DATA_LENGTH: 16384
              MAX_DATA_LENGTH: NULL
                 INDEX_LENGTH: 0
                    DATA_FREE: 0
                  CREATE_TIME: NULL
                  UPDATE_TIME: NULL
                   CHECK_TIME: NULL
                     CHECKSUM: NULL
            PARTITION_COMMENT: 
                    NODEGROUP: default
              TABLESPACE_NAME: NULL
*************************** 2. row ***************************
                TABLE_CATALOG: NULL
                 TABLE_SCHEMA: api_perf
                   TABLE_NAME: stats
               PARTITION_NAME: pFEB
            SUBPARTITION_NAME: pFEBsp1
   PARTITION_ORDINAL_POSITION: 2
SUBPARTITION_ORDINAL_POSITION: 2
             PARTITION_METHOD: RANGE
          SUBPARTITION_METHOD: HASH
         PARTITION_EXPRESSION:  MONTH(time)
      SUBPARTITION_EXPRESSION:  TO_DAYS(time)
        PARTITION_DESCRIPTION: 3
                   TABLE_ROWS: 2
               AVG_ROW_LENGTH: 8192
                  DATA_LENGTH: 16384
              MAX_DATA_LENGTH: NULL
                 INDEX_LENGTH: 0
                    DATA_FREE: 0
                  CREATE_TIME: NULL
                  UPDATE_TIME: NULL
                   CHECK_TIME: NULL
                     CHECKSUM: NULL
            PARTITION_COMMENT: 
                    NODEGROUP: default
              TABLESPACE_NAME: NULL
2 rows in set (0.00 sec)

ERROR: No query specified