MySQL Index
There are some expensive operations for hive metastore when accessing or storing metadatas on RDBMS.
Here are some official hive patches for indexing.
-- HIVE-21063
CREATE UNIQUE INDEX `NOTIFICATION_LOG_EVENT_ID` ON NOTIFICATION_LOG (`EVENT_ID`) USING BTREE;
-- HIVE-21487
CREATE INDEX COMPLETED_COMPACTIONS_RES ON COMPLETED_COMPACTIONS (CC_DATABASE,CC_TABLE,CC_PARTITION);
-- HIVE-27165
DROP INDEX TAB_COL_STATS_IDX ON TAB_COL_STATS;
CREATE INDEX TAB_COL_STATS_IDX ON TAB_COL_STATS (DB_NAME, TABLE_NAME, COLUMN_NAME, CAT_NAME) USING BTREE;
DROP INDEX PCS_STATS_IDX ON PART_COL_STATS;
CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME,CAT_NAME) USING BTREE;
When you upgrade your hive, there are some changes on tables in rdbms. You can find needed SQLs depending on version at https://github.com/apache/hive/tree/master/standalone-metastore/metastore-server/src/main/sql/mysql.
Additionally, slow query has been found for column EVENT_TIME
on table NOTIFICATION_LOG
. There is no official patch related to this but I create an index as follows:
CREATE INDEX `NOTIFICATION_LOG_EVENT_TIME` ON NOTIFICATION_LOG (`EVENT_TIME`) USING BTREE;
When using MySQL 8.x
Since MySQL >= 8.0, utf8mb4
and utf8mb4_0900_ai_ci
are used as default character and collation.
On apache hive, there are some related patches but not fully resolved. (e.g. https://issues.apache.org/jira/browse/HIVE-18083)
Therefore, if you want to use MySQL >= 8.0 with Hive 3.1.3, I advise you to set mysql connection options on jdbc url like this:
jdbc:mysql://{your_mysql_host}/{database}?characterEncoding=latin1&connectionCollation=latin1_bin
Refer to: