How to Convert an Entire Database From UTF8 to UTF8MB4

  • by
Since MySQL UTF8 supports only up to 3-byte characters, so we can’t store 4-byte characters in UTF8 columns. This will be a problem for serving visitors who come from multi-byte language (1 to 4 bytes) countries.

For more flexible and extendable in the future, MySQL DBA should consider to convert an entire database from UTF8 to UTF8MB4 in order to accommodate up to 4 bytes characters.

The very first step is to set MySQL server configuration.
  1. MySQL Server Configuration
  2. According to the official document:
    MySQL :: MySQL 5.7 Reference Manual :: 10.1.5 Configuring the Character Set and Collation for Applications
    Specify character settings at server startup. To select a character set and collation at server startup, use the –character-set-server and –collation-server options. For example, to specify the options in an option file, include these lines:
    [mysqld]
    character-set-server=utf8
    collation-server=utf8_general_ci

    These settings apply server-wide and apply as the defaults for databases created by any application, and for tables created in those databases.
    We should follow the manual to add two lines in the file.
    [root@test ~]# vi /etc/my.cnf
    ...
    [mysqld]
    ...
    # Startup MySQL server with utf8mb4
    character-set-server = utf8mb4
    collation-server = utf8mb4_unicode_ci

    [client]
    default-character-set = utf8mb4

    [mysql]
    default-character-set = utf8mb4
    ...

    Please restart mysqld service for further configurations.
    [root@test ~]# service mysqld restart
    Stopping mysqld:                                           [  OK  ]
    Starting mysqld:                                           [  OK  ]

  3. MySQL Clients Configuration
  4. You have to SET NAMES for the connections to utf8mb4. Take PHP code snippet as an example:
    ...
    $options = array(PDO::ATTR_PERSISTENT => true, PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci', PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION);
    ...

  5. Database Character Set Conversion.
  6. You can use mysql command-line to execute the conversion.
    [root@test ~]# mysql -u root -p -D db1 -e "alter database `db1` character set = utf8mb4 collate = utf8mb4_unicode_ci"
    Enter password:

  7. Drop all foreign key constrains before altering tables.
  8. Before you execute the sql file, you have to make sure all the reference constrains are all dropped. Otherwise, you will get ERROR 1832 on string type column foreign constrains. For example.
    ERROR 1832 (HY000): Cannot change column 'facebook_url': used in a foreign key constraint 'urls_fk'
  9. All Tables Character Set Conversion.
  10. It would be a hard work to identify all tables and do the conversion one by one when there’re thousands of tables in the database. Next, I introduce a smarter way to output all executable sql statements to a file by one line command.
    [root@test ~]# mysql -u root -p -D db1 -s -e "select concat('alter table ', table_name, ' convert to character set utf8mb4 collate utf8mb4_unicode_ci;') from information_schema.tables where table_schema = 'db1'" > utf8mb4.sql
    Enter password:

    Let’s see the content of the sql file:
    [root@test ~]# cat utf8mb4.sql
    alter table table1 convert to character set utf8mb4 collate utf8mb4_unicode_ci;
    alter table table2 convert to character set utf8mb4 collate utf8mb4_unicode_ci;
    ...

    Let’s do the actual conversion on all tables.
    [root@test ~]# mysql -u root -p -D db1 < utf8mb4.sql
    Enter password:

    During the process, you might see ERROR 1071 at line xxx. To solve the problem, you may refer to this post:
    How to Resolve ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes

  11. Recreate all dropped reference constrains.

Leave a Reply

Your email address will not be published. Required fields are marked *