I like Django, MySQL, and Unicode, but getting all three to play together nicely can sometimes be a bit challenging. One of the more annoying things is getting Django to make a migration that will create a 255 character CharField that is encoded using the utf8mb4 character set and indexed.

Out of the box, MySQL's InnoDB table type has a maximum index length or 767 bytes. This is enough to index 255 characters in the utf8 encoding, but that encoding won't work for storing any Unicode data from the Supplementary Multilingual Plane. That means you can't put a unicorn face (🦄) or a slice of pizza (🍕) into a column using this encoding. Changing to the utf8mb4 character set will allow you to store four byte code points, but only index 191 characters.

With MySQL 5.5.14 and later you can raise this limit to 3072 bytes by using the innodb_large_prefix setting along with the Barracuda file format, file per table storage, and dynamic row format. The first three can all be set server wide, but the row format for the table needs to be provided in a CREATE TABLE or ALTER TABLE statement as a ROW_FORMAT=DYNAMIC attribute.

Django does not have a feature flag or setting for adding the needed attribute. I've worked around this before by using manual database hacking, but today I figured out a hack that you can manually apply to your Django migration files to work around it. The trick is to edit the migration so that the initial field creation uses a length that will fit in the 767 byte limit, and then add a RunSQL to change the table's row format and an AlterField to increase the field length.

 operations = [
     migrations.CreateModel(
         name='UnicodeHack',
         fields=[
             ('hack', models.CharField(unique=True, max_length=128)),
         ],
     ),
     migrations.RunSQL('ALTER TABLE unicodehack ROW_FORMAT = DYNAMIC;'),
     migrations.AlterField(
         model_name='unicodehack',
         name='hack',
         field=models.CharField(unique=True, max_length=255),
     ),
 ]

Comments