Skip to content
CloudOps
Newsletter
All prompts
AI for MySQL Difficulty: Advanced ClaudeChatGPTCursor

MySQL utf8mb4 Charset Migration Prompt

Plan a safe migration from utf8/latin1 to utf8mb4 across server, databases, tables, columns, and connections.

Target user
DBAs and engineers fixing charset/emoji/collation issues
Difficulty
Advanced
Tools
Claude, ChatGPT, Cursor

The prompt

You are a senior MySQL/MariaDB DBA who has migrated production databases to utf8mb4. You understand why MySQL's old "utf8" is really utf8mb3 (no 4-byte characters / no emoji), the layers where charset and collation are set (server, database, table, column, connection/client), index key-length limits with 4-byte characters (innodb_large_prefix / the 3072-byte limit), and collation differences (utf8mb4_unicode_ci vs utf8mb4_0900_ai_ci vs utf8mb4_general_ci).

I will provide:
- Current charset/collation at each layer (`SHOW VARIABLES LIKE 'character_set%'` and `'collation%'`, plus relevant `SHOW CREATE TABLE`): [PASTE]
- The tables/columns to migrate and their current types and index definitions: [PASTE SCHEMA]
- MySQL/MariaDB version and approximate table sizes: [PASTE]
- The application connection charset and any framework defaults: [DESCRIBE]

Produce a migration plan:

1. **Map the layers.** Identify the current charset/collation at server, database, table, column, and connection level, and where the mismatch causes mojibake, failed emoji inserts, or collation-mismatch join errors.
2. **Pick the target collation** (e.g. utf8mb4_unicode_ci or utf8mb4_0900_ai_ci) consistently, and explain comparison/sort implications.
3. **Check index key lengths.** utf8mb4 uses up to 4 bytes/char, so a VARCHAR(255) index may exceed the prefix limit; flag indexes that need a shorter prefix or schema adjustment before conversion.
4. **Sequence the conversion** so it is online and reversible: convert with online schema-change tooling on large tables, set the connection charset to utf8mb4 in the app, and order the steps so reads/writes stay consistent throughout.
5. **Validate.** Round-trip a 4-byte test string (emoji), check for truncation, and confirm collation consistency on joined columns to avoid "Illegal mix of collations" errors.

Output: (a) Layer-by-layer current vs target, (b) Index key-length warnings, (c) Ordered, reversible migration steps with exact ALTER/SET commands, (d) App connection-charset change, (e) Validation tests, (f) Risks.

Guardrails: convert large tables with gh-ost or pt-online-schema-change, never a blocking ALTER; test the full migration on a production-sized replica and back up first; CONVERT TO CHARACTER SET rewrites data and can truncate if a column is too short — verify before running; change the app connection charset in lockstep so new writes are correct; never run charset conversion on prod without review.

Why this prompt works

The utf8mb4 migration is notorious because MySQL’s legacy “utf8” is a trap — it is really utf8mb3 and cannot store 4-byte characters like emoji, so teams discover the problem only when an insert fails or text turns to mojibake. This prompt starts by mapping charset and collation across every layer (server, database, table, column, and crucially the connection), which is where the real bugs hide. A migration that converts the tables but forgets the client charset just produces a new generation of corrupted data.

It surfaces the two technical landmines that derail these migrations. First, index key length: utf8mb4 uses up to four bytes per character, so an existing VARCHAR(255) index can blow past the prefix limit and the conversion fails or silently shortens. Second, collation consistency — mismatched collations on joined columns throw “Illegal mix of collations” errors that only appear at query time. By demanding both checks up front, the prompt prevents a half-finished migration that breaks production queries.

The guardrails enforce a reversible, online rollout. CONVERT TO CHARACTER SET rewrites every row and can truncate data, so the prompt insists on column checks, online schema-change tooling for large tables, replica testing, and moving the connection charset in lockstep with the schema. That sequencing keeps the database usable throughout and keeps a human in control of the one operation — converting live data — that is hardest to walk back.

Related prompts

Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week