Designing MySQL JSON Columns With AI

I’ve spent enough years cleaning up after “just throw it in a JSON column” decisions to be allergic to the phrase. JSON in MySQL 8.0 is genuinely good now, but it’s a precision tool, not a junk drawer. What changed my workflow recently is using an AI assistant to draft the schema and the queries, then verifying every single thing on a replica before it touches production. The AI is fast at boilerplate and remembers the syntax for JSON_TABLE better than I do at 2 a.m. I’m the one who decides whether the design is actually right.

This guide walks through how I make that call: when JSON beats normalized columns, how to query it without writing garbage, how to validate it, and how to index it so it isn’t a sequential-scan trap.

When JSON Actually Makes Sense

The honest heuristic: reach for a JSON column when the shape is genuinely variable per row, sparse, or you never filter or join on the inner fields. Think third-party API payloads you store verbatim, feature flags that differ per tenant, or audit metadata you only ever read back whole.

The moment you find yourself writing WHERE data->>'$.status' = 'active' across millions of rows on a hot path, that field wanted to be a real column. Normalize it. JSON does not give you free schema flexibility; it gives you deferred schema cost, and that bill always comes due.

Here’s a prompt I lean on to pressure-test a design before I commit to it:

You are a senior MySQL 8.0 DBA. I have an orders table where each order carries a variable set of fulfillment attributes (carrier, tracking number, customs flags, gift options). Some orders share fields, some don’t. I’m considering a single attributes JSON column. List the specific fields that should be promoted to normalized columns because they’ll be filtered, sorted, or joined on, and which are safe to leave in JSON. Be opinionated.

A useful response looks like this:

Promote to real columns: carrier (you’ll filter and group by it for SLA reports), tracking_number (looked up directly, needs an index), and shipped_at (range-scanned in dashboards). Keep in JSON: gift_message, customs_flags (sparse, read-whole), and packaging_options. Rationale: anything in a WHERE, JOIN, or ORDER BY on a hot path pays a generated-column-plus-index tax if it lives in JSON, so it’s cheaper to model it directly. Validate by checking which fields appear in your slowest queries first.

That last sentence is the part I actually act on. The AI proposes; I check it against the real query log.

Querying: `->>` and `JSON_TABLE`

The two operators worth memorizing are -> (returns a JSON value, keeps quotes) and ->> (unquoting extraction, shorthand for JSON_UNQUOTE(JSON_EXTRACT(...))). On a hot path you almost always want ->> so you’re comparing a plain string, not a quoted JSON scalar.

-- The arrow you usually want: unquoted extraction
SELECT id, attributes->>'$.carrier' AS carrier
FROM orders
WHERE attributes->>'$.carrier' = 'DHL'
LIMIT 20;

When the JSON holds an array of objects and you need it as rows, JSON_TABLE is the tool. It turns nested JSON into a derived relational table you can join against.

-- Explode an array of line items into rows
SELECT o.id, li.sku, li.qty
FROM orders o,
     JSON_TABLE(
       o.attributes,
       '$.line_items[*]'
       COLUMNS (
         sku VARCHAR(64) PATH '$.sku',
         qty INT          PATH '$.qty'
       )
     ) AS li
WHERE o.id = 42;

I always run JSON_TABLE queries on a replica first with a real EXPLAIN. They’re convenient but they materialize per row, and the optimizer can’t reach inside them. Convenience is not free.

Validating With `JSON_SCHEMA_VALID` CHECK Constraints

MySQL 8.0.17+ supports CHECK constraints, and JSON_SCHEMA_VALID() lets you enforce real structure on a JSON column instead of trusting your application layer. This is how you keep a JSON column from silently rotting.

ALTER TABLE orders
ADD CONSTRAINT chk_attributes_shape CHECK (
  JSON_SCHEMA_VALID(
    '{
      "type": "object",
      "properties": {
        "carrier": { "type": "string" },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "sku": { "type": "string" },
              "qty": { "type": "integer", "minimum": 1 }
            },
            "required": ["sku", "qty"]
          }
        }
      }
    }',
    attributes
  )
);

A word of caution from experience: adding a CHECK like this against an existing table will fail if any current row violates it, and the error message won’t tell you which row. Before applying, I run the validation as a read-only query on the replica to find offenders.

-- Find rows that would violate the constraint, on the replica
SELECT id
FROM orders
WHERE NOT JSON_SCHEMA_VALID(@schema, attributes)
LIMIT 100;

This is exactly the kind of thing AI is good for: ask it to generate the JSON Schema from a sample document, then you review the constraints. It’ll happily get minimum and required right; it occasionally hallucinates schema keywords MySQL doesn’t support, which is why you test it rather than trust it.

Indexing JSON Through Generated Columns

You cannot index a JSON column directly, and you cannot index a raw ->> expression in older minor versions reliably. The durable pattern is a generated column that extracts the value, then an index on that column.

ALTER TABLE orders
  ADD COLUMN carrier VARCHAR(32)
    GENERATED ALWAYS AS (attributes->>'$.carrier') VIRTUAL,
  ADD INDEX idx_orders_carrier (carrier);

VIRTUAL generated columns aren’t stored; they’re computed on read, and the index materializes the value. That’s usually what you want for JSON extractions because it avoids row-size bloat. Use STORED only when the expression is expensive and read far more than written.

Then confirm the optimizer actually uses it. Querying the original ->> expression often works because MySQL can match it to the generated column, but never assume.

EXPLAIN
SELECT id FROM orders WHERE attributes->>'$.carrier' = 'DHL';
-- Look for key: idx_orders_carrier, not type: ALL

If you see type: ALL, the index isn’t being used and you’re full-scanning. I’ve been burned by collation mismatches here, the generated column inheriting a different collation than the literal, so always read the plan, don’t infer it.

The Pitfalls Worth Naming

A few traps I check for every time, and ask the AI to check too:

Unindexed filters. Any WHERE on a JSON path without a backing generated column is a scan. The query works in dev and dies at scale.
-> vs ->> confusion. Comparing -> output to a string fails silently because of quoting. Use ->> for scalars.
Whole-document rewrites. Updating one key rewrites the entire JSON document and its indexes. Hot, large documents updated frequently are a write-amplification problem.
Collation surprises. JSON string extractions default to utf8mb4; mismatches break index usage and case sensitivity.
Schema drift. Without a CHECK constraint, the column will accumulate three incompatible shapes within a year. Validate.

My loop is the same every time. I describe the access pattern to the AI, let it draft the generated column, the index, and the schema constraint, then I clone a slice of production onto a replica, run EXPLAIN, run the validation query, and check write cost. The AI gets me to a testable draft in minutes. The replica tells me the truth.

If you want more reading in this vein, I keep adding to the MySQL guides, and the prompt library has the schema-review and EXPLAIN-interpretation prompts I reuse for this exact workflow. The companion piece on generated columns and functional indexes goes deeper on the indexing half of this story.

Treat AI as a fast, forgetful junior who knows the manual. Let it draft. Then verify on a replica, because production is not the place to discover your JSON column was a full-table scan in a trench coat.