Generated Columns and Functional Indexes in MySQL With AI
A senior DBA walks through MySQL 8.0 generated columns and functional indexes with AI: VIRTUAL vs STORED, indexing JSON and case-insensitive search, EXPLAIN proof.
- #mysql
- #ai
- #indexing
- #performance
- #explain
I’ve lost count of the times a developer handed me a query that “should be fast” and the fix was a generated column or a functional index they didn’t know existed. MySQL 8.0 gives you two clean ways to index an expression instead of a raw column, and both are easy to get subtly wrong. These days I let an AI assistant draft the column definition or the index, because it remembers the exact syntax and the VIRTUAL/STORED keyword placement. Then I do the only part that matters: I run EXPLAIN on a replica and confirm the optimizer actually picked it up. AI writes plausible DDL. The plan tells me whether plausible is true.
This is the indexing companion to the JSON column guide; here the focus is the expressions and the proof.
VIRTUAL vs STORED Generated Columns
A generated column is computed from other columns in the same row. The choice between VIRTUAL and STORED is the first thing people fumble.
VIRTUAL columns aren’t stored on disk; they’re computed on read. They add nothing to row size and are cheap to add to a large table because the ALTER is fast (an instant metadata change in most cases). When you index a VIRTUAL column, the index materializes the value, so reads through the index are still fast.
STORED columns are written to disk at insert/update time. They cost space and write effort, and adding one to a big table rebuilds it. You want STORED only when the expression is genuinely expensive to compute and read far more often than written, or when you need it usable in a context that VIRTUAL can’t support.
My default is VIRTUAL. Reach for STORED deliberately, not reflexively.
-- VIRTUAL: cheap to add, computed on read, index materializes it
ALTER TABLE users
ADD COLUMN email_domain VARCHAR(255)
GENERATED ALWAYS AS (SUBSTRING_INDEX(email, '@', -1)) VIRTUAL,
ADD INDEX idx_users_email_domain (email_domain);
Here’s the prompt I use to make the call rather than guessing:
You are a senior MySQL 8.0 DBA. I want to filter a 40-million-row
userstable by email domain (SUBSTRING_INDEX(email,'@',-1)). The table is write-heavy. Should this be a VIRTUAL or STORED generated column, and why? Give me the ALTER statement and tell me what the trade-off costs at this scale.
A response I’d act on:
Use VIRTUAL. The expression is cheap, and on a write-heavy 40M-row table a STORED column would add per-row storage plus write cost on every insert and email update, and the ALTER would rebuild the whole table. VIRTUAL keeps writes untouched and the secondary index on it stays current automatically. ALTER:
ADD COLUMN email_domain VARCHAR(255) GENERATED ALWAYS AS (SUBSTRING_INDEX(email,'@',-1)) VIRTUAL, ADD INDEX idx_users_email_domain (email_domain). Trade-off: index maintenance on writes that change
That final instruction is mine to carry out, not the AI’s.
Functional Indexes: Skip the Column
Since 8.0.13 you can index an expression directly without a named generated column. Under the hood MySQL creates a hidden virtual column, but you write less DDL. The expression must be wrapped in parentheses.
-- Functional index directly on an expression
CREATE INDEX idx_users_lower_name
ON users ((LOWER(last_name)));
The catch that bites everyone: the query has to use the exact expression for the optimizer to match the index. WHERE LOWER(last_name) = 'smith' uses idx_users_lower_name; WHERE last_name = 'Smith' does not. This is precisely where I make the AI generate both the index and the matching query, then I verify the pairing holds.
Case-Insensitive Search Done Right
A common ask is case-insensitive lookups where the column collation is case-sensitive (or binary). A functional index on LOWER() is the clean answer.
-- Index the lowercased value
CREATE INDEX idx_orders_lower_coupon
ON orders ((LOWER(coupon_code)));
-- Query must match the indexed expression exactly
EXPLAIN
SELECT id FROM orders WHERE LOWER(coupon_code) = 'summer';
If your real need is broad case-insensitive comparison everywhere, sometimes the better answer is a case-insensitive collation like utf8mb4_0900_ai_ci on the column itself. I ask the AI to lay out both options; I pick based on whether the case-insensitivity is local to a few queries (functional index) or a property of the column (collation).
Indexing JSON Extractions
JSON columns can’t be indexed directly, so you index an extraction expression, exactly the bridge between this topic and JSON design. You can do it as a named generated column or as a functional index.
-- Functional index on a JSON path extraction
CREATE INDEX idx_orders_carrier
ON orders ((CAST(attributes->>'$.carrier' AS CHAR(32))));
I lean toward CAST(... AS CHAR(n)) here so the indexed type and length are explicit. Leaving the type implicit is how you end up with a collation or length mismatch that quietly disables the index. Again: explicit beats clever.
Proving It With EXPLAIN
This is the whole point. None of the above is real until the plan shows the index. Three things I check on the replica every time:
-- 1. Does it pick the index at all?
EXPLAIN
SELECT id FROM users WHERE email_domain = 'example.com';
-- Want: key = idx_users_email_domain, type = ref. Bad: type = ALL.
-- 2. For functional indexes, does the exact expression match?
EXPLAIN
SELECT id FROM orders WHERE LOWER(coupon_code) = 'summer';
-- 3. Read the full optimizer reasoning when in doubt
EXPLAIN FORMAT=JSON
SELECT id FROM orders WHERE attributes->>'$.carrier' = 'DHL';
type: ALL means a full scan and a missing index. possible_keys listing the index but key being NULL means the optimizer considered it and rejected it, usually a type or collation mismatch, or statistics that make a scan look cheaper. EXPLAIN FORMAT=JSON gives the detail to diagnose which.
A subtlety AI often forgets: the optimizer chooses based on cardinality estimates. After adding a generated column and index on a large table, run ANALYZE TABLE so the stats reflect reality before you trust the plan.
mysql --host=replica.internal -e "ANALYZE TABLE mydb.users;"
My Workflow, Concretely
The loop is the same on every one of these:
- Describe the slow query and access pattern to the AI.
- Let it draft the generated column or functional index, plus the matching query.
- Apply it on a replica seeded from production data.
- Run
ANALYZE TABLE, thenEXPLAINandEXPLAIN FORMAT=JSON. - Confirm
keynames my index andtypeisref/range, notALL. - Only then write the migration for production, during a maintenance window if it’s a
STOREDcolumn on a large table.
The AI’s value is speed to a correct-looking draft and never fumbling the GENERATED ALWAYS AS (...) VIRTUAL syntax. My value is refusing to ship anything EXPLAIN hasn’t confirmed. A generated column that the optimizer ignores is just a slower insert with extra steps.
More in the same series lives under the MySQL guides, and the EXPLAIN-reading and index-design prompts I reuse are in the prompt library. Draft with AI, verify on a replica, and let the plan, not the prose, decide.
Download the Free 500-Prompt DevOps AI Toolkit
500 battle-tested, copy-paste AI prompts engineered by a senior systems engineer — every one with fill-in placeholders and safety/back-out notes. Drop your email and it's yours.
- 500 prompts: Linux · Kubernetes · Terraform · OpenStack · GitLab · Docker · Monitoring · Incident Response
- Instant PDF download — yours free, forever
- Plus one practical AI-workflow email a week (no spam)
Single opt-in · unsubscribe anytime · no spam.