AI Strategy Frameworks: A Practitioner Comparison

Six widely-cited AI strategy frameworks scored against eight criteria — McKinsey, Microsoft CAF, Gartner, Databricks, IBM watsonx, and the operator-built four-question diagnostic. Honest about what each gets wrong.

The first time I scored an AI strategy framework against an actual engagement, the framework lost. The client had paid for a three-tier capability model — foundational, differentiating, transformative — from a Big-4 firm in early 2024. Eighteen months later, two of the “differentiating” capabilities had been productised by hyperscalers and were sitting in the foundational tier of the same firm’s 2025 refresh. The client had spent €1.6M building what was now a commodity. The framework had not been wrong, exactly. It had been static, and the technology had not.

That is the problem with framework selection. The widely-cited frameworks — McKinsey’s three-horizon model, Microsoft’s Cloud Adoption Framework for AI, Gartner’s AI Maturity stages, Databricks’s Lakehouse-shaped strategy, IBM’s watsonx playbook — were written to be sold to a category, not to age well in a specific organisation. The right move is not to pick one and execute it. The right move is to read four or five of them, pick the vocabulary your board already speaks, and overlay the four-question diagnostic from the root hub on top of the labels.

This page is the long version of that argument. It also previews the four sub-pages: enterprise frameworks, the development process, effective vs. theatrical frameworks, and the considerations missing from most templates.

What a framework is for, and what it is not

A framework is a vocabulary plus a sequence. The vocabulary lets twenty people in three departments use the same words for the same things — “foundational capability”, “leader posture”, “operating budget tier”. The sequence tells them which decisions block which other decisions — you cannot pick a vendor before you have decided posture; you cannot decide posture before you know the failure tolerance.

Frameworks fail when the vocabulary becomes the artefact. I have read strategy documents that spent forty pages establishing the terms and four pages applying them. The mechanism is recognisable: the consultant who wrote the vocabulary is also being paid to teach it. The teaching becomes the deliverable. The application becomes the client’s homework.

The opposite failure mode is treating the framework as the strategy. A McKinsey three-horizon plot is not an AI strategy. It is a way of grouping the AI strategy you have already decided into a shape a board can read. Neither the McKinsey deck nor the Gartner stage chart contains the decisions that matter. The decisions are upstream of both.

The eight criteria

The frameworks I will compare here were selected for one reason: each one is currently being executed inside at least one client I have worked with in 2024–2026. I score them on eight criteria, weighted toward the questions a working CTO or CIO actually has to defend.

One. Does it name a posture. Leader, follower, absentee. Frameworks that fail this criterion produce strategies that read as if every organisation should be a leader. The implicit posture is “leader,” because that is what the vendor selling the framework needs the client to be.

Two. Does it name a cost ceiling. Discretionary, operating, transformative. Most published frameworks treat budget as an output of the strategy. It is an input. The single best predictor of whether an AI programme ships is whether the cost ceiling was named before the capability list.

Three. Does it survive a 50% budget cut. A useful framework tells you which capabilities die first. A useless one assumes the budget will be defended. In 2025 I sat through three board meetings where the AI budget was cut by between 30% and 70% mid-cycle; the frameworks that survived were the ones with explicit priority tiers.

Four. Does it name failure modes. A framework that does not contain a failure-mode appendix is not a working document. It is a brochure. The NIST AI Risk Management Framework — which is not a strategy framework, but is the closest thing to a public-domain working document — is shaped almost entirely around failure modes. That is not an accident.

Five. Is the vendor independent. Microsoft CAF is excellent if you are on Azure; it is a sales document if you are not. Databricks’s framework is excellent if your data already lives in a lakehouse; it is a migration plan if it does not. Independence is not a deal-breaker — vendor frameworks are often the best engineered — but it has to be priced in.

Six. Is it current. Frameworks published before mid-2024 do not survive the agentic-orchestration reality of 2026. The capability tiers have shifted; the cost curves have shifted; the regulatory landscape has shifted (the EU AI Act’s August 2026 obligations bite specifically on high-risk systems, not on the whole stack — but they still rule out frameworks that pre-date them). A framework’s date stamp matters more than its prestige.

Seven. Can it be falsified. This is the test most fail. A framework you cannot falsify — that has no failure conditions, no diagnostic that distinguishes “working” from “broken” — is not a framework. It is a worldview. McKinsey’s State-of-AI worldview is interesting and well-researched, but it is not an engineering tool because it cannot be wrong.

Eight. Is it free. Not in the trivial sense (most of these have publicly available summaries) but in the sense that you can hand the framework to a procurement team without first paying for an engagement. Gated content is a meaningful tax on adoption, and on the post-engagement second opinion that good strategy work needs.
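The eight criteria reduce to a checklist. What follows is a minimal sketch in Python, assuming each criterion is recorded as a plain pass/fail and the score is the unweighted count of passes. The full sheet uses weighted scoring, which is not reproduced here. The criterion keys and the `example` framework with its answers are hypothetical, invented for illustration, and are not taken from the scoring sheet.

```python
from dataclasses import dataclass

# The eight criteria from this section, in order. The key names are
# illustrative labels, not identifiers from the published sheet.
CRITERIA = (
    "names_posture",
    "names_cost_ceiling",
    "survives_50pct_cut",
    "names_failure_modes",
    "vendor_independent",
    "current",
    "falsifiable",
    "free",
)


@dataclass(frozen=True)
class FrameworkAssessment:
    """Pass/fail answers for one framework (assumed structure)."""
    name: str
    answers: dict  # criterion key -> bool

    def score(self) -> int:
        # Unweighted count of criteria passed; a missing answer counts as a fail.
        return sum(1 for c in CRITERIA if self.answers.get(c, False))

    def failed(self) -> list:
        # Criteria not passed, in the order they appear in CRITERIA.
        return [c for c in CRITERIA if not self.answers.get(c, False)]


# Hypothetical framework and answers, for illustration only.
example = FrameworkAssessment(
    name="Example vendor framework",
    answers={
        "names_posture": True,
        "names_cost_ceiling": False,
        "survives_50pct_cut": False,
        "names_failure_modes": True,
        "vendor_independent": False,
        "current": True,
        "falsifiable": False,
        "free": True,
    },
)

print(f"{example.name}: {example.score()} out of {len(CRITERIA)}")
print("Fails:", ", ".join(example.failed()))
```

Recording the failed criteria alongside the number is the useful part: a 4 out of 8 that fails on cost ceiling and falsifiability is a different document from a 4 out of 8 that fails on currency and price.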

How the frameworks score

I will not reproduce the full scoring sheet on this page; it lives in the comparison piece at /framework/effective/, with the Google Sheet behind it published CC-BY-4.0. The headline:

McKinsey three-horizon. Strong on posture vocabulary, weak on cost ceiling, fails the falsifiability test. Best used as a board-communication shape, not an engineering tool. Score: 4 out of 8.

Microsoft Cloud Adoption Framework for AI. Strong on sequencing, strong on currency (updated 2026-04), vendor-dependent. Genuinely useful if you are already on Azure; mildly misleading if you are not. Score: 5 out of 8.

Gartner AI Maturity Model. Strong on labels, weak on stage-transition realism. The Gartner stages assume a budget-naive progression that does not match how money actually moves in enterprises. Best used as board-memo language, not a planning tool. The full read is at /maturity/gartner/. Score: 4 out of 8.

Databricks Lakehouse-shaped strategy. Strong on data architecture, weak on everything else. If your AI strategy is fundamentally a data strategy — and for many regulated industries it still is — this is the most coherent published framework. Score: 4 out of 8, but with a strong fit for a specific archetype.

IBM watsonx playbook. Strong on governance, weak on currency, vendor-dependent. The watsonx framework was the first to treat governance as a first-class input rather than an appendix, and that contribution holds up. Everything downstream of it is a sales motion. Score: 4 out of 8.

The operator-built four-question diagnostic. Strong on posture, cost, falsifiability; weak on stage vocabulary. Designed to overlay on top of one of the above, not replace it. Score: 6 out of 8, with the caveat that I built it, so the scoring is not blind.

None of them scores above 6. That is the point. Framework selection is not the bottleneck; the bottleneck is whether the people executing have agreed on the answers to the four questions before they pick the vocabulary.

The four-question diagnostic, expanded

The diagnostic from the root hub is four questions. The expansion below is what each question looks like when it leaves the strategy document and meets a finance committee.

Posture does not survive contact with budgets unless it is written as a constraint on the capability list, not as a chapter heading. A leader-posture strategy that proposes seventeen capabilities is not a leader-posture strategy. It is a follower-posture strategy with a leader-posture cover page. The test: cut your capability list to three things and explain why those three. If you cannot, the posture is wrong.

Cost ceiling is the question every framework treats as someone else’s problem. The honest move is to write the ceiling into the first page of the document, in euros or dollars, with a year, and let every subsequent section be falsified against it. A capability whose three-year operating cost exceeds the ceiling dies in section two, not section nine. This is the section that consulting frameworks describe in the abstract because writing real numbers makes the engagement scope harder to expand.
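The ceiling test is mechanical once the number is written down. Here is a minimal sketch in Python, assuming a flat annual operating cost per capability so that the three-year cost is simply three times the annual figure. The capability names, the costs, and the €1.2M ceiling are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    name: str
    annual_operating_cost_eur: float  # assumed flat across the three years


def three_year_cost(cap: Capability) -> float:
    # Simplifying assumption: no growth, discounting, or one-off build cost.
    return 3 * cap.annual_operating_cost_eur


def apply_ceiling(capabilities, ceiling_eur):
    """Split capabilities into (kept, cut) by three-year cost against the ceiling."""
    kept, cut = [], []
    for cap in capabilities:
        (kept if three_year_cost(cap) <= ceiling_eur else cut).append(cap)
    return kept, cut


# Hypothetical ceiling and capability list, for illustration only.
CEILING_EUR = 1_200_000
capabilities = [
    Capability("document triage", 150_000),
    Capability("customer-facing assistant", 500_000),
    Capability("internal search", 90_000),
]

kept, cut = apply_ceiling(capabilities, CEILING_EUR)
print("Kept:", [c.name for c in kept])
print("Cut in section two:", [c.name for c in cut])
```

A real version would need the cost model the organisation actually uses; the point of the sketch is only that the comparison happens per capability, against a number fixed before the capability list was drafted.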

Timeline pressure is where most strategies overstate. A three-year strategy in a market with six-month decision windows is a follower-posture strategy whether or not it says so. The honest tell: read the strategy and ask which decisions are time-locked by external factors (regulatory deadlines, competitor moves, model-cost curves) and which are time-locked only by internal scheduling. The internal-only decisions are usually the ones that slip; the external ones are the ones that bite.

Failure tolerance is the question every governance section answers without naming. The mistake I see most is uniform tolerance: a single governance posture applied across high-tolerance internal tools and zero-tolerance customer-facing systems. The result is either over-governed internal experiments that die in compliance review, or under-governed customer systems that ship without the controls they need. Both are mirror images of the same mistake.

Answer those four for each capability on your strategy’s capability list, and you have a working document. Skip them, and you have a brochure with a Gartner stage chart on page seven.
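One way to keep the four answers attached to each capability is a small record with a completeness check. This is a sketch in Python under assumed field names; the capabilities and the answers shown are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DiagnosticAnswers:
    """The four answers for one capability (field names are assumptions)."""
    capability: str
    posture: Optional[str] = None            # "leader", "follower", or "absentee"
    cost_ceiling: Optional[str] = None       # an amount, a currency, and a year
    timeline_pressure: Optional[str] = None  # external vs. internal time locks
    failure_tolerance: Optional[str] = None  # e.g. "high" for internal tools

    def missing(self) -> list:
        # Names of the questions that have no answer yet.
        return [
            f.name
            for f in fields(self)
            if f.name != "capability" and not getattr(self, f.name)
        ]


def unanswered(records) -> dict:
    """Map each incomplete capability to the questions it still lacks."""
    return {r.capability: r.missing() for r in records if r.missing()}


# Hypothetical capability list, for illustration only.
records = [
    DiagnosticAnswers(
        capability="internal search",
        posture="follower",
        cost_ceiling="EUR 300k over three years, set in 2026",
        timeline_pressure="internal scheduling only",
        failure_tolerance="high",
    ),
    DiagnosticAnswers(
        capability="customer-facing assistant",
        posture="leader",
        failure_tolerance="zero",
    ),
]

for capability, gaps in unanswered(records).items():
    print(f"{capability}: missing {', '.join(gaps)}")
```

An empty result from `unanswered` is the condition for calling the list a working document; anything else names exactly which decision is still open, and for which capability.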

What the sub-pages cover

/framework/enterprise/ — Enterprise AI strategy. The same four questions, but with the enterprise-specific complications: multi-business-unit governance, shared-services budget allocation, the CIO-CTO-CAIO boundary, and the procurement contracts that fail when AI usage scales. The single most-read piece in this cluster.

/framework/development/ — Developing an AI strategy from scratch. The six-week-to-sixteen-week template, with the weekly checkpoints, the stakeholder map, and the three documents that have to exist before week four. Written for the CTO or CIO who has been handed the file and a calendar deadline.

/framework/effective/ — Effective vs. theatrical frameworks. The full scoring sheet. Eight criteria, six frameworks, weighted scoring. CC-BY-4.0 so you can fork it and publish your own.

/framework/considerations/ — The considerations missing from most templates. Model-cost volatility, vendor lock-in calculations, the GDPR and EU AI Act interaction, the “what happens when the team that built it leaves” question. The appendix material that turns out to matter more than the body.

What I would do on Monday morning

If you are starting from a blank page, do not pick a framework first. Spend the first day writing answers to the four questions, one paragraph each, no template. If you cannot, you do not have a strategy problem; you have a decision problem. Solve the decision problem before the strategy problem.

If you have answers but no document, pick the vocabulary your board already speaks. Microsoft CAF if you are on Azure. Gartner labels if your reporting line runs through analyst subscriptions. McKinsey horizons if that is the consulting accent your CEO is fluent in. The framework choice matters less than consistency in the document.

If you have a document but it is not surviving execution, the failure is almost always at the posture-cost mismatch. Re-read your own document and find the place where the leader-posture aspiration meets the follower-posture budget. That paragraph is where the next mid-programme decision will break the programme. Rewrite it before someone else does, in a meeting, with the wrong incentive structure.

The sequencing of those moves is most of the work. The framework is the second-order question. The first-order question is whether the four answers exist and whether the people executing have read them.

The one stress test I would run on any framework, including my own, is the 50% cut question. Take the most expensive line item in your current AI plan and ask: if the board cut this budget in half tomorrow, which specific capabilities survive, in what order, on what evidence. A framework that cannot answer that question in writing is not a strategy framework. It is a wish list with a Gantt chart attached. The frameworks that score highest on this page survive the 50% cut conversation; the ones that score lowest are the ones the cut conversation exposes.
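The 50% cut question can be rehearsed before the board asks it. Below is a minimal sketch in Python, assuming each line item carries an explicit priority tier and that survivors are chosen greedily: lowest tier number first, cheapest first within a tier, skipping anything that no longer fits. That selection rule is my simplification for illustration, not something any of the frameworks prescribes, and the line items and budget are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    name: str
    priority_tier: int      # 1 = cut last; higher tiers are cut first
    annual_cost_eur: float


def survivors_after_cut(items, budget_eur, cut_fraction=0.5):
    """Return the items that still fit after the cut, in survival order."""
    remaining = budget_eur * (1 - cut_fraction)
    survivors = []
    # Greedy pass: by tier, then by cost within a tier (assumed tie-break).
    for item in sorted(items, key=lambda i: (i.priority_tier, i.annual_cost_eur)):
        if item.annual_cost_eur <= remaining:
            survivors.append(item)
            remaining -= item.annual_cost_eur
    return survivors


# Hypothetical plan totalling EUR 1,000,000 a year, for illustration only.
plan = [
    LineItem("claims triage", priority_tier=1, annual_cost_eur=300_000),
    LineItem("internal search", priority_tier=1, annual_cost_eur=150_000),
    LineItem("sales copilot", priority_tier=2, annual_cost_eur=200_000),
    LineItem("agentic back-office pilot", priority_tier=3, annual_cost_eur=350_000),
]

for item in survivors_after_cut(plan, budget_eur=1_000_000):
    print(f"survives: {item.name} (tier {item.priority_tier})")
```

The code answers "which" and "in what order". The third part of the question, "on what evidence", is the justification behind each tier assignment, and that has to be written down by the people who own the plan.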


Sources & methodology

If you find a scoring decision you would make differently, send the alternative and I will link a fork from the next refresh.

Frequently asked questions

Which AI strategy framework should we actually use?
None of the named ones, unmodified. Pick the framework whose vocabulary your board already speaks (Gartner if your reporting line runs through analyst subscriptions, Microsoft CAF if you are already on Azure, McKinsey if your CEO's last consulting engagement was with McKinsey) and then overlay the four-question diagnostic on top of it. The framework gives you the labels; the diagnostic gives you the answers.

Is the McKinsey AI strategy framework worth paying for?
The framework itself is publicly available via the State of AI reports. What you pay McKinsey for is the engagement that produces a tailored version, and whether that is worth €1–4M depends on whether your bottleneck is analysis or accountability. If it is analysis, hire a fractional CAIO or CTO at a tenth of the cost. If it is accountability, meaning getting a CEO to commit to cuts, McKinsey is occasionally the cheapest way to buy a forcing function.

How long does it take to develop an AI strategy?
Six weeks if someone has run a programme before, twelve to sixteen weeks if no one has. A schedule that runs longer than that is usually a stakeholder-management problem disguised as an analysis problem. The honest test: if you cannot draft the four-question diagnostic in a single working day, the bottleneck is not framework choice; it is that the organisation has not yet decided what it actually wants from AI.

What is the most common mistake in AI strategy development?
Confusing posture with ambition. Most strategies I have audited claim leader posture on the cover and follower-posture budget in the appendix. The mismatch never survives the first mid-programme decision. Decide posture before vocabulary; the rest of the document either follows or it does not.