Effective AI Strategy: A Falsifiable Definition, and the Eight-Criterion Scoring Sheet
The board meeting that taught me the difference happened in March of last year. A division of a German industrial group I had been advising at arm’s length presented its AI strategy refresh. Forty-two pages, six capability streams, a maturity model, a roadmap, a governance overlay, a benefits-realisation framework, the full kit. The presenting CIO closed by claiming that the previous strategy (the one being refreshed) had been effective. The CFO, whom I respect, asked a clean question: which workstream did the previous strategy kill? There was a pause. The CIO said the strategy had not killed any workstreams; everything was still in progress. The CFO asked which workstream’s budget had been redirected. Another pause. None had been redirected. The CFO closed the conversation with a sentence I have been quoting ever since: “Then the strategy did not do any work. It just described what we were already doing.”
That is the operational definition of effective. A strategy is effective when it produces a decision the organisation would not have made without it. Not a decision the organisation eventually made anyway. Not a decision the strategy described after the fact. A decision that the document forced into existence and that has the document’s fingerprints on it — a cut, a re-prioritisation, a posture clarification, a budget reallocation, a workstream kill. If the strategy cannot point at one of these in its first twelve months, it was not a strategy. It was a description of work in progress dressed in strategy clothes.
The rest of this page is the long version of that test, with the eight-criterion scoring sheet that operationalises it. The scoring sheet is published CC-BY-4.0 as a Google Sheet linked at the end of this page; fork it, change the weights, score your own strategy against it, and if you disagree with the weights I will link a fork from the next refresh.
Effective as a falsifiable claim
The word “effective” gets used aspirationally in strategy documents — we will develop an effective AI strategy — and that usage is parasitic on a stronger sense of the word that the operator has to defend. An aspirational claim is unfalsifiable; an operational claim is falsifiable. An effective strategy makes claims that can be wrong. A theatrical strategy makes claims that cannot be.
This is Goodhart’s law applied in reverse. Goodhart’s law, as usually paraphrased, says that when a measure becomes a target, it stops being a good measure. The reverse also holds: a target that does not commit to any specific outcome has no measure at all. “We will adopt AI capabilities that strengthen our competitive position” is a target without a measure. “We will reduce average customer-service handle time by 20% on the top three inquiry categories by Q4 2026, and we will discontinue the programme if the reduction is below 12% by end of Q2” is a target with a measure. The first cannot be falsified. The second can be. The first is theatrical regardless of how many pages it sits behind; the second is effective regardless of how short the document is.
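The difference between the two example targets can be made mechanical. The following is a minimal sketch, not part of the published scoring sheet: the `Target` class and its field names are hypothetical, and the checkpoint date assumes that "end of Q2" means Q2 2026, which the example target does not state.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Target:
    """A strategy commitment. All field names here are hypothetical."""
    statement: str
    target_pct: Optional[float] = None      # the number
    deadline: Optional[date] = None         # the date
    kill_below_pct: Optional[float] = None  # the discontinuation threshold
    checkpoint: Optional[date] = None       # when the threshold is tested

    def is_falsifiable(self) -> bool:
        # Falsifiable only if the number, the dates, and the kill threshold are all stated.
        fields = (self.target_pct, self.deadline, self.kill_below_pct, self.checkpoint)
        return all(f is not None for f in fields)

    def checkpoint_decision(self, observed_pct: float) -> str:
        if not self.is_falsifiable():
            raise ValueError("nothing to test: the target never committed to a measure")
        # Below the threshold at the checkpoint means the programme stops.
        return "discontinue" if observed_pct < self.kill_below_pct else "continue"


vague = Target("Adopt AI capabilities that strengthen our competitive position")
specific = Target(
    statement="Reduce average handle time on the top three inquiry categories",
    target_pct=20.0,
    deadline=date(2026, 12, 31),   # end of Q4 2026
    kill_below_pct=12.0,
    checkpoint=date(2026, 6, 30),  # assumption: "end of Q2" read as Q2 2026
)
```

The vague target cannot even be passed to `checkpoint_decision` without raising an error, which is the code-level version of "cannot be falsified".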
I hammer this point because roughly nine out of ten strategy documents I am asked to review fail it on page one. The executive summary contains zero falsifiable claims. The body contains a handful. The appendix — where the consulting hours run lower and the operator sometimes gets to write directly — contains most of the falsifiable material in the document. The geometry tells you what the document is actually for. A document whose falsifiable content lives only in the appendix is a brochure with footnotes. A document whose executive summary is itself falsifiable is the rarer artefact, and it is the one that survives execution.
The contrast: theatrical strategies and what they look like
A theatrical strategy is not a bad-faith document. Most of them are written in good faith by competent people who have absorbed the genre conventions of strategy consulting and reproduced them. The conventions push toward generality, toward optionality, toward language that cannot fail because it does not commit. The result is a document that reads as substantial and is structurally vacuous. The author is not lying. The author is performing the shape they were trained to produce.
Three structural tells distinguish theatrical strategies from effective ones, and I look for all three when I am asked to pressure-test a document before it goes to the board.
Tell one: the absent cut list. An effective strategy contains a list of capabilities or workstreams that will be cut if the budget tightens by 50%, named explicitly, in order. A theatrical strategy contains a sentence like “the programme will be re-prioritised in line with available budget” and moves on. The first answers the question; the second deflects it. I have never seen a strategy without a written cut list survive a real mid-cycle budget cut. The cut conversation defaults to the most senior person’s prior beliefs, which are usually wrong, and the programme drifts in their direction.
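A written cut list is simple enough to express as data. The sketch below is illustrative only: the capability names and costs are invented, and the function `capabilities_to_cut` is a hypothetical helper that walks an ordered cut list until the required saving for a given tightening fraction is reached.

```python
from typing import List, Tuple


def capabilities_to_cut(
    cut_list: List[Tuple[str, float]],  # (capability, annual cost), first-to-cut first
    total_budget: float,
    tightening: float,                  # 0.5 means the budget tightens by 50%
) -> Tuple[List[str], float]:
    """Return the capabilities cut, in order, and the amount saved."""
    required = total_budget * tightening
    cut: List[str] = []
    saved = 0.0
    for name, cost in cut_list:
        if saved >= required:
            break  # the required saving is already covered
        cut.append(name)
        saved += cost
    return cut, saved


# Invented example: a 1,000,000 budget and a three-item cut list.
example_cut_list = [
    ("pilot-chatbot", 300_000),
    ("document-search", 250_000),
    ("demand-forecasting", 450_000),
]
cut, saved = capabilities_to_cut(example_cut_list, total_budget=1_000_000, tightening=0.5)
```

The ordering of the list is the decision. The function only executes an order that was written down before the cut conversation began.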
Tell two: the success criteria that cannot be missed. An effective strategy attaches each capability to a specific outcome with a number, a date, and a discontinuation threshold. A theatrical strategy attaches each capability to a “value driver” or a “strategic objective” with no number, no date, and no discontinuation threshold. The theatrical version cannot fail because it never committed to anything specific enough to fail at. The effective version can fail, which is the point — failing is how the organisation learns the assumption was wrong.
Tell three: the absent posture-ambition reconciliation. An effective strategy reconciles the posture statement (leader, follower, absentee, from the frameworks hub) against the capability list and the budget. If the posture says leader and the budget says follower, the document names the gap and either changes the posture or changes the budget. A theatrical strategy lets the gap sit unreconciled, claims leader posture on the cover, and lists seventeen capabilities that the follower-posture budget cannot fund. The mismatch will surface in execution as a sequence of compromised commitments, and the strategy will be blamed for ambitions it did not authorise.
These three tells correlate. A document with one of them usually has all three. A document with none of them is almost always written by an operator who has been on the receiving end of a budget cut and structured the document to survive the next one. That is the credential I look for in the author byline before I read the body.
The eight-criterion scoring sheet
The scoring sheet is the working version of the falsifiability test, applied to both the strategy document and the framework that produced it. The eight criteria are the same ones used to score the six widely cited frameworks in the parent hub, reproduced here with the operational weight I assign each one when I am scoring a specific strategy document rather than a framework in the abstract.
One. Posture named. Is leader, follower, or absentee posture stated explicitly on page one? Weight: 15%. The single highest-leverage criterion because it cascades into every subsequent decision.
Two. Cost ceiling named. Is the discretionary, operating, or transformative budget tier named with a euro or dollar figure and a year? Weight: 15%. The criterion most strategies fail by treating budget as an output rather than an input.
Three. Survives a 50% cut. Does the document name which capabilities are cut first, second, and third when the budget tightens by 50%, in writing, before the cut conversation begins? Weight: 15%. The criterion that distinguishes a strategy from a Gantt chart.
Four. Failure modes named. Does the document contain an appendix or section listing the failure modes for each capability, with the data that would trigger discontinuation? Weight: 10%. The criterion that distinguishes a working document from a brochure.
Five. Vendor independent. Is the framework that shaped the document independent of the vendors whose products will be procured under it? Weight: 10%. Not a deal-breaker but has to be priced in.
Six. Currency. Was the framework published or refreshed after mid-2024, with the agentic-orchestration reality and the EU AI Act August 2026 deadline reflected? Weight: 10%. Frameworks older than this are budget-naive in ways that bite mid-programme.
Seven. Falsifiability. Does the document make claims that can be wrong, with the conditions under which they would be wrong stated explicitly? Weight: 15%. The criterion the appendix usually passes and the executive summary usually fails.
Eight. Free at point of access. Can the framework be handed to a procurement team without first paying for an engagement to receive it? Weight: 10%. Gated frameworks are a tax on the post-engagement second opinion that good strategy work needs.
A strategy scoring above 70 on the weighted sheet is effective by this definition. A strategy scoring between 50 and 70 is partially effective, with named gaps. A strategy scoring below 50 is theatrical. A companion scoring sheet holds the detail behind each number: the rubric for every criterion and the worked examples from three anonymised engagements.
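The weighted arithmetic can be sketched as follows. The weights and band thresholds are the ones stated above. Everything else is an assumption for illustration: the dictionary keys are hypothetical names, each criterion is assumed to be rated from 0.0 (fails) to 1.0 (fully met), and scores of exactly 50 or exactly 70 are assumed to fall in the middle band. The rubric in the companion sheet may rate criteria differently.

```python
# Weights in percentage points, as listed for the eight criteria.
WEIGHTS = {
    "posture_named": 15,
    "cost_ceiling_named": 15,
    "survives_50pct_cut": 15,
    "failure_modes_named": 10,
    "vendor_independent": 10,
    "currency": 10,
    "falsifiability": 15,
    "free_at_point_of_access": 10,
}
assert sum(WEIGHTS.values()) == 100


def weighted_score(ratings: dict) -> float:
    """Combine per-criterion ratings (assumed 0.0 to 1.0) into a score out of 100."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"unrated criteria: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= ratings[name] <= 1.0:
            raise ValueError(f"rating for {name} must be between 0.0 and 1.0")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


def band(score: float) -> str:
    if score > 70:
        return "effective"
    if score >= 50:  # assumption: 50 and 70 themselves count as partially effective
        return "partially effective"
    return "theatrical"


# Invented example: posture and budget named, no cut list, no failure modes.
example_ratings = {
    "posture_named": 1.0,
    "cost_ceiling_named": 1.0,
    "survives_50pct_cut": 0.0,
    "failure_modes_named": 0.0,
    "vendor_independent": 1.0,
    "currency": 1.0,
    "falsifiability": 0.5,
    "free_at_point_of_access": 1.0,
}
example_score = weighted_score(example_ratings)
example_band = band(example_score)
```

Forking the weights amounts to editing `WEIGHTS`. The assertion that they sum to 100 keeps the band thresholds meaningful after the change.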
The scoring is meant to be done by the operator, not the strategy author. The author is structurally too close to the document to score it honestly. The most useful version of this exercise is to have the named owner of the strategy score it, and then have an external advisor score it independently, and compare. The deltas are the conversation worth having. They are also, in my experience, the conversation that produces the highest-value rework in the shortest time — about three days of operator effort, against two months of consulting time to write the document in the first place.
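The owner-versus-advisor comparison is a per-criterion subtraction. A minimal sketch, assuming both scorers rate each criterion from 0.0 to 1.0 (my assumption, not the sheet's stated rubric) and using hypothetical key names and invented ratings:

```python
from typing import Dict, List, Tuple

# Weights in percentage points, repeated here so the sketch stands alone.
WEIGHTS = {
    "posture_named": 15,
    "cost_ceiling_named": 15,
    "survives_50pct_cut": 15,
    "failure_modes_named": 10,
    "vendor_independent": 10,
    "currency": 10,
    "falsifiability": 15,
    "free_at_point_of_access": 10,
}


def score_deltas(
    owner: Dict[str, float], advisor: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Weighted disagreements (owner minus advisor), largest first, zeros dropped."""
    deltas = {
        name: (owner[name] - advisor[name]) * weight
        for name, weight in WEIGHTS.items()
    }
    nonzero = [(name, d) for name, d in deltas.items() if d != 0]
    return sorted(nonzero, key=lambda item: abs(item[1]), reverse=True)


# Invented ratings: the two scorers agree everywhere except on two criteria.
owner_ratings = {name: 1.0 for name in WEIGHTS}
advisor_ratings = dict(owner_ratings, falsifiability=0.0, survives_50pct_cut=0.5)
disagreements = score_deltas(owner_ratings, advisor_ratings)
```

Sorting by weighted size puts the most expensive disagreement at the top of the agenda, which is where the rework conversation should start.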
What an effective strategy gets you, and what it does not
An effective strategy does three things. It produces a decision the organisation would not have made without it. It survives a mid-cycle budget cut without political damage to the named owner. It ages well enough that the six-month and twelve-month reviews extend the document rather than replace it.
It does not, by itself, guarantee a successful programme. The execution is its own work — the roadmap cluster covers that — and the most effective strategy in the world can be undone by a wrong sequencing decision or a missed budget transition. What it does is eliminate one specific failure mode: the failure of the document itself to serve as an anchor when execution gets hard. With an effective strategy, the mid-programme decision conversations happen against a written reference. Without one, they happen against the most senior person’s recollection of what was decided at the offsite in October, which is rarely the same as what was actually decided.
The other thing an effective strategy does, less often acknowledged, is make the named owner’s job easier. The standing-to-cut question that the enterprise piece names — whether the owner has the political capital to make the cuts the strategy implies — is partially solved by the strategy itself, when it names the cuts in advance. The owner is not making a fresh decision in March; the owner is executing a decision the document already named in October. The political cost of an executed decision is lower than the political cost of a fresh one, and effective strategies use this geometry deliberately.
What I would do on Monday morning if I were testing an existing strategy
If you have an AI strategy document and want to know whether it is effective or theatrical without reading all forty pages, here is the test that takes about ninety minutes.
Open the document to the executive summary. Find the posture statement. Note whether it names leader, follower, or absentee posture explicitly. If it does not, the document fails criterion one. Find the cost ceiling. Note whether it names a number with a year. If it does not, criterion two fails. Find the cut list. Note whether the document names which capabilities are cut first under a 50% budget tightening. If it does not, criterion three fails. Find the failure modes. Note whether each capability has named discontinuation thresholds. If it does not, criterion four fails.
If three or all four fail, the document is theatrical. Do not try to rescue it section by section. Rewrite the executive summary against the four-question diagnostic from the root hub, then propagate the rewrite into the body and the appendix. The exercise takes about a week if the operator doing it has standing to make decisions, and three months if it has to be done by committee. The standing question matters more than the framework choice.
If two of the four fail, the document is salvageable. The most common shape of a partially-effective strategy is one where the posture and cost ceiling are named but the cut list and the failure modes are missing. The fix is to add a single appendix section — five pages, maximum — with the cut list and the failure-mode triggers stated. The body of the document does not have to change. The appendix is what the named owner will reference when the mid-cycle cut conversation happens.
If one of the four fails, the document is effective with one named gap. Close the gap. Date the document. Schedule the six-month review. Move on.
If none of the four fail, you are in the top decile of AI strategy documents I have read. Send it to me and I will link it as an example with permission. There are not many, and the operator community benefits from seeing the shape.
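The ninety-minute test above reduces to counting failures across the first four criteria. A minimal sketch: the function name and verdict strings are hypothetical, and a document that fails all four checks is assumed to get the same verdict as one that fails three.

```python
def triage(
    posture_named: bool,
    cost_ceiling_named: bool,
    cut_list_named: bool,
    failure_modes_named: bool,
) -> str:
    """Map the four executive-summary checks to a verdict and next action."""
    checks = (posture_named, cost_ceiling_named, cut_list_named, failure_modes_named)
    failures = sum(1 for passed in checks if not passed)
    if failures >= 3:  # assumption: four failures are treated like three
        return "theatrical: rewrite the executive summary, then propagate"
    if failures == 2:
        return "salvageable: add an appendix with the missing items"
    if failures == 1:
        return "effective with one named gap: close it, date it, schedule the review"
    return "effective: top decile"


# The most common partially-effective shape: posture and cost ceiling named,
# cut list and failure modes missing.
common_case = triage(True, True, False, False)
```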
Sources & methodology
- NIST AI Risk Management Framework, v1.0 — the public-domain reference for the failure-modes criterion
- EU AI Act, Regulation (EU) 2024/1689 — the regulatory floor referenced by the currency criterion
- Microsoft Cloud Adoption Framework — AI workloads — the framework that ports cleanest onto enterprise structures and scores highest on currency
- Goodhart, C. (1975), “Problems of Monetary Management” — the original “when a measure becomes a target, it stops being a good measure” argument that underwrites the falsifiability criterion
- Methodology: the eight-criterion scoring sheet is published as a Google Sheet under CC-BY-4.0. Weights are the operator-built defaults from twelve fractional engagements 2023–2026; fork the sheet and change the weights for your context. Engagements informing the rubric are anonymised; year, sector, and approximate headcount are stated on each worked example.
If you score a strategy against the sheet and disagree with the weights, send the fork and I will link it from the next refresh. The interesting outcome of public scoring work is the disagreements, not the agreements.
