Enterprise AI Roadmap Development: Phasing the Work When the Budget Is Real
The roadmap that nearly cost me a client was the one I wrote with too much architectural conviction. Eighteen months at an EU industrial firm, a serious AI thesis on operations efficiency, a CFO who had given us a real operating budget rather than the usual discretionary scrap. In month two of the engagement, I produced a roadmap that started with the data platform, scheduled MLOps for months four through eight, governance for months six through ten, and the first shipping application for month twelve. Every dependency was correct. The architecture was clean. The only thing wrong with it was that the COO had no good answer to “what will the back office see in the first six months.” His honest answer, when I asked, was “nothing visible, but the foundations will be sound.” He said this in the tone of someone who had heard it before and knew how it ended.
I cut the roadmap that week. We brought one application forward — a contact-centre summarisation tool that had been scheduled for month fifteen — and built the platform underneath it as we went. The contact centre saw something usable at month four. The platform finished at month thirteen, more or less when it would have anyway, with one important difference: the platform had been shaped by a real application’s requirements rather than an imagined one’s. The engagement renewed for a second year. The COO is now my single largest reference.
That is what enterprise AI roadmap development is for. It is not the dependency-ordered work plan that the engineering instinct produces. It is the sequencing decision that determines which audience the work is designed for at each step. Get that right, and the dependencies sort themselves. Get it wrong, and the most architecturally beautiful roadmap in the world produces a platform with no users.
The parent hub claim, made executable
The roadmap hub argues that you should phase by use case, not by workstream. This page is the execution-level version of that claim — the three patterns that put the rule into practice, the two patterns that violate it and fail, the six-month checkpoint template that keeps the work honest, and the assumption-kill criterion design that makes mid-flight pivots survivable.
The strategy-level question — what to do — gets answered in the AI strategy document, in the four-question diagnostic at the root hub. The roadmap-level question — when — splits into two layers. The first layer is the eighteen-month outline: what gets sequenced ahead of what, at the workstream and capability level. The second layer is the six-month detail: which use case is funding the work, which capability ships in the window, which assumption is load-bearing, and which kill criterion is active. This page is about the second layer. The first layer is the budget-and-posture conversation that the parent hub handles.
The three patterns that work
I have run, audited, or advised on roughly twenty enterprise AI roadmaps in the operating-budget band — meaning the work was funded out of year-on-year cost base rather than discretionary innovation budget. Three sequencing patterns recur in the ones that delivered. They are not mutually exclusive; some programmes use one for the first year and another for the second. They share the property that something real ships to a real user before month six.
The lead-use-case pattern. Pick one application that will fund the first year of the programme. Build the platform underneath it as the application requires, not as the architecture would imagine. The application has a real customer with real performance requirements and real cost constraints, so the platform that emerges underneath it is the right shape — even if it is not the architecturally pure shape. Use case two extends the platform in a direction the lead use case did not require. Use case three is where genuine reuse begins, and where you start refactoring for the platform shape rather than the application shape.
The lead-use-case pattern is the right answer for most mid-cap enterprises (€500M to €3B revenue) starting a serious AI programme in 2026. The reason is structural: at that scale, the organisation does not have the engineering depth to staff multiple concurrent use cases at production quality, and the operating-budget defence in year two is easier when there is one named application with measurable returns than when there are three under-resourced ones. The failure mode of this pattern is the temptation to keep building the platform after the lead use case has shipped, rather than extending into use case two. The platform team that succeeds at building the lead-use-case platform usually wants to spend year two perfecting the platform; the right move is to release them onto use case two and accept the architectural roughness for another year.
The platform-extraction pattern. Start with two or three concurrent use cases sharing the same engineering team. Do not build the platform first; let the use cases use whatever ad-hoc infrastructure they need. After six months of running them, extract the platform from what they have in common — the shared model-serving layer, the shared evaluation harness, the shared cost-monitoring tooling. The platform is then a refactoring, not a greenfield build.
This pattern is harder to staff but produces better platforms when it works, because the platform has two or three real applications pulling on it rather than one or zero. It is the right answer for larger enterprises (€3B+ revenue) with the engineering depth to staff parallel use cases, and for AI-native teams inside larger firms where the operating-budget defence is less precarious. The failure mode is the extraction never happening: the use cases ship, the team stays in delivery mode for years, and the platform debt compounds. I mandate a named “platform extraction” milestone at month six or nine, with an executive owner who is measured on enforcement rather than on oversight.
The federated-team pattern. Genuinely large enterprises (€10B+ revenue, multiple distinct business units) run multiple small AI teams under existing functional executives (the COO has one, the CTO has one, the chief commercial officer has one), with a thin coordinating function above them. The coordinating function is usually a CAIO seat (see the CAIO playbook), with budget authority over cross-cutting capability and governance, but not over delivery. Each federated team runs its own roadmap, with the coordinating function ensuring the roadmaps are compatible on vendor selection, model-platform choice, and regulatory posture.
The federated pattern is the right answer when the organisation is too large for a single AI programme to make sense and too distributed for centralisation to work. The failure mode is either the coordinating function collapsing into an advisory Center of Excellence (the second failing pattern below) or the teams fragmenting into incompatible per-team architectures that produce real integration costs in year three. The discipline required is a written, dated architectural agreement at the coordinating-function level, refreshed every six months, with budget consequences for teams that diverge from it.
The two patterns that fail predictably
The two patterns I have watched fail with monotonous regularity are common enough that they need explicit naming. Both fail for the same structural reason: they prioritise the architecturally pure shape over the shape that has a real user pulling on it.
Foundation-first sequencing. Build the data platform, then MLOps, then governance, then the applications. The argument sounds correct — the foundations have to come before the building — and is wrong for AI work today in a way it was not wrong for previous platform transitions. The reason is that AI application requirements are inherently more volatile than the data-platform and MLOps requirements of the 2018 era; they rarely survive contact with real users for more than a quarter. The application that was imagined at month two is not the application that gets built at month twelve; the model landscape has shifted, the cost economics have changed, the use case has been redefined under contact with users. The platform built for the month-two imagined application is the wrong shape for the month-twelve real one, and either gets retrofitted at considerable cost or quietly abandoned.
The vendor incentive is what makes foundation-first so persistent. Data-platform vendors, MLOps vendors, and governance-tooling vendors all benefit commercially from foundation-first sequencing, because their products get budgeted in months one through nine of an eighteen-month plan rather than getting tacked on after a shipping application has revealed which features actually matter. The pitch is well-rehearsed and the boards approving the work usually do not have the operating experience to see through it. The honest move is the parent hub’s: phase by use case, build the platform underneath it. The dishonest one is to accept the foundation-first roadmap because it is what the platform vendors have written down.
One-CoE-to-rule-them-all. A single AI Center of Excellence is positioned as the source of all AI capability across the enterprise — strategy, governance, tooling, model selection, delivery support, training. The CoE is staffed with credible AI engineers and led by an experienced director. It is given an advisory mandate without budget authority over the workstreams it advises. By month nine, it is either being ignored by the functional executives who are running the actual work, or it has become a bottleneck on every AI decision in the firm, slowing the work it was supposed to accelerate.
The CoE pattern fails because advisory authority without budget authority cannot survive a real budget cycle. The functional executive who has been told to deliver an AI capability under operating-cost pressure will route around an advisory CoE that adds review cycles without delivery resources. The CoE that tries to assert authority without budget becomes a politically costly overlay, and the next planning cycle reduces its headcount. By month eighteen it is gone or it is a documentation function. The structural fix is either to give the CoE genuine budget authority — at which point it is no longer a CoE, it is a delivery function with a misleading name — or to use the federated-team pattern, in which the coordinating function has limited but real authority over specific cross-cutting decisions. The CoE-without-authority pattern survives only inside firms that are not doing serious AI work; the moment the work is serious, the structural inadequacy surfaces.
The six-month checkpoint, written down
The six-month checkpoint is the artefact that turns the roadmap from a wish list into a working document. The checkpoint is not a review meeting. It is a one-page document, dated, signed by the named owner, circulated to the executive sponsor and the CFO. The format that has held up across the engagements I have run:
Item one: the load-bearing assumption for the next six-month chunk. One sentence. Specific enough to be falsified by data. Not “we believe AI will improve operations efficiency” — that is not falsifiable in any meaningful window. Specific: “The agentic flow we are deploying in the contact centre will reduce average handle time on tier-one tickets by at least 18% within twelve weeks of full rollout.” That can be tested. That can be falsified. That earns its place in the checkpoint.
Item two: the cost trajectory of the work that has shipped. Unit cost per query, per resolved ticket, per generated artefact — whichever unit is the right one for the lead use case. Plotted against the year-one operating-budget target. Most programmes I audit do not measure this until month nine, by which point the unit economics have drifted and the budget conversation in month twelve is unwinnable. Measuring from month one and reviewing every six months is the discipline that makes the operating-budget transition survivable.
Item three: the budget-base classification. Discretionary, transitioning, or operating, with the named executive who owns the transition. The parent hub argues that the discretionary-to-operating transition is the single most predictable failure point in enterprise AI work. The checkpoint forces the question to have a current answer rather than a deferred one. If month six says “still discretionary, no named owner,” that is the issue to escalate, not the platform architecture.
Item four: the kill criterion for each in-flight workstream. For every workstream currently running, the data that would tell us to stop. Stated as a threshold, not as a feeling. “If contact-centre handle-time reduction is under 8% by month nine, we kill the workstream and reallocate the team.” The threshold is not the prediction; the prediction is item one. The threshold is the floor below which the workstream stops being worth its operating cost. Kill criteria written into the checkpoint at month zero are what make stopping a workstream look like discipline rather than failure.
Two optional items round out the checkpoint when they are relevant. The first is a vendor-concentration review: what share of the AI platform spend is going to a single vendor, and what the substitution cost is if that vendor’s economics change. The second is a regulatory-calendar update: the EU AI Act August 2026 deadlines, any sectoral regulator activity, and the status of internal compliance work against the calendar.
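The four required items can be held as a small structured record so that a missing answer is visible rather than deferred. What follows is a minimal Python sketch under stated assumptions, not a prescribed tool: the names `Checkpoint`, `Workstream`, `BudgetBase` and `escalations` are hypothetical, as are the sample figures, and the checks cover only the conditions the items above describe.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class BudgetBase(Enum):
    """Item three: the budget-base classification."""
    DISCRETIONARY = "discretionary"
    TRANSITIONING = "transitioning"
    OPERATING = "operating"


@dataclass
class Workstream:
    name: str
    # Item four: the threshold statement, or None if nobody has written one yet.
    kill_criterion: Optional[str]


@dataclass
class Checkpoint:
    dated: date
    owner: str                        # named owner who signs the page
    assumption: str                   # item one: one falsifiable sentence
    unit_cost: float                  # item two: current cost per unit of work
    unit_cost_target: float           # item two: year-one operating-budget target
    budget_base: BudgetBase           # item three
    transition_owner: Optional[str]   # item three: executive who owns the transition
    workstreams: List[Workstream] = field(default_factory=list)

    def escalations(self) -> List[str]:
        """Return the issues this checkpoint should escalate (hypothetical rules)."""
        issues = []
        if self.unit_cost > self.unit_cost_target:
            over = (self.unit_cost / self.unit_cost_target - 1) * 100
            issues.append(f"unit cost {over:.0f}% over target")
        if self.budget_base is not BudgetBase.OPERATING and not self.transition_owner:
            issues.append(f"budget base {self.budget_base.value}, no named owner")
        for ws in self.workstreams:
            if not ws.kill_criterion:
                issues.append(f"no kill criterion for {ws.name}")
        return issues


# Illustrative values only; none of these figures come from a real engagement.
checkpoint = Checkpoint(
    dated=date(2026, 6, 30),
    owner="Programme lead",
    assumption="Tier-one average handle time falls by at least 18% "
               "within twelve weeks of full rollout",
    unit_cost=1.50,
    unit_cost_target=1.00,
    budget_base=BudgetBase.DISCRETIONARY,
    transition_owner=None,
    workstreams=[Workstream("contact-centre summarisation", None)],
)

for issue in checkpoint.escalations():
    print(issue)
```

The record deliberately has no free-text "status" field. Each item is either answered or flagged, which is the same property the one-page document is meant to have.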
The assumption-kill criterion, expanded
The parent hub introduced the assumption-kill criterion as the design choice that makes mid-programme pivots cheap. The execution-level version is sharper, and the examples are worth naming.
The kill criterion is the threshold at which the load-bearing assumption is judged to have failed. It has three properties. First, it is a number, not a narrative: “below 8% handle-time reduction,” not “if the impact is disappointing.” Numbers can be argued about with data; narratives can be argued about with seniority, and seniority wins any argument conducted in narrative. Second, the data source is named, not implied: “the contact-centre operations dashboard, weekly average of the last four weeks,” not “performance data.” When the kill conversation happens, the dashboard is open and the threshold is visible. There is nothing to relitigate. Third, the decision-maker is named, not assumed: “the COO, with thirty-six hours’ notice to the CEO,” not “leadership.” When the data crosses the threshold, the named decision-maker is the one to call the meeting. Nobody else can.
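The three properties translate directly into a record with a number, a named source and a named decision-maker. The sketch below is one possible encoding, assuming weekly readings are available as a plain list; the class name `KillCriterion`, its fields, and the sample readings are hypothetical and not taken from any particular dashboard product.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class KillCriterion:
    metric: str           # what is being measured
    floor: float          # property one: a number, not a narrative
    data_source: str      # property two: named, not implied
    decision_maker: str   # property three: named, not assumed
    window_weeks: int = 4  # how many recent weekly readings are averaged

    def crossed(self, weekly_values: Sequence[float]) -> bool:
        """True when the average of the most recent window is below the floor."""
        if len(weekly_values) < self.window_weeks:
            raise ValueError("not enough weekly readings to evaluate the criterion")
        recent = weekly_values[-self.window_weeks:]
        return mean(recent) < self.floor


criterion = KillCriterion(
    metric="tier-one handle-time reduction (%)",
    floor=8.0,
    data_source="contact-centre operations dashboard, "
                "weekly average of the last four weeks",
    decision_maker="COO",
)

# Hypothetical weekly reduction figures, oldest first.
readings = [9.1, 7.4, 6.8, 6.2, 5.9]

if criterion.crossed(readings):
    print(f"{criterion.decision_maker} calls the meeting; "
          f"source: {criterion.data_source}")
```

Freezing the dataclass is a deliberate choice in this sketch: the floor is set at month zero and cannot be quietly edited once the data starts arriving.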
The kill criterion is not a prediction of failure. It is a pre-committed response to data, designed to be invoked before the political cost of stopping the work has compounded past the point where stopping is possible. The programmes that run kill criteria as a routine part of the checkpoint cycle are the programmes that pivot cheaply. The programmes that treat each potential stop as a unique political event are the programmes whose pivots get expensive — both in sunk cost and in team trust, which is the more lasting damage.
The honest version of this design — and the version that fails when it is not honest — requires the executive sponsor to actually invoke the criterion when the data crosses the threshold. I have watched executive sponsors look at a dashboard showing the threshold being missed by 40% and convince themselves that the next month’s data will recover. It does not. The threshold was the point at which the recovery story stopped being credible; the executive sponsor who waits past the threshold has chosen to keep the work alive for political reasons rather than evidentiary ones, and the cost of the eventual stop is now larger by the months they delayed.
What the eighteen-month outline looks like
The eighteen-month version of an enterprise AI roadmap is intentionally lighter than the six-month detail. Three things go in it: the lead use case (or use cases, if running the platform-extraction pattern), the named capabilities the platform will extend to by month eighteen, and the budget-base trajectory — when the move from discretionary to operating is scheduled, and which executive owns the case. Nothing else. The temptation to write the eighteen-month version with the same detail as the six-month version is the temptation to commit to specifics that cannot survive contact with a shifting capability landscape. The six-month checkpoint is the place to re-cut the eighteen-month outline; the outline itself should stay light enough to be re-cut.
The board presentation that survives the next planning cycle is the one that distinguishes the strategy (two- to three-year horizon, decisions, durable), the eighteen-month roadmap outline (durable-ish, sequencing, the budget-base trajectory), and the six-month detail (specific, dated, with named owners). The presentations that conflate the three are the presentations that produce the failure modes the parent hub describes — the roadmap as a contract between the AI team and the board, in which month-six commitments become political commitments that survive the month-three falsifying data.
Where to go next
If you have a strategy document and are starting roadmap work, the lead-use-case pattern is the right default until you have specific evidence that one of the other two patterns is a better fit. The specific evidence is rare; the temptation to skip to platform-extraction or federated is common.
If you are mid-programme and the month-nine budget transition is approaching, the cost-trajectory work needs to start now, not at month nine. The transformation-strategy piece covers the strategic version of this question; the operational version is the checkpoint discipline above.
If you are pre-strategy and somebody has handed you a brief titled “enterprise AI roadmap development” without an underlying strategy document, write the strategy first using the four-question diagnostic at the root hub, then come back. A roadmap without a strategy is a Gantt chart pretending to be a plan, and the gap will surface in the first board review.
The most leverage in this work is in the small artefacts — the six-month checkpoint, the assumption-kill criterion, the budget-base classification. They are unglamorous, they do not photograph well in a board presentation, and they are the difference between a programme that survives the eighteen-month mark and one that does not.
Sources & methodology
- Microsoft Cloud Adoption Framework — AI workloads — sequencing-pattern reference
- Google Cloud AI Adoption Framework — platform-extraction patterns
- NIST AI Risk Management Framework, v1.0 — kill-criterion and governance-checkpoint underpinnings
- Conway, M. E. (1968), “How do committees invent?” — federated-team architecture
- Brooks, F. (1975), “The Mythical Man-Month” — platform-extraction staffing arithmetic
- Methodology: pattern observations and checkpoint template drawn from fractional CTO and CIO engagements (2023–2026), ~20 operating-budget AI programmes, anonymised by sector and headcount. Checkpoint template is CC-BY-4.0.
If a claim looks wrong, send it and I will publish the correction with attribution.
