How many AI governance platforms should we actually evaluate?

Three, inside one archetype, against a four-criterion scoring sheet, run as a four-week proof of concept on a real model from your inventory. Not thirty, not nine months, not slide-ware. The mistake most procurement teams make is confusing the archetypes for variations of the same product; they are not. A model-lifecycle platform and a policy-and-risk platform answer different procurement questions, and putting them in the same RFP guarantees the wrong winner.

Which archetype should we buy first?

Whichever one answers the procurement question your organisation actually has. If you have a large MLOps-native model estate and no inventory, buy a model-lifecycle platform first. If you have a mature GRC programme and AI is the new domain, extend it with a policy-and-risk platform. If your shadow AI lives in third-party LLM APIs called by application engineers, buy LLM observability. If you are 95% on one hyperscaler, the native suite is the cheapest workable answer. Buying the wrong archetype first is the failure mode I see most.

What is the four-criterion scoring sheet?

Model coverage (does it actually see your real inventory, including third-party LLM APIs), deployment-gate integration (can it produce the artefacts a CISO's gate procedure requires), evidence-trail quality (is the audit trail an auditor will accept without remediation), and three-year total cost (licence plus integration plus the FTE time the platform will demand). Each scored one to five, weighted to your context. The full sheet is published under CC-BY-4.0 and linked from the governance hub.

Does the EU AI Act require a governance platform?

No. The Act requires documented processes, technical documentation, post-market monitoring, and a risk management system. A spreadsheet plus signed-off documents can satisfy every Article. Platforms make the documentation cheaper to maintain at scale; they do not satisfy the obligation on their own. Buying a platform in July 2026 to prepare for August 2026 is panic procurement and will not produce a working programme. The Act rewards organisations that started in 2024, not organisations that bought software in 2026.

What is the procurement category most teams miss?

Third-party LLM API governance. In 2026 the largest single source of shadow AI in most enterprises is application engineers calling OpenAI, Anthropic, Cohere, or Mistral APIs directly from product code, often through a serverless function the security team has never inventoried. None of the model-lifecycle platforms handle this well — they are built around MLOps-native pipelines that this traffic bypasses entirely. The LLM observability vendors handle it best, and the hyperscaler suites are starting to catch up. If your shadow AI lives here, do not buy a model-lifecycle platform first.

AI Governance Tools: The 35-Platform Landscape, Scored Honestly

Will the 35-platform market consolidate?

Heavily. At least twenty of the thirty-five platforms in 2026 will be acquired or pivoted out of the category by 2028. The signal is already in the funding pattern — LLM observability is consolidating into the security stack, model-lifecycle platforms are being absorbed by MLOps platforms, and the policy-and-risk overlays are being acquired by the broader GRC vendors. Plan for the platform you buy today to either become part of a larger suite or disappear. Contract terms should reflect that — annual or two-year, not five.

Tom Prommer · CIO/CTO · Updated 2026-05-29 · 26 min read

Executive summary

The four archetypes of the AI governance tooling market in 2026 — model-lifecycle, policy and risk, LLM observability, hyperscaler-native — with a single-paragraph verdict on every named vendor and the four-week proof-of-concept methodology that beats the nine-month RFP.

The procurement meeting I want to describe took place in a Zurich conference room three months ago, on the twentieth floor of a building belonging to a European insurer I will not name. There were nine people at the table. Six were vendor representatives — two from a model-lifecycle platform, two from a policy-and-risk overlay, two from an LLM observability tool — and three were the buyer side: the CISO, the Head of Data Risk, and the procurement lead. The agenda said AI governance platform selection, final round. By the second coffee break it was clear that the three vendor pairs were not competing for the same procurement. The model-lifecycle people were pitching against an MLOps estate the insurer did not yet have. The policy-and-risk people were pitching against the existing OneTrust deployment, which the buyer had not told them about. The LLM observability people were the only ones describing the actual problem — application engineers calling third-party APIs without inventory — but they were the smallest line item and the procurement lead kept trying to map them onto the model-lifecycle vendor’s scoring sheet. The meeting ended without a decision. The buyer commissioned another round. I watched €180,000 of vendor and buyer time evaporate because nobody in the room had named the archetype before the demos started.

That is the structural failure of AI governance tooling procurement in 2026, and it is the reason this page exists. There are roughly thirty-five platforms positioning themselves as enterprise AI governance solutions. They are not interchangeable. They do not solve the same procurement question. They cannot be evaluated against a single scoring sheet without producing a decision that satisfies nobody. The market splits cleanly into four archetypes — model-lifecycle, policy and risk, LLM-specific observability, hyperscaler-native — and the highest-leverage move a procurement team can make is to pick the archetype before picking the vendor. Everything else flows from that.

This page is the long version of the tooling discussion in the governance hub. It walks each archetype in detail, names the standout vendors, gives a one-paragraph verdict on each, and ends with the four-week proof-of-concept methodology that produces a defensible decision in less time than a single round of an enterprise RFP. The four-criterion scoring sheet — model coverage, deployment-gate integration, evidence-trail quality, three-year total cost — is published under CC-BY-4.0 and linked at the bottom. Fork it, change the weights, publish a fork with different verdicts and I will link it from the next refresh.

The four archetypes, named and bounded

Before any vendor, the archetype. The four are distinguished by what procurement question they answer, not by what features they ship. Feature overlap across archetypes is significant and rising; the question they answer is the durable distinction.

Model-lifecycle platforms answer the question how do we govern a large MLOps-native model estate from ideation through deprecation. They assume a CI/CD-shaped world where models are trained, registered, versioned, and deployed through a pipeline the platform can hook into. The strongest ones produce model cards, evaluation logs, drift monitoring, and approval workflows that integrate with MLflow, Kubeflow, SageMaker, or Vertex. They are weakest where most enterprise AI actually lives in 2026, which is outside MLOps pipelines: in third-party LLM API calls, in vendor-embedded AI features, in copilots bolted onto SaaS products the platform does not see.

Policy and risk management platforms answer the question how does AI extend the GRC programme we already run. They assume an existing risk register, an existing policy library, an existing control framework, and treat AI as a new domain in that framework. The strongest ones produce policy mappings, control attestations, third-party assessments, and board-level reporting that auditors recognise from the broader GRC world. They are weakest on the technical evaluation evidence the deployment gate actually needs — they record that a control exists, not whether a model passed a robustness test.

LLM-specific observability platforms answer the question what are the models we have deployed actually doing in production, and how do we catch the failure modes before a customer does. They assume models in production, traffic flowing through them, and the need to evaluate outputs continuously. The strongest ones produce real-time monitoring, evaluation harnesses, drift detection, hallucination scoring, prompt-injection defences, and red-team automation. They are weakest on the policy and evidence-trail side that auditors want — they tell you the model is misbehaving today, not that the model was approved through a documented procedure in March.

Hyperscaler-native suites answer the question we are already 95% on one cloud, what is the cheapest workable answer that does not require a separate procurement. They assume the deployment plane is theirs and bundle governance into the broader platform. The strongest ones — AWS Bedrock Guardrails, Google Vertex AI governance, Azure AI Foundry safety stack — produce model registries, guardrails, evaluation tooling, and audit logs that integrate with the rest of the cloud account at marginal cost, typically under €100k per year of incremental spend for large estates already committed to the platform. They are weakest as standalone governance — they govern what runs on their cloud, and they govern it less well than the best-of-breed alternatives, but the integration economics are unbeatable inside a single-cloud estate.

The procurement signal that tells you which archetype to buy is not in the feature list. It is in the question your CISO and CAIO are arguing about. If the argument is we cannot see what our data scientists are deploying, you need model-lifecycle. If the argument is we cannot demonstrate to the auditor that AI is in our GRC programme, you need policy and risk. If the argument is we shipped a chatbot and we do not know if it is hallucinating, you need LLM observability. If the argument is we just need this to be done and we are already on AWS, you need hyperscaler-native. Mapping the argument to the archetype takes ten minutes and saves nine months.
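The ten-minute mapping can be written down as a lookup table. The sketch below is illustrative only: the argument labels (`cannot_see_deployments` and so on) are hypothetical shorthand for the four arguments described above, not terms from any vendor or standard.

```python
from enum import Enum


class Archetype(Enum):
    MODEL_LIFECYCLE = "model-lifecycle"
    POLICY_AND_RISK = "policy and risk"
    LLM_OBSERVABILITY = "LLM observability"
    HYPERSCALER_NATIVE = "hyperscaler-native"


# Hypothetical labels for the four CISO/CAIO arguments named in the text.
ARGUMENT_TO_ARCHETYPE = {
    "cannot_see_deployments": Archetype.MODEL_LIFECYCLE,
    "cannot_evidence_grc": Archetype.POLICY_AND_RISK,
    "unknown_hallucination": Archetype.LLM_OBSERVABILITY,
    "just_get_it_done_single_cloud": Archetype.HYPERSCALER_NATIVE,
}


def archetype_for(argument: str) -> Archetype:
    """Return the archetype that answers the named argument."""
    try:
        return ARGUMENT_TO_ARCHETYPE[argument]
    except KeyError:
        # An unnamed argument is the failure mode: refuse to guess.
        raise ValueError(f"unrecognised argument: {argument!r}") from None


print(archetype_for("unknown_hallucination").value)
```

The point of raising on an unknown label rather than defaulting is that a team which cannot name its argument is not ready to shortlist.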

Model-lifecycle platforms

This is the archetype most commonly mis-bought, because it is the one the analyst grids treat as the centre of the category. The vendors here have the most polished demos, the most articulate marketing, and the most senior reference customers. They are also the archetype most likely to fail to address the actual shadow AI problem, because their architecture assumes a model estate the buyer often does not have.

Credo AI. The best-known of the model-lifecycle platforms and the one most often shortlisted on name recognition alone. Strongest on the policy-to-evidence mapping — its responsible AI policy packs translate into evaluable criteria more cleanly than most competitors, and the model registry workflow is well-designed. Weakest claim is the breadth of model coverage: in three engagements I have seen, the platform performed well against MLflow-native models and badly against the actual inventory once third-party LLM APIs were counted. The procurement question Credo answers is how do we govern a known MLOps estate with policy traceability. If that is your question, it is a strong choice. If your shadow AI lives in vendor APIs, it is not.

Holistic AI. Strongest in the EU regulatory-mapping work — the platform’s EU AI Act conformity workflows are the most thorough I have evaluated, and the fundamental-rights impact assessment tooling is genuinely useful for high-risk Annex III use cases. Weakest on operational monitoring; the runtime evaluation story is thinner than the policy story. Holistic AI answers the question how do we produce defensible EU AI Act documentation for a small portfolio of high-risk systems. For an EU-operating enterprise with a small number of high-risk models, the fit is excellent. For a large estate with mostly low-risk items, the platform is heavier than the use case requires.

Fairly AI. Smaller, focused on bias and fairness evaluation as the centre of gravity rather than as a feature. Strongest where the regulatory or reputational driver is fairness-specific — credit decisioning, hiring, insurance pricing. Weakest as a general-purpose governance platform; the policy and evidence-trail surfaces are narrower than Credo’s or Holistic’s. Fairly answers the question how do we evidence fairness for a small number of high-stakes decision systems. A precise tool for a precise problem.

IBM watsonx.governance. The enterprise-IT-incumbent play. Strongest where the buyer is already an IBM shop with existing watsonx workloads — the integration story is genuinely deep, the audit trail is mature, and the platform inherits IBM’s regulatory credibility in finance and healthcare. Weakest as a greenfield choice; if you are not already on the IBM stack, the platform’s gravitational pull toward the rest of the watsonx suite is significant. IBM answers the question how do we govern AI inside an IBM-anchored enterprise architecture. If that describes you, it is the obvious answer. If it does not, the procurement will keep widening.

Microsoft Purview AI Hub. Microsoft’s late entry, riding on the Purview compliance estate the enterprise already pays for. Strongest where Microsoft 365 Copilot and Azure-hosted models are the primary AI surface — the integration with Purview’s existing data-classification and DLP is real, and the marginal cost is low because Purview is already deployed. Weakest as a multi-cloud or third-party API governance tool; Purview sees what Microsoft sees, which is a lot inside the Microsoft estate and very little outside it. Purview answers the question how do we extend our existing Microsoft compliance posture to cover AI. For Microsoft-anchored enterprises, the fit is strong and the procurement is short.

The procurement signal for model-lifecycle is the presence of a real MLOps pipeline. If you have MLflow, Kubeflow, SageMaker Pipelines, or Vertex Pipelines in actual use — not just installed — the archetype fits. If you do not, the platform will sit underused and the inventory it was supposed to govern will continue to live in spreadsheets. Three-year total cost for an enterprise deployment of one of these platforms runs between €380,000 and €1.4M depending on scale, integration complexity, and how much of the FTE work the buyer absorbs versus contracts to the vendor’s professional services arm. Budget the integration cost as twice the licence cost in year one; the platforms that price licence-heavy and integration-light are usually under-pricing the integration.
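As a rough sketch of that budgeting rule, the function below adds licence, integration, and FTE cost into a single three-year figure. It assumes integration is a one-off year-one cost equal to twice the annual licence, which is one reading of the rule of thumb above; the field names and the example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostAssumptions:
    annual_licence_eur: float
    annual_fte_eur: float  # internal staff time needed to operate the platform
    integration_multiple_year_one: float = 2.0  # rule of thumb from the text
    years: int = 3


def three_year_total_cost(a: CostAssumptions) -> float:
    """Licence plus integration plus FTE time, as one euro figure."""
    licence = a.annual_licence_eur * a.years
    # Assumption: integration is paid once, in year one.
    integration = a.annual_licence_eur * a.integration_multiple_year_one
    fte = a.annual_fte_eur * a.years
    return licence + integration + fte


# Hypothetical inputs: EUR 100k licence and EUR 60k of FTE time per year.
example = CostAssumptions(annual_licence_eur=100_000, annual_fte_eur=60_000)
print(three_year_total_cost(example))  # 680000.0 under these assumptions
```

Keeping the assumptions in a frozen dataclass makes them easy to attach to the proposal, which is what the scoring sheet asks for when it requires a single euro figure with documented assumptions.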

Policy and risk management platforms

This archetype is the easiest to underestimate, because the platforms here do not look like AI tools. They look like the GRC software the risk team already runs, with an AI module added. That is precisely what they are, and for a large class of enterprises it is the correct procurement answer.

OneTrust AI Governance. The dominant player by installed base. Strongest where the buyer already runs OneTrust for privacy, third-party risk, or broader GRC — the AI module slots into existing workflows, existing assessment templates, and existing reporting cadences with low marginal effort. Weakest on the technical evaluation surface; the platform records that an AI risk assessment was completed, not whether the model passed a robustness test. OneTrust answers the question how do we add AI to the GRC programme without rebuilding the GRC programme. For OneTrust-anchored enterprises, this is the lowest-friction answer in the market.

Diligent AI. The board-and-audit-committee-facing variant. Strongest where the procurement driver is board-level reporting and audit-committee visibility into AI risk — Diligent’s strength has always been the board portal, and the AI module extends that. Weakest on the operational gate; it is not where the engineering team will live day to day. Diligent answers the question how does the board see AI risk in the same dashboard as the rest of enterprise risk. A precise answer for a specific buyer.

ServiceNow AI Governance. The workflow-engine play. Strongest where ServiceNow already runs the enterprise’s IT and risk workflows — the AI module piggybacks on existing approval chains, existing ticketing, existing CMDB integration. Weakest where ServiceNow is not already entrenched, because the value is in the workflow integration rather than in the AI-specific tooling. ServiceNow answers the question how do we route AI approvals through the workflow platform our enterprise already uses. Strong fit for ServiceNow-anchored enterprises, weak for everyone else.

RSA Archer AI extensions. The traditional-GRC incumbent’s response. Strongest where the broader Archer GRC programme is mature and well-instrumented; the AI extensions inherit the existing control library and the existing audit trail. Weakest where Archer itself is showing its age — and in 2026 it is showing its age, with several enterprise customers I have talked to actively evaluating migrations off the platform. Archer answers the question how do we extend a mature legacy GRC programme to cover AI. A reasonable answer in the short term; a risky bet for a five-year platform commitment.

The procurement signal for policy and risk is the maturity of the existing GRC programme. If you have a real risk register, real control attestations, real audit cycles, and a real internal-audit function that uses them, the archetype fits. If your GRC programme is a SharePoint folder, no AI overlay on top of it will produce a working governance posture; you need to fix the underlying GRC programme first. Three-year total cost in this archetype is harder to isolate because the AI module rides on a broader licence; the marginal cost ranges from €60,000 to €450,000 a year depending on the parent platform’s pricing tier.

LLM-specific observability platforms

This is the archetype most relevant to the procurement category most teams miss. Third-party LLM API governance (the OpenAI, Anthropic, Cohere, Mistral, and Google traffic that application engineers are routing from product code) is where shadow AI actually lives in 2026, and neither of the first two archetypes handles it well. The platforms in this archetype were built for it, and they are the ones the CISO should care about most if the inventory reveals heavy vendor-API exposure.

Arize AI. The most mature of the LLM observability platforms. Strongest on the evaluation-harness story — Arize’s evaluation tooling, dataset management, and drift monitoring are the closest the market has to a production-grade ML observability suite extended cleanly to LLMs. Weakest on the policy and evidence-trail surface; it tells you what the model is doing, not what the procedure says it should be doing. Arize answers the question how do we monitor and evaluate LLM behaviour in production at enterprise scale. The strongest standalone answer in the archetype.

Fiddler AI. Strongest on the model-monitoring-with-explainability angle — Fiddler’s explanation tooling is genuinely useful when the regulatory or customer-trust driver requires interpretable outputs. Weakest on the LLM-native evaluation tooling, which has improved but lags Arize. Fiddler answers the question how do we monitor models including LLMs while preserving an explainability story for regulators. A strong choice for regulated-industry use cases.

WhyLabs. Strongest on the data-drift and feature-monitoring side, where WhyLabs has built a real platform with reasonable pricing for mid-sized deployments. Weakest on the LLM-specific evaluation surface, which is improving but is not the centre of the platform’s gravity. WhyLabs answers the question how do we monitor data and model drift at a price point that does not require a six-figure budget. The pragmatic choice for mid-market deployments.

Lakera. The prompt-injection and adversarial-defence specialist. Strongest where the LLM use case is customer-facing and the attack surface is the centre of the procurement concern — Lakera Guard’s runtime defences are the most production-tested in the category. Weakest as a general-purpose governance platform; it is a specific control for a specific failure mode. Lakera answers the question how do we defend a customer-facing LLM against prompt injection and adversarial inputs. A precise tool, often bought alongside one of the broader platforms rather than instead of them.

Cisco Robust Intelligence (formerly Robust Intelligence, acquired by Cisco in 2024). Strongest where the broader Cisco security stack is already deployed and the AI security extension is sold into the existing relationship. Weakest as a standalone evaluation. The acquisition has slowed the platform’s product roadmap noticeably; if you are evaluating in 2026, ask hard questions about the Cisco-integration path and what the standalone roadmap looks like for the next twenty-four months. Robust Intelligence answers the question how does AI security fit into our existing Cisco security posture. A strong fit for Cisco-anchored enterprises, a more difficult call for everyone else.

Guardrails AI. The open-source-led entrant. Strongest where engineering teams want to instrument their own LLM applications with output validation, type-checking, and policy enforcement at the library level rather than buying a platform. Weakest as a governance posture for a non-technical buyer; the value is real but it lives in the engineering team’s code, not in a dashboard a CISO can review. Guardrails answers the question how do our application engineers enforce LLM output policies in code. A real answer for engineering-led organisations, less of one for compliance-led organisations.

The procurement signal for LLM observability is the presence of LLM traffic in production. If you have shipped, or are about to ship, a customer-facing LLM-powered feature, this archetype is mandatory and the procurement question is which vendor, not whether to buy. Three-year total cost ranges from €120,000 for a Guardrails-led engineering-managed deployment to €900,000+ for an enterprise Arize deployment with full evaluation tooling and red-team integration.

Hyperscaler-native suites

This archetype is the one most under-discussed in the analyst literature, because the analysts are paid by the standalone vendors, not by the hyperscalers whose governance tooling ships as a footnote in a broader cloud bill. For single-cloud enterprises, it is often the correct answer and is consistently the cheapest workable one.

AWS SageMaker Model Cards plus Bedrock Guardrails. Strongest where the buyer has standardised on AWS and is using SageMaker for training or Bedrock for inference. Model Cards produce credible governance artefacts inside the AWS console, Bedrock Guardrails provides reasonable runtime defences for hosted-model traffic, and the integration with CloudTrail and CloudWatch gives auditors an audit trail they already understand. Weakest on multi-cloud coverage and on the specific evaluation tooling the LLM observability vendors do better. AWS answers the question how do we govern the AI we run inside AWS without a separate procurement. The right answer for AWS-anchored enterprises; insufficient as the only answer for everyone else.

Google Vertex AI Model Garden and governance tooling. Strongest where the buyer is on Google Cloud and using Vertex for model serving — the Model Garden provides a governance-aware model registry, the Responsible AI Toolkit gives reasonable evaluation tooling, and the integration with Google Cloud’s audit logging is mature. Weakest as a governance posture for non-Google AI — the platform is most useful when the models live inside the Vertex estate. Vertex answers the question how do we govern the AI we run on Google Cloud. A clean fit for Google-anchored enterprises.

Azure AI Foundry safety stack. The most recently rebranded of the three hyperscaler offerings — Foundry is the consolidation of what used to be Azure AI Studio, Azure ML, and the various safety services. Strongest where Microsoft 365 Copilot, Azure OpenAI, and the broader Microsoft estate are the primary AI surface. Combined with Purview (above), it produces the most integrated single-vendor governance posture in the market for Microsoft-anchored enterprises. Weakest on multi-cloud and on independent-evidence claims; the audit story is Microsoft says Microsoft did the right thing, which is fine until a regulator wants third-party validation. Foundry answers the question how do we govern AI inside the Microsoft cloud and productivity estate. For Microsoft-anchored enterprises, the answer is increasingly compelling.

The procurement signal for hyperscaler-native is the percentage of AI workload that lives on one cloud. Above 80% on a single cloud, the hyperscaler suite is the default answer and any other procurement needs a positive justification. Below 50%, the hyperscaler suites become point solutions for the part of the estate that lives on their cloud, and you need a multi-cloud overlay from one of the other archetypes. The cost story is the strongest of the four archetypes — marginal cost on top of an existing cloud bill is often under €100,000 a year for substantial AI estates — but the lock-in is the strongest too, and Conway’s Law applies brutally here: the governance posture will end up shaped like the cloud’s deployment model, whether or not that matches your organisation.
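The two thresholds translate into a small classifier. This is a sketch under one stated assumption: the text gives guidance above 80% and below 50% only, so the middle band is returned as an explicit judgement call rather than resolved. The function name and return labels are hypothetical.

```python
def hyperscaler_position(single_cloud_share: float) -> str:
    """Classify the role of the hyperscaler suite from the share of AI
    workload (0.0 to 1.0) that runs on a single cloud."""
    if not 0.0 <= single_cloud_share <= 1.0:
        raise ValueError("share must be between 0.0 and 1.0")
    if single_cloud_share > 0.80:
        # Default answer; any other procurement needs a positive justification.
        return "default"
    if single_cloud_share < 0.50:
        # Covers only the on-cloud slice; a multi-cloud overlay is still needed.
        return "point-solution"
    # Assumption: the 50-80% band is not specified in the text.
    return "judgement-call"


print(hyperscaler_position(0.95))
print(hyperscaler_position(0.40))
```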

The category most teams miss: third-party LLM API governance

Here is the procurement question that does not appear on the analyst grids and that is the largest single source of shadow AI in most enterprises I have audited in the last eighteen months. Application engineers are calling OpenAI, Anthropic, Cohere, Mistral, and Google APIs directly from product code, often through serverless functions, often through a single shared API key, and almost always without the security or governance team knowing the calls exist. The model-lifecycle platforms do not see this traffic because it does not go through their pipeline. The policy-and-risk platforms do not see it because the AI risk assessment workflow assumes you knew the AI use was happening. The hyperscaler suites see only the calls routed through their own LLM services. The LLM observability platforms see this traffic if and only if you instrument it deliberately, and the instrumentation work is real.

The procurement category that handles this is LLM gateway tooling — a category that overlaps with LLM observability but is procurement-distinct. Platforms like Portkey, Helicone, LiteLLM Enterprise, and the LLM-specific features in cloud API gateways function as the chokepoint between application code and the upstream LLM provider. Once the chokepoint exists, the governance team gets visibility, rate-limiting, logging, redaction, and audit trail; without it, the governance team gets a year-end surprise when finance asks why the OpenAI bill is €240,000 and security asks whose models the company is fine-tuning.
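To make the chokepoint idea concrete, here is a minimal in-process sketch, not the API of Portkey, Helicone, LiteLLM, or any cloud gateway. The `Gateway` class, its fields, the per-team call cap, and the email-only redaction rule are all assumptions for illustration; `upstream` stands in for whatever client actually calls the provider.

```python
import re
import time
from dataclasses import dataclass, field
from typing import Callable

# Illustrative redaction rule only: masks email addresses in prompts.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class Gateway:
    upstream: Callable[[str, str], str]  # (provider, prompt) -> completion
    max_calls_per_team: int = 100  # crude rate limit, hypothetical
    audit_log: list = field(default_factory=list)
    calls: dict = field(default_factory=dict)

    def complete(self, team: str, provider: str, prompt: str) -> str:
        used = self.calls.get(team, 0)
        if used >= self.max_calls_per_team:
            self._record(team, provider, "", "rate_limited")
            raise RuntimeError(f"team {team!r} exceeded its call budget")
        self.calls[team] = used + 1
        redacted = EMAIL.sub("[REDACTED_EMAIL]", prompt)
        response = self.upstream(provider, redacted)  # only redacted text leaves
        self._record(team, provider, redacted, "ok")
        return response

    def _record(self, team: str, provider: str, prompt: str, outcome: str) -> None:
        # Every call, allowed or refused, lands in the audit trail.
        self.audit_log.append(
            {"ts": time.time(), "team": team, "provider": provider,
             "prompt": prompt, "outcome": outcome}
        )


def fake_upstream(provider: str, prompt: str) -> str:
    """Stand-in for a real provider client."""
    return f"{provider} saw: {prompt}"


gw = Gateway(upstream=fake_upstream)
print(gw.complete("payments", "openai", "Summarise the ticket from jane@example.com"))
```

A production gateway would sit on the network path rather than in the application process, so that engineers cannot bypass it; the sketch only shows which governance functions hang off the single call site.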

If your inventory reveals substantial vendor-API exposure, and most 2026 inventories do, buy gateway tooling before, or alongside, the broader governance platform. The cost is small (typically €30,000–€120,000 a year depending on volume), the integration is engineering-led rather than procurement-led, and the value is immediate because you can see what traffic exists for the first time. This is the single highest-leverage tooling purchase I have seen in 2026, and it is the one analyst grids systematically under-weight because it does not fit the archetype taxonomy they built in 2023.

The four-week proof of concept that beats the nine-month RFP

Once the archetype is named and the shortlist is three, the next mistake to avoid is the nine-month RFP. Enterprise RFPs in this category produce a decision that is stale on signing day because the platforms iterate faster than the procurement cycle. The methodology I run instead is four weeks, structured against four criteria, executed on a real model from the buyer’s actual inventory.

Week one is criteria definition and inventory selection. The buyer’s CISO, CAIO, and procurement lead agree the four-criterion scoring sheet, weighted to context. Model coverage, deployment-gate integration, evidence-trail quality, three-year total cost. Pick one real model from the inventory that is representative of the procurement question — for model-lifecycle, an MLOps-deployed model; for LLM observability, a customer-facing LLM feature; for policy and risk, a high-risk decision system. The model is the test bed for all three vendors.

Week two is parallel vendor deployment. All three shortlisted vendors get the same scope, the same model, the same access, and the same success criteria. They get one week. The vendors that protest the timeline are signalling that their platform is not deployable in a week; that is useful procurement information. The vendors that adapt are signalling that their delivery organisation can move at enterprise speed; that is also useful procurement information.

Week three is evidence review. The buyer’s CISO walks through the artefacts each vendor produced against the deployment-gate criteria — model card, evaluation log, audit trail, monitoring evidence — and scores each on the four criteria. The CAIO scores model coverage against the inventory. Procurement scores three-year cost against the negotiated proposal. The scoring is done blind first (each scorer scores independently), then reconciled.
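One way to run the blind-then-reconcile step for a single vendor is sketched below. The reconciliation rule is an assumption, since the text does not specify one: criteria where independent scores differ by more than one point are flagged for discussion, and the rest are averaged. The criterion keys and scorer names are hypothetical labels.

```python
from statistics import mean

CRITERIA = ("model_coverage", "gate_integration", "evidence_trail", "three_year_cost")


def reconcile(blind_scores: dict, max_spread: int = 1):
    """blind_scores maps scorer -> {criterion: score 1..5} for one vendor.

    Returns (agreed, disputed): agreed maps criterion -> mean score,
    disputed lists criteria whose scores differ by more than max_spread.
    """
    agreed = {}
    disputed = []
    for criterion in CRITERIA:
        # Scorers may cover different criteria, so collect whoever scored it.
        values = [s[criterion] for s in blind_scores.values() if criterion in s]
        if not values:
            continue
        if max(values) - min(values) > max_spread:
            disputed.append(criterion)
        else:
            agreed[criterion] = mean(values)
    return agreed, disputed


scores = {
    "ciso": {"model_coverage": 4, "gate_integration": 3,
             "evidence_trail": 5, "three_year_cost": 2},
    "caio": {"model_coverage": 2},
    "procurement": {"three_year_cost": 3},
}
agreed, disputed = reconcile(scores)
print(agreed)
print(disputed)
```

Flagging rather than averaging a wide disagreement keeps the reconciliation meeting focused on the artefacts that produced the gap.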

Week four is decision and contract. The winner is the vendor with the highest weighted score; ties are broken by three-year total cost. Contract terms are annual or two-year, never five — the consolidation timeline means a five-year commitment is locking in a vendor whose ownership may change within the contract. The methodology has produced a defensible decision in twenty working days, against a real model, with three vendor proposals and a documented scoring trail an internal-audit function can review.
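The decision rule is simple enough to state as code. This sketch assumes each vendor already has a reconciled weighted score and a single three-year cost figure; the `VendorResult` type and the vendor names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorResult:
    name: str
    weighted_score: float
    three_year_cost_eur: float


def pick_winner(results: list) -> VendorResult:
    """Highest weighted score wins; ties go to the lower three-year cost."""
    if not results:
        raise ValueError("no vendor results to compare")
    # Sort key: score descending (negated), then cost ascending.
    return min(results, key=lambda r: (-r.weighted_score, r.three_year_cost_eur))


shortlist = [
    VendorResult("Vendor A", 16.0, 900_000),
    VendorResult("Vendor B", 16.0, 620_000),
    VendorResult("Vendor C", 13.5, 400_000),
]
print(pick_winner(shortlist).name)  # Vendor B: tied on score, cheaper over three years
```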

This methodology costs the buyer about €40,000 in internal effort and €0 in external consulting. The savings against a typical nine-month enterprise RFP, where the consulting fees alone run €150,000–€400,000, are the largest single procurement saving I can point to in this category. The reason it works is that it forces the conversation onto the artefacts rather than the slides; the platforms that demo well and deliver badly fail in week two and self-eliminate.

The four-criterion scoring sheet, stated

The sheet I use is intentionally narrow. Four criteria, scored one to five, weighted to context. Total score out of twenty in the unweighted version; the weights move the math but not the structure.

Model coverage. Does the platform actually see your real inventory, including the parts of it that do not live in an MLOps pipeline? The test is not whether the platform supports your stack on paper; it is whether the platform produced governance artefacts for the test model within the four-week window. A five means it covers everything in the inventory; a one means it covers the MLOps slice and nothing else.

Deployment-gate integration. Can the platform produce the artefacts the CISO’s deployment gate procedure requires (model card, evaluation log, data-source manifest, incident runbook reference) in a form that integrates with the gate’s evidence trail? A five means the gate criteria are configurable in the platform and the artefacts are produced automatically as a byproduct of normal use; a one means the gate criteria are tracked in a separate spreadsheet and the platform produces marketing-shaped reports that the auditor will not accept.

Evidence-trail quality. Is the audit trail something an internal auditor or external regulator will accept without remediation? The test is to walk an audit-team-equivalent through the trail and ask whether they can answer the question "how do you know this model was governed properly?" A five means yes, the trail is self-explanatory and the auditor signs off without follow-up; a one means the trail exists but every audit question produces a request for additional documentation.

Three-year total cost. Licence plus integration plus the FTE time the platform will demand to operate. The number must be a single euro figure with documented assumptions. A five means the platform’s three-year total cost is at the low end of the archetype’s range and the vendor has been honest about integration cost; a one means the licence cost is low but the integration and FTE cost will be three times the licence and the vendor will not commit to a number.
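
A minimal sketch of how the single euro figure for the cost criterion might be assembled, so the assumptions sit next to the number. Every field name and every figure here is hypothetical and chosen for illustration; none of it comes from the published sheet or from any vendor quote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostAssumptions:
    """Documented inputs behind the three-year euro figure (all values hypothetical)."""
    annual_licence_eur: float
    integration_eur: float       # one-off integration work, internal or contracted
    operating_fte: float         # fraction of a full-time role needed to run the platform
    loaded_fte_cost_eur: float   # assumed fully loaded annual cost of one FTE
    years: int = 3


def licence_cost(a: CostAssumptions) -> float:
    """Licence spend over the whole contract horizon."""
    return a.annual_licence_eur * a.years


def total_cost(a: CostAssumptions) -> float:
    """Licence plus integration plus operating FTE time, as one euro figure."""
    fte_cost = a.operating_fte * a.loaded_fte_cost_eur * a.years
    return licence_cost(a) + a.integration_eur + fte_cost


def non_licence_multiple(a: CostAssumptions) -> float:
    """Integration plus FTE cost expressed as a multiple of the licence cost."""
    return (total_cost(a) - licence_cost(a)) / licence_cost(a)


# Illustrative inputs only; replace with the numbers from your own proof of concept.
example = CostAssumptions(
    annual_licence_eur=100_000,
    integration_eur=80_000,
    operating_fte=0.5,
    loaded_fte_cost_eur=120_000,
)

print(f"Three-year total: EUR {total_cost(example):,.0f}")
print(f"Integration plus FTE as a multiple of licence: {non_licence_multiple(example):.2f}")
```

The multiple is there because of the one-point description above: a result near three, combined with a vendor who will not commit to an integration number, is the pattern that scores a one.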

The unweighted score works for most procurements. The weighting matters when one criterion dominates the procurement question: for an EU-operating enterprise approaching the August 2026 deadline, evidence-trail quality might be weighted double; for a Microsoft-anchored enterprise on Foundry, integration with the existing estate is implicit in cost and the weight comes off. The full Google Sheet with worked examples is published under CC-BY-4.0 and linked from the governance hub. Fork it.
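
The arithmetic behind the sheet can be sketched in a few lines. This is an illustration under stated assumptions, not the published sheet's formula: the criterion keys are names I have made up for the example, the scores are invented, and rescaling the weighted total back to the four-to-twenty range is my own choice so that weighted and unweighted results stay comparable.

```python
from typing import Dict, Optional

# Hypothetical keys for the four criteria described above.
CRITERIA = (
    "model_coverage",
    "gate_integration",
    "evidence_trail",
    "three_year_cost",
)


def platform_score(
    scores: Dict[str, int],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted score rescaled to the unweighted 4-20 range.

    With no weights (or equal weights) this is the plain sum out of twenty.
    The rescaling is an assumption made for this sketch.
    """
    if weights is None:
        weights = {name: 1.0 for name in CRITERIA}

    for name in CRITERIA:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} must be scored one to five, got {scores[name]}")

    total_weight = sum(weights[name] for name in CRITERIA)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    weighted_sum = sum(scores[name] * weights[name] for name in CRITERIA)
    return weighted_sum * len(CRITERIA) / total_weight


# Invented scores for one platform after a four-week proof of concept.
poc_scores = {
    "model_coverage": 4,
    "gate_integration": 3,
    "evidence_trail": 5,
    "three_year_cost": 2,
}

unweighted = platform_score(poc_scores)

# Evidence-trail quality weighted double, as in the EU-deadline example.
eu_weights = {name: 1.0 for name in CRITERIA}
eu_weights["evidence_trail"] = 2.0
eu_weighted = platform_score(poc_scores, eu_weights)

print(f"Unweighted: {unweighted:.1f} / 20")
print(f"Evidence trail doubled: {eu_weighted:.1f} / 20")
```

Doubling one weight shifts the result toward that criterion without changing the one-to-five inputs, which is the sense in which the weights move the math but not the structure.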

The consolidation prediction, stated plainly

I will end with a forecast I am willing to be wrong about in public. Of the thirty-five platforms positioning themselves as enterprise AI governance solutions in 2026, at least twenty will be acquired, pivoted out of the category, or quietly wound down by the end of 2028. The mechanism is already visible. LLM observability is being absorbed into the security stack — Cisco’s acquisition of Robust Intelligence in 2024 was the first move, and at least two more in that category will go the same way before 2027. Model-lifecycle platforms are being absorbed by MLOps platforms — Databricks, Snowflake, and the hyperscalers are building or buying governance modules that erode the standalone vendors’ procurement position. Policy-and-risk overlays are being acquired by the broader GRC vendors who already have the distribution. The hyperscaler-native suites are getting better fast enough that the standalone vendors’ multi-cloud value proposition narrows every quarter.

What this means for procurement is straightforward. Buy annual or two-year contracts, not five-year. Avoid vendors whose financial position is opaque — several of the named platforms above are venture-funded and have not yet demonstrated the unit economics that would support standalone independence at scale, and at least three are visibly burning runway in 2026. Assume the platform you buy today will either be part of a larger suite within thirty-six months or will be deprecated. Plan for the data-portability question now; the platforms that make their export schemas open and well-documented are the ones whose ownership change will be survivable.

The one prediction I am most confident about is that the procurement category that will grow fastest in 2027 is LLM gateway tooling, because the shadow-AI inventory problem is not solved by any of the four archetypes and the gateway is the cleanest chokepoint anybody has built. If you are budgeting for AI governance tooling in 2027, weight the gateway category higher than the analyst grids will tell you to. The grids will catch up by 2028; the procurement saving from being early is the largest single available in this market.

If you are starting from scratch, the sequence I would run is: produce the inventory yourself in week one (the CISO piece covers the four-artefact methodology), tier the inventory by EU AI Act risk category in week two, name the archetype your inventory calls for in week three, run the four-week proof of concept in weeks four through seven, sign a one-or-two-year contract in week eight. Total elapsed time, eight weeks. Total cost, between €60,000 and €120,000 of internal effort plus the platform licence. The result is a working governance posture with a documented decision trail. The alternative — a nine-month RFP producing a five-year contract for a platform that will be acquired in eighteen months — is the procurement failure mode I have watched too many enterprises walk into in 2025 and 2026. Do not be one of them.

Sources

EU AI Act, Regulation (EU) 2024/1689 — Annex III high-risk categorisation; Article 72 post-market monitoring
NIST AI Risk Management Framework (AI RMF 1.0) and Generative AI Profile
ISO/IEC 42001:2023 — AI management systems
Credo AI Responsible AI Platform
Holistic AI Governance Platform
IBM watsonx.governance
Microsoft Purview AI Hub
OneTrust AI Governance
ServiceNow AI Governance
Arize AI
Fiddler AI
WhyLabs
Lakera Guard
AWS Bedrock Guardrails and SageMaker Model Cards
Google Vertex AI Model Garden and Responsible AI Toolkit
Azure AI Foundry safety stack
Related: governance hub, CISO governance responsibilities, enterprise governance framework

The four-criterion scoring sheet referenced throughout is published under CC-BY-4.0 as a Google Sheet, linked from the governance hub. Fork it, change the weights, publish a fork with different verdicts and send the link — I will reference it from the next refresh.

Thomas Prommer CIO / CTO · 20 years · Practitioner, not consultant

Tom Prommer writes The AI Strategy Guide from the operator's seat: every tool covered, tested with real money before forming a view.