AI Agent Governance: The New Board-Level CFO Question

CFO Strategy

Executive summary: AI agents that autonomously execute financial workflows — refunds, reconciliations, invoice matching, expense approvals — have moved from pilots to production at most growth-stage fintechs during 2025 and early 2026. The board question is no longer "should we use them" but "what happens when they get one wrong". This piece sets out a five-part governance framework that CFOs are being asked to present to their boards this year: taxonomy, RACI, spend controls, incident response and audit trail.

Why This Question Landed in 2026

Three developments have converged to make AI agent governance a board-level topic. First, agent frameworks (LangGraph, Claude Agents, OpenAI Assistants, Vercel AI SDK agent primitives) matured to the point where a growth-stage engineering team can put an autonomous workflow into production in a few weeks rather than a few months. Second, the cost of running these agents has fallen far enough that "run this task through an agent" is now defensible economics for tasks that previously did not warrant automation. Third, external auditors and regulators have caught up: the FRC's AI Assurance Framework, the PRA's SS1/23 on model risk management, and the EU AI Act's phased application through 2026 all now expect firms to be able to articulate what their AI is doing.

The gap most CFOs face is that AI agents were built and deployed by engineering and product teams before finance, risk or the board had a seat at the table. In the April 2026 piece on AI agents in finance we set out the operational picture; this piece answers the follow-up question that most boards are now asking: how do we govern this?

Step One: Build the AI Agent Taxonomy

You cannot govern what you cannot enumerate. The first governance step is a taxonomy of AI agents currently deployed, with three attributes for each: the workflow it executes, the level of autonomy it operates at, and the class of decisions it can take without human review.

| Tier   | Autonomy                 | Description                                      | Human Review  | Example Workflow    |
|--------|--------------------------|--------------------------------------------------|---------------|---------------------|
| Tier 1 | Advisory only            | Suggests, does not execute                       | 100%          | Coding assistant    |
| Tier 2 | Executes below threshold | Autonomous under a cash / risk cap               | Above cap     | Refund up to £50    |
| Tier 3 | Executes with sampling   | Autonomous, sample-audited weekly                | Sample only   | Bank reconciliation |
| Tier 4 | Fully autonomous         | Executes without review; needs strong monitoring | Post-hoc only | Fraud triage bot    |

Most growth-stage fintechs discover on running this exercise that they have between six and twelve production AI agents, of which two or three sit in Tier 3 or Tier 4 without anyone in finance being able to name them. That gap is the reason for the taxonomy.
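
As an illustration only, the taxonomy can be held as a small typed inventory rather than a spreadsheet. The sketch below is a minimal Python version; the agent names, the owner titles and the `finance_owner` field are hypothetical and are not taken from any particular firm. It records the three attributes described above and flags Tier 3 and Tier 4 agents that nobody in finance has been named against.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    ADVISORY = 1            # suggests, does not execute
    BELOW_THRESHOLD = 2     # autonomous under a cash / risk cap
    SAMPLED = 3             # autonomous, sample-audited weekly
    FULLY_AUTONOMOUS = 4    # executes without review


@dataclass(frozen=True)
class AgentRecord:
    name: str                             # hypothetical identifier
    workflow: str                         # the workflow it executes
    tier: Tier                            # the level of autonomy
    unreviewed_decisions: str             # decisions taken without human review
    finance_owner: Optional[str] = None   # None: nobody in finance can name it


def unowned_high_autonomy(inventory):
    """Return Tier 3 and Tier 4 agents with no named finance owner."""
    return [a for a in inventory
            if a.tier >= Tier.SAMPLED and a.finance_owner is None]


# Hypothetical inventory, mirroring the example workflows in the table.
inventory = [
    AgentRecord("code-helper", "Coding assistant", Tier.ADVISORY, "none"),
    AgentRecord("refund-bot", "Refund up to £50", Tier.BELOW_THRESHOLD,
                "refunds under cap", "Head of Support Finance"),
    AgentRecord("recon-bot", "Bank reconciliation", Tier.SAMPLED,
                "ledger matches"),
    AgentRecord("fraud-triage", "Fraud triage bot", Tier.FULLY_AUTONOMOUS,
                "account freezes", "Financial Controller"),
]

gaps = unowned_high_autonomy(inventory)
print([a.name for a in gaps])  # ['recon-bot']
```

Keeping the inventory in a structured form like this makes the later steps (RACI, caps, audit sampling) straightforward to key off the same tier field.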

Step Two: The RACI for Autonomous Spend

Once you have the taxonomy, assign a RACI for each agent. The clarifying test is: if this agent misfires and pays out £10,000 in error, who is responsible for detecting it, who is accountable for the consequences, who is consulted on remediation, and who is informed? For most Tier 2 and above agents, the answer at the point of first deployment is "no-one has thought about it".

The three roles that are typically missing:

  • Responsible for detection: The team that owns the monitoring signal for anomalous agent behaviour. This is usually engineering or data, and needs to be explicit rather than assumed.
  • Accountable for consequences: The functional owner (finance, ops, customer support) who is on the hook if the agent produces harm. This is typically NOT the engineering team that built it, but the operational team whose workflow was automated.
  • Informed at the board level: Which board-level metric surfaces if the agent goes wrong. For Tier 3 and Tier 4 agents, this needs a dedicated line in the risk dashboard, not a subsection of "engineering ops".
The accountability trap: When something goes wrong with an AI agent, the temptation is to attribute the failure to "the model". The FRC has been explicit that this is not an acceptable framing: accountability rests with the humans who deployed the agent under a governance framework. The board attestation needs to identify the named accountable individual for each Tier 2 or above agent.
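
One way to make the RACI test concrete is to treat it as a completeness check that runs against every agent record. The following is a minimal sketch under assumed field names (`responsible_for_detection`, `accountable_individual`, `consulted_on_remediation`, `board_metric` are all hypothetical). It treats the first three roles as mandatory for Tier 2 and above, and requires a board-level metric for Tier 3 and Tier 4, in line with the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentRaci:
    agent: str
    tier: int
    responsible_for_detection: Optional[str] = None  # team owning the monitoring signal
    accountable_individual: Optional[str] = None     # a named person, never "the model"
    consulted_on_remediation: Optional[str] = None
    board_metric: Optional[str] = None               # dedicated risk dashboard line


def missing_roles(raci):
    """List the RACI fields that are still empty for this agent's tier."""
    if raci.tier < 2:
        return []  # advisory-only agents are out of scope for this check
    required = ["responsible_for_detection",
                "accountable_individual",
                "consulted_on_remediation"]
    if raci.tier >= 3:
        required.append("board_metric")
    return [field for field in required if not getattr(raci, field)]


# Hypothetical examples.
refund = AgentRaci("refund-bot", 2, responsible_for_detection="Data platform team")
fraud = AgentRaci("fraud-triage", 4, "Risk engineering", "Head of Fraud Ops", "Compliance")

print(missing_roles(refund))  # ['accountable_individual', 'consulted_on_remediation']
print(missing_roles(fraud))   # ['board_metric']
```

An empty list for every Tier 2 and above agent is a reasonable precondition for the board attestation.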

Step Three: Spend Controls Around Autonomous Actions

For any agent that can move money or issue credits, there must be a spend control envelope. The default template is a three-part limit: per-transaction cap, aggregate daily cap, and aggregate monthly cap. Each cap should be small enough that a fully corrupted agent misfiring for a full day cannot cause material damage. In practice, that means the daily aggregate cap should be less than 5 per cent of the equivalent human-approved budget.

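| Control             | Default | Note                                                |
|---------------------|---------|-----------------------------------------------------|
| Per-transaction cap | £50     | Default for customer-facing refund agents           |
| Daily aggregate cap | £2k     | Circuit-breaker on runaway agent behaviour          |
| Monthly cap         | £25k    | Rolling 30-day, escalates to CFO at 80%             |
| Anomaly threshold   | 3σ      | From 90-day rolling mean; triggers immediate pause  |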
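
The static part of the envelope can be sketched as a single pre-payment check. This is an illustrative Python version, not a reference implementation: the `SpendEnvelope` and `check_spend` names are hypothetical, the default values are the ones in the template above, and the choice to block any payment that would take the running total over a cap (rather than allowing the payment that crosses it) is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class SpendEnvelope:
    per_transaction: Decimal = Decimal("50")   # £50 per-transaction cap
    daily: Decimal = Decimal("2000")           # £2k daily aggregate cap
    monthly: Decimal = Decimal("25000")        # £25k rolling 30-day cap
    escalate_at: Decimal = Decimal("0.80")     # escalate to CFO at 80% of monthly


def check_spend(envelope, amount, spent_today, spent_30d):
    """Return (allowed, breached_caps, escalate_to_cfo) for one proposed payment."""
    breached = []
    if amount > envelope.per_transaction:
        breached.append("per-transaction")
    if spent_today + amount > envelope.daily:
        breached.append("daily")
    if spent_30d + amount > envelope.monthly:
        breached.append("monthly")
    allowed = not breached
    # The rolling total only includes this payment if it actually goes ahead.
    projected = spent_30d + amount if allowed else spent_30d
    escalate = projected >= envelope.escalate_at * envelope.monthly
    return allowed, breached, escalate


env = SpendEnvelope()
# A £30 refund that would push today's total past £2k is blocked.
print(check_spend(env, Decimal("30"), Decimal("1980"), Decimal("19000")))
# A £40 refund that takes the rolling 30-day total past 80% is allowed but escalated.
print(check_spend(env, Decimal("40"), Decimal("100"), Decimal("19970")))
```

Amounts are held as `Decimal` rather than floats so that cap comparisons on currency values are exact. Every blocked payment is also a candidate entry for a cap breach log.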

The critical piece is the anomaly threshold. Agents fail in one of two ways: gradually (a slow drift as inputs change) or catastrophically (a single input pushes the agent into an unexpected state and it starts approving every request). Static caps catch the catastrophic case. Statistical anomaly detection on daily spend patterns catches the gradual case. Both need to be in place.
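
The statistical check can be sketched in a few lines with the Python standard library. The version below assumes daily spend totals are available as a simple list, oldest first, and applies the 3σ rule against the 90-day rolling mean. The function name, the two-sided test (unusually low spend also pauses the agent) and the behaviour when there is too little history are all assumptions made for illustration.

```python
from statistics import mean, stdev


def should_pause(daily_totals, today, window=90, sigmas=3.0):
    """True if today's spend is more than `sigmas` standard deviations
    from the mean of the last `window` daily totals."""
    recent = daily_totals[-window:]
    if len(recent) < 2:
        return False  # too little history: rely on the static caps instead
    mu = mean(recent)
    sd = stdev(recent)  # sample standard deviation
    if sd == 0:
        return today != mu  # perfectly flat history: any change is anomalous
    return abs(today - mu) > sigmas * sd


# Hypothetical history: 90 days of spend cycling between £1,000 and £1,040.
history = [1000 + (i % 5) * 10 for i in range(90)]

print(should_pause(history, 1050))  # False: within three standard deviations
print(should_pause(history, 1500))  # True: far outside the normal range
```

In practice the window, the threshold and whether to test one side or both would be tuned per agent; the point of the sketch is that the check is cheap enough to run on every agent that has a spend envelope.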

Step Four: Incident Response Playbook

When an AI agent misbehaves, the response window is measured in hours, not days. Most engineering teams do not have an incident playbook for AI-specific failures because the failure modes are different from traditional software incidents. Four items belong in the playbook.

  1. Kill switch mechanism. Every Tier 2 and above agent must have a documented and tested kill switch. The person on call needs to be able to stop the agent without a code deploy.
  2. Rollback of executed actions. Agents that move money need a documented rollback procedure: which refunds can be reversed, which cannot, and who authorises non-standard reversals.
  3. Customer communication template. Pre-drafted communications for common failure scenarios (mis-issued refund, incorrect status update, wrongful account action). Draft these in advance, in the same tone as your standard customer support voice.
  4. Post-incident review with named remediation. Every incident requires a written review within 5 working days, with a named remediation owner and a completion deadline. Track these in the same system as security incidents.
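
The requirement that the kill switch works without a code deploy usually means the agent consults a runtime flag before every action. A minimal sketch follows, with an in-memory class standing in for whatever flag store a firm actually uses; `KillSwitch`, `AgentPaused` and `execute_action` are hypothetical names, not part of any agent framework.

```python
class AgentPaused(RuntimeError):
    """Raised when an action is attempted while the agent is paused."""


class KillSwitch:
    """In-memory stand-in for a runtime flag store (assumption: in production
    this would be a database row or feature flag the on-call person can flip)."""

    def __init__(self):
        self._paused = {}

    def pause(self, agent, by, reason):
        # Record who paused the agent and why, for the post-incident review.
        self._paused[agent] = {"by": by, "reason": reason}

    def resume(self, agent):
        self._paused.pop(agent, None)

    def is_paused(self, agent):
        return agent in self._paused


def execute_action(switch, agent, action):
    """Run `action` only if the agent has not been paused."""
    if switch.is_paused(agent):
        raise AgentPaused(f"{agent} is paused; action not executed")
    return action()


switch = KillSwitch()
print(execute_action(switch, "refund-bot", lambda: "refund issued"))

switch.pause("refund-bot", by="on-call engineer", reason="spend anomaly")
try:
    execute_action(switch, "refund-bot", lambda: "refund issued")
except AgentPaused as exc:
    print(exc)
```

Testing the kill switch then amounts to flipping the flag in a drill and confirming that the next action is refused.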

"The question the board is really asking is not 'should we let the AI do this?' It is 'if this agent misfires in the worst plausible way, who catches it, and how quickly do we stop it?' Answering that question with specifics (named individual, named metric, named threshold, tested kill switch) is what CFO-grade AI governance looks like in 2026."

Step Five: The Audit Trail

External auditors have started to ask for AI-specific evidence during interim reviews. For 2026, the specific items to have on hand are:

  • Agent inventory with tier classification. The taxonomy from step one, updated within the past six months.
  • Sample decision logs. For Tier 2 and above agents, a sample of decisions with the input, the reasoning trace, and the executed action. Twelve samples per agent, per audit period.
  • Cap breach history. Any transaction that hit the per-transaction, daily, or monthly cap, and the resulting action.
  • Incident log. Every AI-specific incident and its remediation status.
  • Model change control. When was the underlying model or prompt last changed, by whom, with what testing, with what approval.
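
Pulling the sample decision logs can be automated so that the evidence exists before the auditor asks. The sketch below assumes each log entry is a dictionary with `agent`, `input`, `reasoning` and `action` keys (a hypothetical schema) and draws up to twelve entries per Tier 2 and above agent. A fixed random seed is used so the same sample can be reproduced later; that is a design choice for illustration, not an audit requirement stated in the text.

```python
import random
from collections import defaultdict


def sample_decisions(log, tiers, per_agent=12, seed=0):
    """Return up to `per_agent` randomly chosen log entries for each
    agent classified as Tier 2 or above."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    by_agent = defaultdict(list)
    for entry in log:
        if tiers.get(entry["agent"], 0) >= 2:
            by_agent[entry["agent"]].append(entry)
    return {agent: rng.sample(entries, min(per_agent, len(entries)))
            for agent, entries in by_agent.items()}


def make_entries(agent, count):
    """Build hypothetical log entries for the example."""
    return [{"agent": agent, "input": f"case {i}",
             "reasoning": "trace omitted", "action": "approved"}
            for i in range(count)]


# Hypothetical tiers and decision log.
tiers = {"code-helper": 1, "refund-bot": 2, "recon-bot": 3}
log = (make_entries("code-helper", 20)
       + make_entries("refund-bot", 30)
       + make_entries("recon-bot", 5))

samples = sample_decisions(log, tiers)
print({agent: len(entries) for agent, entries in samples.items()})
# {'refund-bot': 12, 'recon-bot': 5}
```

An agent with fewer than twelve logged decisions in the period simply contributes everything it has, which is itself worth noting in the audit pack.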

Firms that put this in place proactively find that the audit costs less and produces fewer exceptions. Firms that scramble at year-end discover that the evidence they need is not being logged, and the remediation is expensive.

The proportionality point: This framework should scale to the risk. A Tier 1 advisory-only coding assistant does not need cap breach logging. A Tier 4 fraud triage bot with authority to freeze customer accounts absolutely does. Applying the framework uniformly to every agent creates governance theatre; applying it in proportion to autonomy tier creates real assurance.

Key Takeaways

  • AI agent governance is now a board-level CFO topic because agents have moved from pilot to production and regulators have caught up.
  • Start with a taxonomy: list every AI agent, classify by autonomy tier, and identify the workflow it executes. Most growth-stage fintechs find two or three agents in Tier 3 or Tier 4 that no one in finance can name.
  • Assign a RACI per agent. The three roles most often missing are Responsible-for-detection, Accountable-for-consequences, and Informed-at-board-level.
  • Implement three-part spend controls: per-transaction, daily aggregate, monthly aggregate. Add statistical anomaly detection to catch gradual drift as well as catastrophic failure.
  • Build an incident response playbook that includes a tested kill switch, a rollback procedure, pre-drafted customer communications, and a post-incident review with a named remediation owner.
  • Keep an audit trail: agent inventory, sample decision logs, cap breach history, incident log, model change control. Auditors are now asking for this evidence during interim reviews.
  • Apply the framework in proportion to autonomy tier. Uniform governance across every agent creates theatre; tiered governance creates assurance.
