Why This Question Landed in 2026
Three developments have converged to make AI agent governance a board-level topic. First, agent frameworks (LangGraph, Claude Agents, OpenAI Assistants, Vercel AI SDK agent primitives) matured to the point where a growth-stage engineering team can put an autonomous workflow into production in a few weeks rather than a few months. Second, the cost of running these agents has fallen far enough that "run this task through an agent" is now defensible economics for tasks that previously did not warrant automation. Third, external auditors and regulators have caught up: the FRC's AI Assurance Framework, the PRA's SS1/23 on model risk management, and the EU AI Act, which entered into force in August 2024 and whose obligations continue to phase in during 2026, all now expect firms to be able to articulate what their AI is doing.
The gap most CFOs face is that AI agents were built and deployed by engineering and product teams before finance, risk or the board had a seat at the table. In the April 2026 piece on AI agents in finance we set out the operational picture; this piece answers the follow-up question that most boards are now asking: how do we govern this?
Step One: Build the AI Agent Taxonomy
You cannot govern what you cannot enumerate. The first governance step is a taxonomy of AI agents currently deployed, with three attributes for each: the workflow it executes, the level of autonomy it operates at, and the class of decisions it can take without human review.
Most growth-stage fintechs discover on running this exercise that they have between six and twelve production AI agents, of which two or three sit in Tier 3 or Tier 4 without anyone in finance being able to name them. That gap is the reason for the taxonomy.
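As a rough illustration, the taxonomy can start life as a simple structured inventory. The sketch below is a minimal example under stated assumptions: the field names, the sample agents, and the `finance_owner` field are hypothetical, and the tier numbers are assumed to follow whatever tier definitions your firm has adopted, with higher numbers meaning more autonomy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AgentRecord:
    name: str                  # hypothetical identifier for the agent
    workflow: str              # the workflow it executes
    autonomy_tier: int         # 1-4, per your own tier definitions (assumed)
    unreviewed_decisions: str  # class of decisions taken without human review
    finance_owner: Optional[str] = None  # None: nobody in finance can name it


def unowned_high_tier(inventory: List[AgentRecord], min_tier: int = 3) -> List[str]:
    """Return names of agents at or above min_tier with no finance owner."""
    return [
        a.name
        for a in inventory
        if a.autonomy_tier >= min_tier and a.finance_owner is None
    ]


# Illustrative inventory; every entry here is invented for the example.
inventory = [
    AgentRecord("invoice-matcher", "accounts payable matching", 2,
                "flag mismatches", "AP lead"),
    AgentRecord("refund-bot", "customer refunds", 3,
                "issue refunds under cap"),
    AgentRecord("fx-rebalancer", "treasury FX sweeps", 4,
                "execute sweeps", "Treasury"),
]

print(unowned_high_tier(inventory))  # prints ['refund-bot']
```

The point of the query is the gap described above: it surfaces high-autonomy agents that finance cannot yet name.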
Step Two: The RACI for Autonomous Spend
Once you have the taxonomy, assign a RACI for each agent. The clarifying test is: if this agent misfires and pays out £10,000 in error, who is responsible for detecting it, who is accountable for the consequences, who is consulted on remediation, and who is informed? For most Tier 2 and above agents, the answer at the point of first deployment is "no-one has thought about it".
The three roles that are typically missing:
- Responsible for detection: The team that owns the monitoring signal for anomalous agent behaviour. This is usually engineering or data, and needs to be explicit rather than assumed.
- Accountable for consequences: The functional owner (finance, ops, customer support) who is on the hook if the agent produces harm. This is typically NOT the engineering team that built it, but the operational team whose workflow was automated.
- Informed at the board level: The board-level metric that moves when the agent goes wrong. For Tier 3 and Tier 4 agents, this needs a dedicated line in the risk dashboard, not a subsection of "engineering ops".
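One way to make the RACI test concrete is to record the four answers per agent and report which are still blank. This is a minimal sketch, not a prescribed schema: the class name, field names, and the sample owner are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class AgentRaci:
    agent: str
    responsible_for_detection: Optional[str] = None     # owns the monitoring signal
    accountable_for_consequences: Optional[str] = None  # functional owner
    consulted_on_remediation: Optional[str] = None
    informed_board_metric: Optional[str] = None         # risk dashboard line


def missing_roles(raci: AgentRaci) -> List[str]:
    """List the RACI fields that have not been filled in for this agent."""
    return [
        f.name
        for f in fields(raci)
        if f.name != "agent" and not getattr(raci, f.name)
    ]


# Hypothetical agent where only the detection owner has been agreed.
raci = AgentRaci("refund-bot", responsible_for_detection="data platform on-call")
print(missing_roles(raci))
```

An empty list from `missing_roles` is a reasonable precondition for promoting an agent to a higher tier.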
Step Three: Spend Controls Around Autonomous Actions
For any agent that can move money or issue credits, there must be a spend control envelope. The default template is a three-part limit: a per-transaction cap, an aggregate daily cap, and an aggregate monthly cap. Each cap should be small enough that a fully corrupted agent misfiring for a full day cannot cause material damage. In practice, that means the daily aggregate cap should be less than 5 per cent of the equivalent human-approved budget.
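The three-part limit can be sketched as a single check that runs before every payment. The example below is illustrative only: amounts are held in pence to avoid floating-point rounding, the class and function names are hypothetical, and the comparison period for the human-approved budget is an assumption you would need to define yourself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpendEnvelope:
    per_transaction_cap: int  # all amounts in pence
    daily_cap: int
    monthly_cap: int

    def breached_cap(self, amount: int, spent_today: int,
                     spent_this_month: int) -> Optional[str]:
        """Return the first cap this payment would breach, or None if allowed."""
        if amount > self.per_transaction_cap:
            return "per_transaction"
        if spent_today + amount > self.daily_cap:
            return "daily"
        if spent_this_month + amount > self.monthly_cap:
            return "monthly"
        return None


def daily_cap_within_guideline(envelope: SpendEnvelope,
                               human_approved_budget: int) -> bool:
    """True if the daily cap is strictly below 5 per cent of the budget."""
    return envelope.daily_cap * 100 < human_approved_budget * 5


# Hypothetical envelope: 50 GBP per transaction, 500 GBP daily, 5,000 GBP monthly.
envelope = SpendEnvelope(50_00, 500_00, 5_000_00)

print(envelope.breached_cap(40_00, spent_today=480_00, spent_this_month=1_000_00))
print(daily_cap_within_guideline(envelope, human_approved_budget=20_000_00))
```

Returning the name of the breached cap, rather than a bare yes or no, gives the cap breach history something specific to record.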
The critical piece is the anomaly threshold. Agents fail in one of two ways: gradually (a slow drift as inputs change) or catastrophically (a single input pushes the agent into an unexpected state and it starts approving every request). Static caps catch the catastrophic case. Statistical anomaly detection on daily spend patterns catches the gradual case. Both need to be in place.
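To show what a statistical check on daily spend might look like, the sketch below compares the mean of the most recent week against a trailing four-week baseline and flags a shift of more than three standard errors. The window lengths, the threshold, and the choice of this particular test are all assumptions made for illustration; a production system would likely also account for seasonality and volume growth.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def spend_drift_detected(daily_spend: Sequence[float],
                         baseline_days: int = 28,
                         recent_days: int = 7,
                         threshold: float = 3.0) -> bool:
    """Flag when recent mean daily spend has shifted away from the baseline."""
    if len(daily_spend) < baseline_days + recent_days:
        raise ValueError("not enough history for the chosen windows")
    recent = daily_spend[-recent_days:]
    baseline = daily_spend[-(baseline_days + recent_days):-recent_days]
    spread = stdev(baseline)
    if spread == 0:
        # A perfectly flat baseline: any change in the mean counts as drift.
        return mean(recent) != mean(baseline)
    # Shift of the recent mean, measured in baseline standard errors.
    z = (mean(recent) - mean(baseline)) / (spread / sqrt(recent_days))
    return abs(z) > threshold


# Invented data: every day stays well under a static cap of, say, 200,
# yet the last week creeps steadily upwards.
baseline = [100.0, 110.0] * 14
drifting = baseline + [112.0, 114.0, 116.0, 118.0, 120.0, 122.0, 124.0]
stable = baseline + [100.0, 110.0, 100.0, 110.0, 100.0, 110.0, 100.0]

print(spend_drift_detected(drifting))  # prints True
print(spend_drift_detected(stable))    # prints False
```

The drifting series never breaches a static cap, which is why the two controls complement each other.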
Step Four: Incident Response Playbook
When an AI agent misbehaves, the response window is measured in hours, not days. Most engineering teams do not have an incident playbook for AI-specific failures because the failure modes are different from traditional software incidents. Four items belong in the playbook.
- Kill switch mechanism. Every Tier 2 and above agent must have a documented and tested kill switch. The person on call needs to be able to stop the agent without a code deploy.
- Rollback of executed actions. Agents that move money need a documented rollback procedure covering which refunds can be reversed, which cannot, and who authorises non-standard reversals.
- Customer communication template. Pre-drafted communications for common failure scenarios (mis-issued refund, incorrect status update, wrongful account action). Draft these in advance, in the same tone as your standard customer support voice.
- Post-incident review with named remediation. Every incident requires a written review within 5 working days, with a named remediation owner and a completion deadline. Track these in the same system as security incidents.
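Two of the playbook items lend themselves to a short sketch: a kill switch that works without a code deploy, and the five-working-day review deadline. The example assumes the switch is a flag file that the agent checks before every action; a feature-flag service or database row would serve the same purpose. All names are hypothetical, and the deadline calculation ignores public holidays.

```python
import tempfile
from datetime import date, timedelta
from pathlib import Path
from typing import Callable


class KillSwitch:
    """Engaged when the flag file exists; no deploy is needed to create it."""

    def __init__(self, flag_path: Path) -> None:
        self.flag_path = flag_path

    def engage(self, reason: str) -> None:
        self.flag_path.write_text(reason)

    def is_engaged(self) -> bool:
        return self.flag_path.exists()


def run_action(switch: KillSwitch, action: Callable[[], None]) -> str:
    """Check the switch before every action, not once at start-up."""
    if switch.is_engaged():
        return "halted"
    action()
    return "executed"


def review_due(incident_date: date, working_days: int = 5) -> date:
    """Deadline for the written review, counting Monday to Friday only."""
    due = incident_date
    remaining = working_days
    while remaining > 0:
        due += timedelta(days=1)
        if due.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return due


with tempfile.TemporaryDirectory() as tmp:
    switch = KillSwitch(Path(tmp) / "refund-bot.halt")
    executed = []
    first = run_action(switch, lambda: executed.append("refund-1"))
    switch.engage("on-call halted agent after anomaly alert")
    second = run_action(switch, lambda: executed.append("refund-2"))

print(first, second, executed)
print(review_due(date(2026, 3, 6)))  # incident on a Friday
```

Testing the switch means exercising exactly this path on a schedule: engage it, confirm the next action is refused, then release it.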
"The question the board is really asking is not should we let the AI do this. It is if this agent misfires in the worst plausible way, who catches it, and how quickly do we stop it? Answering that question with specifics — named individual, named metric, named threshold, tested kill switch — is what CFO-grade AI governance looks like in 2026."
Step Five: The Audit Trail
External auditors have started to ask for AI-specific evidence during interim reviews. For 2026, the specific items to have on hand are:
- Agent inventory with tier classification. The taxonomy from step one, updated within the past six months.
- Sample decision logs. For Tier 2 and above agents, a sample of decisions with the input, the reasoning trace, and the executed action. Twelve samples per agent, per audit period.
- Cap breach history. Any transaction that hit the per-transaction, daily, or monthly cap, and the resulting action.
- Incident log. Every AI-specific incident and its remediation status.
- Model change control. A record of when the underlying model or prompt was last changed, by whom, with what testing, and with what approval.
Firms that put this in place proactively find that the audit costs less and produces fewer exceptions. Firms that scramble at year-end discover that the evidence they need is not being logged, and that remediation is expensive.
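The sample decision logs described above depend on the three elements being captured at the time of each decision. A minimal sketch of such a record, and of drawing twelve samples per agent, might look like the following. The record layout and function name are hypothetical, and selecting the sample at random with a fixed seed is an assumption made so the selection can be reproduced for the auditor.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DecisionLogEntry:
    agent: str
    decision_input: str   # what the agent was asked to decide on
    reasoning_trace: str  # the trace captured at decision time
    executed_action: str  # what the agent actually did


def audit_sample(log: List[DecisionLogEntry], agent: str,
                 sample_size: int = 12, seed: int = 0) -> List[DecisionLogEntry]:
    """Draw a reproducible sample of one agent's decisions for the audit period."""
    entries = [e for e in log if e.agent == agent]
    if len(entries) <= sample_size:
        return entries  # too few decisions to sample: hand over all of them
    return random.Random(seed).sample(entries, sample_size)


# Invented log entries for illustration only.
log = [
    DecisionLogEntry("refund-bot", f"ticket-{i}", f"trace-{i}", f"refund-{i}")
    for i in range(30)
] + [
    DecisionLogEntry("invoice-matcher", f"invoice-{i}", f"trace-{i}", "flagged")
    for i in range(5)
]

sample = audit_sample(log, "refund-bot")
print(len(sample))  # prints 12
```

If the reasoning trace is not stored when the decision is made, it generally cannot be reconstructed later, which is the year-end scramble described above.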
Key Takeaways
- AI agent governance is now a board-level CFO topic because agents have moved from pilot to production and regulators have caught up.
- Start with a taxonomy: list every AI agent, classify by autonomy tier, and identify the workflow it executes. Most growth-stage fintechs find six to twelve production agents, with two or three in Tier 3 or Tier 4 that finance cannot name.
- Assign a RACI per agent. The three roles most often missing are Responsible-for-detection, Accountable-for-consequences, and Informed-at-board-level.
- Implement three-part spend controls: per-transaction, daily aggregate, monthly aggregate. Add statistical anomaly detection to catch gradual drift, since static caps only catch catastrophic failure.
- Build an incident response playbook that includes a tested kill switch, a rollback procedure, pre-drafted customer communications, and a post-incident review with a named remediation owner.
- Keep an audit trail: agent inventory, sample decision logs, cap breach history, incident log, model change control. Auditors are now asking for this evidence during interim reviews.
- Apply the framework in proportion to autonomy tier. Uniform governance across every agent creates theatre; tiered governance creates assurance.