Agentic AI and Evaluative Regulation

Martyn Hopper Posted: 25 February 2026

Artificial intelligence is increasingly embedded across UK retail financial services. Debate around its use has largely focused on bias, explainability, and operational resilience. Those issues matter. But as AI systems become more agentic and autonomous, a different structural question becomes more salient.

As systems move beyond prediction into customer-facing and judgement-linked functions — advice, targeted support, arrears management, bereavement handling, and complaint resolution — how should regulators and industry ensure that optimisation-driven systems do not narrow evaluative conduct standards in practice, particularly where those standards depend on reason-giving and contestability?

The risk is not that AI systems fail technically. It is that they succeed in ways that subtly redefine what compliance means.

1. Agentic AI and the Shift from Prediction to Decision

Traditional models in retail financial services primarily supported discrete functions such as credit scoring or fraud detection.

Emerging agentic AI systems may, for example, deliver advice, run targeted-support journeys, manage arrears conversations, handle bereavement processes, and triage or resolve complaints end to end.

These systems do not merely predict; they operationalise regulatory obligations in real time. This creates a qualitatively different governance challenge.

Machine learning systems optimise defined objective functions: they pursue whatever measurable targets they are given, nothing more.

Large language models, when embedded into workflows, are no different in principle. They optimise sequence prediction and can be configured to pursue downstream objectives.

Critically, although they can mimic human reasoning, these systems do not perform such reasoning: they do not reason about legal standards; they do not interpret fairness or good faith; they do not apply reasonableness tests; they do not understand the authority structure of rules vs guidance vs industry practice. They optimise measurable proxies.

2. Historical Retail Failures as Stress Tests

Past systemic failures — including pensions mis-selling, PPI, discretionary commission arrangements in motor finance, and widespread deficiencies in bereavement and arrears handling — share common structural characteristics.

In many cases, aggregate indicators (including those established under the FSA/FCA Treating Customers Fairly rules and guidance) did not immediately reveal detriment.

The failure was not absence of systems, but misalignment between incentives, metrics, and substantive regulatory standards such as suitability, fairness, and good faith.

As firms adopt agentic AI in advice, customer support, and complaint resolution, the same structural risk arises in a new form: optimisation objectives may become the operative definition of compliance.

3. The "Objective Function" Response — and Its Limits

It may be argued that this concern is misplaced because firms can define AI objectives to optimise directly for regulatory compliance.

This response is partly correct. Firms can and should specify compliance-oriented objectives, embed regulatory constraints in system design, and monitor outcomes against them.

However, three structural limits remain in evaluative regulatory domains.

3.1 Underspecification of Evaluative Standards

Core retail obligations — including acting in good faith, avoiding foreseeable harm, ensuring fair value, and providing suitable advice — are deliberately open-textured.

They require contextual, case-by-case judgement: weighing an individual's circumstances against the purpose of the standard rather than applying a fixed formula.

No optimisation objective can exhaustively encode these standards. Objective functions necessarily rely on measurable proxies. Metrics are evidence of compliance. They are not equivalent to the standard itself.

There is a substantial body of historical and academic work, often summarised as Goodhart's Law, on the limits of measurable proxies and performance metrics and the perverse incentives they can create. Embedding such proxies in agentic AI risks reproducing these problems at greater scale, velocity, and opacity.
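The proxy problem can be made concrete with a deliberately simplified sketch. Everything below is invented for illustration (the weights, the field names, the case itself); it is not any firm's actual model. The point is that a case can score perfectly on every measurable proxy while the evaluative question is never asked.

```python
# Illustration only: a hypothetical proxy-based compliance score.
# All names and weights are invented for this example.

def proxy_score(case: dict) -> float:
    """Composite of measurable indicators a system might be set to optimise."""
    return (
        0.4 * case["resolved_within_sla"]          # speed proxy
        + 0.3 * (1 - case["escalated_to_fos"])     # referral proxy
        + 0.3 * case["customer_accepted_outcome"]  # acceptance proxy
    )

# A case can satisfy every measurable proxy...
case = {
    "resolved_within_sla": 1,
    "escalated_to_fos": 0,
    "customer_accepted_outcome": 1,
    # ...while the standard itself goes untested:
    "reasons_engaged_with_circumstances": False,
}

print(proxy_score(case))  # a perfect score on the proxies
print(case["reasons_engaged_with_circumstances"])  # the evaluative question, unanswered
```

The score is evidence about the case; nothing in the objective touches whether the decision was fair and reasonable in the circumstances.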

3.2 Reason-Giving and Contestability

Retail regulatory legitimacy depends not only on aggregate outcomes but on the ability to justify individual decisions. Under DISP and Financial Ombudsman Service (FOS) review, for example, the question is typically: was the firm's decision fair and reasonable in the circumstances?

An optimisation system produces outputs calibrated against defined objectives. It does not, by itself, provide normative justification. If a consumer challenging an adverse decision is met with the explanation "the system applied our trained model in line with policy," contestability may become formal rather than substantive.
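One structural response is to require each automated decision to carry a record sufficient for substantive review. The sketch below is a minimal illustration of what such a record might capture; the field names are hypothetical and are not drawn from any regulatory schema.

```python
from dataclasses import dataclass

# Illustrative only: field names are hypothetical, not a regulatory schema.
@dataclass
class DecisionRecord:
    decision: str                  # the outcome communicated to the customer
    factors_considered: list       # individual circumstances actually weighed
    standard_applied: str          # the evaluative standard invoked
    reasons: str                   # justification framed in terms of that standard

    def supports_review(self) -> bool:
        """Substantive review needs reasons tied to a named standard and to
        the individual's circumstances, not just a description of process."""
        return bool(self.reasons and self.standard_applied and self.factors_considered)

# The explanation quoted above, as a record: no standard, no circumstances.
bare = DecisionRecord(
    decision="complaint rejected",
    factors_considered=[],
    standard_applied="",
    reasons="the system applied our trained model in line with policy",
)
print(bare.supports_review())  # False: formal output, no substantive justification
```

A record like `bare` would let a reviewer confirm the process ran, but not contest the outcome on its merits; that is the distinction between formal and substantive contestability.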

Meaningful review requires reasons that engage with the individual's circumstances and can be tested against the applicable standard, not merely a description of the process that produced the outcome.

3.3 Objective Function Design Is Itself Normative

Defining the system's objective function involves choices about which harms are counted; how trade-offs are weighted; what level of detriment is tolerable; and how commercial constraints are embedded. These are governance decisions, not purely technical matters.
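These choices become visible the moment the objective is written down. The sketch below is invented for illustration (the action names, figures, and weights are all hypothetical): the same marginal case flips outcome under two weightings, which is precisely why the weighting is a governance decision.

```python
# Illustration only: objective-function weights encode governance choices.
# All names and figures below are invented for this example.

def objective(features, w_harm, w_cost):
    """Score a candidate action; higher is 'better' under this weighting."""
    return -w_harm * features["expected_detriment"] - w_cost * features["cost_to_firm"]

def choose(actions, w_harm, w_cost):
    """Pick the action the objective prefers."""
    return max(actions, key=lambda a: objective(actions[a], w_harm, w_cost))

# Two candidate resolutions of the same marginal complaint.
actions = {
    "uphold_complaint": {"expected_detriment": 0.0, "cost_to_firm": 500.0},
    "reject_complaint": {"expected_detriment": 300.0, "cost_to_firm": 0.0},
}

# Weighting A: customer detriment dominates the objective.
print(choose(actions, w_harm=2.0, w_cost=1.0))  # uphold_complaint
# Weighting B: the commercial constraint dominates.
print(choose(actions, w_harm=1.0, w_cost=2.0))  # reject_complaint
```

Nothing in the code is "wrong" under either weighting; the normative choice sits entirely in the two parameters, which is why it belongs to boards and supervisors rather than to system design alone.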

The risk is not that objectives cannot be specified. It is that they may narrow evaluative standards in ways that are operationally efficient but hidden in system design and therefore insufficiently visible to boards, supervisors, and consumers.

4. Practical Consumer-Facing Risks

The structural issues above have concrete implications.

4.1 Advice and Targeted Support

An AI advice agent may be trained to optimise measurable proxies such as engagement, journey completion, or cost to serve.

However, suitability requires consideration of the individual customer's circumstances, objectives, and capacity to bear risk.

If optimisation objectives shape recommendation boundaries, suitability may be defined by model parameters rather than by open-textured judgement.

4.2 Complaint Handling

Agentic systems may triage complaints, draft responses, and determine or recommend outcomes at scale.

If systems are calibrated to minimise FOS referrals or complaint volumes, there is a risk that the metric displaces the standard: outcomes are shaped by what reduces the measured figure rather than by what is fair and reasonable in the individual case.

Complaint rates may fall while contestability and fairness weaken.

4.3 Arrears and Vulnerability

These areas are particularly sensitive. AI may improve detection of vulnerability signals and consistency of treatment.

However, if repayment plans or support pathways are optimised within recovery or impairment constraints, marginal cases may be resolved by reference to statistical efficiency rather than contextual good faith or fair treatment.

Where systems become determinative and human override becomes operationally difficult, the firm may struggle to demonstrate that evaluative standards have genuinely been applied.

5. Aggregate Metrics vs Individual Fairness

AI systems are typically assessed at portfolio level. Retail consumer protection frequently operates at individual level.
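The divergence is easy to state numerically. The figures below are invented for illustration: a healthy portfolio-level average can coexist with individual cases that fall below any plausible fairness floor.

```python
# Invented figures: a healthy aggregate metric can conceal individual
# cases falling below any plausible fairness floor.

outcomes = [0.95] * 98 + [0.10, 0.05]  # per-customer outcome quality, 0..1

portfolio_average = sum(outcomes) / len(outcomes)
fairness_floor = 0.50
failures = [x for x in outcomes if x < fairness_floor]

print(portfolio_average > 0.90)  # True: the aggregate looks healthy
print(len(failures))             # 2: the cases the average conceals
```

Portfolio-level assurance answers a different question from individual-level fairness; both assessments are needed, and neither substitutes for the other.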

The FCA may wish to consider whether supervisory expectations should ensure that portfolio-level performance metrics do not displace the assessment of fairness and suitability in individual cases, and that individual-level review remains available and effective.

6. Distinguishing Rule Types

The interaction between AI and regulation differs by rule type: prescriptive rules with determinate content can, in principle, be encoded directly, whereas open-textured evaluative standards such as fairness, good faith, and suitability cannot.

Explicit recognition of this distinction may assist firms and supervisors in calibrating expectations.

7. Conclusion

AI has significant potential to enhance consistency, efficiency, and early harm detection in retail financial services.

The long-term risk is not technological failure alone. It is that optimisation logic may, over time, narrow the practical meaning of evaluative regulatory standards if not accompanied by governance structures that preserve reason-giving and contestability.

Agentic AI does not eliminate the structural dynamics that produced past retail failures. It may intensify them if optimisation objectives quietly redefine compliance in practice.

By addressing these issues proactively, regulators and industry can support innovation while preserving the normative foundations of retail consumer protection.

This paper was originally submitted to the FCA's Mills Review.

This Insight reflects our independent perspective only and is not legal advice.