Agentic AI and Evaluative Regulation

Martyn Hopper Posted: 25 February 2026

Artificial intelligence is increasingly embedded across UK retail financial services. Debate around its use has largely focused on bias, explainability, and operational resilience. Those issues matter. But as AI systems become more agentic and autonomous, a different structural question becomes more salient.

As systems move beyond prediction into customer-facing and judgement-linked functions — advice, targeted support, arrears management, bereavement handling, and complaint resolution — how should regulators and industry ensure that optimisation-driven systems do not narrow evaluative conduct standards in practice, particularly where those standards depend on reason-giving and contestability?

The risk is not that AI systems fail technically. It is that they succeed in ways that subtly redefine what compliance means.

1. Agentic AI and the Shift from Prediction to Decision

Traditional models in retail financial services primarily supported discrete functions such as credit scoring or fraud detection.

Emerging agentic AI systems may, for example, deliver advice, run targeted-support journeys, manage arrears conversations, handle bereavement processes, and triage or resolve complaints end to end.

These systems do not merely predict; they operationalise regulatory obligations in real time. This creates a qualitatively different governance challenge.

Machine learning systems optimise defined objective functions: they pursue whatever measurable targets they are given, nothing more.

Large language models, when embedded into workflows, are no different in principle. They optimise sequence prediction and can be configured to pursue downstream objectives.

Critically, although they can mimic human reasoning, these systems do not perform such reasoning: they do not reason about legal standards; they do not interpret fairness or good faith; they do not apply reasonableness tests; they do not understand the authority structure of rules vs guidance vs industry practice. They optimise measurable proxies.

2. Historical Retail Failures as Stress Tests

Past systemic failures — including pensions mis-selling, PPI, discretionary commission arrangements in motor finance, and widespread deficiencies in bereavement and arrears handling — share common structural characteristics.

In many cases, aggregate indicators (including those established under the FSA/FCA Treating Customers Fairly rules and guidance) did not immediately reveal detriment.

The failure was not absence of systems, but misalignment between incentives, metrics, and substantive regulatory standards such as suitability, fairness, and good faith.

As firms adopt agentic AI in advice, customer support, and complaint resolution, the same structural risk arises in a new form: optimisation objectives may become the operative definition of compliance.

3. The "Objective Function" Response — and Its Limits

It may be argued that this concern is misplaced because firms can define AI objectives to optimise directly for regulatory compliance.

This response is partly correct. Firms can and should specify compliance-oriented objectives, embed regulatory constraints in system design, and monitor outcomes against them.

However, three structural limits remain in evaluative regulatory domains.

3.1 Underspecification of Evaluative Standards

Core retail obligations — including acting in good faith, avoiding foreseeable harm, ensuring fair value, and providing suitable advice — are deliberately open-textured.

They require contextual, case-by-case judgement: weighing an individual's circumstances against the purpose of the standard rather than applying a fixed formula.

No optimisation objective can exhaustively encode these standards. Objective functions necessarily rely on measurable proxies. Metrics are evidence of compliance. They are not equivalent to the standard itself.

There is a substantial body of historical and academic work, often summarised as Goodhart's Law, on the limits of measurable proxies and performance metrics and the perverse incentives they can create. Embedding such proxies in agentic AI risks reproducing these problems at greater scale, velocity, and opacity.
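The proxy problem can be made concrete with a deliberately simplified sketch. Everything below is invented for illustration (the weights, the field names, the case itself); it is not any firm's actual model. The point is that a case can score perfectly on every measurable proxy while the evaluative question is never asked.

```python
# Illustration only: a hypothetical proxy-based compliance score.
# All names and weights are invented for this example.

def proxy_score(case: dict) -> float:
    """Composite of measurable indicators a system might be set to optimise."""
    return (
        0.4 * case["resolved_within_sla"]          # speed proxy
        + 0.3 * (1 - case["escalated_to_fos"])     # referral proxy
        + 0.3 * case["customer_accepted_outcome"]  # acceptance proxy
    )

# A case can satisfy every measurable proxy...
case = {
    "resolved_within_sla": 1,
    "escalated_to_fos": 0,
    "customer_accepted_outcome": 1,
    # ...while the standard itself goes untested:
    "reasons_engaged_with_circumstances": False,
}

print(proxy_score(case))  # a perfect score on the proxies
print(case["reasons_engaged_with_circumstances"])  # the evaluative question, unanswered
```

The score is evidence about the case; nothing in the objective touches whether the decision was fair and reasonable in the circumstances.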

3.2 Reason-Giving and Contestability

Retail regulatory legitimacy depends not only on aggregate outcomes but on the ability to justify individual decisions. Under DISP and Financial Ombudsman Service (FOS) review, for example, the question is typically: was the firm's decision fair and reasonable in the circumstances?

An optimisation system produces outputs calibrated against defined objectives. It does not, by itself, provide normative justification. If a consumer challenging an adverse decision is met with the explanation "the system applied our trained model in line with policy," contestability may become formal rather than substantive.
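One structural response is to require each automated decision to carry a record sufficient for substantive review. The sketch below is a minimal illustration of what such a record might capture; the field names are hypothetical and are not drawn from any regulatory schema.

```python
from dataclasses import dataclass

# Illustrative only: field names are hypothetical, not a regulatory schema.
@dataclass
class DecisionRecord:
    decision: str                  # the outcome communicated to the customer
    factors_considered: list       # individual circumstances actually weighed
    standard_applied: str          # the evaluative standard invoked
    reasons: str                   # justification framed in terms of that standard

    def supports_review(self) -> bool:
        """Substantive review needs reasons tied to a named standard and to
        the individual's circumstances, not just a description of process."""
        return bool(self.reasons and self.standard_applied and self.factors_considered)

# The explanation quoted above, as a record: no standard, no circumstances.
bare = DecisionRecord(
    decision="complaint rejected",
    factors_considered=[],
    standard_applied="",
    reasons="the system applied our trained model in line with policy",
)
print(bare.supports_review())  # False: formal output, no substantive justification
```

A record like `bare` would let a reviewer confirm the process ran, but not contest the outcome on its merits; that is the distinction between formal and substantive contestability.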

Meaningful review requires reasons that engage with the individual's circumstances and can be tested against the applicable standard, not merely a description of the process that produced the outcome.

3.3 Objective Function Design Is Itself Normative

Defining the system's objective function involves choices about which harms are counted; how trade-offs are weighted; what level of detriment is tolerable; and how commercial constraints are embedded. These are governance decisions, not purely technical matters.
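These choices become visible the moment the objective is written down. The sketch below is invented for illustration (the action names, figures, and weights are all hypothetical): the same marginal case flips outcome under two weightings, which is precisely why the weighting is a governance decision.

```python
# Illustration only: objective-function weights encode governance choices.
# All names and figures below are invented for this example.

def objective(features, w_harm, w_cost):
    """Score a candidate action; higher is 'better' under this weighting."""
    return -w_harm * features["expected_detriment"] - w_cost * features["cost_to_firm"]

def choose(actions, w_harm, w_cost):
    """Pick the action the objective prefers."""
    return max(actions, key=lambda a: objective(actions[a], w_harm, w_cost))

# Two candidate resolutions of the same marginal complaint.
actions = {
    "uphold_complaint": {"expected_detriment": 0.0, "cost_to_firm": 500.0},
    "reject_complaint": {"expected_detriment": 300.0, "cost_to_firm": 0.0},
}

# Weighting A: customer detriment dominates the objective.
print(choose(actions, w_harm=2.0, w_cost=1.0))  # uphold_complaint
# Weighting B: the commercial constraint dominates.
print(choose(actions, w_harm=1.0, w_cost=2.0))  # reject_complaint
```

Nothing in the code is "wrong" under either weighting; the normative choice sits entirely in the two parameters, which is why it belongs to boards and supervisors rather than to system design alone.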

The risk is not that objectives cannot be specified. It is that they may narrow evaluative standards in ways that are operationally efficient but hidden in system design and therefore insufficiently visible to boards, supervisors, and consumers.

4. Practical Consumer-Facing Risks

The structural issues above have concrete implications.

4.1 Advice and Targeted Support

An AI advice agent may be trained to optimise measurable proxies such as engagement, journey completion, or cost to serve.

However, suitability requires consideration of the individual customer's circumstances, objectives, and capacity to bear risk.

If optimisation objectives shape recommendation boundaries, suitability may be defined by model parameters rather than by open-textured judgement.

4.2 Complaint Handling

Agentic systems may triage complaints, draft responses, and determine or recommend outcomes at scale.

If systems are calibrated to minimise FOS referrals or complaint volumes, there is a risk that the metric displaces the standard: outcomes are shaped by what reduces the measured figure rather than by what is fair and reasonable in the individual case.

Complaint rates may fall while contestability and fairness weaken.

4.3 Arrears and Vulnerability

These areas are particularly sensitive. AI may improve detection of vulnerability signals and consistency of treatment.

However, if repayment plans or support pathways are optimised within recovery or impairment constraints, marginal cases may be resolved by reference to statistical efficiency rather than contextual good faith or fair treatment.

Where systems become determinative and human override becomes operationally difficult, the firm may struggle to demonstrate that evaluative standards have genuinely been applied.

5. Aggregate Metrics vs Individual Fairness

AI systems are typically assessed at portfolio level. Retail consumer protection frequently operates at individual level.
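The divergence is easy to state numerically. The figures below are invented for illustration: a healthy portfolio-level average can coexist with individual cases that fall below any plausible fairness floor.

```python
# Invented figures: a healthy aggregate metric can conceal individual
# cases falling below any plausible fairness floor.

outcomes = [0.95] * 98 + [0.10, 0.05]  # per-customer outcome quality, 0..1

portfolio_average = sum(outcomes) / len(outcomes)
fairness_floor = 0.50
failures = [x for x in outcomes if x < fairness_floor]

print(portfolio_average > 0.90)  # True: the aggregate looks healthy
print(len(failures))             # 2: the cases the average conceals
```

Portfolio-level assurance answers a different question from individual-level fairness; both assessments are needed, and neither substitutes for the other.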

The FCA may wish to consider whether supervisory expectations should ensure that portfolio-level performance metrics do not displace the assessment of fairness and suitability in individual cases, and that individual-level review remains available and effective.

6. Distinguishing Rule Types

The interaction between AI and regulation differs by rule type: prescriptive rules with determinate content can, in principle, be encoded directly, whereas open-textured evaluative standards such as fairness, good faith, and suitability cannot.

Explicit recognition of this distinction may assist firms and supervisors in calibrating expectations.

7. Conclusion

AI has significant potential to enhance consistency, efficiency, and early harm detection in retail financial services.

The long-term risk is not technological failure alone. It is that optimisation logic may, over time, narrow the practical meaning of evaluative regulatory standards if not accompanied by governance structures that preserve reason-giving and contestability.

Agentic AI does not eliminate the structural dynamics that produced past retail failures. It may intensify them if optimisation objectives quietly redefine compliance in practice.

By addressing these issues proactively, regulators and industry can support innovation while preserving the normative foundations of retail consumer protection.

This paper was originally submitted to the FCA's Mills Review.

This Insight reflects our independent perspective only and is not legal advice.