Artificial intelligence is increasingly embedded across UK retail financial services. Debate around its use has largely focused on bias, explainability, and operational resilience. Those issues matter. But as AI systems become more agentic and autonomous, a different structural question becomes more salient.
As systems move beyond prediction into customer-facing and judgement-linked functions — advice, targeted support, arrears management, bereavement handling, and complaint resolution — how should regulators and industry ensure that optimisation-driven systems do not narrow evaluative conduct standards in practice, particularly where those standards depend on reason-giving and contestability?
The risk is not that AI systems fail technically. It is that they succeed in ways that subtly redefine what compliance means.
1. Agentic AI and the Shift from Prediction to Decision
Traditional models in retail financial services primarily supported discrete functions such as credit scoring or fraud detection.
Emerging agentic AI systems may (for example):
- Interact directly with customers
- Generate recommendations
- Determine escalation pathways
- Draft complaint responses
- Propose redress outcomes
- Structure repayment plans
These systems do not merely predict; they operationalise regulatory obligations in real time. This creates a qualitatively different governance challenge.
Machine learning systems:
- Are trained on historical data
- Optimise against measurable objectives
- Adjust internal parameters to improve predictive performance
- Produce outputs that maximise defined reward functions
Large language models, when embedded into workflows, are no different in principle. They optimise sequence prediction and can be configured to pursue downstream objectives.
Critically, although these systems can mimic human reasoning, they do not perform it: they do not reason about legal standards, interpret fairness or good faith, apply reasonableness tests, or understand the differing authority of rules, guidance, and industry practice. They optimise measurable proxies.
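The point can be made concrete with a toy sketch. The function below is an invented example of the kind of reward function an agentic system might be configured to maximise; all variable names and weights are hypothetical. What matters is what appears in it and what does not.

```python
# Illustrative only: a toy reward function over measurable proxies.
# All names and weights are hypothetical, not any firm's actual objective.

def reward(outcome: dict) -> float:
    """Score an interaction on measurable proxies.

    Present: complaint risk, retention, handling cost.
    Absent: good faith, suitability, fairness. These appear nowhere,
    because they are not directly measurable quantities.
    """
    return (
        -2.0 * outcome["predicted_complaint_prob"]  # penalise likely complaints
        + 1.0 * outcome["retention_prob"]           # reward retention
        - 0.5 * outcome["handling_cost"]            # penalise cost
    )

# Two candidate responses to the same customer:
generous = {"predicted_complaint_prob": 0.05, "retention_prob": 0.9, "handling_cost": 1.2}
minimal  = {"predicted_complaint_prob": 0.10, "retention_prob": 0.8, "handling_cost": 0.2}

# The minimal response scores higher here, whatever fairness requires:
# reward(generous) = 0.2, reward(minimal) = 0.5
```

A system maximising this function will prefer the minimal response in every comparable case, not because it has judged that response fair, but because fairness is simply not a term in the objective.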
2. Historical Retail Failures as Stress Tests
Past systemic failures — including pensions mis-selling, PPI, discretionary commission arrangements in motor finance, and widespread deficiencies in bereavement and arrears handling — share common characteristics:
- Formal compliance processes existed
- Metrics were monitored
- Documentation was extensive
- Incentives operated beneath formal structures
In many cases, aggregate indicators (including those monitored under the FSA/FCA Treating Customers Fairly initiative and related guidance) did not immediately reveal detriment.
The failure was not absence of systems, but misalignment between incentives, metrics, and substantive regulatory standards such as suitability, fairness, and good faith.
As firms adopt agentic AI in advice, customer support, and complaint resolution, the same structural risk arises in a new form: optimisation objectives may become the operative definition of compliance.
3. The "Objective Function" Response — and Its Limits
It may be argued that this concern is misplaced because firms can define AI objectives to optimise directly for regulatory compliance.
This response is partly correct. Firms can and should:
- Define compliance-aligned objectives
- Encode constraints
- Monitor harm indicators
- Retrain systems where issues emerge
However, three structural limits remain in evaluative regulatory domains.
3.1 Underspecification of Evaluative Standards
Core retail obligations — including acting in good faith, avoiding foreseeable harm, ensuring fair value, and providing suitable advice — are deliberately open-textured.
They require:
- Contextual judgement
- Balancing competing considerations
- Interpretation of evolving circumstances
No optimisation objective can exhaustively encode these standards. Objective functions necessarily rely on measurable proxies. Metrics are evidence of compliance. They are not equivalent to the standard itself.
There is an extensive historical and academic literature on the limits of measurable proxies and performance metrics and the perverse incentives they can create (Goodhart's law is the best-known formulation: when a measure becomes a target, it ceases to be a good measure). Embedding such proxies in agentic AI risks reproducing these problems at greater scale, velocity, and opacity.
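A stylised illustration of this divergence, with invented numbers: when the proxy (the observed complaint rate) is optimised directly, the action selected can differ from the action the underlying standard would select.

```python
# Hypothetical illustration of a Goodhart-style divergence. Each action maps
# to (reduction in true detriment, reduction in observed complaints).
# All figures are invented for illustration.

actions = {
    "fix_root_cause":     (0.8, 0.5),
    "deflect_complaints": (0.0, 0.9),  # e.g. added friction, scripted reassurance
}

def best_by_proxy(actions: dict) -> str:
    """Select the action that most reduces the *measured* complaint rate."""
    return max(actions, key=lambda a: actions[a][1])

def best_by_standard(actions: dict) -> str:
    """Select the action that most reduces *actual* customer detriment."""
    return max(actions, key=lambda a: actions[a][0])

# The proxy-optimiser and the standard disagree:
# best_by_proxy(actions)    -> "deflect_complaints"
# best_by_standard(actions) -> "fix_root_cause"
```

The measured complaint rate improves most under the action that leaves actual detriment entirely unaddressed; an optimiser given only the proxy cannot see the difference.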
3.2 Reason-Giving and Contestability
Retail regulatory legitimacy depends not only on aggregate outcomes but on the ability to justify individual decisions. Under DISP and Financial Ombudsman Service (FOS) review, for example, the question is typically: was the firm's decision fair and reasonable in the circumstances?
An optimisation system produces outputs calibrated against defined objectives. It does not, by itself, provide normative justification. If a consumer challenging an adverse decision is met with the explanation "the system applied our trained model in line with policy," contestability may become formal rather than substantive.
Meaningful review requires:
- Articulation of the relevant regulatory standard
- Explanation of how it was applied in context
- Capacity for principled departure from model output
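The three requirements above can be expressed as a record that a reviewable system would need to retain alongside each model output. The sketch below is a minimal illustration in Python; the field names are assumptions for exposition, not a regulatory or FCA-defined schema.

```python
# Minimal sketch of a decision record supporting substantive (not merely
# formal) contestability. Field names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class DecisionRecord:
    model_output: str                 # what the system proposed
    regulatory_standard: str          # the standard engaged, e.g. "good faith"
    contextual_reasons: list[str]     # how the standard was applied in this case
    human_override: bool = False      # whether a reviewer departed from the model
    override_reasons: list[str] = field(default_factory=list)

    def is_reviewable(self) -> bool:
        """A record supports meaningful review only if it articulates the
        standard and gives case-specific reasons, not merely the output."""
        return bool(self.regulatory_standard and self.contextual_reasons)
```

On this sketch, an explanation of the form "the system applied our trained model in line with policy" fails review: it populates the output field but neither the standard nor the contextual reasons.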
3.3 Objective Function Design Is Itself Normative
Defining the system's objective function involves choices about which harms are counted; how trade-offs are weighted; what level of detriment is tolerable; and how commercial constraints are embedded. These are governance decisions, not purely technical matters.
The risk is not that objectives cannot be specified. It is that they may narrow evaluative standards in ways that are operationally efficient but hidden in system design and therefore insufficiently visible to boards, supervisors, and consumers.
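These normative choices often surface as apparently technical constants. The configuration below is a hypothetical sketch: every value in it is a governance decision about which harms count and what detriment is tolerated, yet it may sit in code reviewed only as an engineering artefact.

```python
# Hypothetical objective configuration. Every constant here is a normative
# choice, not a technical parameter. All names and values are invented.

OBJECTIVE_CONFIG = {
    "harm_weights": {
        "financial_loss": 1.0,
        "distress": 0.2,      # is distress really worth one-fifth of loss?
        "exclusion": 0.0,     # an uncounted harm is an invisible harm
    },
    "tolerable_detriment": 0.05,  # who decided 5% is acceptable, and on what basis?
}

def within_tolerance(harm_scores: dict, config=OBJECTIVE_CONFIG) -> bool:
    """Aggregate weighted harms and compare against the configured tolerance."""
    total = sum(
        config["harm_weights"].get(kind, 0.0) * score
        for kind, score in harm_scores.items()
    )
    return total <= config["tolerable_detriment"]

# A case of maximal exclusion harm passes, because its weight is zero:
# within_tolerance({"exclusion": 1.0}) -> True
```

Whether such a file is seen by a board or supervisor as a policy document or as an implementation detail is precisely the visibility problem described above.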
4. Practical Consumer-Facing Risks
The structural issues above have concrete implications.
4.1 Advice and Targeted Support
An AI advice agent may be trained to optimise for:
- Long-term portfolio performance
- Predicted complaint minimisation
- Retention
- Risk-adjusted suitability scores
However, suitability requires consideration of:
- Customer preferences
- Risk appetite nuance
- Non-financial objectives
- Changing life circumstances
If optimisation objectives shape recommendation boundaries, suitability may be defined by model parameters rather than by open-textured judgement.
4.2 Complaint Handling
Agentic systems may:
- Draft responses
- Propose redress
- Predict escalation risk
- Optimise early settlement strategies
If systems are calibrated to minimise FOS referrals or complaint volumes, there is a risk that:
- Redress is calibrated to escalation risk rather than fairness
- Narrative reasoning defends model outputs rather than engages with substantive standards
Complaint rates may fall while contestability and fairness weaken.
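The distortion can be shown with invented figures: if redress is calibrated to predicted escalation risk rather than to the detriment actually suffered, two customers with identical detriment receive different offers depending only on their predicted persistence.

```python
# Toy illustration; all figures invented. Compares two redress policies for
# customers with identical detriment but different predicted escalation risk.

def redress_by_escalation_risk(detriment: float, escalation_prob: float) -> float:
    """Offer scaled by the predicted likelihood of FOS referral:
    an optimisation target, not a fairness standard."""
    return detriment * escalation_prob

def redress_by_fairness(detriment: float, escalation_prob: float) -> float:
    """Offer scaled to the detriment itself, ignoring escalation risk."""
    return detriment

# Same detriment (500.0), different predicted persistence:
persistent = redress_by_escalation_risk(500.0, 0.9)  # 450.0
quiet      = redress_by_escalation_risk(500.0, 0.1)  # 50.0
```

Under the escalation-calibrated policy the quiet customer receives a fraction of the persistent customer's offer for the same harm, which is exactly the pattern a "fair and reasonable in the circumstances" review would condemn.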
4.3 Arrears and Vulnerability
These areas are particularly sensitive. AI may improve detection of vulnerability signals and consistency of treatment.
However, if repayment plans or support pathways are optimised within recovery or impairment constraints, marginal cases may be resolved by reference to statistical efficiency rather than contextual good faith or fair treatment.
Where systems become determinative and human override becomes operationally difficult, the firm may struggle to demonstrate that evaluative standards have genuinely been applied.
5. Aggregate Metrics vs Individual Fairness
AI systems are typically assessed at portfolio level. Retail consumer protection frequently operates at individual level.
The FCA may wish to consider whether supervisory expectations should ensure that:
- Individual outcomes remain reviewable by reference to regulatory standards, not merely model integrity
- Human override mechanisms are meaningful rather than formal
- Proxy metrics (complaint rates, retention, comprehension testing) are treated as indicators, not definitions, of compliance
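The gap between portfolio-level assessment and individual-level protection is easy to state numerically. With invented figures: a portfolio metric can clear a supervisory threshold while a small cohort experiences complete failure of the standard.

```python
# Invented numbers: an aggregate metric looks healthy while a small cohort
# suffers severe individual detriment.

outcomes = [0.9] * 98 + [0.0, 0.0]  # 98 good outcomes, 2 complete failures

portfolio_score = sum(outcomes) / len(outcomes)  # ~0.88: acceptable in aggregate
worst_individual = min(outcomes)                 # 0.0: the standard not met at all

# A threshold applied only in aggregate misses the two customers entirely:
aggregate_pass = portfolio_score > 0.85          # True
individual_pass = worst_individual > 0.0         # False
```

A supervisory regime that reviews only `portfolio_score` certifies this book as compliant; a regime that also asks about `worst_individual` does not. That is the substance of the distinction drawn above.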
6. Distinguishing Rule Types
The interaction between AI and regulation differs by rule type.
- Tolerance-based rules (e.g. capital ratios, impact tolerances in operational resilience) align naturally with optimisation.
- Managerial governance regimes (e.g. operational resilience mapping, and Consumer Duty requirements on product design and governance) can be strengthened by AI modelling.
- Evaluative conduct standards (e.g. Consumer Duty rules requiring firms to act in good faith and to avoid foreseeable harm) require additional safeguards because legitimacy depends on reason-giving and contestability.
Explicit recognition of this distinction may assist firms and supervisors in calibrating expectations.
7. Conclusion
AI has significant potential to enhance consistency, efficiency, and early harm detection in retail financial services.
The long-term risk is not technological failure alone. It is that optimisation logic may, over time, narrow the practical meaning of evaluative regulatory standards if not accompanied by governance structures that preserve reason-giving and contestability.
Agentic AI does not eliminate the structural dynamics that produced past retail failures. It may intensify them if optimisation objectives quietly redefine compliance in practice.
By addressing these issues proactively, regulators and industry can support innovation while preserving the normative foundations of retail consumer protection.
This paper was originally submitted to the FCA's Mills Review.