AI Risk Lens — Actuarial Explainability for the AI Black Box

The contribution

An inversion, not another governance checklist

The IAA AI Governance Framework, the SOA AI Task Force, and the NIST AI RMF all describe how AI should be governed, and the actuarial bodies cast actuaries as its governors. None proposes the actuarial measurement toolkit itself as the explainability standard. That inversion, using credibility, reserving, actual-to-expected (A/E) studies and TVaR to quantify model risk, is the idea developed here.

The software-engineering default

One headline metric (e.g. "0.94 AUC"): a single portfolio-wide figure that hides who gets hurt.
"95% confidence" — discards the 5% tail that contains the catastrophe.
SHAP / LIME: attribute a prediction to the features that drove it, but say nothing about what an error costs.
One-time validation — a snapshot, blind to cohort drift after deployment.
No reserve, no capital, no signature — nobody is accountable for tail risk.

The actuarial standard (this framework)

Frequency × Severity — a full loss distribution: how often and how badly.
VaR / TVaR (CTE) — explicitly price the tail "95% confidence" ignores.
Credibility (Z = n/(n+k), with n the volume of relevant experience and k the Bühlmann constant): how much trust each output earns from its evidence.
Actual-to-Expected studies — continuous, cohort-level drift monitoring.
IBNR reserve + economic capital + a signed opinion — accountable governance.
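The credibility weight in the list above can be sketched in a few lines. This is a minimal illustration, assuming the Bühlmann form Z = n/(n+k) with k supplied as a known constant (in Bühlmann's model it is the expected process variance divided by the variance of the hypothetical means). The function names and example figures are hypothetical, not taken from the framework's tool.

```python
def credibility_weight(n: float, k: float) -> float:
    """Bühlmann credibility factor Z = n / (n + k).

    n: volume of relevant experience behind an output (e.g. comparable past cases).
    k: credibility constant (EPV / VHM), assumed known here for illustration.
    """
    if n < 0 or k <= 0:
        raise ValueError("n must be non-negative and k must be positive")
    return n / (n + k)


def credibility_estimate(observed: float, base_rate: float, n: float, k: float) -> float:
    """Credibility-weighted estimate: Z * observed + (1 - Z) * base_rate."""
    z = credibility_weight(n, k)
    return z * observed + (1.0 - z) * base_rate


if __name__ == "__main__":
    # Hypothetical: 40 comparable past cases, 10% observed error rate, 4% base rate, k = 60.
    z = credibility_weight(40, 60)
    print(f"Z = {z:.2f}")  # 40 / (40 + 60) = 0.40
    print(f"credibility-weighted error rate = {credibility_estimate(0.10, 0.04, 40, 60):.3f}")
```

As n grows, Z approaches 1 and an output stands on its own experience; with thin evidence the estimate stays close to the base rate.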

Why now

The law already made actuarial reasoning the yardstick

Under Colorado SB21-169 / Regulation 10-1-1, an insurer's AI is permissible only where differential outcomes have a "legitimate actuarial basis." The NAIC Model AI Bulletin (adopted by 20+ states) requires a governed AI program whose controls are commensurate with risk. In insurance, then, actuarial reasoning is already the statutory test for acceptable AI. This framework formalizes that test into measurable metrics and shows that the same metrics generalize beyond insurance.

The intellectual heart

Seven actuarial methods, mapped to AI governance questions

Actuarial method	The governance question it answers for an AI system
Frequency × Severity	Not one accuracy number, but how often the model errs × how badly each error hurts — a full loss distribution.
Credibility (Bühlmann)	How much weight a given AI output deserves vs. the base rate, given how much relevant experience backs it.
IBNR / reserving	"Incurred but not reported" model failures — errors baked in but not yet surfaced — and the reserve to hold.
Actual-to-Expected study	Cohort-level, statistically rigorous drift / miscalibration monitoring vs. one-time validation.
VaR / TVaR (CTE)	Sizing catastrophic tail failure — exactly what "95% confidence" discards in its 5% tail.
Economic capital	The buffer to hold against AI tail risk (TVaR − expected loss).
Control cycle + signed opinion	A credentialed accountability layer: a "Statement of Actuarial AI Opinion" attesting to model risk.
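Three rows of the table (frequency × severity, VaR / TVaR, economic capital) combine naturally in a Monte Carlo sketch. The code below assumes, purely for illustration, Poisson error counts and lognormal dollar severities per period, estimates VaR and TVaR empirically from the simulated aggregate losses, and takes economic capital as TVaR minus expected loss, as the table defines it. The distributions, parameters and function names are hypothetical choices, not the framework's calibrated model.

```python
import math
import random
from statistics import fmean


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson draw by Knuth's multiplication method (adequate for modest lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_losses(lam: float, mu: float, sigma: float,
                    n_periods: int, seed: int = 0) -> list[float]:
    """Aggregate dollar loss per period: Poisson(lam) errors, each lognormal(mu, sigma)."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_periods):
        n_errors = poisson(rng, lam)  # frequency: how often the model errs
        # severity: how badly each error hurts, summed into the period's loss
        losses.append(sum(rng.lognormvariate(mu, sigma) for _ in range(n_errors)))
    return losses


def var_tvar(losses: list[float], alpha: float = 0.95) -> tuple[float, float]:
    """Empirical VaR (alpha-quantile) and TVaR / CTE (mean of the worst 1 - alpha share)."""
    ordered = sorted(losses)
    n = len(ordered)
    n_tail = max(1, round((1.0 - alpha) * n))  # number of outcomes in the tail
    if n_tail >= n:
        raise ValueError("too few simulated periods for this alpha")
    var = ordered[n - n_tail - 1]        # largest loss outside the tail
    tvar = fmean(ordered[n - n_tail:])   # average of the tail outcomes
    return var, tvar


if __name__ == "__main__":
    # Hypothetical book: about 20 costly errors per period, heavy-tailed severities.
    losses = simulate_losses(lam=20, mu=8.0, sigma=1.5, n_periods=20_000, seed=42)
    expected = fmean(losses)
    var95, tvar95 = var_tvar(losses, alpha=0.95)
    capital = tvar95 - expected  # economic capital = TVaR - expected loss
    print(f"expected {expected:,.0f}  VaR95 {var95:,.0f}  "
          f"TVaR95 {tvar95:,.0f}  capital {capital:,.0f}")
```

Raising sigma (a fatter severity tail) should widen the gap between TVaR and expected loss, which is the kind of tail the "95% confidence" headline discards.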

Not just insurance

One lens, any agentic system

The framework needs only four things from a system: a stream of decisions, a definition of an error, a dollar severity for each error, and cohorts to watch. Any AI or agentic system that makes decisions at volume supplies all four, so its errors form an insurable book. The interactive tool measures eight such systems, each computed live from the same actuarial functions.
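The four inputs map directly onto an actual-to-expected study. A minimal sketch, assuming each decision carries the model's own expected error probability (used as the "expected" side), a boolean outcome under the chosen error definition, a dollar severity, and a cohort label. The flag uses a simple Poisson approximation, z = (A − E)/√E, which is an assumption rather than the framework's stated test; all names are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Decision:
    cohort: str            # segment to watch, e.g. an age band or a work queue
    expected_error: float  # expected probability that this decision is an error
    is_error: bool         # outcome under the chosen error definition
    severity: float        # dollar cost if the decision turns out to be an error


def ae_by_cohort(decisions, z_threshold: float = 2.0) -> dict:
    """Actual-to-expected error counts per cohort, with a Poisson-approximation flag."""
    actual = defaultdict(int)
    expected = defaultdict(float)
    cost = defaultdict(float)
    for d in decisions:
        expected[d.cohort] += d.expected_error
        if d.is_error:
            actual[d.cohort] += 1
            cost[d.cohort] += d.severity
    report = {}
    for cohort, exp_count in expected.items():
        if exp_count <= 0:
            continue  # nothing expected, so no A/E ratio to compute
        act = actual[cohort]
        z = (act - exp_count) / math.sqrt(exp_count)  # Poisson: variance ~ expected count
        report[cohort] = {
            "actual": act,
            "expected": exp_count,
            "ae": act / exp_count,
            "z": z,
            "cost": cost[cohort],
            "flag": z > z_threshold,  # only excess errors are flagged
        }
    return report


if __name__ == "__main__":
    # Hypothetical stream: both cohorts carry a 5% expected error rate.
    stream = [Decision("under_65", 0.05, i < 50, 800.0) for i in range(1000)]
    stream += [Decision("85_plus", 0.05, i < 40, 800.0) for i in range(500)]
    for cohort, row in ae_by_cohort(stream).items():
        status = "FLAG" if row["flag"] else "ok"
        print(f"{cohort}: A/E={row['ae']:.2f} z={row['z']:.1f} "
              f"cost=${row['cost']:,.0f} {status}")
```

With these made-up numbers the 85_plus cohort runs at an A/E of about 1.6 (z of about 3) and is flagged, while under_65 sits near 1.0. Monitoring is then a matter of rerunning the study on each new window of decisions.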

Regulated decisioning

Health insurer — prior authorization

Wrongful-denial book; A/E flags the 85+ cohort at 1.40; $15.7M IBNR reserve.

Autonomous operations

BPO — customer-ops agent

Mis-resolutions under client SLAs; A/E flags the KYC queue at 1.43; $3.2M reserve.

Autonomous growth

Marketing — campaign agent

Tiny per-action cost, but the fattest tail (TVaR ≈ 19× the expected loss): the rare brand or compliance blowup.

Autonomous engineering

Software — coding agent

Latent defects: IBNR defects (453) outnumber reported ones (302); a $27M reserve that a green test suite never shows.

Financial crime / AML

Bank — fraud & AML agent

A/E catches a new laundering typology (crypto, trade-based) before the regulator does; $11.2M reserve.

Regulated lending

Lender — credit-underwriting agent

Insurance's near-twin (ECOA fair lending); defaults surface only as loans season, so IBNR exceeds reported; $23.8M reserve.

Clinical decisioning

Health system — triage agent

Highest stakes: mean severity $15.8k, a $42.9M reserve; A/E flags geriatric & mental-health under-triage.

Customer experience

SaaS — customer-support agent

High-frequency, low-severity, but the tail is a churned enterprise account or a viral security miss.