Today's AI governance inherits software-engineering defaults: one accuracy number, a "95% confidence" figure, and feature attributions (SHAP / LIME). Those answer which inputs mattered and how often the model is right on average. They are silent on the questions a regulator, a board, or a denied patient actually asks: how badly does each error hurt, how much trust does this output deserve, how much capital should we hold against tail failures, and will a credentialed professional sign their name to it?
Actuarial science answers exactly those questions for uncertain future losses — and has for a century. This project makes the case, with math and a live worked example, that the actuarial toolkit should be the standard for measuring and governing the AI black box.
Figures are computed live in the tool from a 10,000-decision synthetic book.
The IAA AI Governance Framework, the SOA AI Task Force, and the NIST AI RMF all describe actuaries governing AI. None proposes the actuarial measurement toolkit as the explainability standard itself. That inversion — using credibility, reserving, A/E studies and TVaR to quantify model risk — is the idea developed here.
Under Colorado SB21-169 / Regulation 10-1-1, an insurer's AI is permissible only where differential outcomes have a "legitimate actuarial basis." The NAIC Model AI Bulletin (adopted by 20+ states) demands a governed, risk-commensurate AI program. In insurance, actuarial reasoning is already the statutory test for acceptable AI. This framework formalizes that test into measurable metrics — and shows the same metrics generalize beyond insurance.
See the regulatory anchor and prior-art positioning in the paper →
| Actuarial method | The governance question it answers for an AI system |
|---|---|
| Frequency × Severity | Not one accuracy number, but how often the model errs × how badly each error hurts — a full loss distribution. |
| Credibility (Bühlmann) | How much weight a given AI output deserves vs. the base rate, given how much relevant experience backs it. |
| IBNR / reserving | "Incurred but not reported" model failures — errors baked in but not yet surfaced — and the reserve to hold. |
| Actual-to-Expected study | Cohort-level, statistically rigorous drift / miscalibration monitoring vs. one-time validation. |
| VaR / TVaR (CTE) | Sizing catastrophic tail failure — exactly what "95% confidence" discards in its 5% tail. |
| Economic capital | The buffer to hold against AI tail risk (TVaR − expected loss). |
| Control cycle + signed opinion | A credentialed accountability layer: a "Statement of Actuarial AI Opinion" attesting to model risk. |
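
Several rows of the table reduce to a few lines of arithmetic. The following is a minimal sketch, not the tool's actual code: the function names, the lognormal severity, the 3% error rate, and the credibility constant `k` are all illustrative assumptions. It takes one dollar loss per decision (zero when the decision was correct) and computes frequency × severity, an empirical VaR / TVaR, economic capital as TVaR minus expected loss, and a Bühlmann credibility blend.

```python
import random


def frequency_severity(losses):
    """Return (error frequency, mean severity per error) for per-decision losses."""
    errors = [x for x in losses if x > 0]
    frequency = len(errors) / len(losses)
    severity = sum(errors) / len(errors) if errors else 0.0
    return frequency, severity


def expected_loss(losses):
    """Mean loss per decision; equals frequency x mean severity."""
    return sum(losses) / len(losses)


def var_tvar(losses, level=0.95):
    """Empirical VaR and TVaR (CTE) at the given level.

    Convention assumed here: the tail is the worst (1 - level) share of
    decisions, VaR is the smallest loss in that tail, TVaR is the tail mean.
    """
    ordered = sorted(losses)
    k = max(1, round((1 - level) * len(ordered)))  # number of decisions in the tail
    tail = ordered[-k:]
    return tail[0], sum(tail) / len(tail)


def economic_capital(losses, level=0.95):
    """Buffer against tail failure: TVaR minus expected loss."""
    _, tvar = var_tvar(losses, level)
    return tvar - expected_loss(losses)


def buhlmann_z(n, k):
    """Buhlmann credibility factor Z = n / (n + k).

    n is the volume of relevant experience; k is the ratio of expected process
    variance to variance of hypothetical means (supplied by the caller here).
    """
    return n / (n + k)


def credibility_estimate(observed, prior, n, k):
    """Blend the observed rate with the base rate, weighted by credibility."""
    z = buhlmann_z(n, k)
    return z * observed + (1 - z) * prior


def synthetic_book(n=10_000, error_rate=0.03, seed=0):
    """Hypothetical book: each decision errs with error_rate, severity is lognormal."""
    rng = random.Random(seed)
    return [
        rng.lognormvariate(7.0, 1.2) if rng.random() < error_rate else 0.0
        for _ in range(n)
    ]


if __name__ == "__main__":
    book = synthetic_book()
    freq, sev = frequency_severity(book)
    var, tvar = var_tvar(book, level=0.95)
    print(f"frequency={freq:.4f} severity=${sev:,.0f} expected=${expected_loss(book):,.2f}")
    print(f"VaR95=${var:,.0f} TVaR95=${tvar:,.0f} capital=${economic_capital(book):,.0f}")
    print(f"credibility-weighted error rate: {credibility_estimate(freq, 0.05, 10_000, 2_000):.4f}")
```

In this hypothetical book fewer than 5% of decisions carry any loss, so the 95% VaR should be zero while the TVaR is not. That is the gap between a confidence threshold and a tail measure that the VaR / TVaR row describes.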
The framework needs only four things from a system: a stream of decisions, a definition of an error, a dollar severity, and cohorts to watch. Any AI or agentic system that decides at volume supplies all four — so its errors are an insurable book. The interactive tool measures eight of them, computed live from the same actuarial functions.
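
The four inputs can be pictured as one record per decision. The sketch below is hypothetical: the `Decision` fields, the `predicted_error_prob` used as the "expected" basis, and the normal-approximation z-score are assumptions made for illustration, not the tool's schema. It shows how a cohort-level Actual-to-Expected comparison falls out of those records.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    cohort: str                  # group to watch, e.g. a segment or time window
    predicted_error_prob: float  # assumed expected error probability from validation
    is_error: bool               # outcome under the system's own error definition
    severity: float              # dollar cost if the decision was an error, else 0.0


@dataclass(frozen=True)
class CohortAE:
    actual: int       # observed error count
    expected: float   # sum of predicted error probabilities
    ratio: float      # actual / expected
    z_score: float    # (actual - expected) / sqrt(sum p * (1 - p))
    dollars: float    # realized dollar loss in the cohort


def actual_to_expected(decisions):
    """Compute an A/E summary per cohort from a stream of decisions."""
    actual = defaultdict(int)
    expected = defaultdict(float)
    variance = defaultdict(float)
    dollars = defaultdict(float)
    for d in decisions:
        p = d.predicted_error_prob
        actual[d.cohort] += int(d.is_error)
        expected[d.cohort] += p
        variance[d.cohort] += p * (1 - p)  # independent Bernoulli errors assumed
        if d.is_error:
            dollars[d.cohort] += d.severity
    result = {}
    for cohort, exp in expected.items():
        ratio = actual[cohort] / exp if exp > 0 else float("nan")
        sd = math.sqrt(variance[cohort])
        z = (actual[cohort] - exp) / sd if sd > 0 else float("nan")
        result[cohort] = CohortAE(actual[cohort], exp, ratio, z, dollars[cohort])
    return result
```

An A/E ratio near 1 suggests the cohort is behaving as validated. A ratio well above 1 with a large z-score is a drift or miscalibration signal that merits review.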