The AI Risk Lens

Any agentic system, measured the actuarial way

Same actuarial lens, different book of risk.

Where VaR / TVaR start measuring the tail.
Bigger k ⇒ more evidence needed before a cohort is "believed."
Set to 0% to see A/E ≈ 1.
Common-cause failure clustering. Drives diversified capital.
Error frequency
Mean severity | error
average $ damage per error
Expected loss / decision
Tail loss TVaRα
VaRα = · the tail "95% confidence" hides
1 · Frequency × Severity loss distribution
replaces the single accuracy number
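
A minimal sketch of how the panel's readouts (expected loss per decision, VaRα, TVaRα) could be computed from an error frequency and a severity distribution. The lognormal severity, the parameter values, and the function names are illustrative assumptions, not the page's actual model.

```python
import math
import random

def simulate_decision_losses(p_error, sev_mu, sev_sigma, n, seed=0):
    """Per-decision loss: 0 if the decision is correct, else a severity draw.
    Lognormal severity is an assumption made for this sketch."""
    rng = random.Random(seed)
    return [rng.lognormvariate(sev_mu, sev_sigma) if rng.random() < p_error else 0.0
            for _ in range(n)]

def var_tvar(losses, alpha):
    """Empirical VaR (alpha-quantile) and TVaR (mean of losses from VaR upward)."""
    s = sorted(losses)
    idx = max(min(math.ceil(alpha * len(s)) - 1, len(s) - 1), 0)
    tail = s[idx:]
    return s[idx], sum(tail) / len(tail)

losses = simulate_decision_losses(p_error=0.02, sev_mu=6.0, sev_sigma=1.0, n=100_000, seed=1)
expected_loss = sum(losses) / len(losses)
var95, tvar95 = var_tvar(losses, 0.95)
print(f"expected loss/decision={expected_loss:.2f}  VaR95={var95:.2f}  TVaR95={tvar95:.2f}")
```

With an error frequency below 1 − α, the α-quantile sits among the zero-loss decisions, so VaRα is zero while TVaRα is not. The single quantile says nothing about the tail above it.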

2 · Actual-to-Expected drift study
replaces one-time validation

3 · Credibility of a cohort's signal
how much trust an output earns

What this means: Z = n/(n+k) is the weight a cohort's own experience earns, where n is the cohort's exposure. Rather than leaving k a free dial, k̂ is estimated from the cohorts' own variance components (empirical Bühlmann-Straub); click “Use k̂” to snap the slider to the data-driven value.
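
As a rough illustration of the data-driven k̂, the sketch below applies the standard nonparametric Bühlmann-Straub estimators to per-period error rates. The cohort names, exposures, and rates are made-up inputs, and `buhlmann_straub` is a hypothetical helper, not the page's code.

```python
import math

def buhlmann_straub(cohorts):
    """cohorts: {name: [(exposure, observed_error_rate), ...]} with one tuple per period.
    Returns (k_hat, {name: Z}) using the standard empirical estimators."""
    r = len(cohorts)
    w_i = {c: sum(w for w, _ in obs) for c, obs in cohorts.items()}
    w_tot = sum(w_i.values())
    xbar_i = {c: sum(w * x for w, x in obs) / w_i[c] for c, obs in cohorts.items()}
    xbar = sum(w_i[c] * xbar_i[c] for c in cohorts) / w_tot

    # Expected process variance (within-cohort noise).
    v_hat = (sum(w * (x - xbar_i[c]) ** 2 for c, obs in cohorts.items() for w, x in obs)
             / sum(len(obs) - 1 for obs in cohorts.values()))
    # Variance of hypothetical means (true between-cohort spread).
    a_hat = ((sum(w_i[c] * (xbar_i[c] - xbar) ** 2 for c in cohorts) - (r - 1) * v_hat)
             / (w_tot - sum(w ** 2 for w in w_i.values()) / w_tot))

    # If no between-cohort signal is detected, k is infinite and every Z is 0.
    k_hat = v_hat / a_hat if a_hat > 0 else math.inf
    return k_hat, {c: w_i[c] / (w_i[c] + k_hat) for c in cohorts}

# Hypothetical cohorts: (decisions in period, observed error rate in period).
example = {
    "A": [(1000, 0.020), (1000, 0.024), (1000, 0.022)],
    "B": [(100, 0.050), (100, 0.070), (100, 0.060)],
    "C": [(500, 0.030), (500, 0.034), (500, 0.032)],
}
k_hat, z = buhlmann_straub(example)
print(f"k_hat={k_hat:.1f}", {c: round(v, 3) for c, v in z.items()})
```

Here k̂ is the ratio of within-cohort noise to between-cohort spread, so a noisier book pushes k̂ up and every cohort needs more exposure before its own rate is believed.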

4 · IBNR reserve & economic capital
replaces "no reserve, no capital"

Aggregate capital: economic capital = TVaR − expected loss, read from a seeded Monte-Carlo simulation of the whole book (a mixed-Poisson compound model with a shared systemic factor). The ρ slider sets common-cause clustering: ρ = 0 is near-independent (maximal diversification); raising ρ fattens the aggregate tail toward the undiversified bound (per-decision TVaR × N). See paper §5.6.
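
The simulation described above can be sketched as follows. The mapping from ρ to the systemic factor (a gamma multiplier with mean 1 and variance ρ), the lognormal severity, and all parameter values are assumptions made for this sketch; the page's exact parametrization is not given here.

```python
import random

def poisson(rng, mean):
    """Poisson draw: count unit-rate exponential arrivals that land before `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def economic_capital(n_decisions, p_error, sev_mu, sev_sigma, rho,
                     alpha=0.99, n_sims=5000, seed=42):
    """Seeded Monte-Carlo of the aggregate book loss.
    Returns (expected loss, TVaR_alpha, economic capital = TVaR - expected loss)."""
    rng = random.Random(seed)
    lam = n_decisions * p_error
    totals = []
    for _ in range(n_sims):
        # Shared systemic factor (assumed): gamma with mean 1, variance rho.
        theta = rng.gammavariate(1.0 / rho, rho) if rho > 0 else 1.0
        n_errors = poisson(rng, lam * theta)          # mixed-Poisson error count
        totals.append(sum(rng.lognormvariate(sev_mu, sev_sigma)
                          for _ in range(n_errors)))  # compound severity
    totals.sort()
    expected = sum(totals) / n_sims
    tail = totals[int(alpha * n_sims):]
    tvar = sum(tail) / len(tail)
    return expected, tvar, tvar - expected

for rho in (0.0, 0.5):
    el, tvar, cap = economic_capital(1000, 0.02, 0.0, 1.0, rho)
    print(f"rho={rho}: expected={el:.1f}  TVaR99={tvar:.1f}  capital={cap:.1f}")
```

Because every error count in a scenario is scaled by the same draw of the systemic factor, bad scenarios are bad across the whole book, which is what fattens the aggregate tail as ρ rises.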

5 · Stochastic reserving — development triangle
chain-ladder · Bornhuetter-Ferguson · bootstrap

Cumulative reported errors by accident period (rows) × development lag (columns). The shaded diagonal is “reported to date”; dotted cells (·) are the future the triangle projects.

What this means: Instead of a single loss-development factor, a real triangle projects the ultimate by chain-ladder and cross-checks it with Bornhuetter-Ferguson, which uses the model's expected error count as the a-priori and is more stable when the latest periods are immature, where chain-ladder can over-project. An over-dispersed-Poisson bootstrap then gives a full reserve distribution, so the risk margin is a defensible 75th percentile (IFRS-17 style), not the ad-hoc tail load shown in panel 4. The reserve is severity-aware: each IBNR claim's cost is drawn from the empirical body spliced with a generalized-Pareto (GPD) tail fitted by peaks-over-threshold, so the tail index ξ̂ and severity uncertainty flow into the percentiles. See paper §5.5.
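
To make the two point estimates concrete, here is a small sketch of chain-ladder and Bornhuetter-Ferguson on a cumulative triangle. The triangle values and a-priori counts are invented for illustration, and the bootstrap and GPD severity steps are left out for brevity.

```python
def development_factors(tri):
    """Volume-weighted age-to-age factors from a cumulative triangle
    (rows = accident periods, oldest first; each row stops at its latest lag)."""
    factors = []
    for j in range(len(tri) - 1):
        rows = [r for r in tri if len(r) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors

def cdf_to_ultimate(factors, lag):
    """Product of the remaining development factors from `lag` to ultimate."""
    cdf = 1.0
    for f in factors[lag:]:
        cdf *= f
    return cdf

def chain_ladder(tri):
    """Ultimate = latest reported count x cumulative development factor."""
    f = development_factors(tri)
    return [row[-1] * cdf_to_ultimate(f, len(row) - 1) for row in tri]

def bornhuetter_ferguson(tri, a_priori):
    """Ultimate = latest reported + a-priori ultimate x expected unreported share."""
    f = development_factors(tri)
    return [row[-1] + prior * (1.0 - 1.0 / cdf_to_ultimate(f, len(row) - 1))
            for row, prior in zip(tri, a_priori)]

# Hypothetical cumulative reported errors.
triangle = [
    [100, 150, 170, 175],
    [110, 168, 190],
    [120, 175],
    [130],
]
a_priori = [175, 190, 200, 210]  # hypothetical model-expected ultimate error counts

cl = chain_ladder(triangle)
bf = bornhuetter_ferguson(triangle, a_priori)
latest = [row[-1] for row in triangle]
print("CL IBNR:", [round(u - l, 1) for u, l in zip(cl, latest)])
print("BF IBNR:", [round(u - l, 1) for u, l in zip(bf, latest)])
```

For the newest period, chain-ladder multiplies a single immature observation by the full development factor, while Bornhuetter-Ferguson only adds the a-priori's unreported share to it. That difference is why the latter is the steadier cross-check.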

6 · Ground-truth censoring & reject inference
corrects selection bias

What this means: The true outcome is only observed for some decisions (a confidently-denied case is rarely appealed, a declined applicant never gets to default), and that censoring is informative: it rises with the model's score, which tracks error, so the naive observed rate is biased low. Reject inference fits P(error | score) on the observed decisions and imputes the censored ones, recovering an estimate near the truth. It works because censoring is (mostly) explained by the score; whatever censoring depends on factors the score does not capture (MNAR) is the residual gap. See paper §9.
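
The bias and its correction can be reproduced on synthetic data. In the sketch below, the error and censoring curves are invented, censoring depends only on the score (so there is no MNAR residual), and P(error | score) is fitted with simple score bins rather than whatever model the paper uses.

```python
import random

def simulate(n, seed=0):
    """Synthetic decisions: (score, error, observed). Both curves are assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        score = rng.random()
        error = rng.random() < 0.02 + 0.30 * score      # error rises with score
        observed = rng.random() < 1.0 - 0.80 * score    # censoring rises with score
        rows.append((score, error, observed))
    return rows

def reject_inference(rows, n_bins=10):
    """Return (naive observed error rate, imputed error rate).
    The outcome of a censored row is never read; it is imputed from its score bin."""
    obs_n, obs_err = [0] * n_bins, [0] * n_bins
    for score, error, observed in rows:
        if observed:
            b = min(int(score * n_bins), n_bins - 1)
            obs_n[b] += 1
            obs_err[b] += error
    naive = sum(obs_err) / sum(obs_n)
    # Binned estimate of P(error | score); fall back to the naive rate for empty bins.
    rate = [obs_err[b] / obs_n[b] if obs_n[b] else naive for b in range(n_bins)]
    total = 0.0
    for score, error, observed in rows:
        total += error if observed else rate[min(int(score * n_bins), n_bins - 1)]
    return naive, total / len(rows)

rows = simulate(50_000, seed=7)
truth = sum(error for _, error, _ in rows) / len(rows)  # known only because data is synthetic
naive, imputed = reject_inference(rows)
print(f"truth={truth:.4f}  naive={naive:.4f}  imputed={imputed:.4f}")
```

The correction works here only because censoring was generated from the score alone. If censoring also depended on something unrecorded, the imputed rate would still miss that part.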