The AI Risk Lens

Any agentic system, measured the actuarial way

Choose an AI / agentic system to measure

Same actuarial lens, different book of risk.

Tail level α = 0.95

Where VaR / TVaR start measuring the tail.

Credibility constant k = 2,500

Bigger k ⇒ more evidence needed before a cohort is "believed."

Production drift = 35%

Set to 0% to see A/E ≈ 1.

Systemic correlation ρ = 0.15

Common-cause failure clustering. Drives diversified capital.

Credibility cohort

Error frequency

Mean severity | error

average $ damage per error

Expected loss per decision

Tail loss TVaR_α

VaR_α for comparison · TVaR_α is the tail a "95% confidence" figure hides

1 · Frequency × Severity loss distribution

replaces the single accuracy number
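A minimal sketch of the frequency × severity view for a single decision, assuming (hypothetically) Bernoulli errors at a 3% rate and lognormal severities; the tool's actual severity model and parameters are not given here, and the function names are illustrative. It also shows why the tail level matters: when fewer than 5% of decisions err, VaR at α = 0.95 is zero even though TVaR, the mean of the worst 5% of outcomes, is not.

```python
import numpy as np


def per_decision_losses(p_error: float, sev_mu: float, sev_sigma: float,
                        n: int = 200_000, seed: int = 7) -> np.ndarray:
    """Simulate n decisions: loss is 0 if correct, a severity draw if wrong.

    Hypothetical model: errors are Bernoulli(p_error), severities lognormal.
    """
    rng = np.random.default_rng(seed)
    is_error = rng.random(n) < p_error
    severity = rng.lognormal(mean=sev_mu, sigma=sev_sigma, size=n)
    return np.where(is_error, severity, 0.0)


def var_tvar(losses: np.ndarray, alpha: float) -> tuple[float, float]:
    """VaR is the alpha-quantile; TVaR is the mean of the worst (1 - alpha) share.

    Averaging the top (1 - alpha) share, rather than all losses >= VaR, keeps TVaR
    correct when the distribution has a large atom at zero, as it does here.
    """
    ordered = np.sort(losses)
    var = float(np.quantile(ordered, alpha))
    k = max(1, int(round((1.0 - alpha) * len(ordered))))  # size of the worst share
    tvar = float(ordered[-k:].mean())
    return var, tvar


p, mu, sigma, alpha = 0.03, 6.0, 1.0, 0.95
losses = per_decision_losses(p, mu, sigma)
expected_loss = float(losses.mean())              # simulated frequency x mean severity
analytic_el = p * np.exp(mu + sigma ** 2 / 2)     # p * E[severity] for one decision
var95, tvar95 = var_tvar(losses, alpha)
print(f"E[loss]={expected_loss:.1f} (analytic {analytic_el:.1f}) VaR={var95:.1f} TVaR={tvar95:.1f}")
```

A model that is "97% accurate" passes a 95% VaR test with a loss of exactly zero; only TVaR exposes what the remaining errors cost.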

2 · Actual-to-Expected drift study

replaces one-time validation

3 · Credibility of a cohort's signal

how much trust an output earns

What this means: Z = n/(n + k) is the weight a cohort's own experience earns, with the remaining 1 − Z placed on the portfolio-wide estimate. Rather than leaving k as a free dial, the tool estimates k̂ from the cohorts' own variance components (empirical Bühlmann-Straub); click “Use k̂” to snap the slider to the data-driven value.
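A minimal sketch of the empirical Bühlmann-Straub estimate of k̂, using the standard unbiased within- and between-cohort variance estimators. The inputs (per-period error rates and exposures such as decision counts), the function and field names, and the example data are all hypothetical, and the tool's own implementation may differ in detail. The exposure total w_i plays the role of n in Z = n/(n + k).

```python
from dataclasses import dataclass


@dataclass
class CredibilityResult:
    sigma2: float    # within-cohort (process) variance estimate
    tau2: float      # between-cohort variance of true means, clamped at 0
    k_hat: float     # credibility constant k-hat = sigma2 / tau2
    z: list[float]   # credibility weight per cohort, Z_i = w_i / (w_i + k_hat)


def buhlmann_straub(x: list[list[float]], w: list[list[float]]) -> CredibilityResult:
    """Empirical Buhlmann-Straub estimates from per-period rates x and exposures w.

    x[i][j] is cohort i's observed error rate in period j; w[i][j] is its exposure
    (e.g. number of decisions). Every cohort needs at least two periods.
    """
    r = len(x)
    w_i = [sum(row) for row in w]
    xbar_i = [sum(a * b for a, b in zip(wr, xr)) / wi for wr, xr, wi in zip(w, x, w_i)]
    w_tot = sum(w_i)
    xbar = sum(wi * xi for wi, xi in zip(w_i, xbar_i)) / w_tot

    # Within-cohort variance: weighted squared deviations from each cohort's own mean.
    ss_within = sum(
        a * (b - m) ** 2 for wr, xr, m in zip(w, x, xbar_i) for a, b in zip(wr, xr)
    )
    sigma2 = ss_within / sum(len(row) - 1 for row in x)

    # Between-cohort variance: spread of cohort means beyond what sigma2 alone explains.
    ss_between = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w_i, xbar_i))
    c = w_tot - sum(wi ** 2 for wi in w_i) / w_tot
    tau2 = max((ss_between - (r - 1) * sigma2) / c, 0.0)

    # No detectable between-cohort variance: k-hat is infinite and every Z is 0.
    k_hat = sigma2 / tau2 if tau2 > 0 else float("inf")
    z = [wi / (wi + k_hat) for wi in w_i]
    return CredibilityResult(sigma2, tau2, k_hat, z)


# Hypothetical data: three cohorts, error rate per period and decisions per period.
rates = [[0.020, 0.025, 0.018], [0.050, 0.045, 0.055], [0.030, 0.035]]
exposures = [[1000, 1200, 900], [200, 250, 150], [5000, 4000]]
result = buhlmann_straub(rates, exposures)
print(f"k_hat = {result.k_hat:.0f}, Z = {[round(z, 3) for z in result.z]}")
```

Blending each cohort's mean with the portfolio mean, Z_i · x̄_i + (1 − Z_i) · x̄, then gives its credibility-weighted error rate; cohorts with more exposure earn a larger Z.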

4 · IBNR reserve & economic capital

replaces "no reserve, no capital"

Aggregate capital: economic capital = TVaR_α − expected loss, both read from a seeded Monte Carlo simulation of the whole book (a mixed-Poisson compound model with a shared systemic factor). The ρ slider sets common-cause clustering: ρ = 0 is near-independent (maximal diversification), while raising ρ fattens the aggregate tail toward the undiversified bound (per-decision TVaR × N, with N the number of decisions in the book). See paper §5.6.
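A minimal sketch of the aggregate Monte Carlo, assuming (hypothetically) that the shared systemic factor θ is a Gamma variable with mean 1 and variance ρ that multiplies every decision's Poisson error intensity, with lognormal severities. The paper's exact parametrization is not reproduced here; the function name, seed, and parameter values are illustrative.

```python
import numpy as np


def book_capital(n_decisions: int, p_error: float, sev_mu: float, sev_sigma: float,
                 rho: float, alpha: float = 0.95, n_sims: int = 10_000,
                 seed: int = 2024) -> dict[str, float]:
    """Seeded Monte Carlo of a book's aggregate loss; returns EL, VaR, TVaR and capital."""
    rng = np.random.default_rng(seed)
    # Shared systemic factor theta: Gamma with mean 1 and variance rho (hypothetical mapping).
    # rho = 0 makes theta = 1, i.e. independent Poisson error counts (maximal diversification).
    theta = rng.gamma(shape=1.0 / rho, scale=rho, size=n_sims) if rho > 0 else np.ones(n_sims)
    counts = rng.poisson(n_decisions * p_error * theta)           # mixed-Poisson frequency
    severities = rng.lognormal(sev_mu, sev_sigma, size=int(counts.sum()))
    owner = np.repeat(np.arange(n_sims), counts)                  # scenario index of each error
    totals = np.bincount(owner, weights=severities, minlength=n_sims)

    ordered = np.sort(totals)
    k = max(1, int(round((1.0 - alpha) * n_sims)))               # size of the worst (1 - alpha) share
    var = float(np.quantile(ordered, alpha))
    tvar = float(ordered[-k:].mean())
    expected = float(totals.mean())
    return {"expected_loss": expected, "var": var, "tvar": tvar, "capital": tvar - expected}


analytic_el = 10_000 * 0.02 * np.exp(6.0 + 0.5)                 # N * p * E[severity]
independent = book_capital(10_000, 0.02, 6.0, 1.0, rho=0.0)
clustered = book_capital(10_000, 0.02, 6.0, 1.0, rho=0.15)
print(f"capital at rho=0: {independent['capital']:,.0f}; at rho=0.15: {clustered['capital']:,.0f}")
```

Because θ has mean 1, raising ρ leaves expected loss essentially unchanged but widens the spread of scenario totals, so TVaR and therefore capital grow; that is the loss of diversification the slider controls.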

5 · Stochastic reserving — development triangle

chain-ladder · Bornhuetter-Ferguson · bootstrap

Cumulative reported errors by accident period (rows) × development lag (columns). The shaded diagonal is “reported to date”; dotted cells (·) are the future development the triangle projects.
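A minimal sketch of how the dotted cells can be filled by chain-ladder, assuming a square triangle of cumulative counts and volume-weighted age-to-age factors. The 4 × 4 triangle and the function name are invented for illustration, chosen so the factors come out as round numbers (2.0, 1.5, 1.1).

```python
def complete_triangle(tri: list[list[float]]) -> tuple[list[float], list[list[float]]]:
    """Chain-ladder: volume-weighted age-to-age factors, then project each row to ultimate.

    tri[i] holds cumulative reported errors for accident period i at lags 0..len(tri[i])-1;
    a square triangle is assumed, so row i has len(tri) - i observed cells.
    """
    n = len(tri)
    factors = []
    for j in range(n - 1):
        rows = [row for row in tri if len(row) > j + 1]   # periods observed at lags j and j+1
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    full = []
    for row in tri:
        projected = list(row)
        for j in range(len(row) - 1, n - 1):               # fill the dotted (future) cells
            projected.append(projected[-1] * factors[j])
        full.append(projected)
    return factors, full


triangle = [
    [10, 20, 30, 33],   # oldest accident period: fully developed
    [20, 40, 60],
    [12, 24],
    [15],               # latest accident period: only lag 0 reported
]
factors, full = complete_triangle(triangle)
ultimates = [row[-1] for row in full]
ibnr = sum(ult - obs[-1] for ult, obs in zip(ultimates, triangle))  # projected minus reported
print(factors, ultimates, ibnr)
```

Note how the youngest row's ultimate rests entirely on one reported value times the full cumulative factor; that leverage is why an immature diagonal benefits from a cross-check.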

What this means: Instead of applying a single loss-development factor, the tool projects ultimate error counts from a full triangle by chain-ladder and cross-checks them with Bornhuetter-Ferguson, which uses the model's expected error count as the a-priori ultimate. Bornhuetter-Ferguson is more stable when the latest periods are immature, where chain-ladder can over-project. An over-dispersed-Poisson (ODP) bootstrap then gives a full reserve distribution, so the risk margin is a defensible 75th percentile (IFRS 17 style) rather than the ad-hoc tail load shown in panel 4. The reserve is severity-aware: each IBNR claim's cost is drawn from the empirical severity body spliced with a generalized Pareto (GPD) tail fitted by peaks-over-threshold, so the tail index ξ̂ and severity uncertainty flow into the percentiles. See paper §5.5.
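A minimal sketch of the spliced severity model: an empirical body at or below a threshold u and a generalized Pareto tail fitted to the excesses over u (peaks-over-threshold). For brevity it fits the GPD by probability-weighted moments (Hosking and Wallis) rather than maximum likelihood; which estimator the tool uses is not stated, and the 90% threshold, synthetic data, claim count, and function names are assumptions. This sketch uses a single fit; propagating uncertainty in ξ̂ itself would require refitting inside each bootstrap replicate.

```python
import numpy as np


def fit_gpd_pwm(excess: np.ndarray) -> tuple[float, float]:
    """Probability-weighted-moment estimates (xi, sigma) for a GPD fitted to excesses (xi < 1)."""
    y = np.sort(excess)
    n = len(y)
    a0 = y.mean()                                                # estimates E[Y]
    a1 = np.sum(y * (n - np.arange(1, n + 1)) / (n - 1)) / n     # estimates E[Y * (1 - F(Y))]
    xi = 2.0 - a0 / (a0 - 2.0 * a1)
    sigma = 2.0 * a0 * a1 / (a0 - 2.0 * a1)
    return float(xi), float(sigma)


def spliced_sampler(data: np.ndarray, q: float = 0.90):
    """Empirical body at or below the q-quantile u; GPD tail fitted to the excesses above u."""
    u = float(np.quantile(data, q))
    body = data[data <= u]
    xi, sigma = fit_gpd_pwm(data[data > u] - u)
    p_tail = float(np.mean(data > u))

    def sample(size: int, rng: np.random.Generator) -> np.ndarray:
        in_tail = rng.random(size) < p_tail
        from_body = rng.choice(body, size=size)                  # resample observed body costs
        log_surv = np.log1p(-rng.random(size))                   # log(1 - U), U ~ Uniform[0, 1)
        if abs(xi) < 1e-9:                                       # xi -> 0 limit: exponential tail
            from_tail = u - sigma * log_surv
        else:                                                    # inverse GPD CDF above u
            from_tail = u + sigma / xi * np.expm1(-xi * log_surv)
        return np.where(in_tail, from_tail, from_body)

    return sample, xi, sigma, u


# Synthetic severities from a GPD with xi = 0.25, sigma = 1000 (hypothetical), so the fit can be checked.
rng = np.random.default_rng(0)
severities = 1000.0 / 0.25 * np.expm1(-0.25 * np.log1p(-rng.random(200_000)))
sample, xi_hat, sigma_hat, u = spliced_sampler(severities)

draws = sample(100_000, rng)
# Cost of a hypothetical 10 IBNR claims across 5,000 scenarios; its percentiles reflect the heavy tail.
ibnr_cost = sample(50_000, rng).reshape(5_000, 10).sum(axis=1)
p75 = float(np.percentile(ibnr_cost, 75))
print(f"xi_hat = {xi_hat:.3f}, u = {u:.0f}, 75th percentile IBNR cost = {p75:,.0f}")
```

Excesses of a GPD over a higher threshold are again GPD with the same ξ, which is why the fitted ξ̂ should land near the 0.25 used to generate the data.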

6 · Ground-truth censoring & reject inference

corrects selection bias

What this means: The true outcome is only observed for some decisions: a confidently denied case is rarely appealed, and a declined applicant never gets the chance to default. That censoring is informative (it rises with the model's score, which tracks error), so the naive observed error rate is biased low. Reject inference fits P(error | score) on the observed decisions and imputes the censored ones, recovering an estimate close to the truth. It works because censoring is (mostly) explained by the score; whatever depends on information the score does not capture (missing not at random, MNAR) remains as the residual gap. See paper §9.
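A minimal simulation of reject inference, assuming (hypothetically) a logistic P(error | score) and censoring that depends only on the score, i.e. missing at random given the score. Under that assumption the gap closes almost entirely; an MNAR mechanism would leave the residual gap described above. The logistic fit is a plain Newton-Raphson routine written for illustration, not any particular library's API, and all rates and the seed are invented.

```python
import numpy as np


def fit_logistic(x: np.ndarray, y: np.ndarray, iters: int = 25) -> np.ndarray:
    """Logistic regression of y on [1, x] by Newton-Raphson (IRLS)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


rng = np.random.default_rng(11)
n = 50_000
score = rng.random(n)                                    # model score in [0, 1]
p_true = 1.0 / (1.0 + np.exp(-(-4.0 + 4.0 * score)))     # P(error | score) rises with score
error = rng.random(n) < p_true
censored = rng.random(n) < 0.1 + 0.8 * score             # censoring also rises with score
observed = ~censored

true_rate = error.mean()                                 # knowable only in simulation
naive_rate = error[observed].mean()                      # observed cases only: biased low
beta = fit_logistic(score[observed], error[observed].astype(float))
p_hat = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * score[censored])))
corrected_rate = (error[observed].sum() + p_hat.sum()) / n   # impute censored outcomes
print(f"true {true_rate:.3f}  naive {naive_rate:.3f}  reject-inference {corrected_rate:.3f}")
```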

Reproducibility & invariants. The portfolio is generated from a fixed seed, so these numbers are byte-for-byte reproducible and match the paper. By construction the tool always satisfies TVaR ≥ VaR, reserve ≥ 0, Z increasing in n, and A/E ≈ 1.0 when drift = 0. To verify the last one yourself, drag Production drift to 0% and watch every cohort collapse onto the A/E = 1.0 line.
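A minimal sketch of an actual-to-expected check in the spirit of these invariants, assuming (hypothetically) that Production drift scales each cohort's true error rate by (1 + drift) while the expected count still uses the validated rate. The cohort sizes, rates, seed, and function names are invented. With drift = 0 every cohort's A/E sits near 1.0, and Z = n/(n + k) is checked to be increasing in n.

```python
import numpy as np


def ae_by_cohort(sizes, base_rates, drift: float, seed: int = 5) -> np.ndarray:
    """Actual / expected error counts per cohort.

    expected = n * validated rate; actual is simulated at the drifted rate (hypothetical model).
    """
    rng = np.random.default_rng(seed)
    sizes = np.asarray(sizes)
    base = np.asarray(base_rates, dtype=float)
    expected = sizes * base
    actual = rng.binomial(sizes, np.clip(base * (1.0 + drift), 0.0, 1.0))
    return actual / expected


def credibility(n, k: float) -> np.ndarray:
    """Buhlmann credibility weight Z = n / (n + k)."""
    n = np.asarray(n, dtype=float)
    return n / (n + k)


sizes = [400_000, 600_000, 1_000_000]
rates = [0.02, 0.05, 0.03]
ae_0 = ae_by_cohort(sizes, rates, drift=0.0)     # no drift: A/E near 1.0
ae_35 = ae_by_cohort(sizes, rates, drift=0.35)   # 35% drift: A/E near 1.35
z = credibility([10, 100, 1_000, 10_000], k=2_500)
print(ae_0.round(3), ae_35.round(3), z.round(3))
```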