Calibration · scored in public

Are we calibrated?

A reliability diagram: the x-axis is the probability we predicted; the y-axis is how often that outcome actually happened. A perfectly calibrated forecaster sits on the dashed diagonal: when we say 70%, it comes true 70% of the time. This is the chart that proves the calibration claim, or honestly shows it can't be proven yet. Only a handful of markets have resolved, so right now the points are few and their error bars span almost the whole axis. That emptiness is the point: a full, confident-looking curve drawn from one resolved market would be exactly the over-claim this project exists to refute. The chart fills in honestly as markets resolve.

How to read it

perfect calibration (y = x)

a resolved bin: predicted vs realized

95% confidence range (Wilson)

bigger dot = more markets in the bin

Each cyan dot is one probability bin (one per decile of predicted probability). Its height is the realized rate; the vertical bar is the 95% Wilson interval. At one resolved market the bar nearly fills the axis, which is honest, not broken. Points sit on the diagonal when we're calibrated. We'd need ~30 resolved markets to read a trend, ~200 to trust it.
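To make the reading concrete, here is a minimal sketch of how such bins and intervals could be computed. It is an illustration, not this site's actual code. The function names (`wilson_interval`, `reliability_bins`), the `(predicted, outcome)` input shape, and the rule that a prediction of exactly 1.0 falls in the top bin are all assumptions.

```python
import math
from collections import defaultdict

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        return (0.0, 1.0)  # no data: the interval is the whole axis
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

def reliability_bins(forecasts, n_bins=10):
    """Group (predicted_probability, outcome) pairs into equal-width bins.

    outcome is 1 if the event happened, 0 otherwise (assumed input shape).
    Returns one row per non-empty bin, lowest bin first.
    """
    bins = defaultdict(list)
    for p, outcome in forecasts:
        idx = min(int(p * n_bins), n_bins - 1)  # assumption: p == 1.0 goes in the top bin
        bins[idx].append((p, outcome))

    rows = []
    for idx in sorted(bins):
        items = bins[idx]
        n = len(items)
        hits = sum(outcome for _, outcome in items)
        low, high = wilson_interval(hits, n)
        rows.append({
            "bin": idx,
            "mean_predicted": sum(p for p, _ in items) / n,  # x position of the dot
            "realized": hits / n,                            # y position of the dot
            "n": n,                                          # drives dot size
            "ci_low": low,
            "ci_high": high,
        })
    return rows
```

With a single resolved market that came true, `wilson_interval(1, 1)` runs from roughly 0.21 to 1.0, which is why one data point yields a bar covering most of the axis.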

And vs. the market

How to read it

we landed closer to reality

the market did

we agreed with the market (y = x)

bigger dot = bigger disagreement

Each dot is one settled call. Distance from the line is how far we broke from the crowd; color is whether that break paid off. A green cloud off the diagonal is a real, repeatable edge — not a lucky streak.
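As a rough sketch of how one settled call could be scored against the market: the code below assumes "closer to reality" means a smaller absolute error against the 0/1 outcome. The page does not state its exact scoring rule, and the function name `score_call` and the return shape are hypothetical.

```python
def score_call(ours, market, outcome):
    """Compare our probability with the market's for one settled call.

    ours, market: probabilities in [0, 1]; outcome: 1 if it happened, else 0.
    Assumption: "closer to reality" is judged by absolute error.
    """
    our_error = abs(ours - outcome)
    market_error = abs(market - outcome)
    if our_error < market_error:
        winner = "us"
    elif market_error < our_error:
        winner = "market"
    else:
        winner = "tie"
    return {
        "disagreement": abs(ours - market),  # drives dot size and distance from y = x
        "winner": winner,                    # drives dot color
    }
```

For example, if we said 0.8, the market said 0.6, and the event happened, the call goes to us with a disagreement of about 0.2. Had the event not happened, the same call would go to the market.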
