Calibration · scored in public

Are we calibrated?

A reliability diagram: the x-axis is the probability we predicted; the y-axis is how often those events actually happened. A perfectly calibrated forecaster sits on the dashed diagonal: when we say 70%, it comes true 70% of the time. This is the chart that either proves the calibration claim or honestly shows it can't be proven yet. Only a handful of markets have resolved, so right now the points are few and their error bars span almost the whole axis. That emptiness is the point: a full, confident-looking curve drawn from so few resolved markets would be exactly the over-claim this project exists to refute. It fills in honestly as markets resolve.

How to read it

perfect calibration (y = x)
a resolved bin: predicted vs realized
95% confidence range (Wilson)
bigger dot = more markets in the bin

Each cyan dot is one probability bin (one decile of predicted probability). Its height is the realized rate; the vertical bar is the 95% Wilson interval. With a single resolved market in a bin, that bar nearly fills the axis, which is honest, not broken. Points sit on the diagonal when we're calibrated. We'd need ~30 resolved markets to read a trend, ~200 to trust it.
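For readers who want to check the arithmetic, here is a minimal sketch of how decile bins and their 95% Wilson intervals could be computed. It is not the code behind this page: the function names, the `(predicted_probability, outcome)` input shape, and the choice of z = 1.96 are all assumptions made for illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)  # no data: the whole axis is plausible
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def reliability_bins(forecasts, n_bins=10):
    """Group (predicted_probability, outcome) pairs into equal-width bins.

    outcome is 1 if the event happened, 0 otherwise (hypothetical input shape).
    Empty bins are skipped rather than drawn.
    """
    counts = [0] * n_bins
    hits = [0] * n_bins
    total_p = [0.0] * n_bins
    for p, outcome in forecasts:
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        counts[i] += 1
        hits[i] += outcome
        total_p[i] += p
    rows = []
    for i in range(n_bins):
        if counts[i] == 0:
            continue
        low, high = wilson_interval(hits[i], counts[i])
        rows.append({
            "bin": i,
            "n": counts[i],                           # drives dot size
            "mean_predicted": total_p[i] / counts[i],  # x position
            "realized": hits[i] / counts[i],           # y position
            "ci_low": low,
            "ci_high": high,
        })
    return rows
```

Under these assumptions, `wilson_interval(1, 1)` runs from about 0.21 to 1.0, which is why a bin holding a single resolved market shows a bar covering most of the axis.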

And vs. the market

A different cut: every decisive call plotted as our probability against what the crowd priced at the same moment, colored by who landed closer to reality when it resolved.

How to read it

we landed closer to reality
the market did
we agreed with the market (y = x)
bigger dot = bigger disagreement

Each dot is one settled call. Distance from the line is how far we broke from the crowd; color is whether that break paid off. A green cloud off the diagonal, once enough calls have settled, would point to a real, repeatable edge rather than a lucky streak.
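As a rough illustration of how each dot could be scored, the sketch below compares absolute error against the resolved outcome for our probability and the market's. The function name, the tie handling, and the use of absolute error (rather than, say, a Brier score) are assumptions, not a description of how this page actually computes it.

```python
def score_call(ours, market, outcome):
    """Score one settled call.

    ours, market: probabilities in [0, 1] taken at the same moment.
    outcome: 1 if the event happened, 0 otherwise.
    """
    our_error = abs(ours - outcome)
    market_error = abs(market - outcome)
    if our_error < market_error:
        closer = "us"
    elif market_error < our_error:
        closer = "market"
    else:
        closer = "tie"  # identical error; how ties are colored is an open choice
    return {
        "disagreement": abs(ours - market),  # distance from the y = x line, drives dot size
        "closer": closer,                    # drives dot color
    }


def tally(calls):
    """Count who landed closer across (ours, market, outcome) triples."""
    counts = {"us": 0, "market": 0, "tie": 0}
    for ours, market, outcome in calls:
        counts[score_call(ours, market, outcome)["closer"]] += 1
    return counts
```

A tally like this only says who was closer more often; it says nothing on its own about whether the gap is larger than chance would produce from a small number of calls.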
