A reliability diagram: the x-axis is the probability we predicted; the y-axis is how often those events actually happened. A perfectly calibrated forecaster sits on the dashed diagonal: when we say 70%, it comes true 70% of the time. This is the chart that proves the calibration claim, or honestly shows it can't be proven yet. Only a handful of markets have resolved, so right now the points are few and their error bars span almost the whole axis. That emptiness is the point: a full, confident-looking curve drawn from one resolved market would be exactly the over-claim this project exists to refute. The chart fills in as markets resolve.
Each cyan dot is one probability bin (one decile of predicted probability). Its height is the realized rate: the share of forecasts in that bin that came true. The vertical bar is the 95% interval; with one resolved market it nearly fills the axis, which is honest, not broken. Dots sit on the diagonal when we're calibrated. We'd need ~30 resolved markets to read a trend, ~200 to trust it.
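For readers who want to see the arithmetic behind the dots and bars, here is a minimal sketch of decile binning with a 95% interval per bin. It assumes a Wilson score interval; the chart does not say which interval method it uses, so treat that choice, and the names `wilson_interval` and `reliability_bins`, as illustrative rather than the project's actual code.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%).

    Assumption: the chart's "95% interval" could be computed another way;
    Wilson is used here because it behaves sensibly at very small n.
    """
    if n == 0:
        return (0.0, 1.0)  # no data: the interval spans the whole axis
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def reliability_bins(forecasts, outcomes, n_bins=10):
    """Group (probability, outcome) pairs into equal-width probability bins.

    forecasts: predicted probabilities in [0, 1]
    outcomes:  1 if the event happened, 0 if it did not
    Returns one row per non-empty bin; empty bins are skipped, not plotted.
    """
    counts = [0] * n_bins
    hits = [0] * n_bins
    prob_sums = [0.0] * n_bins
    for p, y in zip(forecasts, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the top bin
        counts[i] += 1
        hits[i] += y
        prob_sums[i] += p

    rows = []
    for i in range(n_bins):
        if counts[i] == 0:
            continue
        low, high = wilson_interval(hits[i], counts[i])
        rows.append({
            "bin": i,
            "n": counts[i],
            "mean_forecast": prob_sums[i] / counts[i],  # x position of the dot
            "realized_rate": hits[i] / counts[i],       # y position of the dot
            "ci_low": low,
            "ci_high": high,
        })
    return rows
```

Under these assumptions, a bin holding a single resolved market that came true gets an interval of roughly 0.21 to 1.0, which is why one data point nearly fills the axis. The same function with 140 hits out of 200 gives a bar only about 0.13 wide, which is the difference between reading a trend and trusting it.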
A different cut: every decisive call plotted as our probability against what the crowd priced at the same moment, colored by who landed closer to reality when the market resolved.
Each dot is one settled call. Distance from the diagonal is how far we broke from the crowd; color is whether that break paid off. A green cloud off the diagonal is what a real, repeatable edge looks like, as opposed to a lucky streak.
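The coloring rule can be sketched in a few lines. This is one plausible reading, assuming "closer to reality" means smaller absolute error against the 0/1 outcome, and assuming the crowd's price sits on the x-axis and our probability on the y-axis. The function names and the tuple layout are hypothetical, not taken from the project.

```python
def who_was_closer(our_prob, crowd_prob, outcome):
    """Return "us", "crowd", or "tie" for one settled call.

    outcome is 1 if the event happened, 0 if it did not.
    Assumption: closeness is absolute error against the outcome.
    """
    our_err = abs(our_prob - outcome)
    crowd_err = abs(crowd_prob - outcome)
    if our_err < crowd_err:
        return "us"
    if crowd_err < our_err:
        return "crowd"
    return "tie"


def summarize_calls(calls):
    """Turn (our_prob, crowd_prob, outcome) tuples into scatter points."""
    points = []
    for our_prob, crowd_prob, outcome in calls:
        points.append({
            "x": crowd_prob,                             # assumed axis layout
            "y": our_prob,
            "divergence": abs(our_prob - crowd_prob),    # gap between us and the crowd
            "closer": who_was_closer(our_prob, crowd_prob, outcome),
        })
    return points
```

For example, a call where we said 0.8, the crowd priced 0.6, and the event happened is marked "us"; the same call with the event not happening is marked "crowd". A point exactly on the diagonal is always a tie, since both errors are equal.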