
Predictive Judging
I replayed the judging 500 times to see who actually wins.
WinSim AI projects the likely winner among all active “Showcased projects” on Handshake. GPT-4o scores every entry on a 5-dimension rubric anchored to live-site evidence, then a weighted 3-judge panel re-runs the field 500 times under judge variance.
Projects judged: 577
Model: GPT-4o
Judge panel: 3 weighted
Monte Carlo runs: 500
Compute cost: $2.50
Failures: 0
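For the curious, here is a minimal sketch of what "re-running the field under judge variance" can look like. It is not WinSim AI's actual code: the rubric weights, judge weights, Gaussian noise model, and every name below (`RUBRIC_WEIGHTS`, `JUDGE_WEIGHTS`, `JUDGE_NOISE_SD`, `panel_score`, `simulate_wins`) are assumptions made up for illustration, and the demo scores are invented. Each run jitters every judge's view of every entry, combines the three judges by weight, and records who comes out on top.

```python
import random
from collections import Counter

# All constants are hypothetical; the real rubric, weights, and noise
# model are not published on this page.
RUBRIC_WEIGHTS = [0.2, 0.2, 0.2, 0.2, 0.2]  # 5 rubric dimensions, assumed equal
JUDGE_WEIGHTS = [0.5, 0.3, 0.2]             # 3 weighted judges, assumed weights
JUDGE_NOISE_SD = 0.5                        # assumed per-judge score jitter


def panel_score(scores, rng):
    """Weighted panel score for one entry in one simulated run."""
    base = sum(w * s for w, s in zip(RUBRIC_WEIGHTS, scores))
    # Each judge sees the base score plus independent Gaussian noise.
    return sum(jw * (base + rng.gauss(0.0, JUDGE_NOISE_SD))
               for jw in JUDGE_WEIGHTS)


def simulate_wins(entries, runs=500, seed=0):
    """Replay the judging `runs` times; return each entry's share of wins."""
    rng = random.Random(seed)  # seeded so a replay is reproducible
    wins = Counter()
    for _ in range(runs):
        run_scores = {name: panel_score(s, rng) for name, s in entries.items()}
        wins[max(run_scores, key=run_scores.get)] += 1
    return {name: wins[name] / runs for name in entries}


# Invented demo field: three entries, five rubric scores each (0-10).
demo_field = {
    "project_a": [8, 7, 9, 6, 8],
    "project_b": [7, 8, 8, 7, 7],
    "project_c": [5, 6, 5, 6, 5],
}
for name, share in simulate_wins(demo_field).items():
    print(f"{name}: {share:.1%} of simulated wins")
```

The output is a win share per entry rather than a single ranking, which is why close contenders can trade places from run to run while a clearly weaker entry almost never wins.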
A note from the builder
I didn't grade my own project — judging myself in my own simulation would defeat the entire point.
WinSim AI is a Codex × Handshake submission too, but the whole idea is to fairly simulate the judging of the other 577 entries — not to put myself on a leaderboard I built. So I deliberately excluded it from the field. The rankings you see are about everyone else.