Live Leaderboard
577 Codex × Handshake projects · GPT-4o judge · 500 Monte Carlo runs · seed 1337
How the Simulation Works
Three steps.
Score the field, add variability, replay the contest.
AI judge scores each project
GPT-4o reads each project's writeup plus a live-site review (HTTP status, page title, meta description, error/login flags) and produces a calibrated 1–100 score for each of the five rubric dimensions.
Three-judge panel adds variability
Three judges (balanced, technical, and UX/novelty) each apply slightly different rubric weights and add ±2-point noise, mirroring how real judges disagree at the margins.
500 deterministic runs
The panel re-judges the field 500 times under a fixed seed (1337), so the results are reproducible. A project's win probability is the share of runs in which it finishes first.
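The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the leaderboard's actual code: the dimension weights, the uniform noise distribution, applying noise per dimension, averaging the three judges, and the names `JUDGE_WEIGHTS`, `panel_score`, and `win_probabilities` are all hypothetical. Only the three judge types, the ±2-point noise, the 500 runs, and seed 1337 come from the description above.

```python
import random

# Hypothetical weights over the 5 rubric dimensions (each row sums to 1).
# The real weights are not given in the source.
JUDGE_WEIGHTS = {
    "balanced":   [0.20, 0.20, 0.20, 0.20, 0.20],
    "technical":  [0.30, 0.25, 0.15, 0.15, 0.15],
    "ux_novelty": [0.15, 0.15, 0.20, 0.25, 0.25],
}
NOISE = 2.0  # ±2 points; a uniform distribution is an assumption


def panel_score(base_scores, rng):
    """Average the three judges' weighted totals after per-dimension noise."""
    totals = []
    for weights in JUDGE_WEIGHTS.values():
        total = 0.0
        for score, weight in zip(base_scores, weights):
            # Perturb the AI judge's score, then clamp to the 1-100 scale.
            noisy = min(100.0, max(1.0, score + rng.uniform(-NOISE, NOISE)))
            total += weight * noisy
        totals.append(total)
    return sum(totals) / len(totals)


def win_probabilities(projects, runs=500, seed=1337):
    """Replay the contest `runs` times; return each project's share of wins."""
    rng = random.Random(seed)  # fixed seed makes every replay reproducible
    wins = {name: 0 for name in projects}
    for _ in range(runs):
        scores = {name: panel_score(base, rng) for name, base in projects.items()}
        # Ties go to the project listed first (another assumption).
        winner = max(scores, key=scores.get)
        wins[winner] += 1
    return {name: count / runs for name, count in wins.items()}


if __name__ == "__main__":
    # Made-up base scores for three illustrative projects.
    field = {
        "alpha": [90, 90, 90, 90, 90],
        "beta":  [50, 50, 50, 50, 50],
        "gamma": [88, 88, 88, 88, 88],
    }
    for name, p in win_probabilities(field).items():
        print(f"{name}: {p:.3f}")
```

Because the random generator is created from the seed inside `win_probabilities`, calling it twice with the same field returns identical probabilities, which is what makes the 500 runs deterministic.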
Formula