WinSim AI

Live Leaderboard

577 Codex × Handshake projects · GPT-4o judge · 500 Monte Carlo runs · seed 1337

How the Simulation Works

Three steps.

Score the field, add variability, replay the contest.

AI judge scores each project

GPT-4o reads the writeup plus a live-site review (HTTP status, page title, meta description, error/login flags) and produces calibrated 1–100 scores across the 5 rubric dimensions.
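The live-site review fields named above can be pictured as a small record built from a fetched page. The following is a minimal offline sketch, not WinSim AI's actual implementation: `SiteReview`, `review_site`, and the two flag heuristics (status 400 or higher counts as an error, a password input counts as a login wall) are assumptions made for illustration.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


class _HeadParser(HTMLParser):
    """Collects the <title> text and the meta description from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.description = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


@dataclass
class SiteReview:
    """Hypothetical record of the signals handed to the AI judge."""
    status: int
    title: str
    description: str
    error: bool
    login_required: bool


def review_site(status: int, html: str) -> SiteReview:
    """Summarize an already-fetched page (HTTP status plus HTML body)."""
    parser = _HeadParser()
    parser.feed(html)
    return SiteReview(
        status=status,
        title=parser.title.strip(),
        description=parser.description.strip(),
        error=status >= 400,                               # assumed heuristic
        login_required='type="password"' in html.lower(),  # assumed heuristic
    )
```

Keeping the fetch separate from the parsing means the review can be exercised on saved HTML without touching the network.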

Three-judge panel adds variability

A balanced judge, a technical judge, and a UX/novelty judge each apply slightly different rubric weights plus ±2-point noise, mirroring how real judges disagree at the margins.
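One way to picture the panel is as three weight vectors over the five rubric dimensions, each followed by a small random perturbation. The sketch below is illustrative only: the weight values, the uniform noise distribution, the clamping to 1–100, and the averaging of the three judges are all assumptions, since the page does not specify them.

```python
import random

# Hypothetical weights over the 5 rubric dimensions (each row sums to 1).
JUDGE_WEIGHTS = {
    "balanced":   (0.20, 0.20, 0.20, 0.20, 0.20),
    "technical":  (0.30, 0.25, 0.15, 0.15, 0.15),
    "ux_novelty": (0.15, 0.15, 0.20, 0.25, 0.25),
}
NOISE = 2.0  # the ±2-point noise; uniform sampling is an assumption


def judge_score(base, weights, rng):
    """Weighted rubric score for one judge, perturbed by up to ±NOISE points."""
    weighted = sum(w * s for w, s in zip(weights, base))
    noisy = weighted + rng.uniform(-NOISE, NOISE)
    return min(100.0, max(1.0, noisy))  # keep the result on the 1-100 scale


def panel_score(base, rng):
    """Average of the three judges' noisy scores (averaging is an assumption)."""
    scores = [judge_score(base, w, rng) for w in JUDGE_WEIGHTS.values()]
    return sum(scores) / len(scores)
```

For example, `panel_score((80, 80, 80, 80, 80), random.Random(1337))` stays within two points of 80, because every weight vector sums to 1 and the noise is bounded.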

500 deterministic runs

The panel re-judges the field 500 times under a fixed seed, so the results are reproducible. A project's win probability is the share of runs in which it finishes first.
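The replay step can be sketched as a seeded Monte Carlo loop. This is a simplified illustration rather than the site's code: `win_probabilities` is a hypothetical name, and the three-judge panel is collapsed into a single uniform ±2-point noise term per project to keep the example short.

```python
import random
from collections import Counter


def win_probabilities(base_scores, runs=500, seed=1337, noise=2.0):
    """Share of simulated runs in which each project finishes first."""
    rng = random.Random(seed)  # fixed seed makes every replay reproducible
    wins = Counter()
    for _ in range(runs):
        # Re-judge the whole field with fresh noise on every run.
        simulated = {
            name: score + rng.uniform(-noise, noise)
            for name, score in base_scores.items()
        }
        winner = max(simulated, key=simulated.get)
        wins[winner] += 1
    return {name: wins[name] / runs for name in base_scores}
```

Calling it twice with the same arguments returns identical probabilities, and a project whose base score trails the leader by more than twice the noise range can never win a run.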

Formula

Base Score + Judge Variability = Simulated Score

AI Showcase Judge