Baseball Bench · 2026 Season
AI manager league
Models are compared as baseball managers against the rest of the field. Research and tactical decisions are scouting context; league play is the main result.
Standings
RES, DEC, and PCT are shown as batting-average-style rates (.000–1.000). DIFF is run differential; ELO is the manager rating.
Provisional order: the controlled manager league or GM scoring has not finished. OVR is the mean of completed track ratings.
No model results yet.
How to read the stats
Overall rating
Mean of a model's completed track ratings, shown like a batting average (.000–1.000).
Research
Share of baseball research questions answered correctly.
Decisions
Share of late-game situations where the model picked the best or near-best move.
League record
Wins and losses as a manager in head-to-head league play.
Win percentage
League winning percentage (wins ÷ games).
Run differential
Runs scored minus runs allowed across league games.
Manager rating
Skill rating computed from league results; every model starts at 1500, and a higher rating is stronger.
Tracks complete
How many of the 3 benchmark tracks this model has finished.
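As a rough illustration of the definitions above, the sketch below derives OVR, PCT, DIFF, and the tracks-complete count from raw inputs, and formats rates the way a batting average is printed. The names `ManagerLine`, `fmt_rate`, `expected_score`, and `elo_update` are hypothetical and not taken from the benchmark's code. The page only says the manager rating starts at 1500 and rises with strength; the update rule shown here is the standard Elo formula with K = 32, which is an assumption rather than the benchmark's confirmed method.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

TOTAL_TRACKS = 3        # from the glossary: 3 benchmark tracks
START_RATING = 1500.0   # from the glossary: everyone starts at 1500


def fmt_rate(rate: float) -> str:
    """Format a 0..1 rate like a batting average (.625, 1.000)."""
    text = f"{rate:.3f}"
    # Drop the leading zero, as box scores do; "1.000" is left alone.
    return text[1:] if text.startswith("0") else text


@dataclass
class ManagerLine:
    """Hypothetical stat line for one model."""
    track_ratings: list[Optional[float]]  # one slot per track; None = not finished
    wins: int
    losses: int
    runs_scored: int
    runs_allowed: int

    @property
    def tracks_complete(self) -> int:
        return sum(r is not None for r in self.track_ratings)

    @property
    def ovr(self) -> Optional[float]:
        # Mean of completed track ratings only; None if nothing is finished.
        done = [r for r in self.track_ratings if r is not None]
        return mean(done) if done else None

    @property
    def pct(self) -> Optional[float]:
        # Wins divided by games; None before any game is played.
        games = self.wins + self.losses
        return self.wins / games if games else None

    @property
    def diff(self) -> int:
        # Runs scored minus runs allowed.
        return self.runs_scored - self.runs_allowed


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for A against B (assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, a_won: bool,
               k: float = 32.0) -> tuple[float, float]:
    """Return both new ratings after one game; K = 32 is an assumption."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Made-up numbers for illustration only, not benchmark results.
line = ManagerLine([0.700, 0.550, None], wins=5, losses=3,
                   runs_scored=40, runs_allowed=31)
print(fmt_rate(line.ovr), fmt_rate(line.pct), line.diff,
      f"{line.tracks_complete}/{TOTAL_TRACKS}")
print(elo_update(START_RATING, START_RATING, a_won=True))
```

Returning `None` for an unfinished rate, rather than 0, keeps an incomplete model from looking like a model that scored .000.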
League Leaders
Top mark in each category across the public field.
Research AVG
—
Not finished yet
Research AVG is not yet available for the current public-model snapshot.
Decision AVG
—
Not finished yet
Decision AVG is not yet available for the current public-model snapshot.
GM Score
—
Not finished yet
GM Score is not yet available for the current public-model snapshot.
League PCT
—
Not finished yet
League PCT is not yet available for the current public-model snapshot.
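One way to picture how a leader card is filled: take the top mark among models that have a value in the category, and show nothing when no model has finished. This is a minimal sketch. The function `category_leader` and the model names are hypothetical, and the tie-breaking rule (first model listed wins) is an assumption.

```python
from typing import Optional


def category_leader(
    marks: dict[str, Optional[float]],
) -> Optional[tuple[str, float]]:
    """Return (model, mark) for the top finished mark, or None if none finished."""
    # Ignore models whose mark is still pending (None).
    finished = {model: mark for model, mark in marks.items() if mark is not None}
    if not finished:
        return None  # the card would read "Not finished yet"
    # max() keeps the first model it sees on a tie (assumed tie-break).
    best = max(finished, key=finished.get)
    return best, finished[best]


# Made-up marks for illustration only, not benchmark results.
research_avg = {"model-a": 0.61, "model-b": None, "model-c": 0.74}
print(category_leader(research_avg))
print(category_leader({"model-a": None}))
```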
Manager League
Head-to-head matchups pending
Run the OpenRouter pack through league play to compare each model as a manager against the rest of the field.
Manager Cards
Full stat line for every model in the field.
League Runs On File
- No completed public-model league snapshot yet.
Benchmark Notes
Internal calibration baselines are excluded from this public comparison, so the page shows only actual model entries.
Run History
- No saved benchmark snapshots yet.