Reasoning Checkpoint · arcadia
Simulated seasons and rating robustness
Each simulated "season" consists of 10,000 matchups with randomized pairings, and Elo ratings are updated after every matchup using the elo.cal function. Species whose ratings remain stable across seasons are considered robust to matchup order.
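As a minimal sketch of the procedure described above, the Python below simulates several seasons and reports how much each species' final rating varies across them. It does not call elo.cal; it substitutes the standard logistic Elo update. The K-factor, starting rating, the latent `strengths` used to draw match outcomes, and all function names are assumptions for illustration and are not taken from the source.

```python
import random
import statistics


def expected_score(r_a, r_b):
    """Standard logistic Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def run_season(strengths, n_matchups=10_000, k=16.0, start=1000.0, rng=None):
    """Simulate one season of randomly paired matchups.

    `strengths` maps species name -> hypothetical latent strength; the
    outcome model (win probability proportional to strength) is an
    assumption, since the source does not say how outcomes are decided.
    """
    rng = rng or random.Random()
    names = list(strengths)
    ratings = {name: start for name in names}
    for _ in range(n_matchups):
        a, b = rng.sample(names, 2)  # random pairing of two distinct species
        p_a = strengths[a] / (strengths[a] + strengths[b])
        score_a = 1.0 if rng.random() < p_a else 0.0
        delta = k * (score_a - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta  # zero-sum update: A gains what B loses
        ratings[b] -= delta
    return ratings


def rating_spread(strengths, n_seasons=5, seed=0, **season_kwargs):
    """Population standard deviation of each species' final rating across seasons."""
    rng = random.Random(seed)
    seasons = [run_season(strengths, rng=rng, **season_kwargs)
               for _ in range(n_seasons)]
    return {
        name: statistics.pstdev([season[name] for season in seasons])
        for name in strengths
    }


if __name__ == "__main__":
    # Hypothetical species and strengths, for demonstration only.
    demo = {"species_a": 4.0, "species_b": 2.0, "species_c": 1.0}
    for name, spread in rating_spread(demo, n_seasons=5, n_matchups=2_000).items():
        print(f"{name}: spread across seasons = {spread:.1f}")
```

Under this reading, a species whose spread is small relative to the gaps between ratings would count as robust to matchup order; the threshold for "stable" is left open here.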
Confidence
70%
◑ partial · active